Optimizing Imbalanced Data Classification: Under Sampling Algorithm Strategy with Classification Combination

Authors

  • Nauval Dwi Primadya, Universitas Dian Nuswantoro
  • Adhitya Nugraha, Universitas Dian Nuswantoro (ORCID: https://orcid.org/0000-0001-5366-110X)
  • Sahrul Yudha Fahrezi, Universitas Dian Nuswantoro
  • Ardytha Luthfiarta, Universitas Dian Nuswantoro

DOI:

https://doi.org/10.31358/techne.v23i2.435

Keywords:

Random Undersampling, IoT Attacks 2023, Combination Algorithm

Abstract

The security of Internet of Things (IoT) devices must be taken seriously, because attacks can damage devices and lead to data theft. IoT devices are widely used in sectors such as health, transportation, and industry, and attacks on them increase every year. To address this, this research takes a machine learning approach. The dataset used is CIC IoT Attacks 2023 (CICIoT2023) from the University of New Brunswick. Because the classes in this data are imbalanced, random undersampling is applied to balance them. Modeling is then performed with the K-Nearest Neighbors (KNN), Random Forest, Logistic Regression, AdaBoost, and Perceptron algorithms. Random Forest achieves the best accuracy, at 99.73%. From these results, it can be concluded that random undersampling can improve classification accuracy on imbalanced data.
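The random undersampling step described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function name `random_undersample`, the `seed` parameter, and the toy "DDoS"/"Benign" rows are hypothetical. It assumes the common variant in which every class is reduced, by sampling without replacement, to the size of the rarest class.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=None):
    """Hypothetical helper: randomly drop rows of larger classes until
    every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    # Group row indices by their class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    # Target size per class: the count of the rarest class.
    target = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        # Sample without replacement, so no row is duplicated.
        keep.extend(rng.sample(idx, target))
    # Shuffle so the classes are interleaved rather than grouped.
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical toy data: 8 "DDoS" rows vs 2 "Benign" rows.
    X = [[float(i)] for i in range(10)]
    y = ["DDoS"] * 8 + ["Benign"] * 2
    X_bal, y_bal = random_undersample(X, y, seed=42)
    print(Counter(y_bal))  # both classes now have 2 rows
```

In practice a library implementation such as `RandomUnderSampler` from imbalanced-learn (via its `fit_resample` method) would typically be used, and resampling is usually applied to the training split only, so that the test set keeps its original class distribution. The balanced output can then be passed to classifiers such as scikit-learn's `RandomForestClassifier`.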

Published

29-11-2024

How to Cite

Nauval Dwi Primadya, Adhitya Nugraha, Sahrul Yudha Fahrezi, & Ardytha Luthfiarta. (2024). Optimizing Imbalanced Data Classification: Under Sampling Algorithm Strategy with Classification Combination. Techné : Jurnal Ilmiah Elektroteknika, 23(2), 277–288. https://doi.org/10.31358/techne.v23i2.435

Section

Articles