Performance comparison of cyberbullying detection using transformer, deep learning, and machine learning

Authors

  • Fuad Muftie, Universitas Nusa Mandiri
  • Kamal Muftie Yafi, Universitas Indonesia
  • Qinthara Muftie Addina, Universitas Pendidikan Indonesia

DOI:

https://doi.org/10.31571/saintek.v13i1.4002

Keywords:

Transformers, Sentiment Analysis, Natural Language Processing, Deep Learning

Abstract

The growth of browsing activity, particularly on social media sites, has increased the risk of cyberbullying (online bullying). Many studies have addressed cyberbullying detection using both machine learning and deep learning methods. This study uses the Transformer algorithm to classify text as cyberbullying or not, and then compares the Transformer's performance with other deep learning methods (RNN, LSTM, and GRU) and with machine learning methods (Naïve Bayes, Logistic Regression, SVM, and Decision Tree). The best deep learning result was obtained on the YouTube dataset with the Transformer model, at an accuracy of 98.49%. The best machine learning result was also obtained on the YouTube dataset, with the SVM model using TF-IDF features, at an accuracy of 97.82%.
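
The abstract's best classical result pairs TF-IDF features with an SVM. The following is a minimal plain-Python sketch of that kind of pipeline, written under assumptions that do not come from the paper: whitespace tokenization, the smoothed IDF formula that scikit-learn uses by default, and a linear SVM without a bias term trained with the Pegasos subgradient method, a stand-in for whatever solver the authors actually used. The helper names (`fit_idf`, `tfidf`, `train_linear_svm`) and the four toy comments are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split only.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF (scikit-learn's default form): ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector: raw count * IDF, L2-normalized; unseen terms dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else {}


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=20):
    """Linear SVM (no bias) via Pegasos subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 (bullying) or -1 (not)."""
    w = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = label * dot(w, x) < 1  # margin check uses the old w
            for k in w:  # regularization step: shrink every weight
                w[k] *= 1 - eta * lam
            if violated:  # hinge-loss step: move w toward label * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * label * v
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Invented toy comments (not from the paper's YouTube dataset).
train_docs = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "great video thanks for sharing",
    "nice tutorial very helpful",
]
labels = [1, 1, -1, -1]

idf = fit_idf(train_docs)
X = [tfidf(d, idf) for d in train_docs]
w = train_linear_svm(X, labels)
print(accuracy(labels, [predict(w, x) for x in X]))  # prints 1.0
print(predict(w, tfidf("you stupid loser", idf)))  # prints 1
```

Because each TF-IDF vector is L2-normalized, longer comments do not dominate the score w · x, and terms that appear in many comments (here "you") receive a lower IDF weight than rarer ones. Accuracy on the training sentences is shown only to exercise the code; a real evaluation like the one reported above would use a held-out split and, more likely, a library implementation such as scikit-learn.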

Author Biographies

Kamal Muftie Yafi, Universitas Indonesia

Faculty of Mathematics and Natural Sciences

Qinthara Muftie Addina, Universitas Pendidikan Indonesia

Faculty of Indonesian Language and Literature Education

Published

2024-06-30

How to Cite

Muftie, F., Yafi, K. M., & Addina, Q. M. (2024). Perbandingan performa deteksi cyberbullying dengan transformer, deep learning, dan machine learning. Jurnal Pendidikan Informatika Dan Sains, 13(1), 75–87. https://doi.org/10.31571/saintek.v13i1.4002