Performance comparison of cyberbullying detection using transformer, deep learning, and machine learning methods
DOI:
https://doi.org/10.31571/saintek.v13i1.4002
Keywords:
Transformers, Sentiment Analysis, Natural Language Processing, Deep Learning
Abstract
The increase in browsing activity, especially on social media sites, has made cyberbullying more likely to occur. Much research has been done on detecting cyberbullying, using both machine learning and deep learning methods. This study compares performance on classifying text data as cyberbullying or not, using the Transformer algorithm. The performance of the Transformer method is then compared with other deep learning methods (RNN, LSTM, and GRU) and with machine learning methods (Naïve Bayes, Logistic Regression, SVM, and Decision Tree). The best deep learning result was obtained on the YouTube dataset with the Transformer model, which achieved an accuracy of 98.49%. The best machine learning result was obtained on the YouTube dataset with the SVM model using TF-IDF features, which achieved an accuracy of 97.82%.
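The strongest machine learning baseline named above, a linear SVM on TF-IDF features, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, which is not shown on this page. The toy sentences, the function names, the whitespace tokenizer, and the hyperparameters (`epochs`, `lr`, `reg`) are all assumptions made for illustration, and the SVM is trained with plain subgradient descent on the hinge loss rather than with a library solver.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w, b, x):
    """Signed decision value of the linear model for one sparse vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, reg=0.01):
    """Linear SVM fitted by subgradient descent on the hinge loss (labels are +1/-1)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * score(w, b, x)
            for t in w:  # L2 regularisation shrinks every weight a little
                w[t] *= 1.0 - lr * reg
            if margin < 1.0:  # example is inside the margin or misclassified
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (cyberbullying) or -1 (not cyberbullying)."""
    return 1 if score(w, b, x) >= 0.0 else -1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy examples, NOT taken from the datasets used in the study.
train_texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "great video thanks for sharing",
    "i love this tutorial",
]
train_labels = [1, 1, -1, -1]  # +1 = cyberbullying, -1 = not cyberbullying

idf = fit_idf(train_texts)
w, b = train_linear_svm([tfidf(t, idf) for t in train_texts], train_labels)

test_texts = ["you are a loser", "thanks for the great tutorial"]
predictions = [predict(w, b, tfidf(t, idf)) for t in test_texts]
print(predictions, accuracy([1, -1], predictions))
```

Accuracy here is the same metric the abstract reports: the share of texts whose predicted label matches the true one. On a dataset of realistic size, a library implementation would normally replace the hand-written training loop.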
Copyright (c) 2024 Fuad Muftie, Kamal Muftie Yafi, Qinthara Muftie Addina
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.