Multilingual hate speech detection using deep learning

Vincent Vincent; Amalia Zahra

doi:10.11591/ijict.v14i3.pp1015-1023

Abstract

The rise of social media has enabled public expression but has also fueled the spread of hate speech, contributing to social tensions and potential violence. Natural language processing (NLP), particularly text classification, has become essential for detecting hate speech. This study develops a hate speech detection model for Twitter using FastText with bidirectional long short-term memory (Bi-LSTM) and explores multilingual bidirectional encoder representations from transformers (M-BERT) for handling diverse languages. Data augmentation techniques, including easy data augmentation (EDA) methods, back translation, and generative adversarial networks (GANs), are employed to improve classification, especially for imbalanced datasets. Results show that data augmentation significantly boosts performance. With the FastText + Bi-LSTM model, the highest F1-scores are achieved by random insertion for Indonesian (F1-score: 0.889, Accuracy: 0.879), synonym replacement for English (F1-score: 0.872, Accuracy: 0.831), and random deletion for German (F1-score: 0.853, Accuracy: 0.830). The M-BERT model performs best with random deletion for Indonesian (F1-score: 0.898, Accuracy: 0.880), random swap for English (F1-score: 0.870, Accuracy: 0.866), and random deletion for German (F1-score: 0.662, Accuracy: 0.858). These findings underscore that the effectiveness of data augmentation varies by language and model. This research supports efforts to mitigate hate speech’s impact on social media by advancing multilingual detection capabilities.
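To make two of the EDA operations named in the abstract concrete, the following is a minimal Python sketch of random deletion and random swap on a whitespace-tokenized sentence. It is not the authors' implementation: the function names, the deletion probability `p`, the number of swaps `n_swaps`, the fixed seed, and the example sentence are all assumptions chosen for illustration.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p; always keep at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [tok for tok in tokens if rng.random() >= p]
    # Fall back to a single random token if every token was deleted.
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Swap the tokens at two distinct random positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(0)  # fixed seed (assumed) so the sketch is reproducible
tokens = "this is a short example tweet".split()  # hypothetical input
deleted = random_deletion(tokens, p=0.2, rng=rng)
swapped = random_swap(tokens, n_swaps=1, rng=rng)
print(deleted)
print(swapped)
```

Each augmented token list would typically be joined back into a string and added to the training set with the label of the original example, which is how such methods can enlarge an under-represented class.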

Keywords

Bi-LSTM; Data augmentation; Hate speech; M-BERT; NLP

DOI: http://doi.org/10.11591/ijict.v14i3.pp1015-1023

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
