Empowering low-resource languages: a machine learning approach to Tamil sentiment classification

Saleem Raja Abdul Samad, Pradeepa Ganesan, Justin Rajasekaran, Madhubala Radhakrishnan, Peerbasha Shebbeer Basha, Varalakshmi Kuppusamy

Abstract


Sentiment analysis is essential for deciphering public opinion, guiding decisions, and refining marketing strategies. By analyzing emotional tone and attitude in large volumes of text, it helps businesses monitor public sentiment, engage customers, and strengthen relationships with their target audiences. For low-resource languages such as Tamil, however, sentiment analysis remains very limited, because few resources exist and the techniques have seen little application across diverse linguistic contexts. Given Tamil's global reach and India's linguistic diversity, closing this gap is important for a more nuanced understanding of sentiment in India. The need is especially pressing for Tamil, a classical language spoken by millions, whose cultural, social, and historical nuances call for tailored approaches that capture subtle expressions of sentiment. This paper introduces a novel method that assesses the performance of various text embedding methods in conjunction with a range of machine learning (ML) algorithms to improve sentiment classification for Tamil text, with a specific focus on lyrics. In the experiments, FastText word embedding proved the most effective method, reaching 78% accuracy when coupled with the support vector classification (SVC) model.
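
The general pipeline the abstract describes (embed each text, then train a classifier on the vectors) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the hashed character n-gram vectors are an untrained stand-in for pretrained FastText embeddings, the four labeled phrases are hypothetical toy data rather than the paper's lyric corpus, and names such as `char_ngrams`, `embed_text`, `DIM`, and `BUCKETS` are assumptions made for the example. It assumes NumPy and scikit-learn are installed.

```python
# Sketch only: toy data and an untrained hashed n-gram embedding stand in
# for the paper's lyric corpus and pretrained FastText vectors.
import zlib

import numpy as np
from sklearn.svm import SVC

DIM = 64          # embedding size (assumed value)
BUCKETS = 10_000  # number of hash buckets (assumed value)

# Random, untrained bucket vectors; a real FastText model learns these.
rng = np.random.default_rng(0)
BUCKET_VECTORS = rng.normal(size=(BUCKETS, DIM))


def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def embed_text(text):
    """Average the bucket vectors of every character n-gram in the text."""
    grams = [g for word in text.split() for g in char_ngrams(word)]
    if not grams:
        return np.zeros(DIM)
    idx = [zlib.crc32(g.encode("utf-8")) % BUCKETS for g in grams]
    return BUCKET_VECTORS[idx].mean(axis=0)


# Hypothetical toy examples (love/happiness, beauty/good, sorrow/tears,
# pain/suffering); the labels are illustrative only.
train_texts = ["அன்பு மகிழ்ச்சி", "அழகு நல்ல", "சோகம் கண்ணீர்", "வலி துன்பம்"]
train_labels = ["positive", "positive", "negative", "negative"]

X_train = np.vstack([embed_text(t) for t in train_texts])
clf = SVC(kernel="linear")
clf.fit(X_train, train_labels)

# Classify a new phrase with the fitted support vector classifier.
new_text = "மகிழ்ச்சி அன்பு"
predictions = clf.predict(embed_text(new_text).reshape(1, -1))
print(predictions[0])
```

In a real experiment the `embed_text` step would be replaced by vectors from a trained embedding model, and accuracy would be measured on a held-out test set rather than on a single phrase.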

Keywords


FastText; Natural language processing; Tamil sentiment classification; Text embedding; Word embedding; Word2Vec


DOI: http://doi.org/10.11591/ijict.v14i3.pp941-949


Copyright (c) 2025 Saleem Raja Abdul Samad

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
