On the Evaluation and Implementation of LSTM Model for Speech Emotion Recognition using MFCC

Sheetal Bhandari

Abstract


Speech Emotion Recognition is an emerging research field that is expected to benefit many application domains by providing an effective Human Computer Interface. Researchers are working extensively on decoding human emotions from the speech signal so that computers can interact and respond more intelligently. The accuracy of speech emotion recognition depends greatly on the types of features used and on the classifier employed for recognition. The contribution of this paper is an evaluation of twelve different Long Short Term Memory (LSTM) network models as classifiers based on the Mel-Frequency Cepstrum Coefficients (MFCC) feature. The paper presents a performance evaluation in terms of precision, recall, F-measure and accuracy for four emotions (happy, neutral, sad and angry) using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The accuracy obtained is 89%, which is 9.5% higher than that reported in recent literature. The most suitable LSTM model is then implemented on a Raspberry Pi board, creating a standalone Speech Emotion Recognition system.
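The four evaluation measures named above can be computed from a confusion matrix. The following is a minimal sketch in plain Python, assuming the standard definitions of per-class precision, recall and F-measure; the function name `per_class_metrics` and the example counts are hypothetical and are not taken from the paper's results.

```python
from typing import Dict, List, Tuple

# The four emotion classes evaluated in the paper.
EMOTIONS = ["happy", "neutral", "sad", "angry"]


def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Return per-class precision/recall/F-measure and overall accuracy.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    report: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        report[name] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }

    accuracy = correct / total if total else 0.0
    return report, accuracy


if __name__ == "__main__":
    # Illustrative counts only (rows: true class, columns: predicted class).
    example = [
        [8, 1, 0, 1],
        [1, 7, 2, 0],
        [0, 2, 8, 0],
        [1, 0, 0, 9],
    ]
    report, accuracy = per_class_metrics(example, EMOTIONS)
    for emotion, scores in report.items():
        print(emotion, scores)
    print("accuracy", accuracy)
```

Reporting the measures per class, rather than accuracy alone, shows whether a model confuses particular emotion pairs even when its overall accuracy is high.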

Keywords


Speech Emotion Recognition; Raspberry Pi; Long Short Term Memory (LSTM) network models; Mel-Frequency Cepstrum Coefficients (MFCC); Human Computer Interface



DOI: http://doi.org/10.11591/ijict.v10i3.pp%25p



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
