Marathi Speech Emotion Recognition Using Deep Learning Techniques


Akhilesh Ketkar
Divyansh Mishra
Madhur Nirmal
Faizan Mulla
Vaibhav Narawade

Abstract

In this project, a speech emotion recognition system based on deep learning is proposed. The goal is to classify a speech signal into one of five emotions: anger, boredom, fear, happiness, and sadness. Audio snippets from numerous Marathi movies and TV shows were used to construct the Marathi-language dataset, which includes 20 samples for anger, 19 for boredom, 5 for fear, and 11 for happiness. The proposed system first transforms the speech signal from the time domain to the frequency domain using the Discrete Time Fourier Transform (DTFT). Data augmentation is then performed, comprising noise injection, time stretching, time shifting, and pitch scaling of the speech signal. Next, five features are extracted: Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Chroma STFT, Mel spectrogram, and root mean square (RMS) energy. These features are fed to a Convolutional Neural Network (CNN). Experimental results support the effectiveness of the proposed CNN-based system: the model's accuracy on the test data is 80.33%, and its F1 scores for anger, boredom, fear, happiness, and sadness are 0.85, 0.83, 0.50, 0.62, and 0.84, respectively.
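The augmentation and feature-extraction steps described above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: noise injection and time shifting are shown directly, while time stretching, pitch scaling, and the spectral features (MFCC, Chroma STFT, Mel spectrogram) typically come from an audio library such as librosa and are omitted here. Only ZCR and RMS are computed, since they have simple closed forms.

```python
import numpy as np

def augment(signal, rng):
    """Two of the augmentations named in the abstract:
    additive Gaussian noise injection and a circular time shift."""
    noisy = signal + 0.005 * rng.standard_normal(len(signal))  # noise injection
    shifted = np.roll(signal, len(signal) // 10)               # time shift by 10%
    return noisy, shifted

def zero_crossing_rate(signal):
    """ZCR: fraction of adjacent sample pairs whose sign differs."""
    signs = np.signbit(signal).astype(int)
    return np.mean(np.abs(np.diff(signs)))

def rms(signal):
    """RMS energy: square root of the mean squared amplitude."""
    return np.sqrt(np.mean(signal ** 2))

# Example: a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

noisy, shifted = augment(tone, np.random.default_rng(0))
print(rms(tone))                 # ~0.5 / sqrt(2) for a sinusoid
print(zero_crossing_rate(tone))  # ~2 * 440 / 16000 crossings per sample
```

In practice each feature is computed frame-by-frame over short windows rather than over the whole signal, and the resulting feature vectors are stacked before being passed to the CNN.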

Article Details

How to Cite
[1]
A. Ketkar, D. Mishra, M. Nirmal, F. Mulla, and V. Narawade, “Marathi Speech Emotion Recognition Using Deep Learning Techniques”, chipset, vol. 5, no. 01, pp. 1-4, Apr. 2024.
Section
Articles

References

[1] R. A. Khalil, E. Jones, M. I. Babar, T. Jan, M. H. Zafar and T. Alhussain, "Speech Emotion Recognition Using Deep Learning Techniques: A Review," in IEEE Access, vol. 7, pp. 117327-117345, 2019, doi: 10.1109/ACCESS.2019.2936124.

[2] W. Lim, D. Jang and T. Lee, "Speech emotion recognition using convolutional and recurrent neural networks," 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016, pp. 1-4, doi: 10.1109/APSIPA.2016.7820699.

[3] Yoon, W.-J., Park, K.-S. (2007). A Study of Emotion Recognition and Its Applications. In: Torra, V., Narukawa, Y., Yoshida, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2007. Lecture Notes in Computer Science, vol. 4617. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73729-2_43.

[4] M. S. Akhtar, A. Ekbal and E. Cambria, "How Intense Are You? Predicting Intensities of Emotions and Sentiments using Stacked Ensemble [Application Notes]," in IEEE Computational Intelligence Magazine, vol. 15, no. 1, pp. 64-75, Feb. 2020, doi: 10.1109/MCI.2019.2954667.

[5] M. Shamim Hossain and Ghulam Muhammad, "Emotion Recognition Using Deep Learning Approach from Audio-Visual Emotional Big Data," Information Fusion (2018), doi: https://doi.org/10.1016/j.inffus.2018.09.008.

[6] K. -Y. Huang, C. -H. Wu, M. -H. Su and H. -C. Fu, "Mood detection from daily conversational speech using denoising autoencoder and LSTM," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5125-5129, doi: 10.1109/ICASSP.2017.7953133.

[7] E. Lieskovska, M. Jakubec and R. Jarina, "Speech Emotion Recognition Overview and Experimental Results," 2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA), 2020, pp. 388-393, doi: 10.1109/ICETA51985.2020.9379218.

[8] Araño, K.A., Gloor, P., Orsenigo, C. et al. When Old Meets New: Emotion Recognition from Speech Signals. Cogn Comput 13, 771–783 (2021). https://doi.org/10.1007/s12559-021-09865-2.

[9] P. Tzirakis, J. Zhang and B. W. Schuller, "End-to-End Speech Emotion Recognition Using Deep Neural Networks," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5089-5093, doi: 10.1109/ICASSP.2018.8462677.

[10] T. M. Wani, T. S. Gunawan, S. A. A. Qadri, M. Kartiwi and E. Ambikairajah, "A Comprehensive Review of Speech Emotion Recognition Systems," in IEEE Access, vol. 9, pp. 47795-47814, 2021, doi: 10.1109/ACCESS.2021.3068045.

[11] Liu, M. English speech emotion recognition method based on speech recognition. Int J Speech Technol 25, 391–398 (2022). https://doi.org/10.1007/s10772-021-09955-4.