Volume 30, Issue 2, 2021


DOI: 10.24205/03276716.2020.4093

Acoustic-Visual based Accent Identification System using Deep Neural Networks


Abstract
Human-machine interfaces are evolving rapidly. Identifying the accent of a speaker can improve the performance of speech recognition systems. Although foreign accent identification has been extensively explored, this paper aims to build a robust accent identification system for the Tamil language using acoustic and visual features. First, the proposed system automatically recognizes the speaker’s accent among regional Tamil accents from three different regions of Tamil Nadu. This system is built using acoustic mel cepstral features and visual optical flow motion features, which are computed either locally by the Lucas-Kanade method or globally by the Horn-Schunck technique. These features are used to train sequential models built as artificial and convolutional neural networks, which detect and classify accents. Second, the system uses visual color features and cepstral features to recognize accented speakers. The speaker recognition module is trained with a Hidden Markov Model. The Tamil accent system achieves 93.7%, 89.5%, and 96% using acoustic, visemic, and combined features, respectively. The recognition rate is 93.1% for the Nellai and Chennai accents, whereas for the Nellai and Kovai accents the accuracy is 94%. The multi-feature accented speaker recognition system achieves a better recognition rate, 96.7%, than the individual feature-based systems.
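As an illustration of the local optical flow idea mentioned in the abstract, the following is a minimal sketch of a single-window Lucas-Kanade estimate in plain Python. It is not the authors' implementation: the function name `lucas_kanade_window`, the 5x5 synthetic patch, and the use of central differences for the spatial gradients are all assumptions made for this example. The sketch solves the least-squares system formed by the brightness-constancy constraint Ix*u + Iy*v = -It over every interior pixel of the patch.

```python
def lucas_kanade_window(prev, curr):
    """Estimate one (u, v) flow vector for a small grayscale patch.

    prev, curr: equally sized 2-D lists of intensities taken from two
    consecutive frames (hypothetical inputs for this sketch).
    Returns (u, v), or None when the patch has too little texture.
    """
    rows, cols = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Spatial gradients by central differences, temporal by frame difference.
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T in closed form.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # degenerate structure tensor (aperture problem)
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic 5x5 patch with intensity x*y, shifted 0.5 pixels along x.
size = 5
prev_patch = [[x * y for x in range(size)] for y in range(size)]
curr_patch = [[(x - 0.5) * y for x in range(size)] for y in range(size)]
flow = lucas_kanade_window(prev_patch, curr_patch)
print(flow)  # approximately (0.5, 0.0)
```

In a lip-motion setting such a window would be evaluated at many positions around the mouth region, and the resulting vectors would form the motion features. A global method such as Horn-Schunck would instead estimate a dense flow field under an added smoothness constraint.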

Keywords
Automatic speech recognition, Artificial neural networks, Mel frequency cepstral coefficient, Optical flow motion
