Volume 30, Issue 2, 2021
DOI: 10.24205/03276716.2020.4093
Acoustic-Visual based Accent Identification System using Deep Neural Networks
Abstract
Human-machine interfaces are evolving rapidly, and identifying a speaker's accent can improve the performance of speech recognition systems. Although foreign accent identification has been extensively explored, this paper aims to build a robust accent identification system for the Tamil language using acoustic and visual features. The proposed system first automatically recognizes the speaker's accent among regional Tamil accents from three different regions of Tamil Nadu. It is built using acoustic mel cepstral features and visual optical flow motion features, the latter computed either locally by the Lucas-Kanade method or globally by the Horn-Schunck technique. These features are used to train sequential models based on an artificial neural network and a convolutional neural network, which detect and classify accents. Second, the system uses visual color features and cepstral features to recognize accented speakers; the speaker recognition module is trained with a Hidden Markov Model. The Tamil accent identification system achieves 93.7%, 89.5%, and 96% with acoustic features, visual (viseme) features, and their combination, respectively. The recognition rate is 93.1% for the Nellai and Chennai accents, whereas for the Nellai and Kovai accents the accuracy is 94%. The multi-feature accented speaker recognition system achieves a recognition rate of 96.7%, better than the systems based on individual features.
Keywords
Automatic speech recognition, Artificial neural networks, Mel frequency cepstral coefficient, Optical flow motion
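
Illustrative sketch: the abstract names the Lucas-Kanade method as the source of the local optical flow features but gives no implementation details. The following minimal example, written in plain Python, shows one common single-window formulation of that method: it assumes brightness constancy and solves the 2x2 least-squares system built from spatial and temporal intensity gradients. The function names `make_frame` and `lucas_kanade_window`, the window radius, and the synthetic frames are assumptions made for illustration only and are not taken from the paper.

```python
def make_frame(size, dx=0.0, dy=0.0):
    """Synthetic grayscale frame (hypothetical test data): a smooth
    quadratic intensity pattern displaced by (dx, dy) pixels."""
    return [[(x - dx) ** 2 + (y - dy) ** 2 for x in range(size)]
            for y in range(size)]


def lucas_kanade_window(frame1, frame2, cx, cy, radius=2):
    """Estimate the displacement (u, v) of the window centred at (cx, cy).

    Frames are lists of rows, indexed as frame[y][x]. The window plus a
    one-pixel border must lie inside the frame. Returns None when the
    gradient matrix is singular (the aperture problem).
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            # Central differences for the spatial gradients of frame 1.
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0
            # Temporal gradient between the two frames.
            it = frame2[y][x] - frame1[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


if __name__ == "__main__":
    first = make_frame(11)
    second = make_frame(11, dx=0.1, dy=0.05)  # small sub-pixel motion
    print(lucas_kanade_window(first, second, cx=5, cy=5))
```

In this formulation the estimate is only a first-order approximation, so it is reliable for small displacements. A practical system would more likely apply a library implementation over tracked lip or face regions, and a global method such as Horn-Schunck would add a smoothness constraint over the whole frame instead of solving each window independently.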