
EEMCS EPrints Service

Discrimination Between Native and Non-Native Speech Using Visual Features Only

Georgakis, C. and Petridis, S. and Pantic, M. (2016) Discrimination Between Native and Non-Native Speech Using Visual Features Only. IEEE Transactions on Cybernetics, 46 (12). pp. 2758-2771. ISSN 2168-2267 (ISI Impact 4,943)



Accent is a soft biometric trait that can be inferred from pronunciation and articulation patterns characterizing the speaking style of an individual. Past research has addressed the task of classifying accent, as belonging to a native language speaker or a foreign language speaker, by means of the audio modality only. However, features extracted from the visual stream of speech have been successfully used to extend or substitute audio-only approaches that target speech or language recognition. Motivated by these findings, we investigate to what extent temporal visual speech dynamics attributed to accent can be modeled and identified when the audio stream is missing or noisy, and the speech content is unknown. We present here a fully automated approach to discriminating native from non-native English speech, based exclusively on visual cues. A systematic evaluation of various appearance and shape features for the target problem is conducted, with the former consistently yielding superior performance. Subject-independent cross-validation experiments are conducted on mobile phone recordings of continuous speech and isolated word utterances spoken by 56 subjects from the challenging MOBIO database. High performance is achieved on a text-dependent (TD) protocol, with the best score of 76.5% yielded by fusion of five hidden Markov models trained on appearance features. Our framework is also efficient even when tested on examples of speech unseen in the training phase, although performing less accurately compared to the TD case.
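
The abstract does not say how the subject-independent cross-validation folds were built, so the following is only a minimal sketch of the general idea: every recording of a given speaker is kept within a single fold, so no speaker appears in both the training and the test data. The function names, the round-robin assignment and the (subject_id, item) sample layout are illustrative assumptions, not the authors' protocol.

```python
from collections import defaultdict


def subject_independent_folds(samples, n_folds):
    """Group (subject_id, item) pairs into folds without splitting a subject.

    Hypothetical helper: subjects are assigned to folds round-robin in
    sorted order, which is an assumption made for determinism only.
    """
    by_subject = defaultdict(list)
    for subject_id, item in samples:
        by_subject[subject_id].append(item)

    folds = [[] for _ in range(n_folds)]
    for index, subject_id in enumerate(sorted(by_subject)):
        target = folds[index % n_folds]
        target.extend((subject_id, item) for item in by_subject[subject_id])
    return folds


def train_test_splits(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [
            sample
            for position, fold in enumerate(folds)
            if position != held_out
            for sample in fold
        ]
        yield train, test


if __name__ == "__main__":
    # Toy data: four made-up speakers with one or two utterances each.
    toy = [("s1", "a"), ("s1", "b"), ("s2", "c"),
           ("s3", "d"), ("s3", "e"), ("s4", "f")]
    for train, test in train_test_splits(subject_independent_folds(toy, 2)):
        print(sorted({s for s, _ in train}), sorted({s for s, _ in test}))
```

Splitting by speaker rather than by utterance means a reported score reflects generalisation to unseen speakers, not recognition of speakers already encountered during training.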

Item Type: Article
Research Group: EWI-HMI: Human Media Interaction
Research Program: CTIT-General
Research Project: TERESA: Telepresence Reinforcement-learning Social Agent
Uncontrolled Keywords: Foreign accent detection, non-native speech, visual accent classification, visual speech processing
ID Code: 26750
Deposited On: 08 February 2016
ISI Impact Factor: 4,943
