
EEMCS EPrints Service

Prediction-based Audiovisual Fusion for Classification of Non-Linguistic Vocalisations

Petridis, S. and Pantic, M. (2016) Prediction-based Audiovisual Fusion for Classification of Non-Linguistic Vocalisations. IEEE Transactions on Affective Computing, 7 (1). pp. 45-58. ISSN 1949-3045 *** ISI Impact 1,873 ***



Prediction plays a key role in recent computational models of the brain, and it has been suggested that the brain constantly makes multisensory spatiotemporal predictions. Inspired by these findings, we tackle the problem of audiovisual fusion from a new perspective based on prediction. We train predictive models that capture the spatiotemporal relationship between audio and visual features by learning the audio-to-visual and visual-to-audio feature mappings for each class. Similarly, we train predictive models that capture the time evolution of audio and visual features by learning the past-to-future feature mapping for each class. In classification, all the class-specific regression models produce a prediction of the expected audio/visual features, and their prediction errors are combined for each class. The set of class-specific regressors that best describes the audiovisual feature relationship, i.e., results in the lowest prediction error, is chosen to label the input frame. We perform cross-database experiments using the AMI, SAL and MAHNOB databases to classify laughter and speech, and subject-independent experiments on the AVIC database to classify laughter, hesitation and consent. In virtually all cases, prediction-based audiovisual fusion outperforms the two most commonly used fusion approaches, decision-level and feature-level fusion.
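The classification rule described in the abstract (one set of regressors per class, with the lowest combined prediction error deciding the label) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the regressors are plain ridge regressions, the two cross-modal errors are summed with equal weight, the past-to-future models are omitted, and the data are synthetic. The names `fit_ridge`, `train_class_models`, `classify` and `demo` are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, lam=1e-3):
    """Closed-form ridge regression with a bias column (assumed regressor type)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def predict(W, X):
    """Apply a fitted ridge model to the rows of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W


def train_class_models(audio, visual, labels):
    """Fit one audio-to-visual and one visual-to-audio regressor per class."""
    models = {}
    for c in np.unique(labels).tolist():
        a, v = audio[labels == c], visual[labels == c]
        models[c] = (fit_ridge(a, v), fit_ridge(v, a))
    return models


def classify(models, audio, visual):
    """Label each frame with the class whose regressors predict it best."""
    classes = sorted(models)
    errors = np.empty((audio.shape[0], len(classes)))
    for j, c in enumerate(classes):
        W_av, W_va = models[c]
        err_v = np.mean((predict(W_av, audio) - visual) ** 2, axis=1)
        err_a = np.mean((predict(W_va, visual) - audio) ** 2, axis=1)
        # Equal-weight sum of the two errors: an assumed combination rule.
        errors[:, j] = err_a + err_v
    return np.array(classes)[np.argmin(errors, axis=1)]


def demo(seed=0, n=400):
    """Synthetic two-class check; returns accuracy on held-out frames."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(4, 3))  # hypothetical audio-to-visual coupling

    def sample(sign, count):
        a = rng.normal(size=(count, 4))
        v = sign * a @ M + 0.05 * rng.normal(size=(count, 3))
        return a, v

    def dataset():
        a0, v0 = sample(+1.0, n)  # class 0: visual follows audio
        a1, v1 = sample(-1.0, n)  # class 1: opposite coupling
        labels = np.repeat([0, 1], n)
        return np.vstack([a0, a1]), np.vstack([v0, v1]), labels

    audio_tr, visual_tr, y_tr = dataset()
    audio_te, visual_te, y_te = dataset()
    models = train_class_models(audio_tr, visual_tr, y_tr)
    return float(np.mean(classify(models, audio_te, visual_te) == y_te))


if __name__ == "__main__":
    print(f"held-out accuracy: {demo():.3f}")
```

In this toy setup the two classes share the same marginal audio and visual distributions and differ only in how the modalities relate to each other, which is the situation where a per-class cross-modal prediction error carries the discriminative information.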

Item Type: Article
Research Group: EWI-HMI: Human Media Interaction
Research Program: CTIT-General
Research Project: TERESA: Telepresence Reinforcement-learning Social Agent
Uncontrolled Keywords: Prediction-based Fusion, Audiovisual Fusion, Nonlinguistic Vocalisation Classification
ID Code: 26753
Deposited On: 09 February 2016
ISI Impact Factor: 1,873