Affective computing using speech and eye gaze: a review and bimodal system proposal for continuous affect prediction.
Abstract
Speech has been a widely used modality in the field of affective computing. Recently, however, there has been growing interest in the use of multi-modal affective computing systems. These multi-modal systems incorporate both verbal
and non-verbal features for affective computing tasks. Such multi-modal affective computing systems are advantageous for emotion assessment of individuals
in audio-video communication environments such as teleconferencing, healthcare, and education. A review of the literature shows that eye gaze, as extracted from video, is a modality that has remained largely unexploited for continuous affect prediction. This work presents a review of the literature
within the emotion classification and continuous affect prediction sub-fields of
affective computing for both speech and eye gaze modalities. Additionally, continuous affect prediction experiments using speech and eye gaze modalities are
presented. A baseline system built from open source software is proposed, and its performance is assessed on a publicly available audio-visual corpus. System performance is further assessed in a cross-corpus and cross-lingual experiment.
The experimental results suggest that eye gaze is an effective supportive modality for speech when used in a bimodal continuous affect prediction system. The
addition of eye gaze to speech in a simple feature fusion framework yields a
prediction improvement of 6.13% for valence and 1.62% for arousal.
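
The bimodal system combines the two modalities at the feature level. As a rough illustration of what such feature-level fusion can look like in practice, the sketch below concatenates per-frame speech and eye-gaze feature vectors and trains a single regressor on the fused representation. The feature dimensionalities, the LinearSVR regressor, and the synthetic data are assumptions for illustration only; the concordance correlation coefficient (CCC) is shown because it is a commonly used evaluation metric for continuous affect prediction, not necessarily the exact metric behind the figures above.

```python
# Minimal sketch of feature-level (early) fusion for continuous affect
# prediction. Feature dimensions, the regressor, and the data are
# illustrative assumptions, not the paper's exact configuration.
import numpy as np
from sklearn.svm import LinearSVR

def ccc(y_true, y_pred):
    """Concordance correlation coefficient, a common metric for
    continuous (dimensional) affect prediction."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Synthetic stand-ins for per-frame features and gold-standard valence labels.
rng = np.random.default_rng(0)
n_frames = 2000
speech_feats = rng.normal(size=(n_frames, 88))   # e.g. acoustic functionals
gaze_feats = rng.normal(size=(n_frames, 12))     # e.g. gaze direction stats
valence = rng.uniform(-1, 1, size=n_frames)

# Feature-level fusion: concatenate the modality feature vectors per frame,
# then train a single regressor on the fused vectors.
fused = np.hstack([speech_feats, gaze_feats])
split = int(0.8 * n_frames)
model = LinearSVR(C=0.1, max_iter=10000).fit(fused[:split], valence[:split])
print("valence CCC:", ccc(valence[split:], model.predict(fused[split:])))
```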