Speech, head, and eye-based cues for continuous affect prediction
Abstract
Continuous affect prediction involves the regression of affect dimensions over time, producing a continuous-valued estimate at each discrete time step. Researchers in this domain are increasingly adopting multimodal model input, which motivates the investigation of previously unexplored affective cues. Speech-based cues have traditionally received the most attention for affect prediction; however, non-verbal inputs have significant potential to increase the performance of affective computing systems and to enable affect modelling in the absence of speech. Non-verbal inputs that have received little attention for continuous affect prediction include head and eye-based cues. Both head and eye-based cues are involved in the display and perception of emotion, and they can be estimated non-intrusively from video using computer vision tools. This work addresses this gap by comprehensively investigating head and eye-based features, and their combination with speech, for continuous affect prediction. Hand-crafted, automatically generated, and convolutional neural network (CNN)-learned features from these modalities are investigated. The highest-performing feature set combinations indicate how effective these features are for predicting an individual's affective state.