Advances in Audio-Visual Emotion Recognition: A Comprehensive Review of Deep Learning Approaches
JALLURI KARTHIKEYA SAI SRI ADITHYA
Abstract
Audio-visual emotion recognition is an important part of affective computing, with applications in human-machine interaction, mental health, and autonomous systems. This review surveys recent approaches that use both audio and visual cues to recognize emotions and explains why combining the two modalities is beneficial. However, many current methods do not fully exploit the complementary information shared between the audio and visual modalities, which limits their effectiveness. The review covers a range of approaches, including deep learning models that use attention mechanisms and strategies for fusing audio and visual features. It also examines novel loss functions designed to improve how features are learned from both modalities, as well as techniques such as correlation analysis and joint loss strategies for integrating audio and visual information more effectively. The discussion draws on studies that use datasets such as RAVDESS, CREMA-D, eNTERFACE'05, and BAUM-1s. Finally, the review highlights the strengths and weaknesses of current methods and identifies directions for future research.
Keywords: Audio-Visual Emotion Recognition, Attention Mechanisms, Feature Fusion, Deep Learning, Loss Functions.
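To make the fusion and joint-loss ideas concrete, the following is a minimal NumPy sketch, not the method of any particular study covered in this review. It assumes pre-extracted per-frame audio and visual features already projected to a shared dimension d, applies bidirectional cross-modal scaled dot-product attention, mean-pools each attended sequence, and concatenates the results. The joint loss adds weighted unimodal cross-entropy terms to the fused-branch loss. The function names, the weight `lam`, the mean pooling, and the random linear classifiers are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value):
    # Scaled dot-product attention: each query frame attends over all
    # frames of the other modality. Shapes: (Tq, d), (Tk, d) -> (Tq, d).
    d = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ key_value

def fuse(audio, visual):
    # Audio attends to visual and visual attends to audio; each attended
    # sequence is mean-pooled over time, then the two vectors are concatenated.
    a2v = cross_attention(audio, visual).mean(axis=0)
    v2a = cross_attention(visual, audio).mean(axis=0)
    return np.concatenate([a2v, v2a])

def cross_entropy(logits, label):
    # Negative log-likelihood of the true class under a softmax.
    p = softmax(logits)
    return -np.log(p[label])

def joint_loss(fused_logits, audio_logits, visual_logits, label, lam=0.5):
    # Hypothetical joint objective: fused-branch loss plus weighted
    # unimodal auxiliary losses; lam is an illustrative weight.
    return cross_entropy(fused_logits, label) + lam * (
        cross_entropy(audio_logits, label) + cross_entropy(visual_logits, label)
    )

# Usage with random stand-in features (no real data is loaded here).
rng = np.random.default_rng(0)
d, n_classes = 16, 8                    # 8 classes, as in RAVDESS's eight emotions
audio = rng.standard_normal((50, d))    # 50 audio frames
visual = rng.standard_normal((30, d))   # 30 video frames
W_f = rng.standard_normal((2 * d, n_classes)) * 0.1  # hypothetical fused head
W_a = rng.standard_normal((d, n_classes)) * 0.1      # hypothetical audio head
W_v = rng.standard_normal((d, n_classes)) * 0.1      # hypothetical visual head

fused = fuse(audio, visual)
loss = joint_loss(fused @ W_f, audio.mean(axis=0) @ W_a,
                  visual.mean(axis=0) @ W_v, label=3)
print(fused.shape, float(loss))
```

One motivation for the auxiliary unimodal terms is to keep each branch discriminative on its own, so the fused model does not rely entirely on one modality; in practice the weight and pooling choice would be tuned per dataset.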
Copyright
Copyright © 2024 JALLURI KARTHIKEYA SAI SRI ADITHYA. This is an open access article distributed under the Creative Commons Attribution License.