EMO-Avatar: An LLM-Agent-Orchestrated Framework for Multimodal
Emotional Support in Human Animation
Here is an example: the virtual avatar in the video expresses body language, tone, and emotionally
supportive responses that match the user's input context. This technology aims to deliver more emotionally
expressive and empathetic virtual humans, applicable in scenarios such as AI-assisted psychological
counseling. PS: YouTube videos default to 360p and can be adjusted to 720p.
Abstract
Emotional support chatbots have the potential to provide scalable, low-cost, and personal emotional
support, overcoming critical accessibility barriers inherent in traditional counseling. However, current
text-based chatbots fall short in conveying the multimodal empathy that is crucial in counseling. Humans
naturally prefer to share feelings face to face with peers, where spoken tone, micro-expressions, and body
language all convey empathy. To bridge this gap, we propose EMO-Avatar, an LLM-agent-orchestrated framework
that integrates emotional reasoning capabilities and multimodal expression in counseling. Our approach
introduces two innovations: (1) a Multimodal Emotional Support Agent: EMO-Avatar can follow adaptive
instructions across TTS, pose, micro-expressions, and body actions, leading to the generation of highly
expressive human animations; (2) a Comforting-Exploration-Action support strategy: EMO-Avatar systematically
integrates Hill's three-stage counseling theory into its emotional reasoning capability. Guided by the LLM's
reasoning, this strategy informs response generation and displays stage-specific preferences for speech, body
language, and expressions. As a result, EMO-Avatar can provide deeper emotional support and therapeutic,
human-like interactions. Experimental validation on the AvaMERG Challenge demonstrates EMO-Avatar's superior
performance, achieving a top-2 ranking among 20 participants across the response appropriateness, multimodal
consistency, naturalness, and emotional expressiveness metrics.
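To make the idea of stage-specific multimodal instructions more concrete, here is a minimal Python sketch of how one instruction per response might be represented. It is an illustration only, not the EMO-Avatar implementation: the names `Stage`, `MultimodalInstruction`, `STAGE_PRESETS`, and `build_instruction`, as well as every preset value, are assumptions made for this example. In the framework described above, the LLM's reasoning would choose the stage and the per-channel settings; here the stage is simply passed in by hand.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The three support stages named in the abstract."""
    COMFORTING = "comforting"
    EXPLORATION = "exploration"
    ACTION = "action"


@dataclass(frozen=True)
class MultimodalInstruction:
    """One response plus hypothetical settings for each output channel."""
    stage: Stage
    response_text: str
    tts_style: str
    pose: str
    micro_expression: str
    body_action: str


# Hypothetical per-stage preferences; these values are placeholders
# invented for this sketch, not settings taken from EMO-Avatar.
STAGE_PRESETS = {
    Stage.COMFORTING: {
        "tts_style": "soft, slow",
        "pose": "leaning forward",
        "micro_expression": "gentle concern",
        "body_action": "slow nod",
    },
    Stage.EXPLORATION: {
        "tts_style": "calm, curious",
        "pose": "upright, attentive",
        "micro_expression": "raised eyebrows",
        "body_action": "open-palm gesture",
    },
    Stage.ACTION: {
        "tts_style": "warm, encouraging",
        "pose": "upright, open",
        "micro_expression": "light smile",
        "body_action": "affirming hand gesture",
    },
}


def build_instruction(stage: Stage, response_text: str) -> MultimodalInstruction:
    """Combine the response text with the presets of the given stage."""
    preset = STAGE_PRESETS[stage]
    return MultimodalInstruction(stage=stage, response_text=response_text, **preset)


if __name__ == "__main__":
    instruction = build_instruction(Stage.COMFORTING, "That sounds really hard.")
    print(instruction)
```

A fixed lookup table is used here only to keep the sketch short. An agent-driven system would more plausibly produce these fields dynamically for each turn and hand them to the TTS and animation components.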
Team AI4AI
EMO-Avatar - Multimodal Emotional Support Video - TOP2 Solution in ACM MM25 Challenge
Network quality may affect video playback. If the video does not load, please check your connection to YouTube.
PS: YouTube login is required.
You can access more of our videos through Baidu Netdisk.
The share contains 4 files: audio, video, json, README.
🔗 Link: https://pan.baidu.com/s/1hmVOY2ISejRsaRfpiNbkUA?pwd=AI4A
🔑 Access code: AI4A
AvaMERG@MM2025 Grand Challenge - Avatar-based Multimodal Empathetic Response Generation
https://avamerg.github.io/MM25-challenge/