
The Neuroscience of Engagement: A Multimodal AI Approach to Understanding and Enhancing Human-Robot Interaction
Project period
January 2025 – December 2026
Objective
This project will explore the neural correlates of human-human and human-robot conversations, with the goal of creating adaptive social robots capable of fostering more meaningful interactions. Social robots can assist people in settings such as healthcare, elderly care, education, public spaces, and homes.
Our newly developed telepresence system for human-robot interaction allowed participants to take part in natural conversations while physically inside a functional magnetic resonance imaging (fMRI) scanner. Each participant interacted directly with a human-like robot or a human actor while lying in the scanner. In our previous Research Pairs project, we used this telepresence interface to create the pioneering NeuroEngage fMRI dataset.
This project aims to advance the understanding of conversational engagement by integrating neuroscience, human-robot interaction (HRI), and artificial intelligence. Engagement plays a crucial role in effective communication, yet its underlying brain mechanisms and real-time detection remain largely unexplored. We will use the NeuroEngage dataset and complement it with additional multimodal features such as facial expressions, audio embeddings, and detailed annotations of engagement levels. Using multimodal machine learning (MML), this research will develop models capable of detecting and responding to engagement levels in social interactions.
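One common MML pattern for combining signals like those described above is late fusion: each modality (e.g., facial expressions, audio embeddings, fMRI-derived features) produces its own engagement estimate, and the estimates are combined with a weighted average. The sketch below is purely illustrative; the modality names, probabilities, and weights are placeholders, not values from the NeuroEngage dataset or the project's actual models.

```python
import numpy as np

def fuse_engagement(probs, weights):
    """Late-fusion of per-modality engagement probabilities.

    probs   : engagement probability in [0, 1] from each modality's model
    weights : relative trust in each modality (normalized internally)
    Returns a single fused engagement probability in [0, 1].
    """
    p = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights so the result stays in [0, 1]
    return float(np.dot(p, w))

# Hypothetical per-modality outputs: face, audio, fMRI
fused = fuse_engagement([0.8, 0.6, 0.7], weights=[2.0, 1.0, 1.0])
print(fused)  # 0.725
```

In practice, the weights themselves can be learned (e.g., by a small classifier stacked on the per-modality outputs), and early fusion of raw feature vectors is an equally common alternative; the choice depends on how correlated and how reliable the individual modalities are.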
Background
In everyday conversations, a speaker and a listener are involved in a joint activity that relies on close coordination, requiring each participant's continuous attention and engagement. However, current engagement detection methods lack robustness and often rely on superficial behavioral cues, without considering the underlying neural mechanisms that drive engagement. Prior research has demonstrated the feasibility of engagement detection using multimodal signals, but most existing datasets are limited in scope and do not incorporate neuroimaging data.
In our previous work, analyzing two different datasets, we showed that listening to a robot recruits more activity in sensory regions, including auditory and visual areas. We also observed strong indications that speaking to a human, compared to the robot, recruits more activity in frontal regions associated with socio-pragmatic processing, i.e., considering the other's viewpoint and factoring in what to say next. Expanding our dataset and refining machine learning models for engagement prediction will enable further comparisons of this sort. As a result, this project will contribute to AI-driven conversational adaptivity, advancing research in both HRI and neuroscience.
Cross-disciplinary collaboration
The researchers in the team represent the Division of Speech, Music and Interaction at the Department of Intelligent Systems, KTH EECS, and the Psychology and Linguistics Departments at Stockholm University. This project integrates neuroscience, linguistics, social robotics, and AI to study how humans engage in conversations with both humans and robots.
Contacts

André Tiago Abelho Pereira
Researcher at KTH, Co-PI: Using Neuroimaging Data for Exploring Conversational Engagement in Human-Robot Interaction, Co-PI: The Neuroscience of Engagement: A Multimodal AI Approach to Understanding and Enhancing Human-Robot Interaction, Digital Futures Faculty
atap@kth.se
Julia Uddén
Assistant Professor at Stockholm University, Co-PI: Using Neuroimaging Data for Exploring Conversational Engagement in Human-Robot Interaction at Digital Futures, Co-PI: The Neuroscience of Engagement: A Multimodal AI Approach to Understanding and Enhancing Human-Robot Interaction, Digital Futures Faculty
08-16 32 32
julia.udden@psychology.su.se