You may have come across a machine welcoming visitors at a restaurant, guiding them to tables, preparing and serving food, or even mopping the floor. Now, a new machine learning model can be used to interpret human emotions, with potential applications in personal healthcare.
Researchers at the International Institute of Information Technology (IIIT) in Hyderabad have developed a novel machine learning model that teaches machines to interpret emotions from videos.
In their study titled “‘How are you feeling?’ Learning emotions and mental states in movie scenes”, the researchers introduced a machine learning model to understand and label emotions not just for each movie character but also for the overall scene.
With film holding a great deal of emotional data, the research group picked movies as their starting point. “A character can go through a range of emotions in a single scene, from surprise and happiness to anger and even sadness. The emotions in a scenario cannot be summed up with a single label, and evaluating several emotions and mental states is important,” the study’s primary author, Dhruv Srivastava, said.
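To make the idea of several labels applying to one scene concrete, here is a minimal multi-label scoring sketch in PyTorch. The label names, feature size and threshold are illustrative assumptions, and the snippet is not the authors' model; it only shows how independent per-label probabilities let multiple emotions be active at once, unlike a single-label classifier.

```python
# Minimal multi-label sketch (illustrative only, not the authors' EmoTx model).
# Each scene is scored against every emotion/mental-state label independently,
# so several labels can be active together.
import torch

EMOTIONS = ["happy", "angry", "surprise", "sad", "honest", "helpful"]  # hypothetical label set

scene_features = torch.randn(1, 512)               # placeholder representation of a scene
classifier = torch.nn.Linear(512, len(EMOTIONS))   # one score per label

probs = torch.sigmoid(classifier(scene_features))  # independent per-label probabilities
active = [e for e, p in zip(EMOTIONS, probs[0]) if p > 0.5]
print(active)  # several labels can fire at once, e.g. ['happy', 'surprise']
```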
According to experts, an emotion differs from a mental state: the former is typically overt and visible (for instance, pleased or angry), while the latter refers to thoughts or feelings that may be hard to read outwardly (for example, honest or helpful).
Decoding emotion and mental state from language alone is fraught with difficulty, co-author Prof. Makarand Tapaswi said. “Take the statement ‘I hate you’. Interpreted in isolation, stripped of visual signals, a machine will likely categorise the underlying emotion as ‘anger’. However, the identical remark might be delivered in a humorous manner, with the character smiling at another while saying it, thereby confusing machines,” he said.
The researchers used an existing dataset of movie clips, MovieGraphs, collected by Prof. Tapaswi in his earlier work. Their model, EmoTx, was trained to reliably label the emotions and mental states of characters in each scene. For this, the researchers employed a three-pronged technique: analysing the full video of the scene and the actions involved, analysing the facial features of individual characters, and extracting the subtitles that accompany the dialogue in each scene.
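A rough sketch of what such a three-stream setup could look like is given below. The encoder choices, feature sizes and the simple concatenation-based fusion are assumptions made for illustration; this is not the actual EmoTx architecture, only a schematic of combining scene video, character faces and subtitles into per-label emotion scores.

```python
# Illustrative three-stream fusion sketch (assumed names and sizes; not the
# actual EmoTx implementation). Each modality named in the article -- full
# scene video, character faces, and subtitle text -- is encoded separately,
# and the features are combined to score every emotion/mental-state label.
import torch
import torch.nn as nn

class ThreeStreamEmotionTagger(nn.Module):
    def __init__(self, num_labels=25, dim=256):
        super().__init__()
        # Placeholder linear encoders; a real system would use pretrained
        # video, face and text backbones here.
        self.video_enc = nn.Linear(1024, dim)  # scene-level video features
        self.face_enc = nn.Linear(512, dim)    # per-character face features
        self.text_enc = nn.Linear(768, dim)    # subtitle/dialogue features
        self.classifier = nn.Linear(3 * dim, num_labels)

    def forward(self, video_feat, face_feat, text_feat):
        fused = torch.cat([
            self.video_enc(video_feat),
            self.face_enc(face_feat),
            self.text_enc(text_feat),
        ], dim=-1)
        return torch.sigmoid(self.classifier(fused))  # independent label scores

model = ThreeStreamEmotionTagger()
scores = model(torch.randn(1, 1024), torch.randn(1, 512), torch.randn(1, 768))
print(scores.shape)  # torch.Size([1, 25]) -- one probability per label
```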
“Based on the three criteria, we were able to predict the corresponding mental states of the characters, which are not explicit in the scenes,” co-author Aditya Kumar Singh said.
According to the researchers, the approach can potentially be leveraged to assist in personal healthcare. The paper has been accepted for presentation at the Conference on Computer Vision and Pattern Recognition (CVPR) 2023 in Vancouver, Canada, from June 18 to 23.