Please use this identifier to cite or link to this item: https://elib.bsu.by/handle/123456789/337595

| Title: | Multimodal Emotion Recognition Method Based on Domain Generalization and Graph Neural Networks |
| Authors: | Xie, J.; Wang, Y.; Meng, T.; Tai, J.; Zheng, Y.; Varatnitski, Yu. I. |
| Keywords: | ЭБ БГУ (BSU Digital Library)::TECHNICAL AND APPLIED SCIENCES. SECTORS OF THE ECONOMY::Automation. Computer engineering |
| Issue Date: | 2025 |
| Publisher: | MDPI |
| Citation: | Electronics. 2025; 14: 885 |
| Abstract: | In recent years, multimodal sentiment analysis has attracted increasing attention from researchers owing to the rapid development of human–computer interaction. Sentiment analysis is an important task for understanding dialogues. However, with the growth of multimodal data, the processing of individual modality features and the methods used for multimodal feature fusion have become increasingly important research questions. Existing methods that handle the features of each modality separately are not well suited for subsequent multimodal fusion and often fail to capture sufficient global and local information. Therefore, this study proposes a novel multimodal sentiment analysis method based on domain generalization and graph neural networks. The main characteristic of this method is that it treats the features of each modality as domains and extracts both domain-specific and cross-domain-invariant features, thereby facilitating cross-domain generalization. Such generalized features are better suited for multimodal fusion. Graph neural networks are employed to extract global and local information from the dialogue in order to capture the emotional changes of the speakers. Specifically, global representations are captured by modeling cross-modal interactions at the dialogue level, whereas local information is typically inferred from temporal information or the emotional changes of the speakers. The method proposed in this study outperformed existing models on the IEMOCAP, CMU-MOSEI, and MELD datasets by 0.97%, 1.09% (for seven-class classification), and 0.65% in terms of weighted F1 score, respectively. This demonstrates that the domain-generalized features proposed in this study are better suited for subsequent multimodal fusion, and that the model developed here is more effective at capturing both global and local information. |
| URI: | https://elib.bsu.by/handle/123456789/337595 |
| DOI: | 10.3390/electronics14050885 |
| Scopus: | 86000555087 |
| Sponsorship: | This research was funded by the Education Department of Hainan Province (project number: Huky2022-19) and by the Key Project of Application Research on the National Smart Education Platform for Primary and Secondary Schools in Hainan Province. |
| Licence: | info:eu-repo/semantics/openAccess |
| Appears in Collections: | Department of Informatics and Computer Systems (Кафедра информатики и компьютерных систем). Articles |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| electronics-14-00885.pdf | | 1.11 MB | Adobe PDF |
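
The abstract above names two mechanisms: treating each modality's features as a separate domain while separating domain-specific from cross-domain-invariant features, and running a graph neural network over the dialogue to combine global and local context. The snippet below is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired up, assuming a gradient-reversal domain classifier for the invariant features and a single mean-aggregation graph layer over utterance nodes; all module names, dimensions, and the additive fusion scheme are illustrative assumptions, not the authors' implementation described in the paper.

```python
# Hypothetical sketch (not the authors' implementation) of the two ideas named in
# the abstract: per-modality "domains" with shared (cross-domain-invariant) and
# private (domain-specific) features trained against a gradient-reversal domain
# classifier, followed by a simple graph layer over the utterances of a dialogue.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None


class ModalityEncoder(nn.Module):
    """Shared and private projections for one modality (treated as a domain)."""

    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.private = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

    def forward(self, x):
        return self.shared(x), self.private(x)


class DialogueGraphLayer(nn.Module):
    """One round of mean-neighbour message passing over utterance nodes."""

    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):
        # h: (num_utterances, dim); adj: (num_utterances, num_utterances) 0/1 matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        msg = adj @ h / deg                              # mean over neighbours
        return torch.relu(self.update(torch.cat([h, msg], dim=-1)))


class EmotionSketch(nn.Module):
    """Toy pipeline: encode modalities -> fuse -> dialogue graph -> classify."""

    def __init__(self, dims, hid_dim=128, num_classes=7, lam=1.0):
        super().__init__()
        self.encoders = nn.ModuleList(ModalityEncoder(d, hid_dim) for d in dims)
        self.domain_clf = nn.Linear(hid_dim, len(dims))  # predicts the source modality
        self.graph = DialogueGraphLayer(hid_dim)
        self.emotion_clf = nn.Linear(hid_dim, num_classes)
        self.lam = lam

    def forward(self, feats, adj):
        # feats: one (num_utterances, dim_m) tensor per modality
        shared, private, dom_logits = [], [], []
        for x, enc in zip(feats, self.encoders):
            s, p = enc(x)
            shared.append(s)
            private.append(p)
            # The domain classifier sees gradient-reversed shared features, which
            # pushes them to become indistinguishable across modalities.
            dom_logits.append(self.domain_clf(GradReverse.apply(s, self.lam)))
        fused = torch.stack(shared).mean(0) + torch.stack(private).mean(0)
        h = self.graph(fused, adj)                       # dialogue-level context
        return self.emotion_clf(h), torch.cat(dom_logits)


if __name__ == "__main__":
    n = 5                                                # utterances in one dialogue
    dims = [768, 100, 512]                               # assumed text/audio/visual sizes
    feats = [torch.randn(n, d) for d in dims]
    adj = torch.ones(n, n)                               # fully connected dialogue graph
    emo_logits, dom_logits = EmotionSketch(dims)(feats, adj)
    print(emo_logits.shape, dom_logits.shape)            # (5, 7) and (15, 3)
```

On a toy dialogue of five utterances this prints logits of shape (5, 7) from the seven-class emotion head and (15, 3) from the modality (domain) classifier; in training, a cross-entropy loss on the latter, combined with the gradient reversal, would encourage the shared features to generalize across modalities before fusion.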

