Federated high order tensor fusion for privacy preserving multimodal social media analysis

PLoS ONE 2026-05-06


Abstract

The rapid evolution of social networks has positioned multimodal content, including text, images, and audio, as a pivotal medium for self-expression and public sentiment analysis. However, existing multimodal fusion methods are often limited by privacy risks, parameter redundancy, and insufficient exploitation of intermodal correlations. To overcome these challenges, this study introduces a novel federated learning framework that integrates high-order tensor-based multimodal data fusion with privacy-aware decentralized training in which raw data stays local. It leverages Tucker tensor decomposition to capture complex spatial and semantic relationships between modalities, enhancing fusion accuracy while supporting user privacy through local data retention. Experimental results on two separate datasets, the TREC2017 Precision Medicine Track Scientific Abstracts dataset and the CMU-MOSI multimodal sentiment benchmark, demonstrate that the proposed algorithm outperforms existing methods. The TREC2017 experiments validate the framework’s performance in text-dominant conditions (higher Mean Average Precision, MAP), while the CMU-MOSI experiments confirm the effectiveness of the high-order tensor fusion in modeling intermodal correlations for multimodal tasks. Furthermore, our framework demonstrates adaptive learning capabilities, efficiently processing diverse multimodal data types without expanding redundant model parameters. This research opens new avenues for privacy-aware multimodal data fusion in social media, offering a robust solution for monitoring and managing online public opinion.
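To make the parameter-redundancy point concrete, here is a minimal NumPy sketch of fusing three modality vectors through a fusion tensor stored in Tucker format. It is an illustration under stated assumptions, not the paper's implementation: the function names, feature dimensions, ranks, and random weights are all hypothetical, and the abstract does not specify the actual architecture.

```python
import numpy as np

def tucker_fuse(x_text, x_image, x_audio, factors, core):
    """Fuse three modality vectors with a Tucker-format weight tensor.

    Each factor matrix projects one modality into a small rank space;
    the core tensor then mixes the three projections into the output.
    """
    U_t, U_i, U_a = factors
    z_t = U_t.T @ x_text    # shape (r_t,)
    z_i = U_i.T @ x_image   # shape (r_i,)
    z_a = U_a.T @ x_audio   # shape (r_a,)
    return np.einsum("a,b,c,abco->o", z_t, z_i, z_a, core)

def dense_fuse(x_text, x_image, x_audio, factors, core):
    """Same fusion, but with the full weight tensor materialised first."""
    U_t, U_i, U_a = factors
    W = np.einsum("ia,jb,kc,abco->ijko", U_t, U_i, U_a, core)
    return np.einsum("i,j,k,ijko->o", x_text, x_image, x_audio, W)

# Hypothetical sizes: feature dimensions, Tucker ranks, output width.
rng = np.random.default_rng(0)
dims, ranks, d_out = (32, 16, 8), (4, 3, 2), 5

factors = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]
core = rng.standard_normal((*ranks, d_out))
x_text, x_image, x_audio = (rng.standard_normal(d) for d in dims)

fused = tucker_fuse(x_text, x_image, x_audio, factors, core)
dense = dense_fuse(x_text, x_image, x_audio, factors, core)

# Parameter counts for the factorised and the full weight tensor.
tucker_params = sum(U.size for U in factors) + core.size
dense_params = int(np.prod(dims)) * d_out
print(np.allclose(fused, dense), tucker_params, dense_params)
```

Both functions compute the same trilinear interaction between the modalities; the Tucker form simply never builds the full weight tensor, which is where the parameter saving comes from.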

Classification

Topics

multimodal analysis, privacy preservation, tensor fusion, social media, machine learning

Methodology

federated learning, tensor decomposition, experimental

Key findings

The algorithm enhances fusion accuracy by effectively capturing complex spatial and semantic relationships through tensor Tucker decomposition.

In the TREC2017 experiments, the framework achieved higher Mean Average Precision (MAP) than existing methods under text-dominant conditions.

The adaptive learning capabilities of the framework allow it to process various multimodal data types efficiently without increasing redundant model parameters.
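For readers unfamiliar with the MAP metric cited in the findings above, the following sketch shows the standard way to compute it from ranked retrieval results. The document identifiers and queries are made up for illustration, and the function names are hypothetical; the paper's own evaluation script is not described in this summary.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    Assumes ranked_ids contains no duplicates. Precision is sampled at
    each rank where a relevant document appears, then divided by the
    total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """Mean of per-query average precision over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two toy queries with made-up document identifiers.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```

A higher MAP means relevant documents tend to appear earlier in the ranking across queries.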

Conclusion

The proposed federated learning framework integrates high-order tensor data fusion to enhance multimodal analysis while supporting user privacy through local data retention. It outperforms existing methods in both the text-dominant setting (TREC2017) and the multimodal setting (CMU-MOSI).
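The local data retention idea can be sketched with plain federated averaging (FedAvg): each client trains on its own data and only model weights are sent to the server. This is an assumption made for illustration; the summary does not say which aggregation rule the paper uses, and the linear model, synthetic data, learning rate, and function names below are hypothetical stand-ins for the actual fusion model.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient steps on one client's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Average client weights, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()

# Synthetic clients: raw (X, y) pairs never leave this list's entries.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for n in (40, 60, 100):
    X = rng.standard_normal((n, 3))
    y = X @ true_w + 0.01 * rng.standard_normal(n)
    clients.append((X, y))

global_w = np.zeros(3)
for _ in range(50):  # communication rounds
    # Each client starts from the current global weights and returns
    # only its updated weights, not its data.
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = fed_avg(updates, [len(y) for _, y in clients])

print(np.round(global_w, 2))
```

Note that sharing weights instead of raw data reduces exposure but is not a formal privacy guarantee by itself.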

Practical advice

Future research should explore further improvements in privacy-aware techniques and the scalability of federated learning frameworks in handling larger datasets.
