Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media
Zhizhen Zhang, Ning Wang, Haojie Li, Zhihui Wang

TL;DR
This paper introduces a novel multimodal fusion transformer that leverages similarity guidance to improve semantic location prediction from noisy social media posts containing text and images.
Contribution
It proposes a similarity-guided interaction module and a fusion mechanism that effectively reduce noise and modality heterogeneity in multimodal social media data.
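As a rough illustration of what "similarity guidance" can mean in a fusion step, the sketch below gates an image embedding by its cosine similarity to the text embedding before adding the two. This is not the paper's Similarity-Guided Interaction Module, whose mechanism is not described in this summary. The function names, the choice of cosine similarity, the text-anchored residual form, and the mapping of similarity to a [0, 1] gate are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_gated_fusion(text_vec, image_vec):
    """Hypothetical fusion: keep the text vector, add the image vector scaled by a gate.

    The gate maps cosine similarity from [-1, 1] to [0, 1], so an image that
    disagrees with the text (a likely noisy pair) contributes less.
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    sim = cosine_similarity(text_vec, image_vec)
    gate = (sim + 1.0) / 2.0  # assumed mapping, not taken from the paper
    fused = [t + gate * i for t, i in zip(text_vec, image_vec)]
    return fused, gate


if __name__ == "__main__":
    # An orthogonal image embedding gets a gate of 0.5.
    print(similarity_gated_fusion([1.0, 0.0], [0.0, 1.0]))
```

In a real model the embeddings would come from a pre-trained vision-language encoder and the gate would typically be learned rather than fixed; the sketch only shows how a similarity score can scale the contribution of one modality.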
Findings
Outperforms existing methods in semantic location prediction tasks.
Effectively reduces noise and heterogeneity in multimodal data.
Demonstrates superior accuracy through comprehensive experiments.
Abstract
Semantic location prediction aims to derive meaningful location insights from multimodal social media posts, offering a more contextual understanding of daily activities than using GPS coordinates. This task faces significant challenges due to the noise and modality heterogeneity in "text-image" posts. Existing methods are generally constrained by inadequate feature representations and modal interaction, struggling to effectively reduce noise and modality heterogeneity. To address these challenges, we propose a Similarity-Guided Multimodal Fusion Transformer (SG-MFT) for predicting the semantic locations of users from their multimodal posts. First, we incorporate high-quality text and image representations by utilizing a pre-trained large vision-language model. Then, we devise a Similarity-Guided Interaction Module (SIM) to alleviate modality heterogeneity and noise interference by…
Taxonomy
Topics: Geographic Information Systems Studies · Human Mobility and Location-Based Analysis
Methods: Attention Is All You Need · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Greedy Policy Search · Byte Pair Encoding
