Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media

Zhizhen Zhang; Ning Wang; Haojie Li; Zhihui Wang

arXiv:2405.05760 · cs.CV · June 25, 2024

TL;DR

This paper introduces a novel multimodal fusion transformer that leverages similarity guidance to improve semantic location prediction from noisy social media posts containing text and images.

Contribution

It proposes a similarity-guided interaction module and a fusion mechanism that effectively reduce noise and modality heterogeneity in multimodal social media data.
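This summary does not give the module's equations, so the following is only a hedged illustration of the general idea of similarity guidance. It assumes each post's text and image have already been embedded into vectors of the same dimension (for example by a CLIP-style encoder) and uses their cosine similarity as a gate, so that an image weakly related to the text contributes less to the fused representation. The functions `cosine`, `similarity_gate`, and `similarity_gated_fusion`, and the `temperature` and `threshold` parameters, are hypothetical and are not taken from SG-MFT.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_gate(text: Vector, image: Vector,
                    temperature: float = 10.0, threshold: float = 0.2) -> float:
    """Map text-image cosine similarity to a gate in (0, 1).

    Hypothetical form (not from the paper): g = sigmoid(temperature * (cos - threshold)).
    """
    s = cosine(text, image)
    return 1.0 / (1.0 + math.exp(-temperature * (s - threshold)))


def similarity_gated_fusion(text: Vector, image: Vector,
                            **gate_kwargs: float) -> Tuple[Vector, float]:
    """Fuse as text + g * image, so a weakly related image contributes little."""
    if len(text) != len(image):
        raise ValueError("text and image embeddings must have the same dimension")
    g = similarity_gate(text, image, **gate_kwargs)
    fused = [t + g * v for t, v in zip(text, image)]
    return fused, g


# Toy 3-d embeddings standing in for encoder outputs (assumed, not real data).
text = [1.0, 0.0, 0.0]
related_image = [0.9, 0.1, 0.0]
unrelated_image = [0.0, 0.0, 1.0]

_, g_related = similarity_gated_fusion(text, related_image)
fused_unrelated, g_unrelated = similarity_gated_fusion(text, unrelated_image)
print(round(g_related, 3), round(g_unrelated, 3))
```

With the default settings, the related image receives a gate close to 1 and the unrelated one a gate of roughly 0.12. In a trained model such gate parameters would normally be learned rather than fixed by hand.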

Findings

1. Outperforms existing methods in semantic location prediction tasks.
2. Effectively reduces noise and heterogeneity in multimodal data.
3. Demonstrates superior accuracy through comprehensive experiments.

Abstract

Semantic location prediction aims to derive meaningful location insights from multimodal social media posts, offering a more contextual understanding of daily activities than raw GPS coordinates. This task faces significant challenges due to the noise and modality heterogeneity in "text-image" posts. Existing methods are generally constrained by inadequate feature representations and insufficient cross-modal interaction, and they struggle to effectively reduce noise and modality heterogeneity. To address these challenges, we propose a Similarity-Guided Multimodal Fusion Transformer (SG-MFT) for predicting the semantic locations of users from their multimodal posts. First, we incorporate high-quality text and image representations by utilizing a pre-trained large vision-language model. Then, we devise a Similarity-Guided Interaction Module (SIM) to alleviate modality heterogeneity and noise interference by…
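The abstract is cut off before it specifies how SIM works, so the following is only a token-level illustration under stated assumptions, not the paper's module. It assumes that similarity enters a cross-attention step as an additive bias on the logits: each text token (query) attends over image patches (used as both keys and values, with no learned projections), and patches that point away from a token are pushed down before the softmax. A residual connection keeps the original text signal. The names `similarity_guided_cross_attention` and `beta` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_guided_cross_attention(text_tokens: Matrix, image_patches: Matrix,
                                      beta: float = 1.0) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch: logit = scaled dot product + beta * cosine similarity.

    Keys and values are the raw image patches; a real model would use learned
    query/key/value projections and multiple heads.
    """
    d = len(text_tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs: Matrix = []
    weights: Matrix = []
    for q in text_tokens:
        logits = [scale * sum(a * b for a, b in zip(q, k)) + beta * cosine(q, k)
                  for k in image_patches]
        w = softmax(logits)
        # Weighted sum of image patches for this text token.
        attended = [sum(w[j] * image_patches[j][i] for j in range(len(image_patches)))
                    for i in range(d)]
        outputs.append([qi + ai for qi, ai in zip(q, attended)])  # residual connection
        weights.append(w)
    return outputs, weights


# Toy inputs (assumed): the last patch points opposite to the first text token.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
outputs, weights = similarity_guided_cross_attention(text_tokens, image_patches)
```

For the first text token, the patch pointing the opposite way receives the smallest attention weight, which mirrors the noise-suppression role the abstract attributes to similarity guidance; the paper's actual formulation may differ.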

Taxonomy

Topics: Geographic Information Systems Studies · Human Mobility and Location-Based Analysis

Methods: Attention Is All You Need · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Greedy Policy Search · Byte Pair Encoding