Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data

Heegeon Yoon; Heeyoung Kim

arXiv:2605.10498·cs.CV·May 12, 2026

TL;DR

This paper presents a novel multi-modal framework for long-tailed recognition that dynamically fuses heterogeneous data sources, improving classification performance on imbalanced datasets.

Contribution

It introduces a modality-aware multi-expert architecture with confidence-guided fusion and specialized training procedures for multi-modal long-tailed recognition.

Findings

1. Outperforms existing methods on benchmark datasets.
2. Effectively integrates multi-modal data for imbalanced classification.
3. Demonstrates robustness and generalization in real-world scenarios.

Abstract

Long-tailed distributions in class-imbalanced data present a fundamental challenge for deep learning models, which tend to be biased toward majority classes. While recent methods for long-tailed recognition have mitigated this issue, they are largely restricted to single-modal inputs and cannot fully exploit complementary information from diverse data sources. In this work, we introduce a new framework for long-tailed recognition that explicitly handles multi-modal inputs. Our approach extends multi-expert architectures to the multi-modal setting by fusing heterogeneous data into a unified representation while leveraging modality-specific networks to estimate the informativeness of each modality. These confidence-guided weights dynamically modulate the fusion process, ensuring that more informative modalities contribute more strongly to the final decision. To further enhance…
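The abstract does not say how the modality-specific networks score informativeness or how the resulting weights enter the fusion. The Python sketch below shows one generic form of confidence-guided fusion under stated assumptions, not the authors' method. Each modality is assumed to have a hypothetical classifier head, and its maximum softmax probability serves as that modality's confidence. The confidences are converted into fusion weights by a temperature-scaled softmax, and the fused representation is the weighted sum of equal-length modality embeddings. The function names and the `temperature` parameter are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def modality_confidence(class_logits: Sequence[float]) -> float:
    """Hypothetical informativeness score for one modality.

    ASSUMPTION: uses the max softmax probability of a modality-specific
    classifier; the paper may use a different confidence measure.
    """
    return max(softmax(class_logits))


def confidence_guided_fusion(
    features: Sequence[Sequence[float]],
    class_logits_per_modality: Sequence[Sequence[float]],
    temperature: float = 0.1,  # ASSUMPTION: illustrative sharpness of the weights
) -> Tuple[List[float], List[float]]:
    """Fuse per-modality embeddings, weighting more confident modalities more."""
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all modality embeddings must have the same length")

    # One confidence per modality, then normalize into fusion weights.
    confidences = [modality_confidence(l) for l in class_logits_per_modality]
    weights = softmax(confidences, temperature)

    # Weighted sum of the modality embeddings gives the unified representation.
    fused = [0.0] * dim
    for w, feat in zip(weights, features):
        for i, value in enumerate(feat):
            fused[i] += w * value
    return fused, weights


if __name__ == "__main__":
    image_emb, text_emb = [1.0, 0.0], [0.0, 1.0]
    image_logits = [4.0, 0.0, 0.0]  # confident image branch
    text_logits = [0.1, 0.0, 0.0]   # nearly uniform text branch
    fused, weights = confidence_guided_fusion(
        [image_emb, text_emb], [image_logits, text_logits]
    )
    print("weights:", weights)
    print("fused:", fused)
```

In this toy case the image branch is confident and the text branch is nearly uniform, so almost all of the fusion weight goes to the image embedding. Lowering `temperature` sharpens this preference, while raising it pushes the weights toward a plain average of the modalities.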