Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
Heegeon Yoon, Heeyoung Kim

TL;DR
This paper presents a novel multi-modal framework for long-tailed recognition that dynamically fuses heterogeneous data sources, improving classification performance on imbalanced datasets.
Contribution
The paper introduces a modality-aware multi-expert architecture with confidence-guided fusion and specialized training procedures for multi-modal long-tailed recognition.
Findings
Outperforms existing methods on benchmark datasets.
Effectively integrates multi-modal data for imbalanced classification.
Demonstrates robustness and generalization in real-world scenarios.
Abstract
Long-tailed distributions in class-imbalanced data present a fundamental challenge for deep learning models, which tend to be biased toward majority classes. While recent methods for long-tailed recognition have mitigated this issue, they are largely restricted to single-modal inputs and cannot fully exploit complementary information from diverse data sources. In this work, we introduce a new framework for long-tailed recognition that explicitly handles multi-modal inputs. Our approach extends multi-expert architectures to the multi-modal setting by fusing heterogeneous data into a unified representation while leveraging modality-specific networks to estimate the informativeness of each modality. These confidence-guided weights dynamically modulate the fusion process, ensuring that more informative modalities contribute more strongly to the final decision. To further enhance…
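The confidence-guided fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how informativeness is estimated or how the fusion is computed, so the sketch assumes (hypothetically) that each modality's confidence is the maximum softmax probability of its modality-specific classifier and that fusion is a confidence-weighted sum of equal-length feature vectors. All function names and toy values are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modality_confidence(logits):
    """Assumed confidence proxy: the maximum softmax probability.

    The paper may estimate informativeness differently; this is only
    a common, simple stand-in.
    """
    return max(softmax(logits))


def confidence_guided_fusion(features, modality_logits):
    """Fuse per-modality feature vectors using confidence-derived weights.

    features:        list of equal-length feature vectors, one per modality
    modality_logits: list of class-logit vectors from the (hypothetical)
                     modality-specific networks, one per modality
    Returns the fused vector and the normalized fusion weights.
    """
    confidences = [modality_confidence(l) for l in modality_logits]
    total = sum(confidences)
    weights = [c / total for c in confidences]  # weights sum to 1
    dim = len(features[0])
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]
    return fused, weights


# Toy example with two modalities and three classes (made-up values).
image_feat = [1.0, 0.0]
text_feat = [0.0, 1.0]
image_logits = [4.0, 0.0, 0.0]  # peaked prediction: high confidence
text_logits = [0.1, 0.0, 0.0]   # near-uniform prediction: low confidence

fused, weights = confidence_guided_fusion(
    [image_feat, text_feat], [image_logits, text_logits]
)
```

In this toy case the image modality produces a much more peaked prediction than the text modality, so it receives the larger fusion weight and dominates the fused representation. In a multi-expert setting, a fused vector like this would then be passed to each expert's classifier head.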
