$\phi$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren, Bhiksha Raj, Khoa Luu

TL;DR
This paper introduces $\phi$-DPO, a novel continual learning framework for large multimodal models that explicitly addresses data imbalance and mitigates catastrophic forgetting through preference optimization.
Contribution
The paper proposes a new $\phi$-DPO loss function and a continual learning paradigm based on preference signals, improving fairness and performance in large multimodal models.
Findings
$\phi$-DPO outperforms prior methods on multiple benchmarks.
The approach effectively mitigates data imbalance and catastrophic forgetting.
Extensive experiments validate the theoretical analysis and practical benefits.
Abstract
Fairness in Continual Learning for Large Multimodal Models (LMMs) is an emerging yet underexplored challenge, particularly in the presence of imbalanced data distributions that can lead to biased model updates and suboptimal performance across tasks. While recent continual learning studies have made progress in addressing catastrophic forgetting, the problem of fairness caused by imbalanced data remains largely underexplored. This paper presents a novel Fairness Direct Preference Optimization (FaiDPO or $\phi$-DPO) framework for continual learning in LMMs. In particular, we first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Then, we identify the limitations of conventional DPO in imbalanced data and present a new $\phi$-DPO loss that explicitly addresses…
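For readers unfamiliar with the conventional DPO objective that the abstract builds on, the following is a minimal sketch of the standard per-pair DPO loss, $-\log \sigma(\beta[(\log \pi(y_w|x) - \log \pi_{\mathrm{ref}}(y_w|x)) - (\log \pi(y_l|x) - \log \pi_{\mathrm{ref}}(y_l|x))])$. It is not the paper's $\phi$-DPO loss, which is not given in this excerpt. The function name `dpo_loss`, its argument names, and the default `beta` are illustrative assumptions, and how preference pairs are constructed in the continual learning setting is not shown here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Conventional DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the chosen (preferred) and
    rejected responses under the trainable policy and a frozen reference
    model. This is the standard DPO objective, NOT the phi-DPO loss.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) computed as a numerically stable softplus(-z).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and reference agree on both responses, the margin is zero and the loss equals $\log 2$; the loss decreases as the policy shifts probability toward the chosen response relative to the reference.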
