Data Processing Techniques for Modern Multimodal Models
Yinheng Li, Han Ding, Hang Chen

TL;DR
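This paper reviews data processing techniques for multimodal model training, focusing on diffusion models and multimodal large language models (MLLMs). It groups the methods into four categories (data quality, quantity, distribution, and safety) to guide developers in handling data effectively.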
Contribution
It provides a comprehensive categorization and analysis of data processing techniques specifically tailored for modern multimodal models.
Findings
Guidance on selecting data processing methods for different model types
Summary of techniques across data quality, quantity, distribution, and safety
Insights into best practices for multimodal model training
Abstract
Data processing plays a significant role in current multimodal model training. In this paper, we provide a comprehensive review of common data processing techniques used in modern multimodal model training, with a focus on diffusion models and multimodal large language models (MLLMs). We summarize all techniques into four categories: data quality, data quantity, data distribution, and data safety. We further present our findings on the choice of data processing methods for different types of models. This study aims to guide developers of multimodal models toward effective data processing techniques.
Taxonomy
Topics: Advanced Computational Techniques and Applications
Methods: Diffusion · Focus
