Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy
Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Bhargava Kumar, Amit Agarwal, Ishan Banerjee, Srikant Panda, Tejaswini Kumar

TL;DR
This survey reviews large multimodal datasets, their application categories, and taxonomy, highlighting their importance in training and evaluating multimodal language models for diverse AI tasks.
Contribution
The survey provides a comprehensive overview of existing datasets, their application areas, and a taxonomy for organizing them, emphasizing their role in advancing multimodal AI research.
Findings
Large-scale datasets are crucial for training effective multimodal models.
Benchmark datasets enable performance assessment across diverse scenarios.
Multimodal learning is rapidly evolving with ongoing dataset development.
Abstract
Multimodal learning, a rapidly evolving field in artificial intelligence, seeks to build more versatile and robust systems by integrating and analyzing diverse types of data, including text, images, audio, and video. Inspired by the human ability to take in information through many senses, this approach enables applications such as text-to-video generation, visual question answering, and image captioning. This overview highlights recent developments in datasets that support multimodal large language models (MLLMs). Large-scale multimodal datasets are essential because they enable thorough training and testing of these models. The study examines a variety of datasets, including those for training, domain-specific tasks, and real-world applications, with an emphasis on their contributions to the field. It also emphasizes how crucial benchmark datasets are for assessing…
Taxonomy
Topics: Advanced Computational Techniques and Applications
