Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media

Megha Mariam K.M; Vineeth N. Balasubramanian; C.V. Jawahar

arXiv:2605.05831·cs.CV·May 12, 2026


TL;DR

This paper introduces the Multimodal Conference Dataset (MCD), a benchmark for aligning and understanding scientific content across text, visuals, and speech, and uses it to evaluate the capabilities and limitations of current models.

Contribution

The paper presents MCD, which the authors describe as the first benchmark to integrate research papers, presentation videos, explanatory videos, and slides from the same works, and it evaluates how well models discover correspondences across these formats.

Findings

01. Vision-language models are robust but struggle with fine-grained alignment.

02. Embedding-based models capture text-visual correspondences well.

03. Equations and symbolic content form distinct clusters in embeddings.
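
To make the idea of embedding-based correspondence concrete, here is a minimal sketch of how a paper segment could be matched to a slide segment by cosine similarity and scored with recall@k. This is an illustration, not the authors' pipeline: the function names, the toy vectors, and the choice of recall@k as the metric are all assumptions, and a real setup would obtain the vectors from an embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scored = [(cosine(query, c), i) for i, c in enumerate(candidates)]
    # Sort by descending similarity; break ties by the lower index.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


def recall_at_k(queries, candidates, gold, k):
    """Fraction of queries whose gold candidate appears among the top-k ranked candidates."""
    hits = sum(
        1
        for q, g in zip(queries, gold)
        if g in rank_candidates(q, candidates)[:k]
    )
    return hits / len(queries)


# Hypothetical toy embeddings: three paper segments and three slide segments.
paper_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
slide_vecs = [[0.9, 0.1, 0.0], [0.1, 0.5, 0.5], [0.0, 0.9, 0.3]]
gold = [0, 1, 2]  # assumed ground truth: paper segment i corresponds to slide segment i

if __name__ == "__main__":
    print("recall@1:", recall_at_k(paper_vecs, slide_vecs, gold, k=1))
    print("recall@2:", recall_at_k(paper_vecs, slide_vecs, gold, k=2))
```

In this toy data the second and third slide vectors overlap in the directions they cover, so two of the three paper segments rank the wrong slide first. It is a small picture of how coarse matching can succeed at a looser cutoff while exact, fine-grained matching fails.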

Abstract

The communication of scientific knowledge has become increasingly multimodal, spanning text, visuals, and speech through materials such as research papers, slides, and recorded presentations. These different representations collectively convey a study's reasoning, results, and insights, offering complementary perspectives that enrich understanding. However, despite their shared purpose, such materials are rarely connected in a structured way. The absence of explicit links across formats makes it difficult to trace how concepts, visuals, and explanations correspond, limiting unified exploration and analysis of research content. To address this gap, we introduce the Multimodal Conference Dataset (MCD), the first benchmark that integrates research papers, presentation videos, explanatory videos, and slides from the same works. We evaluate a range of embedding-based and vision-language…


Code & Models

Repositories

meghamariamkm2002/MCD (GitHub)
