LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
Joohyung Yun, Doyup Lee, and Wook-Shin Han

TL;DR
LILaC is a multimodal retrieval framework that combines a layered component graph with late interaction to support multihop reasoning across textual, tabular, and visual document elements, achieving state-of-the-art results without additional fine-tuning.
Contribution
The paper introduces a layered component graph and a late-interaction subgraph retrieval method for improved multimodal document retrieval.
Findings
Achieves state-of-the-art performance on five benchmarks
Operates effectively without additional fine-tuning
Efficiently captures semantic relationships across multimodal components
Abstract
Multimodal document retrieval aims to retrieve query-relevant components from documents composed of textual, tabular, and visual elements. An effective multimodal retriever needs to handle two main challenges: (1) mitigating the effect of irrelevant content introduced by fixed, single-granularity retrieval units, and (2) supporting multihop reasoning by effectively capturing semantic relationships among components within and across documents. To address these challenges, we propose LILaC, a multimodal retrieval framework featuring two core innovations. First, we introduce a layered component graph that explicitly represents multimodal information at two layers, one coarse-grained and one fine-grained, facilitating efficient yet precise reasoning. Second, we develop a late-interaction-based subgraph retrieval method, an edge-based approach that initially identifies coarse-grained nodes for…
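The late-interaction idea named in the abstract can be illustrated with a ColBERT-style MaxSim score over a toy two-layer structure. This is only a minimal sketch under stated assumptions, not the authors' implementation: the component names, the 2-dimensional embeddings, and the functions `cosine`, `late_interaction_score`, and `rank_components` are all hypothetical, and the edge-based subgraph step described in the abstract is not modeled here.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def late_interaction_score(query_vecs, sub_vecs):
    """MaxSim: match each query vector to its best fine-grained vector, then sum."""
    return sum(max(cosine(q, s) for s in sub_vecs) for q in query_vecs)


def rank_components(query_vecs, fine_layer):
    """Score every coarse component through its fine-grained children, best first."""
    scores = {
        name: late_interaction_score(query_vecs, subs)
        for name, subs in fine_layer.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical layered structure: each coarse component (paragraph, table, image)
# maps to the embeddings of its fine-grained subcomponents.
fine_layer = {
    "paragraph_1": [[1.0, 0.0], [0.7, 0.7]],
    "table_1": [[0.0, 1.0]],
    "image_1": [[-1.0, 0.0]],
}

# Hypothetical query decomposed into two sub-query embeddings.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]

ranking = rank_components(query_vecs, fine_layer)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Scoring at the fine-grained level while returning coarse components is one way to read the "coarse and fine" layering: a component with several relevant parts can outrank one with a single strong match, because each query vector picks its own best match.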
Taxonomy
Topics: Multimodal Machine Learning Applications · Information Retrieval and Search Behavior · Advanced Graph Neural Networks
