AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval
Tony Joseph, Carlos Pareja, David Lopes Pegna, and Abhishek Singh

TL;DR
AMES is a scalable multimodal enterprise search architecture that performs cross-modal retrieval via late interaction, embedding diverse data types into a shared space without requiring a redesign of existing systems.
Contribution
The paper presents a unified, backend-agnostic multimodal retrieval architecture that integrates fine-grained late interaction retrieval into enterprise search engines without major redesigns.
Findings
Achieves competitive ranking performance on the ViDoRe V3 benchmark.
Supports cross-modal retrieval through a shared embedding space.
Operates efficiently within a production-grade Solr system.
Abstract
We present AMES (Approximate Multimodal Enterprise Search), a unified, backend-agnostic multimodal late interaction retrieval architecture. AMES demonstrates that fine-grained multimodal late interaction retrieval can be deployed within a production-grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders, enabling cross-modal retrieval without modality-specific retrieval logic. AMES employs a two-stage pipeline: parallel token-level ANN search with per-document Top-M MaxSim approximation, followed by accelerator-optimized Exact MaxSim re-ranking. Experiments on the ViDoRe V3 benchmark show that AMES achieves competitive ranking performance within a scalable, production-ready Solr-based system.
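The two-stage pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions that are not in the abstract: a brute-force scan stands in for the ANN index, `k` is the number of hits fetched per query vector, and `m` is read as the number of candidate documents kept for exact re-ranking (the abstract does not define how Top-M is computed). All function names and the toy vectors are hypothetical.

```python
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def exact_maxsim(query_vecs, doc_vecs):
    """Late interaction score: for each query vector take its best-matching
    document vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def token_level_search(query_vec, index, k):
    """Stand-in for an ANN lookup (assumption): brute-force top-k over all
    indexed vectors. `index` is a list of (doc_id, vector) pairs."""
    hits = [(dot(query_vec, vec), doc_id) for doc_id, vec in index]
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:k]


def approximate_candidates(query_vecs, index, k, m):
    """Stage 1: approximate each document's MaxSim using only the vectors
    returned by the per-token searches, then keep the m best documents."""
    best = defaultdict(dict)  # doc_id -> {query vector index: best score seen}
    for qi, q in enumerate(query_vecs):
        for score, doc_id in token_level_search(q, index, k):
            prev = best[doc_id].get(qi)
            if prev is None or score > prev:
                best[doc_id][qi] = score
    # Query vectors with no hit for a document contribute nothing to its sum.
    approx = {doc_id: sum(s.values()) for doc_id, s in best.items()}
    ranked = sorted(approx, key=approx.get, reverse=True)
    return ranked[:m]


def search(query_vecs, docs, k=4, m=2):
    """Stage 2: re-rank the stage-1 candidates with exact MaxSim."""
    index = [(doc_id, v) for doc_id, vecs in docs.items() for v in vecs]
    candidates = approximate_candidates(query_vecs, index, k, m)
    rescored = [(exact_maxsim(query_vecs, docs[d]), d) for d in candidates]
    rescored.sort(key=lambda x: x[0], reverse=True)
    return [(d, s) for s, d in rescored]


# Toy multi-vector documents; in a shared embedding space, text tokens,
# image patches, and video frames would all be indexed the same way.
docs = {
    "text_page": [[1.0, 0.0], [0.0, 1.0]],
    "image_page": [[0.7, 0.7], [0.6, 0.8]],
    "video_clip": [[-1.0, 0.0], [0.0, -1.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]
results = search(query, docs)
print(results)
```

Because every modality is represented as a bag of vectors in the same space, the scoring code never inspects the document type, which is the sense in which no modality-specific retrieval logic is needed. In a real deployment the per-token searches would run in parallel against the search engine's vector index, and the exact re-ranking would run as batched matrix operations on an accelerator.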
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
