MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations
Abhishikth Mallampalli, Sridhara Dasu

TL;DR
MITRA is a privacy-preserving, on-premise AI system for knowledge retrieval in large physics collaborations. It combines automated document retrieval, OCR, and LLMs to answer questions about complex internal documentation more accurately than keyword-based search.
Contribution
This work introduces a novel on-premise Retrieval-Augmented Generation system with a two-tiered vector database architecture tailored for physics collaboration documentation.
Findings
MITRA outperforms keyword-based retrieval baselines on realistic queries.
The system maintains data privacy by hosting all components on-premise.
The two-tiered database effectively disambiguates analyses based on abstracts and full documents.
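The two-tiered lookup described in the findings can be illustrated with a minimal sketch. This is not MITRA's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory dictionaries stand in for a vector database, and the analysis names and texts are invented for illustration. The idea shown is only the control flow: match the query against abstracts first to choose an analysis, then search only that analysis's full-document chunks.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical tier 1: one abstract per analysis.
ABSTRACTS = {
    "analysis_A": "search for higgs boson decays to tau leptons",
    "analysis_B": "measurement of top quark pair production cross section",
}

# Hypothetical tier 2: full-document chunks, grouped by analysis.
CHUNKS = {
    "analysis_A": [
        "the tau identification uses a neural network discriminator",
        "background from drell-yan events is estimated with the embedding technique",
    ],
    "analysis_B": [
        "the top quark pair background is estimated from simulation",
        "the luminosity uncertainty is the dominant systematic uncertainty",
    ],
}


def two_tier_search(query, top_k=1):
    """Pick the best-matching analysis by abstract, then rank only its chunks."""
    q = embed(query)
    # Tier 1: disambiguate the analysis using abstracts only.
    best = max(ABSTRACTS, key=lambda name: cosine(q, embed(ABSTRACTS[name])))
    # Tier 2: rank chunks from the selected analysis's full document.
    ranked = sorted(CHUNKS[best], key=lambda c: cosine(q, embed(c)), reverse=True)
    return best, ranked[:top_k]


if __name__ == "__main__":
    query = "how is the background estimated in the higgs to tau leptons search"
    analysis, hits = two_tier_search(query)
    print(analysis, hits)
```

In this toy corpus, a flat search over all chunks would rank the `analysis_B` background chunk highest for the same query, because it shares generic words with the question. Restricting the second tier to the analysis chosen from the abstracts avoids that mix-up, which is the disambiguation effect the finding describes.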
Abstract
Large-scale scientific collaborations, such as the Compact Muon Solenoid (CMS) at CERN, produce a vast and ever-growing corpus of internal documentation. Navigating this complex information landscape presents a significant challenge for both new and experienced researchers, hindering knowledge sharing and slowing down the pace of scientific discovery. To address this, we present a prototype of MITRA, a Retrieval-Augmented Generation (RAG) based system, designed to answer specific, context-aware questions about physics analyses. MITRA employs a novel, automated pipeline using Selenium for document retrieval from internal databases and Optical Character Recognition (OCR) with layout parsing for high-fidelity text extraction. Crucially, MITRA's entire framework, from the embedding model to the Large Language Model (LLM), is hosted on-premise, ensuring that sensitive collaboration data…
Taxonomy
Topics: Advanced Text Analysis Techniques · Machine Learning in Materials Science · Handwritten Text Recognition Techniques
