ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata
Gajendra Doniparthi, Shashank Balu Pandhare, Stefan Deßloch, Timo Mühlhaus

TL;DR
ArcBERT is a novel LLM-based search engine that enables natural language querying and semantic understanding for exploring integrated multi-omics metadata in research data management systems.
Contribution
The paper introduces ArcBERT, a system that uses LLMs trained on domain-specific content to support natural language search and to capture the structure and hierarchies of complex metadata.
Findings
Enables natural language queries for metadata exploration
Relies on semantic matching rather than keyword matching
Handles diverse user query patterns effectively
Abstract
Traditional search applications within Research Data Management (RDM) ecosystems are crucial in helping users discover and explore the structured metadata of research datasets. Typically, text search engines require users to submit keyword-based queries rather than natural language. However, using Large Language Models (LLMs) trained on domain-specific content for specialized natural language processing (NLP) tasks is becoming increasingly common. We present ArcBERT, an LLM-based system designed for integrated metadata exploration. Unlike traditional search applications, ArcBERT understands natural language queries and relies on semantic matching. Notably, ArcBERT also understands the structure and hierarchies within the metadata, enabling it to handle diverse user querying patterns effectively.
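To make the contrast between keyword search and semantic matching concrete, here is a minimal toy sketch in Python. It is not ArcBERT's implementation: the `CONCEPTS` table is a hypothetical stand-in for a learned embedding model, and the record layout, field names, and function names are assumptions made for illustration only. The sketch flattens nested metadata into path-prefixed text so that hierarchy is preserved, then ranks records by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical concept table standing in for a learned embedding model.
# It maps different surface words to one shared concept label.
CONCEPTS = {
    "proteomics": "protein", "protein": "protein",
    "transcriptomics": "rna", "rna": "rna",
    "heat": "temperature", "thermal": "temperature",
}

# Hypothetical nested metadata records (layout assumed for illustration).
RECORDS = [
    {"investigation": {"title": "Thermal stress response",
                       "study": {"assay": "proteomics"}}},
    {"investigation": {"title": "Light adaptation",
                       "study": {"assay": "transcriptomics"}}},
]

def flatten(record, prefix=""):
    """Turn nested metadata into 'path value' strings, keeping the hierarchy."""
    parts = []
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            parts.extend(flatten(value, path + "/"))
        else:
            parts.append(f"{path} {value}")
    return parts

def embed(text):
    """Toy embedding: count concept labels instead of raw words."""
    tokens = text.lower().replace("/", " ").split()
    return Counter(CONCEPTS.get(token, token) for token in tokens)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search(query, records):
    """Rank records by similarity to the query, best match first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(" ".join(flatten(r)))), r)
              for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

results = search("protein heat", RECORDS)
```

Neither record contains the literal words "protein" or "heat", so a plain keyword match would return nothing. The concept mapping still ranks the thermal-stress proteomics record first, which is the behaviour semantic matching is meant to provide.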
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Research Data Management Practices · Semantic Web and Ontologies
