Domain-specific ChatBots for Science using Embeddings
Kevin G. Yager

TL;DR
This paper presents a method for creating domain-specific scientific chatbots by integrating text and image embeddings with large language models to improve research support in physical sciences.
Contribution
It introduces a practical approach to adapting LLMs to scientific domains by combining embeddings with existing software tools, improving the accuracy and relevance of their answers for researchers.
Findings
Effective use of text embeddings for scientific document retrieval
Image embeddings enable search across publication figures
Demonstrated potential for accelerating physical science research
Abstract
Large language models (LLMs) have emerged as powerful machine-learning systems capable of handling a myriad of tasks. Tuned versions of these systems have been turned into chatbots that can respond to user queries on a vast diversity of topics, providing informative and creative replies. However, their application to physical science research remains limited owing to their incomplete knowledge in these areas, contrasted with the needs of rigor and sourcing in science domains. Here, we demonstrate how existing methods and software tools can be easily combined to yield a domain-specific chatbot. The system ingests scientific documents in existing formats, and uses text embedding lookup to provide the LLM with domain-specific contextual information when composing its reply. We similarly demonstrate that existing image embedding methods can be used for search and retrieval across…
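The abstract's central mechanism can be sketched briefly. The system retrieves document passages whose embeddings are closest to the user's query and places them in the LLM prompt as context. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. The hashed bag-of-words `embed` function is a toy stand-in for a learned text-embedding model. The 50-word chunk size, `k`, and the prompt wording are hypothetical choices. The actual LLM call is omitted.

```python
import math
import re
import zlib

DIM = 1024  # size of the toy embedding space (assumption, not from the paper)


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a learned text-embedding model.

    Hashes each lowercase token into one of DIM buckets (feature hashing)
    and L2-normalizes the result, so a dot product equals cosine similarity.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk(doc_id: str, text: str, size: int = 50) -> list[tuple[str, str]]:
    """Split a document into passages of at most `size` words, tagged with a source ID."""
    words = text.split()
    return [(f"{doc_id}#{i // size}", " ".join(words[i:i + size]))
            for i in range(0, len(words), size)]


def build_index(docs: dict[str, str]) -> list[tuple[str, str, list[float]]]:
    """Embed every passage once, ahead of query time."""
    return [(cid, passage, embed(passage))
            for doc_id, text in docs.items()
            for cid, passage in chunk(doc_id, text)]


def retrieve(query: str, index, k: int = 2) -> list[tuple[str, str]]:
    """Return the k passages whose embeddings are most similar to the query."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), cid, passage)
              for cid, passage, vec in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(cid, passage) for _, cid, passage in scored[:k]]


def build_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    """Prepend retrieved passages (with source IDs) to the user's question."""
    context = "\n".join(f"[{cid}] {passage}" for cid, passage in hits)
    return ("Answer using only the context below, and cite passages by their [ID].\n\n"
            f"{context}\n\nQuestion: {query}")


# Usage with two tiny hypothetical documents.
docs = {
    "saxs-note": "Small-angle x-ray scattering measures nanoparticle size distributions in solution.",
    "battery-note": "Lithium battery electrolytes degrade at high voltage during cycling.",
}
index = build_index(docs)
query = "x-ray scattering nanoparticle size"
hits = retrieve(query, index, k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

Each passage carries a source ID. This is one simple way to support the sourcing requirement the abstract emphasizes, since the model can be asked to cite the IDs of the passages it used. A real deployment would replace `embed` with a proper embedding model and the linear scan with a vector index.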
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Artificial Intelligence in Healthcare and Education
