Domain-specific ChatBots for Science using Embeddings

Kevin G. Yager

arXiv:2306.10067 · cs.CL · August 16, 2024 · 1 cites

Open Access · 1 Repo

TL;DR

This paper presents a method for creating domain-specific scientific chatbots by integrating text and image embeddings with large language models to improve research support in physical sciences.

Contribution

It introduces a practical approach to adapt LLMs for scientific domains using embeddings and existing tools, enhancing their accuracy and relevance for researchers.

Findings

01. Effective use of text embeddings for scientific document retrieval

02. Image embeddings enable search across publication figures

03. Demonstrated potential for accelerating physical science research

Abstract

Large language models (LLMs) have emerged as powerful machine-learning systems capable of handling a myriad of tasks. Tuned versions of these systems have been turned into chatbots that can respond to user queries on a vast diversity of topics, providing informative and creative replies. However, their application to physical science research remains limited owing to their incomplete knowledge in these areas, contrasted with the needs of rigor and sourcing in science domains. Here, we demonstrate how existing methods and software tools can be easily combined to yield a domain-specific chatbot. The system ingests scientific documents in existing formats, and uses text embedding lookup to provide the LLM with domain-specific contextual information when composing its reply. We similarly demonstrate that existing image embedding methods can be used for search and retrieval across…
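The text embedding lookup described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and the sample `CHUNKS` are hypothetical names and data introduced only for illustration. It shows the general pattern of embedding document chunks, ranking them against the query by cosine similarity, and placing the best matches in the prompt sent to the LLM.

```python
import math
import re
import zlib
from collections import Counter

DIM = 512  # size of the toy embedding vector (assumption, chosen arbitrarily)


def embed(text):
    """Toy stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for token, count in Counter(tokens).items():
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble an LLM prompt that carries the retrieved excerpts as context."""
    excerpts = retrieve(query, chunks, k)
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(excerpts))
    return (
        "Answer using only the excerpts below and cite them by number.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical document chunks, invented for this example only.
CHUNKS = [
    "Block copolymers self-assemble into ordered nanoscale morphologies.",
    "X-ray scattering measures the nanoscale structure of thin films.",
    "The cafeteria opens at noon on weekdays.",
]

if __name__ == "__main__":
    print(build_prompt("How is thin film structure measured with x-ray scattering?", CHUNKS))
```

In a real system the toy `embed` would be replaced by a learned embedding model and the chunk vectors would be computed once and stored, rather than recomputed for every query as they are here.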

Code & Models

Repositories

cfn-softbio/scibot (Official)

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Artificial Intelligence in Healthcare and Education