Small Language Models Offer Significant Potential for Science Community
Jian Zhang

TL;DR
This paper presents a framework that uses small language models (MiniLMs) for efficient, accurate, and cost-effective information retrieval from a large body of geoscience literature, addressing the information-bias and computational-cost limitations of large language models.
Contribution
The paper introduces a novel approach that employs MiniLMs for semantic search and analysis of geoscience literature, demonstrating advantages over large language models in accuracy and efficiency.
Findings
MiniLMs effectively retrieve expert-verified geoscience information
Semantic search with MiniLMs outperforms general-purpose LLM responses
Sentiment and clustering analyses reveal research trends and evolution
Abstract
Recent advancements in natural language processing, particularly with large language models (LLMs), are transforming how scientists engage with the literature. While the adoption of LLMs is increasing, concerns remain regarding potential information biases and computational costs. Rather than relying on LLMs, I developed a framework to evaluate the feasibility of precise, rapid, and cost-effective information retrieval from extensive geoscience literature using freely available small language models (MiniLMs). A curated corpus of approximately 77 million high-quality sentences was constructed from 95 leading peer-reviewed geoscience journals, such as Geophysical Research Letters and Earth and Planetary Science Letters, published between 2000 and 2024. MiniLMs enable a computationally efficient approach for extracting relevant domain-specific information from these corpora through…
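Embedding-based semantic search, in general, encodes each sentence and the query as vectors and ranks sentences by cosine similarity to the query. The sketch below illustrates only that ranking step, assuming sentence embeddings have already been computed. The three-dimensional vectors, the example sentences, and the helper names `cosine` and `semantic_search` are hypothetical stand-ins, not MiniLM output or the author's code; real MiniLM sentence encoders produce vectors with hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings standing in for MiniLM sentence vectors.
# Real encoders output far higher-dimensional vectors; 3 dims keep this readable.
corpus = {
    "Slab dehydration triggers arc magmatism.": [0.9, 0.1, 0.0],
    "Sea-level rise accelerated after 1990.": [0.0, 0.8, 0.3],
    "Mantle wedge melting feeds volcanic arcs.": [0.8, 0.2, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, corpus, top_k=2):
    """Return the top_k sentences ranked by cosine similarity to query_vec."""
    scored = [(cosine(query_vec, vec), sent) for sent, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:top_k]]


# Hypothetical embedding of a query such as "What drives arc volcanism?"
query = [0.85, 0.15, 0.05]
for sentence in semantic_search(query, corpus):
    print(sentence)
```

With these toy vectors, the two subduction-related sentences rank above the sea-level sentence even though they share few words with each other, which is the property that distinguishes semantic search from keyword matching. At the scale of a 77-million-sentence corpus, a brute-force loop like this would typically be replaced by a vectorized or approximate nearest-neighbor index.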
Taxonomy
Topics: Geochemistry and Geologic Mapping · Artificial Intelligence in Healthcare and Education · Computational Physics and Python Applications
