SAIL: Search-Augmented Instruction Learning
Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

TL;DR
SAIL enhances large language models by integrating search engine results into instruction tuning, improving transparency, factual accuracy, and the ability to filter relevant information for better task performance.
Contribution
This work introduces a novel search-augmented instruction learning framework that grounds language models on search results, improving transparency and factual correctness.
Findings
SAIL-7B outperforms baseline models on open-ended QA tasks
The model effectively filters and reasons over search results
SAIL improves transparency and factual accuracy in language generation
Abstract
Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information. In this work, we propose search-augmented instruction learning (SAIL), which grounds the language generation and instruction following abilities on complex search results generated by in-house and external search engines. With an instruction tuning corpus, we collect search results for each training case from different search APIs and domains, and construct a new search-grounded training set containing (instruction, grounding information, response) triplets. We then fine-tune the LLaMA-7B model on the constructed training set. Since the collected results contain unrelated and disputing languages, the model needs to learn to ground on trustworthy search results, filter out distracting passages, and…
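The (instruction, grounding information, response) triplets described in the abstract can be pictured with a minimal Python sketch. The class name, field names, and prompt layout below are illustrative assumptions, not the format used by the authors; the sketch only shows how retrieved passages, including irrelevant ones, might be placed in front of an instruction so that the model is trained to produce the response while deciding which passages to rely on.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchGroundedExample:
    """One training case: (instruction, grounding information, response).

    Hypothetical container; the field names are assumptions for illustration.
    """
    instruction: str
    grounding: List[str]  # retrieved passages, possibly noisy or off-topic
    response: str


def build_prompt(example: SearchGroundedExample) -> str:
    """Place numbered search results before the instruction (assumed layout)."""
    lines = [
        f"Search result ({i}): {passage}"
        for i, passage in enumerate(example.grounding, start=1)
    ]
    lines.append(f"Instruction: {example.instruction}")
    lines.append("Response:")
    return "\n".join(lines)


def to_training_pair(example: SearchGroundedExample) -> Dict[str, str]:
    """Turn a triplet into a prompt/completion pair for supervised fine-tuning."""
    return {
        "prompt": build_prompt(example),
        "completion": " " + example.response,
    }


if __name__ == "__main__":
    example = SearchGroundedExample(
        instruction="Name the largest planet in the Solar System.",
        grounding=[
            "Jupiter is the largest planet in the Solar System.",
            "Planet Fitness is a chain of fitness centers.",  # distracting passage
        ],
        response="Jupiter",
    )
    pair = to_training_pair(example)
    print(pair["prompt"])
    print(pair["completion"])
```

Keeping the distracting passage in the prompt, rather than filtering it out beforehand, reflects the abstract's point that the model itself has to learn which search results are trustworthy.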
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
