Evaluating Search Engines and Large Language Models for Answering Health Questions

Marcos Fernández-Pichel; Juan C. Pichel; David E. Losada

arXiv:2407.12468·cs.IR·March 7, 2025·3 cites

TL;DR

This study compares search engines, large language models, and retrieval-augmented methods on health questions. LLMs answer more questions correctly than search engines, and adding retrieved evidence helps most for smaller LLMs.

Contribution

The paper provides a comprehensive comparison of search engines (SEs), LLMs, and retrieval-augmented generation (RAG) methods for health question answering, highlighting the effectiveness of retrieval-augmented LLMs.

Findings

01. SEs answer 50-70% of questions correctly

02. LLMs answer about 80% correctly

03. RAG improves small LLMs' accuracy by up to 30%

Abstract

Search engines (SEs) have traditionally been primary tools for information seeking, but the new Large Language Models (LLMs) are emerging as powerful alternatives, particularly for question-answering tasks. This study compares the performance of four popular SEs, seven LLMs, and retrieval-augmented (RAG) variants in answering 150 health-related questions from the TREC Health Misinformation (HM) Track. Results reveal SEs correctly answer between 50 and 70% of questions, often hindered by many retrieval results not responding to the health question. LLMs deliver higher accuracy, correctly answering about 80% of questions, though their performance is sensitive to input prompts. RAG methods significantly enhance smaller LLMs' effectiveness, improving accuracy by up to 30% by integrating retrieval evidence.
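The accuracy figures above can be read as the fraction of questions answered correctly. The following is a minimal sketch of that computation, assuming each question has a binary yes/no gold label and each system returns one yes/no answer per question. The function name `accuracy`, the question IDs, and the toy data are hypothetical and not taken from the paper or its repository; unanswered questions are counted as incorrect here, which is also an assumption.

```python
from typing import Dict


def accuracy(predictions: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold-labelled questions whose predicted answer matches.

    Answers are compared case-insensitively after stripping whitespace.
    A question with no prediction counts as incorrect (an assumption).
    """
    if not gold:
        raise ValueError("gold labels must not be empty")
    correct = sum(
        1
        for qid, label in gold.items()
        if predictions.get(qid, "").strip().lower() == label.strip().lower()
    )
    return correct / len(gold)


# Hypothetical toy data for illustration only.
gold = {"q1": "yes", "q2": "no", "q3": "no", "q4": "yes"}
system_a = {"q1": "yes", "q2": "yes", "q3": "no", "q4": "no"}  # 2 of 4 correct
system_b = {"q1": "Yes", "q2": "no", "q3": "no"}  # q4 unanswered, 3 of 4 correct

print(accuracy(system_a, gold))
print(accuracy(system_b, gold))
```

Scoring every system against the same gold labels in this way is what makes percentages such as those reported for SEs and LLMs directly comparable.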

Code & Models

Repositories

marcosfp97/llm-binary-health-qa (Official)

Taxonomy

Topics: Advanced Text Analysis Techniques · Expert finding and Q&A systems · Topic Modeling

Methods: Attention Is All You Need · Attention Dropout · Linear Warmup With Linear Decay · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Linear Layer