To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering
Zaifu Zhan, Min Zeng, Shuang Zhou, Yiran Song, Xiaoyi Chen, Yu Hou, Yifan Wu, Yang Ruan, Rui Zhang

TL;DR
This paper introduces Selective Chain-of-Thought, an inference strategy for medical question answering with large language models that reduces reasoning overhead by selectively generating rationales, maintaining accuracy while improving efficiency.
Contribution
It proposes a novel Selective CoT method that predicts when reasoning is needed, significantly reducing inference time and token usage without sacrificing accuracy.
Findings
Reduced inference time by up to 45%
Lowered token usage by up to 47%
Maintained or improved accuracy in some cases
Abstract
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy. Methods: We propose Selective Chain-of-Thought (Selective CoT), an inference-time strategy that first predicts whether a question requires reasoning and generates a rationale only when needed. Two open-source LLMs (Llama-3.1-8B and Qwen-2.5-7B) were evaluated on four biomedical QA benchmarks-HeadQA, MedQA-USMLE, MedMCQA, and PubMedQA. Metrics included accuracy, total generated tokens, and inference time. Results: Selective CoT reduced inference time by 13-45% and token usage by 8-47% with minimal accuracy loss (4\%). In some model-task pairs, it achieved both higher accuracy and greater efficiency than standard CoT. Compared with fixed-length CoT, Selective CoT reached similar or superior accuracy at…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
