Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment
Chengfeng Dou, Ying Zhang, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhengwei Tao

TL;DR
This paper proposes a novel agent-based annotation method using LLMs and Constitutional AI to improve medical dialogue data labeling, reducing expert reliance and enhancing model performance in healthcare applications.
Contribution
The paper introduces an agent-based annotation approach that leverages Constitutional AI and flowcharts, addresses challenges in automated evaluation, and outperforms existing RLAIF methods.
Findings
Agent-based approach outperforms existing RLAIF methods
Framework effectively assesses LLMs in medical dialogue tasks
Flowcharts are particularly effective for expressing physician preferences
Abstract
This research examines the use of Reinforcement Learning from AI Feedback (RLAIF) techniques to improve healthcare dialogue models, with the aim of tackling the challenges of preference-aligned data annotation while reducing the reliance on medical experts. We argue that the primary challenges in current RLAIF research for healthcare are the limitations of automated evaluation methods and the difficulties in accurately representing physician preferences. To address these challenges, we present a new evaluation framework based on standardized patient examinations. This framework is designed to objectively assess the effectiveness of large language models (LLMs) in guiding users and following instructions, enabling a comprehensive comparison across different models. Furthermore, our investigation of effective ways to express physician preferences using Constitutional AI algorithms…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Semantic Web and Ontologies
Methods: Reinforcement Learning from AI Feedback
