A Mixed-Methods Evaluation of LLM-Based Chatbots for Menopause
Roshini Deva, Manvi S, Jasmine Zhou, Elizabeth Britton Chahine, Agena Davenport-Nicholson, Nadi Nina Kaonga, Selen Bozkurt, and Azra Ismail

TL;DR
This study evaluates the performance of LLM-based chatbots in addressing menopause-related health questions, highlighting their potential and limitations, and advocates for specialized evaluation frameworks to ensure safety and reliability in healthcare.
Contribution
The study provides a comprehensive mixed-methods assessment of LLM chatbots for menopause and emphasizes the need for tailored, ethically grounded evaluation methods in healthcare applications.
Findings
Traditional metrics have limitations for sensitive health topics
LLMs show promise but also significant limitations in accuracy and safety
Customized evaluation frameworks are necessary for healthcare use
Abstract
The integration of Large Language Models (LLMs) into healthcare settings has gained significant attention, particularly for question-answering tasks. Given the high-stakes nature of healthcare, it is essential to ensure that LLM-generated content is accurate and reliable to prevent adverse outcomes. However, the development of robust evaluation metrics and methodologies remains a matter of much debate. We examine the performance of publicly available LLM-based chatbots for menopause-related queries, using a mixed-methods approach to evaluate safety, consensus, objectivity, reproducibility, and explainability. Our findings highlight the promise and limitations of traditional evaluation metrics for sensitive health topics. We argue for customized, ethically grounded frameworks for evaluating LLMs in order to advance their safe and effective use in healthcare.
Taxonomy
Topics: Mobile Health and mHealth Applications
