Risk-graded Safety for Handling Medical Queries in Conversational AI
Gavin Abercrombie, Verena Rieser

TL;DR
This paper introduces a dataset and analysis for assessing the seriousness of medical queries posed to conversational AI and the risk levels of system responses, emphasizing that accurate seriousness detection is needed to prevent unsafe responses.
Contribution
It presents a new corpus of medical queries with risk annotations and evaluates automated classification methods for identifying query seriousness and response risk levels.
Findings
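Aggregated crowdsourced labels tend to agree with expert opinion on identifying medical queries and recognising the risk types of responses, although individual crowdworkers may be unreliable at grading query seriousness.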
Automated classification can identify risk levels but requires caution due to potential errors.
Errors in risk assessment can have serious consequences, highlighting the need for careful system design.
Abstract
Conversational AI systems can engage in unsafe behaviour when handling users' medical queries, and this behaviour can have severe consequences and could even lead to deaths. Systems therefore need to be capable of both recognising the seriousness of medical inputs and producing responses with appropriate levels of risk. We create a corpus of human-written English-language medical queries and the responses of different types of systems. We label these with both crowdsourced and expert annotations. While individual crowdworkers may be unreliable at grading the seriousness of the prompts, their aggregated labels tend to agree with professional opinion to a greater extent on identifying the medical queries and recognising the risk types posed by the responses. Results of classification experiments suggest that, while these tasks can be automated, caution should be exercised, as errors can potentially be…
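The abstract reports that aggregated crowd labels agree with professional opinion, but this summary does not say how labels were aggregated or how agreement was measured. The sketch below is therefore illustrative only and rests on assumptions not taken from the paper: majority voting with ties broken towards the more serious label, Cohen's kappa as the agreement statistic, a hypothetical three-level seriousness scale (`LABELS`), and made-up toy annotations.

```python
from collections import Counter

# Hypothetical seriousness scale, ordered from least to most serious.
# The paper's actual label set is not given in this summary.
LABELS = ("not_serious", "serious", "critical")


def majority_vote(labels):
    """Return the most frequent label for one item.

    Ties are broken towards the more serious label, an assumed
    conservative choice for safety-critical annotation.
    """
    counts = Counter(labels)
    best = max(counts.values())
    tied = [lab for lab in counts if counts[lab] == best]
    return max(tied, key=LABELS.index)


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length sequences of labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)


# Toy data: three hypothetical crowdworkers per query, one expert label.
crowd = [
    ["serious", "serious", "not_serious"],
    ["critical", "serious", "critical"],
    ["not_serious", "not_serious", "serious"],
    ["serious", "critical", "not_serious"],  # three-way tie
]
expert = ["serious", "critical", "not_serious", "serious"]

aggregated = [majority_vote(item) for item in crowd]
kappa = cohen_kappa(aggregated, expert)
print(aggregated)
print(round(kappa, 3))
```

On this toy data the three-way tie in the last item resolves to `critical` under the conservative tie-break, which is one reason aggregated labels can disagree with an expert even when most items match.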
Taxonomy
Topics: Topic Modeling · AI in Service Interactions · Artificial Intelligence in Healthcare and Education
