Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study
Martin J. Hetz, Nicolas Carl, Sarah Haggenmüller, Christoph Wies, Maurice Stephan Michel, Frederik Wessels, Titus J. Brinker

TL;DR
This study introduces UroBot, an explainable, urology-specific large language model that integrates current guidelines. It outperforms both existing models and urologists on board exam questions while keeping its answers highly verifiable by clinicians.
Contribution
The paper presents UroBot, a novel, explainable LLM tailored for urology that incorporates the latest guidelines and demonstrates superior performance and verifiability compared to prior models and clinicians.
Findings
UroBot-4o achieved an average RoCA of 88.4%.
UroBot outperformed urologists on board questions.
UroBot exhibited high clinician-verifiability with Fleiss' Kappa of 0.979.
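The agreement statistic in the last finding can be illustrated with a minimal sketch of Fleiss' kappa. The rating layout (each item judged by the same number of raters, with two categories such as "verifiable" and "not verifiable") and the toy counts are assumptions made for illustration; they are not the study's data or its exact rating protocol.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Assumes every item was rated by the same number of raters (>= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    total_ratings = n_items * n_raters

    # Share of all ratings that fell into each category.
    p_cat = [sum(row[j] for row in counts) / total_ratings
             for j in range(n_categories)]

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # All ratings in one category: kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy table: 4 items, 3 raters, 2 categories.
toy_counts = [[3, 0], [3, 0], [0, 3], [2, 1]]
print(fleiss_kappa(toy_counts))
```

Values close to 1 indicate near-perfect agreement among raters beyond what chance would produce, which is how the reported 0.979 is read.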
Abstract
Large Language Models (LLMs) are revolutionizing medical Question-Answering (medQA) through extensive use of medical literature. However, their performance is often hampered by outdated training data and a lack of explainability, which limits clinical applicability. This study aimed to create and assess UroBot, a urology-specialized chatbot, by comparing it with state-of-the-art models and the performance of urologists on urological board questions, ensuring full clinician-verifiability. UroBot was developed using OpenAI's GPT-3.5, GPT-4, and GPT-4o models, employing retrieval-augmented generation (RAG) and the latest 2023 guidelines from the European Association of Urology (EAU). The evaluation included ten runs of 200 European Board of Urology (EBU) In-Service Assessment (ISA) questions, with performance assessed by the mean Rate of Correct Answers (RoCA). UroBot-4o achieved an…
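The evaluation metric described in the abstract, the mean Rate of Correct Answers (RoCA) over repeated runs of the same question set, can be sketched as follows. The function names, the answer-letter representation, and the toy data are hypothetical; the source states only that RoCA was averaged over ten runs of 200 questions.

```python
from statistics import mean


def rate_of_correct_answers(predicted, answer_key):
    """Fraction of questions answered correctly in a single run."""
    if len(predicted) != len(answer_key):
        raise ValueError("one predicted answer is required per question")
    correct = sum(p == k for p, k in zip(predicted, answer_key))
    return correct / len(answer_key)


def mean_roca(runs, answer_key):
    """Average the per-run RoCA over all runs of the same question set."""
    return mean(rate_of_correct_answers(run, answer_key) for run in runs)


# Hypothetical toy data: 3 runs over 4 multiple-choice questions
# (the study used ten runs of 200 questions).
answer_key = ["A", "C", "B", "D"]
runs = [
    ["A", "C", "B", "D"],  # 4 of 4 correct
    ["A", "C", "B", "A"],  # 3 of 4 correct
    ["A", "B", "B", "D"],  # 3 of 4 correct
]
print(f"mean RoCA: {mean_roca(runs, answer_key):.3f}")
```

Averaging over repeated runs smooths out the run-to-run variation that comes from non-deterministic model sampling.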
Taxonomy
Topics: Clinical practice guidelines implementation
Methods: Attention Is All You Need · Cosine Annealing · Softmax · Layer Normalization · Weight Decay · Attention Dropout · Linear Layer
