Evaluating Chat Generative Pretrained Transformer (GPT-4o) Problem-Solving Performance in the Japan Certificate Examination for Biomedical Engineering Class 1
Kai Ishida

TL;DR
This study tested ChatGPT's ability to solve biomedical engineering exam questions and found it performed below human-level accuracy, especially in problem-solving and understanding complex concepts.
Contribution
The novel contribution is evaluating ChatGPT's performance on a specific biomedical engineering certification exam, revealing its limitations in accuracy and knowledge depth.
Findings
ChatGPT scored below 70% on fundamental and applied knowledge questions.
Over 80% of incorrect answers were due to lack of knowledge or hallucinations.
Performance met the passing criteria in only one of the three exams tested.
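The per-category accuracy figures summarized above can be tallied with a short script. The following is a minimal sketch, not the author's method: the record format, the function names `accuracy_by_category` and `meets_threshold`, and the sample data are all hypothetical, and the passing threshold is left as a parameter because the summary does not state the exam's actual criteria.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return the fraction of correct answers per question category.

    `records` is an iterable of (category, is_correct) pairs; this layout
    is an assumption made for illustration.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in records:
        totals[category] += 1
        if is_correct:
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


def meets_threshold(records, threshold):
    """Check whether overall accuracy reaches `threshold` (a value in [0, 1]).

    The threshold is supplied by the caller; no real exam criterion is assumed.
    """
    records = list(records)
    if not records:
        raise ValueError("no graded records supplied")
    overall = sum(1 for _, is_correct in records if is_correct) / len(records)
    return overall >= threshold


# Made-up graded answers, for illustration only (not data from the study).
sample = [
    ("fundamental", True),
    ("fundamental", False),
    ("applied", True),
    ("problem-solving", False),
]

per_category = accuracy_by_category(sample)
passed = meets_threshold(sample, threshold=0.5)
```

Running the same two functions once per exam sitting would give a per-exam pass or fail result alongside the category breakdown.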
Abstract
Introduction: Chat generative pretrained transformer (ChatGPT; OpenAI, San Francisco, CA) has developed rapidly and is used in various fields, including medical engineering. Japan’s Certificate Examination for Biomedical Engineering class 1 (CEBM1) assesses comprehensive specialized knowledge and skills centered on the maintenance and safety management of medical devices, systems, and related equipment. This study evaluated the performance of ChatGPT (GPT-4o) on the CEBM1 against human-level expectations.
Methods: We targeted 171 questions from the 26th to 28th CEBM1s, covering fundamental knowledge, applied knowledge, and problem-solving ability. We entered the Japanese version of each question into ChatGPT (GPT-4o) and evaluated performance based on question difficulty. No prompt optimization was used. We compared the responses provided by…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Biomedical and Engineering Education · AI in Service Interactions
