From Prompt Optimization to Multi-Dimensional Credibility Evaluation: Enhancing Trustworthiness of Chinese LLM-Generated Liver MRI Reports

Qiuli Wang; Jie Chen; Yongxu Liu; Xingpeng Zhang; Xiaoming Li; Wei Chen

arXiv:2510.23008 · cs.AI · October 29, 2025

TL;DR

This paper introduces a Multi-Dimensional Credibility Assessment (MDCA) framework for evaluating the trustworthiness of Chinese LLM-generated liver MRI reports, and provides guidance on institution-specific prompt optimization to improve that trustworthiness.

Contribution

It presents a novel framework for assessing the credibility of LLM-generated radiology reports and offers guidance on institution-specific prompt optimization in clinical radiology settings.

Findings

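01. The MDCA framework effectively evaluates the quality of reports generated by multiple LLMs.

02. Institution-specific prompt optimization improves the trustworthiness of the generated reports.

03. The study applies the proposed assessment method to compare several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, and DeepSeek-V3.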

Abstract

Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on how to optimize prompt design across different clinical contexts remains underexplored. Moreover, a comprehensive and standardized framework for assessing the trustworthiness of LLM-generated radiology reports is yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and…
