ChatGPT Inaccuracy Mitigation during Technical Report Understanding: Are We There Yet?
Salma Begum Tamanna, Gias Uddin, Song Wang, Lan Xia, Longyu Zhang

TL;DR
This paper investigates ChatGPT's inaccuracies in understanding technical reports, introduces CHIME to improve response correctness by preprocessing and validation, and demonstrates significant accuracy gains and user-perceived usefulness.
Contribution
The paper presents CHIME, a novel framework that preprocesses technical reports and guides ChatGPT to reduce hallucinations in technical contexts.
Findings
ChatGPT with RAG achieves 36.4% correctness on technical Q&A.
CHIME improves response correctness by 30.3% over the ChatGPT baseline.
Users find CHIME-enhanced responses more useful.
Abstract
Hallucinations, the tendency to produce irrelevant or incorrect responses, are a prevalent concern in generative AI-based tools like ChatGPT. Although hallucinations in ChatGPT have been studied for textual responses, it is unknown how ChatGPT hallucinates for technical texts that contain both textual and technical terms. We surveyed 47 software engineers and produced a benchmark of 412 Q&A pairs from the bug reports of two OSS projects. We find that a RAG-based ChatGPT (i.e., ChatGPT tuned with the benchmark issue reports) is 36.4% correct when producing answers to the questions, for two reasons: 1) limitations in understanding complex technical content in code snippets such as stack traces, and 2) limitations in integrating the contexts denoted in the technical terms and texts. We present CHIME (ChatGPT Inaccuracy Mitigation Engine) whose underlying principle is that if we can preprocess the technical…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
