The Human Factor in Detecting Errors of Large Language Models: A Systematic Literature Review and Future Research Directions
Christian A. Schiller

TL;DR
This paper systematically reviews how human factors influence users' ability to detect errors in the outputs of Large Language Models such as ChatGPT, emphasizing the importance of human oversight in ensuring accuracy in critical applications.
Contribution
It provides a comprehensive synthesis of existing research on human error detection in LLM outputs and outlines future research directions in this domain.
Findings
Human factors significantly impact error detection in LLM outputs
Training and deployment strategies can enhance user error detection capabilities
The review identifies gaps and future research needs in human-LLM interaction
Abstract
The launch of ChatGPT by OpenAI in November 2022 marked a pivotal moment for Artificial Intelligence, introducing Large Language Models (LLMs) to the mainstream and setting new records in user adoption. LLMs, particularly ChatGPT, trained on extensive internet data, demonstrate remarkable conversational capabilities across various domains, suggesting a significant impact on the workforce. However, these models are susceptible to errors, namely "hallucinations" and omissions, generating incorrect or incomplete information. This poses risks especially in contexts where accuracy is crucial, such as legal compliance, medicine or fine-grained process frameworks. There are both technical and human solutions to cope with this issue. This paper explores the human factors that enable users to detect errors in LLM outputs, a critical component in mitigating risks associated with their use in…
Taxonomy
Topics: Topic Modeling
