Perplexed: Understanding When Large Language Models are Confused
Nathan Cooper, Torsten Scholak

TL;DR
This paper introduces 'perplexed', a library for analyzing where large language models are confused, demonstrated through a case study on code generation models to identify their strengths and weaknesses.
Contribution
The paper presents a novel library and analysis framework for understanding LLM confusion, applied specifically to code generation models, with open-sourced tools for the research community.
Findings
Models perform worse on syntactically incorrect code.
Internal method invocation predictions are less accurate than external ones.
Tools enable detailed analysis of LLMs' success and failure cases.
Abstract
Large Language Models (LLMs) have become dominant in the Natural Language Processing (NLP) field, causing a huge surge in progress in a short amount of time. However, their limitations are still a mystery and have primarily been explored through tailored datasets, each analyzing a specific human-level skill such as negation or name resolution. In this paper, we introduce perplexed, a library for exploring where a particular language model is perplexed. To show the flexibility of perplexed and the types of insights it can provide, we conducted a case study focused on LLMs for code generation, using an additional tool we built to help with the analysis of code models, called codetokenizer. Specifically, we explore success and failure cases at the token level of code LLMs under different scenarios pertaining to the type of coding structure the model is predicting, e.g., a variable name or…
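To make the token-level analysis in the abstract concrete, here is a minimal sketch in plain Python. It is not the API of perplexed or codetokenizer; the function names, the category labels, and the example probabilities are all hypothetical. It assumes we already have, for each token, the probability the model assigned to the correct token and a label for the kind of code structure that token belongs to.

```python
import math
from collections import defaultdict


def token_losses(probs):
    """Per-token cross-entropy: negative log-probability of the true token."""
    return [-math.log(p) for p in probs]


def perplexity(probs):
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    losses = token_losses(probs)
    return math.exp(sum(losses) / len(losses))


def mean_loss_by_category(probs, categories):
    """Average the per-token loss within each (hypothetical) token category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, category in zip(token_losses(probs), categories):
        sums[category] += loss
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Made-up probabilities and labels, for illustration only.
example_probs = [0.5, 0.25, 0.5]
example_categories = ["variable_name", "method_call", "variable_name"]
by_category = mean_loss_by_category(example_probs, example_categories)
```

A higher mean loss for a category indicates that the model is, on average, more "perplexed" by tokens of that kind, which is the sort of comparison the case study describes.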
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
