Perplexed: Understanding When Large Language Models are Confused

Nathan Cooper; Torsten Scholak

arXiv:2404.06634 · cs.SE · April 11, 2024 · 1 citation

Open Access

TL;DR

This paper introduces 'perplexed', a library for analyzing where large language models are confused, demonstrated through a case study on code generation models to identify their strengths and weaknesses.

Contribution

The paper presents a novel library and analysis framework for understanding LLM confusion, applied specifically to code generation models, with open-sourced tools for the research community.

Findings

01 Models perform worse on syntactically incorrect code.

02 Internal method invocation predictions are less accurate than external ones.

03 Tools enable detailed analysis of LLMs' success and failure cases.

Abstract

Large Language Models (LLMs) have become dominant in the Natural Language Processing (NLP) field, causing a huge surge in progress in a short amount of time. However, their limitations are still a mystery and have primarily been explored through tailored datasets that analyze a specific human-level skill such as negation or name resolution. In this paper, we introduce perplexed, a library for exploring where a particular language model is perplexed. To show the flexibility of perplexed and the types of insights it can provide, we conducted a case study focused on LLMs for code generation, using an additional tool we built to help with the analysis of code models, called codetokenizer. Specifically, we explore success and failure cases at the token level of code LLMs under different scenarios pertaining to the type of coding structure the model is predicting, e.g., a variable name or…
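The token-level analysis described in the abstract can be illustrated with a minimal sketch. It is not the API of perplexed or codetokenizer: the function name `perplexity_by_category`, the category labels, and the probability values are all hypothetical, chosen only to show how per-token losses might be grouped by the kind of code structure being predicted. It assumes a model has already assigned a natural-log probability to each target token and that a separate tool has labeled each token with a category.

```python
import math
from collections import defaultdict


def perplexity_by_category(token_log_probs, token_categories):
    """Group per-token log probabilities by category.

    Returns {category: (mean_cross_entropy, perplexity, token_count)}.
    Log probabilities are assumed to be natural logs.
    """
    if len(token_log_probs) != len(token_categories):
        raise ValueError("need exactly one category label per token")

    # Collect the negative log-likelihood (loss) of each token per category.
    losses = defaultdict(list)
    for log_p, category in zip(token_log_probs, token_categories):
        losses[category].append(-log_p)

    report = {}
    for category, values in losses.items():
        mean_loss = sum(values) / len(values)
        # Perplexity is the exponential of the mean cross-entropy.
        report[category] = (mean_loss, math.exp(mean_loss), len(values))
    return report


# Hypothetical values: probabilities a model assigned to four target tokens,
# plus an assumed label for the structure each token belongs to.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.25), math.log(0.8)]
categories = ["keyword", "variable_name", "variable_name", "keyword"]

report = perplexity_by_category(log_probs, categories)
for category, (loss, ppl, count) in sorted(report.items()):
    print(f"{category}: loss={loss:.3f} perplexity={ppl:.3f} tokens={count}")
```

A higher perplexity for one category than another, on real model outputs, would indicate the kind of token where the model is more often "perplexed"; the numbers here are made up and carry no empirical meaning.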

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research