How do Humans and LLMs Process Confusing Code?
Youssef Abdelsalam, Norman Peitek, Anna-Maria Maurer, Mariya Toneva, Sven Apel

TL;DR
This study compares how human programmers and an LLM process clean and confusing code. It finds that spikes in LLM perplexity align with human neurophysiological signals of confusion, suggesting that humans and LLMs are confused by similar code.
Contribution
The paper introduces a method for comparing human and LLM code comprehension that pairs neurophysiological data (EEG-based fixation-related potentials) with LLM perplexity, revealing shared confusion patterns.
Findings
LLM perplexity spikes correlate with EEG-based human confusion signals
Humans and LLMs are similarly confused by certain code regions
Proposed LLM-based method identifies regions of human confusion in code
Abstract
Already today, humans and programming assistants based on large language models (LLMs) collaborate in everyday programming tasks. Clearly, a misalignment between how LLMs and programmers comprehend code can lead to misunderstandings, inefficiencies, low code quality, and bugs. A key question in this space is whether humans and LLMs are confused by the same kind of code. This would not only guide our choices of integrating LLMs in software engineering workflows, but also inform about possible improvements of LLMs. To this end, we conducted an empirical study comparing an LLM to human programmers comprehending clean and confusing code. We operationalized comprehension for the LLM by using LLM perplexity, and for human programmers using neurophysiological responses (in particular, EEG-based fixation-related potentials). We found that LLM perplexity spikes correlate both in terms of…
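To make the idea of a "perplexity spike" concrete, the following is a minimal sketch of how per-token surprisal could be computed and thresholded. It is not the authors' pipeline: the token list, the probabilities, the function names, and the spike rule (mean plus `k` standard deviations) are all assumptions chosen for illustration. In practice the log-probabilities would come from a language model scoring the code snippet.

```python
import math
from statistics import mean, pstdev


def token_surprisals(log_probs):
    """Convert natural-log token probabilities into surprisal in bits."""
    return [-lp / math.log(2) for lp in log_probs]


def perplexity(log_probs):
    """Sequence perplexity: exp of the mean negative log-probability."""
    return math.exp(-sum(log_probs) / len(log_probs))


def find_spikes(surprisals, k=1.5):
    """Return indices of tokens whose surprisal exceeds mean + k * std.

    The threshold rule is an assumption for this sketch, not the
    criterion used in the paper.
    """
    threshold = mean(surprisals) + k * pstdev(surprisals)
    return [i for i, s in enumerate(surprisals) if s > threshold]


# Hypothetical tokens and model probabilities for a confusing expression.
tokens = ["x", "=", "a", "++", "+", "b"]
probs = [0.5, 0.9, 0.4, 0.01, 0.3, 0.8]
log_probs = [math.log(p) for p in probs]

surprisals = token_surprisals(log_probs)
spikes = find_spikes(surprisals)
spike_tokens = [tokens[i] for i in spikes]

print("sequence perplexity:", round(perplexity(log_probs), 3))
print("spike tokens:", spike_tokens)
```

With these made-up numbers, only the token assigned probability 0.01 crosses the threshold. A study like the one described would then ask whether such token positions coincide with the code regions where human EEG signals indicate confusion.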
