NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers
Angel Yahir Loredo Lopez, Tyler McDonald, and Ali Emami

TL;DR
NYT-Connections is a new benchmark of simple word puzzles designed to challenge LLMs' reasoning skills beyond quick intuition, revealing significant performance gaps compared to humans and highlighting the limits of advanced prompting techniques.
Contribution
The paper introduces NYT-Connections, a novel reasoning benchmark that isolates fundamental skills and evaluates LLMs against humans across multiple configurations.
Findings
Even top-performing LLMs such as GPT-4 trail human performance by nearly 30%
Advanced prompting techniques such as Chain-of-Thought and Self-Consistency show diminishing returns as task difficulty increases
The benchmark resists intuitive shortcuts and is regularly updated to prevent data leakage
Abstract
Large Language Models (LLMs) have shown impressive performance on various benchmarks, yet their ability to engage in deliberate reasoning remains questionable. We present NYT-Connections, a collection of 358 simple word classification puzzles derived from the New York Times Connections game. This benchmark is designed to penalize quick, intuitive "System 1" thinking, isolating fundamental reasoning skills. We evaluated six recent LLMs, a simple machine learning heuristic, and humans across three configurations: single-attempt, multiple attempts without hints, and multiple attempts with contextual hints. Our findings reveal a significant performance gap: even top-performing LLMs like GPT-4 fall short of human performance by nearly 30%. Notably, advanced prompting techniques such as Chain-of-Thought and Self-Consistency show diminishing returns as task difficulty increases.…
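To make the three evaluation configurations concrete, here is a minimal Python sketch of how a Connections-style puzzle could be scored. It is not the authors' evaluation code. The toy puzzle, the names `judge` and `play`, the `max_mistakes` budget, and the "one away" message used as a stand-in for a contextual hint are all assumptions made for illustration; the abstract does not specify what form the hints take or how many attempts are allowed.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# A puzzle maps each category name to its four words.
Puzzle = Dict[str, FrozenSet[str]]

# Invented toy puzzle for illustration only; not taken from the benchmark.
TOY_PUZZLE: Puzzle = {
    "FRUIT": frozenset({"APPLE", "PEAR", "PLUM", "FIG"}),
    "PLANETS": frozenset({"MARS", "VENUS", "SATURN", "EARTH"}),
    "COLORS": frozenset({"RED", "GREEN", "BLUE", "PINK"}),
    "TREES": frozenset({"OAK", "ELM", "PINE", "ASH"}),
}


def judge(puzzle: Puzzle, guess: Iterable[str]) -> str:
    """Classify one four-word guess as 'correct', 'one_away', or 'wrong'."""
    guess_set = frozenset(word.upper() for word in guess)
    if len(guess_set) != 4:
        return "wrong"  # a valid guess names exactly four distinct words
    best_overlap = max(len(guess_set & words) for words in puzzle.values())
    if best_overlap == 4:
        return "correct"
    if best_overlap == 3:
        return "one_away"
    return "wrong"


def play(
    puzzle: Puzzle,
    guesses: Iterable[List[str]],
    max_mistakes: int = 4,   # assumed mistake budget
    hints: bool = False,     # assumed: reveal 'one_away' feedback when True
) -> Tuple[int, List[str]]:
    """Replay a sequence of guesses; return (groups solved, feedback shown)."""
    solved = set()
    mistakes = 0
    feedback: List[str] = []
    for guess in guesses:
        if mistakes >= max_mistakes or len(solved) == len(puzzle):
            break
        verdict = judge(puzzle, guess)
        if verdict == "correct":
            solved.add(frozenset(word.upper() for word in guess))
            feedback.append("correct")
        else:
            mistakes += 1
            # Without hints, a near miss looks the same as any other miss.
            feedback.append(verdict if hints else "wrong")
    return len(solved), feedback
```

Under these assumptions, `max_mistakes=1` approximates the single-attempt setting, while `hints=False` and `hints=True` approximate the multiple-attempt settings without and with hints. For example, `play(TOY_PUZZLE, [["apple", "pear", "plum", "oak"], ["apple", "pear", "plum", "fig"]], hints=True)` records a near miss followed by a solved group.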
Taxonomy
Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Topic Modeling
Methods: Attention Is All You Need · Adam · Position-Wise Feed-Forward Layer · Linear Layer · Softmax · Multi-Head Attention · Byte Pair Encoding · Label Smoothing · Dropout · Dense Connections
