Measuring Representation Robustness in Large Language Models for Geometry

Vedant Jawandhia; Yash Sinha; Murari Mandal; Ankan Pal; Dhruv Kumar

arXiv:2604.16421 · cs.CL · April 21, 2026

TL;DR

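This paper introduces GeoRepEval, a framework for evaluating how robust large language models are to equivalent representations of the same geometry problem. It reports accuracy gaps of up to 14 percentage points between representations, identifies vector formulations as a consistent failure point, and finds that convert-then-solve prompting improves vector accuracy for high-capacity models.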

Contribution

The paper presents a novel representation-aware evaluation framework and metrics, along with empirical findings on LLMs' sensitivity to geometric problem representations.

Findings

1. Accuracy gaps up to 14 percentage points due to representation choice.

2. Vector formulations are a consistent failure point with Invariance@3 as low as 0.044.

3. Convert-then-solve prompting improves vector accuracy significantly for high-capacity models.
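
To make the Invariance@3 figures in the findings concrete, here is a minimal sketch of how such a metric could be computed. It assumes that Invariance@3 is the fraction of problems answered correctly under all three representations (Euclidean, coordinate, vector). That reading matches the summary's description of a metric that splits accuracy into robust and fragile parts, but it is an assumption, not the paper's stated definition. The function names, the `results` layout, and the toy data are all hypothetical.

```python
from typing import Dict, List

# Assumed representation labels; the paper may name them differently.
REPRESENTATIONS = ("euclidean", "coordinate", "vector")

# One dict per problem: was the answer correct under each representation?
Result = Dict[str, bool]


def per_representation_accuracy(results: List[Result]) -> Dict[str, float]:
    """Plain accuracy computed separately for each representation."""
    n = len(results)
    return {rep: sum(r[rep] for r in results) / n for rep in REPRESENTATIONS}


def invariance_at_3(results: List[Result]) -> float:
    """Assumed definition: share of problems solved in all three forms."""
    solved_everywhere = sum(
        all(r[rep] for rep in REPRESENTATIONS) for r in results
    )
    return solved_everywhere / len(results)


def fragile_share(results: List[Result], rep: str) -> float:
    """Share of problems solved in `rep` but not in all three forms."""
    fragile = sum(
        r[rep] and not all(r[other] for other in REPRESENTATIONS)
        for r in results
    )
    return fragile / len(results)


# Toy data (fabricated for illustration only, not from the paper).
results: List[Result] = [
    {"euclidean": True, "coordinate": True, "vector": True},
    {"euclidean": True, "coordinate": True, "vector": False},
    {"euclidean": True, "coordinate": False, "vector": False},
    {"euclidean": False, "coordinate": True, "vector": False},
    {"euclidean": False, "coordinate": False, "vector": False},
]

accuracy = per_representation_accuracy(results)
inv3 = invariance_at_3(results)
for rep in REPRESENTATIONS:
    # accuracy(rep) = robust part (Invariance@3) + fragile part
    print(rep, accuracy[rep], inv3, fragile_share(results, rep))
```

Under this assumed definition, each representation's accuracy is the sum of the invariant share and its own fragile share, so Invariance@3 can never exceed the accuracy of the weakest representation.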

Abstract

Large language models (LLMs) are increasingly evaluated on mathematical reasoning, yet their robustness to equivalent problem representations remains poorly understood. In geometry, identical problems can be expressed in Euclidean, coordinate, or vector forms, but existing benchmarks report accuracy on fixed formats, implicitly assuming representation invariance and masking failures caused by representational changes alone. We propose GeoRepEval, a representation-aware evaluation framework that measures correctness, invariance, and consistency at the problem level across parallel formulations, combining strict answer matching, bootstrap confidence intervals, paired McNemar tests, representation-flip analyses, and regression controls for surface complexity. We prove that our Invariance@3 metric decomposes accuracy into robust and fragile components and is bounded by the weakest…
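
Two of the statistical tools named in the abstract, paired McNemar tests and bootstrap confidence intervals, can be sketched with the Python standard library alone. The version below uses the exact (binomial) form of McNemar's test and a percentile bootstrap over problems. Both are textbook variants chosen for illustration, and the paper's own implementation may differ (for example, a chi-square form with continuity correction). The function names and the toy correctness vectors are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired per-problem correctness."""
    # Discordant pairs: solved under A only (b) or under B only (c).
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def bootstrap_accuracy_ci(
    correct: Sequence[bool],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for accuracy, resampling problems."""
    rng = random.Random(seed)
    n = len(correct)
    stats = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int(round((1 - alpha / 2) * (n_boot - 1)))
    return stats[lo_idx], stats[hi_idx]


# Toy paired outcomes for the same 10 problems (illustrative only).
euclidean = [True] * 8 + [False] * 2
vector = [True] * 3 + [False] * 7

print("McNemar p-value:", mcnemar_exact(euclidean, vector))
print("Vector accuracy CI:", bootstrap_accuracy_ci(vector))
```

Pairing matters here: because the same problems appear under every representation, a paired test only looks at problems where the two representations disagree, which is the quantity a representation-flip analysis is concerned with.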

Code & Models

Repositories

vedjaw/GeoRepEval (GitHub)
