Ambiguity in LLMs is a concept missing problem

Zhibo Hu; Chen Wang; Yanfeng Shu; Hye-Young Paik; Liming Zhu

arXiv:2505.11679·cs.CL·October 2, 2025

TL;DR

This paper addresses the challenge of ambiguity in natural language processing by proposing a novel method to detect and handle ambiguous questions in large language models, improving accuracy in structured data mapping tasks.

Contribution

It introduces a new distance measure based on a path kernel over concepts to identify ambiguity in text representations, and proposes a method for improving LLM performance on ambiguous queries.
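To make the idea of a kernel-induced distance over concepts concrete, here is a minimal sketch. It is not the paper's definition: this page does not specify the kernel, so the sketch assumes each text is represented as per-layer sets of concept labels (for example, labels of active sparse-autoencoder features), defines a "path" as a pair of concepts in consecutive layers, and counts shared paths. The function names `concept_paths`, `path_kernel`, `path_distance`, and `missing_concepts`, and the toy concept labels, are all hypothetical.

```python
from itertools import product
from math import sqrt


def concept_paths(layers):
    """Return the set of concept pairs linking consecutive layers.

    `layers` is a list of sets of concept labels, one set per layer
    (an assumed input format, not taken from the paper).
    """
    paths = set()
    for lower, upper in zip(layers, layers[1:]):
        paths.update(product(lower, upper))
    return paths


def path_kernel(x, y):
    """Count the concept paths shared by two texts (a set-intersection kernel)."""
    return len(concept_paths(x) & concept_paths(y))


def path_distance(x, y):
    """Kernel-induced distance: sqrt(k(x,x) + k(y,y) - 2*k(x,y))."""
    squared = path_kernel(x, x) + path_kernel(y, y) - 2 * path_kernel(x, y)
    return sqrt(max(squared, 0))


def missing_concepts(question, interpretation):
    """Concepts present in the interpretation but absent from the question."""
    return set().union(*interpretation) - set().union(*question)


# Toy, hand-written concept sets (not real model activations).
question = [{"gate agent", "pilot"}, {"speaks Spanish"}]
interpretation = [{"gate agent", "pilot"}, {"speaks Spanish", "applies to both"}]

print(path_kernel(question, interpretation))
print(path_distance(question, interpretation))
print(missing_concepts(question, interpretation))
```

Under these assumptions, a question sits at a nonzero distance from an interpretation exactly when their path sets differ, and the set difference of concepts gives a crude view of what the question leaves unstated.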

Findings

1. Achieved state-of-the-art results in ambiguity detection.
2. Developed a new concept-based distance measure for ambiguity identification.
3. Enhanced LLM performance in agentic tool calling through missing concept prediction.

Abstract

Ambiguity in natural language is a significant obstacle to achieving accurate text-to-structured-data mapping with large language models (LLMs), and it affects the performance of tasks such as mapping text to agentic tool calls and text-to-SQL queries. Existing approaches to ambiguity handling either rely on the ReAct framework to obtain correct mappings through trial and error, or on supervised fine-tuning to bias models toward specific tasks. In this paper, we adopt a different approach that characterizes representation differences of ambiguous text in the latent space and leverages these differences to identify ambiguity before mapping the text to structured data. To detect sentence-level ambiguity, we focus on the relationship between ambiguous questions and their interpretations. Unlike distances calculated by dense embeddings, we introduce a new distance measure based on a path…

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. This paper posits that linguistic ambiguity stems from missing conceptual representations within the latent space of large language models (LLMs) and introduces a distance metric to improve interpretability while capturing specific semantic patterns.
2. This paper identifies systematic patterns that effectively differentiate ambiguous questions from unambiguous ones.
3. This paper further proposes a comprehensive framework aimed at enhancing LLM performance in managing ambiguous agentic tool-

Weaknesses

1. The evaluations are conducted solely on the AMBROSIA (text-to-query) and Gorilla (tool-calling) datasets, which raises two primary concerns: (1) this limited scope renders the study somewhat fragile and lacking in coherence, and (2) the absence of broader testing on additional QA tasks restricts the generalizability of the approach.
2. The evaluation is further weakened by the omission of key baseline comparisons. Specifically, methods referenced and critiqued in the Introduction section, such

Reviewer 02 · Rating 2 · Confidence 3

Strengths

1. This paper proposes a method for ambiguity detection and shows that there are benefits both in detecting ambiguity and in augmenting agentic workflows with alternative interpretations of the user query.

Weaknesses

1. Overall, the clarity of the technical writing is weak, making it difficult to understand what was actually done.
   - The following two interpretations have the same flaw to me, which is that they are both ambiguous themselves. How come the first one is considered wrong and the second one is correct? "Show all gate agents and pilots who speak Spanish" and "Show all gate agents and pilots who are Spanish-speaking"
   - L. 176: How was the SAE trained? How is it decoding concepts in natural langua

Reviewer 03 · Rating 2 · Confidence 3

Strengths

- The paper presents an interesting and novel view that ambiguity in LLMs can be interpreted as a missing concept problem, linking ambiguity detection with model interpretability.
- Integrating sparse autoencoders with path kernels is a creative idea that provides a new way to measure semantic differences beyond dense embeddings.
- The proposed method achieves clear performance gains on AMBROSIA and Gorilla benchmarks, showing empirical value beyond conceptual novelty.

Weaknesses

1. Incomplete methodological exposition. Section 3.4 (“Predicting Missing Concepts to Mitigate Ambiguity”) is poorly explained. The paper does not specify how labeled data are obtained or what features are used as input to the concept predictor. The integration between the predictor and retrieval module (“union joint”) is vague, and the role of the path kernel in this stage is unclear. Figure 4 is also oversimplified, leaving the data flow between modules undefined and the overall pipeline difficu

Reviewer 04 · Rating 4 · Confidence 4

Strengths

- This work highlights the lack of research on the representational differences of ambiguous text and tackles this underexplored yet important problem, which I also find crucial.
- The case studies presented in Section 3.1 are intuitive and help readers clearly understand the problem that this work aims to address.
- The idea of using representations derived from SAEs for ambiguity detection is relatively novel and valuable, as it connects theoretical probing studies using SAEs with practical sc

Weaknesses

- First of all, I would like to mention that I basically like the overall idea of this work. However, given the current state of the draft, there are several aspects that need improvement to better clarify the paper’s contributions.
- There is room for clarification and refinement in the notations and equations. Some notations appear to be directly adopted from previous works without sufficient caution. For example, in Eq. (2), what does $\mathbf{w}$ represent? I could not find a definition in t

Taxonomy

Methods: ADaptive gradient method with the OPTimal convergence rate