Recognizing Limits: Investigating Infeasibility in Large Language Models

Wenbo Zhang; Zihang Xu; Hengrui Cai

arXiv:2408.05873 · cs.CL · August 27, 2025

TL;DR

This paper investigates whether large language models can recognize and refuse tasks that are beyond their capabilities. It proposes a benchmark dataset of feasible and infeasible tasks and explores fine-tuning as a way to improve the refusal of infeasible ones.

Contribution

The paper introduces a categorization of infeasible tasks into four main categories, a benchmark dataset for evaluating how well LLMs decline such tasks, and fine-tuning experiments that the authors report improve refusal capabilities.

Findings

01. Fine-tuned models better recognize infeasible tasks.

02. Benchmark dataset effectively evaluates refusal abilities.

03. Categorization aids in understanding LLM limitations.

Abstract

Large language models (LLMs) have shown remarkable performance in various tasks but often fail to handle queries that exceed their knowledge and capabilities, leading to incorrect or fabricated responses. This paper addresses the need for LLMs to recognize and refuse infeasible tasks, that is, requests that surpass their capabilities. We conceptualize four main categories of infeasible tasks for LLMs, which cover a broad spectrum of hallucination-related challenges identified in prior literature. We develop and benchmark a new dataset comprising diverse infeasible and feasible tasks to evaluate multiple LLMs' abilities to decline infeasible tasks. Furthermore, we explore the potential of increasing LLMs' refusal capabilities with fine-tuning. Our experiments validate the effectiveness of the trained models, suggesting promising directions for improving the performance of LLMs in…
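As a rough illustration of what evaluating refusal on a mix of feasible and infeasible tasks might involve, the sketch below computes two rates from labeled responses: how often infeasible tasks are declined and how often feasible tasks are declined by mistake. This is not the paper's evaluation code. The `Record` fields, the metric names, and the toy labels are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    feasible: bool  # ground-truth label (hypothetical field name)
    refused: bool   # whether the model declined the task (hypothetical field name)


def refusal_metrics(records):
    """Return refusal rates on infeasible and feasible tasks separately."""
    infeasible = [r for r in records if not r.feasible]
    feasible = [r for r in records if r.feasible]

    def rate(group):
        # Fraction of the group that was refused; 0.0 for an empty group.
        return sum(r.refused for r in group) / len(group) if group else 0.0

    return {
        "infeasible_refusal_rate": rate(infeasible),  # higher is better
        "feasible_refusal_rate": rate(feasible),      # lower is better
    }


# Toy labels invented for illustration; not data from the paper.
toy = [
    Record(feasible=False, refused=True),
    Record(feasible=False, refused=True),
    Record(feasible=False, refused=True),
    Record(feasible=False, refused=False),
    Record(feasible=True, refused=False),
    Record(feasible=True, refused=False),
    Record(feasible=True, refused=False),
    Record(feasible=True, refused=True),
]
metrics = refusal_metrics(toy)
print(metrics)
```

Reporting the two rates separately matters because a model that refuses everything would score perfectly on infeasible tasks while being useless on feasible ones.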

Code & Models

Repositories

zihang-xu-2002/infeasible-benchmark (Official)

Taxonomy

Topics: Natural Language Processing Techniques