Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems

Oier Ijurco; Oier Lopez de Lacalle

arXiv:2604.27850·cs.CL·May 1, 2026

TL;DR

This paper introduces a reasoning-based approach using large language models to improve coreference resolution in task-based dialogue systems, especially in visually grounded environments, by leveraging detailed object metadata and dialogue history.

Contribution

It presents a novel test-time reasoning method with LLMs that enhances cross-domain generalization and outperforms supervised models in coreference resolution for dialogue systems.

Findings

01 LLMs can generate effective step-by-step reasoning for coreference resolution.

02 Test-time reasoning improves accuracy in unseen scenarios and with novel objects.

03 Structured metadata and prompt engineering are key to robustness and generalization.
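
To make the idea of reasoning over structured object metadata and dialogue history concrete, here is a minimal sketch of how a text-only prompt might be assembled and how a model's final answer might be parsed. It is not the authors' implementation: the `SceneObject` class, the field names, the prompt wording, and the `Answer:` convention are all hypothetical, and no model is called.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One candidate referent. Field names are illustrative, not the SIMMC 2.1 schema."""
    object_id: int
    metadata: dict = field(default_factory=dict)


def describe(obj: SceneObject) -> str:
    """Render one object as a single line of text the model can reason over."""
    attrs = ", ".join(f"{k}={v}" for k, v in sorted(obj.metadata.items()))
    return f"[{obj.object_id}] {attrs}"


def build_prompt(history: list[str], objects: list[SceneObject]) -> str:
    """Combine dialogue history and object descriptions into a text-only prompt."""
    lines = ["Dialogue:"]
    lines += history
    lines.append("Objects in the scene:")
    lines += [describe(o) for o in objects]
    # Hypothetical instruction: ask for step-by-step reasoning, then a fixed answer line.
    lines.append(
        "Think step by step about which objects the last user turn refers to, "
        "then finish with 'Answer: <comma-separated object ids>'."
    )
    return "\n".join(lines)


def parse_answer(completion: str, objects: list[SceneObject]) -> list[int]:
    """Extract predicted ids from the last 'Answer:' line, keeping only known ids."""
    matches = re.findall(r"Answer:\s*([\d,\s]*)", completion)
    if not matches:
        return []
    known = {o.object_id for o in objects}
    ids = [int(tok) for tok in re.findall(r"\d+", matches[-1])]
    return [i for i in ids if i in known]
```

Filtering the parsed ids against the objects actually listed in the prompt is one simple guard against a model naming an object that is not in the scene. Whether the paper applies such a filter is not stated here.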

Abstract

Task-based dialogue systems assist users in achieving specific goals, such as executing actions or retrieving information, through natural language interactions. Accurate coreference resolution is essential, as it involves identifying object references within the dialogue - a task that becomes increasingly challenging in visually grounded environments characterized by complex scenes and diverse object metadata. However, coreference resolution in task-based dialogue remains limited by poor generalization across domains and heavy reliance on supervised models that often overfit to dataset-specific artifacts. In this work, we propose a unimodal test-time reasoning approach that enables large language models (LLMs) to reason over detailed object metadata and dialogue history to improve coreference resolution. Empirical results on the SIMMC 2.1 dataset demonstrate that LLMs can generate…
