AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

Nikolas Karafyllis; Maria Lymperaiou; Giorgos Filandrianos; Athanasios Voulodimos; Giorgos Stamou

arXiv:2603.04319·cs.CL·March 5, 2026

Open Access

TL;DR

This paper introduces a three-stage system that combines graph-based retrieval, LLM reasoning with prompts optimized through reflective prompt evolution, and post-hoc consistency enforcement. The system ranks first on the SemEval 2026 Task 12 evaluation-phase leaderboard with an accuracy of 0.95.

Contribution

It presents a novel multi-stage approach integrating graph-based retrieval and reflective prompt evolution for improved abductive reasoning performance.

Findings

01

Achieved an accuracy of 0.95, ranking first on the evaluation-phase leaderboard.

02

Identified three inductive biases shared across 14 models from 7 families: causal chain incompleteness, proximate cause preference, and salience bias.

03

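The biases converge across model families (51% cause-count reduction), indicating systematic rather than model-specific failure modes in multi-label causal reasoning.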

Abstract

We present a winning three-stage system for SemEval 2026 Task 12: Abductive Event Reasoning that combines graph-based retrieval, LLM-driven abductive reasoning with prompt design optimized through reflective prompt evolution, and post-hoc consistency enforcement; our system ranks first on the evaluation-phase leaderboard with an accuracy score of 0.95. Cross-model error analysis across 14 models (7 families) reveals three shared inductive biases: causal chain incompleteness, proximate cause preference, and salience bias, whose cross-family convergence (51% cause-count reduction) indicates systematic rather than model-specific failure modes in multi-label causal reasoning.
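
To make the three-stage shape concrete, the following is a minimal sketch of how such a pipeline could be wired together. It is not the authors' implementation: the abstract does not specify the graph structure, the prompts, or the consistency rules, so everything here is an assumption chosen for illustration. In particular, `retrieve` uses a plain breadth-first expansion, `reason` wraps a hypothetical `llm` callable, and `enforce_consistency` applies two made-up rules (identical option texts share one decision, and an exclusive "none" option cannot co-occur with other selections).

```python
from collections import deque


def retrieve(graph, seeds, max_hops=2):
    """Stage 1 (assumed form): breadth-first expansion from seed nodes.

    `graph` is a hypothetical adjacency dict mapping node -> list of neighbours.
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


def reason(event, options, context, llm):
    """Stage 2: ask a model callable which options are plausible causes.

    `llm` is a stand-in for any prompted model; it returns option labels.
    """
    return set(llm(event, options, context))


def enforce_consistency(options, selected, none_label=None):
    """Stage 3 (assumed rules, not taken from the paper)."""
    # Rule 1: options with identical text must receive the same decision.
    chosen_texts = {options[label] for label in selected}
    fixed = {label for label, text in options.items() if text in chosen_texts}
    # Rule 2: an exclusive "none" option cannot co-occur with other causes.
    if none_label in fixed and len(fixed) > 1:
        fixed.discard(none_label)
    return fixed


def run_pipeline(event, options, graph, llm, none_label=None):
    """Chain the three stages for a single multi-label question."""
    context = retrieve(graph, [event])
    selected = reason(event, options, context, llm)
    return enforce_consistency(options, selected, none_label)


# Toy usage with a fake model that always answers A and D.
toy_graph = {"event": ["doc1", "doc2"], "doc1": ["doc3"], "doc3": ["doc4"]}
toy_options = {
    "A": "storm hit",
    "B": "power cut",
    "C": "storm hit",
    "D": "none of the above",
}
toy_llm = lambda event, options, context: ["A", "D"]
answer = run_pipeline("event", toy_options, toy_graph, toy_llm, none_label="D")
```

In this toy run, the fake model picks A and D; the consistency step adds C because it duplicates A's text, then drops D because a "none" answer contradicts the other selections. A real system would replace each stage with the components described in the paper.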

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications