LLMSR@XLLM25: An Empirical Study of LLM for Structural Reasoning
Xinye Li, Mingqi Wan, Dianbo Sui

TL;DR
This paper evaluates how well Meta-Llama-3-8B-Instruct can produce fine-grained, controllable, and interpretable structural reasoning without fine-tuning, achieving competitive results in the LLMSR@XLLM25 shared task.
Contribution
The study demonstrates that a simple few-shot prompt with minimal post-processing enables effective structural reasoning with LLMs without additional training or complex pipelines.
Findings
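Achieves macro F1 scores on par with substantially more complex and resource-consuming pipelines.
Ranks 5th overall in the shared task without fine-tuning, external retrieval, or ensembling.
The method is lightweight and resource-efficient: one off-the-shelf model, a few-shot multi-turn prompt, and a regular-expression post-processor.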
Abstract
We present Team asdfo123's submission to the LLMSR@XLLM25 shared task, which evaluates large language models on producing fine-grained, controllable, and interpretable reasoning processes. Systems must extract all problem conditions, decompose a chain of thought into statement-evidence pairs, and verify the logical validity of each pair. Leveraging only the off-the-shelf Meta-Llama-3-8B-Instruct, we craft a concise few-shot, multi-turn prompt that first enumerates all conditions and then guides the model to label, cite, and adjudicate every reasoning step. A lightweight post-processor based on regular expressions normalises spans and enforces the official JSON schema. Without fine-tuning, external retrieval, or ensembling, our method ranks 5th overall, achieving macro F1 scores on par with substantially more complex and resource-consuming pipelines. We conclude by analysing the…
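The regular-expression post-processing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the line format the model is assumed to emit (`statement: ... | evidence: ... | verification: ...`), the output field names (`conditions`, `steps`, `statement`, `evidence`, `verification`), and the helper names are all hypothetical, chosen only to show how regex-based span normalisation and schema enforcement might fit together.

```python
import json
import re

# Hypothetical line format for one reasoning step; the real prompt and
# official schema are not given in the abstract.
STEP_RE = re.compile(
    r"statement:\s*(?P<statement>.+?)\s*\|\s*"
    r"evidence:\s*(?P<evidence>.+?)\s*\|\s*"
    r"verification:\s*(?P<verification>true|false)\b",
    re.IGNORECASE,
)

# Hypothetical format for an enumerated condition: "1. ...", "2) ...",
# or "Condition 3: ...".
CONDITION_RE = re.compile(
    r"(?:condition\s*\d*\s*[:.)]|\d+[.)])\s*(.+)", re.IGNORECASE
)


def normalise_span(text):
    """Collapse runs of whitespace and strip stray quotes or bullet marks."""
    text = re.sub(r"\s+", " ", text).strip()
    return text.strip("\"'`*- ").strip()


def postprocess(raw):
    """Turn free-form model output into a fixed dictionary layout.

    Lines matching neither pattern (model chatter) are silently dropped.
    """
    conditions, steps = [], []
    for line in raw.splitlines():
        line = line.strip()
        step = STEP_RE.search(line)
        if step:
            steps.append({
                "statement": normalise_span(step.group("statement")),
                "evidence": normalise_span(step.group("evidence")),
                "verification": step.group("verification").lower() == "true",
            })
            continue
        cond = CONDITION_RE.match(line)
        if cond:
            conditions.append(normalise_span(cond.group(1)))
    return {"conditions": conditions, "steps": steps}


def to_json(raw):
    """Serialise the normalised result as a JSON string."""
    return json.dumps(postprocess(raw), ensure_ascii=False)
```

Calling `to_json` on a raw model response yields a JSON object with exactly two keys, whatever extra text the model produced. Dropping unmatched lines rather than raising an error is one possible design choice for keeping the output schema-valid; a real system might log such lines instead.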
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
