The promise and limits of LLMs in constructing proofs and hints for logic problems in intelligent tutoring systems

Sutapa Dey Tithi; Arun Kumar Ramesh; Clara DiMarco; Xiaoyi Tian; Nazia Alam; Kimia Fazeli; Tiffany Barnes

arXiv:2505.04736 · cs.AI · November 24, 2025

TL;DR

This paper evaluates the use of large language models for generating proofs and hints in logic tutoring systems, highlighting their potential and current limitations in accuracy and pedagogical quality.

Contribution

It introduces a comparative analysis of prompting techniques for LLMs in proof construction and demonstrates their application in generating student hints with promising accuracy.

Findings

01 DeepSeek-V3 achieved up to 86.7% accuracy in stepwise proof construction.

02 LLM-generated hints were 75% accurate and rated highly by humans.

03 LLM-generated hints need modification to be pedagogically appropriate.

Abstract

Intelligent tutoring systems (ITSs) have demonstrated effectiveness in teaching formal propositional logic proofs, but their reliance on template-based explanations limits their ability to provide personalized student feedback. While large language models (LLMs) offer promising capabilities for dynamic feedback generation, they risk producing hallucinations or pedagogically unsound explanations. We evaluated the stepwise accuracy of LLMs in constructing multi-step symbolic logic proofs, comparing six prompting techniques across four state-of-the-art LLMs on 358 propositional logic problems. Results show that DeepSeek-V3 performed best, reaching up to 86.7% accuracy on stepwise proof construction, and did particularly well on simpler rules. We further used the best-performing LLM to generate explanatory hints for 1,050 unique student problem-solving states from a logic ITS and evaluated…
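To make "stepwise accuracy" concrete, here is a minimal sketch of one way such a score could be computed: each derived line of a proof is checked against the premises it cites, and the score is the fraction of lines that check out. This is an illustration under stated assumptions, not the paper's evaluation procedure. The tuple encoding of formulas and the names `atoms`, `evaluate`, `entails`, and `stepwise_accuracy` are hypothetical, and the check is semantic (truth-table entailment) rather than a match against a specific named inference rule.

```python
from itertools import product

# Hypothetical encoding: formulas are nested tuples such as
# ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g).

def atoms(f):
    """Return the set of variable names that occur in formula f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(atoms(sub) for sub in f[1:]))

def evaluate(f, env):
    """Evaluate formula f under the truth assignment env (name -> bool)."""
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    if op == "imp":
        return (not evaluate(f[1], env)) or evaluate(f[2], env)
    raise ValueError(f"unknown operator: {op}")

def entails(premises, conclusion):
    """True if every assignment satisfying all premises satisfies conclusion."""
    names = sorted(set().union(atoms(conclusion), *(atoms(p) for p in premises)))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(evaluate(p, env) for p in premises) and not evaluate(conclusion, env):
            return False
    return True

def stepwise_accuracy(steps):
    """steps: list of (cited_premises, derived_formula) pairs.
    Returns the fraction of steps whose derived formula follows
    from the premises that the step cites."""
    if not steps:
        return 0.0
    valid = sum(entails(premises, derived) for premises, derived in steps)
    return valid / len(steps)

# Illustrative (made-up) proof steps over two variables.
P, Q = ("var", "p"), ("var", "q")
steps = [
    ([("imp", P, Q), P], Q),   # modus ponens: valid
    ([("imp", P, Q), Q], P),   # affirming the consequent: invalid
    ([("and", P, Q)], P),      # simplification: valid
]
score = stepwise_accuracy(steps)  # two of the three steps are valid
```

A semantic check like this accepts any sound step, including ones that skip intermediate lines or cite the wrong rule name, so a tutoring system that grades rule usage would need a stricter, rule-specific checker.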

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)

Methods: Hierarchical Information Threading