On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning
Pratik Deshmukh, Atirek Gupta

TL;DR
The paper introduces a semantic loss function with graph-based constraints to prevent catastrophic collapse in transformer models fine-tuned for causal reasoning, significantly improving their stability and accuracy.
Contribution
It proposes a novel semantic loss approach with dynamic scheduling that stabilizes fine-tuning, enabling models to perform causal reasoning reliably instead of collapsing to trivial solutions.
Findings
Semantic loss prevents model collapse during fine-tuning.
Models trained with semantic loss reach 70.4% accuracy on transitivity tasks and 68.6% on d-separation tasks.
Baseline models without semantic loss collapse to trivial solutions (always answering "Yes" or "No") while still reporting misleadingly high accuracy (73.9%).
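A collapsed model can score well on plain accuracy when one answer dominates the evaluation set. The sketch below illustrates this with a toy example. The 74/26 label split and the helper names (accuracy, majority_fraction, balanced_accuracy) are hypothetical, chosen for illustration only. They are not the paper's data or its evaluation code.

```python
from collections import Counter

def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def majority_fraction(preds):
    """Share of predictions taken by the most common answer (1.0 means fully collapsed)."""
    return Counter(preds).most_common(1)[0][1] / len(preds)

def balanced_accuracy(preds, labels):
    """Mean per-class recall; a constant predictor scores 0.5 on two classes."""
    recalls = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        recalls.append(sum(preds[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced evaluation set (assumption, not from the paper).
labels = ["No"] * 74 + ["Yes"] * 26
# A collapsed model ignores the input and always answers "No".
collapsed_preds = ["No"] * len(labels)

print(accuracy(collapsed_preds, labels))           # high despite no reasoning
print(majority_fraction(collapsed_preds))          # flags the collapse
print(balanced_accuracy(collapsed_preds, labels))  # chance level
```

Reporting the majority fraction or balanced accuracy alongside plain accuracy is one simple way to tell context-dependent predictions apart from a constant answer.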
Abstract
Standard fine-tuning of transformer models on causal reasoning tasks leads to catastrophic model collapse, where models learn trivial solutions such as always predicting "Yes" or "No" regardless of input structure. We demonstrate that fine-tuning Gemma 270M on transitivity and d-separation tasks without semantic loss results in 100% collapse rate, with models achieving misleadingly high accuracy (73.9%) while learning no causal reasoning. We propose a semantic loss function with graph-based logical constraints and dynamic lambda scheduling that prevents this collapse. Our approach achieves 70.4% accuracy on transitivity tasks and 68.6% on d-separation tasks with stable, context-dependent predictions, representing a 42.7% improvement over collapsed baselines. Adversarial evaluation on 1,000 structural reasoning samples shows semantic models achieve 67-70% accuracy while collapsed models…
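The abstract describes a task loss combined with a constraint-based semantic term whose weight lambda changes during training. A minimal sketch of that idea for the transitivity rule (A causes B and B causes C implies A causes C) follows. It uses a common form of semantic loss, the negative log-probability that the constraint holds, with the three edge beliefs treated as independent. The linear warm-up schedule, the function names, and the default values are all assumptions made for illustration, since the abstract does not specify the authors' exact formulation.

```python
import math

def lambda_schedule(step, warmup_steps=100, lambda_max=1.0):
    """Hypothetical schedule: linearly ramp the semantic weight from 0 to lambda_max."""
    return lambda_max * min(1.0, step / warmup_steps)

def cross_entropy(p_yes, label):
    """Binary cross-entropy for a Yes/No answer; label is 1 for Yes, 0 for No."""
    p = p_yes if label == 1 else 1.0 - p_yes
    return -math.log(max(p, 1e-12))

def transitivity_semantic_loss(p_ab, p_bc, p_ac):
    """Negative log-probability that (A->B and B->C) implies A->C holds.

    The rule is violated only when both premises are true and the
    conclusion is false; edge beliefs are assumed independent.
    """
    p_violation = p_ab * p_bc * (1.0 - p_ac)
    return -math.log(max(1.0 - p_violation, 1e-12))

def total_loss(p_yes, label, p_ab, p_bc, p_ac, step):
    """Task loss plus the scheduled semantic penalty."""
    semantic = transitivity_semantic_loss(p_ab, p_bc, p_ac)
    return cross_entropy(p_yes, label) + lambda_schedule(step) * semantic

# Beliefs that contradict transitivity are penalized more than consistent ones.
print(transitivity_semantic_loss(0.9, 0.9, 0.1))
print(transitivity_semantic_loss(0.9, 0.9, 0.9))
```

Under this sketch the semantic term contributes nothing at step 0 and reaches full weight after the warm-up, so early training is driven by the task loss alone.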
