How does My Model Fail? Automatic Identification and Interpretation of Physical Plausibility Failure Modes with Matryoshka Transcoders

Yiming Tang; Abhijeet Sinha; Dianbo Liu

arXiv:2511.10094·cs.LG·November 19, 2025

Open Access

TL;DR

This paper introduces Matryoshka Transcoders, a framework that automatically identifies and interprets physical plausibility failure modes in generative models, aiding targeted improvements and establishing a new evaluation benchmark.

Contribution

The paper extends the Matryoshka representation learning paradigm to hierarchical feature discovery in generative models, enabling automatic detection and interpretation of physical plausibility failure modes without manual feature engineering.

Findings

01. Identifies diverse physics-related failure modes in state-of-the-art models

02. Achieves superior feature relevance and accuracy over existing methods

03. Provides insights for improving physical plausibility in generative models

Abstract

Although recent generative models are remarkably capable of producing instruction-following and realistic outputs, they remain prone to notable physical plausibility failures. Though critical in applications, these physical plausibility errors often escape detection by existing evaluation methods. Furthermore, no framework exists for automatically identifying and interpreting specific physical error patterns in natural language, preventing targeted model improvements. We introduce Matryoshka Transcoders, a novel framework for the automatic discovery and interpretation of physical plausibility features in generative models. Our approach extends the Matryoshka representation learning paradigm to transcoder architectures, enabling hierarchical sparse feature learning at multiple granularity levels. By training on intermediate representations from a physical plausibility classifier and…
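
The idea of hierarchical sparse features at multiple granularity levels can be illustrated with a minimal sketch. It assumes a transcoder that encodes a source-layer activation `x` into a non-negative latent code and decodes it to a target-layer activation `y`, and it assumes the Matryoshka part means that every nested prefix of the latent code must reconstruct `y` on its own. The function names, the ReLU-only sparsity, and the summed squared-error loss are illustrative assumptions, not details taken from the paper, whose abstract is truncated above.

```python
import random

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def encode(x, W_enc, b_enc):
    # Latent code z = ReLU(W_enc x + b_enc); ReLU keeps it non-negative
    # and zeroes out inactive features (assumed sparsity mechanism).
    pre = matvec(W_enc, x)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]

def decode_prefix(z, W_dec, m):
    # Reconstruct the target using only the first m latent features.
    # W_dec has one row per output dimension, one column per latent.
    return [sum(row[j] * z[j] for j in range(m)) for row in W_dec]

def matryoshka_loss(x, y, W_enc, b_enc, W_dec, prefix_sizes):
    # Sum of squared reconstruction errors over nested prefixes, so that
    # small prefixes capture coarse features and larger ones add detail.
    z = encode(x, W_enc, b_enc)
    losses = []
    for m in prefix_sizes:
        y_hat = decode_prefix(z, W_dec, m)
        losses.append(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    return sum(losses), losses

if __name__ == "__main__":
    # Hypothetical toy sizes: 4-dim input, 8 latents, 3-dim target.
    rng = random.Random(0)
    d_in, d_lat, d_out = 4, 8, 3
    W_enc = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_lat)]
    b_enc = [0.0] * d_lat
    W_dec = [[rng.gauss(0, 1) for _ in range(d_lat)] for _ in range(d_out)]
    x = [rng.gauss(0, 1) for _ in range(d_in)]
    y = [rng.gauss(0, 1) for _ in range(d_out)]
    total, per_prefix = matryoshka_loss(x, y, W_enc, b_enc, W_dec, [2, 4, 8])
    print(total, per_prefix)
```

In this sketch the prefix sizes `[2, 4, 8]` play the role of granularity levels: training against the summed loss would push the first few latents toward broad features and leave finer distinctions to later ones.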

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games · Multimodal Machine Learning Applications