How does My Model Fail? Automatic Identification and Interpretation of Physical Plausibility Failure Modes with Matryoshka Transcoders
Yiming Tang, Abhijeet Sinha, Dianbo Liu

TL;DR
This paper introduces Matryoshka Transcoders, a framework that automatically identifies and interprets physical plausibility failure modes in generative models, aiding targeted improvements and establishing a new evaluation benchmark.
Contribution
It extends the Matryoshka representation learning paradigm to hierarchical feature discovery in generative models, enabling automatic detection and interpretation of physical failure modes without manual feature engineering.
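The summary above does not give the architecture or training objective, so the following is only a minimal NumPy sketch of what a Matryoshka-style objective on a transcoder could look like. It assumes a transcoder maps activations `x` from one layer to target activations `y` from another through a sparse latent code, and that nested prefixes of that code are each asked to reconstruct `y`. The function names, the top-k sparsity rule, the prefix sizes, and all shapes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def topk_sparse(z, k):
    """Keep the k largest activations in each row and zero out the rest."""
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=1), axis=1)
    return out


def matryoshka_transcoder_loss(x, y, W_enc, b_enc, W_dec, b_dec, prefix_sizes, k):
    """Sum of reconstruction losses over nested prefixes of the latent code.

    x: (batch, d_in) source activations; y: (batch, d_out) target activations.
    prefix_sizes: increasing latent widths, e.g. [8, 16, 32] (assumed values).
    """
    # Encode with ReLU, then enforce sparsity with top-k (an assumed choice).
    z = topk_sparse(np.maximum(x @ W_enc + b_enc, 0.0), k)

    losses = []
    for m in prefix_sizes:
        # Decode using only the first m latent features, so early features
        # must carry coarse structure and later ones add finer detail.
        y_hat = z[:, :m] @ W_dec[:m] + b_dec
        losses.append(float(np.mean((y_hat - y) ** 2)))
    return float(np.sum(losses)), losses


# Toy usage with random data; all dimensions are arbitrary.
rng = np.random.default_rng(0)
d_in, d_out, d_latent = 16, 12, 32
x = rng.normal(size=(8, d_in))
y = rng.normal(size=(8, d_out))
W_enc = rng.normal(size=(d_in, d_latent)) * 0.1
b_enc = np.zeros(d_latent)
W_dec = rng.normal(size=(d_latent, d_out)) * 0.1
b_dec = np.zeros(d_out)

total, per_prefix = matryoshka_transcoder_loss(
    x, y, W_enc, b_enc, W_dec, b_dec, prefix_sizes=[8, 16, 32], k=4
)
print(total, per_prefix)
```

In this sketch, each prefix gets its own loss term, which is one plausible way to obtain features at several granularity levels from a single dictionary. A real implementation would train the weights by gradient descent in an autodiff framework rather than only evaluating the loss.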
Findings
Identifies diverse physics-related failure modes in state-of-the-art models
Achieves superior feature relevance and accuracy over existing methods
Provides insights for improving physical plausibility in generative models
Abstract
Although recent generative models are remarkably capable of producing instruction-following and realistic outputs, they remain prone to notable physical plausibility failures. These errors are critical in applications, yet they often escape detection by existing evaluation methods. Furthermore, no framework exists for automatically identifying and interpreting specific physical error patterns in natural language, preventing targeted model improvements. We introduce Matryoshka Transcoders, a novel framework for the automatic discovery and interpretation of physical plausibility features in generative models. Our approach extends the Matryoshka representation learning paradigm to transcoder architectures, enabling hierarchical sparse feature learning at multiple granularity levels. By training on intermediate representations from a physical plausibility classifier and…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games · Multimodal Machine Learning Applications
