Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?
Sigma Jahan, Mohammad Masudur Rahman

TL;DR
This paper investigates whether Hessian-based analysis can support fault diagnosis in attention-based deep learning models, showing that it identifies unstable regions and fault sources more effectively than gradients alone.
Contribution
It introduces Hessian-derived metrics for fault diagnosis in attention models and empirically evaluates their effectiveness across multiple architectures.
Findings
Hessian-derived metrics localize instability and fault sources more effectively than gradients alone
Curvature analysis identifies fragile regions within attention mechanisms
Applies to diverse attention-based models (HAN, 3D-CNN, DistilBERT)
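The contrast between gradient and curvature signals can be illustrated with a small sketch. It is not the authors' implementation: the toy single-query softmax attention (KEYS, VALUES, TARGET, attention_loss), the central finite-difference estimates, and the use of power iteration for the largest-magnitude Hessian eigenvalue are all assumptions chosen for illustration. For each parameter it reports the gradient magnitude next to the Hessian diagonal, and it summarizes overall sharpness with the dominant eigenvalue.

```python
import math

# Hypothetical toy data: a single query vector w attends over three keys.
KEYS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
VALUES = [1.0, -1.0, 0.5]
TARGET = 0.2


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_loss(w):
    """Squared error of a single-query softmax attention output (toy model)."""
    scores = [sum(wi * ki for wi, ki in zip(w, k)) for k in KEYS]
    weights = softmax(scores)
    y = sum(a * v for a, v in zip(weights, VALUES))
    return (y - TARGET) ** 2


def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def _shifted(f, x, i, j, si, sj, h):
    """Evaluate f after moving coordinate i by si*h and coordinate j by sj*h."""
    y = list(x)
    y[i] += si * h
    y[j] += sj * h
    return f(y)


def numerical_hessian(f, x, h=1e-3):
    """Central-difference Hessian; filled symmetrically from the upper triangle."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            val = (_shifted(f, x, i, j, 1, 1, h) - _shifted(f, x, i, j, 1, -1, h)
                   - _shifted(f, x, i, j, -1, 1, h) + _shifted(f, x, i, j, -1, -1, h)) / (4 * h * h)
            H[i][j] = H[j][i] = val
    return H


def matvec(H, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]


def top_eigenvalue(H, iters=200):
    """Power iteration: largest-magnitude eigenvalue of a symmetric matrix."""
    v = [1.0] * len(H)
    for _ in range(iters):
        w = matvec(H, v)
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0
        v = [t / norm for t in w]
    # Rayleigh quotient v^T H v for the unit vector v.
    return sum(vi * hvi for vi, hvi in zip(v, matvec(H, v)))


if __name__ == "__main__":
    w = [0.5, -0.3]
    g = numerical_gradient(attention_loss, w)
    H = numerical_hessian(attention_loss, w)
    for p in range(len(w)):
        print(f"param {p}: |grad| = {abs(g[p]):.4f}, |H_pp| = {abs(H[p][p]):.4f}")
    print(f"dominant curvature (eigenvalue): {top_eigenvalue(H):.4f}")
```

Finite differences need O(n^2) loss evaluations, so a real model would more plausibly use automatic differentiation with Hessian-vector products; the pure-Python version only keeps the sketch dependency-free.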
Abstract
As attention-based deep learning models scale in size and complexity, diagnosing their faults becomes increasingly challenging. In this work, we conduct an empirical study to evaluate the potential of Hessian-based analysis for diagnosing faults in attention-based models. Specifically, we use Hessian-derived insights to identify fragile regions (via curvature analysis) and parameter interdependencies (via parameter interaction analysis) within attention mechanisms. Through experiments on three diverse models (HAN, 3D-CNN, DistilBERT), we show that Hessian-based metrics can localize instability and pinpoint fault sources more effectively than gradients alone. Our empirical findings suggest that these metrics could significantly improve fault diagnosis in complex neural architectures and, in turn, software debugging practices.
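Parameter interaction analysis can be sketched in the same hedged spirit. The scoring rule below, which normalizes each off-diagonal Hessian entry by the corresponding diagonal curvatures, is an assumption rather than the metric defined in the paper; the function names, parameter labels, and Hessian values are hypothetical. Pairs with high scores are parameters whose effects on the loss are strongly coupled, which makes them candidates to inspect together when localizing a fault.

```python
import math
from itertools import combinations


def interaction_scores(H, eps=1e-12):
    """Rank parameter pairs by normalized off-diagonal curvature.

    score(i, j) = |H_ij| / sqrt(|H_ii| * |H_jj|); eps guards against zero diagonals.
    Returns (i, j, score) tuples, strongest interaction first.
    """
    scores = []
    for i, j in combinations(range(len(H)), 2):
        denom = math.sqrt(abs(H[i][i]) * abs(H[j][j])) + eps
        scores.append((i, j, abs(H[i][j]) / denom))
    scores.sort(key=lambda t: t[2], reverse=True)
    return scores


def coupling_strength(H):
    """Per-parameter sum of absolute off-diagonal Hessian entries."""
    n = len(H)
    return [sum(abs(H[i][j]) for j in range(n) if j != i) for i in range(n)]


if __name__ == "__main__":
    # Illustrative hand-picked Hessian, not data from the paper.
    names = ["query_scale", "key_scale", "value_scale"]  # hypothetical labels
    H = [[4.0, 1.8, 0.1],
         [1.8, 1.0, 0.0],
         [0.1, 0.0, 2.0]]
    for i, j, s in interaction_scores(H):
        print(f"{names[i]} <-> {names[j]}: {s:.3f}")
    print(dict(zip(names, coupling_strength(H))))
```

Normalizing by the diagonal keeps a pair from ranking highly just because one parameter has large curvature on its own; the raw coupling_strength sums are kept alongside for comparison.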
Taxonomy
Topics: Software System Performance and Reliability · Advanced Neural Network Applications · Software Engineering Research
Methods: Softmax · Attention Is All You Need
