Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?

Sigma Jahan; Mohammad Masudur Rahman

arXiv:2506.07871·cs.LG·June 10, 2025

Open Access

TL;DR

This paper empirically evaluates Hessian-based analysis for fault diagnosis in attention-based deep learning models and reports that Hessian-derived metrics localize unstable regions and fault sources more effectively than gradients alone.

Contribution

It introduces Hessian-derived metrics (curvature analysis and parameter interaction analysis) for fault diagnosis in attention models and empirically evaluates them on three architectures: HAN, 3D-CNN, and DistilBERT.

Findings

01. Hessian metrics outperform gradients in fault localization

02. Hessian analysis identifies fragile regions effectively

03. Applicable to diverse attention-based models

Abstract

As attention-based deep learning models scale in size and complexity, diagnosing their faults becomes increasingly challenging. In this work, we conduct an empirical study to evaluate the potential of Hessian-based analysis for diagnosing faults in attention-based models. Specifically, we use Hessian-derived insights to identify fragile regions (via curvature analysis) and parameter interdependencies (via parameter interaction analysis) within attention mechanisms. Through experiments on three diverse models (HAN, 3D-CNN, DistilBERT), we show that Hessian-based metrics can localize instability and pinpoint fault sources more effectively than gradients alone. Our empirical findings suggest that these metrics could significantly improve fault diagnosis in complex neural architectures, potentially improving software debugging practices.
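
The two analyses named in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation or their metrics: `toy_loss`, its three parameters, and all helper names are hypothetical, and finite differences stand in for the automatic differentiation a real model would use. It estimates the largest Hessian eigenvalue (a common curvature measure) by power iteration on Hessian-vector products, and reads a single off-diagonal Hessian entry as a pairwise parameter interaction.

```python
import math
import random


def toy_loss(theta):
    """Hypothetical stand-in for a model loss (NOT the paper's objective).

    Quadratic whose exact Hessian is [[4, 1, 0], [1, 2, 0], [0, 0, 0.5]].
    """
    q, k, v = theta
    return 2.0 * q * q + q * k + k * k + 0.25 * v * v


def grad(loss, theta, eps=1e-4):
    """Central-difference gradient of `loss` at `theta`."""
    g = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        g.append((loss(up) - loss(down)) / (2 * eps))
    return g


def hvp(loss, theta, vec, eps=1e-3):
    """Hessian-vector product H @ vec via a central difference of gradients."""
    plus = [t + eps * x for t, x in zip(theta, vec)]
    minus = [t - eps * x for t, x in zip(theta, vec)]
    g_plus, g_minus = grad(loss, plus), grad(loss, minus)
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]


def top_eigenvalue(loss, theta, iters=100, seed=0):
    """Largest-magnitude Hessian eigenvalue, estimated by power iteration."""
    rng = random.Random(seed)
    vec = [rng.gauss(0, 1) for _ in theta]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    eig = 0.0
    for _ in range(iters):
        hv = hvp(loss, theta, vec)
        eig = sum(a * b for a, b in zip(vec, hv))  # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        vec = [x / norm for x in hv]
    return eig


def interaction(loss, theta, i, j):
    """Mixed second derivative H[i][j], read off as row i of H @ e_j."""
    e_j = [1.0 if n == j else 0.0 for n in range(len(theta))]
    return hvp(loss, theta, e_j)[i]


theta = [0.3, -0.2, 0.1]
print("top eigenvalue:", top_eigenvalue(toy_loss, theta))
print("interaction(0, 1):", interaction(toy_loss, theta, 0, 1))
print("interaction(0, 2):", interaction(toy_loss, theta, 0, 2))
```

For this toy quadratic the exact top eigenvalue is 3 + √2 ≈ 4.414, and the only nonzero interaction is between the first two parameters. The gradient norm alone would not reveal either fact, which is the kind of gap the abstract attributes to gradient-only diagnosis. Hessian-vector products are used here because they avoid materializing the full Hessian, which would be infeasible at the scale of a real attention model.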

Taxonomy

Topics: Software System Performance and Reliability · Advanced Neural Network Applications · Software Engineering Research

Methods: Softmax · Attention Is All You Need