Energy-Based Constraint Networks: Learning Structural Coherence Across Modalities
Chirag Shinde

TL;DR
This paper presents energy-based constraint networks that learn structural coherence across modalities, reaching 93.4% accuracy on trained text corruption types and 0.959 AUC on FaceForensics++ Deepfakes with the same core architecture.
Contribution
The authors introduce a modality-agnostic architecture that learns explicit energy landscapes for structural coherence, transferable across domains with minimal modifications.
Findings
Achieved 93.4% accuracy on trained text corruption types.
Achieved 87.2% accuracy on 9 unseen text corruption types.
Attained 0.959 AUC on FaceForensics++ Deepfakes, and 0.870 AUC on Celeb-DF without any Celeb-DF training data.
Abstract
We introduce energy-based constraint networks -- a modality-agnostic architecture that learns structural coherence from contrastive pairs. The system processes frozen encoder embeddings through a state-space model with dual-head attention, producing a scalar energy measuring structural consistency alongside per-position energy scores that localize violations. Multiple independently trained branches detect different violation types and compose at inference without interference. We demonstrate the framework in two domains. In text, the system achieves 93.4% accuracy on trained corruption types and 87.2% on 9 unseen types, using frozen BERT and 7.4M trainable parameters. In vision, the same architecture achieves competitive deepfake detection: 0.959 AUC on FaceForensics++ Deepfakes and 0.870 on Celeb-DF without any Celeb-DF training data, using frozen DINOv2 and 3.6M parameters per…
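The abstract describes three mechanics: training on contrastive pairs, per-position energies that localize violations, and independently trained branches that are composed at inference. The following is a minimal sketch of how those pieces could fit together, not the authors' implementation. The mean pooling, the hinge margin, the additive composition rule, and the branch names `word_order` and `agreement` are all assumptions made for illustration; the encoder and state-space model are left out, and per-position energies are taken as given.

```python
from typing import Dict, List, Tuple


def scalar_energy(per_position: List[float]) -> float:
    """Pool per-position energies into one scalar.

    Mean pooling is an assumption; the abstract does not state the pooling rule.
    """
    return sum(per_position) / len(per_position)


def margin_loss(e_coherent: float, e_corrupted: float, margin: float = 1.0) -> float:
    """Hinge-style contrastive loss on one (coherent, corrupted) pair.

    The loss is zero once the corrupted input's energy exceeds the
    coherent input's energy by at least `margin` (hypothetical value).
    """
    return max(0.0, margin + e_coherent - e_corrupted)


def compose(branches: Dict[str, List[float]]) -> Tuple[float, int, str]:
    """Combine independently computed branch energies at inference.

    Assumes additive composition: per-position energies are summed across
    branches. Returns the pooled scalar energy, the index of the most
    violated position, and the branch contributing most at that position.
    """
    length = len(next(iter(branches.values())))
    combined = [sum(b[i] for b in branches.values()) for i in range(length)]
    worst = max(range(length), key=combined.__getitem__)
    culprit = max(branches, key=lambda name: branches[name][worst])
    return scalar_energy(combined), worst, culprit


# Hypothetical per-position energies for a four-token input.
example = {
    "word_order": [0.1, 0.2, 0.1, 0.0],
    "agreement": [0.0, 0.1, 0.9, 0.1],
}
energy, position, branch = compose(example)
```

Under these assumptions, adding a branch only adds a term to the sum, so existing branches need no retraining. That is one simple way to read "compose at inference without interference".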
