BiRQA: Bidirectional Robust Quality Assessment for Images
Aleksandr Gushchin, Dmitriy S. Vatolin, Anastasia Antsiferova

TL;DR
BiRQA is a fast, robust full-reference image quality assessment model that matches or outperforms previous methods in accuracy and improves adversarial resilience by using a bidirectional multiscale pyramid and anchored adversarial training.
Contribution
The paper introduces BiRQA, a novel full-reference (FR) IQA model that combines bidirectional multiscale features with anchored adversarial training to improve both speed and robustness.
Findings
Matches or exceeds state-of-the-art accuracy on five public FR IQA benchmarks.
Runs approximately three times faster than previous SOTA models.
Significantly improves robustness under unseen white-box attacks.
Abstract
Full-reference image quality assessment (FR IQA) is important for image compression, restoration, and generative modeling, yet current neural metrics remain slow and vulnerable to adversarial perturbations. We present BiRQA, a compact FR IQA model that processes four fast complementary features within a bidirectional multiscale pyramid. A bottom-up attention module injects fine-scale cues into coarse levels through an uncertainty-aware gate, while a top-down cross-gating block routes semantic context back to high resolution. To enhance robustness, we introduce Anchored Adversarial Training, a theoretically grounded strategy that uses clean "anchor" samples and a ranking loss to bound pointwise prediction error under attacks. On five public FR IQA benchmarks, BiRQA outperforms or matches the previous state of the art (SOTA) while running ~3x faster than previous SOTA models. Under…
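The abstract names the ingredients of Anchored Adversarial Training (clean anchor samples plus a ranking loss) but not the loss itself. As a minimal sketch, assuming a generic pairwise hinge (margin ranking) loss, the code below penalizes every (sample, anchor) pair whose predicted order disagrees with the MOS order or is separated by less than a margin. The function name `anchored_ranking_loss`, the `margin` value, and the all-pairs scheme are hypothetical choices, not the paper's formula; NumPy is used only to show the forward computation, and actual training would use an autodiff framework.

```python
import numpy as np


def anchored_ranking_loss(pred, mos, anchor_pred, anchor_mos, margin=0.1):
    """Hypothetical hinge ranking loss of batch predictions against clean anchors.

    pred, mos: (n,) predicted and ground-truth scores for (possibly attacked) samples.
    anchor_pred, anchor_mos: (m,) predicted and ground-truth scores for clean anchors.
    This is a generic margin ranking loss, not BiRQA's published objective.
    """
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    anchor_pred = np.asarray(anchor_pred, dtype=float)
    anchor_mos = np.asarray(anchor_mos, dtype=float)

    # Target order for every (sample, anchor) pair: +1, -1, or 0 for ties.
    sign = np.sign(mos[:, None] - anchor_mos[None, :])
    # Predicted score difference for every (sample, anchor) pair.
    diff = pred[:, None] - anchor_pred[None, :]
    # Penalize pairs that are mis-ordered or separated by less than `margin`.
    hinge = np.maximum(0.0, margin - sign * diff)
    valid = sign != 0  # tied pairs carry no ordering information
    if not valid.any():
        return 0.0
    return float(hinge[valid].mean())


anchors = [1.0, 3.0, 5.0]  # toy anchors whose predictions equal their MOS
print(anchored_ranking_loss([2.0, 4.0], [2.0, 4.0], anchors, anchors))  # clean: 0.0
print(anchored_ranking_loss([4.0, 4.0], [2.0, 4.0], anchors, anchors))  # attacked: > 0
```

In the attacked toy case, the first sample (MOS 2.0) is pushed to a predicted 4.0, above the anchor with MOS 3.0, so exactly one of the six pairs is penalized.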
Peer Reviews
Decision: Submitted to ICLR 2026
Presents a compact FR IQA model that processes four fast complementary features within a bidirectional multiscale pyramid. A bottom-up attention module injects fine-scale cues into coarse levels through an uncertainty-aware gate, while a top-down cross-gating block routes semantic context back to high resolution. Introduces anchored adversarial training, a theoretically grounded strategy that uses clean "anchor" samples and a ranking loss to bound pointwise prediction error under attacks…
BiRQA uses "four fast complementary features", the most basic building block of the model, but provides no description of what these features are (e.g., texture, edge, or frequency-domain features), how they are extracted, or why four features are chosen over fewer or more. Without this, the "compactness" of the model (a core selling point) is meaningless. The "bottom-up attention module with an uncertainty-aware gate" and the "top-down cross-gating block" are described in name only. There is no explanation…
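To make the reviewer's point concrete, the sketch below shows one possible form of a gated bottom-up injection. It is a hypothetical toy, not BiRQA's module: it assumes the gate is the product of a "strength" map and a "confidence" map (echoing the strength/confidence decomposition the reviews attribute to CSRAM), each computed from pooled fine features and coarse features by hypothetical 1x1 projections `w_strength` and `w_conf`. When confidence is low, the gate approaches zero and the fine-scale cue is suppressed.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def avg_pool2x(x):
    """2x2 average pooling of a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def gated_bottom_up(fine, coarse, w_strength, w_conf):
    """Hypothetical gate that injects fine-scale features into a coarser level.

    fine:   (C, 2H, 2W) features from the finer pyramid level
    coarse: (C, H, W)   features from the coarser pyramid level
    w_strength, w_conf: (C, 2C) hypothetical 1x1 projection weights
    Returns the fused coarse features and the gate map.
    """
    fine_down = avg_pool2x(fine)                           # (C, H, W)
    stacked = np.concatenate([fine_down, coarse], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel channel-mixing matrix product.
    strength = sigmoid(np.einsum("oc,chw->ohw", w_strength, stacked))
    confidence = sigmoid(np.einsum("oc,chw->ohw", w_conf, stacked))
    gate = strength * confidence  # in (0, 1); low confidence suppresses injection
    return coarse + gate * fine_down, gate


fine = np.ones((2, 4, 4))
coarse = np.zeros((2, 2, 2))
zero_w = np.zeros((2, 4))
out, gate = gated_bottom_up(fine, coarse, zero_w, zero_w)
print(out.shape)  # (2, 2, 2); every value is sigmoid(0) * sigmoid(0) = 0.25
```

Multiplying two sigmoids keeps the gate bounded and lets either factor veto the injection, which is one plausible reading of an "uncertainty-aware" gate.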
**S1.** The proposed anchor-ranking loss for robustness improvement is novel. **S2.** The experiments are thorough, and the proposed method demonstrates strong performance across all datasets.
**W1.** The novelty of the model architecture appears limited. The extracted features are borrowed from existing works, and both top-down and bottom-up architectures are commonly used in the literature. Moreover, the authors do not clearly explain why these modules are combined or how this combination addresses the first problem mentioned in the Introduction, namely “(i) slow inference speed that limits real-time use.” **W2.** The notation in Eq. (3) is confusing, and there is no illustration…
+ CSRAM’s explicit strength/confidence decomposition is a meaningful incremental architectural novelty, designed to favor speed and interpretability.
+ Combining anchor-based ranking with adversarial fine-tuning, and proving a mini-batch pointwise error bound that relates the anchored ranking loss to the maximum prediction error (Theorem 1), is an interesting contribution.
+ Multiple white-box attacks and perturbation budgets are evaluated, and the proposed method outperforms the competing methods…
- The general idea of multiscale fusion and top-down/bottom-up information flow has prior art in IQA.
- Theorem 1 depends on assumptions such as anchor spacing, anchor accuracy, MOS resolution, and mini-batch construction in narrow MOS bands.
- DISTS and LPIPS are not Transformer-based, and TOPIQ is not necessarily Transformer-based either.
- Neural IQA models tend to be heavier than analytic or hand-crafted ones, but this is a design choice, not an intrinsic property of being “neural.” There exist lightweight…
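The page does not reproduce the statement of Theorem 1, so the toy check below is not the theorem. It only illustrates the generic mechanism behind anchor-based error bounds and why the listed assumptions matter: if anchor scores are exact and a prediction is ordered correctly against every anchor, the prediction and the MOS fall in the same gap between neighboring anchors, so the pointwise error cannot exceed the largest anchor spacing. The function name `worst_error_when_ordered` and the uniform random sampling are hypothetical choices made for illustration.

```python
import numpy as np


def worst_error_when_ordered(anchors, n_trials=10_000, seed=0):
    """Largest |pred - mos| among random pairs ordered correctly against every
    anchor, together with the largest anchor spacing.

    Assumption: anchor scores are exact (perfect anchor accuracy).
    """
    rng = np.random.default_rng(seed)
    anchors = np.sort(np.asarray(anchors, dtype=float))
    max_spacing = float(np.max(np.diff(anchors)))
    worst = 0.0
    for _ in range(n_trials):
        mos, pred = rng.uniform(anchors[0], anchors[-1], size=2)
        # Correct ordering: pred lies on the same side of each anchor as mos.
        if np.all(np.sign(pred - anchors) == np.sign(mos - anchors)):
            worst = max(worst, float(abs(pred - mos)))
    return worst, max_spacing


for n_anchors in (3, 9):
    worst, spacing = worst_error_when_ordered(np.linspace(1.0, 5.0, n_anchors))
    print(f"{n_anchors} anchors: worst error {worst:.3f} <= spacing {spacing:.3f}")
```

Coarser anchor spacing yields a looser cap, and the guarantee relies on exact anchor scores, which mirrors the reviewer's concern about anchor spacing and anchor accuracy.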
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Image Processing Techniques · Digital Media Forensic Detection
