No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
Lianrui Mu, Zou Xingze, Jianhong Bai, Jiaqi Hu, Wenjie Zheng, Jiangnan Ye, Jiedong Zhuang, Mudassar Ali, Jing Wang, Haoji Hu

TL;DR
This paper presents HiDA-Net, an architecture for detecting high-resolution AI-generated images. It preserves pixel-level detail by aggregating features from full-resolution local tiles and a down-sampled global view, which improves both robustness and accuracy.
Contribution
The paper introduces HiDA-Net, a novel detail-preserving framework with feature aggregation, and new modules for forgery localization and compression noise disentanglement, along with a large high-resolution benchmark.
Findings
Improves accuracy by over 13% on the Chameleon dataset.
Improves detection accuracy by 10% on the HiRes-50K benchmark.
Demonstrates robustness against localized manipulations and compression artifacts.
Abstract
The rapid growth of high-resolution, meticulously crafted AI-generated images poses a significant challenge to existing detection methods, which are often trained and evaluated on low-resolution, automatically generated datasets that do not align with the complexities of high-resolution scenarios. A common practice is to resize or center-crop high-resolution images to fit standard network inputs. However, without full coverage of all pixels, such strategies risk either obscuring subtle, high-frequency artifacts or discarding information from uncovered regions, leading to input information loss. In this paper, we introduce the High-Resolution Detail-Aggregation Network (HiDA-Net), a novel framework that ensures no pixel is left behind. We use the Feature Aggregation Module (FAM), which fuses features from multiple full-resolution local tiles with a down-sampled global view of the image.…
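The input strategy described in the abstract (full-resolution local tiles plus a down-sampled global view) can be illustrated with a minimal sketch. This is not the authors' Feature Aggregation Module; it only shows one way to prepare inputs so that every pixel falls inside some tile. The function names `split_into_tiles` and `global_view`, the tile size of 224, edge padding, and nearest-neighbor down-sampling are all assumptions made for illustration.

```python
import numpy as np


def split_into_tiles(image, tile=224):
    """Cut an (H, W, C) image into non-overlapping tile x tile patches.

    Assumption: the image is edge-padded up to a multiple of `tile`,
    so no original pixel is dropped (unlike center-cropping).
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile  # rows needed to reach the next multiple of `tile`
    pad_w = (-w) % tile  # columns needed to reach the next multiple of `tile`
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    rows, cols = padded.shape[0] // tile, padded.shape[1] // tile
    # (rows, tile, cols, tile, C) -> (rows, cols, tile, tile, C)
    tiles = padded.reshape(rows, tile, cols, tile, -1).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, -1)


def global_view(image, size=224):
    """Down-sample an (H, W, C) image to size x size by nearest-neighbor sampling.

    Assumption: nearest-neighbor is used only to keep the sketch dependency-free.
    """
    h, w = image.shape[:2]
    ys = np.arange(size) * h // size  # source row for each output row
    xs = np.arange(size) * w // size  # source column for each output column
    return image[ys][:, xs]


if __name__ == "__main__":
    img = np.zeros((500, 700, 3), dtype=np.uint8)
    print(split_into_tiles(img).shape)  # tiles at native resolution
    print(global_view(img).shape)       # single coarse view of the whole image
```

In a detector of this kind, each tile and the global view would then be passed through a feature extractor and their features fused; how that fusion is done is specific to the paper and is not reproduced here.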
Peer Reviews
Decision·ICLR 2026 Poster
1. The motivation for the method is clear and is supported with math and illustrations. 2. The proposed method achieves SOTA performance on several datasets. 3. The paper proposes a new high-resolution dataset, HiRes-50K, that may be valuable for the community. 4. The paper includes extensive ablations of the proposed method.
Major weaknesses: 1. The proposed method is not compared with recent AI-generated image detection methods, like [1 - 3]. 2. The proposed method has an increased inference time for high resolution images compared to the other approaches. But what is the difference in speed between the HiRes-50K and the other methods on standard resolutions, like 224 $\times$ 224? 3. I have not found the explicit list of models that are used in creating the HiRes-50K dataset. It is important to include the relevan
1. The paper convincingly identifies and quantifies how resizing harms detection by losing high-frequency details. 2. It introduces a new and high-quality dataset, named HiRes-50K, for evaluation.
1. The proposed method includes both a global and a local path, but uses the same transformer blocks to process global and local images. How do the proposed FAM, TFL, and QFE modules adaptively extract and distinguish local and global features for classification? It is unclear why these modules can effectively capture both types of features simultaneously. 2. The computational complexity of transformer blocks is quadratic. When more local patches are used, this inevitably increases the processi
- The motivation is clear and intuitive. Loss of fine detail can mislead the detection model, especially in the AIGC setting, where most generative models produce images that are well structured but flawed in their details. - The model design is reasonable and theoretically supported. - The experiments are comprehensive and demonstrate the effectiveness of the proposed network and benchmark.
- Open-source AIGC image datasets and real-image datasets are innumerable. The paper does not sufficiently explain why HiRes-50K is irreplaceable or what value it adds beyond existing data resources. - The experiments primarily rely on outdated models (e.g., SD v1.4, SDXL) for generating AI-synthesized images, with limited coverage of other mainstream high-resolution generative models, especially advanced models released after 2024 (e.g., SD3.5, FLUX, Qwen-Image). - T
