Adaptive Low-Resolution Combination Search for Reference-Independent Image Super-Resolution
Ye Tian

TL;DR
This paper introduces a new image super-resolution method that reconstructs high-resolution images from low-resolution inputs without needing high-resolution references.
Contribution
The novel contribution is an adaptive search algorithm that combines low-resolution images to reconstruct high-resolution content using a unified degradation model.
Findings
The proposed method improves PSNR by 27.33% and SSIM by 44.64% on the USAF 1951 resolution target.
In semiconductor chip inspection, PSNR increases by 22.36% and SSIM by 40.38%.
Abstract
Accurately reconstructing high-resolution (HR) images remains challenging in scenarios where HR observations cannot be captured due to optical, hardware, or cost constraints. To address this limitation, we introduce an image super-resolution (SR) framework that reconstructs HR content solely from multiple low-resolution (LR) measurements, without relying on any HR reference images. The proposed method formulates a unified degradation model that describes how HR pixels contribute to LR observations under subpixel shifts and anisotropic downsampling. Based on this model, we develop an adaptive search algorithm capable of identifying the minimal and most informative combination of LR images required to equivalently represent the latent HR image. The selected LR images are then used to construct a solvable linear system whose solution directly yields the HR pixel values. Experiments…
Funding: Natural Science Foundation of Shandong Province
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Video Quality Assessment · Digital Media Forensic Detection
1. Introduction
Image super-resolution (SR) plays an essential role in improving image quality and restoring intricate details, which has proven useful in numerous sectors such as medical imaging, satellite remote sensing, and biometric technology [1,2,3]. Conventional SR methods use high-resolution (HR) images as references, implementing strategies such as patch matching, texture transfer, and frequency domain analysis to refine low-resolution images. Notable progress in this area includes [4,5,6,7,8]. However, obtaining HR images is often difficult or infeasible in many practical circumstances, which considerably limits the use of SR models that require HR reference images [9,10].
Reference-free super-resolution methods have attracted increasing attention due to their ability to improve image quality without requiring access to HR images [11,12]. Mishra [13] proposed an unsupervised super-resolution framework that employs contrastive learning to extract texture features, which are subsequently refined through iterative neural network optimization to achieve super-resolved outputs. The Synthetic Multi-Orientation Resolution Enhancement algorithm was introduced by Zhao [14], which simulates paired low- and high-resolution magnetic resonance imaging data by applying synthetic degradations to native HR scans. Deepthi [15] developed a Gompertz-function convergence war accelerometric optimization generative adversarial network for hyperspectral image super-resolution in remote sensing. This approach leverages a Gompertz-function-based optimization strategy to generate HR images and uses a pre-trained Inception-v3 model to evaluate the realism and diversity of generated images based on feature distribution similarity. Ortega [16] used a controllable displacement device to capture low-resolution images at specific subpixel positions and applied a multi-image super-resolution algorithm to reconstruct a high-resolution image from four frames. Existing reference-free super-resolution methods rely on proxy metrics for quality assessment. Although these metrics show heuristic value, SR reconstruction remains an ill-posed problem where multiple HR solutions exist for one LR input. The ground-truth solution may not optimize these metrics, and excessive focus on them can produce pseudo-sharp results that distort semantic structures and morphological features [17,18]. Consequently, developing a substitute reference that truly replicates real HR imaging for physically reliable super-resolution reconstruction remains a challenge.
To address these challenges, this paper proposes a reference-free image super-resolution method based on a low-resolution image search algorithm. The method first constructs an image degradation model to characterize the relationship between the high-resolution image and its corresponding low-resolution observations. Based on this model, an adaptive search algorithm is developed to identify the optimal combination of subpixel-shifted and downsampled low-resolution images by analyzing their features and degradation characteristics. The super-resolved image is then reconstructed by solving a system of equations formulated from the selected low-resolution image combinations. Experimental results presented later in this paper demonstrate the effectiveness of the proposed method.
2. Methods
2.1. Search Algorithm for Low-Resolution Image Combination
Previous studies employed pixel binning [19,20] to improve image quality in low-light and high-noise conditions at the cost of reduced spatial resolution, as shown in Figure 1. Under the assumption that different low-resolution observations are produced by this resolution-reduction process, each low-resolution observation can be represented as a linear combination of multiple high-resolution pixels. Building on this, we introduce an adaptive search algorithm that determines the optimal combination of low-resolution images, thereby providing an efficient and equivalent representation of the original high-resolution image. Note that our degradation model follows the same principle as pixel binning, which applies only to digital images, and that acquiring subpixel information requires high-precision motion devices.
Pixels are the fundamental units that make up a digital image. According to [21], the value of the pixel is equal to the integral of the light intensity function over its spatial area divided by that area. Ideal high-resolution pixels can be expressed as
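Written out under assumed notation, this area average takes the following form. The symbol names ($P^{H}_{i,j}$ for the high-resolution pixel, $f$ for the light intensity function) and the convention that $i$ indexes along x and $j$ along y, both starting at 1, are placeholders chosen for illustration; $l$ is the pixel edge length.

```latex
P^{H}_{i,j} \;=\; \frac{1}{l^{2}} \int_{(j-1)l}^{jl} \int_{(i-1)l}^{il} f(x, y)\, \mathrm{d}x\, \mathrm{d}y
```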
Let l denote the edge length of the high-resolution pixel, where i and j are natural numbers that index the location of the pixel. The function describes the distribution of light intensity in the two-dimensional spatial domain. An ideal high-resolution image , composed of and high-resolution pixels along the horizontal and vertical dimensions, respectively, can be expressed as:
Here, m and n are positive integers, and z represents the degradation factor, an integer of at least 2 by which the image is downscaled and subsequently upscaled during super-resolution. The resolution of the high-resolution image is × , occupying a sensor spatial dimension of × . However, in practical imaging, we cannot obtain the ideal high-resolution image but can only acquire an actual low-resolution image with a larger pixel size:
The resolution of a low-resolution image is m × n, occupying a sensor area of × . The pixels of the actually acquired low-resolution image are denoted as :
According to the theory of pixel binning, can be expressed as the arithmetic mean of high-resolution pixels. The relationship between the ideal high-resolution image and the actual acquired low-resolution image is illustrated in Figure 2.
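The binning relationship can be illustrated with a minimal sketch, assuming the high-resolution image is available as a NumPy array (which it is not in practice; it is used here only to show the forward model). The function name `bin_pixels` and the 4 × 4 test image are hypothetical.

```python
import numpy as np

def bin_pixels(hr, z):
    """Average non-overlapping z-by-z blocks of an HR image (pixel binning)."""
    height, width = hr.shape
    if height % z or width % z:
        raise ValueError("image dimensions must be multiples of z")
    # Split each axis into (block index, position inside block), then average
    # over the two inside-block axes.
    return hr.reshape(height // z, z, width // z, z).mean(axis=(1, 3))

hr = np.arange(16, dtype=float).reshape(4, 4)  # toy HR image, values 0..15
lr = bin_pixels(hr, 2)                         # each LR pixel = mean of 4 HR pixels
print(lr)
```

Each entry of `lr` is the arithmetic mean of the four high-resolution pixels it covers, for example (0 + 1 + 4 + 5) / 4 = 2.5 for the top-left pixel.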
Figure 2 shows the established image coordinate system, which takes the top-left corner of the image as the origin, with the positive x-direction to the right and the positive y-direction downward. Based on the acquired low-resolution image , subpixel images and further downsampled pixel images can be derived through offset sampling and downsampling techniques. The subpixel image can be expressed as
A subpixel is defined as an image unit acquired by introducing an in-plane displacement, at a subpixel scale, of the imaging sensor relative to the scene. Formally, is the image shifted by along the x-direction, and is the image shifted by along the y-direction. The corresponding subpixel can be expressed as:
The subpixel can also be expressed as the arithmetic mean of high-resolution pixels. The relationship between the actual captured low-resolution images and the subpixel-shifted images is shown in Figure 3.
In addition to subpixel images, the downsampled image can be expressed as:
A downsampled pixel refers to a larger sensor pixel formed by merging multiple adjacent sensor pixels, which inherently corresponds to a lower spatial resolution. In this context, and represent the images produced by downsampling along the x and y axes, respectively, while denotes the image downsampled along both directions. Here, is a positive integer that exceeds z, acting as the downsampling factor. The downsampled pixels can be expressed as:
Pixels downsampled along either the x- or y-direction alone can be represented as the arithmetic mean of high-resolution pixels, while pixels downsampled in both directions correspond to the arithmetic mean of high-resolution pixels. The relationship between actually captured low-resolution images and the downsampled images is shown in Figure 4.
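The three observation types described above (base, subpixel-shifted, and downsampled) can be simulated with one helper, sketched below. This is an illustration under stated assumptions rather than the acquisition procedure itself: shifts are expressed in whole high-resolution pixels, the image is simply cropped at the border, and the name `observe` and its parameters are hypothetical.

```python
import numpy as np

def observe(hr, fy, fx, sy=0, sx=0):
    """Shift by (sy, sx) HR pixels, then average non-overlapping fy-by-fx blocks."""
    shifted = hr[sy:, sx:]
    rows = (shifted.shape[0] // fy) * fy   # crop to a whole number of blocks
    cols = (shifted.shape[1] // fx) * fx
    cropped = shifted[:rows, :cols]
    return cropped.reshape(rows // fy, fy, cols // fx, fx).mean(axis=(1, 3))

hr = np.arange(36, dtype=float).reshape(6, 6)  # toy HR image, hr[r, c] = 6*r + c

base = observe(hr, 2, 2)           # each pixel averages 2*2 = 4 HR pixels
shift_x = observe(hr, 2, 2, sx=1)  # same pixel size, shifted one HR pixel along x
down_x = observe(hr, 2, 3)         # wider along x: averages 2*3 = 6 HR pixels
down_xy = observe(hr, 3, 3)        # larger in both directions: 3*3 = 9 HR pixels
```

With a degradation factor of 2 and a downsampling factor of 3, the pixel counts 4, 6, and 9 correspond to the base or shifted pixels, the pixels downsampled along one axis, and the pixels downsampled along both axes.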
Acquiring a high-resolution image (2) directly is not practical in real-world scenarios. However, various types of low-resolution images can be obtained efficiently. As described in (6) and (8), each low-resolution pixel can be expressed as the arithmetic mean of a set of high-resolution pixels. Unlike existing studies that compute low-resolution pixels from known high-resolution pixels, our objective is to recover unknown high-resolution pixels using only the observed low-resolution measurements. A single equation derived from one low-resolution pixel is insufficient for this purpose. However, by jointly solving multiple equations that express different low-resolution pixels as linear combinations of high-resolution pixels, the rank of the resulting system can be increased until the high-resolution pixels become recoverable. To this end, this study introduces a low-resolution image combination search algorithm, as shown in Figure 5.
The objective of this algorithm is to search through an infinite number of low-resolution images and identify a combination that can equivalently represent a high-resolution image. According to the formula, an image is essentially a set of pixels, and there is a one-to-one correspondence between them. Therefore, in our search algorithm, pixels, rather than the entire image, are treated as the basic search unit. In this work, the adaptive search algorithm was implemented in Python 3.13 and executed on a desktop equipped with an Intel Core i5 CPU, 8 GB RAM (Intel, Santa Clara, CA, USA), and an NVIDIA GeForce GT 1030 GPU (Nvidia, Santa Clara, CA, USA), completing the computation in 1.26 s. The search algorithm proceeds through three calculation steps.
Step 1: Beginning with the low-resolution pixel , the process generates subpixels ( ) and downsampled pixels ( ) depending on , , and . Express each low-resolution pixel as a summation of high-resolution pixels, and reformulate all summation constraints into a linear system , where each row of A corresponds to one summation operator, and X denotes the vector of distinct high-resolution pixels.
Step 2: Calculate the rank of matrix A. If the rank is greater than or equal to the number of elements in X, the current parameters are stored and the search stops. Otherwise, if the rank has increased relative to the previous iteration, the current parameters are added to the search record.
Step 3: The parameters , , and are updated, new low-resolution pixels and are generated, and the above procedure is repeated until the stopping condition is satisfied.
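The three steps can be sketched as a greedy rank-driven loop. This is a simplified illustration, not the published implementation: it assumes shifts of whole high-resolution pixels, treats one k × k block of high-resolution pixels as the set of unknowns, and tries only pixel sizes z and k along each axis. The names `window_row`, `search_combination`, and `max_factor` are hypothetical.

```python
import itertools
import numpy as np

def window_row(k, top, left, height, width):
    """Indicator row: which HR pixels of a k-by-k unit one LR pixel sums over."""
    mask = np.zeros((k, k))
    mask[top:top + height, left:left + width] = 1.0
    return mask.ravel()

def search_combination(z, max_factor=8):
    """Return (k, parameters, A) for the first full-rank set of LR pixels found."""
    for k in range(z + 1, max_factor + 1):           # Step 3: update parameters
        rows, params, rank = [], [], 0
        for height, width in itertools.product((z, k), repeat=2):
            for top in range(k - height + 1):
                for left in range(k - width + 1):
                    # Step 1: express the candidate LR pixel as a row of A.
                    candidate = rows + [window_row(k, top, left, height, width)]
                    # Step 2: keep the candidate only if the rank increases.
                    new_rank = np.linalg.matrix_rank(np.array(candidate))
                    if new_rank > rank:
                        rows, rank = candidate, new_rank
                        params.append((height, width, top, left))
                    if rank == k * k:                # as many equations as unknowns
                        return k, params, np.array(rows)
    raise RuntimeError("no full-rank combination found within max_factor")

k, params, A = search_combination(z=2)
print(k, len(params))
for height, width, top, left in params:
    print(f"size {height}x{width}, offset ({top}, {left})")
```

For z = 2 the sketch stops at k = 3 with nine rows: four pixels of size 2 × 2 (the base pixel and three shifted copies), two of size 2 × 3, two of size 3 × 2, and one of size 3 × 3. The count agrees with the nine low-resolution images reported for this setting, although the ordering here need not match Table 1.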
According to the search algorithm, when employing a super-resolution factor of , the associated sequence of low-resolution pixels is: , , , , , , , , , as detailed in Table 1. The first and second columns list the nine low-resolution pixels that need to be collected. The third column shows their corresponding degradation parameters. Every element in this sequence is linked to a distinct low-resolution image. Consequently, reconstructing the desired high-resolution image is feasible by strategically obtaining just nine low-resolution images.
2.2. Simulation
To evaluate the efficacy of the proposed search method, a simulation was performed with a randomly generated high-resolution image. The image is a standard 8-bit grayscale image with pixel values drawn at random from 0 to 255. This setup provides a consistent and unbiased basis for the computational procedures, ensuring a robust assessment of the method’s general applicability. The low-resolution image, referred to as , is obtained by applying a twofold degradation to the high-resolution pixel data. According to the subpixel displacement parameter obtained by the search algorithm, other subpixel images , , and are collected. Based on the downsampling parameter , the corresponding downsampled images are acquired. Both the subpixel images and downsampled images are depicted within the blue dashed boxes in Figure 6.
In Figure 6, the pixel sizes vary according to their resolutions. Pixels of higher resolutions are characterized by smaller sizes, whereas those with lower resolutions are larger. We identify the largest downsampled pixel, as marked by the red bounding box, as the basic computational unit. Each computational unit corresponds to nine high-resolution pixels, denoted to , organized in row-major order from top left to bottom right. Consequently, we can establish a system of linear equations represented by . In this formulation, represents the coefficient matrix, is the vector containing the unknown variables, and is the vector of constants. The explicit form of the coefficient matrix is provided:
The representation of the vector of unknown variables is:
The expression of the vector of constant terms is shown in (11).
The elements of the constant vector are derived by multiplying each low-resolution pixel value by a scaling factor. This factor is determined by the ratio of the area of the low-resolution pixel to that of the high-resolution pixel. Then we can obtain the solution:
For convenience, we rearrange through into a matrix here. It should be noted that these results were obtained using only the low-resolution images (Figure 6) as input, whereas the provided high-resolution images were not involved in the reconstruction process.
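The construction and solution of the nine-equation system for one computational unit can be reproduced numerically with the sketch below. It assumes a twofold degradation, a downsampling factor of 3, and shifts of one high-resolution pixel; the random 3 × 3 block stands in for the unknown high-resolution pixels and is used only to synthesize the low-resolution measurements and to check the result. The row ordering and variable names are illustrative and need not match the coefficient matrix given above.

```python
import numpy as np

rng = np.random.default_rng(0)                        # arbitrary seed
hr = rng.integers(0, 256, size=(3, 3)).astype(float)  # ground truth, unknown in practice

# Extent of a LR pixel along one axis, in HR pixel indices:
# base (0..2), shifted by one HR pixel (1..3), downsampled (0..3).
spans = [(0, 2), (1, 3), (0, 3)]

rows, constants = [], []
for r0, r1 in spans:
    for c0, c1 in spans:
        mask = np.zeros((3, 3))
        mask[r0:r1, c0:c1] = 1.0
        lr_value = hr[r0:r1, c0:c1].mean()       # simulated LR measurement
        rows.append(mask.ravel())                # one row of the coefficient matrix
        constants.append(lr_value * mask.sum())  # LR value times area ratio

A = np.array(rows)
b = np.array(constants)
recovered = np.linalg.solve(A, b).reshape(3, 3)  # row-major, top left to bottom right

print(np.allclose(recovered, hr))
```

Because the coefficient matrix is square and full rank, `np.linalg.solve` returns the nine high-resolution values up to floating-point error. With quantized or noisy measurements a least-squares solver such as `np.linalg.lstsq` may be the safer choice.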
To quantitatively evaluate the super-resolution results, we compare the reconstructed image (corresponding to through ) with the reference region marked by the red bounding box in Figure 6. The evaluation metrics are the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM), which are widely adopted in image super-resolution research. The PSNR calculation first requires computing the Mean Squared Error (MSE) between the reference image and the reconstructed image, averaged over the total number of pixels in the images, which adapts to the input image size:
For 8-bit images with a maximum pixel intensity of 255, the PSNR (in decibels) is then derived as:
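The MSE and PSNR computation described here follows the standard definition, PSNR = 10 log10(MAX² / MSE), and can be sketched as follows. The function name `psnr` and the toy arrays are hypothetical.

```python
import numpy as np

def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in decibels between two same-sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)   # mean over all pixels
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

reference = np.full((3, 3), 100.0)
off_by_one = reference + 1.0          # every pixel differs by 1, so MSE = 1
print(psnr(reference, off_by_one))    # equals 20 * log10(255)
```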
Higher PSNR values indicate better reconstruction fidelity. On the other hand, the SSIM metric evaluates perceptual quality through three comparative components: luminance (l), contrast (c), and structure (s):
where , denote local means and standard deviations, respectively, is the cross-covariance, and are stability constants. The composite SSIM index combines these components:
Following standard practice [22], we set and , with , . In this validation case study, the PSNR and SSIM are 39.6798 dB and 0.9991, respectively. According to the criteria established in [23], these results confirm that high-quality reconstruction can be achieved solely from low-resolution inputs without relying on high-resolution references. They further support the correctness of the low-resolution image combination search algorithm, which identifies an optimal subset from a theoretically infinite pool of candidate low-resolution image combinations to represent the target high-resolution image. Note that although the simulation demonstrates only a single case, the pixel values of the test image can be replaced arbitrarily. To further ensure the robustness of the conclusions, Appendix A provides a theoretical validation that does not rely on specific pixel values.
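For orientation, the widely used simplified form of SSIM can be sketched as below. Several choices here are assumptions: the statistics are computed over the whole image as a single window (reference implementations typically average over local windows), the constants use the customary defaults K1 = 0.01 and K2 = 0.03 with a dynamic range of 255, and the luminance, contrast, and structure terms are combined with equal exponents. The function name `global_ssim` is hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM: luminance, contrast, and structure in one expression."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2       # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2       # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

image = np.arange(9, dtype=float).reshape(3, 3)
flipped = image[::-1]                 # same pixels, rows reversed
print(global_ssim(image, image), global_ssim(image, flipped))
```

An image compared with itself scores 1, while the row-reversed copy scores lower because its covariance with the original is negative.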
3. Experiment and Results
3.1. Experimental Setup
An experimental setup was constructed, as illustrated in Figure 7, to validate the effectiveness of the proposed reference-free image super-resolution based on the low-resolution image search algorithm. The setup comprised a microscope equipped with interchangeable objective lenses (20×, 50×, and 100×), a digital camera, a motion driver, a computer, and a nano-positioning stage. The digital camera, mounted on the eyepiece port of the microscope, was used to capture image sequences. Both the camera and the motion driver were connected to the computer that was responsible for generating motion control signals and coordinating image acquisition.
The model and manufacturer of all equipment are detailed in Table 2.
The nano-positioning stage serves as the core actuating mechanism for subpixel-shifted image generation, with its key performance specifications summarized in Table 3. Driven by piezoelectric ceramics, this stage enables precise nanometer-level displacement of the observed samples, thereby providing a reliable motion control foundation for high-accuracy acquisition of subpixel-shifted images.
The microscope enables image acquisition at different magnifications (50× and 100×). Throughout our experiments, every low-resolution image was acquired at 50× magnification. High-resolution ground-truth images were captured separately at 100× magnification and served exclusively as the benchmark for quantitative evaluation; they did not participate in the reconstruction process.
The experiments consist of two parts: resolution detection using a calibration target and surface inspection of a semiconductor chip. The procedure was identical for both. First, under the 50× objective lens, the first image captured after focusing was used as the base image corresponding to (3). Then, the low-resolution image search algorithm was executed to generate acquisition positions. These positions were used to drive the nano-positioning stage, and additional low-resolution images were acquired through subpixel shifting and downsampling operations. Subsequently, a system of linear equations of the same form as in Section 2.2 was formulated, and the target HR pixels were obtained by solving it. Finally, these HR pixels were combined to reconstruct the super-resolution image. To establish performance benchmarks, additional reference-free methods were implemented as control experiments, including GAN-based reconstruction [15], a multi-image super-resolution (MISR) method [16], and classical interpolation algorithms [24]. These comparison methods were selected because they operate in reference-free settings and exploit multiple low-resolution images with subpixel shifts. In contrast, many recent reference-free SR approaches rely on single-image inputs or learned priors, which are not directly comparable to our acquisition model.
3.2. Results
The resolution performance was evaluated using the USAF 1951 resolution target, as shown in Figure 8. The target consists of multiple groups of line-pair patterns with gradually decreasing spacing, providing a precise and standardized reference for resolution assessment. Owing to its well-defined geometric progression and high manufacturing accuracy, the USAF target enables reliable benchmarking of the proposed method.
Figure 9 presents the super-resolution results of the USAF target. To facilitate visual comparison, the region corresponding to Group 7, Element 6—containing the finest line pairs—was selected as the region of interest (ROI) and magnified. The proposed method, the GAN-based method, and the MISR method successfully resolve the finest line-pair structure corresponding to 228 lp/mm. In contrast, the interpolation-based method suffers from significant blurring after magnification and fails to recover the line-pair details, while the MISR method introduces noticeable jagged artifacts along edges despite resolving the structure.
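The 228 lp/mm figure follows from the standard USAF 1951 relation, in which the spatial frequency of an element is 2^(group + (element − 1)/6) line pairs per millimeter. A short check, with hypothetical function names:

```python
def usaf_lp_per_mm(group, element):
    """Spatial frequency of a USAF 1951 element in line pairs per millimeter."""
    return 2.0 ** (group + (element - 1) / 6.0)

def usaf_bar_width_um(group, element):
    """Width of a single bar (half a line pair) in micrometers."""
    return 1000.0 / (2.0 * usaf_lp_per_mm(group, element))

finest = usaf_lp_per_mm(7, 6)         # Group 7, Element 6
bar_width = usaf_bar_width_um(7, 6)
print(f"{finest:.1f} lp/mm, bar width {bar_width:.2f} um")
```

Group 7, Element 6 therefore corresponds to roughly 228 lp/mm, or bars a little over 2 µm wide.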
The second experiment evaluates the method in the context of semiconductor surface inspection. The test sample is a BIWIN DDR4-2666 chip, a widely used high-performance memory component. Optical microscopy is commonly employed for chip inspection due to its non-contact and fast imaging characteristics, despite inherent resolution limitations. The imaging results obtained by different methods are shown in Figure 10.
In addition to the visual results, quantitative evaluations were performed using the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) metrics to measure reconstruction accuracy. The metrics for all methods in both experiments are summarized in Table 4. Here, the original image refers to the low-resolution image captured directly under the 50× objective and used as input prior to super-resolution; the interpolation baseline corresponds to classical bicubic interpolation.
4. Discussion
Across both the USAF 1951 target and DDR4 chip experiments, the proposed method consistently outperforms existing approaches in terms of reconstruction quality. Compared with the best-performing baseline, PSNR increases by 27.33% and 28.10%, while SSIM increases by 44.64% and 50.98% for the two experiments, respectively. These improvements demonstrate that the proposed method not only enhances numerical accuracy but also preserves structural information more effectively.
Notably, the improvement in SSIM is more pronounced than that in PSNR. This difference can be attributed to the high-frequency quantization errors introduced during the estimation of high-resolution pixels. Since SSIM focuses on structural similarity and perceptual fidelity, it is less sensitive to such high-frequency noise. In contrast, PSNR directly reflects pixel-wise errors and therefore decreases more easily when quantization artifacts appear. This observation highlights that the proposed method yields perceptually faithful reconstructions even in the presence of minor high-frequency deviations. Overall, both subjective visual assessment and objective metrics demonstrate that the method achieves superior reconstruction performance in resolving fine structures and recovering surface details, validating its effectiveness in microscopy imaging and semiconductor inspection applications.
5. Conclusions
This paper presents a novel reference-free image super-resolution method based on a low-resolution image combination search algorithm. By constructing a unified degradation model that encompasses subpixel-shifted images, downsampled images, and their high-resolution counterparts, a linear combination-based optimization strategy is developed to search and integrate low-resolution observations. This enables accurate representation and reconstruction in the absence of high-resolution references. The experimental results validate the effectiveness of the proposed method in different test scenarios. In the USAF 1951 resolution target experiment, the proposed method improved PSNR and SSIM by 27.33% and 44.64%, respectively, achieving a resolution of 228 line pairs per millimeter (lp/mm). Similarly, in the DDR4 chip imaging experiment, PSNR and SSIM increased by 22.36% and 40.38%, respectively. These results demonstrate the method’s ability to address the critical challenge of missing high-resolution priors in unsupervised super-resolution reconstruction.
References
- [1] Umirzakova S., Ahmad S., Khan L.U., Whangbo T. Medical image super-resolution for smart healthcare applications: A comprehensive survey. Inf. Fusion 2024, 103, 102075. doi:10.1016/j.inffus.2023.102075
- [2] Li H., Deng W., Zhu Q., Guan Q., Luo J. Local-global context-aware generative dual-region adversarial networks for remote sensing scene image super-resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5402114. doi:10.1109/TGRS.2024.3355419
- [3] Zhang Z., Guan D., Koc C.K., Luo X. Adapt PFL: Unlocking cross-device palmprint recognition via adaptive personalized federated learning with feature decoupling. Proceedings of the 34th International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 16–22 August 2025; pp. 7074–7082.
- [4] Zheng H., Heng H., Yan Z., Zeng K., Fang J., Qiang B. A Generic Multicorrespondence Matching Framework for Reference-Based Image Super-Resolution. IEEE Trans. Instrum. Meas. 2024, 73, 5025911. doi:10.1109/TIM.2024.3428600
- [5] Min J., Lee Y., Kim D., Yoo J. Bridging the Domain Gap: A Simple Domain Matching Method for Reference-Based Image Super-Resolution in Remote Sensing. IEEE Geosci. Remote Sens. Lett. 2024, 21, 8000105. doi:10.1109/LGRS.2023.3336680
- [6] Hayat M., Aramvith S. Saliency-aware deep learning approach for enhanced endoscopic image super-resolution. IEEE Access 2024, 12, 83452–83465. doi:10.1109/ACCESS.2024.3402953
- [7] Jiang Y., Chan K.C.K., Wang X., Loy C.C., Liu Z. Reference-Based Image and Video Super-Resolution via C²-Matching. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 8874–8887. doi:10.1109/TPAMI.2022.3231089. PMID: 37015431
- [8] Lee H., Yoo J.-S., Jung S.-W. RefQSR: Reference-Based Quantization for Image Super-Resolution Networks. IEEE Trans. Image Process. 2024, 33, 2823–2834. doi:10.1109/TIP.2024.3385276. PMID: 38598375
