Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
Evgeney Bogatyrev, Khaled Abud, Ivan Molodetskikh, Nikita Alutis, and Dmitriy Vatolin

TL;DR
This paper introduces StreamSR, a new dataset for benchmarking real-time video super-resolution; proposes EfRLFN, an efficient model optimized for streaming content; and shows that fine-tuning on StreamSR improves results on standard benchmarks.
Contribution
The paper presents a new comprehensive dataset, StreamSR, and a novel real-time super-resolution model EfRLFN, along with fine-tuning strategies for improved performance.
Findings
EfRLFN achieves better visual quality and runtime performance.
Fine-tuning models on StreamSR improves results on standard benchmarks.
The dataset reflects real-world streaming scenarios more accurately.
Abstract
Recent advancements in real-time super-resolution have enabled higher-quality video streaming, yet existing methods struggle with the unique challenges of compressed video content. Commonly used datasets do not accurately reflect the characteristics of streaming media, limiting the relevance of current benchmarks. To address this gap, we introduce a comprehensive dataset - StreamSR - sourced from YouTube, covering a wide range of video genres and resolutions representative of real-world streaming scenarios. We benchmark 11 state-of-the-art real-time super-resolution models to evaluate their performance for the streaming use-case. Furthermore, we propose EfRLFN, an efficient real-time model that integrates Efficient Channel Attention and a hyperbolic tangent activation function - a novel design choice in the context of real-time super-resolution. We extensively optimized the…
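The abstract names two ingredients of EfRLFN: Efficient Channel Attention (ECA) and a hyperbolic tangent activation. The following is a minimal pure-Python sketch of the general ECA idea (per-channel global average pooling, a small 1D convolution across the channel axis, and a sigmoid gate that rescales each channel), followed by an element-wise tanh. It is an illustration only, not the authors' implementation: the function names, the toy input, the kernel weights, and the order in which tanh and ECA are applied are all assumptions, since this summary does not say where either component sits inside EfRLFN.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list of floats) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def conv1d_same(values, kernel):
    """1D cross-correlation along the channel axis with zero padding.

    Assumes an odd kernel length so the output has one value per channel.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eca(feature_maps, kernel):
    """Rescale each channel by a gate computed from nearby channels' means."""
    pooled = global_average_pool(feature_maps)
    gates = [sigmoid(v) for v in conv1d_same(pooled, kernel)]
    return [
        [[gate * px for px in row] for row in channel]
        for gate, channel in zip(gates, feature_maps)
    ]


def tanh_activation(feature_maps):
    """Apply tanh element-wise to every value in every channel."""
    return [
        [[math.tanh(px) for px in row] for row in channel]
        for channel in feature_maps
    ]


# Toy input: 3 channels of 2x2 features (hypothetical values).
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]

# Hypothetical kernel weights; in a trained model these would be learned.
kernel = [0.0, 1.0, 0.0]

# Assumed ordering for illustration: activation first, then channel attention.
out = eca(tanh_activation(features), kernel)
```

With the identity-like kernel used here, each channel's gate depends only on its own pooled mean; a kernel with non-zero neighbours would let adjacent channels influence each other, which is the point of running the convolution across the channel axis.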
Peer Reviews
Decision: ICLR 2026 Poster
1. Novelty: The new StreamSR dataset is a valuable contribution, addressing a need for training/evaluating SR on real-world streaming content. It comprises 5.2K YouTube videos (25–30s clips) with aligned low/high-resolution pairs at 360p→1440p (4×) and 720p→1440p (2×) scales. Unlike prior SR datasets which often use pristine images synthetically downscaled (e.g. DIV2K, REDS), StreamSR provides naturally compressed low-resolution frames containing authentic streaming artifacts (from common codecs
1. Generalization Concerns: While StreamSR is a strong step toward real-world data, there are some questions about generalization. The dataset and experiments focus solely on YouTube videos and the prevalent codecs used there (VP9, H.264, AV1). This covers a large portion of online content, but it may not generalize to other domains or compression formats. For example, professional streaming or broadcast uses codecs like HEVC or VVC; these might produce different artifacts, and it’s unclear if m
1. The paper addresses a timely and practical problem: enhancing low-quality compressed videos in real-world streaming scenarios. By introducing both a new dataset and benchmark, it highlights the limitations of current SR methods under realistic conditions.
2. The StreamSR dataset fills an important gap by providing a large-scale collection of real YouTube videos with natural compression artifacts, offering a more realistic evaluation platform than synthetic benchmarks.
3. EfRLFN achieves compe
1. The paper focuses on real-time SR for compressed streaming videos, but the related work primarily discusses general real-time SR methods, with little discussion of existing compressed VSR approaches. Important works in this domain, such as *Learning Degradation-Robust Spatiotemporal Frequency-Transformer for Video Super-Resolution*, are not adequately reviewed. A dedicated discussion on compressed VSR literature is needed.
2. The main claimed contribution is the StreamSR dataset, yet no sample
Dataset Significance and Quality: The paper's most valuable contribution is the StreamSR dataset. The authors correctly argue that popular datasets like DIV2K and Vimeo90K do not adequately represent the challenges of real-world streaming media. The effort to collect, filter, and curate a 5,200-video dataset with associated low- and high-resolution pairs from a streaming source is substantial and addresses a clear need in the community.
Comprehensive Benchmarking: The second major strength is
Limited Architectural Novelty: The primary weakness is the limited originality of the proposed EfRLFN model. The architecture appears to be a thoughtful and effective combination of existing lightweight techniques rather than a novel design. The key modifications, such as replacing ESA with ECA and using a tanh activation, are well motivated by prior work (e.g., SPAN) and common practices from recent NTIRE challenges. While the ablation studies (e.g., Table 3, Figure 5(b)) are detailed and confir
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Video Quality Assessment · Advanced Vision and Imaging
