# Leveraging Temporal Down-Sampling Structure and Spatio-Temporal Fusion for Efficient Video Coding

**Authors:** Keren He, Yufei Gao, Qi Wang, Haixin Wang, Jinjia Zhou

PMC · DOI: 10.3390/s26051522 · Sensors (Basel, Switzerland) · 2026-02-28

## TL;DR

This paper proposes a down-sampling-based video compression method that down-samples only intermediate frames, keeps key frames at high quality for reference, and enhances decoded frames with a Multi-scale Temporal-Spatial Attention (MTSA) module.

## Contribution

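A temporal down-sampling system that preserves key frames while down-sampling intermediate frames, paired with a decoder-side frame-recurrent enhancement stage whose fusion step uses the MTSA module (Multi-Temporal Attention plus Pyramid Spatial Attention).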

## Key findings

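- The proposed method achieves BD-rate reductions of 14% to 39% relative to VVC for I, P, and B frames under the All-Intra, Low-Delay-P, and Random Access configurations.
- It also outperforms prior approaches that are anchored on standard HEVC.
- Within MTSA, MTA models temporal correlation at multiple scales and PSA combines local spatial saliency with contextual structure in progressive stages.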
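
For readers unfamiliar with the metric, a BD-rate figure is the average bitrate change between two rate-distortion curves at equal quality. The sketch below is a simplified illustration, not the procedure used in the paper: it swaps the usual cubic fit for piecewise-linear interpolation of log-bitrate over PSNR, and the function name `bd_rate_linear` and the sample points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the interpolation range")


def bd_rate_linear(anchor, test, steps=1000):
    """Approximate BD-rate in percent (negative means the test codec saves bits).

    anchor, test: lists of (bitrate, psnr) points with distinct PSNR values.
    Simplification (assumption): linear interpolation instead of a cubic fit.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_log = [p[1] for p in a], [math.log(p[0]) for p in a]
    t_q, t_log = [p[1] for p in t], [math.log(p[0]) for p in t]

    # Integrate only over the PSNR range covered by both curves.
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("the two curves share no PSNR range")

    # Trapezoidal rule on the log-bitrate difference.
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        q = min(hi, lo + i * h)  # clamp to guard against rounding past hi
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (_interp(q, t_q, t_log) - _interp(q, a_q, a_log))
    mean_log_diff = total * h / (hi - lo)
    return (math.exp(mean_log_diff) - 1.0) * 100.0


# Hypothetical rate-distortion points: (bitrate in kbps, PSNR in dB).
anchor_points = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test_points = [(rate * 0.8, psnr) for rate, psnr in anchor_points]
saving = bd_rate_linear(anchor_points, test_points)
```

With these made-up points the test curve needs 20% fewer bits at every quality level, so `saving` comes out at about -20. A reported reduction of 14% to 39% corresponds to values between -14 and -39 on this scale.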

## Abstract

Down-sampling-based video compression frameworks have shown great potential in improving compression efficiency in modern sensing and imaging systems. However, existing methods ignore critical spatial and temporal redundancy and treat all frames uniformly during down-sampling. This leads to the loss of important information and reduces compression efficiency. To address these limitations, this paper proposes a temporal down-sampling system in which only intermediate frames are down-sampled, while key frames are preserved at high quality for reference. On the decoding side, we employ a frame-recurrent enhancement mechanism to maximize the use of temporal redundancy information. For the fusion step of the enhancement stage, we design a Multi-scale Temporal-Spatial Attention (MTSA) module. MTSA consists of two components: Multi-Temporal Attention (MTA) and Pyramid Spatial Attention (PSA). MTA performs multi-scale temporal correlation modeling, expanding the receptive field and providing stable cues in compressed regions. PSA integrates local spatial saliency and contextual structure in a progressive, multi-stage manner. Extensive experiments show that our approach achieves consistent BD-rate reductions. Under the All-Intra, Low-Delay-P, and Random Access configurations, we observe BD-rate reductions for I, P, and B frames ranging from 14% to 39% compared to VVC, and our approach outperforms prior approaches anchored on standard HEVC.
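
The frame selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the key-frame spacing, the scaling factor, or the filter, so `key_interval`, the factor of 2, and the 2x2 average pooling in `downsample_2x` are all hypothetical choices.

```python
def downsample_2x(frame):
    """Halve a frame in both dimensions by 2x2 average pooling (assumed filter).

    frame: 2-D list of sample values; a trailing odd row or column is dropped.
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def temporal_downsample(frames, key_interval=4):
    """Keep key frames at full resolution and down-sample the frames between them.

    key_interval is a hypothetical parameter: every key_interval-th frame
    is treated as a key frame.
    """
    prepared = []
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            prepared.append(("key", frame))  # preserved for reference
        else:
            prepared.append(("intermediate", downsample_2x(frame)))
    return prepared


# Five hypothetical 4x4 frames filled with a constant value each.
frames = [[[float(n)] * 4 for _ in range(4)] for n in range(5)]
prepared = temporal_downsample(frames, key_interval=4)
kinds = [kind for kind, _ in prepared]
```

Here frames 0 and 4 stay at 4x4 and frames 1 to 3 are reduced to 2x2 before encoding. In the paper's pipeline, the decoder would then restore the intermediate frames through the frame-recurrent enhancement stage, which this sketch does not model.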

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12987268/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12987268/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12987268/full.md

---
Source: https://tomesphere.com/paper/PMC12987268