LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation
Xiaoshan Wu, Xiaoyang Lyu, Yifei Yu, Bo Wang, Zhongrui Wang, Xiaojuan Qi

TL;DR
LiFR-Seg introduces an event-guided propagation framework for real-time semantic segmentation that maintains high accuracy at arbitrary times using only a single past RGB frame and a stream of event data, bridging the perceptual gaps between frames in dynamic scenes.
Contribution
The paper presents a novel event-guided semantic segmentation method that propagates deep features over time with uncertainty-aware warping and memory attention, enabling high-frame-rate performance from low-frame-rate inputs.
Findings
Achieves 73.82% mIoU on DSEC, close to the high-frame-rate upper bound.
Validates effectiveness on a new synthetic high-frequency benchmark.
Outperforms existing low-frame-rate (LFR) segmentation methods in dynamic scenes.
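For readers unfamiliar with the metric behind the DSEC figure, the following is a minimal sketch of mean intersection-over-union (mIoU) computed from a confusion matrix. It is a generic NumPy illustration, not the paper's evaluation code; the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that appear in the prediction or the ground truth are assumptions.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU between two integer label maps of the same shape.

    Assumes every prediction lies in [0, num_classes); ground-truth pixels
    equal to ignore_index (a hypothetical convention) are skipped.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter

    # Average only over classes that occur in either map.
    present = union > 0
    return float((inter[present] / union[present]).mean())

if __name__ == "__main__":
    pred = [[0, 0], [1, 1]]
    target = [[0, 1], [1, 255]]
    print(f"mIoU = {mean_iou(pred, target, num_classes=2):.2f}")
```

Benchmarks usually accumulate the confusion matrix over the whole test set before dividing, rather than averaging per-image scores; the sketch handles a single pair of maps only.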
Abstract
Dense semantic segmentation in dynamic environments is fundamentally limited by the low-frame-rate (LFR) nature of standard cameras, which creates critical perceptual gaps between frames. To solve this, we introduce Anytime Interframe Semantic Segmentation: a new task for predicting segmentation at any arbitrary time using only a single past RGB frame and a stream of asynchronous event data. This task presents a core challenge: how to robustly propagate dense semantic features using a motion field derived from sparse and often noisy event data, all while mitigating feature degradation in highly dynamic scenes. We propose LiFR-Seg, a novel framework that directly addresses these challenges by propagating deep semantic features through time. The core of our method is an uncertainty-aware warping process, guided by an event-driven motion field and its learned, explicit confidence. A…
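The uncertainty-aware warping described in the abstract can be illustrated with a small sketch of confidence-weighted forward warping, in the spirit of the Softmax Splatting that a reviewer names as one of the method's building blocks. This is a NumPy stand-in under stated assumptions, not the authors' implementation: the function name `softmax_splat`, the array layouts, and the use of `exp(score)` as the per-pixel weight are illustrative choices, and in the paper the motion field and confidence come from learned networks rather than hand-set arrays.

```python
import numpy as np

def softmax_splat(feat, flow, score):
    """Forward-warp features with confidence weighting (illustrative sketch).

    feat:  (C, H, W) feature map at the source time
    flow:  (2, H, W) per-pixel displacement (x, y) toward the target time
    score: (H, W) confidence logit; higher values dominate where pixels collide
    Returns the warped (C, H, W) features and the (H, W) accumulated weight.
    """
    C, H, W = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    tx, ty = xs + flow[0], ys + flow[1]          # sub-pixel target positions
    x0, y0 = np.floor(tx).astype(int), np.floor(ty).astype(int)

    # Subtracting the max is for numerical stability; it cancels on division.
    w = np.exp(score - score.max())

    num = np.zeros((C, H * W))
    den = np.zeros(H * W)
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            # Bilinear weight of this corner of the target cell.
            bw = (1 - np.abs(tx - xi)) * (1 - np.abs(ty - yi))
            ok = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
            idx = (yi * W + xi)[ok]
            wt = (w * bw)[ok]
            np.add.at(den, idx, wt)              # unbuffered scatter-add
            for c in range(C):
                np.add.at(num[c], idx, wt * feat[c][ok])

    out = num / np.maximum(den, 1e-8)            # locations with no hits stay 0
    return out.reshape(C, H, W), den.reshape(H, W)

if __name__ == "__main__":
    feat = np.array([[[1.0, 5.0, 0.0]]])         # C=1, H=1, W=3
    flow = np.zeros((2, 1, 3))
    flow[0, 0, 0] = 1.0                          # pixel 0 moves onto pixel 1
    score = np.array([[0.0, 10.0, 0.0]])         # pixel 1 is far more confident
    warped, weight = softmax_splat(feat, flow, score)
    print(warped[0, 0], weight[0])
```

Where two source pixels land on the same target location, the one with the higher score dominates the result, which is how a confidence map can suppress unreliable motion estimates. The returned weight map is zero wherever no source pixel landed, which exposes the holes that forward warping leaves behind.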
Peer Reviews
Decision: ICLR 2026 Poster
The paper tackles a well-defined and practically important problem at the interface of event-based sensing and dense semantic segmentation. The authors clearly articulate the “Anytime Interframe Segmentation” task, distinguishing it from standard video propagation or multi-modal fusion. The method is technically solid, combining event-driven motion estimation, uncertainty-weighted feature propagation, and temporal memory in a cohesive framework that respects causal and anytime constraints.
While the overall contribution is convincing, the conceptual novelty is limited. LiFR-Seg builds on well-known components, RAFT-style optical flow, Softmax Splatting, and memory-based refinement, and integrates them effectively rather than introducing a fundamentally new algorithmic idea. The contribution is thus primarily at the systems and task-definition level. Some implementation details remain underspecified. The design and update mechanism of the temporal memory module are described conce
The LiFR-Seg framework presents a practical, causal, and predictive formulation for real-world autonomous systems. Its core idea, propagating deep features instead of images or segmentation maps, is empirically validated as shown in Table 3. The uncertainty-guided propagation, described in Equation 4, effectively addresses the inherent noise and sparsity of event-based motion. Quantitative results closely match the high-frame-rate upper bound on DSEC with a 0.09 percent mIoU gap and exceed it on
The paper argues for being an "efficient paradigm" (Abstract, Appendix D), but this argument is based entirely on hardware (cost, power, bandwidth). It provides no analysis of computational cost (e.g., FLOPs, inference latency). The proposed LiFR-Seg framework involves running a feature encoder, a flow network, a ScoreNet, a splatting operation, and a temporal memory module. This is almost certainly more computationally expensive than the HFR baseline (which just runs a SegFormer). This is a cri
1. The paper introduces a novel task, "Anytime Interframe Semantic Segmentation," which addresses the critical "perceptual gap" in low-frame-rate (LFR) systems by enabling dense semantic segmentation at any arbitrary time. 2. LiFR-Seg proposes a robust framework that leverages the complementary strengths of RGB cameras (dense semantic context) and event cameras (high-temporal-resolution motion cues). The method’s core components, uncertainty-aware motion field estimation (Section 3.2), uncertainty
1. Eq. (3) employs a compact function composition that obscures the specific roles of, and data flow between, $\phi_{\text{joint}}$, $F_{\text{SED}}$, and $\phi_{\text{out}}$. A stepwise breakdown would improve readability and clarify its correspondence to the components in Figure 2(b). 2. The authors claim that the pipeline is causal; however, $E_{t+\delta t \rightarrow t+\Delta t}$ is fed into the 'Flow estimator' in Fig. 2, which is confusing. 3. The experiments lack detailed ablations for all proposed m
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Advanced Vision and Imaging
