GATS: Gaussian Aware Temporal Scaling Transformer for Invariant 4D Spatio-Temporal Point Cloud Representation

Jiayi Tian; Jiaze Wang

arXiv:2603.16154 · cs.CV · March 18, 2026

TL;DR

GATS introduces a dual-invariant framework that combines Gaussian-aware temporal scaling with uncertainty-guided Gaussian convolution to improve robustness and invariance in 4D point cloud video understanding.

Contribution

The paper proposes GATS, a novel approach that explicitly addresses distributional inconsistencies and temporal biases in 4D point cloud videos, enhancing robustness and invariance.

Findings

1. Improves accuracy on MSR-Action3D by +6.62%
2. Improves accuracy on NTU RGB+D by +1.4%
3. Increases mIoU on Synthia4D by +1.8%

Abstract

Understanding 4D point cloud videos is essential for enabling intelligent agents to perceive dynamic environments. However, temporal scale bias across varying frame rates and distributional uncertainty in irregular point clouds make it highly challenging to design a unified and robust 4D backbone. Existing CNN- or Transformer-based methods are constrained either by limited receptive fields or by quadratic computational complexity, while neglecting these implicit distortions. To address this problem, we propose a novel dual-invariant framework, termed Gaussian Aware Temporal Scaling (GATS), which explicitly resolves both distributional inconsistencies and temporal biases. The proposed Uncertainty Guided Gaussian Convolution (UGGC) incorporates local Gaussian statistics and uncertainty-aware gating into point convolution, thereby achieving robust neighborhood aggregation under…
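The abstract describes UGGC only at a high level. As a rough illustration of what "local Gaussian statistics plus uncertainty-aware gating" in a point neighborhood could look like, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a diagonal covariance per neighborhood, uses the total variance as the uncertainty measure, weights neighbors by a Gaussian of their Mahalanobis distance, and scales the aggregated feature by a sigmoid gate. The function names and the gate parameters `alpha` and `beta` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def gaussian_stats(neighbors: Sequence[Point], eps: float = 1e-6):
    """Per-axis mean and variance of a neighborhood (assumed diagonal Gaussian)."""
    n = len(neighbors)
    mean = [sum(p[d] for p in neighbors) / n for d in range(3)]
    # eps keeps the variance strictly positive for degenerate neighborhoods
    var = [sum((p[d] - mean[d]) ** 2 for p in neighbors) / n + eps for d in range(3)]
    return mean, var


def uncertainty_gate(var: Sequence[float], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical gate: larger total variance (more uncertain) -> smaller gate."""
    uncertainty = sum(var)
    return 1.0 / (1.0 + math.exp(-(alpha - beta * uncertainty)))


def uggc_aggregate(neighbors: Sequence[Point], features: Sequence[Sequence[float]],
                   alpha: float = 1.0, beta: float = 1.0) -> Tuple[List[float], float]:
    """Gaussian-weighted neighbor aggregation scaled by an uncertainty gate (sketch)."""
    mean, var = gaussian_stats(neighbors)
    # Weight each neighbor by exp(-0.5 * squared Mahalanobis distance to the local mean)
    weights = []
    for p in neighbors:
        m2 = sum((p[d] - mean[d]) ** 2 / var[d] for d in range(3))
        weights.append(math.exp(-0.5 * m2))
    total = sum(weights)
    dim = len(features[0])
    # Normalized weighted average: a convex combination of the neighbor features
    agg = [sum(w * f[c] for w, f in zip(weights, features)) / total for c in range(dim)]
    gate = uncertainty_gate(var, alpha, beta)
    return [gate * v for v in agg], gate


if __name__ == "__main__":
    nbrs = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    out, gate = uggc_aggregate(nbrs, feats)
    # Same neighborhood shape, spread 10x wider: higher variance, lower gate
    spread = [(10 * x, 10 * y, 10 * z) for x, y, z in nbrs]
    _, gate_spread = uggc_aggregate(spread, feats)
    print(out, gate, gate_spread)
```

In this sketch a tight neighborhood passes its aggregated feature through almost unchanged, while a scattered (high-variance) neighborhood is down-weighted by the gate. This is one simple way to make aggregation less sensitive to irregular point distributions; the paper's actual statistics and gating function may differ.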

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Advanced Vision and Imaging