TL;DR
FlashFPS is a framework that accelerates Farthest Point Sampling (FPS) in point-based neural networks (PNNs), by up to 5.16× on GPU. It prunes redundant computations and caches results, which makes large-scale point cloud processing more scalable and efficient.
Contribution
It introduces FlashFPS, a hardware-agnostic, plug-and-play FPS acceleration framework composed of two techniques, FPS-Prune and FPS-Cache. Together they reduce FPS latency without sacrificing accuracy.
Findings
Achieves a 5.16× speedup on GPU and a 2.69× speedup on PNN accelerators.
Reduces redundant FPS computations while maintaining sampling quality.
Integrated into existing CUDA libraries, enhancing large-scale point cloud processing efficiency.
Abstract
Point-based Neural Networks (PNNs) have become a key approach for point cloud processing. However, a core operation in these models, Farthest Point Sampling (FPS), often introduces significant inference latency, especially for large-scale processing. Despite existing CUDA- and hardware-level optimizations, FPS remains a major bottleneck due to exhaustive computations across multiple network layers in PNNs, which hinders scalability. Through systematic analysis, we identify three substantial redundancies in FPS, including unnecessary full-cloud computations, redundant late-stage iterations, and predictable inter-layer outputs that make later FPS computations avoidable. To address these, we propose FlashFPS, a hardware-agnostic, plug-and-play framework for FPS acceleration, composed of FPS-Prune and FPS-Cache. FPS-Prune introduces candidate…
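For readers unfamiliar with FPS, the following is a minimal NumPy sketch of the standard greedy farthest point sampling loop. It is a baseline, not FlashFPS. Each of the K iterations updates a distance for every one of the N input points, which is the full-cloud O(N·K) work the abstract describes. The sketch also checks one property that plausibly explains why inter-layer outputs are predictable. When a later layer runs FPS on an earlier layer's sampled points, kept in sampling order and starting from the same first point, it reproduces a prefix of the earlier result, barring exact distance ties. The function name, the fixed start index 0, the 3-D input, and the layer sizes are illustrative assumptions. This is not claimed to be how FPS-Cache actually works.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Standard greedy FPS over an (N, 3) array; returns k point indices.

    Illustrative baseline only (hypothetical helper, start index assumed 0).
    """
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be in [1, N]")
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # min_dist[i]: squared distance from point i to its nearest selected point.
    min_dist = np.full(n, np.inf)
    for j in range(1, k):
        diff = points - points[selected[j - 1]]
        # Full-cloud update: all N points are touched in every iteration.
        d = diff[:, 0] ** 2 + diff[:, 1] ** 2 + diff[:, 2] ** 2
        np.minimum(min_dist, d, out=min_dist)
        # The next sample is the point farthest from the current selected set.
        selected[j] = int(np.argmax(min_dist))
    return selected


# Two-layer example: layer 2 samples from layer 1's output (sizes are assumptions).
rng = np.random.default_rng(0)
cloud = rng.random((4096, 3))
idx1 = farthest_point_sampling(cloud, 1024)          # layer 1: 4096 -> 1024
layer1_points = cloud[idx1]                          # kept in sampling order
idx2 = farthest_point_sampling(layer1_points, 256)   # layer 2: 1024 -> 256

# Layer 2 selects the first 256 layer-1 samples, in the same order.
print(np.array_equal(idx2, np.arange(256)))
```

The property follows by induction. Each point that layer 1 picked is the argmax over the whole cloud, so it is also the argmax over any subset containing it. Layer 2 therefore has no new choice to make. A system could, in principle, read layer 2's result directly from layer 1 instead of recomputing it.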