TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing
Yavuz Selim Tozlu, Anshul Naithani, Huiyang Zhou

TL;DR
This paper introduces TTP, a hardware prefetcher that reduces memory latency in ray tracing by leveraging the existing traversal stacks, yielding a 1.48x average speedup with 98.92% average L1 prefetch accuracy.
Contribution
The paper presents a novel hardware prefetcher, TTP, that uses existing traversal stacks for accurate prefetching in ray tracing, improving performance with minimal overhead.
Findings
Achieves 1.48x average speedup in ray tracing workloads.
Reaches 98.92% average prefetch accuracy at the L1 cache.
Reduces L1 cache misses by 31.54% compared to baseline.
Abstract
Ray tracing (RT) is a 3D graphics technique that offers highly realistic visuals. It is becoming prominent and accessible as GPU vendors have integrated dedicated ray tracing acceleration hardware. However, tracing millions of rays through 3D scenes consisting of large numbers of triangles in real time is challenging and requires expensive hardware. The main bottleneck in RT workloads is the expensive traversal of the Bounding Volume Hierarchy (BVH), a large tree structure that encodes the 3D scene. BVH traversal is a memory-bound problem, as the GPU threads spend most of their time reading tree node data from memory. In this work, we attack the memory latency bottleneck of ray tracing through prefetching. We propose a novel hardware prefetcher, named Tree Traversal Prefetcher (TTP), for ray tracing. The main idea is to leverage the existing tree traversal stack in the RT units…
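To make the stack-based idea concrete, the following is a minimal software sketch, not the TTP hardware design itself, whose details are cut off in the abstract above. It assumes that every node index pushed onto the traversal stack will later be popped and read, so each push can double as a prefetch hint. The 1D intervals standing in for 3D bounding boxes, the idealized zero-latency `cache` set, and the names `Node`, `build`, and `traverse` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    lo: float                     # 1D bounding interval, standing in for a 3D box
    hi: float
    left: Optional[int] = None    # child indices into the node array
    right: Optional[int] = None


def build(lo: float, hi: float, depth: int, nodes: List[Node]) -> int:
    """Build a complete binary tree over [lo, hi) and return the root index."""
    idx = len(nodes)
    nodes.append(Node(lo, hi))
    if depth > 0:
        mid = (lo + hi) / 2
        nodes[idx].left = build(lo, mid, depth - 1, nodes)
        nodes[idx].right = build(mid, hi, depth - 1, nodes)
    return idx


def traverse(nodes: List[Node], root: int, q_lo: float, q_hi: float,
             prefetch: bool) -> Tuple[int, int]:
    """Return (nodes visited, demand misses) for one interval query."""
    cache: Set[int] = set()       # assumption: idealized cache, prefetches land instantly
    stack = [root]
    visited = misses = 0
    while stack:
        idx = stack.pop()
        if idx not in cache:      # demand access that was not covered by a prefetch
            misses += 1
            cache.add(idx)
        visited += 1
        node = nodes[idx]
        if node.hi <= q_lo or node.lo >= q_hi:
            continue              # bounding interval misses the query: prune subtree
        for child in (node.right, node.left):
            if child is not None:
                stack.append(child)
                if prefetch:
                    cache.add(child)  # a stack entry is a known future access
    return visited, misses


nodes: List[Node] = []
root = build(0.0, 8.0, 3, nodes)
print(traverse(nodes, root, 2.5, 3.5, prefetch=False))
print(traverse(nodes, root, 2.5, 3.5, prefetch=True))
```

In this toy model every prefetched node is eventually read, so no prefetch is wasted, and only the root access is a demand miss. Real hardware has to contend with fetch latency, cache capacity, and rays that terminate early, none of which the sketch models.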
