SLAP: A Split Latency Adaptive VLIW pipeline architecture which enables on-the-fly variable SIMD vector-length
Ashish Shrivastava, Alan Gatherer, Tong Sun, Sushma Wokhlu, and Alex Chandra

TL;DR
SLAP introduces a flexible VLIW pipeline architecture that adjusts SIMD vector length on the fly and decouples execution units, improving cache performance and execution efficiency in real-time signal processing systems.
Contribution
The paper presents SLAP, a VLIW pipeline architecture that enables on-the-fly variable SIMD vector length without modifying object code and improves performance over traditional lockstep VLIW designs.
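To make "variable vector length without modifying object code" concrete, the sketch below uses the generic strip-mining pattern familiar from vector-length-agnostic ISAs. It is an illustrative analogy, not SLAP's documented mechanism. `vl_max` is a hypothetical stand-in for a hardware vector length chosen at run time, and `vector_add` / `count_vector_ops` are hypothetical names. The same loop produces the same result for any `vl_max`. Only the number of vector operations issued changes.

```python
from typing import List


def vector_add(a: List[float], b: List[float], vl_max: int) -> List[float]:
    """Add two arrays in strips of at most vl_max lanes.

    vl_max is a hypothetical run-time vector length (an assumption for
    illustration, not part of SLAP's published interface). The loop body is
    identical for every vl_max, so one "binary" serves every vector width.
    """
    if vl_max < 1:
        raise ValueError("vl_max must be at least 1")
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        vl = min(vl_max, n - i)      # lanes active in this strip (tail may be short)
        for lane in range(vl):       # models one SIMD instruction over vl lanes
            out[i + lane] = a[i + lane] + b[i + lane]
        i += vl
    return out


def count_vector_ops(n: int, vl_max: int) -> int:
    """Vector instructions issued for n elements: ceil(n / vl_max)."""
    return -(-n // vl_max)


if __name__ == "__main__":
    x = [float(k) for k in range(10)]
    y = [1.0] * 10
    # Same result regardless of the vector length the hardware picks.
    assert vector_add(x, y, 4) == vector_add(x, y, 16)
    print(count_vector_ops(1000, 4), count_vector_ops(1000, 16))
```

For 1000 elements, a width of 4 issues 250 vector operations and a width of 16 issues 63. This illustrates why exposing a wider vector length to unchanged code can raise data parallelism.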
Findings
Improved cache performance demonstrated on wireless baseband traces.
Enables variable vector length for better data parallelism.
Removes the reliance on smart DMAs, and with it their overhead.
Abstract
Over the last decade, the relative latency of shared-memory access in multicore processors has increased, as wire resistance came to dominate latency and low-wire-density layouts pushed multiport memories farther from their ports. Various techniques, such as speculative prefetching and branch prediction, have been deployed to improve average memory access latency, but they often lead to high variance in execution time, which is unacceptable in real-time systems. Smart DMAs can copy data directly into a layer-1 SRAM, but they add overhead. The VLIW architecture, the de facto signal processing engine, suffers badly when lockstep execution of scalar and vector instructions breaks down. We describe the Split Latency Adaptive Pipeline (SLAP) VLIW architecture, a cache performance improvement technology that requires zero change to object code while removing smart DMAs and their overhead. SLAP builds on the…
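The cost of lockstep execution under variable memory latency can be seen with a toy cycle model. The sketch below compares a lockstep pipeline, where every bundle waits for its own load, with a generic decoupled pipeline, where a scalar/address side runs up to `depth` iterations ahead of the vector side. This is a decoupled access/execute style model chosen for illustration. It is not SLAP's published microarchitecture. The one-cycle vector op, the one-load-per-cycle issue rate, and the names `lockstep_cycles` and `decoupled_cycles` are all assumptions.

```python
from typing import List, Sequence


def lockstep_cycles(latencies: Sequence[int]) -> int:
    """Toy lockstep model: each bundle waits for its load, then does one
    cycle of vector work before the next bundle may issue."""
    t = 0
    for lat in latencies:
        t += lat + 1
    return t


def decoupled_cycles(latencies: Sequence[int], depth: int) -> int:
    """Toy decoupled model (an assumption, not SLAP's exact design).

    The scalar side issues one load per cycle and may run at most `depth`
    iterations ahead of the vector side. The vector side consumes load
    results in order, one cycle per iteration. depth=1 reduces to lockstep.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    done: List[int] = []  # cycle at which iteration i's vector op completes
    issue = -1
    for i, lat in enumerate(latencies):
        issue += 1
        if i >= depth:
            # Run-ahead queue is full: wait until iteration i-depth retires.
            issue = max(issue, done[i - depth])
        ready = issue + lat                         # load data arrives
        start = max(ready, done[-1] if done else 0)  # in-order vector consumption
        done.append(start + 1)
    return done[-1] if done else 0


if __name__ == "__main__":
    # Seven 1-cycle hits and one 20-cycle miss.
    lat = [1, 1, 1, 20, 1, 1, 1, 1]
    print("lockstep:", lockstep_cycles(lat))
    print("decoupled, depth 4:", decoupled_cycles(lat, 4))
```

With one 20-cycle miss among seven 1-cycle hits, the lockstep model takes 35 cycles and the decoupled model with a run-ahead depth of 4 takes 28. In the lockstep case every miss stalls all slots. In the decoupled case later loads overlap with the miss.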
