
TL;DR
This paper proposes a theoretical memory architecture that performs parallel processing internally, cutting the instruction cycle count of array operations and enabling efficient in-memory computation for data processing tasks.
Contribution
It proposes a memory with limited processing power and limited internal connectivity at each element. The memory performs array operations in parallel within itself, which reduces data movement.
Findings
Reduces the instruction cycle count of universal operations such as insertion/deletion and match finding to ~1.
Achieves ~√N instruction cycles for global operations such as sum and sorting on an array of N items.
Eliminates most data-processing streaming activity on the system bus.
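To make the ~1 cycle claim for match finding concrete, here is a minimal Python sketch. It is not the paper's design: the function name `broadcast_match` and the cost model (one broadcast instruction, one comparison per element, all in lockstep) are assumptions made purely for illustration.

```python
def broadcast_match(cells, key):
    """Model one SIMD instruction: every cell compares itself with a broadcast key.

    Hypothetical cost model: the comparison happens in all cells at once,
    so the cycle count is 1 regardless of how many cells there are.
    """
    # The list comprehension serialises what the memory would do simultaneously.
    flags = [value == key for value in cells]
    cycles = 1
    return flags, cycles


flags, cycles = broadcast_match([3, 7, 3, 1], key=3)
print(flags, cycles)
```

A conventional processor would stream all N items over the bus to do the same comparison, which is the traffic the summary says is mostly eliminated.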
Abstract
A theoretical memory with limited processing power and internal connectivity at each element is proposed. This memory carries out parallel processing within itself to solve generic array problems. The applicability of this in-memory finest-grain massive SIMD approach is studied in some detail. For an array of N items, it reduces the total instruction cycle count of universal operations such as insertion/deletion and match finding to ~1, local operations such as filtering and template matching to ~ the local operation size, and global operations such as sum, finding the global limit, and sorting to ~√N instruction cycles. It eliminates most streaming activities for data processing purposes on the system bus. Yet it remains general-purpose, easy to use, pin compatible with conventional memory, and practical for implementation.
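One way to see why a global operation could cost about √N steps under limited connectivity is sketched below. This is an illustration under stated assumptions, not the paper's method: it assumes the N elements are laid out as a √N by √N grid in which each element can only pass a value to an adjacent element, and the name `grid_sum_steps` is hypothetical.

```python
import math


def grid_sum_steps(values):
    """Sum N values on an assumed sqrt(N) x sqrt(N) grid with neighbour-only links.

    Returns (total, lockstep_steps). Each step stands for one instruction
    executed by many cells at once.
    """
    n = len(values)
    side = math.isqrt(n)
    if n == 0 or side * side != n:
        raise ValueError("this sketch assumes N is a non-zero perfect square")
    grid = [list(values[r * side:(r + 1) * side]) for r in range(side)]
    steps = 0
    # Phase 1: all rows, in lockstep, push partial sums one cell to the left.
    for col in range(side - 1, 0, -1):
        for row in grid:  # serialised here; simultaneous in the modelled memory
            row[col - 1] += row[col]
        steps += 1
    # Phase 2: the first column pushes partial sums one cell upward.
    for r in range(side - 1, 0, -1):
        grid[r - 1][0] += grid[r][0]
        steps += 1
    return grid[0][0], steps


total, steps = grid_sum_steps(list(range(16)))
print(total, steps)  # 120 6
```

Under these assumptions the step count is 2(√N − 1), which grows as √N, whereas a single processor adding the items one by one needs on the order of N steps.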
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems
