# Fast in-place accumulation

**Authors:** Jean-Guillaume Dumas (CASC), Bruno Grenet (CASC)

arXiv: 2302.13600 · 2025-11-07

## TL;DR

This paper introduces in-place algorithms, using only $O(1)$ extra space, for fast accumulated polynomial and matrix multiplication and for polynomial remainder. The algorithms may temporarily modify their inputs, provided they restore them afterwards, which lets them stay fast without large temporary buffers.

## Contribution

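The paper presents an automatic design of fast, in-place accumulating algorithms for any bilinear formula, which covers polynomial and matrix multiplication, and extends it to any linear accumulation of a collection of functions. To make this possible, the in-place model is relaxed: an algorithm may modify its inputs as long as it restores them to their initial state afterwards.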

## Key findings

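- Derived in-place accumulating algorithms for fast polynomial multiplication and Strassen-like matrix multiplication.
- Computed the polynomial remainder $R = A \bmod B$ in place with at most $O((n/m) M(m)\log(m))$ arithmetic operations, matching the not-in-place bound $O((n/m) M(m))$ when $M(n) = \Theta(n^{1+\epsilon})$ for some $\epsilon>0$.
- Proposed in-place variants for $A = A \bmod B$, $R += A \bmod B$ and $R += AC \bmod B$ (multiplication in a finite field extension), built on techniques for Toeplitz matrix operations, generalized convolutions, short products, and power series division and remainder.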
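
To make the in-place remainder concrete, here is a minimal baseline sketch, not the paper's fast algorithm: schoolbook long division that overwrites $A$ with $A \bmod B$ using only a constant number of extra scalars, at quadratic cost. The function name `rem_in_place`, the choice of a prime field $\mathrm{GF}(p)$, and the low-degree-first coefficient layout are all assumptions made for illustration.

```python
def rem_in_place(a, b, p):
    """Overwrite `a` with (a mod b) over GF(p), in place.

    Hypothetical helper for illustration. Coefficients are stored
    low degree first; the leading coefficient of `b` must be nonzero
    mod p. Only O(1) extra scalars are used (n, inv_lc, q, i, j).
    Cost is quadratic: this is the schoolbook baseline, not a fast method.
    """
    n = len(b) - 1                      # degree of b
    inv_lc = pow(b[n], -1, p)           # modular inverse (Python 3.8+)
    for i in range(len(a) - 1, n - 1, -1):
        q = a[i] * inv_lc % p           # next quotient coefficient
        for j in range(n + 1):
            # subtract q * x^(i-n) * b from a, cancelling the term a[i]
            a[i - n + j] = (a[i - n + j] - q * b[j]) % p


# x^3 + 2x + 1 mod (x^2 + 1) over GF(7) equals x + 1
a = [1, 2, 0, 1]
rem_in_place(a, [1, 0, 1], 7)
print(a)  # [1, 1, 0, 0]
```

After the call, the low `len(b) - 1` entries of `a` hold the remainder and the higher entries are zero. The paper's contribution is to reach the same in-place behaviour with fast multiplication instead of this quadratic loop.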

## Abstract

This paper deals with simultaneously fast and in-place algorithms for formulae where the result has to be linearly accumulated: some output variables are also input variables, linked by a linear dependency. Fundamental examples include the in-place accumulated multiplication of polynomials or matrices (that is, with only $O(1)$ extra space). The difficulty is to combine in-place computations with fast algorithms: those usually come at the expense of (potentially large) extra temporary space. We first propose a novel automatic design of fast and in-place accumulating algorithms for any bilinear formulae (and thus for polynomial and matrix multiplication) and then extend it to any linear accumulation of a collection of functions. For this, we relax the in-place model to any algorithm allowed to modify its inputs, provided that those are restored to their initial state afterwards. This allows us to derive in-place accumulating algorithms for fast polynomial multiplications and for Strassen-like matrix multiplications. We then consider the simultaneously fast and in-place computation of the Euclidean polynomial remainder $R = A \bmod B$. If $A$ and $B$ have respective degree $m+n$ and $n$, and $M(k)$ denotes the complexity of a (not-in-place) algorithm to multiply two degree-$k$ polynomials, our algorithm uses at most $O((n/m) M(m)\log(m))$ arithmetic operations. If $M(n) = \Theta(n^{1+\epsilon})$ for some $\epsilon>0$, then our algorithm matches the not-in-place complexity bound of $O((n/m) M(m))$. We also propose variants that compute, still in-place and with the same complexity bounds, $A = A \bmod B$, $R += A \bmod B$ and $R += AC \bmod B$, that is, multiplication in a finite field extension. To achieve this, we develop techniques for Toeplitz matrix operations, for generalized convolutions, short product and power series division and remainder whose output is also part of the input.
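
The relaxed in-place model can be illustrated on the smallest Karatsuba case. The following is a minimal sketch under stated assumptions, not the paper's general construction: it accumulates $C \mathrel{+}= A \cdot B$ for $A = a_0 + a_1 X$, $B = b_0 + b_1 X$ with three multiplications and no temporary array, by combining output slots linearly and by modifying `a[0]` and `b[0]` before restoring them. The function name `karatsuba2_accumulate` and the list-based layout are hypothetical.

```python
def karatsuba2_accumulate(a, b, c):
    """c += a * b for len(a) == len(b) == 2 and len(c) == 3, in place.

    Hypothetical illustration: 3 multiplications, no temporary buffer.
    Inputs a and b are modified during the call and restored at the end.
    """
    # c[1] -= a0*b0 and c[0] += a0*b0, computing a0*b0 only once
    c[1] += c[0]
    c[0] += a[0] * b[0]
    c[1] -= c[0]
    # c[1] -= a1*b1 and c[2] += a1*b1, computing a1*b1 only once
    c[1] += c[2]
    c[2] += a[1] * b[1]
    c[1] -= c[2]
    # c[1] += (a0 + a1) * (b0 + b1), with the sums stored in the inputs
    a[0] += a[1]
    b[0] += b[1]
    c[1] += a[0] * b[0]
    # restore the inputs to their initial state
    b[0] -= b[1]
    a[0] -= a[1]


a, b, c = [2, 3], [5, 7], [1, 1, 1]
karatsuba2_accumulate(a, b, c)
print(c)     # [11, 30, 22]
print(a, b)  # [2, 3] [5, 7]
```

The middle coefficient ends up as $c_1 + a_0 b_1 + a_1 b_0$ because the two subtracted products cancel against the expanded third product. The only extra storage is the scalar product being accumulated, which is what the $O(1)$ extra space bound permits.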

---
Source: https://tomesphere.com/paper/2302.13600