Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers

Thomas Bouvier (KerData); Bogdan Nicolae (ANL); Hugo Chaugier (KerData); Alexandru Costan (KerData); Ian Foster (ANL); Gabriel Antoniu (KerData)

arXiv:2406.03285 · cs.DC · June 6, 2024

TL;DR

This paper introduces an asynchronous distributed rehearsal buffer for data-parallel continual learning. It scales to 128 GPUs with a runtime comparable to incremental training while keeping classification accuracy close to the upper bound.

Contribution

The paper proposes a distributed rehearsal buffer that complements data-parallel training on multiple GPUs, targeting the performance and scalability questions that earlier research on rehearsal-based continual learning left unaddressed.

Findings

01 Achieves near upper-bound accuracy in classification tasks.

02 Maintains a low runtime, comparable to that of incremental training.

03 Scales efficiently on up to 128 GPUs.

Abstract

Deep learning has emerged as a powerful method for extracting valuable information from large volumes of data. However, when new training data arrives continuously (i.e., is not fully available from the beginning), incremental training suffers from catastrophic forgetting (i.e., new patterns are reinforced at the expense of previously acquired knowledge). Training from scratch each time new training data becomes available would result in extremely long training times and massive data accumulation. Rehearsal-based continual learning has shown promise for addressing the catastrophic forgetting challenge, but research to date has not addressed performance and scalability. To fill this gap, we propose an approach based on a distributed rehearsal buffer that efficiently complements data-parallel training on multiple GPUs, allowing us to achieve short runtime and scalability while retaining…
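The rehearsal idea described in the abstract can be illustrated with a minimal, single-process sketch. This is not the authors' implementation: the class name `RehearsalBuffer`, the helper `augment`, the per-class capacity, and the random-replacement policy are all assumptions made for illustration, and the asynchronous and distributed aspects of the paper are deliberately left out. The sketch only shows how a bounded buffer of past samples can be mixed into each new mini-batch so that earlier classes keep being revisited.

```python
import random
from collections import defaultdict


class RehearsalBuffer:
    """Bounded, class-balanced store of past samples (hypothetical, illustrative only)."""

    def __init__(self, per_class_capacity, seed=0):
        self.per_class_capacity = per_class_capacity
        self.samples = defaultdict(list)  # label -> stored samples for that class
        self.rng = random.Random(seed)

    def update(self, batch):
        """Offer every (sample, label) pair of a mini-batch to the buffer."""
        for sample, label in batch:
            slot = self.samples[label]
            if len(slot) < self.per_class_capacity:
                slot.append(sample)
            else:
                # Class slot is full: overwrite a randomly chosen entry
                # (an assumed policy, chosen here for simplicity).
                slot[self.rng.randrange(len(slot))] = sample

    def sample(self, k):
        """Draw up to k stored (sample, label) pairs without replacement."""
        pool = [(s, label) for label, slot in self.samples.items() for s in slot]
        return self.rng.sample(pool, min(k, len(pool)))


def augment(batch, buffer, num_rehearsal):
    """Return the new batch plus rehearsal samples, then record the new batch."""
    augmented = list(batch) + buffer.sample(num_rehearsal)
    buffer.update(batch)
    return augmented


if __name__ == "__main__":
    buffer = RehearsalBuffer(per_class_capacity=2)
    task_a = [("a0", 0), ("a1", 0), ("a2", 0)]  # first task: class 0 only
    task_b = [("b0", 1), ("b1", 1)]             # later task: class 1 only
    print(augment(task_a, buffer, num_rehearsal=2))  # buffer is empty, nothing to rehearse
    print(augment(task_b, buffer, num_rehearsal=2))  # class-1 batch plus stored class-0 samples
```

In a real training loop, the augmented batch would be fed to the model in place of the raw batch. Capping the buffer per class keeps memory bounded regardless of how much data has been seen.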

Code & Models

Repositories

thomas-bouvier/distributed-continual-learning (PyTorch, official)