Core Tokensets for Data-efficient Sequential Training of Transformers

Subarnaduti Paul; Manuel Brack; Patrick Schramowski; Kristian Kersting; Martin Mundt

arXiv:2410.05800·cs.CV·October 10, 2024

TL;DR

This paper introduces core tokensets, a novel method for data-efficient sequential training of transformers that selects the most informative tokens and features, significantly reducing memory while maintaining high performance across various tasks.

Contribution

The paper proposes core tokensets, a new approach that constructs token-level summaries for efficient continual learning, surpassing traditional sample-based coresets in performance and memory efficiency.

Findings

01 Core tokensets achieve comparable performance with only 1% of the data.

02 Core tokensets significantly reduce memory in incremental learning tasks.

03 Core tokensets are effective across image classification, visual question answering, and captioning.

Abstract

Deep networks are frequently tuned to novel tasks and continue learning from ongoing data streams. Such sequential training requires consolidation of new and past information, a challenge predominantly addressed by retaining the most important data points - formally known as coresets. Traditionally, these coresets consist of entire samples, such as images or sentences. However, recent transformer architectures operate on tokens, leading to the famous assertion that an image is worth 16x16 words. Intuitively, not all of these tokens are equally informative or memorable. Going beyond coresets, we thus propose to construct a deeper-level data summary on the level of tokens. Our respectively named core tokensets both select the most informative data points and leverage feature attribution to store only their most relevant features. We demonstrate that core tokensets yield significant…
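The two-step idea in the abstract (pick informative data points, then keep only their most relevant tokens) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `select_core_tokenset` and `stored_fraction` are hypothetical, and the per-sample importance scores and per-token attribution scores are assumed to be computed elsewhere by whatever selection and attribution methods one prefers.

```python
import numpy as np


def select_core_tokenset(tokens, sample_scores, token_scores,
                         num_samples, tokens_per_sample):
    """Two-step selection sketch (hypothetical helper, not the authors' code).

    tokens:        (N, T, D) array of N samples, T tokens each, D features.
    sample_scores: (N,) importance per sample (assumed given).
    token_scores:  (N, T) attribution per token (assumed given).
    """
    # Step 1: keep the highest-scoring samples, as a sample-level coreset would.
    sample_idx = np.argsort(-sample_scores, kind="stable")[:num_samples]

    # Step 2: within each kept sample, keep the highest-attribution tokens.
    kept = token_scores[sample_idx]
    token_idx = np.argsort(-kept, axis=1, kind="stable")[:, :tokens_per_sample]
    token_idx = np.sort(token_idx, axis=1)  # restore original token order

    # Gather the selected tokens; shape (num_samples, tokens_per_sample, D).
    core_tokens = tokens[sample_idx[:, None], token_idx]
    return sample_idx, token_idx, core_tokens


def stored_fraction(num_total, tokens_total, num_samples, tokens_per_sample):
    """Fraction of all tokens that the selected subset keeps in memory."""
    return (num_samples * tokens_per_sample) / (num_total * tokens_total)
```

Purely as arithmetic, a 224x224 image split into 16x16 patches yields 196 tokens, and keeping 10% of the samples with 10% of the tokens in each would store 1% of all tokens. These numbers are an illustration only and are not taken from the paper's configuration.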

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 6·Confidence 2

Strengths

1. I do not know of another paper that explores the effect of storing a subset of tokens for replay in incremental learning.
2. The diversity of tasks in the experiments is good; many incremental learning papers evaluate on only one task.
3. Using a subset of tokens could also benefit the efficiency of replay methods.

Weaknesses

1. Identifying a subset of tokens that does as well as the whole sample has been explored in prior work. In particular, there are token pruning/merging methods (e.g., [A,B]) as well as methods used for faster training (e.g., [C]). The authors don't discuss or compare to these methods (I only gave examples, but there are many papers on this topic that should be included in the discussion). As such, one could see this paper as simply a new application of a known technique.
2. The authors do

Reviewer 02·Rating 6·Confidence 3

Strengths

- **Novel Concept**: The concept of core tokensets, which select the most informative tokens within informative data points, represents a novel and efficient way to summarize datasets for sequential training of transformers.
- **Solid Performance**: Core tokensets demonstrate comparable effectiveness to other data selection methods (e.g., Coreset, Core Token) while substantially reducing memory usage.
- **Versatility**: Core tokensets were shown to be effective on multiple different vision and langua

Weaknesses

While core tokensets significantly reduce memory usage, their two-step approach (first selecting informative data points, then identifying key tokens within those points) may introduce more latency in dataset summarization than single-step methods (e.g., Coreset, Core Token).

Reviewer 03·Rating 5·Confidence 4

Strengths

The proposed method has practical value and could be applied to many applications.

Weaknesses

As can be found in the summary.

Code & Models

Repositories

paulsubarna/core-tokenset
PyTorch·Official

Taxonomy

Topics·Neural Networks and Applications

Methods·Coresets