Asterisk*: Keep it Simple

Andrew Semenov

arXiv:2411.05691 · cs.CL · November 11, 2024

TL;DR

Asterisk is a minimalist GPT-based model for efficient text embedding generation. It uses knowledge distillation to balance model size against performance and is aimed at classification tasks, where it achieves competitive results.
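This page does not state how Asterisk turns per-token decoder states into a single text embedding. Masked mean pooling over the final hidden states is a common choice for GPT-style embedders (taking the last token's state is another), so the sketch below assumes it. The function name `mean_pool` and the random inputs are hypothetical; only the 256-dimensional width comes from the paper.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors into one embedding per sequence, skipping padding.

    hidden_states:  (batch, seq_len, d_model) final-layer outputs, d_model = 256 here
    attention_mask: (batch, seq_len) with 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                    # (batch, d_model)
    counts = np.clip(mask.sum(axis=1), 1.0, None)                  # avoid division by zero
    return summed / counts


# Toy example: random stand-ins for a 2-sequence batch from a 256-dim model
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 256)).astype(np.float32)
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # (2, 256)
```

Padding positions are zeroed before summing and excluded from the count, so the first sequence's embedding depends only on its three real tokens.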

Contribution

The paper introduces Asterisk, a small, efficient GPT-based model optimized for classification that achieves competitive performance through knowledge distillation from larger models.
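The exact distillation objective is not given on this page. One common variant for embedding models, assumed in this sketch, projects the student's 256-dimensional embeddings into the teacher's space with a learned matrix and minimizes one minus their cosine similarity. The teacher width (768), the projection matrix, and the function name are hypothetical.

```python
import numpy as np


def cosine_distillation_loss(student_emb, teacher_emb, proj):
    """Mean (1 - cosine similarity) between projected student and teacher embeddings.

    student_emb: (batch, d_student) student outputs, e.g. d_student = 256
    teacher_emb: (batch, d_teacher) outputs of a larger, frozen teacher (assumed)
    proj:        (d_student, d_teacher) learned projection (hypothetical)
    """
    projected = student_emb @ proj
    # Normalize both sides so the loss compares directions, not magnitudes
    s = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    cosine = np.sum(s * t, axis=1)        # per-example cosine similarity in [-1, 1]
    return float(np.mean(1.0 - cosine))   # 0 when aligned, 2 when opposite


rng = np.random.default_rng(0)
student = rng.normal(size=(4, 256))
proj = rng.normal(size=(256, 768)) / np.sqrt(256)
teacher = rng.normal(size=(4, 768))  # stand-in for real teacher embeddings
print(cosine_distillation_loss(student, teacher, proj))
```

During training, gradients of this loss would update the student and the projection while the teacher stays frozen. Classic logit-level distillation instead matches temperature-softened softmax outputs with a KL-divergence loss; which form the paper uses is not stated here.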

Findings

01

Moderate zero-shot classification performance across tasks.

02

Performance can approach that of larger models with additional configuration.

03

Minimalist architecture reduces computational and memory requirements.
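The abstract specifies two layers, two attention heads, and 256 embedding dimensions. The back-of-the-envelope count below assumes a GPT-2-style block (fused QKV projection, 4x MLP, biases, two LayerNorms, learned position embeddings, tied input/output embeddings). The vocabulary size (50,257) and context length (1,024) are GPT-2 defaults used for illustration, not figures from the paper.

```python
def gpt_param_count(n_layers: int, d_model: int, vocab_size: int,
                    context_len: int, mlp_ratio: int = 4) -> int:
    """Parameter count of a GPT-2-style decoder with tied input/output embeddings.

    The head count does not appear: heads only split d_model (2 heads x 128 dims here).
    """
    d, h = d_model, mlp_ratio * d_model
    attn = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV projection + output projection
    mlp = (d * h + h) + (h * d + d)           # up- and down-projection with biases
    norms = 2 * (2 * d)                       # two LayerNorms, each with scale and shift
    per_layer = attn + mlp + norms            # equals 12*d^2 + 13*d when mlp_ratio = 4
    embeddings = vocab_size * d + context_len * d  # token + learned position tables
    final_norm = 2 * d
    return n_layers * per_layer + embeddings + final_norm


# Asterisk's stated shape; vocabulary and context length are assumed GPT-2 values
total = gpt_param_count(n_layers=2, d_model=256, vocab_size=50257, context_len=1024)
non_embedding = gpt_param_count(n_layers=2, d_model=256, vocab_size=0, context_len=0)
print(f"total: {total:,} params, {total * 4 / 2**20:.1f} MiB in fp32")
# total: 14,707,968 params, 56.1 MiB in fp32
print(f"non-embedding: {non_embedding:,} params")
# non-embedding: 1,580,032 params
```

Under these assumptions the token-embedding table holds roughly 87% of the weights, so for a model this small the vocabulary size, not the depth, dominates the memory footprint.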

Abstract

This paper describes Asterisk, a compact GPT-based model for generating text embeddings. The model uses a minimalist architecture with two layers, two attention heads, and a 256-dimensional embedding. By applying knowledge distillation from larger pretrained models, we explore the trade-offs between model size and performance while minimizing computational and memory requirements. The model is primarily evaluated and optimized for classification tasks, and experimental results show moderate zero-shot classification performance across various downstream applications. With additional configuration, the model's performance can approach or even surpass that of larger architectures on specific classification tasks.
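The zero-shot protocol is not spelled out on this page. A common embedding-based approach, assumed in this sketch, embeds each candidate label (or a short label description) and assigns each text to the label whose embedding has the highest cosine similarity. The random 256-dimensional vectors stand in for real model outputs, and the label names are hypothetical.

```python
import numpy as np


def zero_shot_classify(text_emb: np.ndarray, label_emb: np.ndarray) -> np.ndarray:
    """Assign each text to the label whose embedding is most cosine-similar.

    text_emb:  (n_texts, d) embeddings of the inputs to classify
    label_emb: (n_labels, d) embeddings of label names or short descriptions
    Returns an (n_texts,) array of label indices.
    """
    t_norm = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    l_norm = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    scores = t_norm @ l_norm.T  # (n_texts, n_labels) cosine similarities
    return scores.argmax(axis=1)


# Toy 256-dim vectors standing in for model outputs (hypothetical data)
rng = np.random.default_rng(0)
labels = ["sports", "politics", "science"]
label_emb = rng.normal(size=(3, 256))
# Each text vector is a slightly noisy copy of one label vector
text_emb = label_emb[[2, 0, 1]] + 0.1 * rng.normal(size=(3, 256))
pred = zero_shot_classify(text_emb, label_emb)
print([labels[i] for i in pred])  # ['science', 'sports', 'politics']
```

Normalizing both sides first makes the dot product a cosine similarity, so a label whose embedding happens to have a larger norm does not win by default.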

Taxonomy

Topics: Distributed and Parallel Computing Systems · Advanced Database Systems and Queries

Methods: Softmax · Attention Is All You Need · Knowledge Distillation