
TL;DR
Asterisk is a minimalist GPT-based model for generating text embeddings. It uses knowledge distillation from larger models to balance size against performance, and it is evaluated mainly on classification tasks, where its results are competitive for its size.
Contribution
The paper introduces Asterisk, a small, efficient GPT-based model optimized for classification, demonstrating competitive performance via knowledge distillation from larger models.
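The summary does not say which distillation objective is used. As a minimal sketch of one common choice for embedding models, the code below computes a cosine-distance loss that pulls each student embedding toward the teacher's embedding for the same text. The function names, and the assumption that student and teacher embeddings share a dimension, are illustrative and not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def distillation_loss(student_batch, teacher_batch):
    """Mean (1 - cosine similarity) over paired student/teacher embeddings.

    Assumption: both models emit vectors of the same size. If the teacher
    is wider, a learned projection would be needed first (not shown).
    """
    if len(student_batch) != len(teacher_batch) or not student_batch:
        raise ValueError("batches must be non-empty and the same length")
    losses = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_batch, teacher_batch)
    ]
    return sum(losses) / len(losses)
```

The loss is 0 when every student embedding points in the same direction as its teacher embedding, and it grows as the directions diverge.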
Findings
Zero-shot classification performance is moderate across the downstream tasks evaluated.
With additional configuration, performance can approach or even surpass that of larger architectures on specific classification tasks.
Minimalist architecture reduces computational and memory requirements.
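To give a rough sense of scale, the sketch below counts parameters for a GPT-2-style decoder with the reported depth and width (two layers, 256 embedding dimensions). The block layout (fused QKV projection, 4x MLP expansion, biases, two layer norms per block), the vocabulary size, and the context length are assumptions, since the summary does not state them. The number of attention heads does not change the count.

```python
def block_params(d_model):
    """Parameters in one GPT-2-style transformer block (assumed layout)."""
    attn_qkv = d_model * 3 * d_model + 3 * d_model   # fused Q, K, V projection
    attn_out = d_model * d_model + d_model           # attention output projection
    mlp_up = d_model * 4 * d_model + 4 * d_model     # MLP expansion (4x, assumed)
    mlp_down = 4 * d_model * d_model + d_model       # MLP contraction
    layer_norms = 2 * (2 * d_model)                  # two layer norms (scale + bias)
    return attn_qkv + attn_out + mlp_up + mlp_down + layer_norms


def total_params(n_layer, d_model, vocab_size, n_ctx):
    """Total count with token and position embeddings and a final layer norm."""
    embeddings = vocab_size * d_model + n_ctx * d_model
    final_norm = 2 * d_model
    return embeddings + n_layer * block_params(d_model) + final_norm


# Hypothetical vocabulary and context sizes (GPT-2's values), not from the paper.
transformer_only = 2 * block_params(256)
with_embeddings = total_params(n_layer=2, d_model=256, vocab_size=50257, n_ctx=1024)
print(transformer_only, with_embeddings)
```

Under these assumptions the embedding tables account for most of the parameters, so the vocabulary size affects the memory footprint more than the two transformer blocks do.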
Abstract
This paper describes Asterisk, a compact GPT-based model for generating text embeddings. The model uses a minimalist architecture with two layers, two attention heads, and 256 embedding dimensions. By applying knowledge distillation from larger pretrained models, we explore the trade-offs between model size and performance while minimizing computational and memory requirements. The model is primarily evaluated and optimized for classification tasks, with experimental results showing its moderate performance in zero-shot classification across various downstream applications. With additional configuration, the model performance can approach or even surpass that of larger architectures on specific classification tasks.
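The abstract does not describe how the embeddings are used for zero-shot classification. One common approach, sketched below, is to embed each candidate label and pick the label whose embedding is closest to the text embedding by cosine similarity. The toy three-dimensional vectors stand in for real model outputs, and the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_classify(text_embedding, label_embeddings):
    """Return the label whose embedding is most similar to the text embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine(text_embedding, label_embeddings[label]),
    )


# Toy embeddings; a real pipeline would obtain these from the model.
toy_labels = {"sports": [1.0, 0.0, 0.0], "finance": [0.0, 1.0, 0.0]}
prediction = zero_shot_classify([0.9, 0.1, 0.0], toy_labels)
```

No labelled training data is needed in this setup, which is why it is called zero-shot: only the label names (or short label descriptions) are embedded.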
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Database Systems and Queries
Methods: Softmax · Attention Is All You Need · Knowledge Distillation
