Approximating Human-Like Few-shot Learning with GPT-based Compression
Cynthia Huang, Yuqing Xie, Zhiying Jiang, Jimmy Lin, Ming Li

TL;DR
This paper introduces a novel method using GPT to approximate Kolmogorov complexity for few-shot learning, enabling human-like data compression and improved NLP task performance.
Contribution
It presents a new approach that leverages GPT as a prior for lossless text compression to estimate information distance for few-shot learning.
Findings
Achieved a compression ratio of 15.5 on enwik9 with LLAMA2-7B.
Demonstrated that the GPT pre-training objective is equivalent to the compression length for text.
Outperformed baselines on semantic similarity and zero-shot NLP tasks.
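To put the reported ratio in concrete terms, the arithmetic can be sketched as below. This assumes the ratio is defined as uncompressed size divided by compressed size, which the summary does not state explicitly, and uses the fact that enwik9 is 10^9 bytes. The function names are illustrative only.

```python
ENWIK9_BYTES = 10**9  # enwik9 is the first 10^9 bytes of an English Wikipedia dump


def compressed_size_bytes(original_bytes: int, ratio: float) -> float:
    """Compressed size, assuming ratio = original size / compressed size."""
    return original_bytes / ratio


def bits_per_byte(ratio: float) -> float:
    """Average number of output bits spent per input byte under the same assumption."""
    return 8.0 / ratio


if __name__ == "__main__":
    size = compressed_size_bytes(ENWIK9_BYTES, 15.5)
    print(f"compressed size: {size / 1e6:.1f} MB")
    print(f"bits per byte:   {bits_per_byte(15.5):.3f}")
```

Under that assumption, a ratio of 15.5 corresponds to roughly 64.5 MB of output, or about half a bit per input byte.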
Abstract
In this work, we conceptualize the learning process as information compression. We seek to equip generative pre-trained models with human-like learning capabilities that enable data compression during inference. We present a novel approach that utilizes the Generative Pre-trained Transformer (GPT) to approximate Kolmogorov complexity, with the aim of estimating the optimal Information Distance for few-shot learning. We first propose using GPT as a prior for lossless text compression, achieving a noteworthy compression ratio. An experiment with a LLAMA2-7B backbone achieves a compression ratio of 15.5 on enwik9. We justify the pre-training objective of GPT models by demonstrating its equivalence to the compression length, and, consequently, its ability to approximate the information distance for texts. Leveraging the approximated information distance, our method allows the direct application…
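The link between a predictive model and compression length can be illustrated with a minimal sketch. It is not the paper's implementation: a toy adaptive byte-level bigram model with add-one smoothing stands in for GPT, the code length is the ideal arithmetic-coding length (the sum of -log2 p over symbols), and the distance uses the standard max form of information distance, max{K(x|y), K(y|x)}, with conditional code lengths as the approximation. All function names are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET_SIZE = 256  # the toy model predicts one byte at a time
START = 0            # assumed start-of-sequence symbol


def code_length_bits(text: str, context: str = "") -> float:
    """Ideal code length of `text` in bits, given `context`.

    Toy stand-in for a language model: an adaptive order-1 byte model with
    add-one smoothing. An arithmetic coder driven by these probabilities
    would need about this many bits.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)

    def update(prev: int, cur: int) -> None:
        counts[prev][cur] += 1
        totals[prev] += 1

    # Conditioning: let the model observe the context before encoding.
    prev = START
    for b in context.encode("utf-8"):
        update(prev, b)
        prev = b

    bits = 0.0
    prev = START
    for b in text.encode("utf-8"):
        p = (counts[prev][b] + 1) / (totals[prev] + ALPHABET_SIZE)
        bits -= math.log2(p)   # cost of this symbol under the model
        update(prev, b)        # adapt, as a decoder could do in lockstep
        prev = b
    return bits


def information_distance(x: str, y: str) -> float:
    """Approximate max{K(x|y), K(y|x)} with conditional code lengths."""
    return max(code_length_bits(x, context=y), code_length_bits(y, context=x))


if __name__ == "__main__":
    a = "the cat sat on the mat"
    b = "the cat sat on the hat"
    c = "quantum flux 1234567890"
    print(information_distance(a, b), information_distance(a, c))
```

A few-shot classifier along these lines would assign a query the label of the labeled example at the smallest distance; with a stronger predictive model in place of the toy one, the code lengths shrink and the distances become more semantically meaningful.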
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Absolute Position Encodings · Discriminative Fine-Tuning · Layer Normalization · Adam · Softmax · Label Smoothing
