GPT Self-Supervision for a Better Data Annotator
Xiaohuan Pei, Yanxi Li, Chang Xu

TL;DR
This paper introduces a GPT-based self-supervised annotation method that improves data summarization by leveraging a generating-recovering paradigm, enhancing annotation quality without requiring extensive labeled data.
Contribution
The paper proposes a self-supervised annotation approach that uses GPT in a generating-recovering paradigm, with alignment scores as the refinement signal, to address limitations of existing annotation methods.
Findings
Achieves competitive annotation scores across datasets
Demonstrates robustness in complex structured data
Utilizes alignment scores for self-supervision refinement
Abstract
The task of annotating data into concise summaries poses a significant challenge across various domains, frequently requiring substantial time and specialized knowledge from human experts. Despite existing efforts to use large language models for annotation tasks, significant problems such as limited applicability to unlabeled data, the absence of self-supervised methods, and the lack of focus on complex structured data still persist. In this work, we propose a GPT self-supervision annotation method, which embodies a generating-recovering paradigm that leverages the one-shot learning capabilities of the Generative Pretrained Transformer (GPT). The proposed approach comprises a one-shot tuning phase followed by a generation phase. In the one-shot tuning phase, we sample a data point from the support set as part of the prompt for GPT to generate a textual summary, which is then…
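The one-shot tuning loop described above can be illustrated with a minimal sketch. The abstract is truncated here, so the sketch makes assumptions that should be read as such: that the "recovering" half reconstructs the input from the generated summary, and that the alignment score compares the reconstruction with the original. The names `alignment_score`, `pick_best_template`, `toy_generate`, and `toy_recover` are hypothetical, the two toy functions stand in for GPT calls, and `difflib.SequenceMatcher` is a stand-in similarity measure rather than the paper's metric.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence, Tuple


def alignment_score(original: str, recovered: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the paper's alignment metric."""
    return SequenceMatcher(None, original, recovered).ratio()


def pick_best_template(
    sample: str,
    templates: Sequence[str],
    generate: Callable[[str, str], str],
    recover: Callable[[str, str], str],
) -> Tuple[str, float]:
    """Try each candidate prompt template on one support-set sample and
    keep the template whose recovered data aligns best with the original."""
    if not templates:
        raise ValueError("at least one template is required")
    best_template, best_score = templates[0], -1.0
    for template in templates:
        summary = generate(template, sample)    # generating step (model call assumed)
        recovered = recover(template, summary)  # recovering step (model call assumed)
        score = alignment_score(sample, recovered)
        if score > best_score:
            best_template, best_score = template, score
    return best_template, best_score


# Toy stand-ins for the GPT calls, used only to make the sketch runnable.
def toy_generate(template: str, data: str) -> str:
    # "keep-all" preserves the input; any other template keeps two tokens.
    return data if template == "keep-all" else " ".join(data.split()[:2])


def toy_recover(template: str, summary: str) -> str:
    # The toy recovery simply echoes the summary back.
    return summary


sample = "name: Ada; role: engineer; city: London"
best, score = pick_best_template(
    sample, ["first-words", "keep-all"], toy_generate, toy_recover
)
print(best, score)
```

In this toy setup the lossless template wins because its recovered text matches the sample exactly; with a real model, both callables would wrap prompt-and-completion calls and the score would act as the self-supervision signal for choosing or refining the prompt.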
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Attention Dropout · Position-Wise Feed-Forward Layer
