DWReCO at CheckThat! 2023: Enhancing Subjectivity Detection through Style-based Data Sampling
Ipek Baris Schlicht, Lynn Khellaf, Defne Altiok

TL;DR
This paper presents a method for improving subjectivity detection by augmenting training data with style-based samples generated by GPT-3, and reports that it is effective in English, German, and Turkish.
Contribution
The study introduces style-based data sampling using GPT-3 to address class imbalance in subjectivity detection across English, German, and Turkish.
Findings
Style-based oversampling outperforms paraphrasing in English and Turkish.
Generated style-based data improves model performance across all three languages.
GPT-3 sometimes produces lacklustre style-based text in non-English languages.
Abstract
This paper describes our submission for the subjectivity detection task at the CheckThat! Lab. To tackle class imbalances in the task, we have generated additional training materials with GPT-3 models using prompts of different styles from a subjectivity checklist based on journalistic perspective. We used the extended training set to fine-tune language-specific transformer models. Our experiments in English, German and Turkish demonstrate that different subjective styles are effective across all languages. In addition, we observe that the style-based oversampling is better than paraphrasing in Turkish and English. Lastly, the GPT-3 models sometimes produce lacklustre results when generating style-based texts in non-English languages.
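The oversampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the style names, the prompt wording, the `SUBJ`/`OBJ` label strings, the choice of minority-class sentences as seeds, and the `generate` callable (a stand-in for any text-generation model call) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (sentence, label)

# Hypothetical style names; the paper's actual checklist is not reproduced here.
STYLES = ["exaggerated", "emotional", "partisan"]


def build_prompt(sentence: str, style: str) -> str:
    """Build an illustrative prompt (the wording is an assumption)."""
    return f"Rewrite the following sentence in a {style} style: {sentence}"


def oversample_with_styles(
    data: Sequence[Example],
    generate: Callable[[str], str],
    styles: Sequence[str] = STYLES,
) -> List[Example]:
    """Add generated minority-class examples until the classes are balanced.

    `generate` is any function mapping a prompt to generated text; in
    practice it would wrap a language-model call.
    """
    counts = Counter(label for _, label in data)
    if len(counts) < 2:
        return list(data)  # nothing to balance against

    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]

    # Assumption: existing minority-class sentences serve as seeds.
    seeds = [text for text, label in data if label == minority]

    # Rotate through seeds and styles until the deficit is filled.
    pairs = islice(zip(cycle(seeds), cycle(styles)), deficit)
    synthetic = [(generate(build_prompt(s, st)), minority) for s, st in pairs]
    return list(data) + synthetic


if __name__ == "__main__":
    train = [
        ("The council approved the budget on Monday.", "OBJ"),
        ("The report lists three new measures.", "OBJ"),
        ("Turnout was 54 percent.", "OBJ"),
        ("This is a disgraceful decision.", "SUBJ"),
    ]
    # Stub generator so the sketch runs without any model access.
    balanced = oversample_with_styles(train, lambda prompt: f"[generated] {prompt}")
    print(Counter(label for _, label in balanced))
```

The extended list returned by `oversample_with_styles` would then be used to fine-tune a language-specific transformer classifier, as the abstract describes.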
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Byte Pair Encoding · Linear Warmup With Cosine Annealing · Attention Dropout
