Can Large Language Models Understand Internet Buzzwords Through User-Generated Content
Chen Huang, Junkai Luo, Xinzuo Wang, Wenqiang Lei, Jiancheng Lv

TL;DR
This paper introduces a new dataset and method to evaluate and improve large language models' ability to generate accurate definitions for Chinese internet buzzwords using user-generated content.
Contribution
It presents CHEER, a Chinese buzzword dataset, and RESS, a novel method for improving how LLMs comprehend buzzwords and generate their definitions.
Findings
RESS outperforms baseline methods in accuracy
Shared challenges include over-reliance on prior exposure
Difficulty in identifying high-quality UGC
Abstract
The massive volume of user-generated content (UGC) available on Chinese social media makes it possible to study internet buzzwords. In this paper, we study whether large language models (LLMs) can generate accurate definitions for these buzzwords using UGC as examples. Our work makes a threefold contribution. First, we introduce CHEER, the first dataset of Chinese internet buzzwords, each annotated with a definition and relevant UGC. Second, we propose a novel method, called RESS, that effectively steers the comprehension process of LLMs to produce more accurate buzzword definitions, mirroring the skills of human language learning. Third, using CHEER, we benchmark the strengths and weaknesses of various off-the-shelf definition generation methods and of our RESS method. Our benchmark demonstrates the effectiveness of RESS while revealing crucial shared challenges: over-reliance on prior…
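The task described above is generating a buzzword definition from UGC examples. As a minimal sketch, assuming a hypothetical record layout (`BuzzwordEntry` with `buzzword`, `definition`, and `ugc` fields) that mirrors the annotation described for CHEER but is not its actual schema, the Python below assembles a plain prompt asking an LLM to define a buzzword from a few usage examples. The helper `build_definition_prompt` and its prompt wording are illustrative assumptions; this is a generic example-based baseline, not RESS, whose steps are not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BuzzwordEntry:
    """Hypothetical record: one buzzword, its reference definition, and UGC posts using it."""
    buzzword: str
    definition: str
    ugc: List[str] = field(default_factory=list)


def build_definition_prompt(entry: BuzzwordEntry, max_examples: int = 3) -> str:
    """Build a simple prompt asking an LLM to define a buzzword from its usage.

    Generic example-based baseline (an assumption for illustration), not RESS.
    Only the first `max_examples` UGC posts are included to keep the prompt short.
    """
    examples = entry.ugc[:max_examples]
    lines = [f"The following social media posts use the internet buzzword '{entry.buzzword}':"]
    for i, post in enumerate(examples, start=1):
        lines.append(f"{i}. {post}")
    lines.append(f"Based on these posts, give a concise definition of '{entry.buzzword}'.")
    return "\n".join(lines)
```

The returned string could be sent to an LLM, and the generated definition compared against the annotated `definition` field. Truncating to a few examples is a design choice here, since the abstract notes that identifying high-quality UGC is itself a challenge and more examples are not automatically better.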
Taxonomy
Topics: Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining · Complex Network Analysis Techniques
