TL;DR
ORCHID is a new Chinese debate dataset for research on stance detection and debate summarization. Its empirical analysis highlights both the challenges and the potential of integrating stance detection into dialogue summarization.
Contribution
This paper introduces ORCHID (Oral Chinese Debate), the first Chinese dataset for benchmarking target-independent stance detection and debate summarization. It addresses a language resource gap in Chinese and provides a benchmark for future research.
Findings
The dataset is challenging for current models.
Incorporating stance detection can improve summarization.
Empirical analysis demonstrates the dataset's utility.
Abstract
Dialogue agents have been receiving increasing attention for years, and this trend has been further boosted by the recent progress of large language models (LLMs). Stance detection and dialogue summarization are two core tasks of dialogue agents in application scenarios that involve argumentative dialogues. However, research on these tasks is limited by the insufficiency of public datasets, especially for non-English languages. To address this language resource gap in Chinese, we present ORCHID (Oral Chinese Debate), the first Chinese dataset for benchmarking target-independent stance detection and debate summarization. Our dataset consists of 1,218 real-world debates that were conducted in Chinese on 476 unique topics, containing 2,436 stance-specific summaries and 14,133 fully annotated utterances. Besides providing a versatile testbed for future research, we also conduct an empirical…
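The counts quoted in the abstract imply a few per-debate averages, which the short sketch below works out. Only the four totals come from the abstract; the variable and function names are illustrative, and reading "two summaries per debate" as one summary per side is an assumption based on the phrase "stance-specific summaries", not something the abstract states.

```python
# Totals quoted in the abstract.
DEBATES = 1218
TOPICS = 476
SUMMARIES = 2436
UTTERANCES = 14133


def per_unit(total: int, units: int) -> float:
    """Return the average number of items per unit (hypothetical helper)."""
    return total / units


summaries_per_debate = per_unit(SUMMARIES, DEBATES)
utterances_per_debate = per_unit(UTTERANCES, DEBATES)
debates_per_topic = per_unit(DEBATES, TOPICS)

# Assumption: 2 summaries per debate would correspond to one per side.
print(f"summaries per debate:  {summaries_per_debate:.2f}")
print(f"utterances per debate: {utterances_per_debate:.2f}")
print(f"debates per topic:     {debates_per_topic:.2f}")
```

These are averages only; the abstract does not say how utterances or debates are distributed across individual debates or topics.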
Taxonomy
Methods: Softmax · Attention Is All You Need
