TL;DR
This paper presents a method for automatically identifying comparative sentences and categorizing the preference they express. A gradient boosting model based on pre-trained sentence embeddings, trained on a newly annotated dataset, reaches an F1 score of 85%, which makes it useful for comparative argument search engines and debating technologies.
Contribution
It introduces a new dataset of 7,199 manually annotated sentences covering 217 target item pairs, together with a gradient boosting approach to classifying comparative sentences.
Findings
F1 score of 85% on comparative sentence classification
Large annotated dataset of 7,199 sentences
Model suitable for argumentation and search applications
Abstract
We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)). To this end, we manually annotate 7,199 sentences for 217 distinct target item pairs from several domains (27% of the sentences contain an oriented comparison in the sense of "better" or "worse"). A gradient boosting model based on pre-trained sentence embeddings reaches an F1 score of 85% in our experimental evaluation. The model can be used to extract comparative sentences for pro/con argumentation in comparative or argument search engines and in debating technologies.
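The abstract names the ingredients (sentence embeddings fed to a gradient boosting classifier that outputs a preference label, which is then read as a triple) but not an implementation. What follows is a minimal, stdlib-only sketch of that pipeline under illustrative assumptions that are not the paper's setup: the label names BETTER/WORSE/NONE, the hashed bag-of-words `embed` function (a hypothetical stand-in for a real pre-trained sentence encoder), the toy training sentences, and a simplified first-order softmax boosting with regression stumps in place of a production gradient boosting library.

```python
import math
import zlib

# Assumed label set: the first object is BETTER/WORSE than the second, or NONE.
LABELS = ["BETTER", "WORSE", "NONE"]
DIM = 256  # size of the toy embedding


def embed(sentence: str) -> list[float]:
    """Hypothetical stand-in for a pre-trained sentence encoder:
    an L2-normalized hashed bag-of-words vector."""
    vec = [0.0] * DIM
    for tok in sentence.lower().split():
        tok = tok.strip(".,;:!?\"'")
        if tok:
            vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fit_stump(X, r):
    """Least-squares regression stump on residuals r: (feature, threshold, left, right)."""
    n, best = len(X), None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean residual
        m = sum(r) / n
        return (0, 0.0, m, m)
    return best[1:]


def stump_value(stump, x):
    j, t, lm, rm = stump
    return lm if x[j] <= t else rm


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train(X, y, rounds=30, lr=0.5):
    """Simplified softmax gradient boosting: each round fits one stump per class
    to the negative log-loss gradient (one-hot target minus predicted probability)."""
    K, F, stages = len(LABELS), [[0.0] * len(LABELS) for _ in X], []
    for _ in range(rounds):
        probs = [softmax(f) for f in F]  # probabilities fixed for this round
        stage = []
        for k in range(K):
            r = [(1.0 if y[i] == LABELS[k] else 0.0) - probs[i][k] for i in range(len(X))]
            stump = fit_stump(X, r)
            stage.append(stump)
            for i, x in enumerate(X):
                F[i][k] += lr * stump_value(stump, x)
        stages.append(stage)
    return lr, stages


def predict(model, x):
    """Sum the per-class stump scores and return the highest-scoring label."""
    lr, stages = model
    scores = [0.0] * len(LABELS)
    for stage in stages:
        for k, stump in enumerate(stage):
            scores[k] += lr * stump_value(stump, x)
    return LABELS[max(range(len(LABELS)), key=lambda k: scores[k])]


def to_triple(obj_a, obj_b, label):
    """Map a predicted label to an (object, preference, object) triple."""
    return None if label == "NONE" else (obj_a, label.lower(), obj_b)


# Toy, hand-written training data (illustrative only, not from the paper's dataset).
train_data = [
    ("Python has better NLP libraries than MATLAB", "BETTER"),
    ("Python is faster than Ruby for this task", "BETTER"),
    ("MATLAB is worse than Python for text processing", "WORSE"),
    ("Ruby is slower than Python here", "WORSE"),
    ("I use Python and MATLAB at work", "NONE"),
    ("Both Python and Ruby were installed", "NONE"),
]
X = [embed(s) for s, _ in train_data]
y = [label for _, label in train_data]
model = train(X, y)

label = predict(model, embed("Python has better NLP libraries than MATLAB"))
print(label, to_triple("Python", "MATLAB", label))
```

With six toy sentences the prediction is only illustrative; a realistic setup would replace `embed` with an actual pre-trained sentence encoder and train an established gradient boosting library on the annotated sentences.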
