Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study

Jelena Sarajlić; Gaurish Thakkar; Diego Alves; Nives Mikelic Preradović

arXiv:2212.07172·cs.CL·December 15, 2022

Open Access

TL;DR

This study introduces a Croatian news corpus annotated for direct-speech extraction, co-reference resolution, and sentiment analysis, highlighting language-specific challenges and providing a resource for NLP tasks.

Contribution

It presents a new annotated Croatian news corpus focusing on quotations, co-reference, and sentiment, with analysis of language-specific differences from English.

Findings

01 Identified language-specific annotation challenges

02 Created a Croatian annotated corpus for NLP tasks

03 Analyzed differences between Croatian and English annotations

Abstract

This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. It focuses on quotation, co-reference resolution, and sentiment annotation in the Croatian SETimes news corpus, and on the analysis of language-specific differences from English. From this analysis, a list of phenomena that require special attention when performing these annotations is derived. The resulting corpus, annotated with quotation features, can be used for multiple tasks in Natural Language Processing.

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining