NoReC: The Norwegian Review Corpus
Erik Velldal, Lilja Øvrelid, Eivind Alexander Bergem, Cathrine Stadsnes, Samia Touileb, Fredrik Jørgensen

TL;DR
The paper introduces NoReC, a Norwegian review corpus of more than 35,000 full-text reviews from diverse domains, each labeled with the author's rating on a 1-6 scale, built for training and evaluating document-level sentiment analysis models for Norwegian.
Contribution
It provides a large-scale Norwegian review dataset with author-assigned ratings, distributed in the standardized CoNLL-U format with rich metadata, to support sentiment analysis research and development.
Findings
Over 35,000 reviews included
Diverse domains covered including literature, movies, and products
Resource supports Norwegian sentiment analysis advancements
Abstract
This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1-6, as provided by the rating of the original author. This first release of the corpus comprises more than 35,000 reviews. It is distributed using the CoNLL-U format, pre-processed using UDPipe, along with a rich set of metadata. The work reported in this paper forms part of the SANT initiative (Sentiment Analysis for Norwegian Text), a project seeking to provide resources and tools for sentiment analysis and opinion mining for Norwegian. As…
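Since the corpus is distributed in CoNLL-U, a reader may want a feel for what consuming such a file involves. The following is a minimal sketch, not the authors' tooling: it relies only on the standard CoNLL-U layout (ten tab-separated columns per token, `#` comment lines, blank lines between sentences). The sample sentence, the function name `parse_conllu`, and the field names are illustrative assumptions and are not taken from NoReC itself.

```python
from typing import Dict, List

# Standard CoNLL-U column order; the lowercase key names are our own choice.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment / sentence-level metadata line; skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        # Handle a final sentence with no trailing blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample (invented for illustration, not from the corpus).
SAMPLE = (
    "# text = Filmen er god.\n"
    "1\tFilmen\tfilm\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\ter\tvære\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tgod\tgod\tADJ\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
lemmas = [tok["lemma"] for tok in sentences[0]]
```

Lemmas extracted this way could feed a bag-of-words baseline, with the review's 1-6 rating (taken from the corpus metadata) as the prediction target. How the metadata is stored alongside the CoNLL-U files is not described in the abstract above, so that part is left out of the sketch.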
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Computational and Text Analysis Methods
