A New Korean Text Classification Benchmark for Recognizing the Political Intents in Online Newspapers

Beomjune Kim; Eunsun Lee; Dongbin Na

arXiv:2311.01712 · cs.CL · November 6, 2023 · 1 citation

TL;DR

This paper introduces a large-scale Korean news dataset for classifying political intent in online articles, along with baseline models demonstrating effective performance on multi-task classification tasks.

Contribution

It presents the first large-scale Korean news dataset with multi-task labels for political orientation and pro-government stance, and provides baseline deep learning models.

Findings

1. Deep learning models achieve decent classification accuracy.

2. The dataset is the largest of its kind for Korean political news.

3. Models trained on the dataset outperform previous benchmarks.

Abstract

Readers of online articles often have considerable difficulty discerning the implicit intent of a text. In this work, we focus on automatically recognizing the political intent of a given online newspaper article by understanding the context of the text. To address this task, we present a novel Korean text classification dataset, and we provide deep-learning-based text classification baseline models trained on it. The dataset contains 12,000 news articles that may carry political intent, drawn from the politics sections of six of the most representative newspaper organizations in South Korea. Every text sample is labeled simultaneously on two aspects: (1) the level of political orientation and (2) the level of pro-government stance. To the best of our knowledge, ours is the largest-scale Korean news dataset…
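
The two-aspect labeling described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the record type `LabeledArticle`, the function names, and the value `NUM_LEVELS = 5` are all assumptions made for illustration, since the abstract does not state how many levels each label has. The sketch shows one common way to train on two labels at once, by summing a cross-entropy loss per label.

```python
import math
from dataclasses import dataclass

# Assumption: the number of levels per label is not given in the abstract.
NUM_LEVELS = 5


@dataclass
class LabeledArticle:
    """Hypothetical record for one article carrying both labels."""
    text: str
    orientation: int      # class index for the level of political orientation
    pro_government: int   # class index for the level of pro-government stance


def cross_entropy(logits, target):
    """Cross-entropy of one sample, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(orientation_logits, government_logits, article):
    """Sum of the per-label losses; equal weighting is an assumption."""
    return (cross_entropy(orientation_logits, article.orientation)
            + cross_entropy(government_logits, article.pro_government))


# Usage: a model that is maximally uncertain on both labels.
sample = LabeledArticle(text="(article text)", orientation=2, pro_government=4)
uniform = [0.0] * NUM_LEVELS
loss = multitask_loss(uniform, uniform, sample)
```

With uniform logits each label contributes log(NUM_LEVELS) to the loss, which gives a convenient sanity check for the starting loss of an untrained model.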

Code & Models

Repositories

kdavid2355/kopolitic-benchmark-dataset (PyTorch, official)

Taxonomy

Topics: Computational and Text Analysis Methods