Integration of Japanese Papers Into the DBLP Data Set

Paul Christian Sommerhoff

arXiv:1709.09119 · cs.CL · September 27, 2017

TL;DR

This paper presents a method to automatically process and integrate Japanese computer science publications into the DBLP dataset, addressing language-specific challenges like transcription and name matching.

Contribution

It introduces an automated approach for processing Japanese papers and their metadata for inclusion in the DBLP database, focusing on language and name matching issues.
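
To make the name matching issue concrete, here is a minimal Python sketch of why romanized Japanese names are hard to compare. It is not the method used in the thesis: the function names (`normalize_romaji`, `name_key`, `same_author`) and the long-vowel rules are assumptions chosen only for illustration. The idea is that one author may appear as "Satō", "Satou", or "Sato", and with family name first or last, so a comparison key has to smooth over both kinds of variation.

```python
import unicodedata

# Assumed, non-exhaustive rules for collapsing long-vowel spellings.
# They are deliberately crude and can merge names that are really different.
LONG_VOWEL_RULES = (("ou", "o"), ("oo", "o"), ("oh", "o"), ("uu", "u"))


def normalize_romaji(token: str) -> str:
    """Reduce one romanized name part to a rough comparison key."""
    # Decompose accented letters ("ō" becomes "o" plus a combining macron)
    # and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = key.lower()
    for long_form, short_form in LONG_VOWEL_RULES:
        key = key.replace(long_form, short_form)
    return key


def name_key(name: str) -> frozenset:
    """Build an order-independent key, since name order varies by source."""
    tokens = name.replace(",", " ").split()
    return frozenset(normalize_romaji(t) for t in tokens)


def same_author(a: str, b: str) -> bool:
    """Hypothetical check: do two romanized names share the same key?"""
    return name_key(a) == name_key(b)
```

With these assumed rules, `same_author("Satō Ichirō", "Ichiro Satou")` evaluates to `True`, while `same_author("Sato Ichiro", "Saito Ichiro")` evaluates to `False`. A key this coarse trades precision for recall, so a real import pipeline would likely need additional evidence, such as the original kanji spelling or co-author overlap, before merging two author records.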

Findings

01 Successfully processed Japanese publication metadata

02 Improved name matching accuracy for Japanese names

03 Enhanced coverage of Japanese papers in DBLP

Abstract

Anyone searching for a particular computer science publication is likely to turn to the DBLP. The DBLP data set is continuously extended with the metadata of new publications, such as the names of the authors, the title, and the publication date. Although the data set is already large, some areas can still be improved. The DBLP offers a huge collection of English papers because most computer science papers are published in English. Nevertheless, there are official publications in other languages that should also be added to the data set, among them Japanese papers. This diploma thesis shows a way to automatically process publication lists of Japanese papers and prepare them for import into the DBLP data set. Especially important are the problems…

Taxonomy

Topics: Data Quality and Management · Semantic Web and Ontologies · Web Data Mining and Analysis