How Lexical Gold Standards Have Effects On The Usefulness Of Text Analysis Tools For Digital Scholarship

Jussi Karlgren

arXiv:2105.14921·cs.CL·June 1, 2021

Open Access

TL;DR

This paper examines how existing lexical gold standards shape the usefulness of text analysis tools in digital scholarship, and highlights the need for standards that are better aligned with applications in the humanities and social sciences.

Contribution

The paper advocates more systematic formulation of requirements from the digital humanities and social sciences, and more explicit description of the assumptions underlying model design.

Findings

01 Current lexical gold standards favor topical relevance, which limits their usefulness for other applications.

02 The standards are misaligned with the needs of the humanities and social sciences.

03 The paper calls for assumptions to be made explicit in model evaluation.

Abstract

This paper describes how the current lexical similarity and analogy gold standards are built to conform to certain ideas about what the models they are designed to evaluate are used for. Topical relevance has always been the most important target notion for information access tools and related language technologies, and while this has proven a useful starting point for much of what information technology is used for, it does not always align well with other uses to which these technologies are being put, most notably use cases from digital scholarship in the humanities or social sciences. This paper argues for more systematic formulation of requirements from the digital humanities and social sciences and more explicit description of the assumptions underlying model design.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques