How Lexical Gold Standards Have Effects On The Usefulness Of Text Analysis Tools For Digital Scholarship
Jussi Karlgren

TL;DR
This paper examines how existing lexical gold standards affect the usefulness of text analysis tools in digital scholarship. It highlights the need for standards that are better aligned with applications in the humanities and social sciences.
Contribution
The paper advocates a systematic formulation of requirements and an explicit statement of the assumptions behind model design, so that text analysis tools better serve the digital humanities and social sciences.
Findings
Current lexical gold standards favor topical relevance, which limits their broader applicability.
Existing standards are misaligned with the needs of humanities and social sciences research.
The assumptions underlying model design and evaluation should be stated explicitly.
Abstract
This paper describes how current lexical similarity and analogy gold standards are built to conform to particular ideas about what the models they are designed to evaluate are used for. Topical relevance has always been the most important target notion for information access tools and related language technologies. While this has proven a useful starting point for much of what information technology is used for, it does not always align well with other uses to which these technologies are being put, most notably use cases from digital scholarship in the humanities and social sciences. This paper argues for a more systematic formulation of requirements from the digital humanities and social sciences, and for a more explicit description of the assumptions underlying model design.
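To make concrete what a lexical similarity gold standard measures, the sketch below shows one common evaluation setup: a model's cosine similarity for each word pair is compared against human ratings using Spearman rank correlation. This is a minimal sketch of a widespread practice, not the author's method. The word vectors and ratings are hypothetical toy values, and the function names (`cosine`, `ranks`, `spearman`, `evaluate`) are illustrative only. Whatever relation the human raters were asked to judge is exactly what a high score rewards.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate(vectors, gold):
    """Score a model against (word1, word2, human_rating) triples,
    skipping pairs that contain out-of-vocabulary words."""
    model_scores, human_scores = [], []
    for w1, w2, rating in gold:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)

# Hypothetical toy vectors and human ratings, for illustration only.
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "road": [0.6, 0.5, 0.1],
    "poem": [0.0, 0.2, 0.95],
}
gold = [
    ("car", "automobile", 9.5),
    ("car", "road", 6.0),
    ("car", "poem", 0.5),
    ("road", "poem", 1.0),
]
print(round(evaluate(vectors, gold), 3))
```

In this toy case the model's ranking of the four pairs matches the human ranking exactly, so the printed score is 1.0. Note that the score depends only on rank order, so a gold standard whose ratings reflect topical relatedness will reward models that capture topical relatedness, whether or not that is the relation a given scholarly use case needs.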
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
