Architectures of Meaning, A Systematic Corpus Analysis of NLP Systems

Oskar Wysocki; Malina Florea; Donal Landers; Andre Freitas

arXiv:2107.08124·cs.CL·July 20, 2021

Open Access

TL;DR

This paper introduces a new statistical corpus analysis framework to interpret NLP system architectures at scale, revealing coherent patterns and enabling data-driven understanding of the field.

Contribution

It presents a novel combination of saturation-based lexicon construction, statistical analysis, and graph collocations for systematic NLP architecture interpretation.

Findings

01

Identified coherent architectural patterns in NLP systems

02

Validated the framework on the full corpus of SemEval tasks

03

Provides a systematic method for interpreting NLP architectures

Abstract

This paper proposes a novel statistical corpus analysis framework for interpreting Natural Language Processing (NLP) architectural patterns at scale. The approach combines saturation-based lexicon construction, statistical corpus analysis methods, and graph collocations to induce a synthesized representation of NLP architectural patterns from corpora. The framework is validated on the full corpus of SemEval tasks, where it reveals coherent architectural patterns that can be used to answer architectural questions in a data-driven fashion, providing a systematic mechanism for interpreting a highly dynamic and exponentially growing field.
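To make the idea of graph collocations concrete, the sketch below builds a small collocation graph over a fixed lexicon of architecture terms. It is only an illustration under stated assumptions and not the authors' implementation: the toy documents, the lexicon, the whitespace tokenizer, the function name `collocation_edges`, and the choice of document-level pointwise mutual information (PMI) as the association score are all hypothetical, since the abstract does not specify which statistics the framework uses.

```python
import math
from collections import Counter
from itertools import combinations


def collocation_edges(documents, lexicon, min_count=1):
    """Return {(term_a, term_b): pmi} for lexicon terms that co-occur in a document.

    Assumption: PMI over document-level co-occurrence stands in for whatever
    association measure the paper actually uses.
    """
    term_docs = Counter()  # number of documents containing each term
    pair_docs = Counter()  # number of documents containing both terms of a pair
    for text in documents:
        tokens = set(text.lower().split())  # naive whitespace tokenizer (assumption)
        present = sorted(tokens & lexicon)  # sorted so pairs have a stable order
        term_docs.update(present)
        pair_docs.update(combinations(present, 2))

    n = len(documents)
    edges = {}
    for (a, b), count in pair_docs.items():
        if count < min_count:
            continue
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        p_ab = count / n
        p_a = term_docs[a] / n
        p_b = term_docs[b] / n
        edges[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return edges


# Hypothetical system descriptions, standing in for task-paper text.
docs = [
    "we use an lstm encoder with attention",
    "an lstm with attention and glove embeddings",
    "svm classifier with tfidf features",
    "svm with tfidf and glove",
]
lexicon = {"lstm", "attention", "svm", "tfidf", "glove"}

# Print edges from strongest to weakest association.
for pair, score in sorted(collocation_edges(docs, lexicon).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Edges with high scores link components that tend to appear together in the same system description; terms that never co-occur, such as `lstm` and `svm` in this toy data, get no edge at all.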

Taxonomy

Topics: Advanced Text Analysis Techniques · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining