SPaR.txt, a cheap Shallow Parsing approach for Regulatory texts

Ruben Kruiper; Ioannis Konstas; Alasdair Gray; Farhad Sadeghineko; Richard Watson; Bimal Kumar

arXiv:2110.01295·cs.CL·October 5, 2021

TL;DR

This paper presents a cost-effective shallow parsing method for regulatory texts, using a small annotated dataset to identify key terms and multi-word expressions, aiding automated compliance checking.

Contribution

Introduces a shallow parsing approach with a small annotated dataset for building regulation texts, enabling efficient semantic parsing for compliance systems.

Findings

01 · Achieved 79.93% F1-score on test set

02 · Identified 89.84% of defined terms in regulation documents

03 · Discovered multi-word expressions with 70.3% accuracy

Abstract

Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPaR.txt, and train a sequence tagger that achieves a 79.93 F1-score on the test set. We then show through manual evaluation that the model identifies most (89.84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with…
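
To make the reported F1-score concrete, the following is a minimal sketch of how a span-level F1 could be computed for a sequence tagger. It is not the authors' tagger or their evaluation script: the BIO tag scheme, the label names `OBJ` and `ACT`, and the helper names `bio_to_spans` and `span_f1` are all assumptions made for illustration. A plain BIO encoding also covers only contiguous spans, so discontiguous MWEs would need a different representation.

```python
from typing import List, Set, Tuple

# (start index, exclusive end index, label); a hypothetical span representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect contiguous labelled spans from one sentence of BIO tags."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the sentence end.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span unless the current tag continues it.
        if start is not None and tag != f"I-{label}":
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact span matches (boundaries and label)."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example with assumed labels: the prediction misses the second span.
    gold = [["B-OBJ", "I-OBJ", "O", "B-ACT"]]
    pred = [["B-OBJ", "I-OBJ", "O", "O"]]
    print(f"span F1 = {span_f1(gold, pred):.4f}")
```

Exact-match scoring is the strictest choice here; a partially correct span counts as both a false positive and a false negative, which is worth keeping in mind when comparing numbers across evaluation setups.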

Code & Models

Repositories

rubenkruiper/spar.txt
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Software Engineering Research · Artificial Intelligence in Law

Methods: Test