Building Metadata Inference Using a Transducer Based Language Model
David Waterworth, Subbu Sethuvenkatraman, Quan Z. Sheng

TL;DR
This paper explores using transducer-based language models to improve the parsing and normalization of building metadata, addressing challenges in automatic translation for smart building applications.
Contribution
It introduces a novel approach combining finite state transducers with language models for building metadata normalization, overcoming limitations of conventional machine learning methods.
Findings
Transducer models effectively handle abbreviations and variations in building metadata.
The approach improves normalization accuracy over traditional methods.
Preliminary analysis shows promise for deployment in smart building systems.
Abstract
Solving the challenges of automatic machine translation of Building Automation System text metadata is a crucial first step in efficiently deploying smart building applications. The vocabulary used to describe building metadata appears small compared to general natural languages, but each term has multiple commonly used abbreviations. Conventional machine learning techniques are inefficient since they need to learn many different forms for the same word, and large amounts of data must be used to train these models. It is also difficult to apply standard techniques such as tokenisation since this commonly results in multiple output tags being associated with a single input token, something traditional sequence labelling models do not allow. Finite State Transducers can model sequence-to-sequence tasks where the input and output sequences are different lengths, and they can be combined…
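The abstract's point about a single input token producing several output tags can be made concrete with a small sketch. The Python below is a hand-rolled, unweighted transducer built as a trie over a hypothetical abbreviation lexicon, decoded with greedy longest match. It is an illustration only, not the authors' model: the lexicon entries, the function names `build_transducer` and `transduce`, and the greedy decoding strategy are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical abbreviation lexicon, invented for illustration only.
LEXICON: Dict[str, List[str]] = {
    "SA": ["supply", "air"],
    "RA": ["return", "air"],
    "T": ["temperature"],
    "TEMP": ["temperature"],
    "TMP": ["temperature"],
    "SP": ["setpoint"],
}

Transitions = Dict[Tuple[int, str], int]
Finals = Dict[int, List[str]]


def build_transducer(lexicon: Dict[str, List[str]]) -> Tuple[Transitions, Finals]:
    """Compile the lexicon into a trie-shaped transducer.

    transitions[(state, symbol)] gives the next state; finals[state] gives
    the output words emitted when a match ends in that state.
    """
    transitions: Transitions = {}
    finals: Finals = {}
    next_state = 1  # state 0 is the start state
    for abbrev, words in lexicon.items():
        state = 0
        for ch in abbrev:
            key = (state, ch)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        finals[state] = words
    return transitions, finals


def transduce(text: str, transitions: Transitions, finals: Finals) -> List[str]:
    """Map an abbreviated point name to normalized words (greedy longest match)."""
    output: List[str] = []
    i = 0
    while i < len(text):
        state, j = 0, i
        best_end, best_out = None, None
        # Follow transitions as far as possible, remembering the last final state.
        while j < len(text) and (state, text[j]) in transitions:
            state = transitions[(state, text[j])]
            j += 1
            if state in finals:
                best_end, best_out = j, finals[state]
        if best_end is None:
            # No lexicon entry starts here: drop delimiters, pass other symbols through.
            if text[i].isalnum():
                output.append(text[i])
            i += 1
        else:
            output.extend(best_out)
            i = best_end
    return output
```

With this lexicon, the three-character input `SAT` is transduced to the three words `supply air temperature`, and `RA_TEMP_SP` to `return air temperature setpoint`. The input and output sequences differ in length, and several abbreviations (`T`, `TEMP`, `TMP`) collapse to one normalized term. A greedy matcher cannot resolve ambiguous segmentations; the abstract indicates the paper addresses this by combining transducers with other components, which this sketch does not attempt to reproduce.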
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Data Quality and Management
