ARCE: Augmented RoBERTa with Contextualized Elucidations for NER in Automated Rule Checking
Jian Chen, Jiabao Dou

TL;DR
This paper introduces ARCE, a knowledge distillation framework that improves smaller models on domain-specific named entity recognition (NER) in the architecture, engineering, and construction (AEC) domain. It uses LLMs to synthesize a specialized pre-training corpus and reports state-of-the-art results.
Contribution
ARCE systematically explores knowledge transfer strategies, using LLMs to synthesize domain-specific corpora for effective pre-training of smaller models in AEC NER tasks.
Findings
ARCE achieves a Macro-F1 score of 77.20% on the benchmark dataset.
Simple explanations outperform complex rationales for domain adaptation.
The approach outperforms both domain-specific baselines and fine-tuned LLMs.
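For readers unfamiliar with the metric in the first finding, Macro-F1 is the unweighted mean of per-class F1 scores, so rare entity types count as much as frequent ones. The sketch below assumes entity-level exact-match scoring over `(sentence_id, start, end, entity_type)` spans. This summary does not state the paper's exact evaluation protocol, and the function name and the entity types `obj` and `property` are hypothetical.

```python
from typing import Dict, Set, Tuple

# A span is (sentence_id, start_token, end_token, entity_type).
Span = Tuple[int, int, int, str]


def macro_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, Dict[str, float]]:
    """Return (macro-averaged F1, per-type F1) using exact span matching."""
    entity_types = {span[-1] for span in gold | pred}
    per_type: Dict[str, float] = {}
    for entity_type in entity_types:
        gold_t = {s for s in gold if s[-1] == entity_type}
        pred_t = {s for s in pred if s[-1] == entity_type}
        true_pos = len(gold_t & pred_t)
        precision = true_pos / len(pred_t) if pred_t else 0.0
        recall = true_pos / len(gold_t) if gold_t else 0.0
        denom = precision + recall
        per_type[entity_type] = 2 * precision * recall / denom if denom else 0.0
    # Unweighted mean over types: every entity type contributes equally.
    macro = sum(per_type.values()) / len(per_type) if per_type else 0.0
    return macro, per_type


# Hypothetical toy data, not taken from the paper's benchmark.
gold = {(0, 0, 2, "obj"), (0, 3, 4, "property"), (1, 0, 1, "obj")}
pred = {(0, 0, 2, "obj"), (0, 3, 4, "obj")}
macro, per_type = macro_f1(gold, pred)
print(macro, per_type)
```

Because the mean is unweighted, a model that misses a rare entity type entirely is penalized heavily, which matters when some types are sparse.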
Abstract
Accurate information extraction from specialized texts is a critical challenge for automated rule checking (ARC) in the architecture, engineering, and construction (AEC) domain. While large language models (LLMs) possess strong reasoning capabilities, their deployment in resource-constrained AEC environments is often impractical. Conversely, standard efficient models struggle with the significant domain gap. Although this gap can be mitigated by pre-training on large, human-curated corpora, such approaches are labor-intensive and costly. To address this, we propose ARCE (Augmented RoBERTa with Contextualized Elucidations), a novel knowledge distillation framework that leverages LLMs to synthesize a task-oriented corpus, termed Cote, for incrementally pre-training smaller models. ARCE systematically explores the optimal strategy for knowledge transfer. Our extensive experiments…
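The corpus-synthesis step described in the abstract could be sketched roughly as follows. This is an illustration only, not the authors' implementation: the prompt wording, the `build_corpus` and `fake_llm` names, the length and duplicate filters, and the example sentence are all assumptions, and `generate` stands in for whatever LLM call is actually used. The later incremental pre-training of the smaller model on the resulting text is not shown.

```python
from typing import Callable, Iterable, List

# Hypothetical prompt; the paper's actual prompts are not given in this summary.
PROMPT_TEMPLATE = (
    "Explain in one or two plain sentences what the following building-code "
    "sentence means.\n\nSentence: {sentence}"
)


def build_corpus(
    sentences: Iterable[str],
    generate: Callable[[str], str],
    min_chars: int = 20,
) -> List[str]:
    """Pair each source sentence with an LLM-written explanation.

    `generate` is any function mapping a prompt string to generated text.
    Empty inputs, very short outputs, and duplicate explanations are skipped.
    """
    corpus: List[str] = []
    seen = set()
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        explanation = generate(PROMPT_TEMPLATE.format(sentence=sentence)).strip()
        if len(explanation) < min_chars or explanation in seen:
            continue
        seen.add(explanation)
        corpus.append(f"{sentence} {explanation}")
    return corpus


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM call, used only to make the sketch run."""
    sentence = prompt.rsplit("Sentence: ", 1)[1]
    return f"In plain terms, this rule says: {sentence}"


# Hypothetical regulatory sentence, not taken from the paper's data.
demo = build_corpus(["Exit doors shall be at least 800 mm wide."], fake_llm)
print(demo)
```

Asking for a short, plain explanation mirrors the reported finding that simple explanations transfer better than complex rationales. Passing the generator in as a callable keeps the sketch independent of any particular LLM client.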
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
