ARCE: Augmented RoBERTa with Contextualized Elucidations for NER in Automated Rule Checking

Jian Chen; Jiabao Dou

arXiv:2508.07286·cs.CL·January 29, 2026

Open Access

TL;DR

This paper introduces ARCE, a knowledge distillation framework that enhances smaller models for domain-specific NER tasks in AEC by leveraging LLMs to generate specialized training data, achieving state-of-the-art results.

Contribution

ARCE systematically explores knowledge transfer strategies, using LLMs to synthesize domain-specific corpora for effective pre-training of smaller models in AEC NER tasks.

Findings

01. ARCE achieves a Macro-F1 score of 77.20% on the benchmark dataset.

02. Simple explanations outperform complex rationales for domain adaptation.

03. The approach outperforms both domain-specific baselines and fine-tuned LLMs.
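For readers unfamiliar with the headline metric, the following is a minimal sketch of how a Macro-F1 score can be computed for NER: an F1 score is calculated per entity type and the scores are averaged with equal weight. The exact-span matching rule, the function name `macro_f1`, and the entity type names are assumptions for illustration; the paper's own evaluation protocol is not described on this page.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-type F1 over exact-match entity spans.

    gold, pred: sets of (start, end, entity_type) tuples.
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)

    # A predicted span counts as correct only if start, end, and type all match.
    for span in pred:
        if span in gold:
            tp[span[2]] += 1
        else:
            fp[span[2]] += 1
    for span in gold:
        if span not in pred:
            fn[span[2]] += 1

    types = set(tp) | set(fp) | set(fn)
    if not types:
        return 0.0

    scores = []
    for t in types:
        denom = 2 * tp[t] + fp[t] + fn[t]
        scores.append(2 * tp[t] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical entity types, chosen only to illustrate the calculation.
gold = {(0, 2, "element"), (5, 6, "property"), (8, 9, "element")}
pred = {(0, 2, "element"), (5, 6, "element"), (8, 9, "element")}
score = macro_f1(gold, pred)
```

Because every type contributes equally, a rare entity type that the model misses entirely pulls the average down as much as a frequent one would, which is why Macro-F1 is a common choice when label frequencies are imbalanced.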

Abstract

Accurate information extraction from specialized texts is a critical challenge for automated rule checking (ARC) in the architecture, engineering, and construction (AEC) domain. While large language models (LLMs) possess strong reasoning capabilities, their deployment in resource-constrained AEC environments is often impractical. Conversely, standard efficient models struggle with the significant domain gap. Although this gap can be mitigated by pre-training on large, human-curated corpora, such approaches are labor-intensive and costly. To address this, we propose ARCE (Augmented RoBERTa with Contextualized Elucidations), a novel knowledge distillation framework that leverages LLMs to synthesize a task-oriented corpus, termed Cote, for incrementally pre-training smaller models. ARCE systematically explores the optimal strategy for knowledge transfer. Our extensive experiments…
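The corpus-synthesis idea in the abstract can be illustrated with a rough sketch: each domain sentence is sent to an LLM with a request for a simple explanation, and the results are collected as text for continued pre-training of a smaller model. Everything specific here is an assumption and not taken from the paper: the prompt wording, the `build_corpus` helper, the choice to append each explanation to its source sentence, and the `fake_llm` stand-in used in place of a real model call.

```python
from typing import Callable, Iterable, List

# Hypothetical prompt; the paper's actual prompts are not shown on this page.
PROMPT_TEMPLATE = (
    "Explain in simple terms the key entities in this "
    "building regulation sentence:\n{sentence}"
)


def build_corpus(
    sentences: Iterable[str], generate: Callable[[str], str]
) -> List[str]:
    """Pair each non-empty sentence with an LLM-written explanation."""
    corpus = []
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip blank input lines
        explanation = generate(PROMPT_TEMPLATE.format(sentence=text)).strip()
        if explanation:
            # Assumption: sentence and explanation form one training document.
            corpus.append(f"{text} {explanation}")
    return corpus


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "A fire door is a door that resists fire."


corpus = build_corpus(["Fire doors shall be self-closing.", "  "], fake_llm)
```

Passing the generator in as a callable keeps the sketch independent of any particular LLM provider; the resulting list of strings would then be fed to whatever masked-language-model pre-training routine the smaller model uses.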


Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques