Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction
Yueling Li, Sebastian Martschat, Simone Paolo Ponzetto

TL;DR
This paper introduces a multi-source, multi-domain approach to automated measurement and context extraction based on pre-trained language models, benchmarks its cross-domain generalization, and uses a task-specific error analysis to derive insights for future research.
Contribution
It presents a multi-source, multi-domain training framework with task-adaptive pre-training for measurement and context extraction, provides a task-specific error analysis, and releases the cross-domain corpus used in the work.
Findings
Multi-source training yields the best overall results.
Single-source training performs best within individual domains.
The approach effectively extracts quantities and units, but needs improvement for contextual entities.
Abstract
We present a cross-domain approach for automated measurement and context extraction based on pre-trained language models. We construct a multi-source, multi-domain corpus and train an end-to-end extraction pipeline. We then apply multi-source task-adaptive pre-training and fine-tuning to benchmark the cross-domain generalization capability of our model. Further, we conceptualize and apply a task-specific error analysis and derive insights for future work. Our results suggest that multi-source training leads to the best overall results, while single-source training yields the best results for the respective individual domain. While our setup is successful at extracting quantity values and units, more research is needed to improve the extraction of contextual entities. We make the cross-domain corpus used in this work available online.
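The single-source versus multi-source comparison described in the abstract can be pictured with a minimal sketch. This is not the authors' pipeline: the domain names, the `Example` record, the span labels and the `build_training_sets` helper are all hypothetical, chosen only to show how one training set per source and one pooled set might be assembled before fine-tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    source: str   # hypothetical corpus/domain name
    text: str
    spans: tuple  # (start, end, label) triples; the label set is illustrative


def build_training_sets(corpora):
    """Return one training set per source plus a pooled multi-source set."""
    settings = {f"single:{name}": list(examples) for name, examples in corpora.items()}
    # Multi-source setting: concatenate the examples of every source.
    settings["multi"] = [ex for examples in corpora.values() for ex in examples]
    return settings


# Toy data; the two domains are assumptions, not taken from the paper.
corpora = {
    "materials": [
        Example("materials", "The film was 5 nm thick.",
                ((13, 14, "QUANTITY"), (15, 17, "UNIT"))),
    ],
    "medicine": [
        Example("medicine", "Patients received 20 mg daily.",
                ((18, 20, "QUANTITY"), (21, 23, "UNIT"))),
    ],
}

settings = build_training_sets(corpora)
for name, examples in settings.items():
    print(name, len(examples))
```

Under this framing, a model trained on each `single:*` set and one trained on `multi` would each be evaluated on every domain's held-out data, which is the kind of comparison the reported findings summarize.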
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
