SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions
Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, and Longin Jan Latecki

TL;DR
SciDMT is the largest annotated corpus for scientific mention detection. It supports training deep learning models for scientific information extraction and indexing, and it serves as a benchmark for future research.
Contribution
The paper introduces SciDMT, a large-scale, diverse corpus for scientific mention detection, and demonstrates its utility with deep learning models, establishing performance baselines.
Findings
SciBERT and GPT-3.5 achieve strong baseline performance.
The corpus helps develop models for scientific information retrieval tasks.
Unresolved challenges remain in scientific mention detection.
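This page does not name the metric behind the baseline results. A common way to score mention detection is exact-match, span-level precision, recall, and F1: a predicted mention counts only if its start offset, end offset, and type (D, M, or T) all match a gold mention. The sketch below assumes that metric and represents mentions as hypothetical (start, end, type) tuples. It illustrates the general technique and is not necessarily the scoring used in the paper.

```python
from typing import Iterable, Tuple

# Hypothetical mention format: (start_char, end_char_exclusive, type), type in {"D", "M", "T"}.
Mention = Tuple[int, int, str]


def span_prf(gold: Iterable[Mention], pred: Iterable[Mention]) -> Tuple[float, float, float]:
    """Exact-match span-level precision, recall and F1 (an assumed metric)."""
    # Sets drop duplicate mentions; a match needs identical offsets AND type.
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [(0, 8, "D"), (20, 27, "M"), (40, 58, "T")]
pred = [(0, 8, "D"), (20, 27, "T"), (40, 58, "T"), (60, 65, "M")]
print(span_prf(gold, pred))
```

Here two of the four predictions match exactly, giving precision 0.5, recall 2/3, and F1 4/7. The prediction at (20, 27) has the right boundaries but the wrong type, so under exact matching it counts as both a false positive and a false negative.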
Abstract
We present SciDMT, an enhanced and expanded corpus for scientific mention detection that offers a significant advancement over existing related resources. SciDMT contains scientific documents annotated with mentions of datasets (D), methods (M), and tasks (T). The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes. To the best of our knowledge, SciDMT is the largest corpus for scientific entity mention detection. The corpus's scale and diversity are instrumental in developing and refining models for tasks such as indexing scientific papers, enhancing information retrieval, and improving the accessibility of scientific knowledge. We demonstrate the…
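The main corpus stores mentions as in-text spans, whereas token-classification models such as SciBERT are usually trained on per-token tags. The sketch below converts character-offset spans into BIO tags over whitespace tokens. The record layout (raw text plus hypothetical (start, end, type) character offsets) and the whitespace tokenization are assumptions for illustration only; the actual SciDMT file format may differ, and a real pipeline would align labels to the model's subword tokenizer instead.

```python
import re
from typing import List, Tuple

# Hypothetical mention format: (start_char, end_char_exclusive, type), type in {"D", "M", "T"}.
Mention = Tuple[int, int, str]


def spans_to_bio(text: str, mentions: List[Mention]) -> List[Tuple[str, str]]:
    """Tag whitespace-delimited tokens with B-/I-/O labels from character spans."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        label = "O"
        for start, end, mtype in mentions:
            # Only tokens lying fully inside a span are tagged; partial overlaps stay "O".
            if start <= tok_start and tok_end <= end:
                label = ("B-" if tok_start == start else "I-") + mtype
                break
        tagged.append((match.group(), label))
    return tagged


text = "We fine-tune BERT on SQuAD for question answering"


def find_span(phrase: str, mtype: str) -> Mention:
    """Hypothetical helper: build a mention tuple from the first occurrence of a phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), mtype)


mentions = [find_span("BERT", "M"), find_span("SQuAD", "D"), find_span("question answering", "T")]
for token, tag in spans_to_bio(text, mentions):
    print(token, tag)
```

In this example "BERT" is tagged B-M, "SQuAD" B-D, and the two-token task mention becomes B-T followed by I-T; every other token is O. Punctuation attached to a token (for example "answering.") would push the token past the span end and leave it untagged, which is one reason real pipelines tokenize more carefully.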
Taxonomy
Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Residual Connection · Multi-Head Attention · Weight Decay · Softmax · Layer Normalization
