DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain
Walter Hernandez Cruz, Peter Devine, Nikhil Vadgama, Paolo Tasca, Jiahua Xu

TL;DR
DLT-Corpus is the largest domain-specific text dataset for Distributed Ledger Technology, enabling advanced NLP research and revealing technology transfer patterns and market dynamics within the DLT sector.
Contribution
Introduces the DLT-Corpus dataset and LedgerBERT, a domain-adapted NLP model, and demonstrates their utility in analyzing technology emergence and market trends.
Findings
Technologies originate in scientific literature before patents and social media.
Social media sentiment remains bullish despite market downturns.
Research activity correlates with market growth, preceding economic expansion.
Abstract
We introduce DLT-Corpus, the largest domain-specific text collection for Distributed Ledger Technology (DLT) research to date: 2.98 billion tokens from 22.12 million documents spanning scientific literature (37,440 publications), United States Patent and Trademark Office (USPTO) patents (49,023 filings), and social media (22 million posts). Existing Natural Language Processing (NLP) resources for DLT focus narrowly on cryptocurrency price prediction and smart contracts, leaving domain-specific language underexplored despite the sector's ~$3 trillion market capitalization and rapid technological evolution. We demonstrate DLT-Corpus' utility by analyzing technology emergence patterns and market-innovation correlations. Findings reveal that technologies originate in scientific literature before reaching patents and social media, following traditional technology transfer patterns.…
Taxonomy
Topics: Blockchain Technology Applications and Security · Intellectual Property and Patents · FinTech, Crowdfunding, Digital Finance
