Anticipating Innovation Using Large Language Models
Enrico Maria Fenoaltea, Filippo Santoro, Giordano De Marzo, Segun Taofeek Aroyehun, Andrea Tacchella

TL;DR
This paper introduces TechToken, a transformer model that predicts technological innovations by analyzing collective language shifts in patents, revealing signals of future combinations decades in advance.
Contribution
The paper presents TechToken, a novel transformer-based approach that models patent classifications as language tokens to forecast technological innovation.
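To make the idea of treating classification codes as vocabulary tokens concrete, here is a minimal sketch. It is not the authors' implementation: the toy tokenizer, the base vocabulary, and the character-level fallback are all illustrative assumptions, and the two IPC subclass codes are only examples. It shows how a code that would otherwise be split into pieces becomes a single token once it is added to the vocabulary.

```python
def tokenize(text, vocab):
    """Toy tokenizer (hypothetical): known words stay whole,
    unknown words fall back to individual characters."""
    tokens = []
    for word in text.split():
        if word in vocab:
            tokens.append(word)
        else:
            tokens.extend(word)  # character-level fallback for unknown words
    return tokens


# Illustrative base vocabulary; a real model would have tens of thousands of entries.
base_vocab = {"a", "method", "for", "encrypting", "data"}

# Example IPC subclass codes added as atomic tokens (assumed granularity).
ipc_codes = ["H04L", "G06F"]
extended_vocab = base_vocab | set(ipc_codes)

before = tokenize("H04L method", base_vocab)
after = tokenize("H04L method", extended_vocab)
```

With the base vocabulary, the code is broken into characters; with the extended vocabulary, it is kept whole and can therefore receive its own embedding during fine-tuning.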
Findings
TechToken predicts first technological combinations with high accuracy.
Collective language shifts in patents serve as early signals of innovation.
TechToken outperforms existing models in patent-related tasks.
Abstract
Forecasting innovation, understood as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We show that this signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts first technological combinations. TechToken also improves…
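The notion of context similarity between code embeddings can be illustrated with a small sketch. The excerpt above does not state which similarity function the authors use, so cosine similarity is an assumption here, and the three-dimensional embeddings and the set of already-observed combinations are made-up toy values. The sketch scores every pair of codes that has not yet been combined and ranks the pairs from most to least similar.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidate_pairs(embeddings, already_combined):
    """Score each not-yet-combined pair of codes and sort by similarity, highest first."""
    scored = []
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in already_combined:
            continue  # only pairs with no prior co-occurrence are candidates
        scored.append(((a, b), cosine_similarity(embeddings[a], embeddings[b])))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings: illustrative values, not taken from the paper.
embeddings = {
    "H04L": [1.0, 0.0, 0.0],
    "G06F": [0.8, 0.6, 0.0],
    "A61K": [0.0, 0.0, 1.0],
}

# Hypothetical set of pairs that have already appeared together in a patent.
already_combined = {frozenset(("H04L", "A61K"))}

ranked = rank_candidate_pairs(embeddings, already_combined)
```

Under this reading, pairs near the top of `ranked` are the ones whose descriptions have converged most, and would be the leading candidates for a first combination.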
