Logic Mill -- A Knowledge Navigation System
Sebastian Erhardt, Mainak Ghosh, Erik Buunk, Michael E. Rose, Dietmar Harhoff

TL;DR
Logic Mill is a scalable, accessible system that uses advanced NLP techniques and large pre-trained models to identify semantically similar documents across extensive scientific and patent corpora, supporting research and knowledge discovery.
Contribution
The paper introduces a large-scale, flexible system that uses pre-trained language models to measure semantic document similarity across diverse domains.
Findings
Contains over 200 million documents.
Provides API and web interface for easy access.
Continuously updated and extendable to new domains.
Abstract
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently, it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
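The abstract describes documents being turned into numerical representations and then compared for semantic similarity. It does not state which similarity measure the system uses, so the following is only a minimal sketch that assumes cosine similarity, a common choice for comparing embedding vectors. The document IDs, the three-dimensional vectors, and the helper names `cosine_similarity` and `most_similar` are all hypothetical; this is not Logic Mill's API, and a real system would obtain its vectors from a pre-trained language model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(query_vec, corpus, top_k=2):
    """Rank corpus documents by cosine similarity to the query vector."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up toy embeddings (assumption: real embeddings have hundreds of
# dimensions and come from a language model, not from hand-written values).
corpus = {
    "paper_a": [0.9, 0.1, 0.0],
    "patent_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

ranking = most_similar(query, corpus)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because both scientific papers and patents live in the same vector space in this sketch, a query from one domain can retrieve neighbours from the other, which is the cross-corpus use the abstract points to. At the scale of 200 million documents, an exhaustive scan like this would likely be replaced by an approximate nearest-neighbour index.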
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
