Chemical Reaction Extraction from Long Patent Documents

Aishwarya Jadhav; Ritam Dutt

arXiv:2407.15124 · cs.IR · July 24, 2024

TL;DR

This paper addresses extracting chemical reaction information from lengthy patent documents to build a comprehensive database, aiding chemical research and patent analysis.

Contribution

It formulates reaction extraction as a paragraph-level sequence tagging task and explores various models to improve extraction accuracy across domains.
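Under this formulation, a patent becomes an ordered list of paragraphs, and each annotated reaction becomes a contiguous run of paragraph labels. The sketch below encodes gold reaction spans as per-paragraph tags. It assumes a BIO scheme with a hypothetical `REACTION` label and inclusive paragraph index ranges. This page does not state the paper's exact tag set, so treat the scheme and the name `spans_to_bio` as illustrative.

```python
from typing import List, Tuple


def spans_to_bio(num_paragraphs: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Encode reaction spans as per-paragraph BIO tags.

    Assumption (not from the paper): each span is an inclusive
    (start, end) paragraph index range, and spans do not overlap.
    If they do overlap, later spans overwrite earlier tags.
    """
    tags = ["O"] * num_paragraphs  # paragraphs outside any reaction
    for start, end in spans:
        if not (0 <= start <= end < num_paragraphs):
            raise ValueError(f"invalid span ({start}, {end})")
        tags[start] = "B-REACTION"  # first paragraph of a reaction description
        for i in range(start + 1, end + 1):
            tags[i] = "I-REACTION"  # continuation paragraphs of the same reaction
    return tags


# A hypothetical 6-paragraph patent: paragraphs 1-2 describe one reaction,
# paragraph 4 describes another.
print(spans_to_bio(6, [(1, 2), (4, 4)]))
```

The `B-` prefix lets two adjacent reactions stay distinct even when no `O` paragraph separates them. That distinction matters when consecutive paragraphs describe different syntheses.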

Findings

01. Proposed multiple approaches for reaction span extraction.

02. Analyzed model generalization across chemical patent domains.

03. Enhanced reaction snippet identification for chemical knowledge bases.

Abstract

The task of searching through patent documents is crucial for chemical patent recommendation and retrieval. This can be enhanced by creating a patent knowledge base (ChemPatKB) to aid in prior art searches and to provide a platform for domain experts to explore new innovations in chemical compound synthesis and use cases. An essential foundational component of this KB is the extraction of important reaction snippets from long patent documents, which facilitates multiple downstream tasks such as reaction co-reference resolution and chemical entity role identification. In this work, we explore the problem of extracting reaction spans from chemical patents in order to create a reaction resource database. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs that contain a description of a reaction. We propose…
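A paragraph-level tagger outputs one label per paragraph, and those labels must then be turned back into the reaction spans the abstract describes. The sketch below decodes that output. It assumes a BIO scheme with a hypothetical `REACTION` label, which this page does not confirm. The function name `bio_to_spans` and the lenient handling of a stray `I-` tag are illustrative choices, not the authors' method.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode per-paragraph BIO tags into inclusive (start, end) reaction spans.

    Assumption (not from the paper): tags are "B-REACTION", "I-REACTION",
    or "O". An "I-REACTION" with no open span is leniently treated as the
    start of a new span.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the currently open span began
    for i, tag in enumerate(tags):
        if tag == "B-REACTION":
            if start is not None:  # a new reaction closes the previous one
                spans.append((start, i - 1))
            start = i
        elif tag == "I-REACTION":
            if start is None:  # ill-formed sequence: open a span anyway
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:  # span running to the last paragraph
        spans.append((start, len(tags) - 1))
    return spans


# Hypothetical tagger output for a 6-paragraph patent.
predicted = ["O", "B-REACTION", "I-REACTION", "O", "I-REACTION", "B-REACTION"]
print(bio_to_spans(predicted))
```

Each returned span maps directly to a reaction snippet. Joining the paragraphs in `[start, end]` yields the text that would be stored in a resource database such as ChemPatKB.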

Code & Models

Repositories

aishwaryajadhav/chemical-patent-reaction-extraction
PyTorch · Official

Taxonomy

Methods: Balanced Selection