Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data
Ruoling Peng, Kang Liu, Po Yang, Zhipeng Yuan, Shunbao Li

TL;DR
This paper presents an approach that combines embedding-based retrieval with large language models to automatically extract structured agricultural pest data from unstructured documents, achieving higher accuracy than existing methods while maintaining efficiency.
Contribution
It introduces a methodology that pairs a domain-agnostic, general-purpose pre-trained LLM with embedding-based retrieval to extract structured data from agricultural texts automatically, with minimal or no human intervention.
Findings
Achieves consistently higher accuracy than existing methods on the benchmark.
Maintains efficiency in processing unstructured agricultural documents.
Effectively extracts entities and attributes for pest identification.
Abstract
Pest identification is a crucial aspect of pest control in agriculture. However, most farmers cannot accurately identify pests in the field, and few structured data sources are available for rapid querying. In this work, we explore using a domain-agnostic, general-purpose pre-trained large language model (LLM) to extract structured data from agricultural documents with minimal or no human intervention. We propose a methodology that first retrieves and filters relevant text using embedding-based retrieval, then applies LLM question answering to automatically extract entities and attributes from the documents and transform them into structured data. Compared with existing methods, our approach achieves consistently better accuracy on the benchmark while maintaining efficiency.
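The two-stage pipeline in the abstract (embedding-based retrieval and filtering, then LLM question answering that fills one attribute at a time into a structured record) can be sketched as below. This is a minimal illustration under loud assumptions, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a neural embedding model, `fake_llm` is a hypothetical placeholder for a real LLM call, and the names `retrieve`, `build_prompt`, `extract_record`, the top-k value, the similarity threshold and the sample text are all illustrative choices rather than settings reported in the paper.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # ASSUMPTION: toy bag-of-words term counts stand in for a neural embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2, threshold: float = 0.3) -> list[str]:
    # Stage 1: score every chunk against the query, drop chunks below the
    # (hypothetical) similarity threshold, and keep the top-k survivors.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    kept = [sc for sc in scored if sc[0] >= threshold]
    kept.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in kept[:k]]


def build_prompt(pest: str, attribute: str, context: list[str]) -> str:
    # Stage 2 input: a question-answering prompt grounded in the retrieved text.
    lines = "\n".join(f"- {c}" for c in context)
    return (f"Using only the context below, state the {attribute} of {pest}. "
            f"Answer 'unknown' if the context does not say.\nContext:\n{lines}\nAnswer:")


def fake_llm(prompt: str) -> str:
    # HYPOTHETICAL stand-in for a real LLM: echoes the top-ranked context line.
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return "unknown"


def extract_record(pest, attribute_hints, chunks, ask_llm, k=2, threshold=0.3):
    # Build one structured record: one retrieval + one LLM question per attribute.
    record = {"pest": pest}
    for attribute, hint in attribute_hints.items():
        context = retrieve(f"{pest} {hint}", chunks, k=k, threshold=threshold)
        # If filtering left nothing relevant, skip the LLM call and record a gap.
        record[attribute] = ask_llm(build_prompt(pest, attribute, context)).strip() if context else None
    return record


if __name__ == "__main__":
    chunks = [
        "Aphid colour is usually green or black.",
        "Aphid host plants include wheat, beans and roses.",
        "Crop rotation improves soil health over several seasons.",
    ]
    hints = {"colour": "colour", "host_plants": "host plants", "life_cycle": "lifecycle"}
    print(extract_record("aphid", hints, chunks, fake_llm))
```

Filtering before the LLM call serves two purposes in this sketch: it keeps each prompt short, and when no chunk clears the threshold (the `life_cycle` attribute here) the field is left as `None` instead of inviting the model to guess. Swapping in a real embedding model and LLM client would only change `embed` and the `ask_llm` argument.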
Taxonomy
Topics: Smart Agriculture and AI · Advanced Text Analysis Techniques
