ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction
Alexander Brinkmann, Roee Shraga, Christian Bizer

TL;DR
This paper investigates the use of large language models like GPT-4 and Llama-3-70B for extracting structured product attribute-value pairs from unstructured e-commerce descriptions, demonstrating high accuracy with minimal training.
Contribution
It introduces prompt-based methods for attribute extraction using LLMs, showing they outperform traditional BERT-based approaches in data efficiency and robustness.
Findings
GPT-4 achieves 85% F1-score with detailed prompts
Llama-3-70B performs competitively as an open-source alternative
Fine-tuning GPT-3.5 improves performance but reduces generalization
Abstract
E-commerce platforms require structured product data in the form of attribute-value pairs to offer features such as faceted product search or attribute-based product comparison. However, vendors often provide unstructured product descriptions, necessitating the extraction of attribute-value pairs from these texts. BERT-based extraction methods require large amounts of task-specific training data and struggle with unseen attribute values. This paper explores using large language models (LLMs) as a more training-data efficient and robust alternative. We propose prompt templates for zero-shot and few-shot scenarios, comparing textual and JSON-based target schema representations. Our experiments show that GPT-4 achieves the highest average F1-score of 85% using detailed attribute descriptions and demonstrations. Llama-3-70B performs nearly as well, offering a competitive open-source…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Text Analysis Techniques
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Linear Layer · Layer Normalization · Attention Dropout · Softmax · {Dispute@FaQ-s}How to file a dispute with Expedia?
