Dataset of Natural Language Queries for E-Commerce
Andrea Papenmeier, Dagmar Kern, Daniel Hienert, Alfred Sliwa, Ahmet Aker, Norbert Fuhr

TL;DR
This paper introduces a new dataset of 3,540 natural language e-commerce queries across two product domains (laptops and jackets), with annotations of vague terms and key facts for 1,754 of the laptop queries, to facilitate research in natural language understanding and retrieval systems.
Contribution
The paper provides one of the first publicly available datasets of natural language product search queries with detailed annotations. It fills a gap left by the privacy and competitive concerns that keep most real query logs closed, and it enables research in natural language processing for e-commerce.
Findings
Dataset includes 3,540 natural language queries across two domains: laptops and jackets.
Contains detailed annotations for 1,754 laptop queries.
Enables new research in NLP and information retrieval for e-commerce.
Abstract
Shopping online is more and more frequent in our everyday life. For e-commerce search systems, understanding natural language coming through voice assistants, chatbots or from conversational search is an essential ability to understand what the user really wants. However, evaluation datasets with natural and detailed information needs of product-seekers which could be used for research do not exist. Due to privacy issues and competitive consequences, only a few datasets with real user search queries from logs are openly available. In this paper, we present a dataset of 3,540 natural language queries in two domains that describe what users want when searching for a laptop or a jacket of their choice. The dataset contains annotations of vague terms and key facts of 1,754 laptop queries. This dataset opens up a range of research opportunities in the fields of natural language processing and…
