An Exploratory Study on Utilising the Web of Linked Data for Product Data Mining
Ziqi Zhang, Xingyi Song

TL;DR
This study explores how structured Linked Data from the Web can be used for product data mining in e-commerce. It shows that word embeddings trained on such data improve accuracy on product classification and linking tasks, although limited data coverage constrains the gains.
Contribution
It introduces methods for leveraging Linked Open Data to create language resources for product classification and linking, and evaluates their effectiveness in e-commerce.
Findings
Word embeddings improve accuracy by up to 6.9 percentage points.
The other methods, continued pre-training of BERT-like language models and translation, are less effective.
Structured data biases and vocabulary gaps limit performance.
Abstract
The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can be used, for what kind of tasks, and to what extent they can be useful for these tasks. This work focuses on the e-commerce domain to explore methods of utilising such structured data to create language resources that may be used for product classification and linking. We process billions of structured data points in the form of RDF n-quads to create product-related corpora of multiple millions of words, which are later used in three different ways for creating language resources: training word embedding models, continued pre-training of BERT-like…
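The corpus-building step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the product text sits in literal values of schema.org name and description predicates, and it uses a simplified regular expression rather than a full N-Quads parser. The names TEXT_PREDICATES, QUAD_RE, and extract_sentences are hypothetical, and the sample quads are made up for illustration.

```python
import re

# Hypothetical choice of predicates whose literal values count as product text.
TEXT_PREDICATES = {
    "http://schema.org/name",
    "http://schema.org/description",
}

# Simplified pattern for one N-Quads statement with a literal object:
#   subject  <predicate>  "literal"[@lang | ^^<datatype>]  <graph>  .
# Escape sequences inside the literal are kept as-is, not decoded.
QUAD_RE = re.compile(
    r'^(\S+)\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"'
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]+>)?\s+<([^>]+)>\s*\.$'
)


def extract_sentences(lines):
    """Yield a lowercased token list for each literal on a selected predicate."""
    for line in lines:
        match = QUAD_RE.match(line.strip())
        if match and match.group(2) in TEXT_PREDICATES:
            tokens = re.findall(r"[a-z0-9]+", match.group(3).lower())
            if tokens:
                yield tokens


# Made-up sample quads; only the schema.org/name literal should be kept.
sample = [
    '_:n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://schema.org/Product> <http://example.com/p1> .',
    '_:n1 <http://schema.org/name> "Canon EOS 80D Camera"@en '
    '<http://example.com/p1> .',
    '_:n1 <http://schema.org/sku> "12345" <http://example.com/p1> .',
]
corpus = list(extract_sentences(sample))
print(corpus)
```

Token lists of this shape are the usual input for training a word embedding model. At the scale the abstract describes, the lines would be streamed from files rather than held in memory, which is why the function is written as a generator.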
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
