From Scattered Sources to Comprehensive Technology Landscape: A Recommendation-based Retrieval Approach
Chi Thang Duong, Dimitri Percia David, Ljiljana Dolamic, Alain Mermoud, Vincent Lenders, Karl Aberer

TL;DR
This paper presents an end-to-end recommendation-based retrieval system that automatically identifies and retrieves related technologies and companies from web data, improving comprehensiveness and relevance over traditional methods.
Contribution
It introduces a novel framework combining technology classification with recommendation-based retrieval using DistilBERT, and constructs a new dataset for evaluation.
Findings
Retrieves 4 times more relevant companies than baseline methods.
Outperforms traditional retrieval in technology identification.
Supports retrieval of related companies and technologies effectively.
Abstract
Mapping the technology landscape is crucial for market actors to make informed investment decisions. However, given the large amount of data on the Web and the resulting information overload, manually retrieving information is a seemingly ineffective and incomplete approach. In this work, we propose an end-to-end recommendation-based retrieval approach to support automatic retrieval of technologies and their associated companies from raw Web data. This is a two-task setup involving (i) technology classification of entities extracted from a company corpus, and (ii) technology and company retrieval based on the classified technologies. Our proposed framework approaches the first task by leveraging DistilBERT, a state-of-the-art language model. For the retrieval task, we introduce a recommendation-based retrieval technique to simultaneously support retrieving related companies,…
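The abstract does not spell out how the recommendation-based retrieval step works. As a rough illustration of the general idea only, the sketch below treats classified technologies as "items" and companies as "users" and applies generic item-based collaborative filtering: technologies are compared by cosine similarity over the sets of companies they co-occur in, and companies are then scored by the query technology plus its most similar technologies. This is a swapped-in textbook technique, not the authors' method; the toy data and the function names (`related_technologies`, `related_companies`) are hypothetical, and the classification step is assumed to have already produced the company-to-technology map.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical output of the classification step (toy data, not from the paper):
# each company maps to the set of technologies detected in its corpus.
COMPANY_TECHS = {
    "acme": {"intrusion detection", "malware analysis"},
    "globex": {"malware analysis", "threat intelligence"},
    "initech": {"intrusion detection", "threat intelligence"},
    "umbrella": {"quantum key distribution"},
}


def tech_to_companies(company_techs):
    """Invert the company -> technologies map into technology -> companies."""
    index = defaultdict(set)
    for company, techs in company_techs.items():
        for tech in techs:
            index[tech].add(company)
    return index


def cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def related_technologies(query, company_techs, k=3):
    """Item-based recommendation: rank other technologies by co-occurrence similarity."""
    index = tech_to_companies(company_techs)
    query_companies = index.get(query, set())
    scores = {
        tech: cosine(query_companies, companies)
        for tech, companies in index.items()
        if tech != query
    }
    # Keep only technologies that share at least one company; break ties by name.
    ranked = sorted(
        ((t, s) for t, s in scores.items() if s > 0),
        key=lambda ts: (-ts[1], ts[0]),
    )
    return ranked[:k]


def related_companies(query, company_techs, k=3):
    """Score companies by the query technology (weight 1.0) plus related technologies."""
    weights = {query: 1.0, **dict(related_technologies(query, company_techs, k))}
    scores = defaultdict(float)
    for company, techs in company_techs.items():
        for tech in techs:
            scores[company] += weights.get(tech, 0.0)
    # Drop companies with no overlap at all; break ties by name.
    ranked = sorted(
        ((c, s) for c, s in scores.items() if s > 0),
        key=lambda cs: (-cs[1], cs[0]),
    )
    return ranked[:k]


if __name__ == "__main__":
    print(related_technologies("intrusion detection", COMPANY_TECHS))
    print(related_companies("intrusion detection", COMPANY_TECHS))
```

In this toy example, querying "intrusion detection" also surfaces "globex", which never mentions that technology but works on two closely related ones. That is the kind of "related company" a plain keyword match would miss, and it is one plausible reading of why a recommendation-style formulation can improve comprehensiveness.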
Taxonomy
Topics
Web visibility and informetrics · Web Data Mining and Analysis
Methods
Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · Adam · Residual Connection · Weight Decay · Linear Warmup With Linear Decay · Dense Connections
