Company2Vec -- German Company Embeddings based on Corporate Websites
Christopher Gerling

TL;DR
Company2Vec introduces a new method for creating detailed company embeddings from German corporate websites, enabling advanced semantic analysis, industry prediction, and peer-firm identification for applications like banking and clustering.
Contribution
The paper presents a novel approach combining Word2Vec and dimensionality reduction to generate fine-grained, semantic company embeddings from unstructured website data, enhancing business analytics.
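To make the idea of turning website text into a company vector concrete, here is a minimal sketch. It assumes, purely for illustration, that a company vector is the frequency-weighted mean of the word vectors of its website tokens; the summary above does not say how the paper aggregates words, so this may differ from the actual method. The toy `WORD_VECTORS` table stands in for a trained Word2Vec model, the helper name `company_vector` is hypothetical, and the dimensionality-reduction step is omitted.

```python
from collections import Counter

# Toy word vectors standing in for a trained Word2Vec model (hypothetical values).
WORD_VECTORS = {
    "bank": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.2, 0.1],
    "bread": [0.0, 0.9, 0.2],
    "bakery": [0.1, 0.8, 0.3],
}


def company_vector(tokens, word_vectors):
    """Return the frequency-weighted mean of the vectors of in-vocabulary tokens.

    Assumption: mean pooling is used only as an illustrative aggregation;
    it is not confirmed to be the paper's method.
    """
    counts = Counter(t for t in tokens if t in word_vectors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no in-vocabulary tokens on this website")
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for token, n in counts.items():
        for i, x in enumerate(word_vectors[token]):
            vec[i] += n * x / total
    return vec


# Out-of-vocabulary tokens such as "impressum" are simply skipped.
bank_vec = company_vector(["bank", "loan", "impressum"], WORD_VECTORS)
```

Because words and companies end up in the same vector space under this assumption, a company can be compared directly with individual words as well as with other companies.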
Findings
Semantic embeddings preserve language structure.
Effective industry prediction using supervised learning.
Enhanced company similarity measurement with cosine distance.
Abstract
With Company2Vec, the paper proposes a novel application in representation learning. The model analyzes business activities from unstructured company website data using Word2Vec and dimensionality reduction. Company2Vec maintains semantic language structures and thus creates efficient company embeddings in fine-grained industries. These semantic embeddings can be used for various applications in banking. Direct relations between companies and words allow semantic business analytics (e.g., the top-n words for a company). Furthermore, industry prediction is presented as a supervised learning application and evaluation method. The vectorized structure of the embeddings allows measuring company similarities with the cosine distance. Company2Vec hence offers a more fine-grained comparison of companies than the standard industry labels (NACE). This property is relevant for unsupervised learning…
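The similarity and top-n ideas in the abstract can be sketched in a few lines. The vectors, company names, and the helper `top_n` below are hypothetical toy values chosen for illustration, not data or code from the paper. The sketch ranks by cosine similarity, which equals one minus the cosine distance the abstract mentions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_n(query, candidates, n=3):
    """Return the n (name, similarity) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical company and word embeddings living in the same vector space.
COMPANIES = {
    "bank_a": [0.9, 0.1, 0.0],
    "bank_b": [0.8, 0.2, 0.1],
    "bakery_c": [0.1, 0.9, 0.2],
}
WORDS = {
    "loan": [0.85, 0.15, 0.05],
    "bread": [0.0, 0.9, 0.2],
}

# Peer-firm identification: nearest other company to bank_a.
others = {name: vec for name, vec in COMPANIES.items() if name != "bank_a"}
peers = top_n(COMPANIES["bank_a"], others, n=1)

# Semantic analytics: the word closest to bakery_c.
words = top_n(COMPANIES["bakery_c"], WORDS, n=1)
```

Ranking by a continuous similarity score is what allows a finer comparison than a categorical label: two firms sharing one NACE code can still be ordered by how close their embeddings are.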
Taxonomy
Methods: k-Means Clustering
