Topical: Learning Repository Embeddings from Source Code using Attention
Agathe Lherondelle, Varun Babbar, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran

TL;DR
Topical introduces an attention-based neural network that creates comprehensive repository embeddings from source code and textual data, outperforming traditional methods in auto-tagging and demonstrating scalability.
Contribution
The paper presents a novel attention mechanism for generating repository embeddings from source code, improving on existing aggregation-based approaches.
Findings
Outperforms baseline methods in auto-tagging tasks
Demonstrates scalability and efficiency in embedding computation
Provides open-source tools and datasets for further research
Abstract
This paper presents Topical, a novel deep neural network for repository level embeddings. Existing methods, reliant on natural language documentation or naive aggregation techniques, are outperformed by Topical's utilization of an attention mechanism. This mechanism generates repository-level representations from source code, full dependency graphs, and script level textual data. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines in tasks such as repository auto-tagging, highlighting the attention mechanism's efficacy over traditional aggregation methods. Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation. For further research, the accompanying tools, code, and training dataset are provided at: https://github.com/jpmorganchase/topical.
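The contrast the abstract draws between attention and naive aggregation can be illustrated with a small sketch. It is not the Topical architecture: it is a generic single-query attention pooling over script-level embedding vectors, set beside plain mean pooling. The function names, the toy embeddings, and the fixed `query` vector (which would be a learned parameter in a trained model) are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(script_embeddings):
    # Naive aggregation: every script contributes equally.
    n = len(script_embeddings)
    dim = len(script_embeddings[0])
    return [sum(vec[j] for vec in script_embeddings) / n for j in range(dim)]

def attention_pool(script_embeddings, query):
    # Scaled dot-product score of each script embedding against the query.
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, vec)) / math.sqrt(dim)
        for vec in script_embeddings
    ]
    weights = softmax(scores)
    # Repository embedding = attention-weighted sum of script embeddings.
    pooled = [
        sum(w * vec[j] for w, vec in zip(weights, script_embeddings))
        for j in range(dim)
    ]
    return pooled, weights

# Toy data (hypothetical): three scripts, two-dimensional embeddings.
script_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # stands in for a learned parameter

repo_embedding, weights = attention_pool(script_embeddings, query)
baseline = mean_pool(script_embeddings)
```

With an all-zero query every script receives the same weight, so the attention-pooled vector reduces to the mean-pooled one. A non-zero query lets scripts that align with it dominate the repository embedding.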
Taxonomy
Topics: Software Engineering Research · Online Learning and Analytics · Software System Performance and Reliability
