Landscape of High-performance Python to Develop Data Science and Machine Learning Applications
Oscar Castro, Pierrick Bruneau, Jean-Sébastien Sottet, and Dario Torregrossa

TL;DR
This paper surveys tools and techniques for improving Python's performance in data science and machine learning. It is organized around practical user scenarios and is aimed at both practitioners and tool developers.
Contribution
It provides a comprehensive overview of high-performance Python tools tailored for data science and ML, highlighting gaps and guiding future development.
Findings
Summarizes key high-performance Python tools
Identifies gaps in current tools and techniques
Provides practical scenarios for users and developers
Abstract
Python has become the prime language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them quickly implement their algorithms, computational efficiency becomes a necessity when moving to scale. Yet harnessing high-performance devices such as multicore processors and Graphics Processing Units (GPUs) to their full potential is generally not trivial. The present narrative survey was conceived as a reference document to help such practitioners find their way through the wealth of tools and techniques available for the Python language. Our document revolves around user scenarios, which are meant to cover most situations practitioners may face. We believe that this document may also be of practical use to tool developers, who may use our work to identify potential gaps in existing tools…
Taxonomy
Topics: Computational Physics and Python Applications · Parallel Computing and Optimization Techniques
