Basic Data Analysis and More - A Guided Tour Using Python
O. Melchert

TL;DR
This paper provides an accessible introduction to fundamental statistical tools for data analysis, illustrated with Python implementations, focusing on processing data from large-scale simulations and including advanced topics like clustering and parallelization.
Contribution
It offers a selective, computational experimentalist's perspective on basic data analysis techniques with practical Python examples, covering topics from the computation of moments to hierarchical clustering and parallel computing.
Findings
Python implementations of statistical tools provided
Techniques applicable to large-scale simulation data
Includes advanced topics like clustering and parallelization
Abstract
In these lecture notes, a selection of frequently required statistical tools will be introduced and illustrated. They allow one to post-process data that stem from, e.g., large-scale numerical simulations (aka sequences of random experiments). From a data analysis point of view, the concepts and techniques introduced here are of general interest and are best employed with computational aid. Consequently, an exemplary implementation of the presented techniques using the Python programming language is provided. The content of these lecture notes is rather selective and represents a computational experimentalist's view on the subject of basic data analysis, ranging from the simple computation of moments for distributions of random variables to more involved topics such as hierarchical cluster analysis and the parallelization of Python code.
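To make the simplest item in that range concrete, the following is a minimal sketch of estimating the first two moments from a sequence of random experiments. It is not taken from the lecture notes: the function names `basic_statistics` and `simulate_walk_endpoints`, the choice of a symmetric 1D random walk as the "experiment", and the seed are all assumptions made for illustration.

```python
import math
import random


def basic_statistics(samples):
    """Return (mean, unbiased variance, standard error of the mean)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # Bessel-corrected estimator: divide by n - 1 instead of n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    std_error = math.sqrt(variance / n)
    return mean, variance, std_error


def simulate_walk_endpoints(n_walks, n_steps, seed=0):
    """Hypothetical experiment: endpoints of symmetric 1D random walks."""
    rng = random.Random(seed)
    return [
        sum(rng.choice((-1, 1)) for _ in range(n_steps))
        for _ in range(n_walks)
    ]


if __name__ == "__main__":
    endpoints = simulate_walk_endpoints(n_walks=1000, n_steps=100)
    mean, variance, std_error = basic_statistics(endpoints)
    print(f"mean = {mean:.3f} +/- {std_error:.3f}, variance = {variance:.3f}")
```

For a symmetric walk the mean endpoint is expected to be close to zero and the variance close to the number of steps, which gives a quick sanity check on the estimators.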
Taxonomy
Topics: Computational Physics and Python Applications · Scientific Research and Discoveries · Statistical Mechanics and Entropy
