A project-based course on software development for (engineering) research
Kyle E. Niemeyer

Table 1. Lecture topics and associated in-class activities.

| Topic | In-class activity |
|---|---|
| Getting started, and version control | Configure Git |
| Remote version control, licensing, and copyright | Create, clone, and fork repos |
| Structuring modules, and testing | Create basic structure of module |
| Test coverage, continuous integration, documentation | Configure Travis CI |
| Introduction to Julia (guest lecture) | |
| Introduction to parallel programming | |
| Classes and objects (in Python) | |
| Packaging and distributing your software | Create PyPI, Anaconda packages |
| Optimizing numerical code in Python | |
| Working with files, command-line inputs in Python | |
| Open science, software citation, reproducibility | Connect GitHub and Zenodo |
| Posters, presentations, and technical writing | |
| Project presentations | |

Table 2. Out-of-class project assignments by week.

| Assignment | Week |
|---|---|
| Join Gitter chat room and create GitHub profile | 1 |
| Project proposal | 1 |
| Choose open-source license | 3 |
| Create tests and submit as PR for review | 3 |
| Finish configuring Travis CI | 4 |
| Write comments and docstrings | 4 |
| Complete PyPI and/or Anaconda packages | 6 |
| Write report and make presentation | 7 |
Kyle E. Niemeyer (ORCID: 0000-0003-4425-7097)
Oregon State University, Corvallis OR 97331, USA
Abstract
This paper describes the motivation and design of a 10-week graduate course that teaches practices for developing research software; although offered by an engineering program, the content applies broadly to any field of scientific research where software may be developed. Topics taught in the course include local and remote version control, licensing and copyright, structuring Python modules, testing and test coverage, continuous integration, packaging and distribution, open science, software citation, and reproducibility basics, among others. Lectures are supplemented by in-class activities and discussions, and all course material is shared openly via GitHub. Coursework is heavily based on a single, term-long project where students individually develop a software package targeted at their own research topic; all contributions must be submitted as pull requests and reviewed/merged by other students. The course was initially offered in Spring 2018 with 17 students enrolled, and will be taught again in Spring 2019.
Keywords: Research software · Teaching software development · Software best practices.
1 Motivation
Nearly all research relies on software, even experimental research, yet researchers typically do not receive training in software best practices during graduate school the way they do for experimental methods. In fact, in two recent surveys most respondents confirmed that they use software and that their research would be impractical without it: 90% and 70%, respectively, of UK academics surveyed in 2014 [12], and 95% and 63%, respectively, of US postdoctoral researchers surveyed in 2017 [19]. Computational science in particular depends on software, and on following good, evidence-based practices when working with software and data.
For example, in the Mechanical Engineering graduate curriculum at Oregon State University, the thermal-fluid sciences option (where I teach) requires a course on experimental measurement techniques, but no analogous course on proper software development or computational science techniques. (We do require a course on numerical methods that focuses on solving differential equations, but it does not extend to software development.) Instead, our program, like most similar programs around the world, assumes that such practices are trivial compared with the physical phenomena or mathematical methods and/or can be self-taught. However, just as appropriate measurement techniques and statistical analysis of data are necessary for experimental (and computational) research, good software practices ensure the reliability and correctness of research results obtained from software, whether that software is developed for computational (i.e., modeling-based) or experimental (i.e., analysis of results) research.
As the research community has recognized the importance of software and data skills, Software Carpentry [27] workshops have in recent years become an established avenue for graduate students and postdocs (and the occasional faculty member) to learn necessary skills for working with Python, the command line, and version-control systems. While these are essential skills for research, researchers who go further and develop software require additional training. This article describes a course that aims to fill this gap by teaching graduate-student researchers practical software development skills and by exposing them to topics related to open science and reproducibility.
2 Course design
The course relies heavily on and builds upon the Effective Computation in Physics textbook by Scopatz and Huff [23] (Chapters 10–22), as well as recommendations by Wilson et al. [28] and Jiménez et al. [13]. All course materials are openly available on GitHub and shared under a Creative Commons Attribution license; the online course syllabus links to each lecture [20]. The course combines lectures, hosted on GitHub and presented using reveal.js [8], with in-class activities and discussions, as summarized in Table 1. Out-of-class work, described in Table 2, centers on a software development project discussed in Sec. 2.2.
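Several of the out-of-class assignments, such as creating tests and writing comments and docstrings, add to each student's package. As a hypothetical illustration (not code from the course materials or from any student project), the sketch below shows a small documented function with two tests written as plain `test_*` functions, the form pytest collects from test files by default. The function `reynolds_number` and the NumPy-style docstring layout are assumptions chosen for the example.

```python
import math


def reynolds_number(density, velocity, length, viscosity):
    """Return the Reynolds number, Re = rho * U * L / mu.

    Parameters
    ----------
    density : float
        Fluid density rho, in kg/m^3.
    velocity : float
        Characteristic velocity U, in m/s.
    length : float
        Characteristic length L, in m.
    viscosity : float
        Dynamic viscosity mu, in Pa s; must be positive.

    Returns
    -------
    float
        The dimensionless Reynolds number.

    Raises
    ------
    ValueError
        If `viscosity` is not positive.
    """
    if viscosity <= 0:
        raise ValueError("viscosity must be positive")
    return density * velocity * length / viscosity


# Plain test_* functions; pytest collects these when they live in a test file.
def test_reynolds_number_known_value():
    # 1000 kg/m^3 * 1 m/s * 0.1 m / 1e-3 Pa s = 1e5
    assert math.isclose(reynolds_number(1000.0, 1.0, 0.1, 1.0e-3), 1.0e5)


def test_reynolds_number_rejects_nonpositive_viscosity():
    try:
        reynolds_number(1.0, 1.0, 1.0, 0.0)
    except ValueError:
        return  # expected: zero viscosity is rejected
    raise AssertionError("expected ValueError for zero viscosity")


if __name__ == "__main__":
    # Allows running the tests directly, without pytest installed.
    test_reynolds_number_known_value()
    test_reynolds_number_rejects_nonpositive_viscosity()
    print("all tests passed")
```

In a package, the function and its tests would typically live in separate files, e.g. `mypackage/flow.py` and `tests/test_flow.py` (hypothetical names); they are combined here only to keep the sketch self-contained.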
2.1 Course description and learning objectives
The official course description is:
This course will advance students’ understanding of topics related to computational science and engineering, and advance their skills in applying techniques to solve research problems using high-level programming languages. The course will build on existing abilities in computer programming to cover topics related to computational modeling and scientific software development. Students will gain experience in applying available packages and libraries, as well as developing software to solve problems related to their own research interests. Students will also gain experience in working collaboratively and openly on scientific computing projects.
By the end of the course, students should be able to:
1. use a high-level programming language to analyze and/or solve practical research problems;
2. apply principles of modern computational science and engineering, reproducibility, and open science to their research;
3. evaluate, visualize, write about, and publish computational research results; and
4. develop and share an open-source research software package that solves a problem in their research area.
These are the formal student learning objectives for the course.
2.2 Project
In lieu of standalone homework assignments, all assigned work contributes to a term-long project in which each student develops a new software package targeted at their own research area. The project begins with a proposal that students submit via pull request to an open repository on GitHub; the instructor merges it upon approval, after any requested changes are made. Next, students create a repository for their software package in the course organization and fork it to their own accounts. From then on, students submit all project contributions as pull requests to the upstream repository. Each student's code-review partner reviews these and either approves them or requests changes; the project's owner may merge a pull request only after the code-review partner approves it.
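The review rule in the project workflow can be modeled as a small state machine. The following is a hypothetical sketch, not part of the course materials or of GitHub's API: the `PullRequest` class, its fields, and its methods are invented for illustration, and the sketch covers only who may approve, request changes, and merge.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PullRequest:
    """Hypothetical model of the course's review-then-merge rule.

    `owner` is the student whose project the PR targets (they open it from
    their fork); `reviewer` is their assigned code-review partner.
    """

    owner: str
    reviewer: str
    approved: bool = False
    merged: bool = False
    review_comments: List[str] = field(default_factory=list)

    def request_changes(self, who: str, comment: str) -> None:
        """Reviewer asks for changes; any earlier approval is withdrawn."""
        if who != self.reviewer:
            raise PermissionError("only the code-review partner can request changes")
        self.approved = False
        self.review_comments.append(comment)

    def approve(self, who: str) -> None:
        """Reviewer approves the contribution."""
        if who != self.reviewer:
            raise PermissionError("only the code-review partner can approve")
        self.approved = True

    def merge(self, who: str) -> None:
        """Owner merges, but only after the reviewer has approved."""
        if who != self.owner:
            raise PermissionError("only the project owner can merge")
        if not self.approved:
            raise RuntimeError("the code-review partner has not approved this PR")
        self.merged = True


if __name__ == "__main__":
    pr = PullRequest(owner="student_a", reviewer="student_b")
    pr.request_changes("student_b", "please add a test for invalid input")
    pr.approve("student_b")
    pr.merge("student_a")
    print(pr.merged)  # True
```

Raising an exception on an unauthorized or out-of-order action keeps the rule explicit. Having a new change request withdraw an earlier approval is an assumption of this sketch, not something the course description specifies.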
2.3 Methods of instruction
The course is delivered through a combination of lectures, discussion, and in-class interactive work. Most lectures are reveal.js [8] presentations, shared openly on a public-facing syllabus website [20]. Lectures also include practical demonstrations of Python code, shown in either IPython [21] or Jupyter Notebooks [15]. Nearly all lectures also ask students to follow along on their own computers, either executing example code or advancing their project packages.
3 Results from first offering
I offered the first iteration of this course in the Spring 2018 term, with the title “Software Development for Engineering Research”; while the course content is not limited to engineering research, I offered the course through the Mechanical Engineering program, targeting graduate students in the College of Engineering. Seventeen students enrolled in the course, all but one of them graduate students; roughly half were in the second year or later of their graduate programs. Approximately 40% of the students came from mechanical engineering (including thermal-fluid sciences and design engineering), 35% from nuclear engineering, and the remainder from robotics and chemical engineering. Three quarters of the students had already taken a course on Python programming for engineering applications, while the others had some self-taught Python skills. Half expressed comfort working with the Unix command line, and the other half said they had used it but were less comfortable with command-line operations. None admitted to being command-line ninjas, and none were completely unfamiliar with it.
The first offering of this 10-week course on software development for engineering research concluded successfully in June 2018, with all 17 students releasing a first version of the software they developed during the course. At least four of the software packages have been developed further since the course ended, and at least one package is being prepared for submission to the Journal of Open Source Software (JOSS) [24].
The projects covered a wide variety of topics, with functions including simulation, experimental data analysis, and automation: designing detonation tubes [6], using machine learning to extract features from nuclear physics simulations [10], interfacing with an 8-channel digital pulse processor board [16], simulating and analyzing the combustion engine of a Global Formula Racing Formula SAE vehicle [14], optimizing and analyzing wind-farm layouts [17], analyzing spin stabilization of solid rocket motors [18], a nodal quasi-diffusion solver for nuclear fission [22], agent-learning for autonomous path finding [25], generating input for a Monte-Carlo radiation transport code [26], calculating solar-energy terms based on location [11], analyzing radioxenon spectra [7], calculating deep-learning layers for multi-agent reinforcement learners [5], analyzing solvent extraction kinetics [4], simulating rapid compression machine experiments [3], calibrating blackbody infrared cameras [2], and simulating transient heat transfer in a microchannel with passive temperature-dependent flow control [1].
Although the sample size is small, students rated the course well in their end-of-term evaluations: on a scale out of 6.0, they rated the course as a whole 5.3 (mean) and 5.5 (median), and the instructor's contribution 5.5 (mean) and 5.8 (median). Multiple comments spoke favorably of the course, including that it should be taken by all students doing research involving software or programming. Suggestions included clarifying expectations for students and providing more feedback; also, one student felt the course was too advanced for their experience level.
4 Conclusions
This article describes a 10-week course, given in Spring 2018, that teaches skills for developing research software; all lesson and assignment content is openly available [20]. The course will be offered again in the Spring 2019 term (10 weeks, April–June). Planned changes include incorporating in-class activities into more of the topics and adding new topics such as peer code review and high-performance computing.
In addition, I am developing alternate versions of the course for different lengths of time, such as an afternoon tutorial session or a day-long workshop. These lessons and modules will be shared openly for the community to use, adapt, and extend. Furthermore, while the course at Oregon State University is currently offered through the Mechanical Engineering program, it may be better suited to being offered as an Engineering course or, more broadly, through the Graduate Education program.
Acknowledgements
This research was supported by the Better Scientific Software Fellowship, part of the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
References
[1] Armatis, P.D.: PA Flo CS v0.1.0. Zenodo (2018). https://doi.org/10.5281/zenodo.1291277
[2] Bean, D.: IR Cal v0.1.0. Zenodo (2018). https://doi.org/10.5281/zenodo.1291190
[3] Behnoudfar, D.: Sim RCM v0.1.0. Zenodo (2018). https://doi.org/10.5281/zenodo.1291302
[4] Bettinardi, D.J.: Sep Kinetics pre-alpha v0.110. Zenodo (2018). https://doi.org/10.5281/zenodo.1291202
[5] Brian, M.: deep_learning_layer_calculator v1.0. Zenodo (2018). https://doi.org/10.5281/zenodo.1291273
[6] Carter, M.: Beaver Det v0.1.0. Zenodo (2018). https://doi.org/10.5281/zenodo.1288009
[7] Czyz, S.A.: radioxenon_ml v0.5.0. Zenodo (2018). https://doi.org/10.5281/zenodo.1291208
[8] El Hattab, H.: reveal.js 3.7.0. https://github.com/hakimel/reveal.js (Dec 2018)
