Summary Analysis of the 2017 GitHub Open Source Survey
R. Stuart Geiger

TL;DR
This paper provides a comprehensive high-level summary of the 2017 GitHub Open Source Survey, including statistical analyses and visualizations of survey responses.
Contribution
It offers the first detailed statistical overview and visual analysis of the 2017 GitHub Open Source Survey data.
Findings
Identified key trends in open source contributions in 2017.
Presented detailed frequency and proportion analyses for survey questions.
Visualized data through bar plots for better understanding.
Berkeley Institute for Data Science
University of California, Berkeley
(7 June 2017)
1 Abstract
This report is a high-level summary analysis of the 2017 GitHub Open Source Survey dataset (https://opensourcesurvey.org/2017/), presenting frequency counts, proportions, and frequency or proportion bar plots for every question asked in the survey.
2 Overview
2.1 The 2017 Open Source Survey
This report analyzes the open dataset from the 2017 Open Source Survey, which was conducted by staff at GitHub, with help, support, and feedback from many others (Zlotnick et al., 2017a). The survey was run in 2017, asking over 50 questions on a variety of topics. The survey’s designers explain the motivation, design, and distribution of the survey on the project’s website:
"In collaboration with researchers from academia, industry, and the community, GitHub designed a survey to gather high quality and novel data on open source software development practices and communities. We collected responses from 5,500 randomly sampled respondents sourced from over 3,800 open source repositories on GitHub.com, and over 500 responses from a non-random sample of communities that work on other platforms. The results are an open data set about the attitudes, experiences, and backgrounds of those who use, build, and maintain open source software." (Zlotnick et al., 2017b)
2.2 Purpose and goal of this report
The GitHub survey team presented analyses of some questions when releasing the survey (Zlotnick et al., 2017b), but many more questions were asked that are relevant to researchers and community members. This report is an exploratory analysis of all questions asked in the survey, providing a basic summary of the responses to each one. For every question it presents summary statistics (mostly frequency counts and proportions), followed by a frequency or proportion bar graph. Most questions are presented individually, with panel questions grouped together as appropriate. There are no correlations, regressions, or descriptive breakouts between subgroups. Likert-style questions (e.g., from Strongly agree to Strongly disagree) have not been recoded to numerical, scalar values. Discussion and interpretation of the results are left for future work.
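As a rough illustration of the per-question summaries described above, the sketch below computes a frequency count and a proportion for each answer to a single-choice question with pandas. It is a minimal sketch, not the code from the report's notebook: the helper name `summarize_single_choice` and the toy answers are hypothetical, and whether skipped answers belong in the denominator (the `dropna` flag) is an assumption each analysis has to settle for itself.

```python
import pandas as pd


def summarize_single_choice(responses: pd.Series, dropna: bool = True) -> pd.DataFrame:
    """Frequency counts and proportions for one single-choice question.

    dropna=True: proportions are out of respondents who answered.
    dropna=False: missing answers get their own row and count in the denominator.
    """
    counts = responses.value_counts(dropna=dropna)
    proportions = counts / counts.sum()
    return pd.DataFrame({"count": counts, "proportion": proportions})


# Toy answers for illustration only (not survey data); None marks a skipped question.
toy = pd.Series(["Option A", "Option A", "Option B", None], name="EMPLOYMENT.STATUS")
print(summarize_single_choice(toy))
print(summarize_single_choice(toy, dropna=False))
```

With the default `dropna=True`, "Option A" accounts for two of the three answered responses; passing `dropna=False` adds a row for the skipped answer and halves that proportion to 2 of 4.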
The purpose of this report is to facilitate future research on this dataset by giving an overview of the kinds of questions asked in the survey, and to provide a single, stable reference for citing broad claims in the data. This report is the PDF version of a Jupyter Notebook, which can be run to reproduce the tables and graphs in this report. The Jupyter notebook and a copy of the data are public on GitHub (https://github.com/staeiou/github-survey-analysis) and the Open Science Framework (https://osf.io/enrq5/). Others are encouraged to extend them as they see fit, as this report and the notebooks are licensed CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0/). The "Out[number]" notes before each table and chart are linked to the Jupyter notebook, so you can easily navigate to the notebook cell containing the applicable code. If you find this report useful, please cite both this report (Geiger, 2017) and the original survey (Zlotnick et al., 2017a) as detailed in the bibliography at the end of this report.
2.3 Software used
This analysis was conducted in Python (van Rossum, 1995) version 3.6, using Pandas dataframes (McKinney, 2010) for data parsing and transformation, SciPy (Jones et al., 2001) and NumPy (van der Walt et al., 2011) for quantitative computations, and Matplotlib (Hunter, 2007) and Seaborn (Waskom et al., 2014) for visualization. It was conducted in Jupyter Notebooks (Kluyver et al., 2016) using the IPython kernel (Pérez and Granger, 2007), and nbconvert (also discussed in Kluyver et al., 2016) was used to convert the notebook into LaTeX for publication in this report.
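A minimal sketch of a proportion bar plot with this stack, assuming a question's answers are held in a pandas Series. It draws through pandas' Matplotlib interface rather than Seaborn, and the function name, figure size, and toy labels are illustrative choices, not the report's own plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd


def proportion_bar_plot(responses: pd.Series, title: str):
    """Horizontal bar plot of answer proportions among respondents who answered."""
    proportions = responses.value_counts(normalize=True)
    fig, ax = plt.subplots(figsize=(6, 3))
    proportions.plot.barh(ax=ax)  # pandas draws one Matplotlib bar per answer
    ax.set_xlabel("Proportion of respondents")
    ax.set_title(title)
    fig.tight_layout()
    return fig, ax


# Toy answers for illustration only (not survey data).
toy = pd.Series(["Option A", "Option A", "Option B", "Option C"])
fig, ax = proportion_bar_plot(toy, "Toy question")
```

Calling `fig.savefig(...)` would write the figure to disk; a frequency plot only needs `value_counts()` without `normalize=True`.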
3 Table of Contents
- 4.1.2 How often do you engage in each of the following activities?
- 4.1.5 How interested are you in contributing to open source projects in the future?
- 4.1.6 How likely are you to contribute to open source projects in the future?
- 4.2.3 How often do you try to find open source options over other kinds of software?
- 4.3.2 In general, how much information about you is publicly available online?
- 4.4.3 Which best describes your prior relationship with the person who helped you?
- 4.4.5 Have you ever provided help for another person on an open source project?
- 4.4.7 Which best describes your prior relationship with the person you helped?
- 4.5.1 Do you contribute to open source as part of your professional work?
- 4.5.2 How often do you use open source software in your professional work?
- 4.5.4 Which is closest to your employer’s policy on using open source software applications?
- 4.5.5 How important do you think your involvement in open source was to getting your current job?
- 4.6.1 Do you currently live in a country other than the one in which you were born?
- 4.6.9 What is the highest level of formal education that you have completed?
- 4.6.10 What is the highest level of formal education that either of your parents completed?
- 4.6.11 How old were you when you first had regular access to a computer with an internet connection?
- 4.6.12 Where did you first have regular access to a computer with internet connection?
- 4.7.1 Have you ever observed any of the following in the context of an open source project?
- 4.7.4 Thinking of the last time you experienced harassment, how did you respond?
- 4.7.5 How effective were the following responses? Response counts
- 4.7.6 How effective were the following responses? Proportions
4 Analysis
4.1 Contributor identity
4.1.1 People participate in open source in different ways.
Which of the following activities do you engage in?
PARTICIPATION.TYPE.*
Out[29]:
[TABLE]
Out[30]:
[TABLE]
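PARTICIPATION.TYPE.* is a "which of the following" question spread over several columns. The sketch below assumes it is stored as one column per activity that is non-null exactly when the activity was selected; that coding, the option column names, and the helper `summarize_multi_select` are hypothetical and should be checked against the released CSV. Because respondents can pick several options, the proportions need not sum to 1.

```python
import pandas as pd


def summarize_multi_select(df: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Counts and proportions for a 'select all that apply' question.

    Assumes one column per option, named prefix + OPTION, that is non-null
    exactly when the respondent selected that option.
    """
    option_cols = [c for c in df.columns if c.startswith(prefix)]
    selected = df[option_cols].notna().sum()
    table = pd.DataFrame({
        "count": selected,
        # Denominator is every respondent in df, including those who chose nothing.
        "proportion": selected / len(df),
    })
    return table.sort_values("count", ascending=False)


# Toy data with hypothetical option columns (not survey data).
toy = pd.DataFrame({
    "PARTICIPATION.TYPE.OPTION_A": [1, 1, None],
    "PARTICIPATION.TYPE.OPTION_B": [None, 1, None],
    "EMPLOYMENT.STATUS": ["x", "y", "z"],
})
print(summarize_multi_select(toy, "PARTICIPATION.TYPE."))
```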
4.1.2 How often do you engage in each of the following activities?
CONTRIBUTION.TYPE.*
Out[32]:
[TABLE]
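Panel questions such as CONTRIBUTION.TYPE.* ask about several items on one shared response scale, so they can be summarized together in a single table. The sketch below builds a levels-by-items count table, assuming each item is its own column sharing a common prefix; the scale labels, item names, and the helper `panel_counts` are hypothetical illustrations, not the survey's actual codes.

```python
import pandas as pd


def panel_counts(df: pd.DataFrame, prefix: str, levels: list) -> pd.DataFrame:
    """Levels-by-items count table for a panel of questions sharing one scale.

    Rows follow the order of `levels`; answers outside `levels` and missing
    answers are not counted.
    """
    items = [c for c in df.columns if c.startswith(prefix)]
    return df[items].apply(
        lambda col: col.value_counts().reindex(levels, fill_value=0)
    )


# Hypothetical scale and toy answers, for illustration only (not survey data).
levels = ["Frequently", "Occasionally", "Rarely", "Never"]
toy = pd.DataFrame({
    "CONTRIBUTION.TYPE.ITEM_A": ["Frequently", "Never", "Never"],
    "CONTRIBUTION.TYPE.ITEM_B": ["Rarely", "Rarely", None],
})
print(panel_counts(toy, "CONTRIBUTION.TYPE.", levels))
```

Reindexing on an explicit list of levels keeps the rows in scale order and shows unused levels as 0 rather than omitting them.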
4.1.3 Employment status
EMPLOYMENT.STATUS
Out[34]:
[TABLE]
Out[35]:
[TABLE]
4.1.4 In your main job, how often do you write or otherwise directly contribute to producing software?
PROFESSIONAL.SOFTWARE
Out[37]:
[TABLE]
Out[38]:
[TABLE]
4.1.5 How interested are you in contributing to open source projects in the future?
FUTURE.CONTRIBUTION.INTEREST
Out[40]:
[TABLE]
Out[41]:
[TABLE]
4.1.6 How likely are you to contribute to open source projects in the future?
Out[43]:
[TABLE]
Out[44]:
[TABLE]
4.2 Priorities and values
4.2.1 When thinking about whether to use open source software, how important are the following things?
OSS.USER.PRIORITIES.*
Out[47]:
[TABLE]
Out[49]:
[TABLE]
4.2.2 When thinking about whether to contribute to an open source project, how important are the following things?
OSS.CONTRIBUTOR.PRIORITIES.*
Out[52]:
[TABLE]
Out[54]:
[TABLE]
4.2.3 How often do you try to find open source options over other kinds of software?
SEEK.OPEN.SOURCE
Out[56]:
[TABLE]
Out[57]:
[TABLE]
4.2.4 Open source software usability
OSS.UX: Do you believe that open source software is generally easier to use than closed source (proprietary) software, harder to use, or about the same?
Out[59]:
[TABLE]
Out[60]:
[TABLE]
4.2.5 Open source software security
OSS.SECURITY: Do you believe that open source software is generally more secure than closed source (proprietary) software, less secure, or about the same?
Out[62]:
[TABLE]
Out[63]:
[TABLE]
4.2.6 Open source software stability
OSS.STABILITY: Do you believe that open source software is generally more stable than closed source (proprietary) software, less stable, or about the same?
Out[65]:
[TABLE]
Out[66]:
[TABLE]
4.2.7 Identification with open source
How much do you agree or disagree with the following statements:
- EXTERNAL.EFFICACY: The open source community values contributions from people like me.
- INTERNAL.EFFICACY: I have the skills and understanding necessary to make meaningful contributions to open source projects.
- OSS.IDENTIFICATION: I consider myself to be a member of the open source (and/or the Free/Libre software) community.
Out[69]:
[TABLE]
Out[70]:
[TABLE]
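Since the Likert-style answers above are not recoded to numbers, one way to keep their tables and plots in scale order is a pandas ordered categorical. The sketch below assumes a five-point agreement scale whose exact labels (and the helper name `ordered_likert_counts`) are hypothetical and should be checked against the released data; the ordering affects only how rows are arranged, not the answers themselves.

```python
import pandas as pd

# Hypothetical five-point scale; the exact labels should be checked against the data.
AGREEMENT = [
    "Strongly agree",
    "Somewhat agree",
    "Neither agree nor disagree",
    "Somewhat disagree",
    "Strongly disagree",
]


def ordered_likert_counts(responses: pd.Series, levels=AGREEMENT) -> pd.Series:
    """Count Likert answers in scale order without recoding them to numbers.

    The ordered categorical only fixes the row order; labels not in `levels`
    become missing and are dropped, and unused levels are kept with a count of 0.
    """
    as_cat = pd.Categorical(responses, categories=levels, ordered=True)
    return pd.Series(as_cat).value_counts(sort=False)


toy = pd.Series(["Strongly agree", "Strongly disagree", "Strongly agree"])
print(ordered_likert_counts(toy))
```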
4.3 Transparency vs privacy
4.3.1 Attribution
TRANSPARENCY.PRIVACY.BELIEFS: Which of the following statements is closest to your beliefs about attribution in software development?
- Records of authorship should be required so that end users know who created the source code they are working with.
- People should be able to contribute code without attribution, if they wish to remain anonymous.
Out[72]:
[TABLE]
Out[73]:
[TABLE]
4.3.2 In general, how much information about you is publicly available online?
INFO.AVAILABILITY
Out[75]:
[TABLE]
Out[76]:
[TABLE]
4.3.3 Do you feel that you need to make information available about yourself online for professional reasons?
INFO.JOB
Out[78]:
[TABLE]
Out[79]:
[TABLE]
4.3.4 General privacy practices
TRANSPARENCY.PRIVACY.PRACTICES.GENERAL
"Which of the following best describes your practices around publishing content online, such as posts on social media (e.g. Facebook, Instagram, Twitter, etc.), blogs, and other platforms (not including contributions to open source projects)?" (single choice)
Out[81]:
[TABLE]
Out[82]:
[TABLE]
4.3.5 OSS privacy practices
"Which of the following best describes your practices when making open source contributions?"
Out[85]:
[TABLE]
Out[86]:
[TABLE]
4.4 Mentorship / Help
4.4.1 Have you ever received any kind of help from other people related to using or contributing to an open source project?
RECEIVED.HELP
Out[89]:
[TABLE]
Out[90]:
[TABLE]
4.4.2 Thinking of the most recent case where someone helped you, how did you find someone to help you?
Out[92]:
[TABLE]
Out[93]:
[TABLE]
4.4.3 Which best describes your prior relationship with the person who helped you?
HELPER.PRIOR.RELATIONSHIP
Out[95]:
[TABLE]
Out[96]:
[TABLE]
4.4.4 What kind of problem did they help you with?
RECEIVED.HELP.TYPE
Out[98]:
[TABLE]
Out[99]:
[TABLE]
4.4.5 Have you ever provided help for another person on an open source project?
PROVIDED.HELP
Out[101]:
[TABLE]
Out[102]:
[TABLE]
4.4.6 Thinking of the most recent case where you helped someone, how did you come to help this person?
FIND.HELPEE
Out[104]:
[TABLE]
Out[105]:
[TABLE]
4.4.7 Which best describes your prior relationship with the person you helped?
HELPEE.PRIOR.RELATIONSHIP
Out[107]:
[TABLE]
Out[108]:
[TABLE]
4.4.8 What kind of problem did you help them with?
PROVIDED.HELP.TYPE
Out[110]:
[TABLE]
Out[111]:
[TABLE]
4.5 Open Source Software in Paid Work
4.5.1 Do you contribute to open source as part of your professional work?
OSS.AS.JOB: Do you contribute to open source as part of your professional work? In other words, are you paid for any of your time spent on open source contributions?
- Yes, indirectly- I contribute to open source in carrying out my work duties, but I am not required or expected to do so.
- No.
- Yes, directly- some or all of my work duties include contributing to open source projects.
Out[113]:
[TABLE]
Out[114]:
[TABLE]
4.5.2 How often do you use open source software in your
professional work?
OSS.AT.WORK
Out[116]:
[TABLE]
Out[117]:
[TABLE]
4.5.3 How does your employer’s intellectual property agreement/policy affect your free-time contributions to open source unrelated to your work?
OSS.IP.POLICY
Out[119]:
[TABLE]
Out[120]:
[TABLE]
4.5.4 Which is closest to your employer’s policy on using open source software applications?
Out[122]:
[TABLE]
Out[123]:
[TABLE]
4.5.5 How important do you think your involvement in open source was to getting your current job?
OSS.HIRING
Out[125]:
[TABLE]
Out[126]:
[TABLE]
4.6 Demographics
4.6.1 Do you currently live in a country other than the one in which you were born?
IMMIGRATION
Out[128]:
[TABLE]
Out[129]:
[TABLE]
4.6.2 Thinking of where you were born, are you a member of an ethnicity or nationality that is considered a minority in that country?
MINORITY.HOMECOUNTRY
Out[131]:
[TABLE]
Out[132]:
[TABLE]
4.6.3 Thinking of where you currently live, are you a member of an ethnicity or nationality that is considered a minority in that country?
MINORITY.CURRENT.COUNTRY
Out[134]:
[TABLE]
Out[135]:
[TABLE]
4.6.4 What is your gender?
GENDER
Out[137]:
[TABLE]
Out[138]:
[TABLE]
4.6.5 Do you identify as transgender?
TRANSGENDER.IDENTITY
Out[140]:
[TABLE]
Out[141]:
[TABLE]
4.6.6 Do you identify as gay, lesbian, or bisexual, asexual, or any other minority sexual orientation?
SEXUAL.ORIENTATION
Out[143]:
[TABLE]
Out[144]:
[TABLE]
4.6.7 How well can you read and write in English?
WRITTEN.ENGLISH
Out[146]:
[TABLE]
Out[147]:
[TABLE]
4.6.8 What is your age?
AGE
Out[149]:
[TABLE]
Out[150]:
[TABLE]
4.6.9 What is the highest level of formal education that you have completed?
FORMAL.EDUCATION
Out[152]:
[TABLE]
Out[153]:
[TABLE]
4.6.10 What is the highest level of formal education that either of your parents completed?
PARENTS.FORMAL.EDUCATION
Out[155]:
[TABLE]
Out[156]:
[TABLE]
4.6.11 How old were you when you first had regular access to a computer with an internet connection?
AGE.AT.FIRST.COMPUTER.INTERNET
Out[158]:
[TABLE]
Out[159]:
[TABLE]
4.6.12 Where did you first have regular access to a computer with internet connection?
LOCATION.OF.FIRST.COMPUTER.INTERNET
Out[161]:
[TABLE]
Out[162]:
[TABLE]
4.6.13 Where was the respondent surveyed from?
POPULATION
Out[164]:
[TABLE]
Out[165]:
[TABLE]
4.7 Harassment / Inclusiveness of OSS
4.7.1 Have you ever observed any of the following in the context of an open source project?
DISCOURAGING.BEHAVIOR.*
Out[167]:
[TABLE]
Out[168]:
[TABLE]
4.7.2 Have you ever witnessed any of the following behaviors directed at another person in the context of an open source project? (not including something directed at you)
NEGATIVE.WITNESS.*
Out[172]:
[TABLE]
Out[173]:
[TABLE]
4.7.3 Have you ever experienced any of the following behaviors directed at you in the context of an open source project?
Out[177]:
[TABLE]
Out[178]:
[TABLE]
4.7.4 Thinking of the last time you experienced harassment, how did you respond?
NEGATIVE.RESPONSE.*
Out[182]:
[TABLE]
Out[183]:
[TABLE]
4.7.5 How effective were the following responses? Response counts
RESPONSE.EFFECTIVENESS.*
Out[186]:
[TABLE]
4.7.6 How effective were the following responses? Proportions
Out[189]:
[TABLE]
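One plausible way to derive a proportions table like this from the response counts in 4.7.5 is to divide each item's column by its own total. The sketch assumes the counts are laid out with response levels as rows and panel items as columns; the function name and toy labels are hypothetical, and the report's notebook may compute its proportions differently.

```python
import pandas as pd


def counts_to_proportions(counts: pd.DataFrame) -> pd.DataFrame:
    """Convert a levels-by-items count table into within-item proportions.

    Each column is divided by its own total, so every item sums to 1 even when
    items were answered by different numbers of respondents. An item with no
    answers at all yields NaN (0 / 0).
    """
    return counts.div(counts.sum(axis=0), axis=1)


# Toy counts with hypothetical labels (not survey data).
counts = pd.DataFrame(
    {"ITEM_A": [3, 1, 0], "ITEM_B": [1, 1, 2]},
    index=["Very effective", "Somewhat effective", "Not at all effective"],
)
print(counts_to_proportions(counts))
```

Normalizing within each item, rather than over the whole table, keeps items with fewer respondents comparable to the rest.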
4.7.7 As a result of experiencing or witnessing harassment, which, if any, of the following have you done?
NEGATIVE.CONSEQUENCES.*
Out[193]:
[TABLE]
Out[194]:
[TABLE]
5 Bibliography
- Geiger [2017] R. Stuart Geiger. Summary analysis of the 2017 GitHub Open Source Survey. SocArXiv Preprints, 2017. doi: 10.17605/OSF.IO/ENRQ5. URL https://osf.io/preprints/socarxiv/qps53.
- Hunter [2007] J. D. Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, May 2007. ISSN 1521-9615. doi: 10.1109/MCSE.2007.55. URL http://ieeexplore.ieee.org/document/4160265/.
- Jones et al. [2001] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001. URL http://www.scipy.org/.
- Kluyver et al. [2016] Thomas Kluyver, Benjamin Ragan-Kelley, Fernando Pérez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Damián Avila, Safia Abdalla, Carol Willing, and Jupyter development team. Jupyter notebooks: a publishing format for reproducible computational workflows, 2016. URL https://eprints.soton.ac.uk/403913/.
- McKinney [2010] Wes McKinney. Data Structures for Statistical Computing in Python. In Stéfan van der Walt and Jarrod Millman, editors, Proceedings of the 9th Python in Science Conference, pages 51–56, 2010. URL http://conference.scipy.org/proceedings/scipy2010/mckinney.html.
- Pérez and Granger [2007] Fernando Pérez and Brian E. Granger. IPython: a system for interactive scientific computing. Computing in Science & Engineering, 9(3):21–29, May 2007. ISSN 1521-9615. doi: 10.1109/MCSE.2007.53. URL http://ipython.org.
- van der Walt et al. [2011] S. van der Walt, S. C. Colbert, and G. Varoquaux. The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30, March 2011. ISSN 1521-9615. doi: 10.1109/MCSE.2011.37. URL https://arxiv.org/abs/1102.1523.
- van Rossum [1995] Guido van Rossum. Python library reference, 1995. URL https://ir.cwi.nl/pub/5009/05009D.pdf.
