Python scripts for investigating Open Source Hardware GitHub repositories

This dataset contains Python scripts that apply repository mining and social network analysis (SNA) techniques to investigate the transparency and workload distribution of open source hardware (OSH) product development projects hosted on GitHub. Starting from a list of projects and references to their corresponding repositories, the scripts extract file versioning metadata from the GitHub API and compute GraphML graphs depicting the full commit history of each project. Three types of graphs are computed: commit graphs, file co-edition graphs and file change graphs. The scripts then apply SNA indicators (size, centrality and clustering index) to characterize the topology of the file co-edition graphs. Finally, they apply k-means clustering to these indicators in order to identify different types of projects based on the topology of their co-edition graphs. These scripts were developed and applied to 105 OSH product development projects as part of a study published in the following open access article: Bonvoisin, J., Buchert, T., Preidel, M. and Stark, R. G., 2018. “How participative is open source hardware? Insights from online repository mining”. Design Science, 4, E19. doi:10.1017/dsj.2018.15
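The final clustering step can be illustrated with a minimal, standard-library sketch of k-means (Lloyd's algorithm). This is not the code of clustering.py: the function name `kmeans`, its parameters and the indicator values are hypothetical, and the indicators are left unscaled here, whereas a real analysis would probably normalize them first.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on tuples of numeric indicators.

    Returns (labels, centroids). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical indicator vectors, one per project:
# (graph size, centrality, clustering index). Values are invented.
indicators = [
    (2, 0.0, 0.0),
    (3, 0.1, 0.0),
    (40, 0.8, 0.6),
    (45, 0.9, 0.7),
]
labels, centroids = kmeans(indicators, k=2)
print(labels)
```

With these invented values, the two small projects end up in one cluster and the two large ones in the other; which cluster receives which label depends on the random initialisation.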

Keywords:
open source hardware, open source innovation, open design, open innovation, repository mining, social network analysis, community-based product development, Git, GitHub

Cite this dataset as:
Bonvoisin, J., 2018. Python scripts for investigating Open Source Hardware GitHub repositories. Zenodo. https://doi.org/10.5281/zenodo.1208379.

Documentation

Data collection method:

Instructions for use are given in the header of each of the six scripts.

Script "goMine.py":
- extracts metadata from the GitHub API;
- takes a list of project references as input (a CSV file where each line lists the GitHub repository references affiliated with one project);
- produces for each project a JSON file with a reference to all branches;
- produces for each project a JSON file with all commits of all branches.

Script "goCreateGraphs.py":
- takes as input a list of JSON files produced by goMine.py, each containing all commit information related to a project;
- creates for each project the following graphs in GraphML:
  * a commit graph (as seen in Insights/Network in GitHub),
  * contributor graphs (where each node is a contributor and each edge represents the edition of the same file by two contributors), filtered per file type,
  * graphs of all committed file changes (one subgraph per file), filtered per file type.

Script "analysisActivityVolume.py":
- computes indicators related to activity volume (number of file changes over time and per project);
- takes as input the graphs of file changes produced by goCreateGraphs.py.

Script "analysisActivityDistribution.py":
- computes indicators related to activity distribution;
- takes as input the contributor graphs produced by goCreateGraphs.py;
- produces a CSV file with the computed indicators for all considered projects.

Script "clustering.py":
- applies k-means clustering to the topological indicators computed on the contributor graphs;
- takes as input the list of topological indicators produced by analysisActivityDistribution.py.

Script "timeStop.py" is a utility that adds timestamps to traces.
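The contributor graph described above (one node per contributor, one edge when two contributors have edited the same file) can be sketched as follows. This is an illustration using only the standard library, not the code of goCreateGraphs.py or analysisActivityDistribution.py. The commit record layout (`author`, `files`), the function names and the sample data are hypothetical, and degree centrality and the local clustering coefficient are used as assumed stand-ins for the centrality and clustering indicators.

```python
from collections import defaultdict
from itertools import combinations


def build_coedition_graph(commits):
    """Return {contributor: set of neighbours}.

    Two contributors are linked when both have edited at least one
    common file. `commits` is a list of dicts with hypothetical keys
    "author" and "files".
    """
    editors_by_file = defaultdict(set)
    graph = {}
    for commit in commits:
        author = commit["author"]
        graph.setdefault(author, set())  # keep isolated contributors too
        for path in commit["files"]:
            editors_by_file[path].add(author)
    for editors in editors_by_file.values():
        for a, b in combinations(sorted(editors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def clustering_coefficient(graph, node):
    """Share of pairs of neighbours of `node` that are linked together."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


# Invented commit records for one project.
commits = [
    {"author": "alice", "files": ["frame.scad", "README.md"]},
    {"author": "bob", "files": ["frame.scad"]},
    {"author": "carol", "files": ["README.md", "pcb.kicad_pcb"]},
    {"author": "dave", "files": ["bom.csv"]},
]
graph = build_coedition_graph(commits)
print(len(graph), degree_centrality(graph), clustering_coefficient(graph, "alice"))
```

In this invented example, alice shares a file with bob and another with carol, while bob and carol share none, so alice is the most central node and her clustering coefficient is zero; dave, who edits alone, remains an isolated node.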

Technical details and requirements:

The scripts require Python 3.7.

Funders

Agence Nationale de la Recherche (ANR)
https://doi.org/10.13039/501100001665

Open! - Methods and tools for community-based product development
ANR-15-CE26-0012

Deutsche Forschungsgemeinschaft (DFG)
https://doi.org/10.13039/501100001659

Open! - Methods and tools for community-based product development
STA 1112/13-1

Publication details

Publication date: 27 March 2018
by: Zenodo

Version: 0.1

DOI: https://doi.org/10.5281/zenodo.1208379

URL for this record: https://researchdata.bath.ac.uk/id/eprint/574

Related articles

Bonvoisin, J., Buchert, T., Preidel, M. and Stark, R. G., 2018. How participative is open source hardware? Insights from online repository mining. Design Science, 4. Available from: https://doi.org/10.1017/dsj.2018.15.

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Jeremy Bonvoisin

Departments:

Faculty of Engineering & Design
Mechanical Engineering