Python scripts for investigating Open Source Hardware GitHub repositories

This dataset contains Python scripts that apply repository mining and social network analysis (SNA) techniques to investigate the transparency and workload distribution of open source hardware (OSH) product development projects hosted on GitHub. Starting from a list of projects and references to their corresponding repositories, the scripts extract file versioning metadata from the GitHub API and compute GraphML graphs depicting the full commit history of each project. Three types of graphs are computed: commit graphs, file co-edition graphs and file change graphs. The scripts then apply SNA indicators (size, centrality and clustering index) to characterize the topology of the file co-edition graphs. Finally, they apply k-means clustering to these indicators in order to identify different types of projects based on the topology of their co-edition graphs. These scripts were developed and applied to 105 OSH product development projects in the context of a study published in the following open-access article: Bonvoisin, J., Buchert, T., Preidel, M. and Stark, R., 2018. “How participative is open source hardware? Insights from online repository mining”. Design Science, 4, E19. doi:10.1017/dsj.2018.15
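The central idea, turning a commit history into a file co-edition (contributor) graph and characterizing it with SNA indicators, can be sketched as follows. This is a minimal illustration, not the code of the published scripts: it assumes commits are already available as hypothetical (author, [file paths]) pairs, uses only the standard library, and computes textbook definitions of size, degree centrality and local clustering coefficient, which may differ from the exact indicators used in the article. The function names are illustrative only.

```python
from itertools import combinations
import xml.etree.ElementTree as ET


def build_coedition_graph(commits):
    """Contributor (file co-edition) graph as an adjacency dict.

    `commits` is assumed to be a list of (author, [file paths]) pairs.
    Two contributors are linked if they edited at least one common file.
    """
    editors = {}  # file path -> set of contributors who changed it
    for author, files in commits:
        for path in files:
            editors.setdefault(path, set()).add(author)
    adjacency = {}
    for authors in editors.values():
        for author in authors:
            adjacency.setdefault(author, set())  # every contributor is a node
        for a, b in combinations(sorted(authors), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def sna_indicators(adjacency):
    """Size, mean degree centrality and average local clustering coefficient."""
    n = len(adjacency)
    if n == 0:
        return {"size": 0, "mean_degree_centrality": 0.0, "avg_clustering": 0.0}

    def clustering(node):
        # Share of the node's neighbour pairs that are themselves linked
        neighbours = adjacency[node]
        k = len(neighbours)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        return 2 * links / (k * (k - 1))

    # Degree centrality: degree divided by the maximum possible degree (n - 1)
    centrality = [len(nb) / (n - 1) if n > 1 else 0.0 for nb in adjacency.values()]
    return {
        "size": n,
        "mean_degree_centrality": sum(centrality) / n,
        "avg_clustering": sum(clustering(v) for v in adjacency) / n,
    }


def to_graphml(adjacency):
    """Minimal GraphML tree (nodes and undirected edges, no attributes)."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")
    for node in sorted(adjacency):
        ET.SubElement(graph, "node", id=node)
    for a in sorted(adjacency):
        for b in sorted(adjacency[a]):
            if a < b:  # store each undirected edge once
                ET.SubElement(graph, "edge", source=a, target=b)
    return ET.ElementTree(root)


# Hypothetical commit data: alice shares files with bob and carol, dave works alone
commits = [
    ("alice", ["frame.scad", "README.md"]),
    ("bob", ["frame.scad"]),
    ("carol", ["README.md", "bom.csv"]),
    ("dave", ["notes.txt"]),
]
graph = build_coedition_graph(commits)
print(sna_indicators(graph))
```

The returned tree can be saved with `to_graphml(graph).write("coedition.graphml", encoding="utf-8", xml_declaration=True)`. Filtering per file type, as the published scripts do, would amount to keeping only the matching paths before building the graph.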

open source hardware, open source innovation, open design, open innovation, repository mining, social network analysis, community-based product development, Git, GitHub

Cite this dataset as:
Bonvoisin, J., 2018. Python scripts for investigating Open Source Hardware GitHub repositories. Zenodo. Available from:





Data collection method:

Instructions for use are given in the header of each of the six scripts.

Script "":
- extracts metadata from the GitHub API;
- takes a list of project references as input (a CSV file where each line lists the GitHub repository references affiliated with one project);
- produces for each project a JSON file referencing all branches;
- produces for each project a JSON file with all commits of all branches.

Script "":
- takes as input the JSON files, produced by the extraction script, containing all commit information related to each project;
- creates for each project the following graphs in GraphML:
  * a commit graph (as seen in Insights/Network in GitHub),
  * contributor graphs (where each node is a contributor and each edge represents two contributors editing the same file), filtered per file type,
  * graphs of all committed file changes (one subgraph per file), filtered per file type.

Script "":
- computes indicators related to activity volume (number of file changes over time and per project);
- takes as input the graphs of file changes produced by the graph-generation script.

Script "":
- computes indicators related to activity distribution;
- takes as input the contributor graphs produced by the graph-generation script;
- produces a CSV file with the computed indicators for all considered projects.

Script "":
- applies k-means clustering to the topological indicators computed on the contributor graphs;
- takes as input the list of topological indicators produced by the activity-distribution script.

Script "" is a utility that adds timestamps to traces.
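The final step, grouping projects according to the topology of their contributor graphs, can be illustrated with a plain k-means (Lloyd's algorithm) written against the standard library. This is a hedged sketch, not the published script: the function names, the z-score standardisation of indicators, the seeded random initialisation and the indicator values in the example are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each indicator column so that size and ratios weigh comparably."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # avoid /0
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical indicator values: [size, mean degree centrality, avg clustering]
projects = {
    "project-a": [2, 1.00, 0.00],
    "project-b": [3, 0.67, 0.00],
    "project-c": [40, 0.10, 0.55],
    "project-d": [35, 0.12, 0.60],
}
labels = kmeans(standardise(list(projects.values())), k=2)
for name, label in zip(projects, labels):
    print(name, "-> cluster", label)
```

For real use, scikit-learn's `sklearn.cluster.KMeans` implements the same algorithm with k-means++ initialisation and several restarts (`n_init`), which makes the result less sensitive to the starting centroids.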

Technical details and requirements:

The scripts require Python 3.7.


Agence Nationale de la Recherche (ANR)

Open! - Methods and tools for community-based product development

Deutsche Forschungsgemeinschaft (DFG)

Open! - Methods and tools for community-based product development
STA 1112/13-1

Publication details

Publication date: 27 March 2018
by: Zenodo

Version: 0.1


URL for this record:

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Jeremy Bonvoisin


Faculty of Engineering & Design
Mechanical Engineering