Skills-Based Volunteering case study - How GitHub Actions help Ersilia to open source endemic disease research in low-and-middle-income countries

Cynthia Lo @csmlo

Program Manager, Skills-Based Volunteering, GitHub Social Impact

June 6, 2023 // Developers, Employees, Skills-Based Volunteering


Background

Ersilia is an AI nonprofit that equips universities, hospitals, and laboratories in low-resource countries with data science tools for infectious and neglected disease research. Ersilia developed a free, open-source repository of artificial intelligence and machine learning (AI/ML) models for drug discovery. These models aim to help researchers identify drug candidates for orphan and neglected diseases, design molecules de novo, understand mechanisms of action, and anticipate adverse side effects.

In August 2022, the Ersilia team, Gemma Turon (CEO and Co-Founder) and Miquel Duran-Frigola (Chief Scientific Officer and Founder), reached out to GitHub Social Impact to tackle challenges they faced: running ML predictions, compute cost, and a heavily manual process. A GitHub Skills-Based Volunteering project team took on this challenge. It was led by Dimitrios Philliou (@d1m1tr10s), Product Manager, and supported by Grant Birkinbine (@grantbirki), Security Engineer III; Ankit Kumar Honey (@honeyankit), Senior Software Engineer; Rachel Stanik (@lehcar), Software Engineer; and SKi Sankhe (@megamanics), Senior Solutions Architect.

Project Challenge

Over seven months, the project team met weekly to develop an automated model-contribution pipeline using GitHub Actions, through which an external collaborator can contribute a new ML model to the Ersilia Model Hub. The project had three goals: to automate Ersilia’s model-acceptance process; to automate model testing; and to parallelize scheduled model prediction workflow runs, thus increasing scalability.
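To make the parallelization goal concrete, here is a minimal Python sketch of how a list of model identifiers could be split into near-equal batches, with each batch handled by a separate workflow job. The function name `shard_models`, the number of batches, and the model identifiers are all hypothetical and are not taken from Ersilia's repository. In GitHub Actions, a comparable split is commonly expressed through a job matrix.

```python
from typing import List


def shard_models(model_ids: List[str], n_shards: int) -> List[List[str]]:
    """Split model identifiers into n_shards batches of near-equal size.

    Hypothetical helper: each batch stands in for the work of one parallel job.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    # Round-robin assignment keeps batch sizes within one item of each other.
    shards: List[List[str]] = [[] for _ in range(n_shards)]
    for index, model_id in enumerate(model_ids):
        shards[index % n_shards].append(model_id)
    return shards


if __name__ == "__main__":
    # Placeholder identifiers, not real Ersilia Model Hub entries.
    example_ids = [f"model-{i:03d}" for i in range(7)]
    for job_number, batch in enumerate(shard_models(example_ids, 3)):
        print(job_number, batch)
```

Round-robin assignment is only one possible choice. If some models take much longer to run than others, batching by estimated runtime might balance the jobs better.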

The project team went a step further and identified an opportunity to improve code security with Dependabot. Dependabot is a service that scans code to detect out-of-date dependencies, alerts users when code depends on insecure packages, and updates dependencies automatically. Monitoring potential dependency vulnerabilities in the Ersilia Model Hub has helped the team manage automatic and scheduled version updates, improving scalability by reducing the number of manual processes.

Project Outcome

The gap in pharmaceutical research between low-and-middle-income countries and high-income countries disproportionately affects people in low-and-middle-income countries because of the lack of funding and resources. Ersilia’s goal is to further disseminate AI/ML models from the peer-reviewed literature, along with its in-house collection of models, focused on diseases that the pharmaceutical industry currently neglects because of their estimated low return on investment.

Originally, Ersilia accepted model contributions only from contributors it had previously worked with. Contributors had to be added individually to the GitHub org, and a repository had to be created manually. To add metadata to the database, contributors also had to work synchronously with the Ersilia team, and models were not validated before they were made public. This process was not scalable and required manual review from the Ersilia team.

Understanding these challenges, the project team automated much of the manual permissions process and enabled asynchronous contributions. Issue templates were set up to help contributors know what data to include, and the project team also automated database entries and testing. The new model intake process includes a public discussions board, which is important for keeping a record of contributions. The resulting workflow makes it easier for Ersilia maintainers to revise and accept ML model contributions efficiently, with quality control checks and automatic metadata updates, toward a goal of increasing contributions from 90 to 500 models. Furthermore, a Slack integration automatically posts each new model submission to Slack through a GitHub Actions workflow to notify the community.

As a result of this engagement, Ersilia and GitHub presented this project at MozFest 2023, which can be seen here! If you missed the MozFest event, join us on June 8, 2023 for RightsCon!
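The quality control checks and Slack notification described in this section can be illustrated with a short Python sketch. It is not Ersilia's actual implementation: the required field names, the function names, and the message wording are assumptions made for illustration. The only external convention it relies on is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json
from typing import Dict, List

# Hypothetical required fields for a model submission; the real issue
# template used by Ersilia may ask for different information.
REQUIRED_FIELDS = ("model_name", "description", "source_url")


def missing_fields(submission: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank in a submission."""
    return [name for name in REQUIRED_FIELDS if not submission.get(name, "").strip()]


def slack_payload(submission: Dict[str, str]) -> str:
    """Build a JSON body announcing a submission, for a Slack incoming webhook."""
    text = (
        f"New model submission: {submission['model_name']} "
        f"({submission['source_url']})"
    )
    return json.dumps({"text": text})


if __name__ == "__main__":
    # Placeholder submission, not a real Ersilia Model Hub entry.
    example = {
        "model_name": "Example model",
        "description": "Demo entry",
        "source_url": "https://example.org/paper",
    }
    problems = missing_fields(example)
    if problems:
        print("Submission incomplete, missing:", ", ".join(problems))
    else:
        print(slack_payload(example))
```

Checking for missing fields before building the notification means that only complete submissions would be announced to the community.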

Sincere gratitude to Miquel Duran-Frigola and Gemma Turon from Ersilia, and to Dimitrios Philliou, Grant Birkinbine, Ankit Kumar Honey, Rachel Stanik, and SKi Sankhe from GitHub for all their hard work on this project!
