Increase citations, ease review & foster collaboration

A collection of “easy wins” to make machine learning in research reproducible.

This book focuses on basics that work, getting you 90% of the way to top-tier reproducibility.

Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in JAX, the applications vary as widely in scope as the contributions do in quality.

This tutorial aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. Reproducibility makes it easy for fellow researchers to use and iterate on a publication, increasing citations of published work. Appropriate validation techniques and higher code quality accelerate the review process during publication and help avoid rejection due to deficiencies in the methodology. Making models, code and possibly data available increases the visibility of work and enables easier collaboration on future work.

Making machine learning applications reproducible has an outsized impact relative to the limited additional effort it requires when using existing Python libraries.


Model Evaluation 🤖

Avoid overfitting and ensure results work on future data reliably.

Benchmarking 🪑

Compare your results to other solutions on standardized datasets and metrics.

Model Sharing 🤝

Export and share models to collaborate and gain citations.

Testing 🧪

Catch code errors early and test that data is treated correctly.

Interpretability ⚡

Communicate results and inspect models to avoid spurious correlations.

Ablation Studies 🔪

Model building is iterative, so explore which parts actually matter.

This book is organized into these major sections:

  • Motivation to expand on how the following sections aid in increasing citations, easing review, and fostering collaboration.

  • Front Matter that goes into the installation and data.

  • How To with notebooks and additional resources on the sections to improve research artifacts.

  • Talks & Workshop that showcase presentations around this material.

Overall, this tutorial is aimed at applied scientists who want to explore machine learning solutions for their problems.

This tutorial focuses on a collection of “easy wins” that scientists can implement in their research to avoid catastrophic failures and increase reproducibility with all its benefits.
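
As a flavour of what such an “easy win” can look like, here is a minimal sketch of a seeded train/test split that uses only the Python standard library. The helper name `seeded_split` and its parameters are hypothetical and chosen for illustration; they are not taken from the tutorial. In practice, scikit-learn's `train_test_split` with a fixed `random_state` serves the same purpose.

```python
import random


def seeded_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and hold out a test set.

    Hypothetical helper for illustration: the fixed seed makes the split
    identical on every run, so reported results can be reproduced.
    """
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    shuffled = list(rows)      # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))
train, test = seeded_split(data)
train_again, test_again = seeded_split(data)

# The same seed gives the same split, and no sample appears in both sets.
print(train == train_again and test == test_again)
print(set(train).isdisjoint(test))
```

Keeping the held-out samples out of training is what allows the evaluation to say something about future data, and fixing the seed lets a reviewer or collaborator regenerate exactly the same split.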