{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Sharing\n", "\n", "Some journals require authors to share code or models, but even when they don’t, sharing can benefit us.\n", "\n", "Whenever we share a model, we give other researchers the opportunity to replicate our studies and build on them. Altruistically, this advances science, which is a noble pursuit in itself. It can also increase citations of our original research, a core metric for most researchers in academia.\n", "\n", "In this section, we explore how to export models and make our training code reproducible. Saving a model from scikit-learn is easy enough. But which tools make it easy for others to import our training code, adapt it, and try out the model? Specifically, I want to talk about:\n", "\n", "- Automatic Linters\n", "- Automatic Formatting\n", "- Automatic Docstrings and Documentation\n", "- Docker and containerization for ultimate reproducibility\n" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "## Model Export\n", "scikit-learn models can be persisted to storage with Python’s built-in `pickle` module or with `joblib`.\n", "More information is available in the [scikit-learn model persistence documentation](https://scikit-learn.org/stable/model_persistence.html)." ] }, { "cell_type": "code", "execution_count": 1, "id": "54158e1d", "metadata": { "execution": { "iopub.execute_input": "2022-12-13T01:42:16.975494Z", "iopub.status.busy": "2022-12-13T01:42:16.975494Z", "iopub.status.idle": "2022-12-13T01:42:16.987167Z", "shell.execute_reply": "2022-12-13T01:42:16.986667Z" } }, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "# Location of the cleaned penguins dataset, relative to this notebook\n", "DATA_FOLDER = Path(\"..\", \"..\") / \"data\"\n", "DATA_FILEPATH = DATA_FOLDER / \"penguins_clean.csv\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2022-12-13T01:42:16.989668Z", "iopub.status.busy": "2022-12-13T01:42:16.989167Z", "iopub.status.idle": 
"2022-12-13T01:42:17.390237Z", "shell.execute_reply": "2022-12-13T01:42:17.389738Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "|  | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Sex | Species |\n",
"|---|---|---|---|---|---|\n",
"| 0 | 39.1 | 18.7 | 181.0 | MALE | Adelie Penguin (Pygoscelis adeliae) |\n",
"| 1 | 39.5 | 17.4 | 186.0 | FEMALE | Adelie Penguin (Pygoscelis adeliae) |\n",
"| 2 | 40.3 | 18.0 | 195.0 | FEMALE | Adelie Penguin (Pygoscelis adeliae) |\n",
"| 3 | 36.7 | 19.3 | 193.0 | FEMALE | Adelie Penguin (Pygoscelis adeliae) |\n",
"| 4 | 39.3 | 20.6 | 190.0 | MALE | Adelie Penguin (Pygoscelis adeliae) |\n", "