A journey from machine learning experiment to live production application, and the playbook for product managers to bridge the gap between data science and deployment.
Thanks for this E2E deep-dive, Adam. I have a few questions about the MLOps work you did, if you don't mind me asking:
- What was the course you referred to from a Professor in your article?
- How long did this project take you?
- Did you learn AWS, Docker and SageMaker, or was that with help from someone else?
- Were you using Anaconda Jupyter Notebooks with MLflow on your laptop initially?
- How did you build the presentation in GitHub?
I appreciate you sharing your process and the tools you used.
Hi Gaurav,
Thanks for connecting, and I'm happy to answer your questions. It's a challenging but really rewarding topic to get into.
Here are the answers to your questions:
- What was the course you referred to from a Professor in your article?
The course I took was a university diploma program in France called "Sorbonne Data Analytics" at Paris 1 Panthéon-Sorbonne. It was a comprehensive, in-person program taught in French, which is why it's not as open-access as a typical online course. You can see the program details here: https://formations.pantheonsorbonne.fr/fr/catalogue-des-formations/diplome-d-universite-DU/diplome-d-universite-KBVXM363/diplome-d-universite-sorbonne-data-analytics-KPMK3V7Z.html
- How long did this project take you?
This specific project took about 10-15 hours a week over approximately 6 weeks. This timeline included everything from the initial data exploration and writing the processing/training scripts to, frankly, the most time-consuming part: debugging the Dockerfile, IAM permissions, and the CI/CD pipeline on AWS.
- Did you need to learn AWS, Docker and SageMaker, or was that with help from someone else?
This was all part of my learning journey, so I learned them as I went, with no outside help. The course provided the foundational theory, but the practical skills came from doing this project. I had to learn Docker from scratch (how to write a Dockerfile, build images, etc.) and the core AWS services myself. Just a quick clarification: I didn't use SageMaker. My pipeline used Amazon ECR (for storing the Docker image) and Amazon ECS with Fargate (for running the container). Learning the AWS permissions and networking rules to get them all talking was definitely the steepest learning curve!
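To give a rough idea of the Docker side, a minimal Dockerfile for a Python project like this looks something like the sketch below. This is illustrative only; the script name, port, and layout are placeholders, not the exact file from my repo:

```dockerfile
# Start from a slim official Python base image
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the project code into the image
COPY src/ ./src/

# Entrypoint and port are placeholders; adjust to your application
EXPOSE 8000
CMD ["python", "src/app.py"]
```

From there, getting the image into ECR is a docker build, docker tag, and docker push against your ECR repository URL (after authenticating Docker with AWS), and the ECS task definition points at that image so Fargate can pull and run it.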
- Were you using Anaconda Jupyter Notebooks with MLflow on your laptop initially?
Yes, exactly. I started the way most projects do: using a Jupyter Notebook (in my case, within a standard Python virtual environment, but Anaconda works perfectly too) for the initial Exploratory Data Analysis (EDA). Once I had my logic figured out, I moved it into standalone Python scripts (src/data_processing.py, src/train.py). And yes, I ran the MLflow tracking server (mlflow ui) locally on my laptop to log all my experiments. The screenshots you see in the article are from that local server.
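For a sense of what that experiment logging looks like in code, here's a generic sketch (the experiment name, parameter, and metric are made up for illustration; this isn't my exact train.py):

```python
import mlflow

# Point at the local tracking server started with `mlflow ui`
# (5000 is the default port)
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("my-experiment")  # placeholder name

with mlflow.start_run():
    # Log hyperparameters and evaluation metrics for this run;
    # they appear in the MLflow UI for comparison across runs
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.92)
```

Each run then shows up in the local UI, which is where the screenshots in the article came from.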
- How did you build the presentation in GitHub?
That's a great question. The presentation isn't a PowerPoint or Google Slides deck. It’s actually a custom-built HTML file (you can see presentation.html in the GitHub repo). I used Tailwind CSS for the styling. I then used GitHub Pages, a free feature built into GitHub, to host that HTML file as a live website. It was a fun way to keep all the project components (code, article, and presentation) in one place.
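If you want to try the same approach, the skeleton is very simple. Something like this sketch would work (a generic example using Tailwind's CDN script, not the actual contents of presentation.html):

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <!-- Pull in Tailwind via its CDN script; a build step also works -->
  <script src="https://cdn.tailwindcss.com"></script>
</head>
<body class="bg-gray-100 flex items-center justify-center min-h-screen">
  <!-- Utility classes handle all the styling inline -->
  <h1 class="text-3xl font-bold text-slate-800">Slide title goes here</h1>
</body>
</html>
```

Then you just enable GitHub Pages in the repo settings and it serves the file at a public URL.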
I hope that helps! It was a challenging project, but building a full end-to-end pipeline was the only way to make the concepts truly "click" for me.
Thanks Adam, I got it running on macOS from within Docker in a few hours, including the MLflow, Conda, Cursor, and Docker setup. I needed to modify ports, update Python and some libraries, and select the best model. Now for the AWS CI/CD part, which involves a lot of AWS cloud architecture, IAM/key management, etc. Next week I will do an MLOps Engineering course on AWS, which covers a lot of this, including SageMaker, so this is a good start.