Add content for mlops

pull/5178/head
Kamran Ahmed 11 months ago
parent be8495a60a
commit 28a0fca90d
  1. 12
      src/data/roadmaps/mlops/content/100-programming-fundamentals/100-python.md
  2. 6
      src/data/roadmaps/mlops/content/100-programming-fundamentals/101-bash.md
  3. 3
      src/data/roadmaps/mlops/content/100-programming-fundamentals/index.md
  4. 9
      src/data/roadmaps/mlops/content/101-version-control-systems/100-git.md
  5. 14
      src/data/roadmaps/mlops/content/101-version-control-systems/101-github.md
  6. 8
      src/data/roadmaps/mlops/content/101-version-control-systems/index.md
  7. 9
      src/data/roadmaps/mlops/content/102-cloud-computing/100-aws-azure-gcp.md
  8. 9
      src/data/roadmaps/mlops/content/102-cloud-computing/101-cloud-native-ml-services.md
  9. 3
      src/data/roadmaps/mlops/content/102-cloud-computing/index.md
  10. 9
      src/data/roadmaps/mlops/content/103-containerization/100-docker.md
  11. 13
      src/data/roadmaps/mlops/content/103-containerization/101-kubernetes.md
  12. 12
      src/data/roadmaps/mlops/content/103-containerization/index.md
  13. 6
      src/data/roadmaps/mlops/content/104-ml-fundamentals.md
  14. 3
      src/data/roadmaps/mlops/content/105-data-eng-fundamentals/100-data-pipelines.md
  15. 3
      src/data/roadmaps/mlops/content/105-data-eng-fundamentals/101-data-lakes-warehouses.md
  16. 3
      src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md
  17. 3
      src/data/roadmaps/mlops/content/105-data-eng-fundamentals/index.md
  18. 9
      src/data/roadmaps/mlops/content/106-mlops-principles.md
  19. 8
      src/data/roadmaps/mlops/content/107-mlops-components/100-version-control.md
  20. 9
      src/data/roadmaps/mlops/content/107-mlops-components/101-ci-cd.md
  21. 4
      src/data/roadmaps/mlops/content/107-mlops-components/102-orchestration.md
  22. 8
      src/data/roadmaps/mlops/content/107-mlops-components/103-experiment-tracking.md
  23. 9
      src/data/roadmaps/mlops/content/107-mlops-components/104-data-lineage.md
  24. 8
      src/data/roadmaps/mlops/content/107-mlops-components/105-model-training.md
  25. 7
      src/data/roadmaps/mlops/content/107-mlops-components/106-monitoring.md
  26. 3
      src/data/roadmaps/mlops/content/107-mlops-components/index.md
  27. 7
      src/data/roadmaps/mlops/content/108-infra-as-code.md
  28. 1
      src/data/roadmaps/mlops/content/index.md

@ -0,0 +1,12 @@
# Python
Python is an interpreted, high-level, general-purpose programming language. Its design philosophy emphasizes code readability through its significant use of indentation. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.
To start learning Python, here are some useful resources:
- [Python.org](https://www.python.org/) - The official website offers extensive documentation and tutorials for beginners as well as advanced users.
- [Codecademy's Python Course](https://www.codecademy.com/learn/learn-python) - A comprehensive, interactive course covering a wide range of Python topics.
- [Real Python](https://realpython.com/) - Offers a variety of Python tutorials, articles, and courses that cater to different experience levels.
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) - A beginner-friendly book that teaches Python by guiding you through practical tasks and automation examples.
Remember, practice is key, and the more you work with Python, the more you'll appreciate its utility in the world of MLOps.

@ -0,0 +1,6 @@
# Bash
Understanding Bash is essential for MLOps tasks.
- **Book Suggestion:** _The Linux Command Line, 2nd Edition_ by William E. Shotts
- [Bash Scripting Tutorial](https://www.freecodecamp.org/news/bash-scripting-tutorial-linux-shell-script-and-command-line-for-beginners/)

@ -0,0 +1,3 @@
# Programming Fundamentals
Programming is a key requirement for MLOps. You need to be proficient in at least one programming language. Python is the most popular language for MLOps.

@ -0,0 +1,9 @@
# Git
[Git](https://git-scm.com/) is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Visit the following resources to learn more:
- [Git & GitHub Crash Course For Beginners](https://www.youtube.com/watch?v=SWYqp7iY_Tc)
- [Learn Git with Tutorials, News and Tips - Atlassian](https://www.atlassian.com/git)
- [Git Cheat Sheet](https://cs.fyi/guide/git-cheatsheet)

@ -0,0 +1,14 @@
# GitHub
GitHub is a provider of Internet hosting for software development and version control using Git. It offers the distributed version control and source code management functionality of Git, plus its own features.
Visit the following resources to learn more:
- [GitHub Website](https://github.com)
- [GitHub Documentation](https://docs.github.com/en/get-started/quickstart)
- [How to Use Git in a Professional Dev Team](https://ooloo.io/project/github-flow)
- [What is GitHub?](https://www.youtube.com/watch?v=w3jLJU7DT5E)
- [Git vs. GitHub: Whats the difference?](https://www.youtube.com/watch?v=wpISo9TNjfU)
- [Git and GitHub for Beginners](https://www.youtube.com/watch?v=RGOj5yH7evk)
- [Git and GitHub - CS50 Beyond 2019](https://www.youtube.com/watch?v=eulnSXkhE7I)
- [Learn Git Branching](https://learngitbranching.js.org/?locale=en_us)

@ -0,0 +1,8 @@
# Version Control Systems
Version control (or source control) systems allow developers to track and control changes to code over time. These systems often include the ability to make atomic revisions to code, to branch or fork from specific points, and to compare versions of code. They are useful for determining who changed the code, what changed, when, and why.
Visit the following resources to learn more:
- [Git](https://git-scm.com/)
- [What is Version Control?](https://www.atlassian.com/git/tutorials/what-is-version-control)

@ -0,0 +1,9 @@
# AWS / Azure / GCP
AWS (Amazon Web Services), Azure, and GCP (Google Cloud Platform) are three leading providers of cloud computing services. AWS, by Amazon, is the oldest and most established of the three, providing a breadth and depth of solutions ranging from infrastructure services like compute, storage, and databases to machine learning and deep learning. Azure, by Microsoft, has integrated tools for DevOps, supports a large number of programming languages, and offers seamless integration with on-prem servers and Microsoft’s software. Google's GCP has strengths in cost-effectiveness, live migration of virtual machines, and flexible computing options. All three have introduced various MLOps tools and services to boost capabilities for machine learning development and operations.
Visit the following resources to learn more about AWS, Azure, and GCP:
- [AWS Roadmap](https://roadmap.sh/aws)
- [Azure Tutorials](https://docs.microsoft.com/en-us/learn/azure/)
- [GCP Learning Resources](https://cloud.google.com/training)

@ -0,0 +1,9 @@
# Cloud-native ML Services
Most cloud providers offer managed services for machine learning. These services help data scientists and machine learning engineers build, train, and deploy machine learning models at scale. They are cloud-native, meaning they are designed to work with other cloud services and are optimized for the cloud environment.
Here are the services offered by the major cloud providers:
- **Amazon Web Services (AWS)**: SageMaker
- **Google Cloud Platform (GCP)**: Vertex AI (formerly AI Platform)
- **Microsoft Azure**: Azure Machine Learning

@ -0,0 +1,3 @@
# Cloud Computing
**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing, such as public clouds, private clouds, and hybrid clouds. Furthermore, it is divided into service models like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These models differ mainly in the level of control an organization has over its data and infrastructure.

@ -0,0 +1,9 @@
# Docker
Docker is a platform for working with containerized applications. Among its features are a daemon and client for managing and interacting with containers, registries for storing images, and a desktop application to package all these features together.
Visit the following resources to learn more:
- [Docker Documentation](https://docs.docker.com/)
- [Docker Tutorial](https://www.youtube.com/watch?v=RqTEHSBrYFw)
- [Docker simplified in 55 seconds](https://youtu.be/vP_4DlOH1G4)

@ -0,0 +1,13 @@
# Kubernetes
Kubernetes is an [open source](https://github.com/kubernetes/kubernetes) container management platform, and the dominant product in this space. Using Kubernetes, teams can deploy images across multiple underlying hosts, defining their desired availability, deployment logic, and scaling logic in YAML. Kubernetes evolved from Borg, an internal Google platform used to provision and allocate compute resources (similar to the Autopilot and Aquaman systems of Microsoft Azure).
The popularity of Kubernetes has made it an increasingly important skill for the DevOps Engineer and has triggered the creation of Platform teams across the industry. These Platform engineering teams often exist with the sole purpose of making Kubernetes approachable and usable for their product development colleagues.
Visit the following resources to learn more:
- [Kubernetes Website](https://kubernetes.io/)
- [Kubernetes Documentation](https://kubernetes.io/docs/home/)
- [Kubernetes Crash Course for Absolute Beginners](https://www.youtube.com/watch?v=s_o8dwzRlu4)
- [Primer: How Kubernetes Came to Be, What It Is, and Why You Should Care](https://thenewstack.io/primer-how-kubernetes-came-to-be-what-it-is-and-why-you-should-care/)
- [Kubernetes: An Overview](https://thenewstack.io/kubernetes-an-overview/)

@ -0,0 +1,12 @@
# Containers
Containers are a construct in which [cgroups](https://en.wikipedia.org/wiki/Cgroups), [namespaces](https://en.wikipedia.org/wiki/Linux_namespaces), and [chroot](https://en.wikipedia.org/wiki/Chroot) are used to fully encapsulate and isolate a process. This encapsulated process, packaged and distributed as a container image, shares the kernel of the host with other containers, allowing containers to be significantly smaller and faster than virtual machines.
These images are designed for portability, allowing for full local testing of a static image, and easy deployment to a container management platform.
Visit the following resources to learn more:
- [What are Containers?](https://cloud.google.com/learn/what-are-containers)
- [What is a Container?](https://www.docker.com/resources/what-container/)
- [What are Containers?](https://www.youtube.com/playlist?list=PLawsLZMfND4nz-WDBZIj8-nbzGFD4S9oz)
- [Articles about Containers - The New Stack](https://thenewstack.io/category/containers/)

@ -0,0 +1,6 @@
# Machine Learning Fundamentals
An MLOps engineer should have a basic understanding of machine learning models.
- **Courses:** [MLCourse.ai](https://mlcourse.ai/), [Fast.ai](https://course.fast.ai)
- **Book Suggestion:** _Applied Machine Learning and AI for Engineers_ by Jeff Prosise

@ -0,0 +1,3 @@
# Data Pipelines
Data pipelines refer to a set of processes that involve moving data from one system to another, for purposes such as data integration, data migration, data transformation, or data synchronization. These processes can involve a variety of data sources and destinations, and may often require data to be cleaned, enriched, or otherwise transformed along the way. It's a key concept in data engineering to ensure that data is appropriately processed from its source to the location where it will be used, typically a data warehouse, data mart, or a data lake. As such, data pipelines play a crucial part in building an effective and efficient data analytics setup, enabling the flow of data to be processed for insights.
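
The extract, clean, and load flow described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the CSV sample, the column names, and the `warehouse_table` list standing in for a real destination are all hypothetical.

```python
import csv
import io

# Hypothetical raw input; in practice this would come from a file, API, or database.
RAW_CSV = """user_id,country,amount
1,us,10.50
2,DE,
3,us,7.25
"""


def extract(text):
    """Read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean the rows: drop records with a missing amount, normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({
            "user_id": int(row["user_id"]),
            "country": row["country"].upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def load(rows, destination):
    """Append the cleaned rows to the destination (a list standing in for a warehouse table)."""
    destination.extend(rows)
    return len(rows)


warehouse_table = []
loaded = load(transform(extract(RAW_CSV)), warehouse_table)
print(loaded, warehouse_table)
```

Keeping each stage as a separate function makes it easier to test stages in isolation and to swap the source or destination later.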

@ -0,0 +1,3 @@
# Data lakes & Warehouses
**Data Lakes** are large-scale data repository systems that store raw, untransformed data, in various formats, from multiple sources. They're often used for big data and real-time analytics requirements. Data lakes preserve the original data format and schema, which can be modified as necessary. On the other hand, **Data Warehouses** are data storage systems designed for analyzing, reporting, and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet a wide range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.

@ -0,0 +1,3 @@
# Spark / Airflow / Kafka
Apache Spark is an open-source distributed computing system used for big data processing and analytics. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. The primary use case of Airflow is to define workflows of tasks that run at specific times or in response to specific events. Apache Kafka is a distributed event streaming platform that lets you publish, subscribe to, store, and process streams of records in real time. It is often used in situations where JMS (Java Message Service), RabbitMQ, and other messaging systems are found to be necessary but not powerful or flexible enough.

@ -0,0 +1,3 @@
# Data Engineering Fundamentals
Data Engineering deals with the collection, validation, storage, transformation, and processing of data. The objective is to provide reliable, efficient, and scalable data pipelines and infrastructure that allow data scientists to convert data into actionable insights. It involves steps like data ingestion, data storage, data processing, and data provisioning. Important concepts include designing, building, and maintaining data architecture, databases, and large-scale processing systems. It is crucial to have extensive technical knowledge of various tools and programming languages like SQL, Python, Hadoop, and more.

@ -0,0 +1,9 @@
# MLOps Principles
Awareness of MLOps principles and maturity factors is required.
- **Books:**
  - _Designing Machine Learning Systems_ by Chip Huyen
  - _Introducing MLOps_ by Mark Treveil and Dataiku
- **Assessment:** [MLOps maturity assessment](https://marvelousmlops.substack.com/p/mlops-maturity-assessment)
- **Great resource on MLOps:** [ml-ops.org](https://ml-ops.org)

@ -0,0 +1,8 @@
# Version Control Systems
Version control (or source control) systems allow developers to track and control changes to code over time. These systems often include the ability to make atomic revisions to code, to branch or fork from specific points, and to compare versions of code. They are useful for determining who changed the code, what changed, when, and why.
Visit the following resources to learn more:
- [Git](https://git-scm.com/)
- [What is Version Control?](https://www.atlassian.com/git/tutorials/what-is-version-control)

@ -0,0 +1,9 @@
# CI / CD
Critical for traceable and reproducible ML model deployments.
- **Books:**
  - _Learning GitHub Actions_ by Brent Laster
  - _Learning Git_ by Anna Skoulikari
- **Tutorials & Courses:** [Git & GitHub for beginners](https://www.youtube.com/watch?v=RGOj5yH7evk), [Python to Production guide](https://www.udemy.com/course/setting-up-the-linux-terminal-for-software-development/), [Version Control Missing Semester](https://missing.csail.mit.edu/2020/version-control/), [Learn Git Branching](https://learngitbranching.js.org/)
- **Tool:** [Pre-commit hooks](https://marvelousmlops.substack.com/p/welcome-to-pre-commit-heaven)

@ -0,0 +1,4 @@
Systems like Airflow and Mage are important in ML engineering.
- **Course:** [Introduction to Airflow in Python](https://app.datacamp.com/learn/courses/introduction-to-airflow-in-python)
- **Note:** Airflow is also featured in the _ML Engineering with Python_ book and [_The Full Stack 7-Steps MLOps Framework_](https://www.pauliusztin.me/courses/the-full-stack-7-steps-mlops-framework).
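
Orchestrators of this kind model a workflow as a graph of tasks with dependencies and run each task only after its upstream tasks have finished. The sketch below shows that core idea using only Python's standard library; it does not use the Airflow or Mage APIs, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ML workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_data": set(),
    "validate_data": {"extract_data"},
    "train_model": {"validate_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}


def run_workflow(dependencies, tasks):
    """Run every task after all of its dependencies and return the execution order."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


log = []
# Placeholder task bodies that only record that they ran.
tasks = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}
execution_order = run_workflow(dependencies, tasks)
print(execution_order)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this dependency ordering.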

@ -0,0 +1,8 @@
# Experiment Tracking and Model Registry
**Experiment Tracking** is an essential part of MLOps, providing a system to monitor and record the different experiments conducted during the machine learning model development process. This involves capturing, organizing and visualizing the metadata associated with each experiment, such as hyperparameters used, models produced, metrics like accuracy or loss, and other information about the computational environment. This tracking allows for reproducibility of experiments, comparison across different experiment runs, and helps in identifying the best models.
In practice, this means logging the metadata, parameters, and artifacts of training runs.
- **Tool:** MLflow
- **Courses:** [MLflow Udemy course](https://www.udemy.com/course/mlflow-course/), [End-to-end machine learning (MLflow piece)](https://www.udemy.com/course/sustainable-and-scalable-machine-learning-project-development/)
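
To make the idea concrete, here is a minimal sketch of what a tracker records for each run. It deliberately does not use MLflow's API: the `log_run` and `best_run` helpers, the JSON file layout, and the parameter and metric values are all hypothetical stand-ins for what a dedicated tool provides.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_run(tracking_dir, params, metrics):
    """Write one run's parameters and metrics to its own JSON file."""
    run_id = uuid.uuid4().hex
    record = {
        "run_id": run_id,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    (Path(tracking_dir) / f"{run_id}.json").write_text(json.dumps(record))
    return run_id


def best_run(tracking_dir, metric):
    """Return the recorded run with the highest value for the given metric."""
    runs = [json.loads(p.read_text()) for p in Path(tracking_dir).glob("*.json")]
    return max(runs, key=lambda run: run["metrics"][metric])


# Two hypothetical training runs with made-up hyperparameters and scores.
tracking_dir = tempfile.mkdtemp()
log_run(tracking_dir, {"learning_rate": 0.1, "max_depth": 3}, {"accuracy": 0.81})
log_run(tracking_dir, {"learning_rate": 0.01, "max_depth": 5}, {"accuracy": 0.86})

best = best_run(tracking_dir, "accuracy")
print(best["params"], best["metrics"])
```

Storing every run, not just the best one, is what allows later comparison across runs and reproduction of a chosen model.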

@ -0,0 +1,9 @@
# Data Lineage and Feature Stores
**Data Lineage** refers to the life cycle of data, including its origins, movements, characteristics, and quality. It's a critical component in MLOps for tracking the journey of data through every process in a pipeline, from raw input to model output. Data lineage helps in maintaining transparency, ensuring compliance, and tracing data-related bugs. It provides a clear representation of data sources, transformations, and dependencies, thereby aiding in audits, governance, and reproduction of machine learning models.
Feature stores are a crucial component of MLOps infrastructure.
- **Tutorial:** Creating a feature store with Feast [Part 1](https://kedion.medium.com/creating-a-feature-store-with-feast-part-1-37c380223e2f) [Part 2](https://kedion.medium.com/feature-storage-for-ml-with-feast-part-2-34df1971a8d3) [Part 3](https://kedion.medium.com/feature-storage-for-ml-with-feast-a061899fc4a2)
- **Tool:** DVC for data tracking
- **Course:** [End-to-end machine learning (DVC piece)](https://www.udemy.com/course/sustainable-and-scalable-machine-learning-project-development/)
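
One simple way to picture lineage is to fingerprint the data before and after every transformation, so that any output can be traced back through the steps that produced it. The sketch below assumes small JSON-serializable data; the `fingerprint` and `tracked` helpers, the step names, and the sample rows are hypothetical, and this is not how DVC or Feast work internally.

```python
import hashlib
import json


def fingerprint(data):
    """Return a short, deterministic hash of JSON-serializable data."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


lineage = []


def tracked(step_name, func, data):
    """Apply a transformation and record which input produced which output."""
    result = func(data)
    lineage.append({
        "step": step_name,
        "input": fingerprint(data),
        "output": fingerprint(result),
    })
    return result


raw = [{"age": 31, "income": 52000}, {"age": None, "income": 48000}]
cleaned = tracked(
    "drop_missing",
    lambda rows: [r for r in rows if r["age"] is not None],
    raw,
)
features = tracked(
    "scale_income",
    lambda rows: [{**r, "income": r["income"] / 1000} for r in rows],
    cleaned,
)

for entry in lineage:
    print(entry)
```

Because the output hash of one step equals the input hash of the next, the records form a chain from raw input to final features.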

@ -0,0 +1,8 @@
# Model Training and Serving
"Model Training" refers to the phase in the Machine Learning (ML) pipeline where we teach a machine learning model how to make predictions by providing it with data. This process begins with feeding the model a training dataset, which it uses to learn and understand patterns or perform computations. The model's performance is then evaluated by comparing its prediction outputs with the actual results. Various algorithms can be used in the model training process. The choice of algorithm usually depends on the task, the data available, and the requirements of the project. It is worth noting that the model training stage can be computationally expensive particularly when dealing with large datasets or complex models.
Training and serving decisions depend on the organization's infrastructure.
- **Repository Suggestion:** [ML Deployment k8s Fast API](https://github.com/sayakpaul/ml-deployment-k8s-fastapi/tree/main)
- **Tutorial Suggestions:** [ML deployment with k8s FastAPI, Building an ML app with FastAPI](https://dev.to/bravinsimiyu/beginner-guide-on-how-to-build-a-machine-learning-app-with-fastapi-part-ii-deploying-the-fastapi-application-to-kubernetes-4j6g), [Basic Kubeflow pipeline](https://towardsdatascience.com/tutorial-basic-kubeflow-pipeline-from-scratch-5f0350dc1905), [Building and deploying ML pipelines](https://www.datacamp.com/tutorial/kubeflow-tutorial-building-and-deploying-machine-learning-pipelines), [KServe tutorial](https://towardsdatascience.com/kserve-highly-scalable-machine-learning-deployment-with-kubernetes-aa7af0b71202)
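
The train-then-evaluate loop described above can be illustrated with a very small model. This sketch fits a straight line by ordinary least squares in plain Python and measures its error on held-out data; the dataset is synthetic (generated from y = 2x + 1) and the function names are hypothetical. Real projects would typically use a library such as scikit-learn instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic dataset: points on the line y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = data[:8], data[8:]  # hold out the last two points for evaluation

model = fit_line([x for x, _ in train], [y for _, y in train])
error = mean_absolute_error(model, [x for x, _ in test], [y for _, y in test])
print(model, error)
```

Evaluating on data the model did not see during training is what makes the error a meaningful estimate of how it will behave on new inputs.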

@ -0,0 +1,7 @@
# Monitoring and Observability
**Monitoring** in MLOps primarily involves tracking the performance of machine learning (ML) models in production to ensure that they continually deliver accurate and reliable results. Such monitoring is necessary because the real-world data that these models handle may change over time, a scenario known as data drift. These changes can adversely affect model performance. Monitoring helps to detect anomalies in the model’s behaviour or performance, and the resulting alerts can trigger the retraining of models with new data. From a broader perspective, monitoring also involves tracking resources and workflows to detect and rectify any operational issues in the MLOps pipeline.
- [**ML Monitoring vs Observability article**](https://marvelousmlops.substack.com/p/ml-monitoring-vs-ml-observability)
- **Course:** [Machine learning monitoring concepts](https://app.datacamp.com/learn/courses/machine-learning-monitoring-concepts), [Monitoring ML in Python](https://app.datacamp.com/learn/courses/monitoring-machine-learning-in-python)
- **Tools:** [Prometheus, Grafana](https://www.udemy.com/course/mastering-prometheus-and-grafana/)
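
As a rough illustration of drift detection, the sketch below compares the mean of a feature in production against its mean in the training data, measured in training standard deviations. This is one simple heuristic among many, not a standard method from any of the tools above; the threshold of 2.0 and the sample values are assumptions chosen for the example.

```python
from statistics import mean, stdev


def drift_score(reference, current):
    """Shift of the current mean from the reference mean, in reference standard deviations."""
    return abs(mean(current) - mean(reference)) / stdev(reference)


def has_drifted(reference, current, threshold=2.0):
    """Flag drift when the shift exceeds the (assumed) threshold."""
    return drift_score(reference, current) > threshold


# Hypothetical feature values seen at training time and in two production windows.
reference = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
stable = [10.2, 9.8, 10.1, 9.9]
shifted = [14.0, 15.0, 13.5, 14.5]

print(has_drifted(reference, stable), has_drifted(reference, shifted))
```

In a production setup, a check like this would run on a schedule and raise an alert that can trigger retraining.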

@ -0,0 +1,3 @@
# MLOps Components
MLOps components can be broadly classified into three major categories: Development, Operations and Governance. The **Development** components include everything involved in the creation of machine learning models, such as data extraction, data analysis, feature engineering, and machine learning model training. The **Operations** category includes components involved in deploying, monitoring, and maintaining machine learning models in production. This may include release management, model serving, and performance monitoring. Lastly, the **Governance** category encompasses the policies and regulations related to machine learning models. This includes model audit and tracking, model explainability, and security & compliance regulations.

@ -0,0 +1,7 @@
# Infrastructure as Code
Essential for a reproducible MLOps framework.
- **Course:** [Terraform course for beginners](https://www.youtube.com/watch?v=SLB_c_ayRMo)
- **Video:** [8 Terraform best practices by Techworld by Nana](https://www.youtube.com/watch?v=gxPykhPxRW0)
- **Book Suggestion:** _Terraform: Up and Running, 3rd Edition_ by Yevgeniy Brikman