From 247b24e1a3346025779c3efb0e8e282b20411791 Mon Sep 17 00:00:00 2001
From: Gustavo Montini de Abreu <48092985+GMAbreu@users.noreply.github.com>
Date: Fri, 10 May 2024 22:09:17 -0300
Subject: [PATCH] Add resource (#5219)

* Update 102-spark-airflow-kafka.md

Add link for website 'Spark by Examples'

* Update src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md

Co-authored-by: dsh

---------

Co-authored-by: Kamran Ahmed
Co-authored-by: dsh
---
 .../105-data-eng-fundamentals/102-spark-airflow-kafka.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md b/src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md
index 18636478e..08224f400 100644
--- a/src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md
+++ b/src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md
@@ -1,3 +1,7 @@
 # Spark / Airflow / Kafka
 
-Apache Spark is an open-source distributed computing system used for big data processing and analytics. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. On the other hand, Apache Airflow is an open-source platform to programmatically author, schedule and monitor workflows. The primary use case of Airflow is to define workflows of tasks that run at specific times or in response to specific events. Apache Kafka is a distributed event streaming platform that lets you publish, subscribe to, store, and process streams of records in real time. It is often used in situations where JMS (Java Messaging Service), RabbitMQ, and other messaging systems are found to be necessary but not powerful or flexible enough.
\ No newline at end of file
+Apache Spark is an open-source distributed computing system used for big data processing and analytics. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. On the other hand, Apache Airflow is an open-source platform to programmatically author, schedule and monitor workflows. The primary use case of Airflow is to define workflows of tasks that run at specific times or in response to specific events. Apache Kafka is a distributed event streaming platform that lets you publish, subscribe to, store, and process streams of records in real time. It is often used in situations where JMS (Java Messaging Service), RabbitMQ, and other messaging systems are found to be necessary but not powerful or flexible enough.
+
+Visit the following resources to learn more:
+
+- [Spark By Examples](https://sparkbyexamples.com)