Add resource (#5219)

* Update 102-spark-airflow-kafka.md

Add link for website 'Spark by Examples'

* Update src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md

Co-authored-by: dsh <daniel.s.holdsworth@gmail.com>

---------

Co-authored-by: Kamran Ahmed <kamranahmed.se@gmail.com>
Co-authored-by: dsh <daniel.s.holdsworth@gmail.com>
pull/5651/head
Gustavo Montini de Abreu 7 months ago committed by GitHub
parent fb6c56e1aa
commit 247b24e1a3
  1. 4
      src/data/roadmaps/mlops/content/105-data-eng-fundamentals/102-spark-airflow-kafka.md

@@ -1,3 +1,7 @@
# Spark / Airflow / Kafka
Apache Spark is an open-source distributed computing system used for big data processing and analytics. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. Its primary use case is defining workflows of tasks that run at specific times or in response to specific events. Apache Kafka is a distributed event streaming platform that lets you publish, subscribe to, store, and process streams of records in real time. It is often used where a messaging system is needed but JMS (Java Message Service), RabbitMQ, and similar systems are not powerful or flexible enough.
Visit the following resources to learn more:
- [Spark By Examples](https://sparkbyexamples.com)
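
The idea of a workflow as tasks with dependencies, which Airflow schedules, can be sketched with the Python standard library alone. This is only an illustration of dependency ordering, not Airflow's API: the `pipeline` mapping, its task names, and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(dag):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)
```

A real Airflow deployment adds scheduling, retries, and monitoring on top of this ordering step.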
