From 7acb0250fcabe7ece88664d64117f956c1c562e6 Mon Sep 17 00:00:00 2001
From: syedmouaazfarrukh
Date: Wed, 18 Jan 2023 22:53:43 -0800
Subject: [PATCH] Adding content to 102-resiliency

---
 .../101-high-availability/circuit-breaker.md | 1 +
 .../101-high-availability/index.md | 1 +
 .../103-reliability-patterns/102-resiliency/bulkhead.md | 9 ++++++++-
 .../102-resiliency/circuit-breaker.md | 9 ++++++++-
 .../102-resiliency/compensating-transaction.md | 9 ++++++++-
 .../102-resiliency/health-endpoint-monitoring.md | 9 ++++++++-
 .../103-reliability-patterns/102-resiliency/index.md | 8 +++++++-
 .../102-resiliency/leader-election.md | 9 ++++++++-
 .../102-resiliency/queue-based-load-leveling.md | 8 +++++++-
 .../103-reliability-patterns/102-resiliency/retry.md | 9 ++++++++-
 .../102-resiliency/scheduler-agent-supervisor.md | 8 +++++++-
 11 files changed, 71 insertions(+), 9 deletions(-)

diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/circuit-breaker.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/circuit-breaker.md
index bd3eed4d2..ff66afacf 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/circuit-breaker.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/circuit-breaker.md
@@ -1,5 +1,6 @@
 # Circuit Breaker
+Circuit Breaker in system design is a pattern that is used to prevent an application from repeatedly trying to perform an action that is likely to fail. By tripping the circuit breaker when an operation fails a certain number of times, the system can prevent cascading failures, provide fallback behavior, and monitor system health. It can be implemented in several ways, for example as a hand-written state machine or with a library such as Hystrix (a Java library).
 Learn more from the following links:
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/index.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/index.md
index bc0af3dfd..3b6f4302e 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/index.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/index.md
@@ -1,5 +1,6 @@
 # High availability
+High availability in system design refers to the ability of a system to continue operating even in the event of a failure or outage. This is often achieved by designing the system to be redundant, meaning that multiple copies of the system are running at the same time, and if one copy fails, the others can take over. It can be achieved by using redundancy, load balancing, and failover. It can be measured using metrics such as Mean Time Between Failures (MTBF), Mean Time To Recovery (MTTR), and availability.
 Learn more from the following links:
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/bulkhead.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/bulkhead.md
index e158980d6..e4d4e6156 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/bulkhead.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/bulkhead.md
@@ -1 +1,8 @@
-# Bulkhead
\ No newline at end of file
+# Bulkhead
+
+Bulkhead in system design refers to a technique for isolating different parts of a system to prevent one part from affecting the performance of the whole system. The term "bulkhead" refers to the partitions or walls that separate different parts of the system. It makes it possible to isolate critical parts of the system, prevent cascading failures, and provide isolation for different types of requests. It can be implemented in several ways, such as thread pools, circuit breakers, and workers.
+
+Learn more from the following links:
+
+- [Bulkhead pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead)
+- [Get started with Bulkhead](https://dzone.com/articles/resilient-microservices-pattern-bulkhead-pattern)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/circuit-breaker.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/circuit-breaker.md
index 56427e3fa..ff66afacf 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/circuit-breaker.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/circuit-breaker.md
@@ -1 +1,8 @@
-# Circuit breaker
\ No newline at end of file
+# Circuit Breaker
+
+Circuit Breaker in system design is a pattern that is used to prevent an application from repeatedly trying to perform an action that is likely to fail. By tripping the circuit breaker when an operation fails a certain number of times, the system can prevent cascading failures, provide fallback behavior, and monitor system health. It can be implemented in several ways, for example as a hand-written state machine or with a library such as Hystrix (a Java library).
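The state-machine approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the API of Hystrix or any other library: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are hypothetical, and the sketch assumes a single-threaded caller.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical three-state breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of hitting the failing dependency again.
                raise CircuitOpenError("circuit is open; call rejected")
            self.state = "half-open"  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, trips the breaker.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would wrap each remote call in `breaker.call(...)` and treat `CircuitOpenError` as the signal to use fallback behavior.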
+
+Learn more from the following links:
+
+- [Circuit breaker design pattern](https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern)
+- [Overview of Circuit Breaker](https://medium.com/geekculture/design-patterns-for-microservices-circuit-breaker-pattern-276249ffab33)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/compensating-transaction.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/compensating-transaction.md
index 55256f4dd..959d24a88 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/compensating-transaction.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/compensating-transaction.md
@@ -1 +1,8 @@
-# Compensating transaction
\ No newline at end of file
+# Compensating Transaction
+
+A Compensating Transaction in system design refers to a mechanism for reversing or undoing the effects of a previously executed transaction in a system. It can be used to ensure that the system remains in a consistent state, even if a subsequent transaction fails or is rolled back. Typically used in systems that implement the principles of ACID transactions, it can be implemented in several ways, such as undo logs and savepoints.
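As a rough illustration of the idea of undoing completed work, the sketch below runs a list of steps and, if one fails, runs the compensating action of every step that already succeeded, in reverse order. The function name `run_with_compensation` and the `(action, compensate)` pair format are assumptions made for this example, not part of any particular framework.

```python
def run_with_compensation(steps):
    """Run (action, compensate) pairs; undo completed steps if one fails.

    `steps` is a list of two-item tuples of zero-argument callables.
    This is an illustrative sketch: it assumes compensations themselves
    do not fail, which a real system would have to handle.
    """
    completed = []  # compensations for steps that have succeeded so far
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        # Undo in reverse order so later steps are reversed first.
        for compensate in reversed(completed):
            compensate()
        raise
```

For example, a booking workflow might pair "reserve seat" with "release seat" and "charge card" with "refund card", so that a failure while charging releases the seat again.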
+
+Learn more from the following resources:
+
+- [Compensating Transaction pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/compensating-transaction)
+- [Intro to Compensating Transaction](https://en.wikipedia.org/wiki/Compensating_transaction)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/health-endpoint-monitoring.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/health-endpoint-monitoring.md
index 05c137bb9..edb34eb22 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/health-endpoint-monitoring.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/health-endpoint-monitoring.md
@@ -1 +1,8 @@
-# Health endpoint monitoring
\ No newline at end of file
+# Health Endpoint Monitoring
+
+Health Endpoint Monitoring in system design refers to a technique for monitoring the health of a system by periodically sending requests to a specific endpoint, called a "health endpoint", on the system. The health endpoint returns a response indicating the current status of the system, such as whether it is running properly or if there are any issues. It makes it possible to monitor the overall health of the system, gain insight into its performance, and automate monitoring. It can be implemented in several ways, such as periodic requests and event-based monitoring.
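A health endpoint of this kind might look like the sketch below, which uses only Python's standard `http.server` module. The `/health` path, the `CHECKS` registry, and the `check_health` and `serve` helpers are hypothetical names chosen for illustration; a real service would plug in checks for its own dependencies.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_health(checks):
    """Run each named check and return (HTTP status code, response body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception:
            results[name] = "failing"  # a crashing check counts as a failure
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "healthy" if healthy else "unhealthy", "checks": results}
    return (200 if healthy else 503), body


# Hypothetical registry; real checks would probe a database, cache, and so on.
CHECKS = {"self": lambda: True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = check_health(CHECKS)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the endpoint; a monitor would then poll /health periodically."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Returning 503 when any check fails lets a load balancer or monitoring tool react to the status code alone, without parsing the body.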
+
+To learn more, visit the following links:
+
+- [Health Endpoint Monitoring pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring)
+- [Explaining the health endpoint monitoring pattern](https://www.oreilly.com/library/view/java-ee-8/9781788830621/5012c01e-90ca-4809-a210-d3736574f5b3.xhtml)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/index.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/index.md
index 7efb356b2..50e218654 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/index.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/index.md
@@ -1 +1,7 @@
-# Resiliency
\ No newline at end of file
+# Resilience
+
+Resilience in system design refers to the ability of a system to withstand and recover from disruptions, failures or unexpected conditions. It means the system can continue to function and provide service even when faced with stressors such as high traffic, failures or unexpected changes. Resilience can be achieved by designing the system to be redundant, fault-tolerant, and scalable, and by adding automatic recovery along with monitoring and alerting mechanisms. It can be measured by Recovery Time Objective (RTO), Recovery Point Objective (RPO), Mean Time To Failure (MTTF), and Mean Time To Recovery (MTTR).
+
+Learn more from the following links:
+
+- [System Resilience: What Exactly is it?](https://insights.sei.cmu.edu/blog/system-resilience-what-exactly-is-it/)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/leader-election.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/leader-election.md
index 3b7c08ee2..6de11a45c 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/leader-election.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/leader-election.md
@@ -1 +1,8 @@
-# Leader election
\ No newline at end of file
+# Leader Election
+
+Leader Election in system design is a pattern that is used to elect a leader among a group of distributed nodes in a system. The leader is responsible for coordinating the activities of the other nodes and making decisions on behalf of the group. Leader Election is important in distributed systems, as it ensures that there is a single point of coordination and decision-making, reducing the risk of conflicting actions or duplicate work. Leader Election can be used to ensure a single point of coordination, provide fault tolerance, and support scalability. Consensus algorithms such as Raft, Paxos, and Zab can be used to implement Leader Election in distributed systems.
+
+To learn more, visit the following links:
+
+- [Overview of Leader Election](https://aws.amazon.com/builders-library/leader-election-in-distributed-systems/)
+- [What is Leader Election in system design?](https://www.enjoyalgorithms.com/blog/leader-election-system-design)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/queue-based-load-leveling.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/queue-based-load-leveling.md
index 0a40a26b1..026abf013 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/queue-based-load-leveling.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/queue-based-load-leveling.md
@@ -1 +1,7 @@
-# Queue based load leveling
\ No newline at end of file
+# Queue-Based Load Leveling
+
+Queue-based load leveling in system design refers to a technique for managing the workload of a system by using a queue to buffer incoming requests and process them at a steady pace. By using a queue, the system can handle bursts of incoming requests without being overwhelmed, as well as prevent idle periods where there are not enough requests to keep the system busy. It makes it possible to smooth out bursts of incoming requests, prevent idle periods, prioritize requests, and monitor requests. It can be implemented in several ways, such as an in-memory queue or a persistent queue.
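The in-memory variant can be sketched with Python's standard `queue` and `threading` modules. The helper name `level_burst` and its parameters are hypothetical, and the sketch assumes that `handle` does not raise; a production consumer would need error handling and, usually, a persistent queue.

```python
import queue
import threading


def level_burst(requests, handle, max_queued=100):
    """Buffer a burst of requests in a queue and process them one at a time."""
    tasks = queue.Queue(maxsize=max_queued)  # bounded buffer between producer and consumer
    results = []

    def worker():
        # The single consumer drains the queue at its own steady pace.
        while True:
            item = tasks.get()
            if item is None:  # sentinel: the producer has no more work
                tasks.task_done()
                return
            results.append(handle(item))
            tasks.task_done()

    consumer = threading.Thread(target=worker)
    consumer.start()
    for request in requests:
        tasks.put(request)  # blocks while the buffer is full (back-pressure)
    tasks.put(None)
    consumer.join()
    return results
```

Because the queue is bounded, a producer that outruns the consumer is slowed down by `put` instead of overwhelming the service behind `handle`.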
+
+To learn more, visit the following links:
+
+- [Queue-Based Load Leveling pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/retry.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/retry.md
index 1ba28f640..90656178b 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/retry.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/retry.md
@@ -1 +1,8 @@
-# Retry
\ No newline at end of file
+# Retry
+
+Retry in system design refers to the process of automatically re-executing a failed operation in the hope of getting a successful outcome. Retries are used to handle transient failures such as network errors, temporary unavailability of a service, or other issues that may be resolved quickly. Retries can be an effective way of dealing with these types of failures, as they can help to ensure that the system continues to function, even in the face of temporary disruptions.
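A retry loop for transient failures might be sketched as follows. The exponential backoff between attempts is an added assumption (the text above does not prescribe a delay strategy), and `TransientError`, `retry`, `max_attempts`, and `base_delay` are hypothetical names for this example.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached.

    Only TransientError is retried; any other exception propagates at once.
    The wait doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))
```

Limiting both the number of attempts and the exception types retried keeps a permanent failure from being retried indefinitely.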
+
+Learn more from the following resources:
+
+- [Introducing Retry](https://engineering.grab.com/designing-resilient-systems-part-2)
+- [Retry pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/retry)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/scheduler-agent-supervisor.md b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/scheduler-agent-supervisor.md
index b12084317..0b36cfd37 100644
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/scheduler-agent-supervisor.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/scheduler-agent-supervisor.md
@@ -1 +1,7 @@
-# Scheduler agent supervisor
\ No newline at end of file
+# Scheduler Agent Supervisor
+
+Scheduler Agent Supervisor in system design is a pattern that allows for the scheduling and coordination of tasks or processes by a central entity, known as the Scheduler. The Scheduler is responsible for scheduling tasks, monitoring their execution, and handling errors or failures. This pattern can be used to build robust and fault-tolerant systems, by ensuring that tasks are executed as intended and that any errors or failures are handled appropriately.
+
+Learn more from the following links:
+
+- [Scheduler Agent Supervisor pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/scheduler-agent-supervisor)
\ No newline at end of file