Add content for reliability patterns

pull/3331/head
Kamran Ahmed 2 years ago
parent ad4f35764d
commit e934dc60f4
1. src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/index.md (5 changed lines)
2. src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/index.md (5 changed lines)
3. src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/index.md (8 changed lines)
4. src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/index.md (5 changed lines)

@@ -1,8 +1,7 @@
# Availability
Availability refers to the ability of a system to perform its intended function without interruption. High availability is desirable because the system is less likely to experience downtime and, when it does, can recover quickly. Methods for increasing availability include redundancy, load balancing, failover, monitoring, and automated recovery.
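Redundancy raises availability because independent replicas rarely fail at the same moment. The arithmetic can be sketched as follows, assuming failures are independent (correlated failures, such as a shared power feed or a bad deploy, make real numbers worse). The availability figures are illustrative and the function names are hypothetical.

```python
from math import prod

def parallel_availability(availabilities):
    """System is up if ANY replica is up (redundancy). Assumes independent failures."""
    # The system is down only when every replica is down at the same time.
    all_down = prod(1 - a for a in availabilities)
    return 1 - all_down

def serial_availability(availabilities):
    """System is up only if EVERY component in the chain is up (hard dependencies)."""
    return prod(availabilities)

# Illustrative figures, not measurements.
print(f"one replica at 99%:    {parallel_availability([0.99]):.4%}")
print(f"two replicas at 99%:   {parallel_availability([0.99, 0.99]):.4%}")
print(f"app 99.9% -> db 99.5%: {serial_availability([0.999, 0.995]):.4%}")
```

Note the asymmetry: redundant components in parallel push availability up, while every hard dependency in series pulls it down, which is why load balancing and failover are aimed at removing single points of failure.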
Availability is measured as a percentage of uptime: the proportion of time that a system is functional and working. It is affected by system errors, infrastructure problems, malicious attacks, and system load. Cloud applications typically offer users a service level agreement (SLA), so they must be designed and implemented to maximize availability.
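Because availability is a percentage of uptime, an availability target translates directly into a downtime budget. A minimal sketch, assuming a 365-day year and hypothetical helper names:

```python
# Downtime allowed by an availability target ("number of nines").
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years for simplicity

def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted over the period for a given uptime percentage."""
    return period_minutes * (1 - availability_pct / 100)

def measured_availability_pct(uptime_minutes, downtime_minutes):
    """Availability as the percentage of total time the system was functional."""
    return 100 * uptime_minutes / (uptime_minutes + downtime_minutes)

for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

Each additional "nine" shrinks the yearly budget by a factor of ten, from about 3.65 days at 99% to roughly 5 minutes at 99.999%.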
To learn more, visit the following links:
- [System Design: Availability](https://dev.to/karanpratapsingh/system-design-availability-38bd)
- [Concept of Availability in system design](https://www.enjoyalgorithms.com/blog/availability-system-design-concept)
- [Availability Patterns](https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns#availability)

@@ -1,8 +1,7 @@
# High availability
High availability refers to the ability of a system to continue operating even in the event of a failure or outage. This is often achieved by designing the system to be redundant: multiple copies of the system run at the same time, and if one copy fails, the others take over. Load balancing and failover are the other common building blocks. High availability can be measured using metrics such as Mean Time Between Failures (MTBF), Mean Time To Recovery (MTTR), and availability.
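MTBF and MTTR are commonly combined into a steady-state availability estimate, A = MTBF / (MTBF + MTTR). A minimal sketch with hypothetical figures and a hypothetical helper name:

```python
def availability_from_mtbf_mttr(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time up = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical node that fails on average once every 1,000 hours.
for mttr in (10.0, 1.0, 0.1):  # slow manual repair vs. fast automated failover
    a = availability_from_mtbf_mttr(1000.0, mttr)
    print(f"MTTR {mttr:>4} h -> availability {a:.4%}")
```

Because only the ratio of MTTR to MTBF matters, cutting recovery time tenfold (for example through automated failover) improves availability as much as making failures ten times rarer.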
Azure infrastructure is composed of geographies, regions, and Availability Zones, which limit the blast radius of a failure and therefore the potential impact on customer applications and data. Azure Availability Zones were developed as a software and networking solution to protect against datacenter failures and to provide increased high availability (HA) to customers. HA architecture involves a balance between high resilience, low latency, and cost.
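Zones only limit the blast radius if replicas are actually spread across them. The following provider-agnostic sketch (zone names are hypothetical, and this is not an Azure API) places replicas round-robin and counts what survives a full zone outage:

```python
from collections import Counter

def place_replicas(num_replicas, zones):
    """Spread replicas round-robin across zones so no zone holds more than its share."""
    return [zones[i % len(zones)] for i in range(num_replicas)]

def surviving_replicas(placement, failed_zone):
    """Replicas still running after every replica in `failed_zone` is lost."""
    return [z for z in placement if z != failed_zone]

zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
placement = place_replicas(6, zones)
print(Counter(placement))
for zone in zones:
    left = len(surviving_replicas(placement, zone))
    print(f"{zone} down -> {left} of {len(placement)} replicas left")

# Contrast: all six replicas packed into one zone.
print(len(surviving_replicas(["zone-a"] * 6, "zone-a")), "replicas left")
```

Spreading replicas across zones typically adds some latency and cross-zone traffic cost, which is the resilience, latency, and cost trade-off that HA architecture has to balance.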
Learn more from the following links:
- [What is High availability (HA)?](https://www.techtarget.com/searchdatacenter/definition/high-availability)
- [Introduction to High Availability Architecture](https://www.filecloud.com/blog/an-introduction-to-high-availability-architecture/)
- [High availability Patterns](https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns#high-availability)

@@ -1,7 +1,11 @@
# Resilience
Resilience refers to the ability of a system to withstand and recover from disruptions, failures, or unexpected conditions. A resilient system continues to function and provide service even under stressors such as high traffic, component failures, or unexpected changes. Resilience can be achieved by designing the system to be redundant, fault-tolerant, and scalable, with automatic recovery and with monitoring and alerting mechanisms. It is commonly expressed through Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets and measured with Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR).
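RTO bounds how long recovery may take, and RPO bounds how much recent data may be lost. A minimal sketch that checks a single incident against both; the incident timeline, the `Incident` type, and the helper name are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Incident:
    last_good_backup: datetime  # most recent recoverable copy of the data
    failure_at: datetime        # when the outage began
    restored_at: datetime       # when service was available again

def meets_objectives(incident, rto, rpo):
    """Compare actual downtime against RTO and the actual data-loss window against RPO."""
    downtime = incident.restored_at - incident.failure_at
    data_loss_window = incident.failure_at - incident.last_good_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }

# Hypothetical incident: last backup 10 minutes before the failure, 40-minute restore.
t0 = datetime(2024, 1, 1, 12, 0)
incident = Incident(
    last_good_backup=t0 - timedelta(minutes=10),
    failure_at=t0,
    restored_at=t0 + timedelta(minutes=40),
)
print(meets_objectives(incident, rto=timedelta(hours=1), rpo=timedelta(minutes=15)))
```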
Put another way, resiliency is the ability of a system to gracefully handle and recover from failures, both inadvertent and malicious.
Cloud hosting increases the likelihood of both transient and more permanent faults: applications are often multi-tenant, use shared platform services, compete for resources and bandwidth, communicate over the internet, and run on commodity hardware. The connected nature of the internet and the growing sophistication and volume of attacks also increase the likelihood of a security disruption.
Detecting failures and recovering from them quickly and efficiently is necessary to maintain resiliency.
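Transient faults, such as a brief network blip or a throttled request, often succeed on a later attempt, which is the idea behind the Retry pattern. A minimal sketch of retry with capped exponential backoff and full jitter; `TransientError`, the helper name, and the parameter defaults are assumptions for illustration:

```python
import random
import time

class TransientError(Exception):
    """Hypothetical error type for faults that are expected to clear on their own."""

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call `operation`, retrying transient failures with capped exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: surface the fault to the caller
            # Exponential growth capped at max_delay; full jitter spreads out retries
            # from many clients so they don't hit a recovering service in lockstep.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            sleep(delay)

# Usage: a flaky call that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("temporary network blip")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda s: None))  # no real sleeping in the demo
```

Only retry operations that are safe to repeat (idempotent), and cap the number of attempts so that retries do not amplify load on a service that is already struggling. Non-transient exceptions propagate immediately.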
Learn more from the following links:
- [System Resilience: What Exactly is it?](https://insights.sei.cmu.edu/blog/system-resilience-what-exactly-is-it/)
- [Resiliency Patterns](https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns#resiliency)

@@ -1,8 +1,7 @@
# Reliability Patterns
Reliability patterns are solutions to common problems that arise when building systems that need to be highly available and fault-tolerant. They provide a way to design and implement systems that can withstand failures, maintain high levels of performance, and recover quickly from disruptions. Common reliability patterns include Failover, Circuit Breaker, Retry, Bulkhead, Backpressure, Cache-Aside, Idempotent Operations, and Health Endpoint Monitoring.
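To illustrate one of these patterns, here is a minimal, single-threaded sketch of a Circuit Breaker. The class, state names, and thresholds are illustrative assumptions rather than any particular library's API; an injectable clock keeps the demo deterministic.

```python
import time

class CircuitOpenError(Exception):
    """Raised when calls are rejected without reaching the failing dependency."""

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN after repeated failures, HALF_OPEN after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable; failing fast")
            self.state = "HALF_OPEN"  # cooldown elapsed: allow a trial call
        try:
            result = operation()
        except Exception:
            self._record_failure()
            raise
        # Success closes the circuit and clears the failure count.
        self.state = "CLOSED"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures in a row, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

# Usage with a fake clock.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

def failing():
    raise ConnectionError("downstream timeout")

for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
print(breaker.state)  # OPEN

try:
    breaker.call(lambda: "ok")
except CircuitOpenError:
    print("rejected fast")

now[0] = 11.0  # cooldown has elapsed
print(breaker.call(lambda: "ok"), breaker.state)  # ok CLOSED
```

A production breaker would also need thread safety and would typically limit how many trial calls are allowed through in the half-open state.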
Learn more from the following links:
- [Reliability Patterns: A Survey](http://laccei.org/LACCEI2019-MontegoBay/full_papers/FP53.pdf)
- [Get started with Reliability Patterns](https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns)