Add content to cloud design patterns

2 years ago · 5c2562dadb
parent e934dc60f4
commit 5c2562dadb
22 changed files with 31 additions and 41 deletions
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/deployment-stamps.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/deployment-stamps.md
@ -1,6 +1,6 @@
 # Deployment Stamps

-Deployment Stamps refers to a technique used to manage the deployment of a system across different environments, such as development, staging, and production. A deployment stamp is a set of environment-specific configurations and settings that are applied to the system during the deployment process. It allows to manage environment-specific configurations, ensure consistency across environments, and simplify the deployment process. It can be implemented in several different ways such as Configuration files, Environment variables and Deployment script.
+The deployment stamp pattern involves provisioning, managing, and monitoring a heterogeneous group of resources to host and operate multiple workloads or tenants. Each individual copy is called a stamp, or sometimes a service unit, scale unit, or cell. In a multi-tenant environment, every stamp or scale unit can serve a predefined number of tenants. Multiple stamps can be deployed to scale the solution almost linearly and serve an increasing number of tenants. This approach can improve the scalability of your solution, allow you to deploy instances across multiple regions, and separate your customer data.

 To learn more visit the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/geodes.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/geodes.md
@ -1,6 +1,6 @@
 # Geodes

-Geodes refers to a technique of partitioning a large dataset into smaller chunks, called geodes, that can be stored and processed more efficiently. Geodes are similar to shards in database partitioning, but the term is often used in the context of distributed systems and data processing. It allows to Scale the system, Improve performance and balance the load. It can be implemented in several different ways such as Hashing and Range-based partitioning.
+The Geode pattern involves deploying a collection of backend services into a set of geographical nodes, each of which can service any request for any client in any region. This pattern allows serving requests in an active-active style, improving latency and increasing availability by distributing request processing around the globe.

 To learn more visit the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/health-endpoint-monitoring.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/health-endpoint-monitoring.md
@ -1,6 +1,6 @@
 # Health Endpoint Monitoring

-Health Endpoint Monitoring refers to a technique for monitoring the health of a system by periodically sending requests to a specific endpoint, called a "health endpoint", on the system. The health endpoint returns a response indicating the current status of the system, such as whether it is running properly or if there are any issues. It allows to Monitor the overall health of the system, Provide insight into the system's performance, and automate the process of monitoring. It can be implemented in several different ways such as Periodic requests and Event-based monitoring.
+Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. This can help to verify that applications and services are performing correctly.

 To learn more visit the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/queue-based-load-leveling.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/queue-based-load-leveling.md
@ -1,6 +1,6 @@
 # Queue-Based load leveling

-Queue-based load leveling refers to a technique for managing the workload of a system by using a queue to buffer incoming requests and process them at a steady pace. By using a queue, the system can handle bursts of incoming requests without being overwhelmed, as well as prevent idle periods where there are not enough requests to keep the system busy. It allows to smooth out bursts of incoming requests, prevent idle periods, Provide a way to prioritize requests, and provide a way to monitor requests. It can be implemented in several different ways such as In-memory queue and Persistent queue.
+Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out. This can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.

 To learn more visit the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/throttling.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/100-availability/throttling.md
@ -1,6 +1,6 @@
 # Throttling

-Throttling refers to a technique for limiting the rate at which requests are processed by a system. This is often used to prevent the system from being overwhelmed by a high volume of requests, or to ensure that resources are used efficiently. Throttling can be applied to incoming requests, outgoing requests or both, and can be implemented at different levels of the system, such as at the network, application, or service level. It allows to prevent system overload, ensure efficient resource usage, provide Quality of Service (QoS) and prevent Denial of Service (DoS). It can be implemented in several different ways such as Rate limiting, Leaking bucket and Token bucket.
+Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.

 To learn more visit the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/bulkhead.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/bulkhead.md
@ -1,6 +1,6 @@
 # Bulkhead

-Bulkhead refers to a technique for isolating different parts of a system to prevent one part from affecting the performance of the whole system. The term "bulkhead" is used to refer to the partitions or walls that are used to separate different parts of the system. It allows to Isolate critical parts of the system, prevent cascading failures and provide isolation for different types of requests. It can be implemented in several different ways such as Thread pools, Circuit breakers, and Workers.
+The Bulkhead pattern is a type of application design that is tolerant of failure. In a bulkhead architecture, elements of an application are isolated into pools so that if one fails, the others will continue to function. It's named after the sectioned partitions (bulkheads) of a ship's hull. If the hull of a ship is compromised, only the damaged section fills with water, which prevents the ship from sinking.

 Learn more from the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/circuit-breaker.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/circuit-breaker.md
@ -1,6 +1,6 @@
 # Circuit Breaker

-Circuit Breaker is a pattern that is used to prevent an application from repeatedly trying to perform an action that is likely to fail. By tripping the circuit breaker when an operation fails a certain number of times, the system can prevent cascading failures, provide fallback behavior, and monitor system health. It can be implemented in several different ways such as State machine, and Hystrix (library for Java).
+Handle faults that might take a variable amount of time to recover from, when connecting to a remote service or resource. This can improve the stability and resiliency of an application.

 Learn more from the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/deployment-stamps.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/deployment-stamps.md
@ -1,6 +1,6 @@
 # Deployment Stamps

-Deployment Stamps refers to a technique used to manage the deployment of a system across different environments, such as development, staging, and production. A deployment stamp is a set of environment-specific configurations and settings that are applied to the system during the deployment process. It allows to manage environment-specific configurations, ensure consistency across environments, and simplify the deployment process. It can be implemented in several different ways such as Configuration files, Environment variables and Deployment script.
+The Bulkhead pattern is a type of application design that is tolerant of failure. In a bulkhead architecture, elements of an application are isolated into pools so that if one fails, the others will continue to function. It's named after the sectioned partitions (bulkheads) of a ship's hull. If the hull of a ship is compromised, only the damaged section fills with water, which prevents the ship from sinking.

 To learn more visit the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/geodes.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/geodes.md
@ -1,8 +1,7 @@
 # Geodes

-Geodes refers to a technique of partitioning a large dataset into smaller chunks, called geodes, that can be stored and processed more efficiently. Geodes are similar to shards in database partitioning, but the term is often used in the context of distributed systems and data processing. It allows to Scale the system, Improve performance and balance the load. It can be implemented in several different ways such as Hashing and Range-based partitioning.
+The Geode pattern involves deploying a collection of backend services into a set of geographical nodes, each of which can service any request for any client in any region. This pattern allows serving requests in an active-active style, improving latency and increasing availability by distributing request processing around the globe.

 To learn more visit the following links:

 - [Geode pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/geodes)
- [Geode Formation, Types & Appearance | What is a Geode?](https://study.com/academy/lesson/geode-formation-types-appearance.ht)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/health-endpoint-monitoring.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/101-high-availability/health-endpoint-monitoring.md
@ -1,8 +1,7 @@
 # Health Endpoint Monitoring

-Health Endpoint Monitoring refers to a technique for monitoring the health of a system by periodically sending requests to a specific endpoint, called a "health endpoint", on the system. The health endpoint returns a response indicating the current status of the system, such as whether it is running properly or if there are any issues. It allows to Monitor the overall health of the system, Provide insight into the system's performance, and automate the process of monitoring. It can be implemented in several different ways such as Periodic requests and Event-based monitoring.
+Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. This can help to verify that applications and services are performing correctly.

 To learn more visit the following links:

 - [Health Endpoint Monitoring pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring)
- [Explaining the health endpoint monitoring pattern](https://www.oreilly.com/library/view/java-ee-8/9781788830621/5012c01e-90ca-4809-a210-d3736574f5b3.xhtml)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/bulkhead.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/bulkhead.md
@ -1,6 +1,6 @@
 # Bulkhead

-Bulkhead refers to a technique for isolating different parts of a system to prevent one part from affecting the performance of the whole system. The term "bulkhead" is used to refer to the partitions or walls that are used to separate different parts of the system. It allows to Isolate critical parts of the system, prevent cascading failures and provide isolation for different types of requests. It can be implemented in several different ways such as Thread pools, Circuit breakers, and Workers.
+The Bulkhead pattern is a type of application design that is tolerant of failure. In a bulkhead architecture, elements of an application are isolated into pools so that if one fails, the others will continue to function. It's named after the sectioned partitions (bulkheads) of a ship's hull. If the hull of a ship is compromised, only the damaged section fills with water, which prevents the ship from sinking.

 Learn more from the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/circuit-breaker.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/circuit-breaker.md
@ -1,6 +1,6 @@
 # Circuit Breaker

-Circuit Breaker is a pattern that is used to prevent an application from repeatedly trying to perform an action that is likely to fail. By tripping the circuit breaker when an operation fails a certain number of times, the system can prevent cascading failures, provide fallback behavior, and monitor system health. It can be implemented in several different ways such as State machine, and Hystrix (library for Java).
+Handle faults that might take a variable amount of time to recover from, when connecting to a remote service or resource. This can improve the stability and resiliency of an application.

 Learn more from the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/compensating-transaction.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/compensating-transaction.md
@ -1,6 +1,6 @@
 # Compensating Transaction

-A Compensating Transaction refers to a mechanism for reversing or undoing the effects of a previously executed transaction in a system. It can be used to ensure that the system remains in a consistent state, even if a subsequent transaction fails or is rolled back. Typically used in systems that implement the principles of ACID transactions, it can be implemented in several different ways such as undo logs, savepoints.
+Undo the work performed by a series of steps, which together define an eventually consistent operation, if one or more of the steps fail. Operations that follow the eventual consistency model are commonly found in cloud-hosted applications that implement complex business processes and workflows.

 Learn more from the following resources:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/health-endpoint-monitoring.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/health-endpoint-monitoring.md
@ -1,8 +1,7 @@
 # Health Endpoint Monitoring

-Health Endpoint Monitoring refers to a technique for monitoring the health of a system by periodically sending requests to a specific endpoint, called a "health endpoint", on the system. The health endpoint returns a response indicating the current status of the system, such as whether it is running properly or if there are any issues. It allows to Monitor the overall health of the system, Provide insight into the system's performance, and automate the process of monitoring. It can be implemented in several different ways such as Periodic requests and Event-based monitoring.
+Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. This can help to verify that applications and services are performing correctly.

 To learn more visit the following links:

 - [Health Endpoint Monitoring pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring)
- [Explaining the health endpoint monitoring pattern](https://www.oreilly.com/library/view/java-ee-8/9781788830621/5012c01e-90ca-4809-a210-d3736574f5b3.xhtml)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/leader-election.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/leader-election.md
@ -1,8 +1,7 @@
 # Leader Election

-Leader Election is a pattern that is used to elect a leader among a group of distributed nodes in a system. The leader is responsible for coordinating the activities of the other nodes and making decisions on behalf of the group. Leader Election is important in distributed systems, as it ensures that there is a single point of coordination and decision-making, reducing the risk of conflicting actions or duplicate work. Leader Election can be used to ensure a single point of coordination, provide fault tolerance, and scalability. There are several algorithms such as Raft, Paxos, and Zab that can be used to implement Leader Election in distributed systems.
+Coordinate the actions performed by a collection of collaborating instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the others. This can help to ensure that instances don't conflict with each other, cause contention for shared resources, or inadvertently interfere with the work that other instances are performing.

 To learn more, visit the following links:

- [Overview of Leader Election](https://aws.amazon.com/builders-library/leader-election-in-distributed-systems/)
- [What is Leader Election in system design?](https://www.enjoyalgorithms.com/blog/leader-election-system-design)
+- [Leader Election Pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/leader-election)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/queue-based-load-leveling.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/queue-based-load-leveling.md
@ -1,6 +1,6 @@
 # Queue-Based load leveling

-Queue-based load leveling refers to a technique for managing the workload of a system by using a queue to buffer incoming requests and process them at a steady pace. By using a queue, the system can handle bursts of incoming requests without being overwhelmed, as well as prevent idle periods where there are not enough requests to keep the system busy. It allows to smooth out bursts of incoming requests, prevent idle periods, Provide a way to prioritize requests, and provide a way to monitor requests. It can be implemented in several different ways such as In-memory queue and Persistent queue.
+Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out. This can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.

 To learn more visit the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/retry.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/retry.md
@ -1,8 +1,7 @@
 # Retry

-Retry refers to the process of automatically re-executing a failed operation in the hopes of getting a successful outcome. Retries are used to handle transient failures such as network errors, temporary unavailability of a service, or other issues that may be resolved quickly. Retries can be an effective way of dealing with these types of failures, as they can help to ensure that the system continues to function, even in the face of temporary disruptions.
+Enable an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation. This can improve the stability of the application.

 Learn more from the following resources:

- [Introducing Retry](https://engineering.grab.com/designing-resilient-systems-part-2)
 - [Retry pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/retry)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/scheduler-agent-supervisor.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/102-resiliency/scheduler-agent-supervisor.md
@ -1,6 +1,6 @@
 # Scheduling Agent Supervisor

-Scheduling Agent Supervisor is a pattern that allows for the scheduling and coordination of tasks or processes by a central entity, known as the Scheduling Agent. The Scheduling Agent is responsible for scheduling tasks, monitoring their execution, and handling errors or failures. This pattern can be used to build robust and fault-tolerant systems, by ensuring that tasks are executed as intended and that any errors or failures are handled appropriately.
+Coordinate a set of distributed actions as a single operation. If any of the actions fail, try to handle the failures transparently, or else undo the work that was performed, so the entire operation succeeds or fails as a whole. This can add resiliency to a distributed system, by enabling it to recover and retry actions that fail due to transient exceptions, long-lasting faults, and process failures.

 Learn more from the following links:

--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/federated-identity.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/federated-identity.md
@ -1,7 +1,7 @@
-# Federated Identity
+# Federated Identity pattern

+Delegate authentication to an external identity provider. This can simplify development, minimize the requirement for user administration, and improve the user experience of the application.

 To learn more, visit the following links:

- []()
- []()
+- [Federated Identity pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/federated-identity)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/gatekeeper.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/gatekeeper.md
@ -1,8 +1,7 @@
 # Gatekeeper

-A Gatekeeper is a pattern that is used to control access to a system or service. It acts as a central point of control and decision-making for requests to the system or service and is responsible for enforcing policies, rules, and constraints that are used to govern access. It can be implemented in different ways such as API Gateway, Service Mesh, and Load Balancer. It can be useful for Authentication and Authorization, Traffic Management, and Observability.
+Protect applications and services using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. This can provide an additional layer of security and limit the system's attack surface.

 Learn more from the following resources:

 - [Gatekeeper pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/gatekeeper)
- [Overview of Gatekeeper](https://www.techtarget.com/searchunifiedcommunications/definition/gatekeeper)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/index.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/index.md
@ -1,8 +1,7 @@
 # Security

-Security refers to the measures and techniques that are used to protect a system from unauthorized access, use, disclosure, disruption, modification, or destruction. It involves identifying and mitigating the risks that the system faces and implementing controls to prevent and detect security incidents. Security can be divided into several areas such as Access Control, Data security, Network security, and Application security. It can be achieved by implementing best practices and standards such as Defense in depth, Least privilege, Separation of duties, and Monitoring and incident response.
+Security provides confidentiality, integrity, and availability assurances against malicious attacks on information systems (and safety assurances for attacks on operational technology systems). Losing these assurances can negatively impact your business operations and revenue, as well as your organization's reputation in the marketplace. Maintaining security requires following well-established practices (security hygiene) and being vigilant to detect and rapidly remediate vulnerabilities and active attacks.

 Learn more from the following links:

- [Security in Infrastructure System Design](https://medium.com/cermati-tech/security-in-software-development-and-infrastructure-system-design-7b675c2323fc)
- [Security By Design: What Is It and How to Do It Right?](https://www.spiceworks.com/it-security/cyber-risk-management/articles/what-is-security-by-design/)
+- [Security patterns](https://learn.microsoft.com/en-us/azure/architecture/framework/security/security-patterns)
--- a/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/valet-key.md
+++ b/src/roadmaps/system-design/content/118-cloud-design-patterns/103-reliability-patterns/103-security/valet-key.md
@ -1,10 +1,7 @@
 # Valet Key

-A valet key is a type of security feature that allows a user to grant limited access to a resource. It is commonly used in the automotive industry, where a valet key is used to allow a valet parking attendant to drive and park a car, but not to access the trunk or the glove compartment of the car.
-
-In system design, a valet key can be used as a security feature to allow a user to grant limited access to a resource, such as a file or a service, to a third party. The third party can access the resource, but only with the limited permissions that have been granted by the valet key.
+Use a token that provides clients with restricted direct access to a specific resource, in order to offload data transfer from the application. This is particularly useful in applications that use cloud-hosted storage systems or queues, and can minimize cost and maximize scalability and performance.

 Learn more from the following links:

 - [Valet Key pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/valet-key)
- [Explanation of Valet Key](https://www.youtube.com/watch?v=sapu2CE1W8s)