Add content to system design roadmap

pull/3331/head
Kamran Ahmed 2 years ago
parent ca35551e4f
commit 59ed243fa7
  1. 12
      src/roadmaps/system-design/content/107-domain-name-system.md
  2. 3
      src/roadmaps/system-design/content/108-content-delivery-networks/100-push-cdns.md
  3. 3
      src/roadmaps/system-design/content/108-content-delivery-networks/101-pull-cdns.md
  4. 19
      src/roadmaps/system-design/content/108-content-delivery-networks/index.md
  5. 6
      src/roadmaps/system-design/content/109-load-balancers/100-horizontal-scaling.md
  6. 5
      src/roadmaps/system-design/content/109-load-balancers/101-layer-4-load-balancing.md
  7. 5
      src/roadmaps/system-design/content/109-load-balancers/102-layer-7-load-balancing.md
  8. 7
      src/roadmaps/system-design/content/109-load-balancers/103-load-balancing-algorithms.md
  9. 9
      src/roadmaps/system-design/content/109-load-balancers/104-lb-vs-reverse-proxy.md
  10. 20
      src/roadmaps/system-design/content/109-load-balancers/index.md
  11. 6
      src/roadmaps/system-design/content/110-application-layer/100-microservices.md
  12. 2
      src/roadmaps/system-design/content/110-application-layer/101-service-discovery.md
  13. 9
      src/roadmaps/system-design/content/110-application-layer/index.md
  14. 12
      src/roadmaps/system-design/content/111-databases/100-rdbms/100-replication.md
  15. 4
      src/roadmaps/system-design/content/111-databases/100-rdbms/102-federation.md
  16. 1
      src/roadmaps/system-design/content/111-databases/100-rdbms/103-denormalization.md
  17. 2
      src/roadmaps/system-design/content/111-databases/100-rdbms/104-sql-tuning.md
  18. 5
      src/roadmaps/system-design/content/111-databases/100-rdbms/index.md
  19. 2
      src/roadmaps/system-design/content/111-databases/101-nosql/100-key-value-store.md
  20. 1
      src/roadmaps/system-design/content/111-databases/101-nosql/101-document-store.md
  21. 1
      src/roadmaps/system-design/content/111-databases/101-nosql/102-wide-column-store.md
  22. 3
      src/roadmaps/system-design/content/111-databases/101-nosql/index.md
  23. 28
      src/roadmaps/system-design/content/111-databases/102-sql-vs-nosql.md
  24. 18
      src/roadmaps/system-design/content/111-databases/index.md

@ -9,8 +9,16 @@ DNS is hierarchical, with a few authoritative servers at the top level. Your rou
- A record (address) - Points a name to an IP address.
- CNAME (canonical) - Points a name to another name or CNAME (example.com to www.example.com) or to an A record.
Services such as [CloudFlare](https://www.cloudflare.com/dns/) and [Route53](https://aws.amazon.com/route53/) provide managed DNS services. Some DNS services can route traffic through various methods:
- [Weighted Round Robin](https://www.jscape.com/blog/load-balancing-algorithms)
- Prevent traffic from going to servers under maintenance
- Balance between varying cluster sizes
- A/B testing
- [Latency Based](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-latency)
- [Geolocation Based](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-geo)
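As a rough illustration of the weighted round robin idea listed above, here is a minimal Python sketch. The server names and weights are hypothetical, and managed DNS services implement weighting in their own ways; this simply expands each name by its weight and cycles through the result.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator over server names, proportional to weight.

    `weights` maps a (hypothetical) server name to a non-negative integer.
    A weight of 0 keeps a server out of rotation, e.g. during maintenance.
    """
    # Repeat each name `weight` times, then loop over that list forever.
    rotation = [name for name, weight in weights.items() for _ in range(weight)]
    if not rotation:
        raise ValueError("at least one server needs a positive weight")
    return itertools.cycle(rotation)


picker = weighted_round_robin(
    {"large-cluster": 3, "small-cluster": 1, "under-maintenance": 0}
)
first_eight = [next(picker) for _ in range(8)]
```

In this sketch `large-cluster` receives three of every four picks and `under-maintenance` receives none, which mirrors the "varying cluster sizes" and "servers under maintenance" cases in the list.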
To learn more, visit the following links:
- [Getting started with Domain Name System](https://github.com/donnemartin/system-design-primer#domain-name-system)
- [Intro to DNS Architecture](https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/dd197427(v=ws.10)?redirectedfrom=MSDNs)
- [DNS articles](https://support.dnsimple.com/categories/dns/)
[What is DNS?](https://www.cloudflare.com/learning/dns/what-is-dns/)

@ -6,5 +6,4 @@ Sites with a small amount of traffic or sites with content that isn't often upda
To learn more, visit the following links:
- [Introduction on Push CDNs](https://github.com/donnemartin/system-design-primer#content-delivery-network)
- [Why use a CDN?](https://dev.to/karanpratapsingh/system-design-content-delivery-network-cdn-bof)
- [Introduction to CDNs](https://github.com/donnemartin/system-design-primer#content-delivery-network)

@ -6,6 +6,5 @@ A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize s
To learn more, visit the following links:
- [Introduction to CDNs](https://github.com/donnemartin/system-design-primer#content-delivery-network)
- [The Differences Between Push And Pull CDNs](http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/)
- [Brief about Content delivery network](https://en.wikipedia.org/wiki/Content_delivery_network)
- [What is Globally distributed content delivery?](https://figshare.com/articles/journal_contribution/Globally_distributed_content_delivery/6605972)

@ -7,21 +7,8 @@ Serving content from CDNs can significantly improve performance in two ways:
- Users receive content from data centers close to them
- Your servers do not have to serve requests that the CDN fulfills
## Push CDNs
Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.
Learn more about CDNs from the following links:
Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.
## Pull CDNs
Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.
A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed. Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.
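The pull behaviour and TTL described above can be sketched as a small cache in front of an origin. This is an illustrative model only, not how any particular CDN is implemented; `PullCdnEdge`, `fetch_from_origin`, and the injected `clock` are hypothetical names.

```python
class PullCdnEdge:
    """Toy pull-CDN edge node: fetches from the origin on a miss or after expiry."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock):
        self._fetch = fetch_from_origin   # callable: path -> content
        self._ttl = ttl_seconds
        self._clock = clock               # callable returning the current time in seconds
        self._cache = {}                  # path -> (content, expires_at)
        self.origin_hits = 0              # how often the origin had to serve a request

    def get(self, path):
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]               # still fresh: served from the edge
        # First request, or the TTL has expired: pull from the origin (slower).
        self.origin_hits += 1
        content = self._fetch(path)
        self._cache[path] = (content, now + self._ttl)
        return content


fake_now = [0.0]  # mutable stand-in for a clock, so the example is deterministic
edge = PullCdnEdge(lambda path: f"content of {path}", ttl_seconds=60,
                   clock=lambda: fake_now[0])
edge.get("/logo.png")   # miss: pulled from the origin
edge.get("/logo.png")   # hit: served from the edge cache
fake_now[0] = 61.0
edge.get("/logo.png")   # TTL expired: pulled again even though nothing changed
```

The third call shows the redundant traffic mentioned above: the file is re-pulled after expiry even though the origin copy is unchanged.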
## Disadvantages of CDN
- CDN costs could be significant depending on traffic, although this should be weighed against the additional costs you would incur by not using a CDN.
- Content might be stale if it is updated before the TTL expires.
- CDNs require changing URLs for static content to point to the CDN.
- [The Differences Between Push And Pull CDNss](http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/)
- [Introduction to CDNs](https://github.com/donnemartin/system-design-primer#content-delivery-network)
- [The Differences Between Push And Pull CDNs](http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/)
- [Brief about Content delivery network](https://en.wikipedia.org/wiki/Content_delivery_network)
- [What is Globally distributed content delivery?](https://figshare.com/articles/journal_contribution/Globally_distributed_content_delivery/6605972)

@ -7,9 +7,3 @@ Load balancers can also help with horizontal scaling, improving performance and
- Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
- Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached)
- Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out.
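To make the "stateless servers" point above concrete, here is a minimal sketch in which two application servers share one session store. `SessionStore` is a hypothetical in-memory stand-in for a centralized store such as a database or Redis; it does not use any real client library.

```python
class SessionStore:
    """In-memory stand-in for a centralized session store (assumption)."""

    def __init__(self):
        self._sessions = {}

    def put(self, session_id, data):
        self._sessions[session_id] = data

    def get(self, session_id):
        return self._sessions.get(session_id)


class AppServer:
    """Stateless server: keeps no user data itself, only a handle to the store."""

    def __init__(self, name, sessions):
        self.name = name
        self._sessions = sessions

    def login(self, session_id, user):
        self._sessions.put(session_id, {"user": user})

    def whoami(self, session_id):
        session = self._sessions.get(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)
server_a.login("session-123", "alice")
user_seen_by_b = server_b.whoami("session-123")
```

Because neither server holds the session itself, a load balancer can send the follow-up request to either one.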
To learn more, visit the following links:
- [Introduction to Horizontal Scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
- [System Design – Horizontal and Vertical Scaling](https://www.geeksforgeeks.org/system-design-horizontal-and-vertical-scaling/)
- [Getting started with Horizontal and Vertical Scaling](https://www.codingninjas.com/blog/2021/08/25/system-design-horizontal-and-vertical-scaling/)

@ -1,8 +1,3 @@
# Layer 4 Load Balancing
Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).
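One way to picture a layer 4 decision is a function that sees only addresses and ports, never the payload. The sketch below assumes a hash-based selection, which is a common approach but is not stated in the text above; the function name and addresses are hypothetical, and NAT itself is not modelled.

```python
import hashlib


def pick_backend_l4(src_ip, src_port, dst_ip, dst_port, backends):
    """Choose a backend from transport-layer fields only (no packet contents)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # The same connection tuple always maps to the same backend.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
chosen = pick_backend_l4("203.0.113.7", 51000, "198.51.100.1", 443, backends)
```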
To learn more, visit the following links:
- [What is Layer 4 Load Balancing?](https://github.com/donnemartin/system-design-primer#communication)
- [Getting Started with Layer 4 Load Balancing](https://www.nginx.com/resources/glossary/layer-4-load-balancing/)

@ -3,8 +3,3 @@
Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.
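The video and billing example above could look roughly like the following routing function. The path prefixes, header check, and pool names are hypothetical; a real layer 7 load balancer would express this in its own configuration language.

```python
def pick_pool_l7(path, headers):
    """Choose a server pool from application-layer data (path and headers)."""
    if path.startswith("/billing"):
        return "security-hardened-pool"
    if path.startswith("/videos") or headers.get("Accept", "").startswith("video/"):
        return "video-pool"
    return "default-pool"


billing_pool = pick_pool_l7("/billing/invoices", {})
video_pool = pick_pool_l7("/watch", {"Accept": "video/mp4"})
other_pool = pick_pool_l7("/", {"Accept": "text/html"})
```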
Learn more from the following links:
- [Introduction to Layer 7 Load Balancing](https://github.com/donnemartin/system-design-primer#layer-7-load-balancing)
- [A Brief of Layer 7 Balancing](https://github.com/donnemartin/system-design-primer#communication)

@ -1,8 +1,9 @@
# Load Balancing Algorithms
Load balancing is the process of distributing incoming network traffic across multiple servers in order to optimize resource usage, minimize response time, and avoid overloading any single server. There are several algorithms that can be used to achieve this, each with its own advantages and disadvantages.
A load balancer is a software or hardware device that keeps any one server from becoming overloaded. A load balancing algorithm is the logic that a load balancer uses to distribute network traffic between servers (an algorithm is a set of predefined rules).
There are two primary approaches to load balancing. Dynamic load balancing uses algorithms that take into account the current state of each server and distribute traffic accordingly. Static load balancing distributes traffic without making these adjustments. Some static algorithms send an equal amount of traffic to each server in a group, either in a specified order or at random.
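The static and dynamic approaches above can be contrasted with two small pickers. This is a sketch with hypothetical class names: round robin ignores server state, while least connections tracks how many requests each server is currently handling.

```python
import itertools


class RoundRobin:
    """Static: hands out servers in a fixed order, regardless of their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Dynamic: picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count reflects current state.
        self.active[server] -= 1


static = RoundRobin(["a", "b"])
static_picks = [static.pick() for _ in range(4)]

dynamic = LeastConnections(["a", "b"])
first, second = dynamic.pick(), dynamic.pick()
dynamic.release("a")          # "a" finishes its request early
third = dynamic.pick()        # so "a" is the least loaded again
```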
To learn more, visit the following links:
- [Concept of load balancing algorithms](https://www.enjoyalgorithms.com/blog/load-balancers-in-system-design)
- [Types of load balancing algorithms](https://www.cloudflare.com/learning/performance/types-of-load-balancing-algorithms/)
- [Types of Load Balancing Algorithms](https://www.cloudflare.com/learning/performance/types-of-load-balancing-algorithms/)

@ -4,11 +4,14 @@
- Reverse proxies can be useful even with just one web server or application server, opening up the benefits described in the previous section.
- Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.
## Disadvantages of reverse proxy:
## Disadvantages of Reverse Proxy:
- Introducing a reverse proxy results in increased complexity.
- A single reverse proxy is a single point of failure, configuring multiple reverse proxies (ie a failover) further increases complexity
- A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e. a failover) further increases complexity.
To learn more, visit the following links:
- [What is a Reverse Proxy vs. Load Balancer?](https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/)
- [Reverse Proxy vs Load Balancer](https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/)
- [NGINX Architecture](https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/)
- [HAProxy Architecture Guide](http://www.haproxy.org/download/1.2/doc/architecture.txt)
- [Reverse Proxy](https://en.wikipedia.org/wiki/Reverse_proxy)

@ -12,20 +12,14 @@ Load balancers can be implemented with hardware (expensive) or with software suc
- Removes the need to install X.509 certificates on each server
- **Session persistence** - Issue cookies and route a specific client's requests to the same instance if the web apps do not keep track of sessions
To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode. Load balancers can route traffic based on various metrics, including:
## Layer 4 load balancing
Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).
## Layer 7 load balancing
Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
## Disadvantages of load balancer
The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
A single load balancer is a single point of failure, configuring multiple load balancers further increases complexity.
- The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
- Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
- A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.
To learn more, visit the following links:
- [What is Load balancing (computing)?](https://en.wikipedia.org/wiki/Load_balancing_(computing))
- [Introduction to Load Balancing](https://github.com/donnemartin/system-design-primer#layer-7-load-balancing)
- [NGINX Architecture](https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/)
- [HAProxy Architecture Guide](http://www.haproxy.org/download/1.2/doc/architecture.txt)
- [Scalability](http://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones)

@ -1,10 +1,10 @@
# Microservices
Related to this discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.
Related to the "Application Layer" discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.
Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.
To learn more, visit the following links:
- [Intro to Microservice](https://github.com/donnemartin/system-design-primer#microservices)
- [Building Microservices](https://cloudncode.wordpress.com/2016/07/22/msa-getting-started/)
- [Introduction to Microservices](https://aws.amazon.com/microservices/)
- [Microservices - Wikipedia](https://en.wikipedia.org/wiki/Microservices)

@ -1,6 +1,6 @@
# Service Discovery
Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built-in key-value store that can be useful for storing config values and other shared data.
Systems such as [Consul](https://www.consul.io/docs/index.html), [Etcd](https://coreos.com/etcd/docs/latest), and [Zookeeper](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) can help services find each other by keeping track of registered names, addresses, and ports. [Health checks](https://www.consul.io/intro/getting-started/checks.html) help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built-in key-value store that can be useful for storing config values and other shared data.
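The registration and health-check idea can be sketched with a tiny in-memory registry. This is not the API of Consul, Etcd, or Zookeeper; `ServiceRegistry` and its methods are hypothetical, and the health check is a plain callable standing in for an HTTP endpoint probe.

```python
class ServiceRegistry:
    """Toy registry: tracks name -> instances and filters out unhealthy ones."""

    def __init__(self):
        self._instances = {}  # service name -> list of (host, port, health_check)

    def register(self, name, host, port, health_check):
        self._instances.setdefault(name, []).append((host, port, health_check))

    def lookup(self, name):
        # Return only the instances whose health check currently passes.
        return [
            (host, port)
            for host, port, health_check in self._instances.get(name, [])
            if health_check()
        ]


registry = ServiceRegistry()
registry.register("search", "10.0.1.5", 8080, health_check=lambda: True)
registry.register("search", "10.0.1.6", 8080, health_check=lambda: False)
healthy_search = registry.lookup("search")
```

In this sketch a caller asking for `search` receives only the instance whose check passes.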
Visit the following links to learn more:

@ -2,6 +2,13 @@
Separating out the web layer from the application layer (also known as platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.
![](https://i.imgur.com/F0cjurv.png)
## Disadvantages
- Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
- Microservices can add complexity in terms of deployments and operations.
For more resources, visit the following links:
- [Getting started with Application Layer](https://github.com/donnemartin/system-design-primer#Application%20layer)
- [Intro to architecting systems for scale](http://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer)

@ -1,11 +1,11 @@
# Replication
## Master-slave replication:
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
Replication is the process of copying data from one database to another. Replication is used to increase availability and scalability of databases. There are two types of replication: master-slave and master-master.
## Master-master replication:
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
## Master-slave Replication:
To learn more, visit the following links:
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
- [Getting started with Replication](https://github.com/donnemartin/system-design-primer#replication)
## Master-master Replication:
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
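The master-slave flow described above can be sketched as follows. This is a simplified model with hypothetical class names: replication here is synchronous and in-process, whereas real databases typically replicate over the network and may lag.

```python
class Replica:
    """A slave: applies replicated writes and serves reads only."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        # Called by the master during replication, not by clients.
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Master(Replica):
    """Serves reads and writes, replicating each write to its slaves."""

    def __init__(self, slaves):
        super().__init__()
        self.slaves = slaves

    def write(self, key, value):
        self.apply(key, value)
        for slave in self.slaves:
            slave.apply(key, value)


slave = Replica()
master = Master([slave])
master.write("user:1", "Ada")
value_from_slave = slave.read("user:1")
```

If `master` were lost, `slave` could still answer reads, which matches the read-only mode described above.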

@ -1,7 +1,3 @@
# Federation
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
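At the application level, federation usually shows up as a routing step that picks a database by function. A minimal sketch, assuming hypothetical table and database names based on the forums, users, and products example:

```python
# Hypothetical mapping from table to the database that owns that function.
TABLE_TO_DATABASE = {
    "threads": "forums_db",
    "posts": "forums_db",
    "accounts": "users_db",
    "catalog": "products_db",
}


def database_for(table):
    """Return the name of the database responsible for `table`."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise LookupError(f"no database is responsible for table {table!r}") from None


accounts_db = database_for("accounts")
posts_db = database_for("posts")
```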
Learn more from the following links:
- [Intro to Federation](https://github.com/donnemartin/system-design-primer#federation)

@ -6,5 +6,4 @@ Once data becomes distributed with techniques such as federation and sharding, m
To learn more, visit the following links:
- [Guide to Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
- [Denormalization](https://en.wikipedia.org/wiki/Denormalization)

@ -9,5 +9,5 @@ Benchmarking and profiling might point you to the following optimizations.
To learn more, visit the following links:
- [What is SQL Tuning?](https://github.com/donnemartin/system-design-primer#sql-tuning)
- [Optimizing MySQL Queries](https://aiddroid.com/10-tips-optimizing-mysql-queries-dont-suck/)
- [How we optimized PostgreSQL queries 100x](https://towardsdatascience.com/how-we-optimized-postgresql-queries-100x-ff52555eabe?gi=13caf5bcf32e)

@ -11,4 +11,7 @@ There are many techniques to scale a relational database: master-slave replicati
To learn more, visit the following links:
- [Guide to RDBMS?](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
- [Is there a good reason I see VARCHAR(255) used so often?](https://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l)
- [How we optimized PostgreSQL queries 100x](https://towardsdatascience.com/how-we-optimized-postgresql-queries-100x-ff52555eabe?gi=13caf5bcf32e)
- [How do NULL values affect performance in a database search?](https://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search)
- [Slow Query Log](https://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html)

@ -1,6 +1,6 @@
# Key Value Store
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
A key-value store generally allows for `O(1)` reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
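The point about lexicographically ordered keys can be illustrated with a small in-memory store. This is a sketch with a hypothetical class name; lookups go through a dict, while a sorted key list supports range scans. Note that inserting a new key into the sorted list is not constant time in this toy version.

```python
import bisect


class SortedKeyValueStore:
    """Toy key-value store that also keeps keys in lexicographic order."""

    def __init__(self):
        self._keys = []     # sorted list of keys, used for range scans
        self._values = {}   # key -> value, used for point lookups

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(key, self._values[key]) for key in self._keys[lo:hi]]


store = SortedKeyValueStore()
store.put("user:2", "Grace")
store.put("order:9", "pending")
store.put("user:1", "Ada")
users = store.range("user:", "user;")  # ";" sorts just after ":" in ASCII
```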

@ -6,5 +6,4 @@ Based on the underlying implementation, documents are organized by collections,
To learn more, visit the following links:
- [Getting started with Document Store](https://github.com/donnemartin/system-design-primer#document-store)
- [Document-oriented database](https://en.wikipedia.org/wiki/Document-oriented_database)

@ -6,5 +6,4 @@ Google introduced Bigtable as the first wide column store, which influenced the
Learn more from the following links:
- [A brief of Wide Column Store](https://github.com/donnemartin/system-design-primer#Wide%20column%20store)
- [Bigtable architecture](https://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf)

@ -10,6 +10,5 @@ BASE is often used to describe the properties of NoSQL databases. In comparison
Learn more from the following links:
- [SQL or noSQL?](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
- [Brief of noSQL Patterns](http://horicky.blogspot.com/2009/11/nosql-patterns.html)
- [Brief of NOSQL Patterns](http://horicky.blogspot.com/2009/11/nosql-patterns.html)
- [Introduction to NoSQL](https://www.youtube.com/watch?v=qI_g07C_Q5I)

@ -1,32 +1,12 @@
# SQL vs noSQL
## Reasons for SQL:
- Structured data
- Strict schema
- Relational data
- Need for complex joins
- Transactions
- Clear patterns for scaling
- More established: developers, community, code, tools, etc
- Lookups by index are very fast
SQL databases, such as MySQL and PostgreSQL, are best suited for structured, relational data and use a fixed schema. They provide robust ACID (Atomicity, Consistency, Isolation, Durability) transactions and support complex queries and joins.
NoSQL databases, such as MongoDB and Cassandra, are best suited for unstructured, non-relational data and use a flexible schema. They provide high scalability and performance for large amounts of data and are often used in big data and real-time web applications.
## Reasons for NoSQL:
- Semi-structured data
- Dynamic or flexible schema
- Non-relational data
- No need for complex joins
- Store many TB (or PB) of data
- Very data intensive workload
- Very high throughput for IOPS
The choice between SQL and NoSQL depends on the specific use case and requirements of the project. If you need to store and query structured data with complex relationships, an SQL database is likely a better choice. If you need to store and query large amounts of unstructured data with high scalability and performance, a NoSQL database may be a better choice.
## Sample data well-suited for NoSQL:
- Rapid ingest of clickstream and log data
- Leaderboard or scoring data
- Temporary data, such as a shopping cart
- Frequently accessed ('hot') tables
- Metadata/lookup tables
Learn more from the following links:
- [SQL vs NoSQL: The Differences](https://www.sitepoint.com/sql-vs-nosql-differences/)
- [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
- [SQL vs NoSQL - When to Use Each](https://www.ibm.com/cloud/blog/sql-vs-nosql)

@ -1,15 +1,13 @@
# Databases
A database is a collection of data that is organized and stored in a structured way, allowing for efficient retrieval and manipulation of the data. Databases are used in many different types of systems to store and manage data, from small personal applications to large enterprise systems.
Picking the right database for a system is an important decision, as it can have a significant impact on the performance, scalability, and overall success of the system. Some of the key reasons why it's important to pick the right database include:
There are many different types of databases available, each with their own strengths and weaknesses. Some of the most common types of databases are:
- Performance: Different databases have different performance characteristics, and choosing the wrong one can lead to poor performance and slow response times.
- Scalability: As the system grows and the volume of data increases, the database needs to be able to scale accordingly. Some databases are better suited for handling large amounts of data than others.
- Data Modeling: Different databases have different data modeling capabilities and choosing the right one can help to keep the data consistent and organized.
- Data Integrity: Different databases have different capabilities for maintaining data integrity, such as enforcing constraints, and can have different levels of data security.
- Support and maintenance: Some databases have more active communities and better documentation, making it easier to find help and resources.
- Relational databases
- NoSQL databases
- Graph databases
- Time-series databases
Overall, by choosing the right database, you can ensure that your system will perform well, scale as needed, and be maintainable in the long run.
Learn more from the following links:
- [Intro to Databases](https://github.com/donnemartin/system-design-primer#database)
- [Database design](https://en.wikipedia.org/wiki/Database_design)
- [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
