diff --git a/src/roadmaps/system-design/content/117-monitoring/100-health-monitoring.md b/src/roadmaps/system-design/content/117-monitoring/100-health-monitoring.md
index fb0861f4c..7b65f0092 100644
--- a/src/roadmaps/system-design/content/117-monitoring/100-health-monitoring.md
+++ b/src/roadmaps/system-design/content/117-monitoring/100-health-monitoring.md
@@ -1,7 +1,7 @@
 # Health Monitoring

-A health monitoring system is a system that is designed to collect, store, and analyze health-related data from a variety of sources, such as wearable devices, medical devices, and electronic health records. The goal of a health monitoring system is to provide healthcare professionals and individuals with real-time insights into their health, allowing them to make informed decisions about their care.
+A system is healthy if it is running and capable of processing requests. The purpose of health monitoring is to generate a snapshot of the current health of the system so that you can verify that all components of the system are functioning as expected.

 Learn more from the following:

-- [Design of Wearable Health Monitoring Systems](https://link.springer.com/chapter/10.1007/978-3-319-23341-3_6)
\ No newline at end of file
+- [Health Monitoring of a System](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#health-monitoring)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/117-monitoring/101-availability-monitoring.md b/src/roadmaps/system-design/content/117-monitoring/101-availability-monitoring.md
index 2c59527e2..f22cded72 100644
--- a/src/roadmaps/system-design/content/117-monitoring/101-availability-monitoring.md
+++ b/src/roadmaps/system-design/content/117-monitoring/101-availability-monitoring.md
@@ -1,14 +1,7 @@
 # Availability Monitoring

-Availability monitoring in system design refers to the practice of monitoring the availability of a system, service or application, to ensure that it is functioning correctly and is accessible to users when they need it. This is an important aspect of ensuring that a system is reliable and performs well.
-
-Availability monitoring typically includes the following components:
-
-- Heartbeat monitoring
-- Transaction monitoring
-- Alerts and notifications
-- Root cause analysis
+A truly healthy system requires that the components and subsystems that compose the system are available. Availability monitoring is closely related to health monitoring. But whereas health monitoring provides an immediate view of the current health of the system, availability monitoring is concerned with tracking the availability of the system and its components to generate statistics about the uptime of the system.

 Learn more from the following:

-- [System Monitoring, Alerting and Availability](https://www.aits.uillinois.edu/services/network_and_desktop_services/system_monitoring__alerting_and_availability)
\ No newline at end of file
+- [Availability Monitoring](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#availability-monitoring)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/117-monitoring/102-performance-monitoring.md b/src/roadmaps/system-design/content/117-monitoring/102-performance-monitoring.md
index a0c7bae70..fda39f065 100644
--- a/src/roadmaps/system-design/content/117-monitoring/102-performance-monitoring.md
+++ b/src/roadmaps/system-design/content/117-monitoring/102-performance-monitoring.md
@@ -1,7 +1,7 @@
 # Performance Monitoring

-Performance monitoring in system design refers to the practice of monitoring the performance of a system, service, or application, in order to ensure that it is performing well and meeting the needs of users. This is an important aspect of ensuring that a system is reliable and performs well.
+As the system is placed under more stress (for example, by an increasing volume of users), the datasets that these users access grow and failure of one or more components becomes more likely. Component failure is frequently preceded by a decrease in performance. If you're able to detect such a decrease, you can take proactive steps to remedy the situation.

 Learn more from following links:

-- [Get More on Performance Monitoring Systems](https://www.solarwinds.com/server-application-monitor/use-cases/performance-monitoring-system)
\ No newline at end of file
+- [Performance Monitoring](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#performance-monitoring)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/117-monitoring/103-security-monitoring.md b/src/roadmaps/system-design/content/117-monitoring/103-security-monitoring.md
index 0f864a454..c9f40661a 100644
--- a/src/roadmaps/system-design/content/117-monitoring/103-security-monitoring.md
+++ b/src/roadmaps/system-design/content/117-monitoring/103-security-monitoring.md
@@ -1,15 +1,13 @@
 # Security Monitoring

-Security monitoring in system design refers to the practice of monitoring the security of a system, service, or application, in order to detect and respond to security threats and vulnerabilities. This is an important aspect of ensuring that a system is secure and protected against unauthorized access, data breaches, and other security incidents.
+All commercial systems that include sensitive data must implement a security structure. The complexity of the security mechanism is usually a function of the sensitivity of the data. In a system that requires users to be authenticated, you should record:

-Security monitoring typically includes the following components:
+- All sign-in attempts, whether they fail or succeed.
+- All operations performed by an authenticated user, and the details of all resources that the user accesses.
+- When a user ends a session and signs out.

-- Event collection
-- Event analysis and correlation
-- Alerts and notifications
-- Incident response
-- Compliance and audit
+Monitoring might help detect attacks on the system. For example, a large number of failed sign-in attempts might indicate a brute-force attack, and an unexpected surge in requests might be the result of a distributed denial-of-service (DDoS) attack. You must be prepared to monitor all requests to all resources regardless of their source, because a system that has a sign-in vulnerability might accidentally expose resources to the outside world without requiring a user to actually sign in.

 Visit the following to learn more:

-- [Intro to Security Monitoring](https://www.sciencedirect.com/topics/computer-science/security-monitoring)
\ No newline at end of file
+- [Security Monitoring](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#security-monitoring)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/117-monitoring/104-usage-monitoring.md b/src/roadmaps/system-design/content/117-monitoring/104-usage-monitoring.md
index 6d0a7f161..4e6b79b6b 100644
--- a/src/roadmaps/system-design/content/117-monitoring/104-usage-monitoring.md
+++ b/src/roadmaps/system-design/content/117-monitoring/104-usage-monitoring.md
@@ -1,14 +1,13 @@
 # Usage Monitoring

-Usage monitoring in system design refers to the practice of monitoring the usage of a system, service, or application, in order to understand how it is being used and identify any potential issues or areas for improvement. This is an important aspect of ensuring that a system is meeting the needs of users and providing value.
+Usage monitoring tracks how the features and components of an application are used. An operator can use the gathered data to:

-Usage monitoring typically includes the following components:
-
-- Data collection
-- Data analysis and visualization
-- Alerts and notifications
-- Trend analysis
+- Determine which features are heavily used and identify any potential hotspots in the system. High-traffic elements might benefit from functional partitioning or even replication to spread the load more evenly. An operator can also use this information to ascertain which features are infrequently used and are possible candidates for retirement or replacement in a future version of the system.
+- Obtain information about the operational events of the system under normal use. For example, in an e-commerce site, you can record statistics about the number of transactions and the number of customers responsible for them. This information can be used for capacity planning as the number of customers grows.
+- Detect (possibly indirectly) user satisfaction with the performance or functionality of the system. For example, if a large number of customers in an e-commerce system regularly abandon their shopping carts, this might be due to a problem with the checkout functionality.
+- Generate billing information. A commercial application or multitenant service might charge customers for the resources that they use.
+- Enforce quotas. If a user in a multitenant system exceeds their paid quota of processing time or resource usage during a specified period, their access can be limited or processing can be throttled.

 Learn more from the following links:

-- [What is Usage Monitoring?](https://patterns.arcitura.com/cloud-computing-patterns/design_patterns/usage_monitoring)
\ No newline at end of file
+- [Usage Monitoring](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#usage-monitoring)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/117-monitoring/105-instrumentation.md b/src/roadmaps/system-design/content/117-monitoring/105-instrumentation.md
index 32917654c..7d08c7dbb 100644
--- a/src/roadmaps/system-design/content/117-monitoring/105-instrumentation.md
+++ b/src/roadmaps/system-design/content/117-monitoring/105-instrumentation.md
@@ -1,7 +1,7 @@
 # Instrumentation

-
+Instrumentation is a critical part of the monitoring process. You can make meaningful decisions about the performance and health of a system only if you first capture the data that enables you to make these decisions. The information that you gather by using instrumentation should be sufficient to enable you to assess performance, diagnose problems, and make decisions without requiring you to sign in to a remote production server to perform tracing (and debugging) manually. Instrumentation data typically comprises metrics and information that's written to trace logs.

 Learn more from the following links:

-- [Instrumentation System Docs](http://eolss.net/Sample-Chapters/C05/E6-39A-04-08.pdf)
\ No newline at end of file
+- [Instrumenting an application](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#instrumenting-an-application)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/117-monitoring/106-visualization-and-alerts.md b/src/roadmaps/system-design/content/117-monitoring/106-visualization-and-alerts.md
index b64c8f85a..b1afec09e 100644
--- a/src/roadmaps/system-design/content/117-monitoring/106-visualization-and-alerts.md
+++ b/src/roadmaps/system-design/content/117-monitoring/106-visualization-and-alerts.md
@@ -1,14 +1,7 @@
 # Visualization and Alerts

-Instrumentation in system design refers to the process of adding monitoring and measurement capabilities to a system, service, or application. This allows developers and operations teams to observe the behavior of the system, measure its performance, and identify any issues or areas for improvement.
+An important aspect of any monitoring system is the ability to present the data in such a way that an operator can quickly spot any trends or problems. Also important is the ability to quickly inform an operator if a significant event has occurred that might require attention.

-Instrumentation can be used to monitor a wide variety of aspects of a system, such as:
+Learn more from the following links:

-- Performance: Instrumentation can be used to measure the performance of a system, such as response time, throughput, and resource utilization.
-- Errors: Instrumentation can be used to detect and diagnose errors, such as exceptions and stack traces.
-- Security: Instrumentation can be used to monitor for security-related events, such as authentication attempts and network traffic.
-- Usage: Instrumentation can be used to monitor usage-related data, such as the number of users and requests.
-
-To learn more, visit the following links:
-
-- [Visualize Data and Raise Alerts](https://learn.microsoft.com/en-us/azure/architecture/framework/devops/monitor-visualize-data)
\ No newline at end of file
+- [Visualize Data and Raise Alerts](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#visualizing-data-and-raising-alerts)
\ No newline at end of file
diff --git a/src/roadmaps/system-design/content/117-monitoring/index.md b/src/roadmaps/system-design/content/117-monitoring/index.md
index 2118261a7..a6a376898 100644
--- a/src/roadmaps/system-design/content/117-monitoring/index.md
+++ b/src/roadmaps/system-design/content/117-monitoring/index.md
@@ -1,8 +1,7 @@
 # Monitoring

-System monitoring involves the continuous monitoring of an infrastructure – aka an IT system – by an IT manager. It includes the monitoring of CPU, server memory, routers, switches, bandwidth, and applications, as well as the performance and availability of important network devices.
+Distributed applications and services running in the cloud are, by their nature, complex pieces of software that comprise many moving parts. In a production environment, it's important to be able to track how users use your system, trace resource utilization, and generally monitor the health and performance of your system. You can use this information as a diagnostic aid to detect and correct issues, and also to help spot potential problems and prevent them from occurring.

 Visit the following to learn more:

-- [Design and implement a monitoring system](https://www.tdh.ch/sites/default/files/tdh_gmm_en_nouvelleversion_ang.pdf)
-- [System Design — Design a Monitoring System](https://gongybable.medium.com/system-design-design-a-monitoring-system-f0f0cbafc895)
\ No newline at end of file
+- [Monitoring and Diagnostics Guidance](https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring)
\ No newline at end of file