Threads Affect Scalability: a great topic
I took this excerpt from this book. You can buy it here: Link. It illustrates the most critical point of our work with an example and a very valuable explanation. If you know how a system works, the payoff can be extraordinary. This is a workflow every developer should know. Otherwise, while trying to improve performance, you may end up misled by your own assumptions.
Threads Affect Scalability
I’ve noticed that many large sites end up spending a lot of effort optimizing their systems in the wrong places.
As an example, let’s say that you’re building a one-page site that should support 1,200 simultaneous users, with a response time of one second or less, and you have plans to scale up later on to 120,000 users.
During load testing, you reach your maximum acceptable response time after reaching 120 simulated users on a single CPU, and the CPU is 24% busy. As you increase the load, you find that CPU use stays the same, but response time increases. By the time you reach 1,200 users, response time is ten seconds—ten times what it was at 120 users.
At this stage, you will need ten CPU cores (best case, assuming linear scaling) to support your target user load and performance goals in the short term, and 1,000 cores in the long term.
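The core counts above follow from simple division. A minimal sketch of that arithmetic, assuming the best-case linear scaling the excerpt mentions (the function name `cores_needed` is my own, not from the book):

```python
import math


def cores_needed(target_users: int, users_per_core: int) -> int:
    """Best-case core count, assuming capacity grows linearly with cores."""
    return math.ceil(target_users / users_per_core)


# 120 users per core is the point where the one-second limit was reached.
short_term = cores_needed(1_200, 120)
long_term = cores_needed(120_000, 120)
print(short_term, long_term)
```

Real systems rarely scale perfectly linearly, so these numbers are a lower bound rather than a forecast.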
To determine to what extent you can optimize this scenario, you measure the time it takes to process each phase of a single request on an unloaded machine.
The results are in Figure 5-1.
Figure 5-1. Time to process a hypothetical web page request
You find that receiving 0.5KB at 128Kbps takes 5ms, obtaining the needed data from the database takes 77ms, generating the response HTML takes 2ms, and sending the 5KB response at 384Kbps takes 16ms.
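Since the figure itself is not reproduced here, the breakdown can be tabulated directly from the numbers in the text. This is only a sketch; the dictionary keys are labels I chose, and the millisecond values are the ones quoted above:

```python
# Per-phase times for one request on an unloaded machine, in milliseconds.
phases_ms = {
    "receive request": 5,
    "database": 77,
    "generate HTML": 2,
    "send response": 16,
}

total_ms = sum(phases_ms.values())
# Fraction of the end-to-end time spent in each phase.
shares = {name: ms / total_ms for name, ms in phases_ms.items()}

for name, share in shares.items():
    print(f"{name:16s} {phases_ms[name]:3d} ms  {share:.0%}")
print(f"{'total':16s} {total_ms:3d} ms")
```

The phases add up to 100ms, which is the end-to-end figure the rest of the example relies on.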
When faced with data like this, the first place many developers would look for improvements is the slowest part of the process, which in this case is the database access. In some environments, the database is a black box, so you can’t tune it. When you can, the usual approach is to put a lot of emphasis on query optimization. Although that certainly can be helpful, it often doesn’t completely solve the problem. In later chapters, I’ll show some reasons why that’s the case and what you can do about it. For this example, let’s assume the queries are already fully tuned.
The next largest chunks of time are spent receiving the request and sending the response. A typical initial reaction of developers is that “you can’t do anything about the client’s data transmission rates, so forget about the request and response times.” As I’ve shown in Chapter 2, that clearly is not the whole story.
That leaves the time to generate the HTML, which in this case is only 2 percent of the total request-processing time. Because that part of the application appears to developers to be most readily under their control, optimizing the time spent there is often where they end up spending their performance improvement efforts. However, even if you improve that time by 50 percent, down to 1ms, the overall end-to-end improvement seen by end users may only be 1 percent. In this example, CPU use would decline to 12 percent, but you would still need the same total number of CPUs; it doesn’t improve scalability.
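To make the 1 percent figure concrete, here is a small sketch that recomputes the end-to-end time after halving the HTML generation step. The helper `end_to_end_gain` and the phase labels are illustrative names of mine; the timings are the ones from the excerpt:

```python
def end_to_end_gain(phases_ms: dict, phase: str, new_ms: float) -> float:
    """Fractional reduction in total request time if one phase drops to new_ms."""
    old_total = sum(phases_ms.values())
    new_total = old_total - phases_ms[phase] + new_ms
    return (old_total - new_total) / old_total


phases_ms = {
    "receive request": 5,
    "database": 77,
    "generate HTML": 2,
    "send response": 16,
}

# Halve HTML generation from 2 ms to 1 ms and see what the user gains.
gain = end_to_end_gain(phases_ms, "generate HTML", 1)
print(f"end-to-end improvement: {gain:.0%}")
```

A 50 percent win inside a 2ms step moves the total from 100ms to 99ms, which is why the effort barely registers for users.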
I would like to suggest looking at this problem in a much different way. In a correctly designed architecture, the CPU time spent to process a request at the web tier should not be a primary factor in overall site performance or scalability. In the previous example, an extra 2ms one way or the other won’t be noticeable by an end user.
In this example, and often in the real world as well, reducing the CPU time spent by the web tier in generating the pages reduces the CPU load on each machine, but it doesn’t improve throughput or reduce the number of machines you need.
What’s happening in the example is that the site’s throughput is limited by the IIS and ASP.NET thread pools. By default, there are 12 worker threads per CPU. Each worker processes one request at a time, which means 12 requests at a time per CPU. If clients present new requests when all of the worker threads are busy, they are queued.
Since each request takes 100ms to process from end to end, one thread can process ten requests per second. With 12 requests at a time, that becomes 120 requests per second. With 2ms of CPU time per request, 120 * 0.002 = 0.24 or 24% CPU use.
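The same calculation can be written as a tiny model. This is a sketch under the excerpt's assumptions (12 worker threads per CPU, each held for the full 100ms of a request, 2ms of CPU work per request); the function name `thread_bound_model` is hypothetical:

```python
def thread_bound_model(threads_per_cpu: int, latency_ms: float, cpu_ms: float):
    """Return (requests per second, CPU utilization) for one CPU when every
    worker thread is occupied for the whole duration of a request."""
    throughput_rps = threads_per_cpu * 1000.0 / latency_ms
    cpu_use = throughput_rps * cpu_ms / 1000.0
    return throughput_rps, cpu_use


rps, cpu = thread_bound_model(threads_per_cpu=12, latency_ms=100, cpu_ms=2)
print(f"{rps:.0f} requests/s at {cpu:.0%} CPU")

# Halving the CPU work (and so trimming latency to 99 ms) barely moves throughput.
rps_fast, cpu_fast = thread_bound_model(threads_per_cpu=12, latency_ms=99, cpu_ms=1)
print(f"{rps_fast:.1f} requests/s at {cpu_fast:.1%} CPU")
```

In this model throughput depends on thread count and latency, not on CPU time, so cutting CPU work lowers utilization while leaving capacity almost unchanged.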
The solution to scalability in this case is to optimize thread use, rather than minimizing CPU use. You can do that by allowing each worker thread to process more than one request at a time, using asynchronous database requests. Using async requests should allow you either to reach close to 100% CPU use, or to push your scalability issues to another tier, such as the database. At 100% CPU use, you would only need one quarter of the CPUs you did at the start.
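The excerpt is about ASP.NET, but the idea of one thread serving many requests while each waits on the database is easy to see in any async runtime. The sketch below uses Python's `asyncio` purely as an analogy, not as the book's implementation: `asyncio.sleep` stands in for a 77ms database call, and the names `handle_request`, `serve_all`, and `run_demo` are my own.

```python
import asyncio
import time

DB_WAIT_S = 0.077  # simulated database latency, taken from the example
REQUESTS = 24      # arbitrary number of concurrent requests for the demo


async def handle_request(request_id: int) -> int:
    # Awaiting releases the thread while the simulated database call is pending,
    # so the same thread can start working on other requests in the meantime.
    await asyncio.sleep(DB_WAIT_S)
    return request_id


async def serve_all(n: int) -> list:
    # Start all requests concurrently on a single event-loop thread.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


def run_demo(n: int = REQUESTS):
    start = time.perf_counter()
    results = asyncio.run(serve_all(n))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(f"{len(results)} requests in {elapsed:.3f}s on one thread")
    print(f"blocking, one at a time, would need about {REQUESTS * DB_WAIT_S:.2f}s")
```

Because the waits overlap, the total time stays close to a single database round trip instead of growing with the number of requests, which is the effect async database requests aim for at the web tier.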
Adding more worker threads can help in some cases. However, since each thread has costs associated with it (startup time, memory, pool management, context switch overhead), that’s only effective up to a point.
In this example, caching helps if you can use it to eliminate the database request. Threads come into play when you can’t. When CPU use per server averages 70 to 80+ percent under peak load, then it tends to become a determining factor for how many CPUs you need. At that stage, it makes sense to put effort into optimizing the CPU time used by the application—but to minimize the number of servers you need, not to improve performance from the user’s perspective.
Of course, there are cases where CPU use is the dominant factor that you should address first, but once a site is in production, those cases tend to be the exception and not the rule. Developers and testers tend to catch those cases early. Unfortunately, threading issues often don’t appear until a site goes into production and is under heavy load.
Low CPU use per server is one reason some sites have found that using virtual machines (VMs) or IIS web gardens can improve their overall throughput. Unfortunately, VMs add overhead and can complicate operations, deployment, and maintenance. You should weigh those options against the effort to modify your applications to improve thread use through async requests and related optimizations covered in this chapter.