System performance: what, why, and how much is good enough?


Written By Shivani Pradhan

On July 19, 2019
"



We all want more performance from our systems, but what does that even mean? Is performance ever enough? What’s the threshold below which it doesn’t actually make much sense to optimize? We ask ourselves these questions while testing Zenko and other Scality products.

Your system is only as fast as its slowest component, and in today’s everything-as-a-service world, network performance more often than not dictates what a user perceives as the performance of the system.

How do you define network performance?

Consider a network connection as a highway and IO chunks as vehicles. Large IOs are like big trucks: let a truck represent a 1 GB data chunk. Small IOs, such as 512-byte, 1 KB, 2 KB, or even 4 KB chunks, are like bikes.

Sometimes it is more important to carry a large number of packets; other times, what matters is how much data each packet can carry. To speed up the supply of milk cans to a city, it makes sense to use trucks. The maximum rate of data transfer, which reflects the capacity of the network link, is called bandwidth. It is typically measured in KBps (kilobytes, thousands of bytes per second), MBps (megabytes, millions of bytes per second), or GBps (gigabytes, billions of bytes per second). A mile of four-lane freeway has more capacity, and hence more bandwidth for cars, than a two-lane road.
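As a rough illustration, bandwidth can be measured by timing how long it takes to push a known number of bytes through a connection. The minimal Python sketch below writes into an in-memory sink purely for demonstration; in a real test, the `write_chunk` callable would be a socket or object-store client, and the 256 MiB payload is an arbitrary choice.

```python
# A minimal sketch of measuring effective bandwidth: time how long it
# takes to move a known number of bytes, then divide. The in-memory sink
# and payload size are illustrative assumptions, not tied to any product.
import io
import time

def measure_bandwidth(write_chunk, total_bytes, chunk_size=1 << 20):
    """Push total_bytes through write_chunk in 1 MiB chunks; return MB/s."""
    chunk = b"\x00" * chunk_size
    sent = 0
    start = time.perf_counter()
    while sent < total_bytes:
        write_chunk(chunk)
        sent += len(chunk)
    elapsed = time.perf_counter() - start
    return (sent / 1e6) / elapsed  # megabytes per second

sink = io.BytesIO()  # stand-in for a socket or object-store client
print(f"{measure_bandwidth(sink.write, 256 * 1024 * 1024):.1f} MB/s")
```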

Throughput is the number of messages successfully delivered per unit time. Say the goal is to get as many people as possible to the workplace on a busy morning. If every person drives a truck to work, the highway fills up very quickly and only a fraction of the people make it, compared to everyone riding a bike.

Measured in IOs per second, throughput is the rate at which vehicles pass through the highway in unit time. If your highway is full of big trucks (that is, if the IO size is very large), only a few of them fit on the road, but each one carries a large quantity of data.

Applications with high throughput requirements are generally also latency-sensitive and use smaller IO sizes.

Bandwidth is high if we use large trucks (large packets that carry more data), while throughput (IO/sec) is high if we use bicycles (small packets, so more of them fit). Applications with high bandwidth or capacity requirements use larger IO sizes. On a given link, as IO size grows, bandwidth rises while throughput falls, and vice versa.
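The trade-off is easy to see with back-of-the-envelope arithmetic. The sketch below assumes a 10 Gbit/s link and a fixed 50 µs of per-IO overhead (both invented numbers, standing in for protocol and processing costs) and computes the resulting IOPS and MB/s across IO sizes: bikes give high IOPS and low MB/s, trucks the reverse.

```python
# Back-of-the-envelope illustration of the truck/bicycle trade-off on a
# fixed-capacity link. The link speed and per-IO overhead are assumed
# example values, not measurements of any real system.
LINK_BYTES_PER_SEC = 10e9 / 8   # assumed 10 Gbit/s link, in bytes/s
PER_IO_OVERHEAD_S = 50e-6       # assumed 50 us of fixed cost per IO

for io_size in (512, 4 * 1024, 1024 ** 2, 1024 ** 3):  # 512 B .. 1 GiB
    time_per_io = io_size / LINK_BYTES_PER_SEC + PER_IO_OVERHEAD_S
    iops = 1 / time_per_io                 # throughput, IOs per second
    mbps = iops * io_size / 1e6            # bandwidth, megabytes per second
    print(f"IO size {io_size:>10} B: {iops:>12,.0f} IOPS, {mbps:>8.1f} MB/s")
```

Running it shows roughly 20,000 IOPS but only ~10 MB/s at 512 B, versus about 1 IOPS and ~1,250 MB/s at 1 GiB.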

Latency is the time it takes for a request to travel from the sender to the receiver, for the receiver to process that request, and for the acknowledgment to travel back. It is basically the round-trip time from the browser to the server and back.
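A minimal way to get a feel for latency is to time complete request/response round trips and take the median. In the sketch below, the URL is just a placeholder; point it at the service under test.

```python
# A minimal sketch of measuring round-trip latency: time the full
# request/response cycle several times and report the median.
# The URL below is a placeholder, not a real test endpoint.
import statistics
import time
import urllib.request

def measure_latency(url, samples=10):
    """Return the median round-trip time in milliseconds for a GET."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()  # wait until the full response has arrived
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)

print(f"median RTT: {measure_latency('http://example.com/'):.1f} ms")
```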

Performance for a product is typically defined in terms of throughput, bandwidth, and latency.

How much performance is good enough?

Performance is the user’s perception of the product’s responsiveness and usability. Because users’ perception is susceptible to distortion, what users perceive or expect may differ from the product’s real performance. It is therefore extremely important to do performance testing and know where the bottlenecks are. This helps establish an intelligent service level agreement (SLA) that provides accurate guidance on capacity and scalability, so a customer can avoid potential scalability problems in production. Performance testing on the right use cases and workloads is essential to ensure the product complies with the customer’s business requirements, and it must take into account not only how the business is expected to grow but also the rate of that growth.

As developers work hard on optimizing performance, it is critical to understand the concept of “just enough,” or the minimum improvement needed to even be noticed. The Weber-Fechner law involves a key concept called the just-noticeable difference: the minimum increase or decrease in the magnitude of a stimulus before the change becomes noticeable. Put simply, for an improvement to be noticeable to a typical user, the change must be about 20% in magnitude.

Denys Mishunov’s blog post on the perception of time explains the 20% rule as well as the concepts of performance budgets and optimization. The 20% rule applies not only to improvements but also to regressions. Letting your code get a bit slower, as long as it is not harming the user experience, is called a regression allowance: the typical user will not notice an end-to-end impact of less than 20%. This in no way means we shouldn’t seize opportunities to optimize our code and improve its performance, but it does enable a better cost-benefit analysis of whether, say, a 5% improvement is worth six weeks of effort.
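Applied to performance numbers, the just-noticeable-difference idea reduces to a simple threshold check. The helper below encodes the 20% figure discussed above (an empirical rule of thumb, not a universal constant) to classify a measured change against a baseline.

```python
# A small helper applying the ~20% just-noticeable-difference rule of
# thumb from the Weber-Fechner discussion: flag a measured change only
# when it is large enough for a typical user to perceive.
JND = 0.20  # the blog's 20% figure; a rule of thumb, not a constant

def perceived_change(baseline_ms, measured_ms, jnd=JND):
    """Classify a latency change relative to a baseline measurement."""
    delta = (measured_ms - baseline_ms) / baseline_ms
    if delta <= -jnd:
        return "noticeable improvement"
    if delta >= jnd:
        return "noticeable regression"
    return "within regression allowance (users won't notice)"

print(perceived_change(1000, 950))   # 5% faster: users won't notice
print(perceived_change(1000, 1300))  # 30% slower: noticeable regression
```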

Performance testing must assess user experience in realistic scenarios on the target application and attempt to simulate real customer use cases. A standard performance suite has functional and end-to-end tests that measure component performance as part of code-quality and validation efforts. In a more comprehensive suite, load, stress, and soak tests play an important role in truly characterizing the product’s performance.

  • Load Scalability Test: A load test consists of applying increasing load to an application and measuring the results (a minimal sketch follows this list). As the load approaches the high end of the application’s capacity, these tests often overlap with maximum-capacity/end-limit tests, which determine how many users an application can handle before performance becomes unacceptable.
  • Stress Test: A stress test drives an application beyond normal load conditions. As the application’s boundaries are pushed to the extreme, these tests typically expose the weakest components in the system, which fail before the others. Making those components more robust helps raise the limits and find new performance thresholds.
  • Soak Test: A soak test, also known as a golden run, is a long-running test that determines application performance and stability over a period of weeks. As a dedicated system keeps running the soak workload (with periodic upgrades), issues that only manifest over long periods, such as memory leaks and corruption, come to the surface.
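Here is the minimal load-test sketch referred to above: it ramps the number of concurrent workers and records throughput and average latency at each step. `do_request` is a hypothetical stand-in for a real client call (an S3 GET, a database query, and so on); replace the `time.sleep` with real work.

```python
# A minimal load-test sketch: ramp concurrency step by step and record
# throughput and average latency at each step. All names and numbers
# here are illustrative assumptions, not a real product harness.
import concurrent.futures
import time

def do_request():
    """Placeholder for a real client call (S3 GET, DB query, ...)."""
    time.sleep(0.01)  # pretend each operation takes ~10 ms

def timed_request(_):
    t0 = time.perf_counter()
    do_request()
    return time.perf_counter() - t0

def run_step(workers, total_ops=400):
    """Run total_ops operations across `workers` threads."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed_request, range(total_ops)))
    elapsed = time.perf_counter() - start
    return total_ops / elapsed, 1000 * sum(latencies) / len(latencies)

for workers in (1, 2, 4, 8, 16):  # increasing load steps
    ops_per_sec, avg_ms = run_step(workers)
    print(f"{workers:>2} workers: {ops_per_sec:7.0f} ops/s, {avg_ms:5.1f} ms avg")
```

Watching where ops/s stops scaling and latency starts climbing is exactly the maximum-capacity signal the first bullet describes.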

To simulate the customer’s use cases accurately, you must understand the customer’s workload. Each workload has unique characteristics, each of which impacts storage latency, IOPS, and throughput. A workload is defined by:

  • How much a typical user reads or writes (or both).
  • Whether the user reads/writes are sequential or random. (Caching helps sequential reads but not random ones, so sequential performance numbers are generally higher than random ones.)
  • The number of threads, and whether those threads are parallel or concurrent.
  • The input/output (IO) size.

Other workload characteristics you must take into account (the sketch after this list captures all of these knobs):

  • Metadata workload must be characterized separately from data.
  • Other impacts, such as deduplication, compression, or encryption, must be accounted for.
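Here is one way to capture the characteristics from both lists above as a single workload description that a test harness could consume. The field names are illustrative, not taken from any Scality tool; treat this as a sketch under those assumptions.

```python
# A minimal sketch of a workload description built from the
# characteristics listed above. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class Workload:
    read_fraction: float          # 0.0 (all writes) .. 1.0 (all reads)
    sequential: bool              # sequential vs. random access
    threads: int                  # number of parallel/concurrent workers
    io_size_bytes: int            # size of each IO
    metadata_heavy: bool = False  # metadata characterized separately
    dedupe: bool = False          # data reducibility affects results
    compression: bool = False
    encryption: bool = False

# Example: an OLTP-style profile (70:30 read/write, small random IOs,
# per the table below). The thread count is an arbitrary example.
oltp = Workload(read_fraction=0.7, sequential=False,
                threads=32, io_size_bytes=4 * 1024)
print(oltp)
```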

Workload characterization

A workload is a set of I/O characteristics running through a group of containers or VMs that interface with compute, network, and storage infrastructures. A typical application workload may interact with a web server, one or several database servers, and other application servers. The combination of all of these servers and the associated networked storage makes up that application’s workload.

Let’s look at defining an online banking workload. All SSL-based requests are for web pages, followed by requests for the embedded images and other static files associated with each page. A page is fully received when the file and all files associated with it are received. The IO profile in this example is 95% reads and 5% writes (i.e., 95% GET and 5% POST), and 100% request success is assumed. Twenty-five percent of the requests are for static files that can be served from the cache, which returns a “304 Not Modified” response. When a request is successfully processed, the server returns a “200 OK” response code.
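This request mix is straightforward to simulate. The sketch below draws requests according to the percentages in the example (5% POSTs, 25% cache-served GETs answered with 304, the rest GETs answered with 200); the printed counts should land close to those ratios.

```python
# A hedged sketch of the online-banking request mix described above:
# 5% POST, 25% of all requests served from cache as "304 Not Modified",
# and the remainder as "200 OK" GETs. Ratios come from the example text.
import random

def simulate_requests(n=10_000, seed=42):
    rng = random.Random(seed)  # fixed seed for a reproducible mix
    counts = {"POST 200": 0, "GET 304": 0, "GET 200": 0}
    for _ in range(n):
        r = rng.random()
        if r < 0.05:                # 5% writes
            counts["POST 200"] += 1
        elif r < 0.05 + 0.25:       # 25% cacheable static files
            counts["GET 304"] += 1
        else:                       # remaining ~70% dynamic page reads
            counts["GET 200"] += 1
    return counts

print(simulate_requests())
```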

A typical social media workload is 95% read and 5% insert. The free Yahoo! Cloud Serving Benchmark (YCSB) simulates many types of database transactions, including a read-dominated eight-transaction mixture typical of most social media database activities, and it can be used to simulate simple social network user activities.

The table below summarizes the characteristics of a few common workloads so they can be simulated for performance testing.

| Workload | Read:write mix | Typical IO size | Access pattern | Primary metric |
|----------|----------------|-----------------|----------------|----------------|
| OLTP (booking, banking, ATM, retail) | 70:30 | Very small | Random | Transactions/sec; low latency |
| OLAP (data warehousing, analytics) | Read-mostly | Large | Streamed scans | Bandwidth |
| Exchange (Windows mail server) | 2:1 | 4 KB | Random | IOPS |
| Backup | Write-only | 256 KB+ | Sequential (max ~4 threads) | Bandwidth |
| Video / rich media | Read-only | 256 KB+ | Sequential | Bandwidth |
| DSS (decision support) | Mostly read, few large batch/ETL writes | Large | Sequential writes | GB/sec (bandwidth) |

The online transaction processing (OLTP) workload, typically seen in online booking, online banking, ATM transactions, retail sales, and order entry, is a heavy read-write mix (70r:30w) with a very small IO size. Characterized by a large number of short online transactions (INSERT, UPDATE, DELETE), OLTP performance is measured in transactions per second, so throughput matters most, and low latency is important.

OLAP, online analytical processing, covers data warehousing workloads in which large amounts of data are processed, streamed, combined, and filtered for forecasting, data mining, online data queries, and so on. The workload deals with huge quantities of data, and the original data is read but not modified. IO sizes are generally large, and bandwidth dominates the performance characteristics.

Exchange is the Windows mail server workload, characterized by a 4 KB random read/write mix in a 2:1 ratio.

Backup workloads are large (256 KB and up) sequential writes to a backup device, generally with a maximum of four concurrent threads, whereas video and rich media workloads are large sequential streamed reads. The streaming workload uses a 256 KB or larger IO size to improve bandwidth.

Decision support systems (DSS) come in various types. The most common technology used to deploy communication-based DSS is a client-web server; examples include chat and instant messaging software, online collaboration and net-meeting systems, and search servers. DSS is often categorized as a read-only workload with a few large sequential writes (batch and ETL), and its key metric is GB/sec (bandwidth).

I hope this gives you some idea of how to drill down into the various aspects of a workload and build on them to get closer to your specific customer scenarios.

Did you find this information useful? Do you have something to add? I look forward to your feedback. Please reach out on the forum!
