

Build API Performance from the Ground Up: Using Unit Tests to Benchmark API Component Performance

Unit-level performance testing becomes practical when your unit testing and performance testing tools are tightly integrated. It lets you understand the performance of the components you are integrating into your target application.

APIs were originally conceived as basic integration tools that enabled disparate applications to exchange data, but they have evolved into critical glue that binds multiple processes into a single application. Modern applications are aggregating and consuming APIs at a staggering pace in order to achieve business goals. In fact, the business logic of most modern applications now relies on some combination of APIs and third-party libraries, which means that the performance of end-to-end transactions is heavily dependent on the performance of the APIs and components that the application leverages.

Given their ability to make or break performance goals, key application components must undergo a rigorous performance evaluation as an integral part of the acceptance process. The sooner you understand the performance capabilities of an application’s key components, the easier it is to remediate problems and the more effectively you can ensure that the integrated application will meet its performance requirements.

Unit tests

Using unit tests to evaluate component performance provides a number of advantages, including the following: 

  • Unit tests offer a flexible but standardized way of testing at the component level.
  • Unit tests are well understood and widely used in development environments.
  • Typically, unit tests require only a fraction of the hardware resources necessary for testing the entire application. This means you can test components at the maximum projected “Stress” level (see Fig. 1) earlier and more often with the hardware resources available in development environments.


Fig. 1: Approximate load level and test duration proportions of typical performance testing scenarios

Despite these benefits, unit-level performance testing is often overlooked because most unit testing tools lack the capabilities commonly found in dedicated performance testing tools (e.g., the ability to set up and execute various performance test configurations, monitoring of system and application resources during the test, collecting and analyzing performance test results, etc.).

But you can get the best of both worlds by executing unit-level tests with traditional performance testing tools. Below, I describe a strategy to measure and benchmark the performance of the components that your team might integrate into your target application.

Establishing a component benchmarking workflow

The need for component-level benchmarking in development environments can arise at different stages of the software lifecycle and is driven by the following questions:

  • Which of the available third-party components not only satisfies the functional requirements, but also has the best performance? Should I use component A, B, C, etc. or implement a new one? Such questions usually occur at design and prototyping stages.
  • Which of the alternative code implementations performs best? Such questions usually occur during the development stage and are related to code developed internally.

A properly configured and executed component benchmark can help answer these questions. A typical component benchmark workflow (shown in Fig. 2 below) consists of:

  1. Creating unit tests for the benchmarked components.
  2. Selecting benchmark performance parameters (these are the same for all components).
  3. Executing the performance tests.
  4. Comparing the performance profiles of different components.

This is illustrated in Figure 2 below:


Fig. 2: A typical component benchmark workflow

For a concrete example, let’s look at how to compare the performance of four JSON parser components: Jackson (streaming), JsonSmart, Gson, and FlexJson. Here, we will use JUnit as the unit testing framework and Parasoft Load Test as the load testing application, which is part of Parasoft SOAtest. The same strategy can be applied with other unit testing frameworks and load testing applications.

Creating unit tests for the benchmarked components

The JUnit test should invoke the component like the target application would. This applies to the component configuration, the choice of the component API methods, and the values and ranges of method arguments.

The desired level of results verification depends on the nature of the tests. As a rule, you would do more extensive unit test results verification for reliability tests, but perform a more basic level of verification for efficiency tests (since ‘heavy’ results verification logic may skew the performance picture). On the other hand, it is important to perform at least some verification to ensure that the component gets properly configured and invoked.

In our benchmark, the JSON content is loaded from a file at the beginning of the performance test and cached in memory. The JSON file has a size of 225KB and contains 215 top-level account description objects. Each account object has a “balance” name/value pair:

    {
        "id": "54b98c2b7b3bd64aae699040",
        "index": 214,
        "guid": "565c44b0-9e6d-4b8e-819c-48aa4dd9d7c2",
        "balance": "$3,809.46",
        ...
    }
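The load-once-and-cache step described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `CachedJsonContent` and its constructor argument are hypothetical, and the actual content provider used in the benchmark is not shown in this article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a JSON file once and serves it from memory afterwards. */
final class CachedJsonContent {
    private final Path source;      // assumed location of the JSON document
    private volatile String cached; // null until the first successful read

    CachedJsonContent(Path source) {
        this.source = source;
    }

    /** Returns the file content, touching the disk only on the first call. */
    String get() {
        String result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {
                    try {
                        result = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    cached = result;
                }
            }
        }
        return result;
    }
}
```

Keeping the file read out of the measured test body means the benchmark times the parser rather than disk I/O.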

The basic level of component functionality verification is implemented in the following way: the JUnit test invokes the parser API to find all the “balance” elements inside the JSON content and calculate the total balance of all account objects in the JSON document. The calculated balance is then compared with the expected value:

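    import java.io.IOException;
    import junit.framework.TestCase;
    import org.junit.Test;
    // JsonFactory and JsonParser come from Jackson; the package name depends on the Jackson version in use.

    public class JacksonStreamParserTest extends TestCase {

        @Test
        public void testParser() throws IOException {
            float balance = 0;
            JsonFactory jsonFactory = new JsonFactory();
            String json = JsonContentProvider.getJsonContent(); // cached JSON document
            JsonParser jp = jsonFactory.createJsonParser(json);

            // Stream through the tokens and add up every "balance" value.
            while (jp.nextToken() != null) {
                String fieldname = jp.getCurrentName();
                if (fieldname != null && fieldname.equals("balance")) {
                    jp.nextToken(); // advance from the field name to its value
                    balance += TestUtil.parseCurrencyStr(jp.getText());
                }
            }
            // Compare the total with the expected value.
            TestUtil.assertBalance(balance);
        }
    }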
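The test above relies on two `TestUtil` helpers whose source is not shown. A minimal sketch of what such helpers could look like follows. The class name `CurrencyUtil`, the explicit `expected` and `tolerance` parameters, and the parsing rules are all assumptions made for illustration, not the author's implementation.

```java
/** Hypothetical lightweight verification helpers, modeled on the TestUtil calls above. */
final class CurrencyUtil {
    private CurrencyUtil() {}

    /** Converts a string such as "$3,809.46" to a float by dropping '$' and ','. */
    static float parseCurrencyStr(String text) {
        String digits = text.replace("$", "").replace(",", "").trim();
        return Float.parseFloat(digits);
    }

    /** Fails if the actual total differs from the expected one by more than the tolerance. */
    static void assertBalance(float expected, float actual, float tolerance) {
        if (Math.abs(expected - actual) > tolerance) {
            throw new AssertionError(
                    "Expected total balance " + expected + " but was " + actual);
        }
    }
}
```

Both helpers do a small, constant amount of work per call, which keeps the verification cost low relative to the parsing being measured.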

Because we are comparing parsers for efficiency, we are using lightweight results verification to avoid distorting the performance picture. If you choose to do some heavyweight results verification, you can compensate for it in the same way that we compensate for the resource consumption of the performance testing framework, described below. However, this will require a more elaborate preparation of the Baseline Scenario.

Compensating for the testing framework

Because the components we are comparing run in the same process as the performance testing application that hosts them, we need to separate the share of system-level and application-level resources consumed by this application and the JUnit framework from the share consumed by the component itself. To estimate this share, we can run a benchmark load test scenario with an empty JUnit test:

    public class BaselineTest extends TestCase {

        @Test
        public void testBaseline() {
        }
    }

If the resource utilization levels of the baseline scenario are not negligible, we need to subtract these levels from the levels of the component benchmark runs to compensate for the share of resources consumed by the testing framework.
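The subtraction itself is simple arithmetic. The sketch below shows it for a single metric; the class and method names are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from the benchmark.

```java
/** Hypothetical helper for removing the testing framework's share from a measurement. */
final class BaselineCompensation {
    private BaselineCompensation() {}

    /**
     * Returns the resource level attributable to the component alone:
     * the level measured during the component run minus the level measured
     * during the empty baseline run, clamped at zero.
     */
    static double netUsage(double measured, double baseline) {
        return Math.max(0.0, measured - baseline);
    }
}
```

For example, if a component run shows 42.0% CPU and the baseline run shows 5.5%, `netUsage(42.0, 5.5)` attributes 36.5% to the component. The same calculation applies to any metric collected in both runs, provided both use identical load parameters.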

Selecting and configuring benchmark performance parameters

The test environment setup should emulate the major parameters of the target environment, such as the operating system, the JVM version, and JVM options such as the GC settings and server mode. It may not always be possible or practical to reproduce all of the deployment parameters in the test environment; however, the closer the match, the lower the chance that the component's performance during the test will differ from its performance in the target environment.

Understanding concurrency, intensity, and test duration

The major performance test parameters that determine the conditions under which the components will be tested are load level (defined as load intensity and load concurrency) and load duration (see Fig. 3 below).

To shape these generic performance test parameters into concrete forms, start by examining the performance requirements of the target application where you anticipate using this component. Application-level performance requirements can provide concrete or approximate characteristics of the load level to which the component should be subjected. Translating application performance requirements into component performance requirements, however, presents multiple challenges. For example, if the application performance requirement states that it must respond within M milliseconds at a load level of N users or requests per second, how does this translate to performance requirements for a component that is a part of this application? How many requests to a specific component will one request to the application generate?

If there is an older version of the application, you can make an educated guess by tracing a few application-level calls or by examining call trace statistics collected by an APM (Application Performance Management) tool. If neither of these options is available, the answer can come from examining the application design.
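Once you have an estimate of how many component calls one application request generates, the translation to a component-level load is a multiplication. The sketch below assumes the ratio is derived from call counts in a trace; all names and the figures in the usage note are hypothetical.

```java
/** Hypothetical helper for turning an application-level load into a component-level one. */
final class ComponentLoadEstimate {
    private ComponentLoadEstimate() {}

    /** Average number of component calls per application request, from traced counts. */
    static double averageCallsPerRequest(long componentCalls, long applicationRequests) {
        if (applicationRequests <= 0) {
            throw new IllegalArgumentException("applicationRequests must be positive");
        }
        return (double) componentCalls / applicationRequests;
    }

    /** Component call rate implied by an application request rate. */
    static double componentCallsPerSecond(double appRequestsPerSecond, double callsPerRequest) {
        return appRequestsPerSecond * callsPerRequest;
    }
}
```

For instance, if a trace of 500 application requests shows 1,500 calls to the component, the ratio is 3 calls per request, so an application requirement of 200 requests per second implies roughly 600 component calls per second.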

If the component load parameters cannot be deduced from the target application performance specifications or if such a specification is not available, then an alternative choice for a benchmark is to run at the maximum load level that can be achieved on the hardware available for the test. However, the risk involved in this approach, when taken without any regard for the expected load levels, is that the benchmark results may not be relevant in the context of the target application.

In any case, and particularly for benchmarks where the load levels are driven by the performance limits of the hardware available to the test, be aware of how resource overutilization affects the test results. For example, when comparing component execution times, it is generally advisable to stay below the 75-80% CPU utilization level. Increasing the utilization above that level can adversely affect the accuracy of test execution time measurement and may distort the benchmark results.

Often, the aggregate performance testing parameter ‘load level’ is not explicitly separated into its major parts: intensity and concurrency (see Fig. 3). This can lead to an inadequate performance test configuration.


Fig. 3: The major performance test parameters: intensity, concurrency, and duration

Load Intensity is the rate of requests or test invocations at which the component will be tested. In a performance testing application, the load intensity can be configured directly by setting the number of hits or tests per second, minute, or hour in a performance test scenario. It can also be configured indirectly as a combination of the number of virtual users and the virtual user think time.
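The indirect configuration can be reasoned about with the usual closed-loop approximation: each virtual user repeats a cycle of one test execution followed by think time, so the rate is the number of users divided by the cycle length. The sketch below is an assumption about that model, not a description of how any particular load testing tool schedules its virtual users; the names are hypothetical.

```java
/** Hypothetical helper relating virtual users and think time to load intensity. */
final class LoadIntensity {
    private LoadIntensity() {}

    /** Approximate invocations per second produced by the given number of virtual users. */
    static double requestsPerSecond(int virtualUsers, long thinkTimeMillis, long executionTimeMillis) {
        long cycleMillis = cycleMillis(thinkTimeMillis, executionTimeMillis);
        return virtualUsers * 1000.0 / cycleMillis;
    }

    /** Smallest number of virtual users that reaches the target rate under the same model. */
    static int virtualUsersFor(double targetRequestsPerSecond, long thinkTimeMillis, long executionTimeMillis) {
        long cycleMillis = cycleMillis(thinkTimeMillis, executionTimeMillis);
        return (int) Math.ceil(targetRequestsPerSecond * cycleMillis / 1000.0);
    }

    private static long cycleMillis(long thinkTimeMillis, long executionTimeMillis) {
        long cycle = thinkTimeMillis + executionTimeMillis;
        if (cycle <= 0) {
            throw new IllegalArgumentException("think time plus execution time must be positive");
        }
        return cycle;
    }
}
```

Under this model, 50 virtual users with a 750 ms think time and a 250 ms execution time produce about 50 invocations per second. Note that the execution time usually grows under load, so the achieved rate should be verified against the test results.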

Load Concurrency (in our context) can be described as the degree of parallelism with which a load of a given intensity is applied. The concurrency level can be configured by the number of virtual users or threads in a load test scenario. Setting an appropriate concurrency level is essential when testing for potential race conditions and for performance problems due to thread contention and access to shared resources, as well as for measuring the memory footprint of multiple instances of the component.
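A load testing tool applies concurrency for you, but the idea can be illustrated with plain `java.util.concurrent`. The sketch below runs the same task on a fixed number of threads that are released together; the class name and parameters are hypothetical, and it is not a substitute for a real load test scenario.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that invokes a task with a fixed degree of parallelism. */
final class ConcurrentInvoker {
    private ConcurrentInvoker() {}

    /** Runs the task invocationsPerThread times on each thread; returns completed invocations. */
    static int run(Runnable task, int threads, int invocationsPerThread)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startSignal = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();
        Callable<Void> worker = () -> {
            startSignal.await(); // hold every worker until all have been submitted
            for (int i = 0; i < invocationsPerThread; i++) {
                task.run();
                completed.incrementAndGet();
            }
            return null;
        };
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(worker));
            }
            startSignal.countDown(); // release all workers at once
            for (Future<Void> future : futures) {
                future.get(); // rethrows any failure raised inside the task
            }
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }
}
```

Releasing the workers through a latch makes the invocations overlap from the start, which is what exposes race conditions and contention on shared resources.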

Test Duration for a component benchmark test depends on the test goals. When approached from the mathematical standpoint, one of the major factors in determining the test duration is the statistical significance of the load test data set. However, this type of approach may be too complicated for everyday practical purposes. 
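One simple way to gauge whether a run has collected enough data is to look at the confidence interval around the mean execution time. The sketch below uses the normal approximation (the 1.96 multiplier for a 95% interval), which assumes a reasonably large number of independent samples; it is an illustration of the statistical idea, not the method used in this benchmark, and the names are hypothetical.

```java
/** Hypothetical helper for judging how tightly a set of timings pins down the mean. */
final class SampleStats {
    private SampleStats() {}

    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("at least two samples are required");
        }
        double m = mean(samples);
        double squares = 0.0;
        for (double s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    /** Half-width of the approximate 95% confidence interval for the mean. */
    static double ci95HalfWidth(double[] samples) {
        return 1.96 * sampleStdDev(samples) / Math.sqrt(samples.length);
    }
}
```

If the half-width is large relative to the mean, or larger than the difference between two components being compared, a longer run is likely needed before drawing conclusions.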


Written by

Sergei Baranov

Sergei is a principal software engineer at Parasoft, focusing on load and performance testing within Parasoft SOAtest.