

JUnit Automated Test Case Generation and Code Coverage


If code coverage is an issue for you, make sure you’re measuring it right, and measuring all of it from all the tests you run. Then leverage automatic JUnit test case generation to quickly build and expand your tests toward meaningful, maintainable, complete code coverage.

Deep Dive Into Code Coverage Problems & Solutions

I recently wrote about how easy it is to fall into the trap of chasing code coverage percentages. That post sparked some good discussions about quality, so I thought I’d take a deeper dive into code coverage problems and solutions: the coverage number itself, the value of auto-generated JUnit tests, how to identify unit tests that have problems, and how to keep improving your test execution.

How Does JUnit Code Coverage Work?

Let’s start with the coverage metric itself and how we count code coverage. Code coverage numbers are often meaningless, or at best misleading. If you do happen to have 100% code coverage, what does it even mean? How did you measure it?

There are lots of different methods to measure coverage.

One way to measure code coverage is from a requirements perspective. Do you have a test for each and every requirement? This is a reasonable start… but it doesn’t mean that all of the code was tested.

Another way to measure code coverage (don’t laugh, I actually hear this in the real world) is by the number of passing tests. Really, I mean it! This is a pretty awful metric and obviously meaningless. Is it worse or better than simply counting how many tests you have? I couldn’t say.

Then we come to trying to determine what code was executed. Common coverage metrics include statement coverage, line coverage, branch coverage, decision coverage, multiple condition coverage, and the more comprehensive MC/DC (Modified Condition/Decision Coverage).

The simplest method, of course, is line coverage, but as you have probably seen, tools measure this differently, so the coverage will be different. And executing lines of code doesn’t mean you’ve checked all the different things that can happen in that line of code. That’s why safety-critical standards like ISO 26262 for automotive functional safety and DO-178B/C for airborne systems require MC/DC.

Here’s a simple code example, assuming x, y, and z are booleans:

if ((x || y) && z) { doSomethingGood(); } else { doSomethingElse(); }

In this case, no matter what my values are, the line has been “covered.” Admittedly, this is a sloppy way to code by putting everything on one line, but you see the problem. And people actually write code this way. But let’s clean it up a bit.

if ((x || y) && z) {
    doSomethingGood();
} else {
    doSomethingElse(); /* because code should never doSomethingBad() */
}
A simple glance might lead me to the conclusion that I just need two tests – one that evaluates the entire expression to TRUE and executes doSomethingGood() (x=true, y=true, z=true), and another test that evaluates to FALSE and executes doSomethingElse() (x=false, y=false, z=false). Line coverage says we’re good to go, “Everything was tested.”

But wait a minute, there are different ways the main expression can be tested:

| Value of x | Value of y | Value of z | Value of decision |
|------------|------------|------------|-------------------|
| false      | false      | true       | false             |
| false      | true       | true       | true              |
| false      | true       | false      | false             |
| true       | false      | true       | true              |
This is a simple example, but it illustrates the point. I need 4 tests here to really cover the code properly, at least if I care about MC/DC coverage. Line coverage would have said 100% when I was half done. I’ll leave the longer explanation about the value of MC/DC for another time. The point here is that no matter what method you use to measure coverage, it’s important that what you’re validating through assertions is meaningful.
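Spelled out in code, those four rows become four separate test cases. Here’s a minimal sketch, assuming a hypothetical decide() method that wraps the expression from the example above:

```java
public class DecisionCoverage {
    // Hypothetical wrapper for the expression from the example above.
    static boolean decide(boolean x, boolean y, boolean z) {
        return (x || y) && z;
    }

    public static void main(String[] args) {
        // One check per row of the table: each condition is shown to
        // independently affect the decision, which is what MC/DC requires.
        assert !decide(false, false, true);  // row 1: decision is false
        assert  decide(false, true,  true);  // row 2: decision is true
        assert !decide(false, true,  false); // row 3: decision is false
        assert  decide(true,  false, true);  // row 4: decision is true
        System.out.println("All four MC/DC cases pass");
    }
}
```

Run with assertions enabled (java -ea), a line-coverage tool would have reported 100% after the first two cases, while the last two are still needed to exercise each condition independently.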

Improve Unit Testing for Java With Automation: Best Practices for Java Developers

Meaningless Auto-Generation

Another trap many fall into is to add an unsophisticated tool to automatically generate unit tests.

Simple test-generation tools create tests that execute code without any assertions. This keeps the tests from being noisy, but all it really means is that your application doesn’t crash. Unfortunately, this doesn’t tell you if the application is doing what it’s supposed to, which is very important.


The next generation of tools works by creating assertions based on any particular values they can automatically capture. However, if the auto-generation creates a ton of assertions, you end up with a ton of noise. There is no middle ground here. You either have something that is easy to maintain but meaningless or a maintenance nightmare that is of questionable value.

Many open-source tools that automatically generate unit tests look valuable at first because your coverage goes up very quickly. It’s in the maintenance that the real problems occur. Often, during development, developers will put in extra effort to fine-tune the auto-generated assertions to create what they think is a clean test suite. However, the assertions are brittle and do not adapt as the code changes. This means that developers must perform much of the “auto” generation over again the next time they release. Test suites are meant to be reused. If you can’t reuse them, you’re doing something wrong.

This also doesn’t cover the scarier idea that in the first run when you have high coverage, the assertions that are in the tests are less meaningful than they should be. Just because something can be asserted, doesn’t mean it should be, or that it’s even the right thing.

public class ListTest {
    private List<String> list = new ArrayList<>();

    @Test
    public void testAdd() {
        list.add("item"); // runs the code, but asserts nothing
    }
}
Ideally, the assertion is checking that the code is working properly, and the assertion will fail when the code is working improperly. It’s really easy to have a bunch of assertions that do neither, which we’ll explore below.
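Here’s a sketch of what such a do-nothing assertion looks like in practice, using a deliberately buggy abs() method invented for illustration:

```java
public class VacuousAssertion {
    // Deliberately buggy: should return the absolute value, but doesn't.
    static int abs(int x) {
        return x; // bug: negative inputs pass through unchanged
    }

    public static void main(String[] args) {
        int actual = abs(-5);
        // Vacuous: compares the code's output to itself, so it always
        // passes and never catches the bug.
        assert actual == abs(-5);
        // Meaningful: states the expected value independently.
        // assert abs(-5) == 5; // this one would fail here
    }
}
```

The vacuous assertion gives full coverage of abs() and a green test run, while the bug sails through; only the independently stated expectation would catch it.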

Raw Coverage Vs. Meaningful Tests

If you’re shooting for a high-coverage number at the expense of a solid, meaningful, clean test suite, you lose value. A well-maintained suite of tests gives you confidence in your code and is even the basis for quickly and safely refactoring. Noisy and/or meaningless tests mean that you can’t rely on your test suite, not for refactoring, and not even for release.

What happens when people measure their code, especially against strict standards, is that they find out coverage is lower than they want it to be. Often this leads to chasing the coverage number: let’s get the coverage up! That’s dangerous territory, whether it comes from an unreasonable belief that automated JUnit testing has created meaningful tests, or from hand-writing unit tests that have little meaning and are expensive to maintain.

In the real world, the ongoing costs of maintaining a test suite far outweigh the costs of creating unit tests, so it’s important that you create good clean unit tests in the beginning. You’ll know this because you’ll be able to run the tests all the time as part of your continuous integration (CI) process. If you only run the tests at release, it’s a sign that the tests are noisier than they should be. And ironically this makes the tests even worse because they’re not being maintained.

Software testing automation isn’t bad — in fact, it’s necessary, given the complexity and time pressures that are common today. But auto-generation of values is usually more hassle than it’s worth. Automation based on expanding values, monitoring real systems, and creating complex frameworks, mocks, and stubs provides more value than the mindless creation of assertions.

What Can You Do?


The first step is to measure and get a report on your current coverage; otherwise you won’t know where you stand or whether you’re improving. It’s important to measure all testing activities when doing this, including unit, functional, manual, etc., and aggregate the coverage properly. This way, you’ll be putting your effort where it has the most value – on code that isn’t tested at all, rather than code that is covered by your end-to-end testing but doesn’t happen to have a unit test. Parasoft can accurately aggregate code coverage from multiple runs and multiple types of tests to give you an accurate measure of where you’re at. For more on this, check out our whitepaper.

Comprehensive Code Coverage: Aggregate Coverage Across Testing Practices
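The core idea behind aggregation is simple: coverage from each testing practice is a set of executed code locations, and the aggregate is their union. This toy sketch (not Parasoft’s implementation) models each run as a set of line numbers for one file:

```java
import java.util.Set;
import java.util.TreeSet;

public class CoverageAggregation {
    // Sketch: each run reports the set of executed line numbers for one file.
    static Set<Integer> aggregate(Set<Integer> a, Set<Integer> b) {
        Set<Integer> merged = new TreeSet<>(a); // union of both runs, kept sorted
        merged.addAll(b);
        return merged;
    }

    public static void main(String[] args) {
        Set<Integer> unitRun = Set.of(10, 11, 12); // lines hit by unit tests
        Set<Integer> e2eRun  = Set.of(12, 13);     // lines hit by end-to-end tests
        // Aggregated coverage is the union; untested code is whatever is left over.
        System.out.println(aggregate(unitRun, e2eRun)); // prints [10, 11, 12, 13]
    }
}
```

Looking only at the unit-test run would flag line 13 as uncovered even though the end-to-end tests exercise it, which is exactly the misdirected effort aggregation avoids.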


Tools that create unit test skeletons for you are a good way to start. Make sure those tools connect to common mocking frameworks like Mockito and PowerMock, because real code is complicated and requires stubbing and mocking. But that’s not enough – you need to be able to:

  • Create meaningful mocks
  • Expand simple tests with bigger, broader data
  • Monitor a running application
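What a meaningful mock checks is the interaction with the dependency, not just that nothing crashed. Here’s a sketch using a hand-rolled recording stub (invented Mailer interface and notifyUser method) rather than Mockito, so the example stays self-contained; with Mockito you’d express the same check with a mock and verify():

```java
import java.util.ArrayList;
import java.util.List;

public class MeaningfulMockSketch {
    // Hypothetical collaborator the code under test depends on.
    interface Mailer {
        void send(String to, String body);
    }

    // Hypothetical code under test.
    static void notifyUser(String user, Mailer mailer) {
        mailer.send(user, "Your build passed");
    }

    public static void main(String[] args) {
        // A hand-rolled stub that records interactions.
        List<String> sentTo = new ArrayList<>();
        Mailer recording = (to, body) -> sentTo.add(to);

        notifyUser("dev@example.com", recording);

        // A meaningful assertion checks the interaction itself:
        // the right recipient was actually sent a message.
        assert sentTo.equals(List.of("dev@example.com"));
    }
}
```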

Intelligent Assistance

You can do all of these things manually, but it takes too much time and effort. This is an excellent place to leverage automation – for example, Parasoft’s unit testing solution provides automatic recommendations in real time in the IDE, integrating with open source frameworks (JUnit, Mockito, PowerMock, etc.) to help you create, scale, and maintain your JUnit test suite and provide broader coverage. If you’re curious about this technology, you can learn more about why people hate unit testing and how to bring back the love.


If coverage is an issue for your project, make sure you’re measuring it right, and measuring ALL of it from all the tests you run. And as you start expanding your coverage with unit tests, you can leverage guided test creation to quickly create and expand your tests to get meaningful, maintainable code coverage. Parasoft Jtest will create tests that are maintainable as your code grows and changes, so you’re not doing the same work over and over again.


Automate JUnit test creation and start to love unit testing.
Try Parasoft Jtest Now
Written by

Arthur Hicken

Arthur has been involved in software security and test automation at Parasoft for over 25 years, helping research new methods and techniques (including 5 patents) while helping clients improve their software practices.