Testing Guidelines

Motivation

Tests are vital for software reliability and maintainability. Writing tests requires additional effort now, but saves considerable time in the long run. Tests enable us to modify code and quickly discover when we introduce errors [1]. Tests also provide future contributors with examples of how functions and classes were originally intended to be used.

Tests should be readable and maintainable. Well-written tests are easier to understand and modify when the behavior of a function or method is changed. Consequently, tests should be held to the same code quality standards as the rest of the package.

When bugs are discovered, they should be turned into test cases to prevent the bug from emerging again in the future [2].

Overview

Pull requests that create or change functionality must include tests and documentation before being merged.

PlasmaPy uses `pytest <https://docs.pytest.org>`_ for software testing. The test suite may be run locally or automatically via pull requests on GitHub. PlasmaPy undergoes continuous integration testing of the code base by Travis CI and AppVeyor, including code examples in docstrings. Codecov performs test coverage checks and shows whether or not each line of code is run during the test suite. CircleCI tests that the documentation can be successfully built. The results of the documentation test builds are displayed using Giles. PlasmaPy’s test suite is automatically run whenever a pull request to the main repository is made or updated.

Running Tests

Running tests on GitHub

The recommended way to run PlasmaPy’s full test suite when contributing code is to create a pull request from your development branch to PlasmaPy’s GitHub repository. The test suite will be run when the pull request is created and every time your development branch is subsequently updated.

Travis CI and AppVeyor run code tests and check that code examples in docstrings produce the expected output. Travis CI runs the tests in a Linux/macOS environment whereas AppVeyor runs the tests in a Windows environment.

The results from Travis CI are used to generate test coverage reports which are displayed by Codecov. These reports show which lines of code are covered by tests and which are not, and allow us to write targeted tests to fill in the gaps in test coverage. The results displayed by Codecov will be marked as passing when the code coverage is sufficiently high.

CircleCI performs a test build of the documentation in both HTML and LaTeX formats, and reports any errors that arise.

If any inconsistencies with the PEP 8 style guide are found, then pep8speaks will comment on the pull request and update that comment as the pull request is updated.

Running tests from the command line

The recommended method for running the test suite locally on your computer is running

python setup.py test

in the repository’s root directory. This command will run all of the tests and verify that code examples in docstrings produce the expected output. Because pytest is integrated with setuptools, this command also ensures that the package is set up correctly. These tests should be run in a Python environment in which PlasmaPy has not already been installed.

Command line options for pytest may be passed using the -a flag. For example, if you want to stop pytest after two test failures, return short traceback reports, and run tests only if the test path contains plasma and not blob, then run

python setup.py test -a "--maxfail=2 --tb=short -k 'plasma and not blob'"

One may also run pytest from the command line.

Some tests in the test suite can take a long time to run, which can slow down development. These tests are marked with the @pytest.mark.slow decorator. To skip them, execute pytest -m 'not slow'. To run only the slow tests, execute pytest -m slow.
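For illustration, a long-running test could be marked as in the following sketch (the function name and the time.sleep call are placeholders rather than an actual PlasmaPy test).

import time

import pytest

@pytest.mark.slow
def test_expensive_simulation():
    """Stand-in for a test that takes a long time to run."""
    time.sleep(5)  # placeholder for a genuinely slow computation
    assert True

With this marker in place, pytest -m 'not slow' skips the test and pytest -m slow runs it.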

Running tests within Python

After installing PlasmaPy by running pip install plasmapy or python setup.py install, PlasmaPy’s test suite may be run using

>>> import plasmapy
>>> plasmapy.test() 

Writing Tests

Pull requests must include tests of new or changed functionality before being merged.

Best practices for writing tests

The following guidelines are helpful suggestions for writing readable, maintainable, and robust tests.

  • Each function and method should have unit tests that check that it returns the expected results, issues the appropriate warnings, and raises the appropriate exceptions.
  • Bugs should be turned into test cases.
  • Tests are run frequently during code development, and slow tests may interrupt a contributor’s flow. Tests should be minimal, sufficient to cover the intended behavior, and as efficient as possible.
  • Slow tests can be annotated with @pytest.mark.slow when they cannot be made more efficient.

Test organization and collection

Pytest has certain test discovery conventions that are used to collect the tests to be run.

The tests for each subpackage are contained in a tests subfolder. For example, the tests for particles are located in plasmapy/particles/tests. Test files should begin with test_ and generally contain the name of the module or object that is being tested (e.g., test_atomic.py).

The functions to be run as tests within each file should likewise begin with test_. Tests may also be grouped into classes. In order for pytest to find tests in classes, the class name should start with Test and the methods to be run as tests should start with test_. For example, test_particle_class.py could define the TestParticle class containing the method test_integer_charge.
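The sketch below (with placeholder assertions rather than PlasmaPy’s actual tests) shows which definitions pytest would collect from such a file.

# Hypothetical contents of plasmapy/particles/tests/test_particle_class.py

def test_module_level_behavior():
    # Collected: the file name and the function name both start with test_.
    assert 1 + 1 == 2

class TestParticle:
    # Collected: the class name starts with Test.
    def test_integer_charge(self):
        # Collected: the method name starts with test_.
        assert -1 < 0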

Assert statements

Pytest often runs tests by checking assert statements.
def test_addition():
    assert 2 + 2 == 4

When an assert statement fails, pytest will display the values of the expressions evaluated in that statement. This automatic output is sufficient for simple tests like the one above. For more complex tests, we can add a descriptive error message to provide context and help pinpoint the causes of test failures more quickly.

def test_addition():
    assert 2 + 2 == 4, "Addition is broken. Reinstall universe and reboot."

To make the error message easier to read, the values of variables can be included in it by using f-strings.

def test_addition():
    result = 2 + 2
    expected = 4
    assert result == expected, f"2 + 2 returns {result} instead of {expected}."

Floating point comparisons

Comparing floating point numbers with == is fraught with peril because of limited precision and rounding errors. Moreover, the values of fundamental constants in astropy.constants are occasionally refined as improvements become available.

Using numpy.isclose when comparing floating point numbers and astropy.units.isclose for astropy.units.Quantity instances lets us avoid these difficulties. The rtol keyword for each of these functions allows us to set an acceptable relative tolerance. Ideally, rtol should be set to be an order of magnitude or two greater than the expected uncertainty. For mathematical functions, a value of rtol=1e-14 may be appropriate. For quantities that depend on physical constants, a value between rtol=1e-8 and rtol=1e-5 may be required, depending on how much the accepted values for fundamental constants are likely to change. For comparing arrays, numpy.allclose and astropy.units.allclose should be used instead.
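The following sketch shows these tolerance-based comparisons in practice (the values and tolerances are illustrative only).

import numpy as np
from astropy import units as u

def test_scalar_close():
    # A comparison with == would fail: 0.1 + 0.2 is not exactly 0.3 in floating point.
    assert np.isclose(0.1 + 0.2, 0.3, rtol=1e-14)

def test_quantity_close():
    # astropy.units.isclose converts units before comparing.
    assert u.isclose(1000 * u.m, 1 * u.km, rtol=1e-8)

def test_array_close():
    assert np.allclose([0.1 + 0.2, 1.0], [0.3, 1.0], rtol=1e-14)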

Testing warnings and exceptions

Robust testing frameworks should test that functions and methods return the expected results, issue the expected warnings, and raise the expected exceptions. Pytest provides functionality for testing both warnings and exceptions.

To test that a function issues an appropriate warning, use pytest.warns.

import pytest
import warnings

def issue_warning():
    warnings.warn("Beware the ides of March", UserWarning)

def test_issue_warning():
    with pytest.warns(UserWarning):
        issue_warning()

To test that a function raises an appropriate exception, use pytest.raises.

def raise_exception():
    raise Exception

def test_raise_exception():
    with pytest.raises(Exception):
        raise_exception()

If raise_exception() does not raise an exception, then pytest.raises will cause the test to fail.

Test independence and parametrization

In this section, we’ll discuss the issue of parametrization based on an example of a proof of Gauss’s class number conjecture.

The proof goes along these lines:

  • If the generalized Riemann hypothesis is true, the conjecture is true.
  • If the generalized Riemann hypothesis is false, the conjecture is also true.
  • Therefore, the conjecture is true.

One way to use pytest would be to write sequential assert statements in a single test function.

def test_proof_by_riemann_hypothesis():
     assert proof_by_riemann(False)
     assert proof_by_riemann(True)  # only run if the previous assert passes

If the first assert fails, then the second will never be run, and we would not know the potentially useful result of that second check. This drawback can be avoided by writing independent tests that will both be run.

def test_proof_if_riemann_false():
     assert proof_by_riemann(False)

def test_proof_if_riemann_true():
     assert proof_by_riemann(True)

However, this approach can lead to cumbersome, repeated code if you are calling the same function over and over. If you wish to run multiple tests for the same function, the preferred method is to use pytest’s parametrization capabilities.

@pytest.mark.parametrize("truth_value", [True, False])
def test_proof_if_riemann(truth_value):
     assert proof_by_riemann(truth_value)

This code snippet will run proof_by_riemann(truth_value) for each truth_value in the parametrized list [True, False]. Both checks will be run regardless of failures. This approach is much cleaner for long lists of arguments, and has the advantage that the function call only needs to be changed in one place if something changes.

With qualitatively different tests you would use either separate functions or pass in tuples containing inputs and expected values.

@pytest.mark.parametrize("truth_value, expected", [(True, True), (False, True)])
def test_proof_if_riemann(truth_value, expected):
     assert proof_by_riemann(truth_value) == expected

Pytest helpers

A robust testing framework should test not just that functions and methods return the expected results, but also that they issue the expected warnings and raise the expected exceptions. In PlasmaPy, tests often need to compare a float against a float, an array against an array, and Quantity objects against other Quantity objects to within a certain tolerance. Occasionally tests will be needed to make sure that a function will return the same value for different arguments (e.g., due to symmetry properties). PlasmaPy’s utils subpackage contains the run_test and run_test_equivalent_calls helper functions that can generically perform many of these comparisons and checks.

The run_test function can be used to check that a callable object returns the expected result, raises the expected exception, or issues the expected warning for different positional and keyword arguments. This function is particularly useful for unit testing straightforward functions when you have many sets of inputs with known expected results.

Suppose that we want to test the trigonometric property that

\[\sin(\theta) = \cos\left(\frac{\pi}{2} - \theta\right).\]

We may use run_test as in the following example to check the case of \(\theta \equiv 0\).

from numpy import sin, cos, pi
from plasmapy.utils.pytest_helpers import run_test

def test_trigonometric_properties():
    run_test(func=sin, args=0, expected_outcome=cos(pi/2), atol=1e-16)

We may use pytest.mark.parametrize with run_test to check multiple cases. If run_test only receives one positional argument that is a list or tuple, then it will assume that list or tuple contains the callable, the positional arguments, the keyword arguments (which may be omitted), and the expected outcome (which may be the returned object, a warning, or an exception).

@pytest.mark.parametrize("input_tuple", [(sin, 0, cos(pi/2)), (sin, '.', TypeError)])
def test_trigonometry(input_tuple):
    run_test(input_tuple, atol=1e-16)

This parametrized function will check that sin(0) is within 1e-16 of cos(pi/2) and that sin('.') raises a TypeError.

We may use run_test_equivalent_calls to check symmetry properties such as

\[\cos(\theta) = \cos(-\theta).\]

This property can be checked for \(\theta = 1\) with the following code.

def test_cosine_symmetry():
    """Test that cos(1) equals cos(-1)."""
    plasmapy.utils.run_test_equivalent_calls(cos, 1, -1)

We may also use pytest.mark.parametrize with run_test_equivalent_calls to sequentially test multiple symmetry properties.

@pytest.mark.parametrize('input_tuple', [(cos, 1, -1), ([cos, pi/2], [sin, 0])])
def test_symmetry_properties(input_tuple):
    plasmapy.utils.run_test_equivalent_calls(input_tuple, atol=1e-16)

This parametrized function will check that cos(1) is within 1e-16 of cos(-1), and that cos(pi/2) is within 1e-16 of sin(0).

Please refer to the documentation for run_test and run_test_equivalent_calls to learn about the full capabilities of these pytest helper functions (including for testing functions that return Quantity objects).

Warning

The API within pytest_helpers is not yet stable and may change in the near future.

Fixtures

Fixtures provide a way to set up well-defined states in order to have consistent tests. We recommend using fixtures for complex tests that would be unwieldy to set up with parametrization as described above.
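As a minimal sketch (the fixture name and array contents are illustrative, not taken from PlasmaPy), a fixture can provide the same well-defined object to several tests:

import numpy as np
import pytest

@pytest.fixture
def sample_velocities():
    """Provide a consistent, well-defined state shared by several tests."""
    return np.array([0.0, 0.5, 1.0])

def test_velocities_are_sorted(sample_velocities):
    assert np.all(np.diff(sample_velocities) >= 0)

def test_velocities_are_finite(sample_velocities):
    assert np.all(np.isfinite(sample_velocities))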

Code Coverage

PlasmaPy uses Codecov to show what lines of code are covered by the test suite and which lines are not. At the end of every Travis CI testing session, information on which lines were executed is sent to Codecov. Codecov comments on the pull request on GitHub with a coverage report.

Test coverage of contributed code

Code contributions to PlasmaPy are required to be well-tested. As a rule of thumb, new code should have a test coverage percentage at least as high as the package’s current overall coverage. Tests must be provided in the original pull request, because a delayed test often ends up being a test that never gets written. There is no strict cutoff for how high the code coverage must be in order to be acceptable, and it is not always necessary to cover every line of code. For example, it is often helpful for methods that raise a NotImplementedError to remain marked as untested as a reminder of unfinished work.

Occasionally there will be some lines that do not require testing. For example, testing exception handling for an ImportError when importing an external package would usually be impractical. In these instances, we may end a line with # coverage: ignore to indicate that these lines should be excluded from coverage reports (or add a line to .coveragerc). This strategy should be used sparingly, since it is often better to explicitly test exceptions and warnings and to show the lines of code that are not tested.

Generating coverage reports locally

Coverage reports may be generated on your local computer by running

python setup.py test --coverage
coverage html

The coverage reports may be accessed by opening the newly generated htmlcov/index.html in your favorite web browser. These commands require the pytest and coverage packages to be installed.

Ignoring lines in coverage tests

Occasionally there will be lines of code that do not require tests. For example, it would be impractical to test that an ImportError is raised when running import plasmapy from Python 2.7.

To ignore a line of code in coverage tests, append it with # coverage: ignore. If this comment is used on a line with a control flow structure (e.g., if, for, or while) that begins a block of code, then all lines in that block of code will be ignored. In the following example, the except line and the raise statement within it will be ignored in coverage tests.

try:
    import numpy
except ModuleNotFoundError as exc:  # coverage: ignore
    raise RuntimeError from exc

The .coveragerc file is used to specify lines of code and files that should always be ignored in coverage tests.

Note

In general, untested lines of code should remain marked as untested to give future developers a better idea of where tests should be added in the future and where potential bugs may exist.

Footnotes

[1] In Working Effectively With Legacy Code, Michael Feathers bluntly writes: “Code without tests is bad code. It doesn’t matter how well written it is; it doesn’t matter how pretty or object-oriented or well-encapsulated it is. With tests, we can change the behavior of our code quickly and verifiably. Without them, we really don’t know if our code is getting better or worse.”
[2] In the chapter “Bugs Are Missing Tests” in Beyond Legacy Code, David Bernstein writes: “Every bug exists because of a missing test in a system. The way to fix bugs using TDD [test-driven development] is first write a failing test that represents the bug and then fix the bug and watch the failing test turn green.”