How to Test Your Dragon: Breaking Down a Team Test Strategy for Distributed Systems, Part 2

Created by: Richard Kevin Kabiling · 19 min read
Dec 12, 2022

Previously

In part one of the series, we established that while automation is an effective tool for boosting software quality and team performance, testing the system unsystematically can introduce effects that reduce that effectiveness.

We also dissected an application and its backing system down to a plausible architecture of a single system component to better understand a system under test and the different integration tiers within an architecture.

Finally, we discussed a high-level strategy that lends direction and guidance to a more detailed testing strategy.

We are now ready to move forward and dive into the specifics!

Strategy in Detail

As previously underlined, the test pyramid provides an excellent structure for our test strategy because it effectively establishes the relationship between the component integration spectrum and test development and maintenance cost.

Aligning with the test pyramid, we will further group these tests into low-level and high-level tests.

A comparison of low-level and high-level tests

|  | Low-Level Tests | High-Level Tests |
| --- | --- | --- |
| Test Object | parts of an application via direct code invocation | one or more integrated system components (running applications) via their interfaces and environments |
| Coverage Focus | line coverage, branch coverage | acceptance coverage, technical acceptance coverage |
| Test Types | unit tests; white box integration tests (e.g., application white box integration tests, infrastructure white box integration tests) | black box integration tests (or acceptance tests) distributed across test suites that focus on discrete levels of integration (e.g., system component black box integration, system end-to-end black box integration, system end-to-end with UI black box integration) |
| Environment and Dependencies | locally or in CI; dependencies in code are mocked unless mocking proves to be too expensive, in which case collaborators are emulated or stubbed (i.e., infrastructure dependencies) | locally or in CI; dependencies external to the scope and environment are emulated or stubbed. In a test environment, a subset of the tests may optionally be run; in a production environment, an even smaller subset may optionally be run as smoke or sanity tests |

📘

The term integration test is used very liberally in the community and can mean a lot of things:

  • testing two classes
  • testing a function or a method integrated with a channel
  • testing a function or a method integrated with an infrastructure component
  • testing a fully integrated application or component
  • testing multiple fully integrated components

While these are all valid examples of integration tests, the ambiguity does not bring much to the discourse. To be specific, I will use qualifiers to communicate better what I mean whenever I use the term integration test.

Please see the examples shown below from this repository.

Low-Level Tests

Low-level tests interact directly with the code components that make up a single application. These code components are very low-level: classes, functions, and combinations of them. In the context of hexagonal architecture, this means testing the different code components found in each "layer" of the architecture. Specifically, the strategy recommends these tests:

  • unit tests
  • white box integration tests

Because we are only testing parts of the application, these tests are generally:

  • very fast
  • very simple
  • very cheap to write

Structurally, white box tests are characterized by one or both of the following:

  • direct function or method invocation
  • usage of mocks or fakes

Most importantly, these tests focus on line and branch coverage of all code paths written by the engineers (emphasis on written by the engineers). This guarantees validity at the lowest levels of the system.

Core Domain

    @Test
    void throwsWhenAccountNotFound() {
        var currency = Currency.getInstance("PHP");

        var accountId = new AccountId(UUID.randomUUID().toString());
        when(retrieveAccountPort.findAccount(accountId)).thenReturn(Mono.empty());

        var merchant = merchant(currency, 200L);
        when(retrieveMerchantPort.findMerchant(merchant.id())).thenReturn(Mono.just(merchant));

        var command = new PayCommand(
                accountId,
                merchant.id(),
                500,
                currency
        );
        var throwable = catchThrowableOfType(() -> payUseCase.pay(command).block(), SourceAccountNotFoundException.class);

        assertThat(throwable.getSourceId()).isEqualTo(accountId);
    }

An example unit test verifying that the pay function throws a SourceAccountNotFoundException

The core domain holds the main business logic of the application. Mainly, this means a lot of the work here involves validation, processing, and orchestration instead of integration. Therefore, if done right, it will most likely contain the bulk of the logic written by the team. Because of this, most tests in the core domain layer are unit tests, i.e., fast, isolated, portable tests that invoke explicit functions, with many of the dependencies mocked, faked, or stubbed.

@SpringBootTest(classes = {
        PaymentService.class,
        ValidationAutoConfiguration.class
})
class PayUseCaseTest {
    ...
    
    @ParameterizedTest
    @MethodSource
    void throwsWhenInvalidParameters(PayCommand command, String messageSegment) {
        var throwable = catchThrowableOfType(() -> payUseCase.pay(command), ConstraintViolationException.class);

        assertThat(throwable).hasMessageContaining(messageSegment);
    }

    static Stream<Arguments> throwsWhenInvalidParameters() {
        var sourceId = new AccountId(UUID.randomUUID().toString());
        var merchantId = new MerchantId(UUID.randomUUID().toString());
        var amount = 0;
        var currency = Currency.getInstance("PHP");

        return Stream.of(
                of(null, "pay.command: must not be null"),
                of(new PayCommand(null, merchantId, amount, currency), "pay.command.sourceId: must not be null"),
                of(new PayCommand(sourceId, null, amount, currency), "pay.command.merchantId: must not be null"),
                of(new PayCommand(sourceId, merchantId, -1L, currency), "pay.command.amount: must be greater than or equal to 0"),
                of(new PayCommand(sourceId, merchantId, amount, null), "pay.command.currency: must not be null"),
                of(new PayCommand(new AccountId(null), merchantId, amount, currency), "pay.command.sourceId.value: must not be empty"),
                of(new PayCommand(new AccountId(""), merchantId, amount, null), "pay.command.sourceId.value: must not be empty"),
                of(new PayCommand(sourceId, new MerchantId(null), amount, null), "pay.command.merchantId.value: must not be empty"),
                of(new PayCommand(sourceId, new MerchantId(""), amount, null), "pay.command.merchantId.value: must not be empty")
        );
    }
}

An example white box integration test that loads validation configuration and tests validation annotations using parameterized tests

Where there is functionality that we can't explicitly test using unit tests, such as annotation-based validation in Java and Spring (JSR-303), we may write additional white box integration tests, i.e., white box tests (as defined above) that also bring other active components and configurations into the mix during testing.

📘

These white box integration tests tend to blur the line between integration and unit tests. Thus, in some schools of thought, they are still considered unit tests. To be explicit, I have chosen to stick with the term integration test.

We may also introduce additional unit tests to bolster coverage for utilities, strategies, and similar constructs.

Infrastructure Adapters

    static {
        TestEnvironment.start();
    }

    @Test
    void retrievesWhenExists() {
        var currency = Currency.getInstance("PHP");
        var id = new AccountId(UUID.randomUUID().toString());
        var record = new AccountRecord(id.value(), 2000L, currency.getCurrencyCode(), currency.getDefaultFractionDigits(), null);
        repository.save(record).block();

        var result = port.findAccount(id).block();

        var expected = new Account(id, 2000L, currency, 0L);
        assertThat(result).usingRecursiveComparison()
                .isEqualTo(expected);
    }

An infrastructure white box integration test that connects directly with a database (using Docker)

The infrastructure layer of the hexagonal architecture focuses on integration with infrastructure components and other services such as databases, queues, and 3rd party APIs. Typically, this involves the adapter converting domain objects into requests and dispatching them to SDKs to perform their tasks.

The preferred tests for the infrastructure layer are infrastructure white box integration tests, largely because SDKs are notorious for being very difficult to mock (case in point: the AWS SDK); mocking to this degree is tedious, challenging to follow, and unreadable.

Let's break this down. In the implementation of infrastructure white box integration tests, we prefer the following:

  • The methods of the adapter are invoked directly; for simplicity, tests may be limited to the primary adapter interface and should already cover most of the orchestration, if any.
  • The infrastructure layer connects to a local emulation of the infrastructure to keep things portable and predictable.
  • The return values from the function invocation are verified.
  • The state of the emulated infrastructure is verified.

See the following examples.

| Test Object | Test |
| --- | --- |
| UserRepositoryAdapter.save(...) invokes multiple repository instances that map directly to PostgreSQL tables (UserRepository, AddressRepository, ContactDetailsRepository, FriendshipRepository) to save a User. | The test starts up an embedded HSQL database with PostgreSQL emulation (or a PostgreSQL container in docker compose). The test invokes the save method directly to save a user. The test queries PostgreSQL directly via some existing infrastructure code or framework to verify that the User is saved. The test cleans up the environment. |
| ForeignExchangeServiceAdapter.list invokes a 3rd party API to list current foreign exchange rates. | The test starts an embedded Wiremock instance (or a Wiremock container in docker compose) and stubs the return values. The test invokes the list method directly to retrieve rates. The test validates that the returned list is as expected. The test cleans up the environment. |
| FileStorageAdapter.upload uploads a file to AWS S3. | The test starts up a localstack instance in docker compose and creates the appropriate buckets. The test invokes the upload method directly. The test validates that the file was indeed uploaded to localstack S3. The test cleans up the environment. |

A table illustrating examples of white box integration tests for the infrastructure adapters
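To make the second row concrete, here is a minimal sketch of an infrastructure white box integration test against an embedded Wiremock server. The ForeignExchangeServiceAdapter constructor, the /rates endpoint, and the ForeignExchangeRate type are hypothetical stand-ins for the real adapter and API; only the Wiremock, JUnit, and AssertJ calls are actual library APIs.

import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.okJson;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;
import static com.github.tomakehurst.wiremock.core.WireMockConfiguration.options;
import static org.assertj.core.api.Assertions.assertThat;

import java.math.BigDecimal;

import com.github.tomakehurst.wiremock.WireMockServer;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

class ForeignExchangeServiceAdapterTest {

    // A local emulation of the 3rd party API keeps the test portable and predictable.
    static WireMockServer wireMock = new WireMockServer(options().dynamicPort());

    static ForeignExchangeServiceAdapter adapter;

    @BeforeAll
    static void startServer() {
        wireMock.start();
        // Point the adapter at the emulated API instead of the real 3rd party endpoint.
        adapter = new ForeignExchangeServiceAdapter("http://localhost:" + wireMock.port());
    }

    @AfterAll
    static void stopServer() {
        wireMock.stop();
    }

    @Test
    void listsRates() {
        // Stub the 3rd party endpoint with a canned response.
        wireMock.stubFor(get(urlEqualTo("/rates"))
                .willReturn(okJson("[{\"pair\":\"USDPHP\",\"rate\":55.75}]")));

        // Invoke the adapter method directly.
        var rates = adapter.list();

        // Validate the values returned by the adapter.
        assertThat(rates).containsExactly(new ForeignExchangeRate("USDPHP", new BigDecimal("55.75")));
    }
}

A hypothetical Wiremock-backed infrastructure white box integration test for the foreign exchange adapter; the other rows follow the same shape, swapping Wiremock for an embedded database or a localstack container and verifying the emulated environment's state instead of (or in addition to) the return value.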

🚧

Spinning up large infrastructure components during the test may take time. To maintain a low execution duration, the tests should spin up the environments carefully and cache them across the test lifecycle.
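One way to achieve this, for example, is a singleton holder that starts the emulated infrastructure once per JVM and is shared by every test class, which is what the TestEnvironment.start() call in the example above hints at. Below is a minimal sketch assuming Testcontainers and a PostgreSQL-backed adapter; the exported property names are assumptions about how the application under test is configured.

import org.testcontainers.containers.PostgreSQLContainer;

public final class TestEnvironment {

    // Started lazily and shared across all test classes in the same JVM,
    // so the container startup cost is paid only once per test run.
    private static final PostgreSQLContainer<?> POSTGRES =
            new PostgreSQLContainer<>("postgres:15-alpine");

    public static synchronized void start() {
        if (!POSTGRES.isRunning()) {
            POSTGRES.start();
            // Expose the emulated database to the code under test; the property names
            // here are assumptions and depend on how the adapter is configured.
            System.setProperty("db.url", POSTGRES.getJdbcUrl());
            System.setProperty("db.username", POSTGRES.getUsername());
            System.setProperty("db.password", POSTGRES.getPassword());
        }
        // Testcontainers' reaper removes the container when the JVM exits,
        // so no explicit teardown is needed here.
    }

    private TestEnvironment() {
    }
}

A sketch of a shared test environment holder that caches an emulated database across the test lifecycle.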

Similar to the core domain, we may introduce additional unit tests to bolster coverage for utilities, strategies, and similar constructs. Furthermore, while not recommended, we may add more unit tests to the adapter and its dependencies if coverage is insufficient.

Application Adapters

    @Test
    void returnsErrorOnSourceNotFound() {
        var sourceId = new AccountId(UUID.randomUUID().toString());
        var merchantId = new MerchantId(UUID.randomUUID().toString());
        var amount = 500L;
        var currency = Currency.getInstance("PHP");
        var command = new PayCommand(sourceId, merchantId, amount, currency);

        given(payUseCase.pay(command)).willThrow(new SourceAccountNotFoundException(sourceId));
        var request = new PaymentRequest(sourceId.value(), merchantId.value(), new BigDecimal("5.00"), currency);
        var result = webClient.post()
                .uri("/payments")
                .body(Mono.just(request), PaymentRequest.class)
                .exchange()
                .expectStatus().isEqualTo(422)
                .returnResult(ErrorResponse.class)
                .getResponseBody()
                .blockFirst();

        assertThat(result.code()).isEqualTo("SourceAccountNotFoundException");
        assertThat(result.message()).isEqualTo("Source account not found");
        assertThat((result.details().get("sourceId"))).isEqualTo(sourceId.value());
    }

An example application integration test that tests the POST /payments endpoint and uses a mocked use case

The application layer of the hexagonal architecture focuses on getting input from various channels and dispatching them as commands to the core domain layer for processing. The abstraction of an input adapter also sometimes involves middlewares, handlers, decorators, or filters apart from the adapter itself. These structures augment the adapter's behavior to add common concerns like parsing, security, logging, validation, error handling, etc.

The preferred tests for the application layer are application white box integration tests. These tests go through the integrating channel to test the functionality offered by the input adapters and all the configured middleware. Because the domain is not the prime concern here, we prefer to mock its use cases.

More specifically, we very strongly recommend the following when implementing application white box integration tests:

  • The adapter is invoked via the channel it integrates with, to ensure that relevant middlewares and handlers are invoked. For simplicity, the tests may also cover the dependencies of the adapter.
  • The core domain components that the application adapter drives are mocked.
  • The return values from the channel are verified.
  • The state of the mock is verified.

See the following examples.

| Test Object | Test |
| --- | --- |
| A POST /users HTTP endpoint invokes SaveUserUseCase.save(…) to save users. | The SaveUserUseCase is mocked. The HTTP server hosting the endpoint is started. The test invokes the POST /users endpoint via HTTP. The test validates the HTTP response. The test validates the mock state. |
| A PaymentEventListener listens to payment events from the payment-events topic and invokes SummarizePaymentStatsUseCase.summarize to compute summary statistics of payments. | SummarizePaymentStatsUseCase is mocked. The Kafka server is started. The Kafka PaymentEventListener is started. The test dispatches a payment event to the payment-events topic. The test validates the mock state. |

A table illustrating examples of white box integration tests for the application adapters
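As an illustration, the second row could look roughly like the sketch below, assuming Spring Boot, spring-kafka-test's embedded broker, and Mockito. PaymentEventListener, PaymentEvent, and SummarizePaymentStatsUseCase are taken from the table, but their exact shapes (and the JSON serialization of the event) are assumptions.

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.timeout;
import static org.mockito.Mockito.verify;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.test.context.EmbeddedKafka;

@SpringBootTest
@EmbeddedKafka(partitions = 1, topics = "payment-events",
        bootstrapServersProperty = "spring.kafka.bootstrap-servers")
class PaymentEventListenerTest {

    // The core domain is not the concern here, so the use case is mocked.
    @MockBean
    SummarizePaymentStatsUseCase summarizePaymentStatsUseCase;

    @Autowired
    KafkaTemplate<String, PaymentEvent> kafkaTemplate;

    @Test
    void summarizesOnPaymentEvent() {
        var event = new PaymentEvent("payment-1", 500L, "PHP");

        // Dispatch a payment event through the real channel (the embedded broker).
        kafkaTemplate.send("payment-events", event.id(), event);

        // Validate the mock state: the listener consumed the event and drove the use case.
        verify(summarizePaymentStatsUseCase, timeout(5000)).summarize(any());
    }
}

A hypothetical application white box integration test for a Kafka listener that mocks the core domain use case.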

🚧

Spinning up listeners, servers, and infrastructure components during the test may take time. To keep execution time low, the spinning up of these listeners, servers, and components should be carefully planned and cached across the test lifecycle.

Similar to the core domain, we may introduce additional unit tests to bolster coverage for utilities, strategies, and similar constructs. Furthermore, while not recommended, more unit tests may be added to the adapter and its dependencies in case coverage is insufficient.

Utilities, Strategies, and Similar Constructs

    @ParameterizedTest
    @MethodSource("source")
    void areConvertedFromBigDecimalToLong(Currency currency, long amountInLong, BigDecimal amountInBigDecimal) {
        assertThat(Amounts.fromAdjustedAmount(currency, amountInLong)).isEqualTo(amountInBigDecimal);
    }

    @ParameterizedTest
    @MethodSource("source")
    void areConvertedFromLongToBigDecimal(Currency currency, long amountInLong, BigDecimal amountInBigDecimal) {
        assertThat(Amounts.toAdjustedAmount(currency, amountInBigDecimal)).isEqualTo(amountInLong);
    }

    public static Stream<Arguments> source() {
        return Stream.of(
                of(Currency.getInstance("PHP"), 4500, new BigDecimal("45.00")),
                of(Currency.getInstance("JPY"), 45, new BigDecimal("45")),
                of(Currency.getInstance("JOD"), 45000, new BigDecimal("45.000")),
                of(Currency.getInstance("PHP"), 500, new BigDecimal("5.00")),
                of(Currency.getInstance("JPY"), 500, new BigDecimal("500")),
                of(Currency.getInstance("JOD"), 500, new BigDecimal("0.500"))
        );

    }

Example parameterized unit tests for a utility function used for converting from BigDecimal to Long based on the currency.

As previously mentioned, some functionalities are abstracted within the application into utility functions, strategies, or similar constructs. The general recommendation for these is to unit test them accordingly.

Low-Level Tests in Summary

Low-level tests are white box tests that focus on code. More specifically, they should:

  • test different parts of the application based on the architecture (core domain layer, application layer, infrastructure layer)
  • test code that integrates with mocks, emulators, and stubs
  • focus on line and branch coverage, not necessarily on high-level requirements

These are the low-level tests we discussed.

|  | Unit | Application White Box Integration | Infrastructure White Box Integration |
| --- | --- | --- | --- |
| Application Section | core domain; utilities, strategies, and similar components | application adapters | infrastructure adapters |
| Environment and Dependencies | dependencies are mocked | core domain use cases are mocked | environment is emulated |
| Test Specifics | functions are invoked directly; validated against return value and mock states | invoked via channel or protocol; validated against channel response and mock states | functions are invoked directly; validated against return value and emulated environment state |

High-Level Tests

High-level tests are all black box integration tests that interact directly with one or more real running applications, their exposed interfaces, and their environments – in contrast with low-level tests that directly interact with code.

Characteristics

Because we are dealing with one or more applications at a time, these high-level black box integration tests generally:

  • have larger environments
  • take more resources
  • take more time to execute
  • require more orchestration

As previously established, because of these reasons, these tests cost more and thus should trend toward a smaller number.

Structurally, these high-level black box integration tests are characterized by the following:

  • they interact with the application via its interfaces and its environments
  • they spin up real running applications for the objects under test and their emulated environments

Despite this, it is of utmost importance to keep the tests completely runnable locally and consequently in CI to keep them portable, consistent, and reliable.

📘

While it is mandated that high-level tests be runnable locally and in CI, they may be engineered so that a subset can be run against a test environment and an even smaller subset against production as smoke or sanity tests. In summary:

  • tests should be runnable locally
  • tests should be runnable in a CI environment
  • a subset of tests may be configured to run against a test environment
  • an even smaller subset of tests may be configured to run against production as smoke or sanity tests
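How those subsets are selected is left to the team. One lightweight option, sketched below with JUnit 5 tags, is to tag scenarios by the most integrated environment they are safe to run against; the tag names and empty scenario bodies are purely illustrative.

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class PaymentAcceptanceTest {

    @Test
    void paymentIsRejectedWhenAccountDoesNotExist() {
        // full acceptance scenario: runs locally and in CI only
    }

    @Test
    @Tag("test-environment")
    void paymentIsAccepted() {
        // subset that is also safe to run against a deployed test environment
    }

    @Test
    @Tag("smoke")
    void paymentsEndpointIsReachable() {
        // an even smaller, side-effect-free subset that can run against production as a smoke test
    }
}

A build-tool filter then selects the right subset per pipeline stage, e.g., Gradle's useJUnitPlatform { includeTags("smoke") } or Maven Surefire's -Dgroups=smoke.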

Multiple Suites, Multiple Integration Levels

We can run black box integration tests against various integration levels. To keep things organized and efficient, primarily when spinning up dependencies, it is to our benefit to separate these tests across suites that focus specifically on just one integration level.

Typically, tests at lower integration levels are more straightforward. However, these tests may not be able to cover a whole journey or hit as many components (possibly missing the UI too). What they lack in range, they make up for in simplicity and depth of scope: they can cover fewer functionalities more deeply via simpler tests in greater numbers.

These are the various levels of integration that we will focus on.

|  | System Component | System End-to-End | System End-to-End with UI |
| --- | --- | --- | --- |
| Integration Level | A single running back-end application | One or more running back-end applications | One or more running back-end applications, including the UI |
| Scenarios | consumption of propagated state; propagation of state to outside the application; typically sections of a user journey; technical requirements of a single service | state propagation across applications; user journeys that cut across multiple applications; technical requirements that require orchestration of various services | state propagation across applications; user journeys from the UI; technical requirements that need orchestration of multiple services |
| Environment and Dependencies | All collaborators of the application are emulated or stubbed | All 3rd party collaborators of the system are emulated or stubbed | All 3rd party collaborators of the system are emulated or stubbed |
| Typical Procedure | The environment is spun up. The application is spun up. The environment is configured. The application receives a request and possibly returns a response. The test validates against the response, the application state, and the environment. | The environment is spun up. The applications are spun up. The environment is configured. The applications are orchestrated. The test validates against the responses, the application state of the multiple applications, and the environment. | The environment is spun up. The applications are spun up. The environment is configured. The applications are orchestrated. The UI is automated. The test validates against the UI state, the application state of the multiple applications, and the environment. |
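As a concrete illustration of the System Component column, the sketch below assumes the application and its emulated environment (database, stubbed collaborators) have already been started, for example via docker compose, before the suite runs. The endpoints, payloads, status codes, and balances are hypothetical and simply mirror the payment example used throughout.

import static org.assertj.core.api.Assertions.assertThat;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.junit.jupiter.api.Test;

class PaymentComponentBlackBoxTest {

    // The running application under test; overridable so the same test can target CI or a test environment.
    static final String BASE_URL = System.getProperty("component.baseUrl", "http://localhost:8080");

    final HttpClient client = HttpClient.newHttpClient();

    @Test
    void acceptsPayment() throws Exception {
        // The application receives a request through its real interface...
        var payment = HttpRequest.newBuilder(URI.create(BASE_URL + "/payments"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"sourceId\":\"A\",\"merchantId\":\"X\",\"amount\":50.00,\"currency\":\"PHP\"}"))
                .build();

        var response = client.send(payment, HttpResponse.BodyHandlers.ofString());

        // ...the test validates against the response...
        assertThat(response.statusCode()).isEqualTo(200);

        // ...and against application and environment state, here observed through another interface.
        var account = client.send(
                HttpRequest.newBuilder(URI.create(BASE_URL + "/accounts/A")).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        assertThat(account.body()).contains("\"balance\":450.00");
    }
}

A hypothetical system component black box integration test that exercises a single running application via its HTTP interface.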

More Granular Slices, Maybe?

Depending on the size of the system under test and the team that manages and maintains it, it may be appropriate to have test suites for more or less discrete levels of integration.

The strategy neither mandates nor advises against it and leaves this decision to the maintaining team. More specifically, these slices of the system may be tested in separate test suites, with the boundaries of each slice emulated or stubbed.

Very Important, Coverage Focus

More importantly, in contrast with low-level tests that focus specifically on code coverage, high-level tests concentrate heavily on acceptance coverage. This means that the scenarios these tests implement disregard low-level implementation details and focus only on requirements from the perspective of the users of the system, team practices, and governing bodies within the organization or the industry. In particular:

  • Business acceptance requirements - these include the functional requirements that most users of the system are concerned with.
  • Technical acceptance requirements - these include non-functional requirements that most governing bodies are concerned with, such as resilience, security, etc. These requirements may teeter close to non-functional use cases; still, it is up to the team's discretion to determine what is appropriate and what should be separated into another test suite (e.g., load and performance testing will most likely live in a non-functional performance test suite; live reliability and availability checks will most likely live in a chaos engineering test suite or script).

We can map these requirements directly to one or more scenarios. We then implement these scenarios and include them in one of the test suites depending on the required level of integration.

Again, Not Line or Branch Coverage

Although these tests still yield line and branch coverage, this is no longer a concern at this level since the expectation is that low-level tests have already covered it. Chasing winding code paths at this level would only make acceptance tests unwieldy and unreadable.

Because of the shift in focus, coverage will naturally overlap, whether in acceptance coverage or line/branch coverage. Because of this, it might be essential to treat these coverage foci independently and make sure that:

  • low-level tests alone focus on line and branch coverage
  • high-level tests alone focus on acceptance coverage

This mindset prevents discussions from devolving into unproductive arguments about scenarios or coverage overlaps.

A Huge Note on Tooling

Feature: Payment
  Scenario: Successful Payment
    Given the following accounts exist:
      | id | currency | balance |
      | A  | PHP      | 500.00  |
      | B  | PHP      | 250.00  |
    And the following merchants exist:
      | id | account id |
      | X  | B          |
    When the client pays PHP 50.00 using account "A" to merchant "X"
    Then the payment is accepted
    And the payment is saved
    And the payment has 2 transaction entries
    And the payment has a DEBIT transaction entry of PHP 50.00 on account "A"
    And the payment has a CREDIT transaction entry of PHP 50.00 on account "B"
    And the account "A" balance is PHP 450.00
    And the account "B" balance is PHP 300.00

  Scenario: Failed Payment due to Non-existent Account
    Given the following accounts exist:
      | id | currency | balance |
      | A  | PHP      | 500.00  |
      | B  | PHP      | 250.00  |
    And the following merchants exist:
      | id | account id |
      | X  | B          |
    When the client pays PHP 50.00 using account "C" to merchant "X"
    Then the payment is unprocessable
    And the payment error code is "SourceAccountNotFoundException"
    And the payment error message is "Source account not found"

  Scenario: Failed Payment due to Non-existent Merchant
    Given the following accounts exist:
      | id | currency | balance |
      | A  | PHP      | 500.00  |
      | B  | PHP      | 250.00  |
    And the following merchants exist:
      | id | account id |
      | X  | B          |
    When the client pays PHP 50.00 using account "A" to merchant "Y"
    Then the payment is unprocessable
    And the payment error code is "MerchantNotFoundException"
    And the payment error message is "Merchant not found"

Example acceptance tests for a simple payment feature, written in Gherkin

Because high-level black box integration tests are acceptance tests, usage of acceptance test frameworks like Cucumber, which uses Gherkin, or Gauge, which uses Markdown, is recommended.

These frameworks, and the way they structure tests, help detach scenarios from low-level technical details and allow even non-engineers to get a good view of the acceptance requirements and scenarios. However, this is just a strong recommendation and is not mandated by the strategy.
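As a rough illustration of how such a framework connects scenarios to the system, the sketch below shows Cucumber Java glue for a few of the steps above. PaymentClient and PaymentResult are hypothetical helpers standing in for whatever driver code the suite actually uses against the running system.

import java.math.BigDecimal;

import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

import static org.assertj.core.api.Assertions.assertThat;

public class PaymentStepDefinitions {

    // Hypothetical driver that issues requests against the running system under test.
    private final PaymentClient client = new PaymentClient();
    private PaymentResult result;

    @When("the client pays {word} {bigdecimal} using account {string} to merchant {string}")
    public void theClientPays(String currency, BigDecimal amount, String accountId, String merchantId) {
        result = client.pay(accountId, merchantId, amount, currency);
    }

    @Then("the payment is accepted")
    public void thePaymentIsAccepted() {
        assertThat(result.accepted()).isTrue();
    }

    @Then("the payment error code is {string}")
    public void thePaymentErrorCodeIs(String code) {
        assertThat(result.errorCode()).isEqualTo(code);
    }
}

Example step definitions that bind the Gherkin steps above to hypothetical driver code.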

High-Level Tests in Summary

High-level black box integration tests focus on business and technical acceptance. More specifically, they should:

  • be separated into multiple test suites that focus on different integration levels of the system (system component, end-to-end, etc.)
  • test real running applications that collaborate with emulators and stubs
  • focus on business and technical acceptance coverage and not on line and branch coverage

What's Next?

Even when prioritizing the implementation of this test strategy, we will always have to refer back to the test pyramid and its recommendations.

Because low-level tests are the foundation of the strategy, making sure that they are numerous and well crafted is of utmost importance. Only then should we start concerning ourselves with tests that focus on more integrated views of the system. This means we prioritize in this order:

  • unit tests
  • white box integration tests (application, infrastructure)
  • system component black box integration (just one component)
  • system end-to-end black box integration
  • system end-to-end with UI black box integration

As familiarity and maturity increase, we can focus on strengthening the test strategy by possibly introducing more test suites that focus either on non-functional requirements (performance, load testing) or on other levels of integration (system slices).

Summary

The test pyramid establishes the relationship between the component integration spectrum and cost (development and maintenance). Because of this, it provides an excellent guide for our test strategy.

In detail, the strategy dissects a system into varying levels of integration and suggests corresponding tests for each of them. We have defined two main groups of tests: low-level tests that focus on line and branch coverage, and high-level tests that focus on requirements and acceptance coverage. In more specific detail, these are the tests previously discussed.

| Test | Type | Object under Test | Test Structure |
| --- | --- | --- | --- |
| Unit | Low-Level | core domain classes and functions; utilities, strategies, and similar structures in the code | tests code directly; mocks collaborators; validates against returned value and mock state |
| Infrastructure White Box Integration | Low-Level | infrastructure adapters | tests code directly; uses emulated environment components or embedded alternatives; validates against returned value and environment state |
| Application White Box Integration | Low-Level | application adapters | tests through the channel (HTTP, gRPC, topic, queue, etc.); mocks core domain use cases; uses channel infrastructure (Kafka, etc.); validates against returned response from the channel and mock state |
| System Component Black Box Integration | High-Level | a single system component | tests via system component interfaces; spins up emulated environment components and stubs collaborators; validates against returned response from the channel, environment state, and application state |
| Slice Black Box Integration | High-Level | a methodical slice of the system | tests via system component interfaces; spins up emulated environment components and stubs collaborators around the slice boundaries; validates against returned response from the channel, environment state, and application state |
| End-to-End Black Box Integration | High-Level | the back-end system | tests via system component interfaces; spins up emulated environment components and stubs for 3rd party collaborators; validates against returned response from the channel, environment state, and application state |
| End-to-End with UI Black Box Integration | High-Level | the system with its UI | tests via the user interface; spins up emulated environment components and stubs for 3rd party collaborators; validates against user interface state, environment state, and application state |

This strategy should be a good enough jumping-off point to test a large distributed system more structurally and strategically at its different levels. As a team's automation practice matures, the team may further refine this strategy by incorporating non-functional test suites or introducing test suites that focus on different slices of the system to better align with user needs.

Hopefully, this long discussion on test strategies gives a more structured understanding of testing and lessens the tendency to haphazardly test for testing's sake, especially in the face of dragons.

"Fairy tales are more than true: not because they tell us that dragons exist, but because they tell us that dragons can be beaten."

– Neil Gaiman, Coraline

Thanks to Nyker Matthew King for having a second read on this write-up. If anyone has questions, please feel free to reach out to me via [email protected].

Contributors

Richard Kevin A. Kabiling  💻🔧 Engineering Manager at Core Payments Infrastructure Engineering, Maya Philippines Inc.; 🧗 Climber and 🎮 Gamer

Learn more about Maya!