June 2, 2020

How to adapt software testing for the cloud

Cloud adoption has almost reached its saturation point. 94% of enterprises ran workloads in the cloud in 2018, and more than half planned to migrate more workloads throughout 2019. We often associate cloud computing with production applications and infrastructure, but it’s also a prime platform for QA.

Traditionally, QA teams tested in locally managed environments. Having a completely isolated environment gives QA teams greater privacy and control over their testing processes, but it comes with significant tradeoffs:

  • Maintaining dedicated infrastructure has financial and productivity costs
  • Scaling tests is difficult, especially when running performance or load tests
  • Testing lacks advanced capabilities, such as testing from different device types or geographic regions

As applications become more cloud-centric, QA teams must shift focus towards cloud-based testing. This means deploying and running tests on cloud platforms, as well as testing the full functionality and behavior of workloads running in the cloud. In this article, we’ll look at both, along with how QA teams can test the resilience of cloud applications in ways that were impossible using traditional test environments.

What are the benefits of cloud testing?

Testing in the cloud offers the same benefits as running production workloads in the cloud: reduced capital expenditures, easier scalability, less operational overhead, and a lower total cost of ownership. Cloud testing also allows for testing methods that are difficult, if not impossible, to use on-premises, such as:

  • Automatically provisioning test environments on demand
  • Testing from a variety of different device types and architectures
  • Running distributed tests from multiple zones or regions
  • Running high-capacity load tests, availability tests, regional tests, and resilience tests without straining your infrastructure

Testing in the cloud has an added benefit: QA learns more about the platforms that production applications run on. Cloud platforms add a layer of complexity to applications that QA teams need to account for, particularly around scalability and recovery from failure. In addition to their normal tests, QA must start testing for application resilience with Chaos Engineering. Chaos Engineering experiments reveal how cloud workloads respond under stress. We’ll explain how in the following sections.

How can QA start testing in the cloud?

To leverage the cloud effectively, QA must lean heavily on automation to provision a test environment, deploy the latest version of an application, and run tests. To do this, analysts need to develop engineering skills such as:

  • Declarative programming and scripting
  • Managing cloud platforms and assets using infrastructure as code (IaC)
  • Using test automation frameworks and services
  • Using continuous integration and continuous delivery (CI/CD) to integrate testing into the software build and release pipeline

With this in mind, let’s look at how you can implement different forms of testing in the cloud.

Functional and end-to-end tests

Functional testing verifies that a build behaves according to specifications. Functional tests range from sanity and build verification tests (BVT), which check the integrity and stability of the build, to usability tests that evaluate the functionality of the entire product. All of them require a test environment running the latest build, which is often the first bottleneck for teams that don’t have an automated deployment process.

This is where infrastructure as code (IaC) comes in handy. IaC lets you define a complete IT environment using manifest files, which you can use to provision resources on-demand. Manifest files can be easily shared between teams, letting QA teams create environments almost identical to production. In the same step, QA can deploy a build artifact to this environment, run their test harness from the CI/CD solution, and tear down the environment afterwards. With automated testing tools like Selenium, SoapUI, and Apache JMeter, you can run fully automated browser and API tests in a fresh production-like environment with zero manual intervention.
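The provision, deploy, test, and tear down cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeCloud` is a hypothetical in-memory stand-in for whatever IaC tool or cloud API your team uses, and the environment name, manifest keys, and build label are invented for the example. The point it tries to show is that teardown belongs in a `finally` block, so the environment is destroyed even when a test fails.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider or IaC tool."""
    environments: Dict[str, dict] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def provision(self, name: str, manifest: dict) -> None:
        self.environments[name] = {"manifest": manifest, "build": None}
        self.log.append(f"provision {name}")

    def deploy(self, name: str, build: str) -> None:
        self.environments[name]["build"] = build
        self.log.append(f"deploy {build}")

    def destroy(self, name: str) -> None:
        self.environments.pop(name, None)
        self.log.append(f"destroy {name}")


@contextmanager
def ephemeral_environment(
    cloud: FakeCloud, name: str, manifest: dict, build: str
) -> Iterator[dict]:
    """Provision an environment, deploy a build, and always tear down."""
    cloud.provision(name, manifest)
    try:
        cloud.deploy(name, build)
        yield cloud.environments[name]
    finally:
        # Runs even if a test raises, so no environment is left behind.
        cloud.destroy(name)


def run_suite(env: dict, tests: List[Callable[[dict], bool]]) -> bool:
    """Run every test against the environment; True only if all pass."""
    results = [test(env) for test in tests]
    return all(results)


# Example usage with made-up values.
cloud = FakeCloud()
manifest = {"web_servers": 2, "database": "postgres"}
with ephemeral_environment(cloud, "qa-build-123", manifest, "app-1.4.2") as env:
    passed = run_suite(
        env,
        [
            lambda e: e["build"] == "app-1.4.2",
            lambda e: e["manifest"]["web_servers"] == 2,
        ],
    )
```

In a real pipeline, the three `FakeCloud` methods would be replaced by calls to your IaC tooling, and the CI/CD service would run this as a single job.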

Alternatively, many hosted CI/CD services can provision temporary environments (called “review apps” by Heroku and GitLab) for testing. They might not have the same structure as production, but they can be useful for running quick BVT and sanity tests before running your full test suite. Hosted CI/CD services also make it easy to scale and parallelize tests, so you can run BVT, sanity, and regression tests simultaneously for much faster results.
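To make the idea of running suites simultaneously concrete, here is a small local analogy in Python. Hosted CI/CD services typically parallelize with separate jobs or runners rather than threads, so treat this as a sketch of the concept and not of any particular service. The three suite functions are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_suites_in_parallel(suites: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named suite in its own worker thread and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(suites))) as pool:
        futures = {name: pool.submit(suite) for name, suite in suites.items()}
        # result() blocks until that suite finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical placeholder suites; real ones would invoke your test runner.
def bvt_suite() -> bool:
    return True


def sanity_suite() -> bool:
    return True


def regression_suite() -> bool:
    return False  # Pretend one regression test failed.


results = run_suites_in_parallel(
    {"bvt": bvt_suite, "sanity": sanity_suite, "regression": regression_suite}
)
build_ok = all(results.values())
```

Collecting results per suite, instead of stopping at the first failure, lets the pipeline report exactly which suite broke the build.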

For tests requiring more specialized equipment, such as mobile apps, services like AWS Device Farm and Firebase Test Lab can easily integrate into your CI/CD pipeline. Instead of buying your own testing hardware, you can use these services to run automated tests on each new build. This increases the cost of your build process, but it spares you from buying and maintaining devices and from testing each build on them by hand.

Non-functional testing

Non-functional tests verify the operational behaviors of an application, such as security, performance, scalability, and resilience. They also ensure that the product continues working in various and sometimes extreme situations, such as outages or traffic surges.

A common challenge with non-functional testing is in creating test conditions. Let’s say you want to see how your application performs during:

  • A domain controller failure
  • Massive latency spikes due to a routing error
  • A 100x increase in active users

How would you replicate these scenarios? You might create firewall rules, modify or disable parts of your network infrastructure, or flood the network with traffic, but these can have unintended (and undesirable) side effects even in a fully isolated network.

Chaos Engineering is one of the newest tools in QA’s toolkit, and one of the most effective at testing for reliability under adverse conditions. Chaos Engineering lets you inject failure conditions into your applications and infrastructure in order to test resilience and identify points of weakness. These conditions can include increasing network latency, consuming CPU or memory, or shutting down an entire cluster.

Initially, you should limit Chaos Engineering experiments to a small portion of the application (the scope an experiment can affect is known as its blast radius), identify any points of failure, and feed your findings to the development team. As the application becomes more reliable, increase the blast radius by including additional components or by increasing the severity of your experiments. With Gremlin, you can easily create experiments and kick them off from your CI/CD service using the Gremlin API.
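The practice of starting small and widening the blast radius can be sketched as a staged plan. The code below does not call the Gremlin API or any other real service: `experiment` is a hypothetical callable that injects a fault into the given hosts and reports whether the application stayed healthy, and the host names and fractions are made up for illustration.

```python
import math
from typing import Callable, List, Sequence


def select_targets(hosts: Sequence[str], fraction: float) -> List[str]:
    """Pick the first `fraction` of hosts as the blast radius for one stage."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    count = max(1, math.ceil(len(hosts) * fraction))
    return list(hosts[:count])


def run_staged_experiment(
    hosts: Sequence[str],
    fractions: Sequence[float],
    experiment: Callable[[List[str]], bool],
) -> List[dict]:
    """Widen the blast radius stage by stage, stopping at the first failure."""
    report = []
    for fraction in fractions:
        targets = select_targets(hosts, fraction)
        healthy = experiment(targets)
        report.append({"fraction": fraction, "targets": targets, "healthy": healthy})
        if not healthy:
            break  # Hand the findings to developers before going any further.
    return report


# Hypothetical experiment: the app copes until half of its hosts are affected.
def fake_experiment(targets: List[str]) -> bool:
    return len(targets) < 5


hosts = [f"web-{i}" for i in range(1, 11)]
report = run_staged_experiment(hosts, [0.1, 0.25, 0.5, 1.0], fake_experiment)
```

Here the run stops at the third stage, so the final stage covering every host is never attempted until the weakness has been fixed.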

Once the QA, development, and operations teams feel confident in the resilience of the application and infrastructure, the next step is to run experiments in production. This may seem drastic if your team is used to having dedicated test environments, but the reality is that no test environment can accurately replicate production. Testing live systems helps identify problems unique to production, letting you address them before they can affect customers.

Application-layer testing

The experiments we mentioned are useful for testing infrastructure, but what if your applications run in a serverless environment such as AWS Lambda, Google Cloud Functions, or Azure Functions?

With application-level fault injection (ALFI), you can run chaos experiments from within the application itself. This lets you limit your blast radius to specific requests, fine-tune your attacks, and even target serverless environments. For example, you can use ALFI to run an attack targeting a specific user account. If you have a production account that you use strictly for testing, you can use ALFI to inject failure only for that account and perform tests without affecting other users. The rest of your systems keep running normally, and your customers won’t even be aware that the attack is happening.
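As a rough illustration of the idea (not Gremlin’s ALFI library, whose interface is not shown here), the sketch below wraps a request handler in a decorator that raises an error only for a designated test account. The account IDs, the `fetch_recommendations` handler, and the fallback behavior are all hypothetical.

```python
import functools
from typing import Callable, List, Set

TEST_ACCOUNTS: Set[str] = {"qa-test-account"}  # Hypothetical test-only account.


class InjectedFailure(Exception):
    """Raised in place of a real dependency failure."""


def inject_failure_for(accounts: Set[str]) -> Callable:
    """Decorator: fail the wrapped handler only for the given accounts."""
    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(account_id: str, *args, **kwargs):
            if account_id in accounts:
                raise InjectedFailure(f"injected failure for {account_id}")
            return handler(account_id, *args, **kwargs)
        return wrapper
    return decorator


@inject_failure_for(TEST_ACCOUNTS)
def fetch_recommendations(account_id: str) -> List[str]:
    """Hypothetical call to a downstream service."""
    return ["item-1", "item-2"]


def homepage(account_id: str) -> dict:
    """Degrade gracefully: show no recommendations if the dependency fails."""
    try:
        items = fetch_recommendations(account_id)
    except Exception:
        items = []
    return {"account": account_id, "recommendations": items}


test_page = homepage("qa-test-account")
customer_page = homepage("customer-42")
```

Only the test account sees the failure path, so you can check that the page degrades gracefully while every other account gets the normal response.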

Conclusion

The cloud has already taken over software development, and it’s time for QA to follow suit. Transitioning from manual testing or on-premises testing to fully automated remote testing isn’t an easy process, but doing so can save you significant amounts of time, money, and duplicated effort. CI/CD, test automation, and Chaos Engineering are essential tools in the cloud era, and now’s the best time for QA teams to learn how to leverage them effectively.
