The challenges of end-to-end testing and how to face them

Published: 4 Aug 2024
Written by: Chun Fei Lung

End-to-end tests can help you discover problems in web applications, but sadly are not free of problems themselves.

End-to-end tests are conducted using real B(r)owsers

End-to-end (E2E) testing is a form of black-box testing where a web application is tested from a user’s perspective. End-to-end tests can be conducted manually, but many development teams use tools like Selenium, Puppeteer or Playwright to automate E2E test cases. Automated test suites can save developers a lot of work, but they also come with some drawbacks. Most notably, end-to-end tests can be flaky, i.e. they can fail for no apparent reason in a non-deterministic way.

This paper presents the results of an online survey that asked 78 experienced test engineers about 13 commonly mentioned challenges with end-to-end testing using Selenium and how those challenges are addressed.

About the article

Title	Challenges of end-to-end testing with Selenium WebDriver and how to face them: A survey
Year	2023
Author(s)	Maurizio Leotta (University of Genoa), Boni García (University Carlos III of Madrid and Sauce Labs), Filippo Ricca (University of Genoa), Jim Whitehead (University of California, Santa Cruz and Sauce Labs)
Venue	International Conference on Software Testing, Verification and Validation

The 13 challenges that respondents were asked about are described below, in order of perceived importance (from high to low).

Asynchronous interactions, where page content is modified or loaded asynchronously, were perceived to be the most important challenge in end-to-end testing. The test script needs to wait until the DOM has been updated – but how long?

In general, respondents rely on various kinds of waiting strategies to ensure that the script only interacts with pages when they are “ready”. Most are satisfied with the standard implicit, explicit, and fluent waiting strategies that are available in Selenium. Others perform validations on the page before executing the next Selenium action.
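Selenium already provides these waits (in the Python bindings, for example, `WebDriverWait(driver, 10).until(...)`), so there is normally no reason to write your own. Still, it can help to see roughly what an explicit or fluent wait does: poll a condition at a fixed interval until it holds, treat selected "not ready yet" exceptions as a reason to keep waiting, and give up after a timeout. The library-free Python sketch below illustrates that idea; `wait_until` and `WaitTimeout` are hypothetical names, not Selenium APIs.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when the condition is still not met after `timeout` seconds."""


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
    ignored_exceptions: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Poll `condition` until it returns a truthy value, then return that value.

    Exceptions listed in `ignored_exceptions` (e.g. "element not found yet")
    are treated as "not ready" instead of failing immediately.
    """
    deadline = time.monotonic() + timeout
    last_error: Optional[BaseException] = None
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored_exceptions as exc:
            last_error = exc
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s") from last_error
        time.sleep(poll_interval)


# Example: simulated page content that only "appears" 0.2 s after loading.
start = time.monotonic()
greeting = wait_until(
    lambda: "Welcome!" if time.monotonic() - start > 0.2 else None,
    timeout=2.0,
    poll_interval=0.05,
)
```

The appeal compared to a fixed `time.sleep` is that the test continues as soon as the page is ready, instead of always paying for the worst case.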

Brittleness refers to tests that are fragile, i.e. tests that break when the system under test is modified to accommodate requirement changes, bug fixes or new features. Fixing such tests needs to be done manually and is often tedious and expensive.

Respondents deal with brittleness in multiple ways:

Coordinating the test team with the development team, so that developers can use specific localisation strings or anchors that can be targeted using Selenium.
Providing developers with direct feedback on end-to-end tests, which makes them immediately aware of any problems they may have caused.
Using the Page Object pattern, which decouples test actions from the implementation of the web page and simplifies maintenance of the code that interacts with the web application.
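To make the Page Object idea a bit more concrete, here is a minimal Python sketch. It assumes that developers have added stable `data-test` attributes for the test team to target (one possible form of the anchors mentioned above). The `Driver` interface with `find`, `type` and `click` is hypothetical and deliberately simpler than Selenium's own API (which uses calls like `driver.find_element(By.CSS_SELECTOR, ...)` and `send_keys`); a fake driver lets the example run without a browser.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Element(Protocol):
    def type(self, text: str) -> None: ...
    def click(self) -> None: ...


class Driver(Protocol):
    def find(self, css_selector: str) -> Element: ...


class LoginPage:
    """Page Object: the only place that knows how the login page is built."""

    # If the markup changes, only these locators need to be updated.
    USERNAME = "[data-test=username]"
    PASSWORD = "[data-test=password]"
    SUBMIT = "[data-test=login-button]"

    def __init__(self, driver: Driver) -> None:
        self.driver = driver

    def log_in_as(self, username: str, password: str) -> None:
        self.driver.find(self.USERNAME).type(username)
        self.driver.find(self.PASSWORD).type(password)
        self.driver.find(self.SUBMIT).click()


# A fake driver (hypothetical) so that the sketch runs without a browser.
@dataclass
class FakeElement:
    value: str = ""
    clicks: int = 0

    def type(self, text: str) -> None:
        self.value += text

    def click(self) -> None:
        self.clicks += 1


@dataclass
class FakeDriver:
    elements: Dict[str, FakeElement] = field(default_factory=dict)

    def find(self, css_selector: str) -> FakeElement:
        return self.elements.setdefault(css_selector, FakeElement())


# A test reads as a sequence of user actions, without any selectors.
driver = FakeDriver()
LoginPage(driver).log_in_as("alice", "s3cret")
```

Because tests only call methods like `log_in_as`, a change to the login form's markup means updating three locators in one class rather than every test that logs in.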

Flakiness, i.e. tests that break randomly for no apparent reason, can cause testers to waste a lot of time debugging a fault that does not exist in the code.

Because flakiness is often caused by asynchronicity, the strategies for dealing with it largely overlap with those for asynchronous interactions. However, respondents also propose a few strategies that target flakiness specifically:

Re-executing test scripts a predefined number of times before committing them.
Automatically re-executing failing test scripts, to increase the likelihood that only actual failures are reported.
Keeping tests short and atomic (self-contained), because the longer a test runs, the higher the risk of flakiness.
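Both rerun strategies are easy to sketch. The Python functions below are hypothetical helpers, not part of Selenium: `is_flaky` runs a check several times on unchanged code and reports it as flaky if it both passes and fails, while `retry` re-runs a failing check a few times and only lets the failure through if every attempt fails. Tests are modelled as plain functions that raise `AssertionError` on failure.

```python
from collections import Counter
from typing import Callable

Check = Callable[[], None]  # a test that raises AssertionError on failure


def run_repeatedly(check: Check, runs: int) -> Counter:
    """Run `check` several times on unchanged code and count the outcomes."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            check()
            outcomes["pass"] += 1
        except AssertionError:
            outcomes["fail"] += 1
    return outcomes


def is_flaky(check: Check, runs: int = 10) -> bool:
    """Flaky: the same check both passes and fails without any code change."""
    outcomes = run_repeatedly(check, runs)
    return outcomes["pass"] > 0 and outcomes["fail"] > 0


def retry(check: Check, attempts: int = 3) -> int:
    """Re-run a failing check up to `attempts` times.

    Returns the number of the first passing attempt. If every attempt fails,
    the last AssertionError propagates, so consistent failures still surface.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            check()
            return attempt
        except AssertionError:
            pass  # possibly flaky: try again
    check()  # final attempt: a failure here is reported
    return attempts
```

Letting the last failure propagate keeps consistent failures visible. Keep in mind that automatic retries can also hide genuinely intermittent bugs, so it may be worth logging how many attempts a test needed.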

Assertability refers to the ability to verify that the application is in a correct state. Selenium script assertions are mainly DOM-based. Other assertions, such as those about the look and feel of a page or its level of accessibility, can be challenging to implement.

The majority of respondents suggest integrating Selenium with tools like Percy, which can detect visual changes between test runs.
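Dedicated tools like Percy take care of visual comparison for you. The sketch below only shows the core idea in its simplest form: compare a new screenshot against an approved baseline and flag the test when too many pixels have changed. To stay dependency-free, this Python sketch represents images as nested lists of RGB tuples; in practice you would decode real screenshots (e.g. saved with Selenium's `save_screenshot`) using an imaging library. The function names, the per-channel `tolerance` and the `max_changed` threshold are arbitrary assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255
Image = List[List[Pixel]]     # rows of pixels


def changed_fraction(baseline: Image, current: Image, tolerance: int = 0) -> float:
    """Fraction of pixels whose colour differs by more than `tolerance`
    on at least one channel. Both images must have the same dimensions."""
    if len(baseline) != len(current) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(baseline, current)
    ):
        raise ValueError("images must have the same dimensions")
    total = changed = 0
    for row_a, row_b in zip(baseline, current):
        for (r1, g1, b1), (r2, g2, b2) in zip(row_a, row_b):
            total += 1
            if max(abs(r1 - r2), abs(g1 - g2), abs(b1 - b2)) > tolerance:
                changed += 1
    return changed / total if total else 0.0


def looks_the_same(
    baseline: Image, current: Image, max_changed: float = 0.001, tolerance: int = 8
) -> bool:
    """Small colour deviations (e.g. anti-aliasing) are tolerated."""
    return changed_fraction(baseline, current, tolerance) <= max_changed
```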

Scalability becomes a concern when a web application grows very large or test coverage needs to be high. Because each test can take quite a lot of time to run, many respondents try to execute tests in parallel.

However, for this to be possible, it is essential that the test suite is designed for parallel execution from day one, as refactoring tests that were originally meant to run sequentially is very difficult. Some respondents also experience difficulties because tests interfere with each other through shared state in the tested web application. In these cases, it might help to execute tests against isolated instances (for example, using Docker). Finally, some respondents suggest simply reducing the number of tests to run.
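The sketch below illustrates why isolation matters for parallel execution. Each test case receives its own fresh instance of a hypothetical `FakeShop` application, standing in for, say, a separate Docker container per test, so the cases can run concurrently in a thread pool without seeing each other's state. `FakeShop`, the `case_*` functions and the runner are illustrative assumptions, not part of any test framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class FakeShop:
    """Stand-in for one isolated instance of the web application under test."""

    def __init__(self) -> None:
        self.cart: List[str] = []

    def add_to_cart(self, item: str) -> None:
        self.cart.append(item)


# Two hypothetical test cases; each assumes that it starts with an empty cart.
def case_add_one_item(app: FakeShop) -> None:
    app.add_to_cart("book")
    assert app.cart == ["book"]


def case_add_two_items(app: FakeShop) -> None:
    app.add_to_cart("pen")
    app.add_to_cart("ink")
    assert app.cart == ["pen", "ink"]


Scenario = Callable[[FakeShop], None]


def run_isolated(scenario: Scenario) -> str:
    app = FakeShop()  # fresh instance per test: no shared state to interfere with
    try:
        scenario(app)
        return "pass"
    except AssertionError:
        return "fail"


def run_in_parallel(scenarios: List[Scenario], workers: int = 4) -> Dict[str, str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(run_isolated, scenarios))
    return {s.__name__: outcome for s, outcome in zip(scenarios, outcomes)}


results = run_in_parallel([case_add_one_item, case_add_two_items])
```

Had both cases shared a single `FakeShop`, whichever case ran second would fail because the cart would still contain the other case's items, so the outcome would depend on execution order.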

Slowness is a commonly heard complaint about end-to-end tests, but it ends up in the middle of this list. Slowness is often caused by suboptimal waiting strategies and by the browsers themselves.

Respondents typically try to work around these issues by reusing the same browser instance as much as possible to reduce setup time, by executing test scripts in headless mode to skip the relatively heavy rendering process, and by replacing common interaction steps like login, navigation and teardown with API calls that take less time to complete.
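Replacing a UI login with an API call could look roughly like this Python sketch. It assumes the application exposes a JSON login endpoint (here a hypothetical `/api/login`) that returns a `session` cookie; a tiny local HTTP server stands in for the real application so that the example runs on its own. In a real Selenium test you would then open a page on the site and hand the cookie to the browser with `driver.add_cookie({"name": "session", "value": session_id})`, skipping the login form entirely.

```python
import json
import threading
import urllib.request
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoLoginApi(BaseHTTPRequestHandler):
    """Stand-in for the real application's (hypothetical) login endpoint."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        credentials = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/api/login" and credentials.get("password") == "s3cret":
            self.send_response(204)  # success, no response body
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        else:
            self.send_response(401)
        self.end_headers()

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence the default per-request logging


def api_login(base_url: str, username: str, password: str) -> str:
    """Log in via the API instead of the UI and return the session cookie value."""
    request = urllib.request.Request(
        f"{base_url}/api/login",
        data=json.dumps({"username": username, "password": password}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Ignore any system-wide proxy settings, since the demo server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        cookie = SimpleCookie(response.headers["Set-Cookie"])
    return cookie["session"].value


# Start the stand-in server on a free port, log in once, then shut it down.
server = HTTPServer(("127.0.0.1", 0), DemoLoginApi)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    session_id = api_login(f"http://127.0.0.1:{server.server_address[1]}", "alice", "s3cret")
finally:
    server.shutdown()
    server.server_close()
```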

Failure analysis is the process of analysing a failed test. A test failure can be caused by many things, including changes in the web application, communication between any of the application’s components, or even failures within the test itself.

Debugging tests can be made easier by logging extensively, writing meaningful error messages, and taking screenshots or HTML snapshots (or even video recordings) when a problem occurs.
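A small helper can combine several of these suggestions. The Python context manager below (a hypothetical helper, not a Selenium API) logs each test step and, when the step fails, writes an HTML snapshot to disk before re-raising the original error. With Selenium, `page_source` could simply be `lambda: driver.page_source`, and a screenshot could be taken in the same place with `driver.save_screenshot(...)`.

```python
import contextlib
import logging
import time
from pathlib import Path
from typing import Callable, Iterator

log = logging.getLogger("e2e")


@contextlib.contextmanager
def capture_on_failure(
    step: str, page_source: Callable[[], str], out_dir: Path
) -> Iterator[None]:
    """Log a test step and save an HTML snapshot if the step fails."""
    log.info("step started: %s", step)
    try:
        yield
    except Exception:
        out_dir.mkdir(parents=True, exist_ok=True)
        snapshot = out_dir / f"{step.replace(' ', '_')}-{int(time.time())}.html"
        snapshot.write_text(page_source(), encoding="utf-8")
        # log.exception also records the traceback of the original error.
        log.exception("step failed: %s (HTML snapshot: %s)", step, snapshot)
        raise
    log.info("step passed: %s", step)
```

Used together with a meaningful assertion message (for example, `assert "Welcome" in title, "expected the welcome page after login"`), a failure then leaves behind both an explanation and the page as it looked at that moment.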

Maintainability is strongly affected by how the test suite is set up, and what architecture and design patterns are used.

Respondents highlight the importance of developing reusable atomic components, and the use of the previously mentioned Page Object pattern to decouple test actions from the implementation of the application.

There is little agreement about whether developing test scripts is difficult and time-consuming. If done well, it apparently doesn’t have to be time-consuming. Respondents suggest using helper methods and the Page Object pattern to reduce development time.

Infrastructure refers to the browsers and drivers that need to be set up and maintained. This is not very challenging (unless you work at a very small company, in which case it may still require a significant amount of effort): many respondents use WebDriverManager, container-based solutions, or simply outsource infrastructure management to cloud providers like BrowserStack.

Cross-browser testing likewise is not a problem. Cloud-based solutions make this very easy, and browsers nowadays mostly behave in the same way.

Getting support from Selenium user communities is not very difficult. Respondents state that the Selenium community is supportive, kind, and welcoming.

Likewise, respondents find the Selenium documentation quite clear and comprehensive, although a few would prefer documentation in other formats, like video tutorials or webinars.

Summary

Asynchronicity, flakiness and brittleness are the three most important challenges in end-to-end testing
Many challenges can be dealt with using strategies that resemble general best practices for software design
