Much has been made of what has become known as chaos engineering – the umbrella under which techniques such as Netflix’s famed Chaos Monkey (more here) reside. Collectively, these are techniques where parts of a system are randomly or semi-randomly disrupted, in a manner reflecting component failure and the like, to verify that system resilience holds true. As a strategy it could arguably be applied in a monolith world just as it is typically used in a microservices context; the difference is that the impact on a monolith is potentially far greater. Regardless of monolith or microservice, this is typically a strategy used when running at scale to confirm everything is robust, and continues to be robust. This kind of testing typically has to execute in a production environment, as trying to simulate large-scale systems is very difficult.
Alongside this, another form of testing/verification implemented in a production environment is the use of synthetic transactions. Whilst chaos engineering has a high profile, synthetic transactions are less well known. But as a strategy it is equally important. Let me take you through why I say this, and the full potential of synthetic transactions if fully exploited.
Firstly, as development processes grow and the adoption of Continuous Integration progresses towards Continuous Delivery, the rate of change is going to accelerate. In a medium to large scale environment this velocity can become incredibly quick: the release process is no longer days or weeks, but seconds and minutes.
Let’s put some numbers to this. An organisation with, say, 80 developers operating to the ‘2 pizza’ rule means at least 10 teams (not every team is 8 people). Each team is triggering a dozen builds per day, of which let’s assume 2 go all the way to production. That’s 20 production changes per day, and 120 build cycles per day (if you only allow a standard working day, that is a build cycle every 4 minutes). If each micro-service takes 3 months to build, you’ll have 48 micro-services in 2 years. All of this needs to be tested, and some combination of it needs to be built and deployed 20 times a day. Whilst the amount of compute power applied can help, the rate of change means that sooner or later you have to start making some risk-based decisions on what gets tested or not. This is particularly true as you move up the test stack (see the diagram below). Chris Richardson’s Microservice Patterns book (here) also argues that as you move up the stack the tests become more brittle – I’m not entirely convinced of this. The brittleness is a reflection of stability, or the lack of it, in the business rules and component interfaces. Yes, constant UI changes have the potential to break UI test scripts, but this is to an extent a reflection of stability in interface behaviour, which is influenced by business thinking combined with how the test scripts are built.
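The back-of-envelope arithmetic can be sketched out quickly (note that 10 teams each releasing twice a day gives 20 production changes, not 16):

```python
# Build-rate arithmetic from the scenario above.
DEVELOPERS = 80
TEAM_SIZE = 8              # '2 pizza' rule upper bound
BUILDS_PER_TEAM_PER_DAY = 12
RELEASES_PER_TEAM_PER_DAY = 2
WORKING_MINUTES = 8 * 60   # a standard working day

teams = DEVELOPERS // TEAM_SIZE                       # 10 teams
builds_per_day = teams * BUILDS_PER_TEAM_PER_DAY      # 120 build cycles
releases_per_day = teams * RELEASES_PER_TEAM_PER_DAY  # 20 production changes
minutes_per_build = WORKING_MINUTES / builds_per_day  # a build every 4 minutes

print(teams, builds_per_day, releases_per_day, minutes_per_build)
```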
Compound this with the need to have realistic data sets that are synthetic; remember that legislation such as GDPR is pretty much going to prevent the practice of cloning data from a live environment into a pre-production environment, as customers are not likely to approve of personal data being used in test scenarios. This is a practice that I am not a fan of, but circumstances (such as meeting time pressures) have made it a necessary evil. Even if the data is obscured, retaining data integrity and consistency makes the task far from trivial.
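One common approach to the obscuring problem – a minimal sketch, not a full GDPR solution – is deterministic pseudonymisation: hashing each identifier with a secret salt so the same value always maps to the same token, which is what keeps referential integrity between tables intact. The function name and salt here are purely illustrative:

```python
import hashlib
import hmac

# Illustrative salt; in practice this would be a secret kept per environment.
SECRET_SALT = b"rotate-me-per-environment"

def pseudonymise(value: str) -> str:
    """Deterministically map a personal identifier to an opaque token.

    The same input always yields the same token, so foreign-key
    relationships between tables survive the masking pass.
    """
    return hmac.new(SECRET_SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

# The same customer ID masks to the same token everywhere it appears,
# so joins between e.g. customers and orders still line up.
assert pseudonymise("alice@example.com") == pseudonymise("alice@example.com")
assert pseudonymise("alice@example.com") != pseudonymise("bob@example.com")
```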
Not only is it going to be challenging to have data sets that are plausible, but the number of possible combinations of environments for running a vast battery of tests is going to cost time and money to instantiate. Populating those environments with data and testing all of it will eventually reach a point where organisational pressure to put features and fixes into production collides with the fact that, by the time the testing cycle has completed, there will be a raft of new features and fixes impacting the code just tested – which may include changes or improvements to the issues just identified.
All of this points to the fact that there is just a finite amount that can be done without impacting the delivery pipeline.
Synthetic transactions are most commonly used to evaluate system performance. They are particularly helpful for measuring the network layers, where a synthetic transaction can be tracked through each network step in the live environment. At the infrastructure level, the impact is a microcosm of bandwidth and compute consumption. This tracking allows the bottlenecks to show up: a load balancer that is slowing things down because it is at its limits, or new firewall rules that are slowing throughput. The measurements then give you true performance insights, particularly given that the system loading profile is real.

The problem is that with this kind of approach it isn’t unusual for the application to recognise a synthetic transaction and avoid actions that have business consequences – no financial impacts to correct, for example. But this means functional paths are not fully exercised. For example, you’ll never get to see that your payment system is now failing to process some transactions, as the logic will stop short of executing a financial transaction. So a minor tweak to this logic may score harmlessly in the code analytics, and therefore not be subject to all the rigour of integration testing. Net result: payments start failing, and a by-product is unhappy customers abandoning purchases. So how do you detect the problem? With a bit of luck the errors are getting trapped in the log analytics; worst case, the problem only starts showing up when your analytics identifies a trend of customers abandoning transactions or complaining – all of which is too late. However, the application of synthetic transactions can help us pick up on these issues if they are allowed to actually run all the business processes.
By allowing a synthetic transaction to actually execute all the logic, we may end up with a payment event occurring, and then failing. As we’re tracking the synthetic transaction, the point in time at which the issue is identified comes much earlier.
The initial response to all of this is to include logic in the application for handling synthetic transactions, but doing so means you’re no longer properly exercising the software; your test is simply validating that synthetic data can be handled.
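The anti-pattern looks something like this sketch (all names here – `process_payment`, `PaymentGateway` – are hypothetical, not from any real system): the moment the code branches on a synthetic flag, the most business-critical path is the one synthetic traffic never touches.

```python
# Illustrative anti-pattern: short-circuiting business logic for
# synthetic transactions.

class PaymentGateway:
    """Stand-in for a real payment provider client."""
    def charge(self, account: str, amount_pence: int) -> bool:
        # A real gateway call would go here; stubbed for the sketch.
        return True

def process_payment(gateway: PaymentGateway, account: str,
                    amount_pence: int, synthetic: bool) -> str:
    if synthetic:
        # The test path stops short of the gateway, so a regression in
        # the charging logic below is invisible to synthetic traffic.
        return "skipped"
    if not gateway.charge(account, amount_pence):
        return "failed"
    return "charged"

# The synthetic transaction 'passes' without ever exercising the gateway.
print(process_payment(PaymentGateway(), "ACC-123", 999, synthetic=True))
print(process_payment(PaymentGateway(), "ACC-123", 999, synthetic=False))
```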
So how do we resolve this conundrum: fully executing a synthetic transaction results in material impact, but not doing so means we may not fully cover the most critical end-to-end processes? There are several strategies available, and a combination of these, rather than just one, may be necessary. In fact, a lot of this is no different to transaction handling across web services:
- Service suppliers will often provide test values that are synthetic in their environment, e.g. a dummy shipping address that a courier has a special zero rate for, or test accounts for banking. Use this information as part of the data supporting the synthetic transaction.
- Compensating transactions: allow the system to use a real product, but at a specific point in the process the impact of the synthetic transaction is countered with a compensating transaction. For example, with the e-tailer, the ordered item is identified as a test transaction, pulled from the products to be shipped, and has an immediate return process executed, replenishing the stock. This is no different to handling errors.
- The system is able to accommodate data corrections, such as the deletion of transactions, or, for example, reported accounts adjusted by applying the record of synthetic transactions to the reports. This has limits, and in the financial world may be perceived as ‘tampering with the books’.
- Retrospective flagging of transactions that should be ignored or deleted – often described as a soft delete, which is a common mechanism that can support this. A soft delete is where a record is marked as deleted but never actually removed from the system, so transactional integrity is not put at risk and gaps in sequence numbers can’t occur. Anything with the soft-delete flag is then excluded from all operations on the data.
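The soft-delete strategy can be sketched in a few lines of SQL (the `orders` schema here is purely illustrative): synthetic transactions are flagged rather than removed, so sequence numbers stay contiguous while reporting simply filters the flag out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE orders (
    seq     INTEGER PRIMARY KEY,   -- contiguous sequence, never reused
    amount  INTEGER NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0   -- soft-delete flag
)""")
db.executemany("INSERT INTO orders (amount) VALUES (?)", [(100,), (250,), (75,)])

# Retrospectively flag seq 2 as a synthetic transaction to be ignored.
db.execute("UPDATE orders SET deleted = 1 WHERE seq = ?", (2,))

# Reports exclude soft-deleted rows, but the row (and the sequence) survives.
(total,) = db.execute("SELECT SUM(amount) FROM orders WHERE deleted = 0").fetchone()
(rows,) = db.execute("SELECT COUNT(*) FROM orders").fetchone()
print(total, rows)  # 175 3
```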
Hopefully I have convinced you that synthetic transactions have a lot of potential. So far I’ve talked about two different techniques for use in production, but let’s put them into the context of other, better known testing processes:
The triangle conveys several things, firstly the breadth of testing. The lower tiers should aim for the greatest coverage, largely because they are the simplest levels at which to instigate the tests. As you move up the triangle the test setup is likely to become more complex – creating and loading test data sets, and evaluating across multiple components whether the test has succeeded. This has a strong correlation to the maturity of the organisation and its automation and testing.
The top two tiers, chaos engineering and synthetic transactions, aren’t necessarily more complex technically, but they need a broad understanding of the impact of the transactions across the business domain.