(Alternate titles/tldr: âStubbing at the Data Layerâ, or âData Layers should come with Stubsâ, which is related to my earlier post, Services should come with stubs, but this proposes moving stubs up the stack.)
Micro-services have an interesting problem that, personally, I havenât seen solved really well: integration testing in the large.
Where by âin the largeâ, I mean testing with many micro-services (100s to 1000s), not testing with large data (that maybe useful in certain contexts, but not what Iâm focusing on).
See my other post, The Futility of Cross-System Integration Testing, for more on the topic, but none of the systems Iâve personally worked on (which, granted, is a small n
) have solved having cross-system tests that provide 100% isolation and control of each testâs input data.
Not being able to provide this isolation and control, for me, is non-negotiable, as otherwise your test suites are invariably flaky and usually slow, and actually become a burden to development velocity instead of an acceleration.
Tangentially, I suppose that is an interesting articulation of my personal testing philosophy: while the normal/obvious goal for testing is verification of system requirements, my primary goal is subtly different: itâs developer productivity. Tests should make developers faster. If your tests arenât making your development velocity faster, that is a code smell/test smell that should be invested in.
Testing in-the-small vs. in-the-large
As context, if you are doing micro-services in-the-small, e.g. ~5-10 services within your entire organization, you are probably okay and hopefully donât have too much of a testing challenge.
This is because you can, in theory, easily spin up each of your individual services, either on your development machine or a CI machine, and control each serviceâs individual state (e.g. setup service Aâs MySQL database, service Bâs Redis instance, etc.).
Given each test run now has itâs own copy of all of your services, youâve achieved per-test isolation and control, and so should be able to have a robust test suite.
However, once you hit micro-services in-the-large, letâs say >10 or >100 or >1000 services, starting up each service in â~exactly like productionâ becomes very complex and basically untenable (too complex, too slow, brittle config, donât want to install Oracle on each developerâs machine, etc.).
So, thatâs the problem space Iâm focused on: 100s/1000s of services where local-/CI-deployed per-developer/per-test instances of each service is no longer possible.
1st Approach: Shared Integration Environments
Generally where Iâve seen companies go next is an integration environment, where you have ~100s/~1000s of micro-services all deployed in a shared environment, so you no longer start each service locally.
Instead you have tooling to deploy test instances (e.g. of the specific code/service youâre currently working on) either to the shared environment itself, or have cluster traffic proxied back to your locally-running service. But, other than your own test instance, all of the other services are already running/deployed and using their own data stores (as if they were in production).
But now youâve given up any measure of control of your input data, and isolation from other tests.
So while I think these integration environments have their uses (primarily manual new feature testing and perhaps very limited automated smoke tests), they are otherwise horrible for automated tests (again, see my other post).
1st Alternative: Services Ship with Stubs
Where Iâve historically asserted architectures should go next, is for services to come with stubs, which are in-memory implementations of the same production contract/API, but that are very cheap to instantiate and setup (e.g. to setup a StubOrderService
with two dummy orders, one completed/one cancelled, you want to test your system against).
(Tangentially, I recently saw this practice echoed by a Google test engineer, who also makes the point that having service owners implement stubs+tests for their own API makes them produce better, more client-friendly, APIs.)
If all of your micro-services ship with stubs, it becomes possible to write ânear-integrationâ automated tests, where you can start to write âlooks like cross-systemâ tests, that test how your service A talks to another service B, but uses the in-memory implementation of service B.
And because service B is a stub, we can get a version of the âin-the-smallâ testing approach, where we can conceivably startup all (or a non-trivial subset of all) of the micro-services your automated test needs to touch, e.g. not just service B, but ~10/20/however many other services that your test transitively touches (ideally this would be handled dynamically/automatically by your stub infrastructure).
There are two primary downsides of this approach:
-
Service owners now have multiple implementations to write and maintain: their real production codepath, and their stub codepath, and
-
The stub services donât/canât 100% mimic production.
That said, for systems Iâve built, this approach has still been a net-win, as the huge benefit of fast, consistent, stable ânear-integrationâ tests is worth the trade-off of up-front and ongoing costs of maintaining the stubs.
Iâve personally used this for ~testing-in-the-medium (e.g. not quite testing-in-the-large), primarily when working across vendor system boundaries, and had it work out well. Again, see my other post.
2nd Alternative: Move Stubbing up the Stack
This 2nd/new alternative, moving stubbing into the data layer, is meant to address the two cons of âservices ship with stubsâ, which both fundamentally stem from having multiple service implementations (one production, one stub).
So, if you take a step back, why are we writing stubs in the first place?
The goal was control and isolation of test data. (Note that faster test execution speed is a great side-benefit as well.)
What breaks control and isolation?
Itâs I/O. Specifically shared I/O. The shared database. The shared cache. The shared Dynamo table. Shared I/O is what breaks control and isolation.
By having each team write stubs for their services (the previous/1st alternative), each team is repeating the âavoid I/Oâ incantation/decision in their stub. Make it only in-memory. Avoid the production I/O. Which basically means âdonât use our production data store, use âsomething elseââ.
Hopefully each team is sharing as much code as possible between their production and stub service implementations, perhaps using their own frameworks or abstractions (maybe an ORM that supports multiple backends), which may or may not be leaky. (Although, pedantically, I do assert stubs donât have to provide 100%, or even close, production fidelity to be useful; but if they can, itâs a great goal.)
However, if you assume a large, mostly homogeneous environment (e.g. the typical ~medium-large tech company), an interesting proposition emerges: instead of each team re-coding this âavoid shared I/Oâ decision, what if we provide centralized âpotentially fake I/Oâ data store APIs/abstractions, that service owners can then reuse?
This is taking what we want each team to do in-the-small, which is share as much code between their production service and stub service, and move that into an in-the-large shared, leveraged library/framework that then each team gets for free.
We are basically moving the conditional stubbing logic âup the stackâ, and providing service teams APIs that, in theory, means they write their service once, and we could, at our shared/provided data layer, isolate the I/O such that in testing environments itâs fake (e.g. in-memory and sandboxed-per-test), but in production itâs the regular production data store.
There would be two ways of deploying/using these stubbed-data-layer tests:
-
Write âlooks like cross-systemâ tests that are actually using local, in-memory instances of each service.
E.g. letâs say you call service B, and in production, service B is started with
new ServiceBImpl(new DyanmoDbService())
. And, again in production, you make wire calls to service B.Instead, in the test (somewhere in the stub/test infrastructure code), we would locally instantiate
new ServiceBImpl(new StubDynamoService())
, and then just make method calls directly on it (instead of proxied/networked wire calls), which would use all of the sameServiceBImpl
code, but use an in-memory/fake âdynamoâ implementation that is super quick and isolated.If you have a 100% homogeneous environment, e.g. all JVM or all .NET, you could instantiate these services all in-process, which would be extremely quick.
-
Use the shared integration environment, where each micro-service is deployed to the shared cluster, but all services in that cluster are instantiated with their stub data stores, e.g.
new ServiceBImpl(new StubDynamoService())
.Then, even though these services are a shared deployment, each test would flag its requests with a unique test execution id, which would be passed through our call graph (e.g. along with our other RPC instrumentation data), and used as a token within the
StubDynamoService
,StubCacheService
, etc., to isolate data operations for that test execution from the data of any other test execution.E.g.
GET /someKey
with test execution id âtest-run-1234â would be a different lookup/sandbox than the sameGET /someKey
with test execution id ânext-run-6789â.
Both of these approaches have their pros/cons, and are not mutually exclusive.
Note that the idea of stubbing at the data layer is not novel, as Iâve written before about The Holy Grail of Database Testing being an immediately-bootable, immediately-resettable version of a relational database. Nor was that musing original either; Iâve seen a few data stores provide local versions, e.g. DynamoDB Local.
So, data stores shipping stubs is a known/viable idea, Iâve just not seen it applied across-the-board as a way to specifically to solve the problem of micro-service integration testing.
1st Challenge: Relational Databases
If, as an infrastructure team who owns a data service (e.g. manages/developers your companyâs preferred/mandated key/value store, cache service, etc.), youâre given the job of providing a stubbed version of your data service, your life may or may not be easy.
If you own a key/value service or a cache service, your life will probably be easy: these APIs are fairly straightforward, and writing naive/in-memory versions of them is simple. You can likely maintain a very non-leaky abstraction of the production service without much effort.
However, if you own/provide a relational database service, then this mandate will be a challenge, as SQL is a very robust API, and it seems hard to provide a 100%-non-leaky abstraction that, to clients, looks just like production, but is quick and isolated.
Granted, there are viable in-memory databases, e.g. sqlite, H2 and others, that could be leveraged for providing the stub version, but itâs unlikely youâll be using sqlite/H2/etc. for your production data store, and unfortunately SQL is sufficiently complicated/nuanced that Iâve never seen a project be able to run against multiple database vendors (e.g. the test suite can run against MySQL or Postgres or Oracle or H2) without significant development cost.
(Iâm sure there are exceptions, e.g. projects whoâve been multi-vendor from day 1, but I think they are comparatively rare.)
So, honestly, I donât have a good answer for this.
If you limit your application to a specific/simple subset of SQL, such that you can really use either H2 or Postgres without much/any application-code changes, that seems your best bet; but Iâm sure especially historically Oracle applications will have a tough time getting there.
2nd Challenge: Migrating Legacy Services
If you are big enough to have a testing-in-the-large problem, you also invariably have a non-trivial amount of legacy code, that was written before you knew you were going to be large, and before you knew you were going to need some sort of fancy testing-in-the-large solution.
Especially if your storage layer is a relational database (see previous point), it will likely be non-trivial to convert all of your production services over to this new subset of stub-supported storage APIs.
Also, any sort of adhoc in-memory caches, hash maps, etc., that your application previously thought could be shared process-wide should now be sandboxed, just like all the other storage options. In theory replacing these âjust an in-memory map/cacheâ with a stub-aware version is not complicated, but finding all of the existing places in your services that use them would be tedious.
So, if you decide to tackle this, youâre likely looking at a multi-year migration effort, which would be extremely expensive and also risky unless you were 100% convinced this approach was going to be worth the sweat and toil to get there.
You could mitigate some of this risk by prototyping/migrating a subset of services first, and evaluating what the actual migration cost and post-migration benefit was before committing to rolling it out across the whole company.
3rd Challenge: Handling Near/Offline Applications
Iâm fairly confident this approach would work well for online services, which generally synchronously interact with their data stores.
Unfortunately, there is also a non-trivial amount of business logic that is written in near-line (e.g. streaming frameworks like Kafka or Kinesis) and offline (e.g. Hadoop) services.
In an ideal world, our architecture would expose this nearline/offline application logic to observation by integration tests as well.
There are at least two complications:
-
We now have even more abstractions (Kafka, etc.) to make stub-aware, so our cost goes up.
E.g. for Kafka, can we teach Kafka about our per-test-execution sandboxes?
E.g. if my test-run-123 sends a
FooEvent1
and your test-run-456 sends aFooEvent2
, ideally not only are these events kept separate (isolation), but the test run execution id (123 or 456) is captured so that when our streaming application logic processes the event, and does subsequent side-effects, e.g. making API calls or database writes, it can pass along that test run id, to continue the facade of isolation.Is it possible to teach the real Kafka about this? Probably not. How easy would it be to write a non-scalable, in-memory, stub-aware version of Kafka? Probably easier than a stub-aware relational database, but still expensive.
Also, thatâs just Kafka/Kinesis; what about Hadoop? Spark? Pig? Maybe if all of your reads/writes went through a consolidated Hadoop data provider.
-
Due to streaming/offline logic being asynchronous, test timing becomes a nightmare.
In the online world, services would use data stores that are in-memory stubs that we can assume are 100% consistent and âflushâ immediately, so itâs conceivable for a test to have a transactional, basically synchronous view of the world.
E.g. immediately after asking for âorder 123â to be created, a test can do a read for âorder 123â and make sure it was created successfully.
However, nearline and offline logic is inherently asynchronous. After our test asks for âorder 123â to be created, how long should it wait to observe some nearline/offline side effect? Maybe the nearline is available within a few minutes? Maybe the offline is available the day after? Do we hard-code that as a
Thread.sleep
in our test?This async timing seems like a mess that tests should not even try to handle.
That said, if I were to wave a magic wand, and have a very large infrastructure team budget, I can imagine both streaming and offline frameworks (DSLs/APIs/etc.) that, when given the metadata of âwhat is your inputâ and âwhat is your outputâ, could conceivable be invoked âimmediatelyâ after an online event occurs.
For example:
-
I write a near-line streaming application,
StreamingApp1
, that takesFooEvents
and outputsBarEvents
.In production, this is super-scalable, e.g. it asynchronously batch processes 10,000
FooEvents
at a time with Kafka/etc. -
I deploy
StreamingApp1
to the shared integration cluster, which uses metadata to know âthis application processesFooEvents
â. -
I write an automated test that makes an API call to
FooService
that causes aFooEvent
to be created. -
The stubbed Kafka/message service (which
FooService
just wrote too) knows âwhen aFooEvent
is created, tellStreamingApp1
about it.â -
My
StreamingApp1
logic runs, processing just the single message appropriate for my test execution run.(This execution could either happen synchronously on write to the test clusterâs
FooEvent
stream, or asynchronously within a reasonable window that the test could potentially poll for/block on, e.g. ideally a few seconds.) -
My automated test can now observe the affect of
BarEvent
being created, and assert against the expected streaming application logic.
You could imagine the same scenario for offline jobs: in production being ran as normal, nightly across 100s/1000s of machines per job with map/reduce, but for integration tests being ran periodically/automatically on-demand, within the each testsâ execution sandbox, on only the subset of test data applicable to that test.
Granted, Iâm way down the rabbit hole here on armchair-architecture, and building robust stubbed/integration testing support into both near-line and offline frameworks, plus migrating legacy jobs to said framework, is a non-trivial cost that only the biggest tech companies could conceive of doing.
But this is also optional; I wanted to mention near-line/offline application logic for completeness, but I think focusing on providing data-layer stubs for just online applications is prudent.
4th Challenge: Cross-System Data Setup
This is not really a challenge, as I donât think it should be hard to implement, but if you have a stubbed approach (for either service-layer stubs or data-layer stubs), itâs important to have an easy way to setup test data across all of your services.
E.g. you donât want to have a test method have to know âokay, to create an Order, I need to poke relational table A with some new rowsâ, then âokay, now to create a Customer, I need to touch the key/value store and put some JSON in thereâ.
This leaks the storage details into your tests, and would likely be very verbose/ugly.
You should have a framework that the test can declare the entities that it wants for this specific boundary case (and only those entities, e.g. every test should start with zero data, to have maximum isolation and also conceptual simplicity), and then have those entities pushed into their backend services/data stores in a service-/store-agnostic way.
Historically Iâve had good luck with Fowlerâs Object Mother pattern to provide a builder/DSL for creating your test entities.
5th Challenge: Data Store Lock-in
One interesting 2nd-order consequence of this proposal would be, if successfully implemented, it would become harder for teams to use new/bespoke data stores (and also streaming/offline frameworks, if the company was ambitious enough to execute those in a stub-aware fashion) that might solve their specific serviceâs challenges, but is not one of the companyâs supported stub-aware data stores.
At that point, the team would either have to implement a stub for their new data store themselves (which may or may not be easy, per previous discussion of key/value vs. relational stubs), lobby an infrastructure team to provide support, or perhaps just fallback on providing a service-layer stub for awhile.
This would reduce the agility of teams in terms of their data store choices.
However, I think the pragmatic reality of medium-to-large company development (and why many developers end up not liking it), is accepting that youâre restricted to company-standard/company-mandated choices in terms of technology, architecture, data stores, etc., which prioritizes cost-to-the-org productivity over cost-to-the-individual productivity.
Relation to Pat Hellandâs Entities
Recently Pat Hellandâs Life beyond Distributed Transactions went by Hacker News, and it was a pleasant read.
While what Iâm proposing, migrating/rebuilding all of your companyâs micro-services on top of new/stub-provided storage APIs, is daunting, it relates to Patâs concept really well.
Ideally, your application logic should be, using his terms, scale-agnostic. Itâs the upper layer of code that has the application programmerâs business logic, and should ideally be blithely unaware of the overall scale/complexity of the ecosystem it runs in.
If you follow his proposed architecture, and have a distinct set of lower-layer, scale-aware code that deals with in-the-large concerns of scale, it is coincidentally also a very good slice point to implement what Iâm proposing here, which is in-the-large concerns of testing.
His article also talks about offline application logic having the same scale-agnostic/scale-aware divide, e.g. whatâs provided by map/reduce, which is similar to my armchair musings about tricking near-line/offline code to be ran âalmost synchronouslyâ for the increased ease/robustness of test timing/assertions.
So, my guess is that many companies are already using architectures that are close to what Pat proposes: various online, near-line, and offline frameworks that make application developersâ lives easier by letting them write scale-agnostic code with the scale-aware magic being reused/provided out-of-the-box.
The problem is that, having evolved organically, retrofitting each of a companyâs chosen scale-aware frameworks to also know about scale-aware integration testing (means lots of isolated tests, not necessarily lots of test data) would be non-trivial.
Conclusion
So, wrapping up, this is my latest suggestion for what companies struggling with the morass of shared integration environments should attempt to do.
While I think âservices ship with stubsâ is still a great starting point, especially when you donât control all the services (e.g. you can write your own stubs for vendor APIs or internal APIs you donât control), if you want to scale stubs to be enterprise-/company-wide, then I think âstubbing at the data layerâ will result in a longer-term ROI, as you wonât have pay the stubbing cost for every single micro-service.
As a disclaimer, I have personally written many service-layer stubs, with a good deal of success, but have not had a chance to try this approach of data-layer stubs (although technically I wrote fakesdb, which is a data-layer stub in ~2010, but that was not applied to cross-service/in-the-large testing like Iâm proposing here).
Iâm cautiously optimistic, albeit with disclaimers of large up-front cost.
But, if you think on a long horizon, e.g. if youâre a big company, with ~100s/1000s of developers, and you think to ~4-5 years out: do your really want your developers continuously struggling with, and wasting time on, trying to write automated test suites that are slow, flaky, time-consuming and of debatable, likely net-negative, value?
I think, on that time horizon, and at that scale (100s/1,000s of developers), embarking on this sort of effort starts to become conceivable and worthwhile in terms of ROI.