I really enjoyed going to Strangeloop this year. The signal-to-noise ratio of the presentations and conversations with other attendees was great.
Embarrassingly, I'm finally getting around to writing some blog posts about it. I had wanted to do a single review post, but that turned out to be unrealistic, so I'll do a series of smaller ones about the specific topics/presentations I enjoyed.
The opening keynote was about VoltDB, a next-generation "NewSQL" relational database built by database industry guru Michael Stonebraker.
A Look at "OldSQL" Architecture
I was initially skeptical of this keynote, as keynotes are not usually product-based, but most of Stonebraker's talk ended up as a convincing argument that the fundamental architecture of current-generation databases has hit its scalability limits.
While this is not really news in terms of NoSQL/BigData products these days, it was nonetheless interesting to hear, from a relational database expert, a logical explanation of exactly what parts of current relational database architecture are slow and why.
For example, based on profiling (covered in the paper OLTP Through the Looking Glass), Stonebraker's assertion is that only 10% of time is spent doing real work, like updating the core data structures and indexes, and the other 90% goes to secondary concerns like waiting for disk IO, locking, etc.
This means that traditional ways to speed up databases (novel B-tree algorithms, etc.) really can't do much if the rest of the incidental cruft remains.
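To make that arithmetic concrete, here is a small Amdahl's-law calculation. It assumes the 10%/90% split quoted above and that a faster B-tree only speeds up the "real work" portion; the function name is just for illustration.

```python
def overall_speedup(useful_fraction: float, useful_speedup: float) -> float:
    """Amdahl's law: only `useful_fraction` of the runtime gets faster."""
    overhead = 1.0 - useful_fraction
    return 1.0 / (overhead + useful_fraction / useful_speedup)


# Assumed split from the profiling claim: 10% useful work, 90% overhead.
print(round(overall_speedup(0.10, 2.0), 3))           # 1.053 (2x faster B-tree)
print(round(overall_speedup(0.10, float("inf")), 3))  # 1.111 (infinitely fast B-tree)
```

Even an infinitely fast index structure caps out at roughly 1.11x overall, which is the point: the overhead is what has to go.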
Deadpans NoSQL
Given Stonebraker is an old-school RDBMS guy, he also had a unique perspective on NoSQL.
As he tells it, he lived through a phase in the industry (late 60s/70s?), before the relational model "won", and before ACID was taken for granted, where there were competing technologies and data models, each vying for prominence.
His point was that, looking back, relational- and ACID-based models won for a reason: they greatly simplify life for the application programmer.
So, he asserts that NoSQL-style eventual consistency isn't the answer; it's actually a step backwards, with a cheeky quote that "eventual consistency means 'creates garbage'" (ha!).
I am inclined to agree, and have hoped for a while that eventual consistency will be a passing fad until database technology catches up to today's operational constraints. Products like VoltDB and Google's Spanner make me optimistic that this will be the case.
VoltDB as "NewSQL"
So, finally, Stonebraker talked about VoltDB as a "NewSQL" approach, which forgoes the traditional architecture by being single-threaded within a partition (no locking), in-memory (no disk, network-based replicas), and moving the computation to the data (stored procedures).
By doing this, VoltDB still provides ACID, but with impressive vertical scalability improvements, and, via partitioning, horizontal scalability as well.
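A rough sketch of the partitioning idea, assuming each row is routed to a partition by its key and each partition runs its queued work strictly one procedure at a time. This is not VoltDB's code or API; `Partition`, `Cluster`, `set_balance`, and `deposit` are hypothetical names, and key-modulo routing is a simplified stand-in for real hash partitioning.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Proc = Callable[[Dict[int, int]], None]


class Partition:
    """Owns one slice of the rows and runs its queued procedures serially."""

    def __init__(self) -> None:
        self.rows: Dict[int, int] = {}
        self.queue: Deque[Proc] = deque()

    def run_all(self) -> None:
        # One procedure at a time: nothing else touches self.rows while a
        # procedure runs, so no locks or row versions are needed.
        while self.queue:
            proc = self.queue.popleft()
            proc(self.rows)


class Cluster:
    def __init__(self, n_partitions: int) -> None:
        self.partitions: List[Partition] = [Partition() for _ in range(n_partitions)]

    def route(self, key: int) -> Partition:
        # Simplified routing: key modulo partition count.
        return self.partitions[key % len(self.partitions)]

    def submit(self, key: int, proc: Proc) -> None:
        self.route(key).queue.append(proc)

    def run(self) -> None:
        # A real system would run partitions in parallel; sequential here.
        for partition in self.partitions:
            partition.run_all()


def set_balance(key: int, amount: int) -> Proc:
    def proc(rows: Dict[int, int]) -> None:
        rows[key] = amount
    return proc


def deposit(key: int, amount: int) -> Proc:
    def proc(rows: Dict[int, int]) -> None:
        rows[key] += amount
    return proc


cluster = Cluster(n_partitions=4)
for account_id in range(8):
    cluster.submit(account_id, set_balance(account_id, 100))
cluster.submit(5, deposit(5, 10))
cluster.run()
print(cluster.route(5).rows)  # {1: 100, 5: 110}
```

Adding partitions adds independent serial executors, which is where the horizontal scaling comes from.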
I have to admit that the architecture sounds pretty sexy. It has all of the things you look for in a system that can scale: in-memory, little contention, horizontal scaling.
Where's the Middleware?
My biggest concern about VoltDB is that it requires a huge change to how applications are currently built: you can only invoke stored procedures.
This is because you no longer get cross-wire-call transactions, like in SQL/JDBC where you can do "begin transaction, select ..., select ..., update ..., commit", interspersing business logic with your SQL calls, and still have it all complete transactionally and with some amount of read isolation.
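For contrast, this is roughly what the chatty, interactive style looks like from the client side, using Python's built-in `sqlite3` module as a stand-in for a JDBC connection to a traditional networked RDBMS. The `bank_account` table, the starting balances, and the overdraft rule are made up for illustration; the point is that application logic runs between statements while the transaction stays open.

```python
import sqlite3


def transfer_interactive(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Chatty style: against a networked RDBMS each execute() is a wire call."""
    conn.execute("BEGIN")
    (src_balance,) = conn.execute(
        "SELECT balance FROM bank_account WHERE id = ?", (src,)
    ).fetchone()
    # Business logic runs in the client while the transaction stays open.
    if src_balance < amount:
        conn.execute("ROLLBACK")
        return False
    conn.execute("UPDATE bank_account SET balance = balance - ? WHERE id = ?", (amount, src))
    conn.execute("UPDATE bank_account SET balance = balance + ? WHERE id = ?", (amount, dst))
    conn.execute("COMMIT")
    return True


# isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bank_account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO bank_account VALUES (?, ?)", [(1, 10), (2, 10)])
print(transfer_interactive(conn, 2, 1, 10))  # True
print(transfer_interactive(conn, 2, 1, 10))  # False: account 2 is now empty
```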
Instead, with VoltDB, every wire call to the database is its own transaction. That's it. This is where VoltDB gets some of its big wins, because it doesn't have to do any locking/versioning to keep a transaction "in flight" while waiting for your application's next wire call (which may take a while or never come back).
This makes sense, but it's a huge change to today's N-tier/middleware-based architectures, as you have to move any logic that must be transactional into VoltDB's stored procedures.
This is a tough pill to swallow; any sort of domain model/ORM-based architecture goes away, as they are usually predicated on a chatty SQL connection that still provides cross-wire call transactions.
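Here is the same transfer reshaped into the one-call style: the client sends a procedure name plus arguments, and the server runs the whole thing as a single transaction that either commits or rolls back. This is a sketch under assumptions, again with `sqlite3` standing in for the database; `call_procedure`, `PROCEDURES`, and `ProcedureAbort` are hypothetical and are not VoltDB's actual stored-procedure API.

```python
import sqlite3


class ProcedureAbort(Exception):
    """Raised by a procedure to roll back its whole transaction."""


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    # All of the logic lives server-side, inside the single call.
    (balance,) = conn.execute(
        "SELECT balance FROM bank_account WHERE id = ?", (src,)
    ).fetchone()
    if balance < amount:
        raise ProcedureAbort("insufficient funds")
    conn.execute("UPDATE bank_account SET balance = balance - ? WHERE id = ?", (amount, src))
    conn.execute("UPDATE bank_account SET balance = balance + ? WHERE id = ?", (amount, dst))


PROCEDURES = {"transfer": transfer}


def call_procedure(conn: sqlite3.Connection, name: str, *args) -> bool:
    """One 'wire call' is one transaction; nothing stays open afterwards."""
    conn.execute("BEGIN")
    try:
        PROCEDURES[name](conn, *args)
    except ProcedureAbort:
        conn.execute("ROLLBACK")
        return False
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return True


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bank_account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO bank_account VALUES (?, ?)", [(1, 10), (2, 10)])
print(call_procedure(conn, "transfer", 2, 1, 10))  # True
print(call_procedure(conn, "transfer", 2, 1, 10))  # False
```

The trade-off is visible in the shape of the code: the database never waits on the client mid-transaction, but the overdraft rule now has to live inside the procedure instead of in the middleware.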
Moving Computation to Data?
Stonebraker asserts this requirement for stored procedures is "moving the computation to the data", which is a popular approach in BigData; instead of moving TBs of data to your client/middleware machine, you ship your code directly to the database machine.
But VoltDB seems different: its stored procedures don't allow general-purpose computation. You can't make calls to other systems (you can't do anything that will block), nor do any really heavy calculations (again, these would block), so they're essentially just a way to batch a few SQL operations together.
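Why the no-blocking rule? If a partition runs one procedure at a time, any procedure that waits (say, on a call to another system) delays everything queued behind it. A tiny simulation with made-up durations, assuming strictly serial execution and all requests arriving at time zero:

```python
from itertools import accumulate
from typing import List


def completion_times(durations_ms: List[int]) -> List[int]:
    """Serial execution: each procedure starts when the previous one ends."""
    return list(accumulate(durations_ms))


print(completion_times([1, 1, 1, 1]))    # [1, 2, 3, 4]
print(completion_times([1, 100, 1, 1]))  # [1, 101, 102, 103]
```

One 100 ms wait turns every later request in that partition into a 100+ ms request.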
I suppose this is similar to Hive, but the limitation seems more natural for Hive because you're almost always doing just variations of SQL-ish transformations on your data, and not real business logic.
Anyway, perhaps it is just my bias, but I'd prefer to keep computation at the middleware layer.
Potential Compromise
If I were to pick up VoltDB, I think I would try to keep a traditional domain model/ORM-ish architecture, and just use optimistic locking to enforce transaction isolation.
So, if my middleware did something like:
orm.beginTxn();
// one call to VoltDB
val b1 = BankAccount.load(1);
b1.balance += 10;
// another call to VoltDB
val b2 = BankAccount.load(2);
b2.balance -= 10;
// sends update b1 and b2 as 1 call/transaction
orm.commitTxn();
The UPDATEs for b1 and b2 would happen atomically.
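A minimal sketch of the ORM side, assuming a unit-of-work style: loads go to the database immediately, changes are only tracked in memory, and commit flushes every dirty entity as one batch. `UnitOfWork`, `BankAccount`, and `wire_calls` are hypothetical names (not Joist's or VoltDB's API), the dict stands in for the database, and the starting balances of 10 with versions 1 and 2 are assumed so the batch matches the SQL that follows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (id, new_balance, new_version, expected_version)
Update = Tuple[int, int, int, int]


@dataclass
class BankAccount:
    id: int
    balance: int
    version: int


@dataclass
class UnitOfWork:
    db: Dict[int, Tuple[int, int]]  # stand-in database: id -> (balance, version)
    snapshots: Dict[int, int] = field(default_factory=dict)  # balance as loaded
    loaded: Dict[int, BankAccount] = field(default_factory=dict)
    wire_calls: List[tuple] = field(default_factory=list)

    def load(self, account_id: int) -> BankAccount:
        balance, version = self.db[account_id]
        self.wire_calls.append(("SELECT", account_id))  # one call per load
        self.snapshots[account_id] = balance
        account = BankAccount(account_id, balance, version)
        self.loaded[account_id] = account
        return account

    def commit(self) -> List[Update]:
        # Only dirty entities are flushed, each carrying its version check.
        batch = [
            (a.id, a.balance, a.version + 1, a.version)
            for a in self.loaded.values()
            if a.balance != self.snapshots[a.id]
        ]
        self.wire_calls.append(("UPDATE_BATCH", batch))  # a single call
        return batch


uow = UnitOfWork(db={1: (10, 1), 2: (10, 2)})
b1 = uow.load(1)
b1.balance += 10
b2 = uow.load(2)
b2.balance -= 10
print(uow.commit())         # [(1, 20, 2, 1), (2, 0, 3, 2)]
print(len(uow.wire_calls))  # 3
```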
But what about read isolation? I think optimistic locking would work for this, e.g. the SQL on the wire would really be:
-- b1 = BankAccount.load(1)
SELECT id, balance, version FROM bank_account WHERE id = 1;
-- b1.balance += 10
-- b2 = BankAccount.load(2)
SELECT id, balance, version FROM bank_account WHERE id = 2;
-- b2.balance -= 10
-- orm.commitTxn
UPDATE bank_account SET balance = 20, version = 2
WHERE id = 1 AND version = 1;
UPDATE bank_account SET balance = 0, version = 3
WHERE id = 2 AND version = 2;
So now, if anyone else has touched either bank_account row in between my read and my write, the version check in that UPDATE's WHERE clause will match no rows, and I'd know the data is stale.
The trick would be that I'd need VoltDB to fail the whole transaction if the UPDATE modified count for any of the statements was zero.
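Here is a sketch of that all-or-nothing check, using `sqlite3` in place of VoltDB; `apply_batch` and `StaleDataError` are hypothetical names standing in for whatever server-side procedure would run it. It executes the version-checked UPDATEs from the SQL above inside one transaction and rolls everything back if any statement's `rowcount` is zero.

```python
import sqlite3
from typing import Iterable, Tuple

# (id, new_balance, new_version, expected_version)
Update = Tuple[int, int, int, int]


class StaleDataError(Exception):
    pass


def apply_batch(conn: sqlite3.Connection, updates: Iterable[Update]) -> None:
    """Apply every update or none of them (optimistic locking on `version`)."""
    conn.execute("BEGIN")
    try:
        for row_id, balance, new_version, expected_version in updates:
            cur = conn.execute(
                "UPDATE bank_account SET balance = ?, version = ? "
                "WHERE id = ? AND version = ?",
                (balance, new_version, row_id, expected_version),
            )
            if cur.rowcount == 0:
                # Someone changed (or deleted) the row since we read it.
                raise StaleDataError(f"bank_account {row_id} is stale")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bank_account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.executemany("INSERT INTO bank_account VALUES (?, ?, ?)", [(1, 10, 1), (2, 10, 2)])

# Simulate a concurrent writer bumping account 2 after we read it.
conn.execute("UPDATE bank_account SET balance = 15, version = 3 WHERE id = 2")
try:
    apply_batch(conn, [(1, 20, 2, 1), (2, 0, 3, 2)])
except StaleDataError as e:
    print(e)  # bank_account 2 is stale
# Account 1's update was rolled back too.
print(conn.execute("SELECT balance, version FROM bank_account WHERE id = 1").fetchone())  # (10, 1)
```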
This is basically moving isolation enforcement to the client, meaning it would have to fail or retry if the optimistic lock failed. I would be fine with that though, as optimistic locking is easy to build into an ORM.
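On the client side, handling a failed optimistic lock usually means re-reading and retrying. A sketch with assumed names: `VersionedStore` is an in-memory stand-in for a table with a version column, and `update_with_retry`, `max_attempts`, and the `on_read` hook (used here only to simulate a competing writer) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


class StaleDataError(Exception):
    pass


class VersionedStore:
    """Tiny in-memory stand-in for a table with a version column."""

    def __init__(self) -> None:
        self.rows: Dict[int, Tuple[int, int]] = {}  # id -> (balance, version)

    def read(self, row_id: int) -> Tuple[int, int]:
        return self.rows[row_id]

    def write(self, row_id: int, balance: int, expected_version: int) -> None:
        _, version = self.rows[row_id]
        if version != expected_version:
            raise StaleDataError(row_id)
        self.rows[row_id] = (balance, version + 1)


def update_with_retry(
    store: VersionedStore,
    row_id: int,
    change: Callable[[int], int],
    max_attempts: int = 3,
    on_read: Optional[Callable[[int], None]] = None,
) -> int:
    """Re-read and re-apply `change` until the version check passes."""
    for attempt in range(1, max_attempts + 1):
        balance, version = store.read(row_id)
        if on_read is not None:
            on_read(attempt)  # test hook: lets us simulate a concurrent writer
        try:
            store.write(row_id, change(balance), version)
            return attempt
        except StaleDataError:
            continue  # someone else won; the next loop re-reads the fresh row
    raise StaleDataError(f"gave up on row {row_id} after {max_attempts} attempts")


store = VersionedStore()
store.rows[1] = (10, 1)


def competitor(attempt: int) -> None:
    if attempt == 1:  # another client sneaks in a write during our first try
        balance, version = store.read(1)
        store.write(1, balance + 5, version)


attempts = update_with_retry(store, 1, lambda b: b + 10, on_read=competitor)
print(attempts, store.read(1))  # 2 (25, 3)
```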
With a bit of work, I could see an ORM like Joist supporting VoltDB as a backend just like the traditional MySQL/Postgres backends.
Unfortunately, I don't think VoltDB can do this today: the client API can only invoke stored procedures, so it would require a sort of meta-stored procedure that took a list of tables/values to update and iteratively evaluated them.
Operational Concerns
My only other concern with VoltDB is that it's a new piece of infrastructure software whose ins and outs you have to learn. And since it owns your data, you want to make especially sure you don't mess something up.
This is not VoltDB's fault; it's just the reality of relying on a new software package. I've seen a few bad things happen when deploying new software that were not the software's fault, but a configuration misunderstanding or error. You just hope that you catch these sorts of things before your data is gone.
In that regard, I'd enjoy seeing an RDS-style offering for VoltDB. I don't want to log into servers or configure clusters, logging, or whatever; I just want a GUI that says "give me X many servers, go!".
Tangentially, it would be awesome if Amazon RDS let vendors build their own integrations, so you really could have a "VoltDB Engine" drop-down option in RDS, but supported by VoltDB instead of Amazon staff.
It is not realistic for Amazon RDS themselves to support/integrate the myriad of new databases available these days, so it would be great if RDS were more of an open marketplace through which vendors could make their own databases available as a PaaS.
Conclusion
I've really enjoyed looking into VoltDB.
I haven't had time to play with the community edition locally yet, but I'm going to try that soon and see how it goes. I'm not really sure what the application development/test/deploy cycle will look like, which I'm sure one of their tutorials will cover.
So, in retrospect, I was surprised, but I thought Stonebraker's VoltDB keynote was very good, and it has me checking out their product. I definitely recommend watching the video, which I'll link to here in a few weeks when Strangeloop makes it available online.