Disclaimer: Iâm posting this in ~70% done state, instead of letting it languish, as writing it was useful in guiding/changing my thoughts.
Kent Beck has a few recent blog posts about bitemporal modeling, which Iâve read about/bookmarked a number of resources over the years:
- Bitemporal modeling from Wikipedia
- Slowly changing dimension also from Wikipedia but w/more details on various schemas
- The Value of Bitemporality, a 2019 blog post on a Clojure database, XTDB
- An HN Discussion about this post
- pg_bitemporal a link from the HN discussion
- Chronomodel a Ruby project that uses PG schema inheritance & hooks into Rails/ActiveRecord
- And, of course, Martin Fowler has an article as well
Kentâs post in particular had a cute example of a Perspective
abstraction, that allows establishing a view of the data at a given âeffective timeâ and âposting timeâ (donât worry about these terms, weâll cover them soon):
class Perspective:
def __init__(self, effective, posting):
self.effective = effective
self.posting = posting
def sees(self, other):
return other.posting <= self.posting and other.effective <= self.effective
And this was neat enough that it made me think, should I be using something like this?
So, I thought about it, and decidedâŚprobably not. And this post is my notes/thoughts on how/why/why not Iâm using bitemporal, or bitemporal-ish, approaches to achieve similar goals, in the schemas that I work on.
But, starting at the beginning, bitemporal fundamentally means âtracking two (bi) dimensions of time (temporal)ââŚwhich seems confusing, isnât there only a single dimension of time? So letâs look into what these two dimensions are.
System Time
I like to start with a super-simple entity, letâs say Author
with a firstName
.
Also staying simple, letâs store it in an authors
table with a first_name
column. And we build a CRUD UI on top of this authors
table, that just UPDATE
s the first_name
column whenever the user hits âSaveâ in our UI.
Now, lets say a user files a bug, âwhy is my authorâs firstName
âBobâ, but I swear it should be âFredâ?â And we engineers, debugging their issue, would like to see how a given Authorâs firstName
has, or has not, recently changed in the database (in the system).
Which brings us to our first bitemporal dimension: system time, also called transaction time (in the Juxt/XTDB post), or posting time (in Beckâs post), or record time (in Fowlerâs post). Personally, I think âsystem timeâ is the best name, b/c Iâve worked on financial systems that have a transactions
table and those âtransaction datesâ are part of the domain model itself (i.e. effective time, which weâll get to next) and not actually âthe timestamp the database transaction committedâ, which is what weâre referencing to with system time. It also seems like SQL:2011âs FOR SYSTEM_TIME AS OF ...
syntax uses âsystem timeâ as well.
This ask is basically an audit trail, of which there are many approachesâhistory tables, history rows, all sorts of approaches that focus solely on âauditingâ. One of my current favorites is CyanAudit, which uses triggers and a separate cyanaudit.tbl_audit_event
table to record rows that track the old_value
/ new_value
for every time the first_name
column (or any column) in your database changes, and has a particularly useful undo_transaction
function.
Regardless of the tactical approach, they all boil down to what did the database look like as of timestamp X?
And thatâs the first temporal dimension of bitemporal modeling. Simple enough: auditing/point-in-time history.
In my opinion, this 1st dimension is a âmust haveâ for nearly every CRUD application, either for regulatory/auditing purposes (hence the name), or even just the sanity of engineers debugging the system behavior (âoh, we sent this email to âthe wrong addressââŚb/c thatâs actually what email was in the system at the time we sent the email, so this wasnât actually a bugâ).
So whatâs the next dimension?
Effective Time
Effective time is the second dimension, and again staying super simple, letâs track something like Author.heightInInches
.
We could just add a height_in_inches
column to the authors
table, store height_in_inches = 72
and weâre done. Which is great, this is probably perfectly acceptable for most applications (or, more accurately, most attributes in most applications)âsay a bookstore that just needs to know âthis author is 6â tallâ.
But what if weâre working on a pediatric healthcare system, and now we realize that âheightâ is not a static/immutable attribute of a child; the patient team needs to see, directly in our app, how the height of a child is trending over time. So maybe we store something like:
- The
patients
table has aname
column - The
patient_heights
table haspatient_id
,measured_at
, andvalue_in_inches
columns
Now we can see âthe child was 4 feet tall on Jan 1 2018, then 5 feet fall on Jan 1 2020â, etc. Concretely, our UI can pull back SELECT * FROM patient_heights WHERE patient_id = 1
and graph the height over time.
This is âeffective timeââwhen in the real world did this new value become effective, become true, for the given entity, even though the old value is still correct for the previous real-world time period.
I.e. while system time fully replaces the old value with the new value, insinuating the old value was fundamentally incorrect & now has no use (except for auditing), in effective time the old value is still correct for itâs real-world time period, but now we have a new value for a new time period.
When Does Effectiveness Matter?
We can model our new heightInInches
attribute two different ways: just as a single column, or as a range of values over time.
Which should you use?
I assert that it depends on what your users care about.
Letâs go back to the Author.firstName
example: do we need to track the âeffectivenessâ of firstName
?
If Iâm Amazon.com, or a bookstore, or a payroll application, probably notâan authorâs first name, in those systems, can be a single, psuedo-immutable fact. I say âpsuedo-immutableâ because we can update it to fix typos, but, in general, the users in the UI do not need to know âthe Authorâs first name was Bob for the 1st 30 years of their life, and then they changed their name to Fredâ.
Further, not only do the users ânot care about thisâ, but they also do not want the cognitive burden or UX overhead of even being able to enter, or reason about, this nuance of âFredâs birth name was Bob, but now itâs Fredâ, for every single attribute of every single entity in the system.
But, if youâre building the database system for a courthouse, and your user is the courthouse clerk, that user does really care about âwhat was this personâs legal name over time?â.
Basically, I believe that most attributes in your system can be just simple fields, with no effective time; however, depending on your domain, and typically the most important aspect of your domain, your users likely will care about âhow this real-world value changed over timeâ (height of a patient, legal name of a citizen, etc.), and so for those, and only those, you should incorporate âeffectiveness trackingâ into your domain model.
However, even once decided that an attribute warrants effectiveness tracking, how you model this depends on your domain, and specifically how your users likely already handle this change in their own real-life workflows. Letâs look at some examples.
Examples
Purchase Orders
An example of âa value that changes over timeâ might be âhow many bricks did we order for this construction project?â.
For example, at the start of the project, we ordered 2,000 bricks, and later we realized we needed another 500 bricksâthis seems like a perfect example of âthe effective value for âhow many bricks we needâ changed over timeâ.
So, should we jump right to an out-of-the-box bitemporal system like XTDB or a home-grown Perspective
type, that adds an effective_date
& system_date
columns to our purchase_orders
table?
My current assertion is no, because how the users think of a Purchase Order âchanging over timeâ is already modeled in their workflows: they create a contract modification, called a Change Order, that goes through its own âget internal approval, send to trade to sign, watch for e-signatureâ workflow before â2500 bricksâ becomes the official ânew valueâ.
So, our system might have a purchase_orders
table, with a date
column (denoting when the PO âbecame effectiveâ), but then weâd also have a change_orders
table, with a purchase_order_id
FK, and its own date
column, against noting âwhen the CO was effectiveâ.
This lets us model âchange over timeâ (the PO established an initial âbricks orderedâ as 2,000, but then the CO bumped it up to 2,500), but represented in the real-world artifacts/materialized first-class entities.
Legal Name Changes
Thinking back to the courthouse example, this same approach of âreal-world change modeled via entitiesâ could be used there as well: maybe a BirthCertificate
entity establishes a personâs original given name, and then a NameChangeApplication
denotes each legal name change, with metadata about when it was signed (effective), who approved it (do legal name changes require approval? Iâm not sureâŚ), etc.
Bank Transactions
Even the prototypical âeffective timeâ example, of a BankTransaction
has âeffectivenessâ fundamentally built into its real-world data model: the transaction date.
Versions of a Contract
These all seem relatively easy so far, but drilling into Beckâs specific example, of âfinancial contracts managed over 20 yearsâ, I see a complicating, nuanced ask for both:
- The user wants to know âwhat did the system think our contract was âas of last weekââ (solvable with just an audit trail), and if they also need:
- The systemâs own business logic needs to know âwhat was the contract as-of two years agoâ, because maybe in our niche contract-billing use case, weâve agreed that even future bills will calc against what the âknown-at-the-timeâ contract was, and not the âfixed to be correctâ contract is.
E.g. this is a rare case where system behavior itself must be able to query data from historical system times (not non-auditing purposes, i.e. performing actual business logic, to derive new behavior).
I donât think Iâve ever worked on a system that needed itâfrankly, Iâd tempted to assert itâs actually a 3rd dimension, where we teach the domain about an additional âknown timeâ or have dual Contract
/ContractVersion
entities. See the appendix on âKnown Time != System Timeâ for more, but that musing aside, this âbusiness logic needs to read prior system timesâ is the prototypical example for when your system needs bitemporal capabilities.
Similarities
In all of these examples, âeffectivenessâ, in my opinion, is best modeled directly via entities that describe âhow the change was triggered in the real worldâ, and not by leaning on a bitemporal framework.
(âŚI guess, to be fair to Beckâs post, maybe databases like XTDB, and projects like pg_bitemporal
, led me to assume Beckâs post is asserting our entire domain model should be modeled bitemporally, when really âbitemporalâ-ness is just a schema pattern to be applied to specific entities within our model, which is exactly how I would apply it; although, even then, I think I would lean in to more domain-specific naming, ContractVersion
, etc., and away from super-generic, bitemporal-ish terminology like posting
& effective
columns.)
After some thought, all of my examples (save perhaps Beckâs) are really just using Snodgrassâs term of âapplication-time period tablesâ, or valid-time tables. Which is great, and makes sense, as even one of my first âbigâ systems used valid-times to track insurance benefit plan years.
Conclusion
So, in conclusion, in terms of bitemporal modeling, I think:
- The 1st bitemporal dimension, system time, is extremely important, and every system should have an audit trail
- Given this matters to all attributes, you should use a system-wide solution like CyanAudit
- Bonus points for your UI being able to âtime travelâ back to a given system timestamp, i.e. if you use a database/ORM that supports
SELECT * FROM authors AS OF TIMESTAMP '2023-01-01 12:00:00'
- The 2nd bitemporal dimension, effective time, is only important to the most intricate/differentiating parts of your domain model, and unnecessary overhead for the rest.
- You can divide attributes into 80/20 buckets of âpsuedo-immutable factsâ (like an authorâs name, just use a
first_column
column) vs. âfacts that have multiple correct values over time that our users specifically care aboutâ (like a patientâs height, use aPatientMeasurement
s table) - For attributes that need effectiveness, you should model these as explicit entities that the users already recognize (i.e. change orders, transaction dates, patient measurements)
- You can divide attributes into 80/20 buckets of âpsuedo-immutable factsâ (like an authorâs name, just use a
- If you really have business logic that relies on âknown atâ time (rare IMO), personally I would treat that separately than âsystem timeâ, and model this as âversionsâ (see appendix below)
- Perhaps this is what most people mean they say âbitemporalâ, is explicitly materializing known-ness for business logic to access.
- Instead of using dates to reason about âknown-nessâ, I think itâs much easier to think of this as versionsâwe had v1 of a contract, v2 of a contract, etc., and yes, we need to âfind the versionâŚas of some dateâ, but that seems a much simpler mental model than âposting date vs. effective dateâ
Coming back to the Perspective
class from Kent Beckâs post, I think itâs unlikely Iâd use it, because it couples two dimensions that are fundamentally different, and I think is actually more confusing and unnecessary to try and solve them via the same abstraction.
Particularly for system time, because each slice of system time is basically a unique, âinternally consistentâ view of the world at that point in time, I canât imagine ever wanting to accidentally âmixâ data across separate system times (i.e. forget to have the self.posting_date <= other.posting_date
from the Perspective
class).
And, if users or product do ask for that behavior (reading data across system times), Iâd suspect thatâs really an ask for either effectiveness (like patient heights, hopefully) that we hadnât realized yet, or maybe effectiveness and known-ness (even rarer, like billing for contracts based on their known-at time).
Maybe the biggest difference is just terminology: I think of âsystem timeâ fundamentally as an artifact of the database/storage technology, that business logic itself should never be able to cross/intermix, and a dimension that applies to all data, whereas âknown timeâ is so rare, intricate, and generally confusing to deal with, that we should model it explicitly, and only as necessary.
Appendix: Choosing Your Dimensions
When evaluating âwhat dimensions does this attribute need?â, think does your business logic (or UI) depend on:
- Nothing (no dimensions) i.e. the authorâs first name âis and has always been Bobâ
- These are the psuedo-immutable facts, i.e. 80-90% of attributes in your schema
- Use the audit trail to see them evolve over time
- Effective time (one dimension), i.e. the childâs height over time, purchase orders w/change orders
- You can model âeffectivenessâ without âknown timeâ explicitly modeled in your domain
- Use the audit trail to see them evolve over time, i.e. your business logic doesnât need to âtime travelâ to previous known-ness
- Known time (one dimension), i.e. versioned architectural plansâŚ
- We know âArchitectural Plans v2â changed â# of bricks from 200 -> 300â, and we want to have both âthe old value + new valueâ materialized in our domain so that we can show/calc diffs in the UI, but itâs not like â200 bricksâ is still meaningful in some prior time period (what effective time represents)
- Known time and effective time (two dimensions), i.e. Beckâs contract billing
- Your business logic itself needs to âtime travelâ (known-ness) and then, after filtering to âfacts at known timeâ, applies additional filtering for âeffectivenessâ
Basically:
- Some real-world values have âeffectivenessâ (height of a child, citizenâs given name), some donât
- Sometimes business logic needs to âtime travelâ to prior known-ness (versions of architectural plans)
- Sometimes we need to both âtime travelâ (version) entities that themselves also have (effectiveness) (versions of a contract)
But all of these seem like they can be modeled in your domain.
Appendix: Known Time !== System Time
After writing most of this post, Iâm now asserting that âknown-timeâ is not the same thing âsystem timeâ: âsystem timeâ is a physical attribute of your storage system, that marches ever onwards, whereas âknown timeâ is explicitly versioning specific entities within your domain model.
Thinking of Beckâs use case: âthe contract was âknown asâ $20k, and weâve fixed it to be $40k, but we want to bill it with the historical/known-at $20kâ.
Usually for this use case, we focus on the â$20kâ old value, and think, well, the database snapshot from back then would have the â$20kâ value, so we should use the system-time dimension to go back and recover the old value.
But, what if we focus on how the â$40kâ new value actually replaces the $20k old value?
I think the assumption is that this transition is simply whenever the UPDATE contracts SET value = 40 WHERE id = 1
statement runs. And that edit / UPDATE
, which is recorded implicitly in the system-time dimension, creates the old value/new value transition we want to track.
But, what if the âfixed contract valueâ actually needs to go through approvals? Or the âEdit Contractâ UI is not a desktop fat-client that does a single, final UPDATE
mutation, but is actually doing lots of auto-saves while the fixed Contract
entity is wip?
Both of these would cause edits that move the system-time dimension forward, with âchanges to the Contract
â, but those changes have not yet actually invalidated/replaced the old contract value yet. They would need to be staged somewhere in the ânext version of the contractâ.
This makes me think that we should materialize the notion of âversions of the Contract
â directly into our domain model. I.e. when weâre asked to âbill the contract with the âknown atâ valuesâŚâ, we should model it with ContractVersion
s, similarly to how we modeled the PurchaseOrder
and ChangeOrder
flow.
This makes bitemporal-ness a lot more pedestrian:
- if you need âknown atâ for auditing, use an audit trail,
- if you need âknown atâ for business logic, add âversionsâ to your schema,
- if you need âeffective atâ for business logic, add date/date ranges to your schema.
And thatâs it.
Which makes me unclear/skeptical of what a âbitemporal databaseâ like XTDB would bring to the tableâgranted, it would be great to have system time/auditing built into the database (instead of using a 3rd party solution like CyanAudit)âŚbut, other than that, any notion of âeffectivenessâ or âbusiness logic needs to access old valuesâ (known-ness) I think Iâd want modeled as first-class notions in the schema, and not metadata passed around as query params.
If you think you need to âcross system-timeâ to perform business logic (i.e. not auditing), you likely have a hidden entity (contract versions, change orders), that you should model explicitly in your domain.