I’ve had a rule of thumb for data modeling that unfortunately I forget on a regular basis, so I thought I’d write it down to maybe help myself actually use it more.
The rule of thumb is “always separate user intent from derived behavior/business logic”.
This rule is particularly applicable to things like
status fields, which have a way of becoming “half-user controlled” and “half-system controlled”.
As the most recent example where I’ve run across this, consider a system of tasks for a stereotypical “tasks have predecessors/successor in a project plan”:
In terms of data model, we’ll focus on just a few things:
- There is a task
statusfield that has three potential values:
- The requirements state that the system handles all
NotStarted <-> InProgresstransitions, i.e. it “auto-starts” tasks once preceeding tasks are
- However only the user can say “this task is actually done (or not done)”.
An initial attempt at modeling this is a single
Task.status field that is an enum of
Then we use business logic to do “not rocket science but still somewhat nuanced” things like:
- Anytime a predecessor task changes maybe change the successor Task’s
status, but only if it’s not
- In the UI, treat
status = Completeas “you checked complete” but
status = NotStarted | InProgressas “you didn’t check complete”
This is all fine and not that bad, but we end up with a “sometimes the field is written by X and sometimes it is written by Y”:
Which is not terrible, but generally more of a “business logic is hidden in susceptible-to-being-spaghetti ‘push’ code”.
I.e. it’s pretty common in this setup for, if the user unchecks “task is complete”, to forget to re-run the “ah right, set it back to the ‘based on predecessors’ value” logic.
Generally a cleaner way of modeling things is to strictly delineate user intent from derived behavior, i.e.:
- The user intent of “this is complete yes/no” is it’s own “thing” (database field)
- The calculated “potential status based on predecessors” logic is it’s own thing (derived field)
- The calculated combination of “status based on user intent or potential status based on predecessors” is it’s own thing (another derived field)
I.e. our data model would move from having a single
status field to:
Task.is_completeis a boolean that is directly/always controlled by the user intent to mark “yes, this is/is not done”
Task.status_based_on_predecessors(probably not stored/persisted, so not a real column-in-the-db) does the calc of “this task should be
InProgressif all predecessors are
Task.statusstill exists, but is now derived (although likely still persisted for simplicity of reads) by the calculation “if
This moves the model to be more like a DAG of inputs with nodes of calculated values:
Which makes the application logic more functional, more reactive, rather than
if statements sprinkled in various places.
Granted, a separate but tempting tangent is that reactive / data flow paradigms have not generally taken hold on the server-side yet, especially at a “more than just lifecycle hooks within a single micro-service/monolith/ORM codebase” scale, so you still have to generally nudge/wire these derived values together, but I think the end result is still cleaner than the original “fuzzy ownership of a single field” approach.