Always Separate User Intent

I’ve had a rule of thumb for data modeling that unfortunately I forget on a regular basis, so I thought I’d write it down to maybe help myself actually use it more.

The rule of thumb is “always separate user intent from derived behavior/business logic”.

This rule is particularly applicable to things like state or status fields, which have a way of becoming “half-user controlled” and “half-system controlled”.

As the most recent example where I’ve run across this, consider a task system for a stereotypical “tasks have predecessors/successors in a project plan” setup.

In terms of the data model, we’ll focus on just a few things:

  • There is a task status field that has three potential values: NotStarted, InProgress, and Complete
  • The requirements state that the system handles all NotStarted <-> InProgress transitions, i.e. it “auto-starts” tasks once preceding tasks are Complete.
  • However, only the user can say “this task is actually done (or not done)”.

An Okay Way

An initial attempt at modeling this is a single Task.status field that is an enum of NotStarted, InProgress, and Complete.

Then we use business logic to do “not rocket science but still somewhat nuanced” things like:

  • Anytime a predecessor task changes, maybe change the successor Task’s status, but only if it’s not Complete
  • In the UI, treat status = Complete as “you checked complete” but status = NotStarted | InProgress as “you didn’t check complete”

This is all fine and not that bad, but we end up with a “sometimes the field is written by X and sometimes it is written by Y” situation.

Which is not terrible, but is generally more of a “business logic is hidden in susceptible-to-being-spaghetti ‘push’ code” approach.

I.e. it’s pretty common in this setup, when the user unchecks “task is complete”, to forget to re-run the “ah right, set it back to the ‘based on predecessors’ value” logic.
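As a rough sketch of how this goes wrong (in Python, with hypothetical helper names like link, user_checks_complete, and user_unchecks_complete_buggy that are my own illustration, not from any real codebase), the single status field gets written by both the user-facing handler and the “push” logic:

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NOT_STARTED = "NotStarted"
    IN_PROGRESS = "InProgress"
    COMPLETE = "Complete"


@dataclass(eq=False)
class Task:
    name: str
    predecessors: list["Task"] = field(default_factory=list)
    successors: list["Task"] = field(default_factory=list)
    # One field, two writers: the user and the system.
    status: Status = Status.NOT_STARTED


def link(pred: Task, succ: Task) -> None:
    """Hypothetical helper: record that succ depends on pred."""
    pred.successors.append(succ)
    succ.predecessors.append(pred)


def status_from_predecessors(task: Task) -> Status:
    """The 'auto-start' rule: InProgress once every predecessor is Complete."""
    if all(p.status is Status.COMPLETE for p in task.predecessors):
        return Status.IN_PROGRESS
    return Status.NOT_STARTED


def user_checks_complete(task: Task) -> None:
    task.status = Status.COMPLETE
    # "Push" logic: nudge each successor, unless the user already completed it.
    for succ in task.successors:
        if succ.status is not Status.COMPLETE:
            succ.status = status_from_predecessors(succ)


def user_unchecks_complete_buggy(task: Task) -> None:
    # Easy-to-write bug: hard-code a status and forget to re-run the
    # predecessor-based logic for this task and its successors.
    task.status = Status.NOT_STARTED


a = Task("a")
b = Task("b")
link(a, b)

user_checks_complete(a)          # b is auto-started by the push logic
user_unchecks_complete_buggy(a)  # b is left as-is, even though a is no longer Complete
```

After the last call, b.status is still InProgress while status_from_predecessors(b) would say NotStarted, and nothing in the data model tells us which writer is “right”.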

A Better Way

Generally a cleaner way of modeling things is to strictly delineate user intent from derived behavior, i.e.:

  • The user intent of “this is complete yes/no” is its own “thing” (database field)
  • The calculated “potential status based on predecessors” logic is its own thing (derived field)
  • The calculated combination of “status based on user intent or potential status based on predecessors” is its own thing (another derived field)

I.e. our data model would move from having a single status field to:

  • Task.is_complete is a boolean that is directly/always controlled by the user intent to mark “yes, this is/is not done”
  • Task.status_based_on_predecessors (probably not stored/persisted, so not a real column in the db) does the calc of “this task should be InProgress if all predecessors are Complete, otherwise NotStarted”
  • Task.status still exists, but is now derived (although likely still persisted for simplicity of reads) by the calculation “if is_complete then Complete else status_based_on_predecessors”, i.e. the else branch yields InProgress or NotStarted
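A minimal Python sketch of these three fields might look like the following. It assumes, for brevity, that both derived values are computed on read as properties rather than persisted, and the in-memory Task class is purely illustrative:

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NOT_STARTED = "NotStarted"
    IN_PROGRESS = "InProgress"
    COMPLETE = "Complete"


@dataclass(eq=False)
class Task:
    name: str
    predecessors: list["Task"] = field(default_factory=list)
    # User intent: the only field the user ever writes.
    is_complete: bool = False

    @property
    def status_based_on_predecessors(self) -> Status:
        # Derived: InProgress if all predecessors are Complete, else NotStarted.
        if all(p.is_complete for p in self.predecessors):
            return Status.IN_PROGRESS
        return Status.NOT_STARTED

    @property
    def status(self) -> Status:
        # Derived: user intent wins, otherwise fall back to the predecessors.
        if self.is_complete:
            return Status.COMPLETE
        return self.status_based_on_predecessors


design = Task("design")
build = Task("build", predecessors=[design])

design.is_complete = True
status_after_check = build.status    # recomputed from design.is_complete

design.is_complete = False
status_after_uncheck = build.status  # no "remember to reset it" step needed
```

Because nothing but the user ever writes is_complete, and nothing ever writes the two derived values, the “forgot to re-run the logic after an uncheck” class of bug has nowhere to live in this version.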

This moves the model to be more like a DAG of inputs with nodes of calculated values, which makes the application logic more functional and more reactive, rather than if statements sprinkled in various places.

Granted, a separate but tempting tangent is that reactive / data flow paradigms have not generally taken hold on the server-side yet, especially at a “more than just lifecycle hooks within a single micro-service/monolith/ORM codebase” scale, so you still generally have to nudge/wire these derived values together. But I think the end result is still cleaner than the original “fuzzy ownership of a single field” approach.
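For a sense of what that manual wiring can look like when Task.status is persisted, here is a small Python sketch. The set_is_complete and derive_status names are hypothetical, and the in-memory objects stand in for whatever ORM or storage layer is actually in use:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Task:
    name: str
    predecessors: list["Task"] = field(default_factory=list)
    successors: list["Task"] = field(default_factory=list)
    is_complete: bool = False    # user intent, written only by the user
    status: str = "NotStarted"   # persisted copy of the derived value


def derive_status(task: Task) -> str:
    """Pure calculation: never stores anything, just reads the inputs."""
    if task.is_complete:
        return "Complete"
    if all(p.is_complete for p in task.predecessors):
        return "InProgress"
    return "NotStarted"


def set_is_complete(task: Task, value: bool) -> None:
    """The single place user intent is written, plus the manual 'wiring'."""
    task.is_complete = value
    # status depends only on a task's own is_complete and its direct
    # predecessors' is_complete, so only this task and its direct
    # successors can be affected by the change.
    for affected in [task, *task.successors]:
        affected.status = derive_status(affected)


a, b, c = Task("a"), Task("b"), Task("c")
for pred, succ in [(a, b), (b, c)]:
    pred.successors.append(succ)
    succ.predecessors.append(pred)
for t in (a, b, c):
    t.status = derive_status(t)  # initialize the persisted copies

set_is_complete(a, True)
after_check = (a.status, b.status, c.status)

set_is_complete(a, False)
after_uncheck = (a.status, b.status, c.status)
```

The wiring is still hand-written, but it only decides when to recompute. What the value should be lives in one pure function, so checking and unchecking go through the same path.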