Linear Tech Talk Notes

July 4, 2023

Just posting my quick notes after watching this great Linear.app Tech Talk on their application’s synchronization approach. This is similar to my last post/notes on Vulcan.io, although both come with the disclaimer that, while I enjoy paying attention to the offline/two-way sync problem domain, I have (thankfully?) not had to work in it myself yet.

My takeaways are that Linear’s initial client-side architecture was:

Download all of your org’s data (basically your tenant) in 1 GQL call,
Put literally all of it into in-memory MobX objects,
Write a React UI against the MobX objects,
Profit!

I’m a MobX fan, so love the callout.
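
To make those four steps concrete, here is a minimal sketch of the "everything in memory" idea. It is plain TypeScript with Maps standing in for MobX observables, and the `BootstrapPayload`, `ObjectPool`, `User`, and `Issue` names are all made up for illustration; the talk notes don't describe Linear's real payload or models.

```typescript
// Hypothetical shapes: stand-ins for whatever the single GQL call returns.
interface User { id: string; name: string }
interface Issue { id: string; title: string; assigneeId: string | null }
interface BootstrapPayload { users: User[]; issues: Issue[] }

// In-memory pool of every object in the tenant. A real app would make these
// observable (e.g. with MobX) so the React UI re-renders on change.
class ObjectPool {
  readonly users = new Map<string, User>();
  readonly issues = new Map<string, Issue>();

  // Load the entire tenant payload into memory in one pass.
  hydrate(payload: BootstrapPayload): void {
    for (const user of payload.users) this.users.set(user.id, user);
    for (const issue of payload.issues) this.issues.set(issue.id, issue);
  }

  // Reads are synchronous because everything is already in memory.
  issuesAssignedTo(userId: string): Issue[] {
    return Array.from(this.issues.values()).filter(
      (issue) => issue.assigneeId === userId,
    );
  }
}
```

The appeal is that the UI never awaits anything after boot: every query is a filter over in-memory objects.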

This architecture still required a lot of up-front effort to build (sounds like they basically built a client-side, offline-first ORM on top of IndexedDB and MobX), but the pay-off was a super-fast offline-first UI that was their product’s primary differentiator. (In the video, he has an aside mentioning that the Linear co-founders did zero product/market fit research during their first few months as a startup: they took it as obvious that millions of people use issue trackers, so of course there is a market, and they just needed to build the best one.)

After this initial architecture, they progressively realized:

Okay, so downloading all of the org’s/tenant’s data at once can be slow (even when fast-forwarding to only the changes since your last boot), so they split the bootstrap into two loads: “smaller and important boot data” (blocking) + “larger but less important boot data” (non-blocking),
Okay, after that, some orgs have huge data, so instead of pulling the entire client-side db into memory, we’ll lazy-load the client-side db -> in-memory MobX objects (with an admittedly pretty neat approach that was very transparent to the React UI: a sync access would trigger the load, and briefly return [] while the async load resolved in the background),
Okay, after that, some orgs have so much data (issues, comments, history) that even pulling it all into the client-side db is too slow, so now we’ll even lazy-load the server-side db -> client-side db (which is starting to look more and more like a traditional network-requiring app…).
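
The "sync access triggers the load, return [] in the meantime" trick from the second step is easy to sketch. This is only my guess at the shape, not Linear's code: `LazyCollection` and its `subscribe` method are hypothetical, and the tiny listener set is a stand-in for MobX reactivity.

```typescript
type Listener = () => void;
type Loader<T> = () => Promise<T[]>;

// Hypothetical lazy collection: reads are synchronous, loading is not.
class LazyCollection<T> {
  private items: T[] = [];
  private state: "idle" | "loading" | "loaded" = "idle";
  private listeners = new Set<Listener>();
  private load: Loader<T>;

  constructor(load: Loader<T>) {
    this.load = load;
  }

  // The first read kicks off the async load and returns [] right away;
  // later reads return whatever has been loaded so far.
  get values(): readonly T[] {
    if (this.state === "idle") {
      this.state = "loading";
      void this.load().then(
        (rows) => {
          this.items = rows;
          this.state = "loaded";
          this.listeners.forEach((notify) => notify());
        },
        () => {
          // Assumption: on failure, fall back to idle so a later read retries.
          this.state = "idle";
        },
      );
    }
    return this.items;
  }

  // Stand-in for MobX: the UI subscribes and re-reads `values` on change.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

// Usage: the loader here fakes a read from the client-side db.
const comments = new LazyCollection<string>(async () => ["first!", "+1"]);
comments.subscribe(() => console.log("loaded", comments.values.length));
console.log("initial", comments.values.length);
```

The first read sees an empty array, and the subscriber fires once the load resolves, so a component written against `values` never has to know the data was lazy.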

Afaict, they’ve basically ended up with three levels:

The ~static subset of the graph needed to even boot the app (basic org + users + team data)
The ~static subset of the graph that needs to load very soon after boot (maybe projects? issues assigned to you?)
The dynamic subset of the graph that lazy loads as you navigate the app (ideally speculatively running these to maintain the as-fast-as-possible UX)
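
Here is a rough sketch of how those three levels might fit together. Everything in it is assumed for illustration: the `TieredStore` class, the `Fetch` signature, and especially which model lands in which tier (the talk didn't spell that out, hence my question marks above).

```typescript
type Tier = "boot" | "deferred" | "lazy";
type Fetch = (model: string) => Promise<unknown[]>;

// Assumed assignment of models to tiers, for illustration only.
const tiers: Record<string, Tier> = {
  Organization: "boot",
  User: "boot",
  Team: "boot",
  Project: "deferred",
  Comment: "lazy",
};

function modelsIn(tier: Tier): string[] {
  return Object.keys(tiers).filter((model) => tiers[model] === tier);
}

class TieredStore {
  private data = new Map<string, unknown[]>();
  private fetch: Fetch;

  constructor(fetch: Fetch) {
    this.fetch = fetch;
  }

  private async loadAll(models: string[]): Promise<void> {
    const results = await Promise.all(models.map((m) => this.fetch(m)));
    models.forEach((m, i) => this.data.set(m, results[i] ?? []));
  }

  // Resolves once the blocking tier is in memory. The deferred tier keeps
  // loading in the background and is handed back as `deferredReady`.
  async boot(): Promise<{ deferredReady: Promise<void> }> {
    await this.loadAll(modelsIn("boot"));
    const deferredReady = this.loadAll(modelsIn("deferred"));
    return { deferredReady };
  }

  has(model: string): boolean {
    return this.data.has(model);
  }

  // Lazy tier: fetched the first time navigation needs it, then cached.
  async ensure(model: string): Promise<unknown[]> {
    if (!this.data.has(model)) await this.loadAll([model]);
    return this.data.get(model) ?? [];
  }
}
```

The UI can render as soon as `boot()` resolves, and a speculative prefetch is just calling `ensure(...)` before the user actually navigates.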

(And all of this is just reads; tracking writes is also required.)

Basically the problem generalizes to: what subset of the server-side object graph does the client need, and how/when do you incrementally crawl/lazy-load that graph down to a) the client-side db and b) the client-side in-memory UI, to best balance immediate UI loads against a slow app bootstrap?

When you’re small, you can build an MVP and assume the client-side’s data subset is the tenant’s entire dataset, which is super-clean & elegant, and you definitely should do this to ship the MVP and gain customers…but eventually real-world tenants and their data get large, and you’ll come back around to this problem of needing to lazy-load a subset of the graph. Which of course is a luxurious problem to have.