Minimize disaster
My life is busy and I have a lot of little useful things I want to make, so I don’t often slow down and reflect on the way I work and how it’s evolved. But recently I was prompted by Erin Kissane’s excellent publication to listen to the recorded Massey lectures of Ursula Franklin titled The Real World of Technology. There were many lovely insights from this listening, but one thing she spoke on which stuck with me was her perception of the difference between “planning to maximize gain” and “planning to minimize disaster.” Franklin identifies the latter with “women’s work,” or, “situational and holistic work,” in principle, and ties it to feminine patterns of thinking. This type of work requires contextual evaluation and individual judgment of situations, resisting hierarchical power structures and top-down planning. It encounters and adapts to particular moments, like nature itself, making its progress slow but robust.
I find myself reflecting on these lectures often now as I try to carefully navigate the complex technical landscape of local-first software building. Because of my hubris in designing and creating my own lower-level tooling, the scope of my concern in the development process is pretty wide. I care about the color of a button, and I also care about how the changes the user makes to a document are represented in data, timestamped, and transmitted between devices on the network. And while this results in a lot of work and a fair amount of stress (especially when a bug is discovered), there’s also a material benefit to this mindset, in my opinion. My role is not just system designer, but system user, for basically every system I incorporate in Biscuits. There is no one else to pass the buck to, or to blame; there are also no limits on how well I can tune these systems to harmonize together.
This level of control naturally leads me to stress more over details, edge cases, and bugs. I have 8 different apps, either deployed or in experimental phases, which use my underlying Verdant local-first storage and synchronization framework. Each one has different needs and patterns of usage. Every app has exposed one significant bug in Verdant’s design or implementation, and also demanded one or more new features or changes to usage. I could never have designed Verdant from the beginning to be what it has become on account of this natural growth. Neither could I have confidently promoted it as a reliable local-first solution without examining it through the varied lenses these apps have provided.
Local-first is a minefield
But there’s another looming factor about local-first, particularly, which causes me to return to Franklin’s thinking on “planning to minimize disaster” and its connection to “women’s work.” I refer to how much can go wrong, and how difficult it can be to recover from failure. This is not immediately obvious to an outsider, especially as local-first continues to enjoy a modest hype cycle, but local-first is not all upside. It is, fundamentally, a distributed systems problem: translated, a hard problem. For all the genuine joy of working with a well-made local-first system, what’s going on behind the scenes is often terrifying.
Move fast, etc
To explain what I mean, first I’ll return to typical “cloud” computing system models. This is the infrastructure model Web 2.0 used to become the titan it remains today. Users access software by interacting with a cluster of servers, which are the only source of all data in the system. Before a user can start using an app, they must first ask the server what data they have. Often, they are required to register an account to do anything at all. Aesthetically this mimics a hierarchical power structure, with the disempowered user relying on the authoritative server’s permission to encounter the world it is the warden of.
There are significant constraints here, and some surface-level risks. For example, “centralized” infrastructure means that an attack, disruption, or mistake can lead to loss of availability or even long-term loss of data for users because of something that happened on a server they have no control over. Providers are incentivized, if not quite required, to keep their servers running and healthy—or users may abandon the product. But this also leads to bad actors exploiting this relationship in ransom attacks, for example. On a practical level, it also means features like offline usage are relatively hard to enable.
But from a stability perspective, there are benefits, too. If a bug is discovered in the software, the patched code needs only to be deployed to the servers owned by the software provider. If data is lost in the middle of the night, the database can be restored from a backup immediately, once and for all users, unilaterally performed by the software provider. Sort of like how if your nation is ruled by a monarch, all you need to do to change the government is sway one guy.
When Silicon Valley types “move fast and break things,” they often have these reassuring benefits implicitly in mind. It’s possible to take large risks to maximize gain, because even worst-case scenarios are often recoverable with a minimal amount of forethought and a willingness to wake up at 3AM to patch a server. Or at least, that’s the mythology. In actuality, I’m sure these kinds of mistakes contribute to why the vast majority of startups end up failing.
Slow and steady
Ok, finally, contrast this with local-first. While local-first patterns take various forms, the one I chose for Verdant still does include a server. The difference is the role this server plays in the life of data. A local-first server is really more of a conduit and redundant storage solution. The “canonical” view of data doesn’t really live on any one device, but is assembled on any given client based on a reconstruction from data which has been made available to it over time. It’s more like a mutual social fabric than a hierarchy.
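To make the idea of a client assembling its own view more concrete, here is a minimal sketch. It is not Verdant's actual algorithm or data format: the `Operation` shape, the `buildView` function, and the last-writer-wins rule are all assumptions chosen for illustration.

```typescript
// Hypothetical operation shape; not Verdant's real wire format.
type Operation = {
  documentId: string;
  field: string;
  value: unknown;
  timestamp: number; // assumed to be comparable across devices
};

type View = Record<string, Record<string, unknown>>;

// Rebuild a view by replaying every known operation in timestamp order.
// Later writes to the same field overwrite earlier ones (last-writer-wins).
function buildView(operations: Operation[]): View {
  const sorted = [...operations].sort((a, b) => a.timestamp - b.timestamp);
  const view: View = {};
  for (const op of sorted) {
    let doc = view[op.documentId];
    if (doc === undefined) {
      doc = {};
      view[op.documentId] = doc;
    }
    doc[op.field] = op.value;
  }
  return view;
}
```

Under these assumptions, any client holding the same set of operations arrives at the same view, no matter what order the operations reached it in. No single device has to be the authority.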
Since each client, including devices owned and operated by users, is responsible for interpreting a history of changes into a ‘view’ of the app’s data, that means consequential failures could happen anywhere in the system. If I ship a bug to the mobile app installed on a user’s device, and that bug erases the data, that failure will immediately make the app unusable for that user… and could propagate out to the rest of the system, too. Refreshing or re-launching the app won’t help. The autonomy granted to individual clients means they are also more responsible for their own wellbeing.
It’s hard to properly emphasize how scary this is in practice. If a user contacts me reporting data loss on their phone, I get a cold feeling in my chest knowing that I don’t just have to patch my own server or revert to a backup; I must determine the root cause, fix Verdant, and then possibly ship a custom remediation which attempts to recover the data on the user’s own device. (Side note: paradoxically, this is also why I prefer to own the low-level systems; I know my code is buggy but I understand it deeply and I can have it updated in minutes).
If I fail to build in appropriate safeguards and fallbacks in the system itself, I could be in a situation where all I can do is apologize and refund them. And for all the advantages local-first has over cloud-based software models, this disaster scenario is a lot more likely to happen. There’s no room for ‘breaking things’ in this world.
Minimizing disaster in the system
Even in the local-first software community, I think I still encounter people bringing a “maximize gain” style planning to the endeavor who maybe haven’t internalized these risks. I tend to believe this happens with talented system designers who come in from the Web 2.0 world and are more interested in ‘solving the problems’ of local-first systems than building and shipping products people use. I don’t mean that offensively. But I do think that even the most considerate and careful of these efforts may fail to meet the unique challenges of these systems.
There’s a tendency with software people—and especially, I believe, men in this field—to want to really solve things. I’m a man, and I recognize this tendency in myself, too. It would feel so good to be smart enough to architect the right system, from the start, meeting all the needs and edge cases. It’s a mindset which erases the contexts of usage, the on-the-ground wisdom which is gained by living with and observing the system in use. In its more ideological form, it labels contextual and heuristic adaptations as ‘hacks’ or ‘bodges,’ an inferior form of logic to well-planned top-down logical structure.
There is some rhyming here with the age-old debate about statically-typed languages versus dynamic ones. Even as an ardent user of TypeScript, myself, I hope to never be deluded into thinking that my compile-time typechecking is so mathematically perfect as to alleviate any need for runtime validation. I think TypeScript is, in fact, a wonderful middle ground where you can reap the benefits of both in harmony. But as ‘solution architecture’ becomes more and more abstract, the temptation to believe one can plan a system to model any eventuality with perfect logic becomes strong.
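A small sketch of that middle ground, with compile-time types and a runtime check side by side. The `Item` type and the `isItem` and `parseItem` helpers are hypothetical names invented for this example.

```typescript
// Hypothetical document type for illustration.
type Item = { id: string; name: string; purchased: boolean };

// A type guard: verifies at runtime what the type only promises at compile time.
function isItem(value: unknown): value is Item {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["id"] === "string" &&
    typeof record["name"] === "string" &&
    typeof record["purchased"] === "boolean"
  );
}

// Data read from storage or the network is treated as `unknown`
// until it has passed the check.
function parseItem(json: string): Item | null {
  try {
    const parsed: unknown = JSON.parse(json);
    return isItem(parsed) ? parsed : null;
  } catch {
    return null; // malformed JSON
  }
}
```

The type system keeps the rest of the code honest about what an `Item` is, while the guard catches data that never passed through the compiler at all.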
Building Verdant, and Biscuits on top of it, has helped me connect with a different way of encountering challenges. I had to recognize early on that, even at my best, I was never going to design a bug-free distributed system. And when even the most subtle bugs can produce irreversible damage to a user’s experience, I was forced to rework how I do software.
Before I knew what I was in for, I started out on that journey by eschewing the typical ‘interesting problems’ of local-first (like sync, or reactivity, or network topology choices) and focusing most of my design effort on schema migrations. It seemed to me that if I wanted to deploy software to user devices, and continue to maintain it over the years I hoped to support my ideas, I would need to plan for making iteration and correction of the underlying data as painless as possible. I definitely took this as a challenge to be clever, which I may have overdone. But the migration system of Verdant is still something I’m proud of, providing both specificity and adaptability through all the uncertain conditions of distributed data.
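For readers unfamiliar with the general idea, here is a minimal sketch of stepwise schema migrations that each device could run on its own data. This is not Verdant's migration API. The `Migration` shape, the sample migrations, and `migrateTo` are assumptions made up for illustration.

```typescript
type Doc = Record<string, unknown>;

// Hypothetical migration record: each step upgrades one version to the next.
type Migration = { from: number; to: number; migrate: (doc: Doc) => Doc };

const migrations: Migration[] = [
  // v1 -> v2: add a `purchased` flag with a default value.
  { from: 1, to: 2, migrate: (doc) => ({ ...doc, purchased: false }) },
  // v2 -> v3: rename `name` to `title`.
  {
    from: 2,
    to: 3,
    migrate: (doc) => {
      const { name, ...rest } = doc;
      return { ...rest, title: name };
    },
  },
];

// Apply each step in turn until the document reaches the target version.
function migrateTo(doc: Doc, fromVersion: number, target: number): Doc {
  let current = doc;
  let version = fromVersion;
  while (version < target) {
    const step = migrations.find((m) => m.from === version);
    if (!step) throw new Error(`No migration from version ${version}`);
    current = step.migrate(current);
    version = step.to;
  }
  return current;
}
```

Because a device may have been offline for any number of releases, the steps are chained rather than written as a single jump, so old data can still walk forward one version at a time.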
But it wasn’t perfect. In fact, there are still probably bugs in migrations which may lead to data no longer conforming to the schema, which is a kind of thing that’s hard to put back in the bottle. I think a “top-down” engineering mindset would have led me into many difficult months of agonizing over perfecting that algorithm, and honestly I probably wouldn’t succeed. This is a kind of tarpit trap for system builders, I think.
Instead, I decided to launch an initiative to solve the other end of the problem: making Verdant data self-healing. This is less ideal than a perfect migration, storage, and sync system, in that user data is inevitably lost even with efficient self-healing data. But it is far more pragmatic and provides value in a wider variety of contexts. And besides, I could perfect my algorithms all I like, but computers and networks themselves will still fail me. Fault tolerance and self-healing are a good use of effort.
While exploring this topic with Verdant, I discovered just how tricky it can be. Some of my initial versions of self-healing data actually produced more invalid data in unexpected cases. But because I use Verdant apps every day, I have usually found these pretty quickly and have corrected them over time. In return, my apps are able to keep themselves stable in situations I didn’t plan for, preserve as much user data as possible, and provide a way for users to at least salvage whatever is still intact, even down to individual document fields.
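One way to picture field-level salvage is the sketch below. It is only an illustration of the general technique, not how Verdant implements self-healing. The `FieldSpec` and `Schema` types, the `itemSchema` example, and the `heal` function are all hypothetical.

```typescript
// Hypothetical per-field rule: a validity check plus a safe fallback.
type FieldSpec = { check: (value: unknown) => boolean; fallback: unknown };
type Schema = Record<string, FieldSpec>;

const itemSchema: Schema = {
  name: { check: (v) => typeof v === "string", fallback: "" },
  quantity: {
    check: (v) => typeof v === "number" && Number.isFinite(v),
    fallback: 1,
  },
};

// Keep every field that is still valid, replace the rest with fallbacks,
// and report which fields had to be repaired.
function heal(
  raw: unknown,
  schema: Schema
): { doc: Record<string, unknown>; repaired: string[] } {
  const source =
    typeof raw === "object" && raw !== null
      ? (raw as Record<string, unknown>)
      : {};
  const doc: Record<string, unknown> = {};
  const repaired: string[] = [];
  for (const [field, spec] of Object.entries(schema)) {
    const value = source[field];
    if (spec.check(value)) {
      doc[field] = value;
    } else {
      doc[field] = spec.fallback;
      repaired.push(field);
    }
  }
  return { doc, repaired };
}
```

For example, `heal({ name: "milk", quantity: "lots" }, itemSchema)` keeps the name, resets the quantity to its fallback, and lists `quantity` as repaired. The app could use that list to tell the user what was lost.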
A brief gender corner
It’s a delightful thing, to me, that even after identifying “planning to minimize disaster” and “women’s work” together with the help of Franklin’s ideas, and despite comfortably identifying as a cisgender man, I feel free to nurture this ‘feminine’ approach to technology in my own work. I believe in the potential of local-first software, after all, and it would be a terrible shame if I felt too bound to ideas of masculinity to adopt what I think has been an excellent mindset for building it.
Not that these ideas are particularly gender-coded. I doubt most people would have identified these categorizations without prompting, and surely they still seem arbitrary or just wrong to many. But I find this uncharged analysis through gender lenses has helped me clarify the subtle connections between forms of work and types of systems I have really come to appreciate, and the subtle ways they are opposed to other kinds. Not as if there is a superiority of one or another approach, but just as a way to analyze, make decisions, and build intuition.
It’s my experience that having strong, charged ideological commitments around things like gender leads to inflexibility and inhibited curiosity that would prevent a person from gaining these kinds of insights. Our cultural landscape doesn’t seem capable of recognizing this kind of interconnectedness, and the cost of rigid ideological commitments. Ask any (I hate to say this, but, male) software developer if gender theory and software system design are connected, and they would probably shrug, and of course many would actively recoil from the thought. It’s not that I can’t see this point of view, it’s just that I think it’s quite limiting.
So, yeah, I’m a man who’s currently learning a lot from feminist women about how to deepen my practice with technology, and it’s great. Recommended.
Thanks for reading.