Partitioning in distributed systems

I just watched a talk by Sam Newman at the goto; conference about splitting a monolith into microservices.

At around 39:00 he talks about the partitioning problem, which boils down to one question: “What should the rest of your system do if one of the microservices crashes and stops responding?”

In the monolith world you almost never have that problem: your system either crashes completely and aborts the user interaction, or the interaction goes through.

In the microservices world this is something you have to anticipate, and you have to figure out a strategy for handling it. At this point, please watch the video and come back (it’s the last 5 minutes).
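One common family of strategies is graceful degradation: if a downstream service does not answer, return a less useful but still acceptable result instead of failing the whole user interaction. Below is a minimal Python sketch of that idea. The recommendation service, the bestseller fallback and every function name here are hypothetical illustrations, not something taken from the talk.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Hypothetical error raised when a downstream microservice does not respond."""


def call_with_fallback(call: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Try the remote call; if the service is down, degrade instead of failing the request."""
    try:
        return call()
    except (ServiceUnavailable, TimeoutError, ConnectionError):
        return fallback()


# Hypothetical downstream call: here the recommendation service has crashed.
def fetch_personal_recommendations(user_id: str) -> list[str]:
    raise ServiceUnavailable(f"recommendation service did not answer for {user_id}")


# Degraded but acceptable answer: a static list of bestsellers.
def bestsellers() -> list[str]:
    return ["book-1", "book-2", "book-3"]


def render_homepage(user_id: str) -> dict:
    recs = call_with_fallback(lambda: fetch_personal_recommendations(user_id), bestsellers)
    return {"user": user_id, "recommendations": recs}


print(render_homepage("alice"))
# {'user': 'alice', 'recommendations': ['book-1', 'book-2', 'book-3']}
```

Which fallback is acceptable (stale data, a default, an error message, or rejecting the request) is a product decision rather than a purely technical one.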

***

The solutions discussed in the video reminded me of another, more abstract idea, which I’ll formulate like this: “Almost any technical problem in your software can be compensated for on a business level.” In other words, if you have a bug, there is probably a workaround for it that involves spending more money (doing some process manually, buying more hardware, paying an external expert to solve it, etc.).

Note that this is a very generic rule; please disregard it if you design software for the IAEA. Please.

But overall I find it useful to keep it in mind.

Here is why: when you design a new feature, you can also think about two things:

  1. How you can monitor whether the feature actually works in production.
  2. What can be done on a business level (i.e. without code changes) if the feature breaks.
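The two questions above can be answered together in the feature's design. The following minimal Python sketch assumes a hypothetical invoicing feature: a success/failure counter gives monitoring something to alert on, and failed invoices land in a manual queue that staff can work through by hand. All class, function and service names are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FeatureMonitor:
    """Counts outcomes so a dashboard or alert can show whether the feature works in production."""
    counts: Counter = field(default_factory=Counter)

    def record(self, ok: bool) -> None:
        self.counts["success" if ok else "failure"] += 1

    def failure_rate(self) -> float:
        total = self.counts["success"] + self.counts["failure"]
        return self.counts["failure"] / total if total else 0.0


@dataclass
class InvoicingFeature:
    monitor: FeatureMonitor
    # Business-level fallback: staff issue these invoices manually.
    manual_queue: list = field(default_factory=list)

    def send_invoice(self, order_id: str, deliver: Callable[[str], None]) -> str:
        try:
            deliver(order_id)  # stands in for a call to a hypothetical invoicing service
        except ConnectionError:
            self.monitor.record(ok=False)
            self.manual_queue.append(order_id)
            return "queued for manual processing"
        self.monitor.record(ok=True)
        return "sent"


def working_service(order_id: str) -> None:
    pass  # pretend the invoice was delivered


def crashed_service(order_id: str) -> None:
    raise ConnectionError("invoicing service unavailable")


monitor = FeatureMonitor()
feature = InvoicingFeature(monitor)
print(feature.send_invoice("order-1", working_service))  # sent
print(feature.send_invoice("order-2", crashed_service))  # queued for manual processing
print(monitor.failure_rate())  # 0.5
print(feature.manual_queue)  # ['order-2']
```

The code change is small. The valuable part is agreeing with the business in advance who watches the failure rate and who processes the manual queue.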

Usually it’s not that hard to do this in the design phase, and once it is included in the feature’s (internal) documentation, it will save everybody a lot of effort when things go wrong. The customer (or business analyst) also usually knows a lot about this topic, so talk it through with them.

Things break; the question is how well prepared you are for that.

Also, watch the video; it’s worth it.