Smart Developers Don't Code?

While investigating the performance of a system I am working on, I was trying to understand the performance and design implications of the implementations provided by our development teams.

The main issue is that we have two components in the system that use Kafka: one developed by our team in Australia, the other by the team in the US. Now this is not exactly a fair comparison, and we've had issues with our product too, but to say it processes a large number of messages in a very short time is not untrue, and it does so on a machine with 4 CPUs and 2GB of RAM. The US team's component, on the other hand, has 42 Kafka consumers available and barely processes 1 message per second.

This is insane: with almost 6 times the computational capability, it is orders of magnitude less efficient at processing Kafka messages than our tiny little Java program.

While trying to understand why it was slow, and then to think about how to speed it up, I had to grapple with two additional items: one, it is an OSGi program, and two, it uses Camel to fetch from Kafka.

So my train of logic was: is it the OSGi part that is causing problems, or is it the Camel part? How is the number of simultaneous consumers specified in Camel actually behaving? How does this interact with OSGi? How can I make this more parallel with appropriate threading? Is it using threads effectively? Can we use parallel patterns such as pipelining effectively? Are we staging IO-intensive operations correctly, so that our compute time is not completely tied to IO?
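To make the pipelining and IO-staging questions concrete, here is a minimal sketch in plain Java using only java.util.concurrent. It is not how Camel, OSGi or either team's component works. The names PipelineSketch, fetch and process are hypothetical stand-ins, and the sleep only simulates IO latency. One thread does the slow fetching and hands messages through a bounded queue to a small pool of workers, so compute does not sit idle behind each fetch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PipelineSketch {
    // Sentinel telling a worker that no more messages are coming.
    private static final String POISON = "\u0000__END__";

    // Hypothetical stand-in for an IO-bound fetch (for example, polling a broker).
    static String fetch(int i) throws InterruptedException {
        Thread.sleep(5); // simulated IO latency, an assumption for illustration
        return "msg-" + i;
    }

    // Hypothetical stand-in for the CPU-bound work done per message.
    static int process(String msg) {
        return msg.length();
    }

    // Runs a two-stage pipeline and returns the sum of all processed results.
    static int run(int messageCount, int workers) throws Exception {
        // The bounded queue applies back-pressure if the workers fall behind.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);
        try {
            // Stage 1: a single thread does the IO and feeds the queue.
            Future<?> producer = pool.submit(() -> {
                for (int i = 0; i < messageCount; i++) {
                    queue.put(fetch(i));
                }
                // One sentinel per worker so that every worker terminates.
                for (int w = 0; w < workers; w++) {
                    queue.put(POISON);
                }
                return null;
            });

            // Stage 2: workers drain the queue while the next fetch is in flight.
            List<Future<Integer>> results = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                results.add(pool.submit(() -> {
                    int sum = 0;
                    for (String m = queue.take(); !m.equals(POISON); m = queue.take()) {
                        sum += process(m);
                    }
                    return sum;
                }));
            }

            producer.get(); // surfaces any failure from the fetch stage
            int total = 0;
            for (Future<Integer> r : results) {
                total += r.get();
            }
            return total;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(20, 3));
    }
}
```

The point of the sketch is the separation: the number of fetchers and the number of workers become two independent knobs, which is exactly the kind of behaviour that is hard to see when a framework option sets both for you.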

This was problematic, as I had to learn not only what their code did, but also how all the other components behaved, including OSGi and Camel.

I quickly came across an article called "Smart Developers don't code", which argued that one should simply use the tools others have crafted for them. While I might agree with some of the general sentiment, I had strong reservations about it.

Frameworks and Assumptions

When we first started developing our section of the program we initially used Spring Kafka. It quickly got us off the ground and we had a prototype product we could use. However, the model soon started to break down as we added more of the functionality required for our component. Frameworks assume a couple of things, and they are great for starting, or even maintaining, a program while you are working within the constraints they set. This is true for any API or SDK: while the assumptions hold true things are fine, but once your program starts to stray from them there is resistance (impedance), and it begins to tear at your program. The clearest sign of this is when you need internal adaptor layers, or are required to use a separate model from the one you intended to program with. In the extreme, it forces your program to operate as independent executables.

Frameworks also have another problem: they require knowledge of their use. Remember that a framework makes assumptions. By using that framework you have taken that set of assumptions into your program space; they are now things you need to keep in mind, along with the technical details of API usage. You might not know those assumptions yourself, or have yet to discover them, partly because they are not documented or because there is no defined behaviour.

A better way to state this problem is "You use Spring, but do you know Spring?" - you might use the API, but do you know what it does?

Plumbing

The article proceeds to illustrate that developers should really be plumbers, a long derogatory description of modern day development. The idea is that you are not really building anything, just tying the data flow from one framework to the next. However, I would argue that at least plumbers know the rules and regulations: they need to know how water flows through the pipes, and what is and is not legal under the building codes of the area. A developer who just ties in one framework to do one job and links a chain of frameworks together is not really a plumber, as he does not know what is happening.

This kind of development leads to what I call stack traces of doom: multi-page dumps which contain little to no relevant information, exceptions caught within exceptions, passed from one layer to the next, and a level of opacity within your own program.
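As a rough illustration of how the useful information ends up buried, here is a small Java sketch. The three layers and their messages are hypothetical, invented only for this example. Each layer catches the failure below it and rethrows it wrapped in its own exception, so the message at the top of the trace says almost nothing, and the real reason sits at the bottom of the cause chain.

```java
class RootCauseSketch {
    // Follow the cause chain down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Hypothetical lowest layer: this is where the failure really happens.
    static void storageLayer() {
        throw new IllegalStateException("connection refused");
    }

    // Hypothetical middle layer: wraps the failure in its own exception.
    static void serviceLayer() {
        try {
            storageLayer();
        } catch (RuntimeException e) {
            throw new RuntimeException("service failed", e);
        }
    }

    // Hypothetical outer layer: wraps it once more.
    static void frameworkLayer() {
        try {
            serviceLayer();
        } catch (RuntimeException e) {
            throw new RuntimeException("route failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            frameworkLayer();
        } catch (RuntimeException e) {
            // The top-level message is what most logs show first.
            System.out.println("top:  " + e.getMessage());
            // The root cause is usually what you actually need.
            System.out.println("root: " + rootCause(e).getMessage());
        }
    }
}
```

With three layers this is easy to follow. With a chain of frameworks each adding its own wrappers, proxies and thread hops, the same structure produces the multi-page dump.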

Information Theory

How much information is required to build your program? Think about the scope of what you are trying to achieve, and what the boundaries of your program are. How many bits of information do you need to describe it?

Your program is so much more than just the bits you wire, it is the sum of the bits you wire, both internally and externally. How many bits of information do you know about your program?

Information about a system cannot be shed; it can only be shuffled to another location that you don't see or don't know about, but it's still present.

A good way to think about this is the cloud. At least initially, companies thought that they could get rid of expensive IT staff by externalising to the cloud, or avoid the cost of hosting and maintaining servers. However, the burden of said system didn't disappear; it was merely transferred, and in some cases (more often than not) it was more expensive.

An example came from a friend whose company wanted to go full DevOps and Cloud. The outcome was interesting, but I made a prediction that the roles cut would be transferred elsewhere: that the knowledge of maintaining the system and of the data centre setup would be transferred to the development team. This is because the information needed to maintain a system is not gone. This is why DevOps is a thing: it is merely the transfer of knowledge of a data centre's network setup to the development teams. Setting up the clusters and network addresses is all configurable now, with no nasty network engineer, except that the software team is now the network engineer. The DC/Cloud provider has its network engineering teams hidden from you (how else did you configure your K8s cluster in the first place?); what they have done is transfer to you the cost of the software configuration of said items. Regardless of whether you build a cluster in a cloud system, a physical cluster or a virtual one, the cost (information size) of that setup just got transferred from one group to another; it is still present.

The same is true when using frameworks: you've simplified your "code", but at the cost of adding additional bits to your program, most of them opaque and unknown to you. You've merely moved the cost of those bits from visible to opaque; you haven't simplified, you've merely shifted.

What is worse is that this cost is larger than you think in other ways. For one, you don't know the size of the framework; in other words, the number of bits is unknown. It is also opaque, in that you don't know which parts of the framework are relevant to the total information size of your program. To decide effectively on what to use and how to use it, you now need to know your own program's information plus the framework's information, and then you need to account for how much you DON'T know.

Games as Examples

This kind of framework tension is always present, nowhere more so than in the games industry, for example BioWare and their use of the Frostbite engine. A company wants to make a game and usually uses a third party game engine. This makes a lot of sense, but game engines are interesting beasts because they really do make hard assumptions about what they can offer. For some, choosing the wrong engine can carry a terrible cost: they potentially end up using less of the engine than they expected, which saves them little (and adds to costs along with licensing), and they potentially face a lot of re-writing. Articles like the one above rarely talk about the cases where choosing the wrong thing is more costly than doing it solo, or than picking a different tech.

Plumbing CVs

The article's idea that one should just plumb somebody else's code is what leads to Framework CVs: people listing the languages they program in, along with the frameworks they have used (I do it too). It is a long list of things you can plumb with, not showing the creative skills that an actual engineer/developer requires.

Conclusion

The article concludes with an equally hilarious straw man, advocating the use of Camel in this ideal plumbing world. Oddly enough, again, I am not against the idea of using frameworks; however, one should code for what they need. Many times bugs are introduced into a system because of the framework you used, and now you have to work around them. Adding another framework to solve your problem comes at an insane cost and doesn't solve it - it merely makes it opaque, and the information you need to know increases dramatically beyond the framework functionality you are actually using. You need to own those assumptions as if they were your own. It is not that smart developers don't code; rather, they choose when not to, and they own that code along with all the bits that it requires.

I found this post: "Complexity is a source of income", which I think illustrates this "Programming as Plumbing" mentality as a means of pushing "product" by "vendors".