What is my software doing?
“10PM - Do you know where your children are?” goes the public service announcement from the 1960s, as sampled in the Soulwax remix.
An unlikely but accurate new remix would be “It’s 10PM - Do you know what your software is doing?”.
Here you are. It’s 10pm and you have no idea what on earth your software is doing. Some combination of race conditions? A service it can’t reach? No, it’s file system behaviour. Or is it? If it walks like a duck...
It’s 11pm, and you still have no idea what on earth your software is doing. The audit logs for all transactions were entirely accurate, but they didn’t cover this. The part of the code that matters is simply buried in mystery.
It’s midnight, do you know what your software is doing?
No. You don’t.
It’s 3am. You just got back from an enthusiastic pub quiz (which your team won) after finishing a major project. Do those logs mean anything to you? No.
The truth for many software vendors is that they simply don’t have a great deal of insight into what their software is up to once it’s in the wild. They get bug reports and occasional logs, but for them the world outside their development environment is unknown. Like the edge of the known universe, the information hasn’t yet reached them. But unlike the edge of the known universe it probably won’t because nothing is helping it get there.
The problem is that even with some of the best coding in the world, most parts of most programs run invisibly because they are doing boring things. But these boring things add up to bigger, more interesting things.
The line between boring things and interesting things is blurred and moving. When a report comes from a client that the stock counts are wrong early in the month but correct later in the month, I am suddenly very interested in the stock count value. But during the project that was just a “nice to have” and didn’t get much project time.
It’s nobody’s fault - it’s just how software tends to work.
During the final phases of a project, the code is tested, finished against the spec and then deployed. Logging and debugging code is added to ensure the main scope of the project is working, and perhaps some user or event tracking is added for business intelligence.
But ask any product owner at the end of the project what they’d like to log and the truthful yet nebulous answer of “everything” will come back. “Everything”? Really? I need to log when I cache and read config? When I increment the ‘a’ value in a rarely used debugging function?
“Everything” is meaningless because it means different things to different people at different times. Yet this is what we go with: log “everything we can think of as important”. And we still always need more information.
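That gap can be sketched with Python’s standard `logging` module. Everything here is invented for illustration (the shop domain, `record_sale`, the `stock` dictionary): the transaction, deemed important at coding time, gets a log line; the stock bookkeeping, deemed boring, does not.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("shop")

def record_sale(item: str, quantity: int, stock: dict) -> None:
    # The "important" event gets logged: the transaction itself.
    log.info("sale: %s x%d", item, quantity)
    # The "boring" bookkeeping does not. When stock counts drift
    # early in the month, the logs have nothing to say about why.
    stock[item] = stock.get(item, 0) - quantity
```

When the stock-count bug report arrives, the accurate transaction log is no help: the line that mattered was never instrumented, because at the time it was nobody’s definition of “everything”.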
A common reaction is to install analytics and monitoring packages, which gather data about certain aspects of the software. These either require configuration to log specific events, or they tie into a known framework or a specific part of the language.
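Python’s standard `logging` module shows the shape of that configuration problem in miniature (the logger name and messages below are invented): events below the configured level are silently dropped, so the definition of “important” is fixed before anyone knows what the interesting question will turn out to be.

```python
import io
import logging

# Capture log output in a buffer so we can inspect what survives.
buf = io.StringIO()
logger = logging.getLogger("app")
logger.addHandler(logging.StreamHandler(buf))
logger.propagate = False
logger.setLevel(logging.WARNING)  # the configured notion of "important"

logger.info("config cache refreshed")        # dropped: below the threshold
logger.warning("stock count went negative")  # recorded
```

Only the warning reaches the buffer; the info-level event vanishes. If next month’s bug hinges on when the config cache was refreshed, that information was discarded at the moment it was produced.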
And so software vendors, like you, end up installing multiple tracking packages to tell them what’s going on in the wild. They install analytics, database logging, server monitoring, cloud activity tracking and more, but still reports come back which are not reproducible, where the information just isn’t there.
If you’ve worked on anything in production you know that access to accurate, relevant, timely and detailed logs is a rarity. The problem is that “everything” is still not defined. “Everything” is in the eye of the beholder.
The typical case is that the software has gone and done something which, strictly speaking, wasn’t-not-in-the-spec, but which nobody would have actively requested it do. “Everything” didn’t include this case.
The brutal truth is that most software vendors don’t know what most of their code is doing, most of the time.