We recently surveyed our sysadmin friends and found that 60% of folks use Excel and about 15% use their heads to manage the state of their IT infrastructure (surprisingly, another 15% rely on the power of positive thinking, but that is a topic for another post).
Is the documentation/state of your network in somebody’s head? Have you considered the bus factor for your business?
According to Wikipedia, the bus factor is a measurement of the concentration of information in individual team members. It is also known as the lottery factor, truck factor, bus/truck number, or lorry factor and connotes the number of team members that can be unexpectedly lost from a project (“hit by a bus”, as it were) before the project collapses due to lack of knowledgeable or competent personnel.
We have been there, and we get it. IT infrastructure is the last to get budget for anything, because most businesses see IT as just a cost center. If you ask for money for a DCIM/IT asset management tool or a CMDB (configuration management database), you are often told there is no budget and encouraged to stick with Excel and Visio, or maybe an open source tool.
Unfortunately, Excel, Visio, and open source tools have a deal-breaker limitation: somebody has to enter the information manually and keep it up to date. And we all know that no matter how hard you try and how many processes you put in place, updates will eventually stop happening reliably. Maybe not everybody is on board. Or maybe the person who makes a change at 3 AM forgets to update the records, or is too tired, plans to do it the next day, then gets busy putting out fires and never gets to it.
So ultimately, all the correct information is in somebody’s head and that is never a good situation.
How do you run the Bus Test?
The ‘Bus Test’ is a simple principle:
“Knowledge should be duplicated between multiple team members.”
Knowledge doesn’t just mean facts and history; it also means processes, development, and access to accounts, to name a few.
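To make the Bus Test concrete, here is a minimal Python sketch. It assumes you list each knowledge area (a process, a system, an account) along with the people who can handle it, and it makes the simplifying assumption that the project stalls as soon as any single area has nobody left who knows it. Under that assumption, the bus factor is just the size of the smallest group of people attached to any one area. The function names, team members, and areas are hypothetical, for illustration only.

```python
from typing import Dict, Set


def bus_factor(knowledge: Dict[str, Set[str]]) -> int:
    """Return the minimum number of people whose loss leaves some area uncovered.

    Simplifying assumption: the project collapses as soon as any single area
    has no one left who knows it, so the answer is the smallest expert group.
    """
    if not knowledge:
        raise ValueError("no knowledge areas given")
    return min(len(people) for people in knowledge.values())


def single_points_of_failure(knowledge: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each area known by exactly one person to that person."""
    return {
        area: next(iter(people))
        for area, people in knowledge.items()
        if len(people) == 1
    }


if __name__ == "__main__":
    # Hypothetical team and knowledge areas.
    knowledge = {
        "core switch configuration": {"alice"},
        "backup restore procedure": {"alice", "bob"},
        "cloud account logins": {"bob", "carol"},
    }
    print(bus_factor(knowledge))                # 1
    print(single_points_of_failure(knowledge))  # {'core switch configuration': 'alice'}
```

A bus factor of 1 means a single departure can orphan part of your infrastructure knowledge; the second function points at exactly which areas need a second person.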
Things tend to get more complicated as your IT infrastructure grows. Here are some ways to address the bus factor:
I have seen accounts shared either by:
Using the same email/password combo
Sharing the old-school way (“Can you email/Slack me the password again?”)
It is slow, insecure, and, most importantly, it doesn’t pass the Bus Test. One person leaves and poof, no more logins.
Recommended method: Use a centralized password vault like Device42, or another centralized password tool such as LastPass, Secret Server, or Click Studios.
Any repeatable action (a.k.a. a process) should be documented and assigned a lead. You can use tools like Confluence, Google Docs, or Trello for this.
The lead should be responsible for updating the process when it changes and answering any questions about the process.
Here are some of the things that are constantly changing in IT infrastructure:
Adding or moving hardware boxes or firing up new VMs in private or public cloud
Hardware warranty info/contracts
Spare parts inventory
Adding/moving network and power connections
Installing/removing software and services
Services moving from QA to production and picking up more dependencies
If these changes are not detected and documented automatically, you will run into the same issue: it will not pass the Bus Test!
Recommended method: Use a self-documenting CMDB like Device42. A self-documenting CMDB doesn’t require much up-front work to set up and can document the state of your network automatically. Here is a previous post that talks about self-documenting CMDBs: https://blog.device42.com/2016/08/problems-with-cmdbs/
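The heart of automatic documentation is comparing what is recorded against what is actually running. The Python sketch below shows that comparison in its simplest form; it is an illustration, not how Device42 works internally. It assumes two hypothetical inputs keyed by hostname: the inventory someone maintains by hand (say, exported from the Excel sheet) and the state reported by some discovery process. It flags hosts nobody documented, records for hosts that no longer exist, and fields that disagree. All names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape: attribute name -> value, e.g. {"rack": "R1"}.
Record = Dict[str, str]


@dataclass
class DriftReport:
    undocumented: List[str]  # discovered, but missing from the manual inventory
    stale: List[str]         # in the manual inventory, but no longer discovered
    mismatched: Dict[str, Dict[str, Tuple[Optional[str], Optional[str]]]]


def find_drift(documented: Dict[str, Record],
               discovered: Dict[str, Record]) -> DriftReport:
    """Compare a hand-maintained inventory with discovered state."""
    undocumented = sorted(discovered.keys() - documented.keys())
    stale = sorted(documented.keys() - discovered.keys())
    mismatched = {}
    for host in sorted(documented.keys() & discovered.keys()):
        doc, live = documented[host], discovered[host]
        # (documented value, discovered value) for every field that differs;
        # None means the field is missing on that side.
        diffs = {
            key: (doc.get(key), live.get(key))
            for key in sorted(doc.keys() | live.keys())
            if doc.get(key) != live.get(key)
        }
        if diffs:
            mismatched[host] = diffs
    return DriftReport(undocumented, stale, mismatched)


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    spreadsheet = {
        "web01": {"rack": "R1", "os": "Ubuntu 20.04"},
        "db01": {"rack": "R2", "os": "Ubuntu 20.04"},
    }
    discovered = {
        "web01": {"rack": "R3", "os": "Ubuntu 20.04"},
        "vm-new": {"os": "Debian 12"},
    }
    report = find_drift(spreadsheet, discovered)
    print(report.undocumented)  # ['vm-new']
    print(report.stale)         # ['db01']
    print(report.mismatched)    # {'web01': {'rack': ('R1', 'R3')}}
```

The comparison itself is easy; the hard part is keeping the discovered side accurate without anyone typing it in, which is exactly what a self-documenting tool is meant to handle. Anything flagged here is a change that happened without the documentation catching up.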
So, are you ready for the day someone gets hit by the proverbial bus? If not, you should get ready, because, well… you never know.