Solving software issues faster with the help of customers

Picture the scene:

You’re a software engineer.  Your product is out there being used every day by a collection of users who are all in the same line of business, yet each has their own unique take on how the job should be done.  Outlook gets your attention with a new email notification.  You open the mail and begin to read.  It’s from one of your customers, and they are asking for some support.  The email is short, gets straight to the point and goes something like this…

“When we did X, A happened! Please can you fix it?”

And that’s it.

If the action “X” and the outcome “A” are very specific (i.e. there is only one set of conditions under which action “X” can be performed and “A” is one of, say, two possible outcomes) then there is a chance you’ll be able to work out what the fault is and possibly advise the customer on how to address it.  Unfortunately, this perfect scenario is rare in software.

Faults

Faults (I’m using the term fault rather than bug as bug implies there is an error in the software whereas fault implies it just didn’t do what the user expected) are very often the result of misunderstanding, circumstance, bugs or a mixture of all three in varying proportions.  So, as the imaginary software engineer, what’s your next move going to be?

Unless “X” and “A” are absolutely crystal clear then I’m fairly confident your next move would be to drop the user an email and ask them a whole raft of questions, the sole purpose of which is to try and establish facts.  This takes time and unfortunately all too often it results in a response that goes something along the lines of…

“That’s all the end user reported”

What?

Yes, that’s right, the person reporting the fault is just an intermediary.  And it gets worse… there’s a chance they may not have seen or used your software.  So, where do you go from here?  Not sure?

That’s the problem we, as software engineers, face almost every day: faults being reported with little or no supporting information.  And it doesn’t help anyone when we have to say “I’m unable to reproduce the fault with the information available”.

That’s a point worth remembering…  until we can reliably repeat a fault, fixing it is guesswork, because if you can’t reliably repeat it, how can you tell if you’ve fixed it?

Thankfully, it doesn’t have to be like this.  There is a better way: more information.  The more information we have about the fault, the better the chances are that we will be able to fix it.

Unfortunately, this can lead to conflict between us and the end user because sometimes the response we get when we request more information suggests that we don’t understand what it’s like for them.  It’s true, I’ve never tried to rescue someone from a burning house for example, but that doesn’t mean I don’t understand.  But turning the tables… do users understand the problems we face?

Feeling the pressure

It’s unlikely there is a life on the line when I’m looking at a fault, but if I get it wrong, I could put hundreds of lives at risk (there are hundreds of installations of my team’s software out there in use every day).  There is pressure there, but there is also pressure from the customers themselves to get things fixed quickly.  Unfortunately, with little or no information, I have to assume that the error could be anywhere in the product’s codebase.  Considering MODAS Manager, if you ignore third-party libraries, there are 67,000+ lines of source code.  So which line is at fault?

As the imaginary software engineer, I’m guessing you’re sat there scratching your head thinking “I don’t know?  Help!”  That’s how it can feel for us sometimes too.

Obviously, armed with the action “X” you can make some assumptions about the areas of the codebase that could be at fault.  The outcome “A” may narrow it down further, but you could still be looking at hundreds and hundreds of lines of code.

So how much information is enough?

In actual fact, the level of information my team typically needs is this: What was your call sign?  What time did the issue occur?  What were you trying to do?  What was the outcome?  Of those, the last two could take some explaining, but a photograph from a phone is often enough to paint a reasonably clear picture.  Armed with this level of information, which isn’t actually that much more than “We did X and A happened”, we can start our quest.

We have quite detailed logging available to us, but to use it we need to know a rough time and, depending on which software is at fault, a call sign.  After that, it may be obvious from the logs where the error occurred. If not, then the ‘what were you trying to do?’ and ‘what was the outcome?’ questions come into play.  Sometimes the state of things before the issue occurred can be relevant.  Unfortunately, this is sometimes very difficult to remember, especially in the heat of the moment.
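To give a feel for why the rough time and the call sign matter so much, here is a minimal sketch of how they narrow a log down.  It is not our actual tooling: the log layout (timestamp, call sign, message), the sample entries and the filter_log function are all made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical log layout (an assumption, not a real product format):
# "<ISO timestamp> <call sign> <message>"
SAMPLE_LOG = """\
2024-03-01T14:01:12 A1 Status update sent
2024-03-01T14:02:40 B7 Route requested
2024-03-01T14:03:05 A1 ERROR Failed to acknowledge incident
2024-03-01T16:30:00 A1 Status update sent
"""


def filter_log(lines, call_sign, around, window_minutes=15):
    """Return the lines for one call sign within +/- window_minutes of `around`."""
    window = timedelta(minutes=window_minutes)
    matches = []
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # skip lines that don't fit the assumed layout
        stamp, sign, _message = parts
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines whose first field isn't a timestamp
        if sign == call_sign and abs(when - around) <= window:
            matches.append(line)
    return matches


# "Call sign A1, at about two o'clock" is enough to cut the log right down.
hits = filter_log(SAMPLE_LOG.splitlines(), "A1", datetime(2024, 3, 1, 14, 0))
for hit in hits:
    print(hit)
```

Without the time or the call sign, the only option is to read everything, which is exactly the position a “We did X and A happened” report leaves us in.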

Another useful nugget of information is the answer to the question “Can you repeat it?”.  If you can, then try and find a convenient time when you’re not under pressure and repeat it, making slightly more detailed notes about the steps required to repeat it.

We don’t like to let our customers down. Unfortunately, without sufficient information, our quest to find faults can sometimes be almost impossible.

Next time

So the next time you experience an issue with our software, make a brief note of when, where (vehicle call sign for example) and what (the actions that resulted in the error and the outcome).  If you have time, snap a shot of the screen (a picture paints a thousand words) and try to jot down the state of things before the fault.  If you can repeat it and you’ve worked out how, provide that information for us.  If you know the system produces log files and you know where they are, grab them and send them along to us as well (include the files for the day of the fault and maybe the day before as well – sometimes a fault that is picked up today is caused by an error that occurred yesterday).

And please, if we ask for more information, try and understand the situation from our point of view.  We aren’t trying to make life difficult for you; we want to provide you with the speediest fault turnaround we can.  But to do that we need information and, for the most part, the only people who can provide it are you, our customers.  Only by working together as a team can we make getting support a much smoother experience.

Thanks for reading.