One variable at a time

For the past five weeks I’ve been troubleshooting an issue I encountered when I added a well-known hard drive manufacturer’s drive to my system. It’s taking so long partly because the problem takes seven to ten days to reproduce, but also because there might be several pieces (variables) to the puzzle.

Background

I have one of their USB drives plugged into my home server and I’m using it to back up all the Macs in my house. After a week or so, the drive drops off the bus, and only unplugging the drive’s power resets it and allows the system to see it again. Additionally, the management software they included was throwing errors into the system log.

When I first contacted them, they wanted me to change the USB cable, uninstall their software, and remove the other two USB drives (same manufacturer) and, by the way, the single FireWire drive (different manufacturer), all at once. Their alternative was to move this drive to another computer (which would change the load on the drive), use a different USB cable and not install the management software.

One at a time

Of course I said no. We’d only make one change at a time. How else would we figure out what was the root cause?
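
To make that concrete, here is a minimal sketch of a one-change-at-a-time test plan, written in Python. The variable names and values are hypothetical labels I made up for the items the manufacturer asked me to change; they don’t come from any real tool. Each trial copies the known-bad baseline and alters exactly one thing, so whatever happens next can be attributed to that one change.

```python
from typing import Dict, List

# Baseline: the configuration in which the drive drops off the bus.
# All keys and values are hypothetical labels, chosen for illustration.
BASELINE: Dict[str, str] = {
    "usb_cable": "original",
    "management_software": "installed",
    "other_usb_drives": "attached",
    "firewire_drive": "attached",
}

# The single alternative value to try for each variable.
ALTERNATIVES: Dict[str, str] = {
    "usb_cable": "replacement",
    "management_software": "uninstalled",
    "other_usb_drives": "removed",
    "firewire_drive": "removed",
}


def one_at_a_time(baseline: Dict[str, str],
                  alternatives: Dict[str, str]) -> List[Dict[str, str]]:
    """Return one trial per variable; each differs from the baseline in one key."""
    trials = []
    for name, value in alternatives.items():
        trial = dict(baseline)  # start from the unchanged baseline every time
        trial[name] = value     # change exactly one variable
        trials.append(trial)
    return trials


def changed_variables(baseline: Dict[str, str],
                      trial: Dict[str, str]) -> List[str]:
    """List the variables whose values differ between baseline and trial."""
    return [key for key in baseline if baseline[key] != trial[key]]


if __name__ == "__main__":
    for number, trial in enumerate(one_at_a_time(BASELINE, ALTERNATIVES), start=1):
        print(f"Trial {number}: change {changed_variables(BASELINE, trial)}")
```

The cost is obvious: four variables means four trials, and at seven to ten days per trial that adds up. The payoff is that each result points at one variable instead of a bundle of them.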

I can only think of two situations when it makes sense to change more than one variable at a time and it turns out that they’re special cases of changing one variable at a time.

One is when you can safely rule out all but one of the variables you’re changing as the culprit. Changing an interface color from blue to red (when you’re working on an unrelated disk subsystem) might be a good example.
There’s still danger in doing this, however, because you can’t always be certain that a combination of “unrelated” variables won’t turn out to be related.

The other is when you feel confident that the various changes you’re making will manifest themselves in unique or distinct ways and that you’ll be able to identify each change’s impact.
A working generic example for this case is harder to come up with. One possibility is adding more memory, adding an extra core and moving the application software to an internal disk. If your capacity/performance monitoring software gives you deep enough insight into these systems, it might be safe to make all three changes at once.
Being able to make this decision usually requires having intimate knowledge of and deep insight into the system(s) being changed.

Conclusion

It’s possible that I’m wrong and that there are more situations when you can change more than one variable at a time, but I don’t think so.

What do you think? When you’re troubleshooting, are there other justifications for changing more than one thing at a time?


(I’ll cover making predictions about the impact of your changes in another post.)


2 Responses to One variable at a time

  1. Mike Plant says:

    I totally agree. When coding, I always take a step back and go one line or one block at a time to determine the root cause of errors. It’s not the fastest method for debugging, but I always find the issue, even if it’s only the surface.
