Sunday, February 26, 2006

The "it works for me" problem!

This is one problem anyone who works with IT people has come up against. Many people think it only afflicts the IT illiterate, often because those in the know follow it up with some gibberish that they hope will make the user go away: "Have you tried turning it off and on again?" is the much-touted reply. It doesn't only afflict them.
Anyone who has to liaise with others, like third-party vendors, system administrators or fellow software developers, will have hit this brick wall. Recently I was making a connection to a database stored on a server administered by someone else. Suddenly I couldn't get a connection to said server. I hit refresh and tried again: this time a partial result, but then the script couldn't reach a database on its own localhost. I persevered and tried some other long-term systems that shouldn't be failing like this. They also had an intermittent error connecting to localhost. This simply screams some kind of network problem: either the server in question or possibly a device somewhere in the stack is behaving erratically. Fortunately this isn't the first time this has happened on this server. Every so often it starts misbehaving with similar problems to this: it can't connect to databases, local, remote, on Mars, wherever. It's like a door that won't open every fifth time or so. So I e-mail the administrator of that machine and say hey, the server is playing up again, the usual thing; the remedy is usually a cheeky reboot to clear out a gremlin or another. Then the infamous reply: "I've looked and it seems fine." You've looked how? Opened a log, maybe made a TCP connection, checked for connectivity to other legacy systems running on localhost, used a command line tool to check database latency for simple connection requests? As it turns out, no. A remote desktop connection is, it appears, all that is required to prove a server is completely functional. Open it up, move the mouse around, maybe restart a non-critical service, and if all goes well the server is absolutely top notch, 100% fine. Apparently.
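For what it's worth, the "made a TCP connection" check above doesn't need to be elaborate. Below is a minimal sketch in Python, not what anyone in this story actually ran: the function names `probe_tcp` and `summarise` are made up for illustration, and the host and port you point it at are your own assumption (MySQL listens on 3306 by default, MS SQL on 1433). It only proves that a TCP handshake completes, not that the database behind it is healthy, but a door that won't open every fifth time should show up as a non-zero failure count.

```python
import socket
import time


def probe_tcp(host, port, attempts=20, timeout=2.0, pause=0.0):
    """Try to open a TCP connection `attempts` times.

    Returns a list of (succeeded, seconds_taken) tuples, one per attempt.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the host and performs the TCP handshake;
            # the `with` block closes the socket again straight away.
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            # Refused, timed out, unreachable, DNS failure: all count as a failed attempt.
            ok = False
        results.append((ok, time.monotonic() - start))
        if pause:
            time.sleep(pause)
    return results


def summarise(results):
    """Return (failures, total, slowest successful connect time or None)."""
    failures = sum(1 for ok, _ in results if not ok)
    good = [taken for ok, taken in results if ok]
    return failures, len(results), (max(good) if good else None)
```

Something like `summarise(probe_tcp("127.0.0.1", 3306, attempts=50, pause=0.5))`, run on the server itself and again from a client machine, gives a failure count and a worst-case connect time to put in the e-mail instead of "it's playing up again".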
Not to be discouraged, I do try to continue my development, but no, those intermittent errors are still there. So then, and this is the kicker, after a further e-mail exchange it's decided it is my fault: it's the MySQL server I installed that's the problem. Despite the fact I've got code that can't connect to a remote MS SQL server, it's still the MySQL server that's wrong. Or maybe the code, yes that's right, code that I've used one hundred times before and that usually runs flawlessly on several other systems is to blame. Despite the fact that, based on previous empirical evidence, a server reboot solves this problem, it's my code. I'm not encouraging us developers to be arrogant here, a trait of our breed, because the first time this happened I did indeed comb through all my code and did in fact find several connections I wasn't closing. But even after closing them all, guess what: still intermittent errors. Remember, of course, that the technology we're talking about is scalable way beyond the 100 or so users I have using my stuff. We're talking about millions of users and thousands and thousands of transactions a second in MySQL, with similarly fabulous statistics in MS SQL, so forgive me if I don't blame an entire inability to connect on a few unclosed connections in my code.
OK, what do I suggest in these situations? Problem management, people skills, decorum, calm: DON'T PANIC. Don't rant like I have above, or if you are going to rant, do it on a pointless web page that no one looks at. From a normal user's point of view it's more difficult: you can't fight fire with fire, and you can't throw technical descriptions of the actions you've already taken back at people. You simply want your scanner to scan, and the tech support people are telling you that they can't replicate the error, or that, despite being connected to your machine remotely, it all seems to be working to them.
What this answer sounds like to a user is, one, that you [technical support] think they [the user] are stupid and not using something correctly. Two, that you [technical support] might even think they [the user] are lying, speaking with forked tongue, misrepresenting the truth. Let's face it, users as a rule don't invent a problem just so they can find an excuse to converse with us technical folk. I know in some circles the term [user] means the stupidest person on the planet that might ever come to sit in front of a terminal running your system. In a design sense that is a good thing, it's a great thing; from a support standpoint it's not so great.
Sometimes it is as if staff want to deny the problem because they see it as a fault not in a piece of equipment, a bunch of circuit boards, but in themselves. They [technical support] think the user thinks that it's their [technical support] fault: the server is broken and it's your [technical support] fault because you're [technical support] the maintainer, damn you [technical support] all to hell. It is simply not the case, and certainly not the case from other technical brethren. We all know only too well how machines, despite all the testing you care to throw at them, will hide some bug or fault that just doesn't get picked up, or that, if it does, would cost so much to fix, given that it will only occur infrequently and can so easily be fixed by a reboot, that it's not worth it! Trust me, it isn't. If something simply isn't working, users just want to let you know; they'd like it fixed too, but they don't think it's your fault.
Another system I'm privy to has intermittent faults: it works for some users and not for others. The solution at that institution is to sit a user down internally and have them use the system from inside; if that works, the system is fine and it's the user's setup at home. To this solution I say pah. Pointless waste of time. No one is disputing that it works internally; the problem is that some users can't access their stuff from home / Barbados. My preferred solution is to completely scrap the system and move to a more universally accepted one, like a proper VPN. Next time a user reports a problem or a colleague says they can't run your code, the problem isn't their fault, or yours; it is a problem you both share and a problem to be solved.


Dom said...

Fools trust in Cisco and rely on proprietary access clients :)

10:06 AM  
M said...

This post has been removed by the author.

10:36 AM  
M said...

I wonder if anyone out there has any interesting thoughts on open source propagation, if not to servers then, I'm sure, its move to the desktop, and the implications of using open source solutions there.

10:50 AM  
