IT Troubleshooter | Harper Mann » January 2006

January 27, 2006 | Comments: (0)

To Err is Human

I've been reading a lot lately about Recovery Oriented Computing (ROC). It's a fascinating body of research and ideas on how to recover from failures in a variety of IT environments. One notion covered in the research is that human error accounts for a large portion of the problems in modern IT systems. Rather than assuming that we can automate more and more of IT operations and management and eliminate people and their error-prone ways, ROC assumes (probably rightly so) that humans inevitably make mistakes. David Patterson and Aaron Brown, both with the Computer Science Division at Cal Berkeley, present some very compelling views in their paper To Err is Human.

ROC is a refreshing and compelling way of thinking. I'll be writing more about it here, as it sounds like it will have a significant impact on IT troubleshooting.

If you are interested in discussing ROC, email me at thebaum@splunk.com.

Posted by Michael Baum on January 27, 2006 09:12 AM


January 20, 2006 | Comments: (0)

A Camp for Sys Admins

This weekend I attended Camp SysAdmin, a get-together of leading IT professionals in the Bay Area. The event was sponsored by BayLISA, LOPSA (League of Professional System Administrators), NaSPA (Network and System Professionals Association), Splunk, SysAdmin Magazine and Usenix. What all of the sponsors share is a commitment to improving the everyday lives of system administrators by fostering a dialog around the growing challenges facing IT professionals in an increasingly complex world.

There is no doubt that, as the world becomes more on-demand, the demand for system administration is on the rise. This year more than $100B will be spent managing the world's data centers, and that number is growing at more than 2.5 times the rate of new hardware spending. The current or imminent deployment of new technologies like VoIP, Web Services, Virtualization and competing Open Source solutions, just to name a few, will mean solving more difficult problems and working longer hours.

I've always been surprised at the lack of dialog regarding system administration challenges, both in the popular IT press and among the system administration community itself. Camp SysAdmin was a useful forum to spark more conversations, connections and solutions for us all.

Campers were able to participate in two of five available tracks covering a wide range of system administration topics. We had Brian Aker of MySQL fame discussing troubleshooting Open Source technologies. Ethan Galstad, lead Nagios contributor, joined the campers in exploring operations collaboration in the data center, including the impact of outsourcing data center operations. Richard Whitehead, CTO of Clarus Systems, led a roundtable on IP Telephony, and Eric Allman, author and Chief Science Officer at Sendmail, helped campers explore all aspects of administering messaging infrastructures.

The day was filled with incredible conversation and intense exchanges of opinions and ideas. Thank you to everyone who worked so hard to make the event possible and to all the campers who participated. If you were not able to attend, the sponsors will be making the recorded audio of the roundtables and the results of the camp survey available soon.

If you are interested in getting involved in Camp SysAdmin, email me at thebaum@splunk.com.

Posted by Michael Baum on January 20, 2006 07:48 AM


January 10, 2006 | Comments: (0)

When Problems Don't Really Get Fixed, Do They Just Go Away?

Have you ever wondered, after rebooting a device, server or piece of software that was generating nasty errors, whether the problem really went away just because the errors disappeared?

It seems that, with all the complexity of modern systems, we often focus only on removing the error rather than finding and fixing the root cause. I'm sure that, like me, you wonder: Will the problem come back? Will it manifest itself in another way, when I least expect it? Is the problem a symptom of a larger issue? It is so hard just to get rid of the error that we often give up or run out of time before getting to the real root of the problem. Too many times, we just push the problem off to someone else.

This post about WordPress and MySQL errors reminded me that this can happen even in fairly simple systems.

If you have a great story about never getting to the root cause, write me at thebaum@splunk.com.

Posted by Michael Baum on January 10, 2006 09:05 PM


January 03, 2006 | Comments: (0)

Troubleshooting Web 2.0

Over the holiday break, two contrasting types of stories appeared about the future of Internet technology. There were stories about the wonderful Web 2.0 stuff and how all the promises of the "Attention Economy," as O'Reilly calls it, are coming to fruition and changing our lives.

Then there were the reality stories about how hard it is to keep all this on-demand, next-generation Internet stuff up and running.

The simpler and more pervasive services get, the more we all rely on them. But of course, there's the rub. When applications and services are provided across the network, we are all depending on things we don't control.

As next-generation Internet and Enterprise Applications become more on-demand and more intertwined, troubleshooting problems will continue to be at the center of more news stories. Stay tuned.

Do you have a Web 2.0 troubleshooting scenario to share? Write me at thebaum@splunk.com.

Posted by Michael Baum on January 3, 2006 10:07 PM

