February 06, 2006
What's Missing from Production System Troubleshooting
Alexandre Rafalovitch recently wrote to me about our recent survey of system administrators and why production IT systems are so difficult to troubleshoot. Alexandre writes:
By now we pretty much established that until the developers themselves try to support/troubleshoot their own products in production (or get loud enough feedback), they will not understand how to make their products easier to manage post-deployment.
Surveys of the "how do you deal with it now" kind should always include questions on why commercial solutions are not suitable (usually due to installation/license difficulties) and also what the companies creating the products could do to make things easier in the long run.
I think this is sort of the whole point of the survey results. Attendees at Camp Sys Admin overwhelmingly stated that they already have so much data it's killing them. It's interesting to think about the amount of IT data being generated every day in a typical enterprise shop. Forget about the network, firewall and security data. I'm talking about just your basic web servers, application servers and databases. Hundreds of gigabytes to several terabytes of IT data a day is not atypical for a good-sized data center.
The notion that IT people need even more data generated by developers kinda misses the point. Troubleshooting production applications is a whole lot different than debugging code in development or staging environments. Production systems involve many technologies and systems that just don't appear in pre-production environments.
When I was at Yahoo, every day we were firefighting production problems that never manifested themselves in development or staging systems. At Splunk, my current company, we routinely see problems with our software that only occur with multi-terabyte data sets in very large production systems. Perhaps in another post I'll discuss how we build QA environments to handle this.
Alexandre comments further that commercial solutions are not suitable for solving production troubleshooting problems. True, today's solutions most often require extensive amounts of code instrumentation. IT people generally don't want to and/or can't instrument code in production environments. For starters, generally we don't even own most of the code running our applications and services.
So we're left to deal with all the ever-changing evidence our machines generate. And boy, it sure is piling up quickly.
How much IT data do you have in your data center? Write me with your estimate at thebaum@splunk.com.
Posted by Michael Baum on February 6, 2006 10:10 PM
- COMMENTS
Michael,
Actually you misunderstood my comment. I do want the same thing as you, I just think you can actually make things even better with the weight of sysadmin organisations behind you.
The whole response got a little too big for this comment, so it is back at my blog at the name link.
Regards,
Alex.