February 14, 2006
Debugging Versus Troubleshooting
Often I hear developers talking about how they don't understand system administrators and why they are so primitive when it comes to finding and fixing problems. What most developers do not understand is that there can be a significant difference between the tools appropriate for development environments and the tools safe for production environments. Here is a good example, from a recent post entitled Guerilla Debugging for Java by Russ Olsen.
I love Eclipse. I really do. Eclipse has made creating, and especially debugging Java code so much easier that it is hard to imagine living without it. And that last bit is the problem. Sometimes we have to live without Eclipse. I have worked for customers who run very large and complicated J2EE applications, who for security reasons they will not let Eclipse within 200 yards of their servers. Don't even ask. Then there are the times where I have been ssh-ed in to some machine, trying to figure out what is going wrong without the benefit of a GUI. Very little opportunity for an Eclipse session there. Finally, I have been up against Java applications with such an enormous memory footprint that Eclipse would have been the IDE that broke the camel's 2 Gig memory limit. What do you do when cruel fate separates you from your favorite IDE? You resort to the kind of techniques that programmers used before Eclipse...
Do you have a good story of living in production environments without your development environment tools? Write me with your story at thebaum@splunk.com.
Posted by Michael Baum on February 14, 2006 07:27 AM
February 06, 2006
What's Missing from Production System Troubleshooting
Alexandre Rafalovitch recently wrote me regarding our recent survey of system administrators and why production IT systems are so difficult to troubleshoot. Alexandre writes:
By now we pretty much established that until the developers themselves try to support/troubleshoot their own products in production (or get loud enough feedback), they will not understand how to make their products easier to manage post-deployment.
The surveys of the how do you deal with it now kind should always include questions on why commercial solutions are not suitable (usually due to installation/license difficulties) and also what the companies creating the products could do to make things easier in a long run.
I think this is sort of the whole point of the survey results. Attendees at Camp Sys Admin overwhelmingly stated that they have so much data already it's killing them. It's interesting to think about the amount of IT data being generated every day in a typical enterprise shop. Forget about the network, firewall and security data. I'm talking about just your basic web servers, application servers and databases. Hundreds of gigabytes to several terabytes of IT data a day is not atypical for a good-sized data center.
The notion that IT people need even more data generated by developers kinda misses the point. Troubleshooting production applications is a whole lot different from debugging code in development or staging environments. Production systems involve many technologies and systems that just don't appear in pre-production environments.
When I was at Yahoo, every day we were firefighting production problems that never manifested themselves in development or staging systems. At Splunk, my current company, we routinely see problems with our software that only occur with multi-terabyte data sets in very large production systems. Perhaps in another post I'll discuss how we build QA environments to deal with this.
Alexandre comments further that commercial solutions are not suitable for solving production troubleshooting problems. True, today's solutions most often require extensive amounts of code instrumentation. IT people generally don't want to and/or can't instrument code in production environments. For starters, generally we don't even own most of the code running our applications and services.
So we're left to deal with all the ever-changing evidence our machines generate. And boy, it sure is piling up quickly.
How much IT data do you have in your data center? Write me with your estimate at thebaum@splunk.com.
Posted by Michael Baum on February 6, 2006 10:10 PM