February 17, 2006
Higher Availability Future: Autonomic Computing or Recovery Oriented Computing?
It is fascinating to me that so many smart people can disagree on the best future approach to higher availability infrastructure. The autonomic computing crowd, led by IBM, is touting self-healing and self-regulating computing systems. On the other hand, the recovery oriented computing (ROC) folks, led by researchers at Berkeley and Stanford, declare that failures are inevitable. ROC proposes that the key to higher availability is helping humans recover infrastructure from failures faster.
I have written here previously about ROC, but it's time to start a dialog comparing and contrasting these two radically different views on the future of infrastructure availability.
You'll notice I am talking about infrastructure availability, not individual system availability. As an industry we have focused for decades on building more reliable individual components and systems. But now the reliability problem has moved to a different level. Take all these highly reliable components and systems, tie them together with software developed by multiple vendors or adopted from different open source projects, and the reality of complex systems sets in.
Can we build autonomic computing infrastructure that is self-healing and self-regulating beyond simple problems and single systems? Or will humans always be an important part of repairing and recovering IT infrastructure?
Our friends from Berkeley and Stanford offer an interesting perspective dubbed the Ironies of Automation. Their argument goes something like this.
Automation does not remove human influence; instead, it reduces IT personnel's understanding and can actually make their job harder. Automation increases complexity, reduces visibility, and takes away the day-to-day interaction through which people learn the system. ROC argues for better tools that help people, not replace them.
So what do you think? Autonomic Computing or Recovery Oriented Computing? Which will lead us to higher availability infrastructure? Send your vote to thebaum@splunk.com.
Posted by Michael Baum on February 17, 2006 11:33 PM
January 27, 2006
I've been reading a lot lately about Recovery Oriented Computing (ROC). It's a fascinating body of research and ideas about how to recover from failures in a variety of IT environments. One theme of the research is that human error accounts for a large portion of the problems in modern IT systems. Rather than assuming that we can keep automating more and more of IT operations and management and eliminate people and their error-prone ways, ROC assumes (probably rightly) that humans inevitably make mistakes. David Patterson and Aaron Brown, both with the Computer Science Division at UC Berkeley, present some very compelling views in their paper "To Err is Human."
ROC is a refreshing and compelling body of thinking. I'll be writing more about it here, since it looks likely to have a significant impact on IT troubleshooting.
If you are interested in discussing ROC email me at thebaum@splunk.com.
Posted by Michael Baum on January 27, 2006 09:12 AM