August 17, 2006 | Comments: (0)
Open Source Network Monitoring Luminaries Compare Approaches / Technologies at LinuxWorld
Yesterday at the LinuxWorld event, the leads of some of today's hottest open source network management projects came together to compare their visions and approaches to network monitoring and management, and the common threads between them.
Participants included:
> Matt Massie, Project Lead, Ganglia.
> Kees Cook, Creator of SendPage.
> Ian Berry, Project Lead, Cacti.
> Tobi Oetiker, Project Lead, RRDtool, MRTG, SmokePing.
> Remo Rickli, Project Lead, NeDi.
We've heard a lot of discussion about how open source is threatening the big proprietary suites, and competing on cost. But these open source network monitoring luminaries were highlighting some other really interesting technology trends that are quite different from the typical "Big 4" (Tivoli, Openview, Unicenter, Patrol) suites' approaches.
For example, one of the distinguishing features of these open source approaches is that they are all "lightweight." They're all designed to be unintrusive, with low drag on the network. For projects like Ganglia, for example -- which is optimized for monitoring clustering systems -- this is critical for performance (any sort of latency or I/O overhead will cripple a cluster). One of the reasons why Remo Rickli created NeDi was that the tools for network discovery typically had a huge overhead toll (in sharp contrast to NeDi, a lightweight script that can discover the network using SNMP, telnet ... and SSH soon ... without dragging on the network). SendPage (an open source paging system) is also optimized for very low latency and high volume paging requirements (for notifications about network anomalies).
Another common thread is that these approaches are all very visualization-friendly. They're collecting info, then often spitting out the results in XML and RSS, with great chart and graph capabilities. They are all geared towards the scripting languages and creating very dynamic, customized presentation capabilities. Ian Berry's Cacti is the obvious example to call out here -- it's very much oriented towards advanced graph viewing interfaces and templates -- and the ability to keep clicking down to more detailed views (with the data pulled in from RRDTool). Cacti has become such a well-known tool for graphing info that it's in fact seeing use beyond network monitoring, in niche disciplines like seismic and medical data.
"When you're trying to put together an open source system for monitoring your network, you have a very rich field to choose from," said Thomas Stocking, Founder of GroundWork Open Source. "What's really exciting today is that all of these disparate, best of breed solution creators are starting to really come together to explore what's possible when their technologies are interoperating with each other. We're sort of entering the next phase of the open source-based approach to network monitoring, and it's great that all these creators are in close dialogue with each other."
Posted by Harper Mann on August 17, 2006 03:04 PM
August 01, 2006 | Comments: (0)
Cisco's Network Management Push Raises More Questions than Answers
In Q4, another InfoWorld blogger noted Cisco's 'rolling thunder' in systems management. According to Greg Nawrocki:
"I think it's really interesting to consider the consolidation that will likely take place in the systems management realm in the next year. Over the last couple of years, there's been an explosion of vendors with dynamic server provisioning capabilities -- Opsware, Cassatt, Platform, Data Synapse, Levanta, Egenera, Qlusters, BladeLogic, etc., etc. -- the environment is ripe for acquisitions, and Cisco is one of the looming giants that's up to the task. To me, the question is not 'if' Cisco will continue to beef up its systems management capabilities -- but 'when' they will officially acknowledge that they intend to compete head to head with the Tivolis, Openviews, etc. of the world. As highly distributed applications become normalized in enterprise -- the value of 'intelligence in the network' keeps getting more compelling for enterprise customers."
That speculation that Cisco was to compete with the "Big 4" continued in a Network World article by Lynn Haber, which pointed out that Cisco's acquisition of Sheer improved the interoperability between Cisco gear and system management vendors.
In that same article, Jim Hull, VP of engineering at MasterCard, said, "If Cisco does focus its strategy on open standards that will let users pick the best management tools."
But when I read Denise Dubie's Q&A ("Cisco Exec Touts Network Management Push") with Cliff Meltzer, Sr. VP of the Network Management Technology Group, today on Network World, I felt like her questions were all on the right track, but some of the answers seemed a bit opaque. One example is the answer to the question about how Cisco's new management technology will "allow you to manage other vendor gear."
I would like to hear more specifically how Cisco intends to handle the practical technical issues that are common to environments with a large diversity of different network hardware. For example, how will they make it easier to poll different MIB values across Foundry, Juniper and other competitors' equipment?
How will Cisco's management product deal with competitors' network security? For example, firewalls blocking the monitoring protocols. A customer might have a router built in '99, and it will respond to SNMP and ICMP -- but it may not have a web interface or HTTP. Whereas a modern router will have an HTTP interface, may have other configuration interfaces -- where you can send to it with usual versioning questions.
Those are the types of common issues that a network admin runs into in environments with heterogeneous equipment. It would be helpful to hear Cisco explain its management product roadmap a bit more specifically to that degree of granularity.
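To make the multi-vendor polling problem above a little more concrete, here is a minimal sketch in Python of the first step a poller has to take: working out whose gear it is talking to. It assumes the device's sysObjectID has already been fetched over SNMP by some other means. The enterprise prefixes are the IANA-registered numbers for Cisco, Foundry and Juniper, and the two common OIDs are standard MIB-II objects. The function names and the vendor CPU table are hypothetical, and the table holds placeholders only, because the real vendor-specific objects have to be looked up in each vendor's MIB files.

```python
# IANA private enterprise numbers, as they appear at the start of sysObjectID.
# The trailing dot stops "...4.1.9." from also matching "...4.1.99.".
ENTERPRISE_PREFIXES = {
    "1.3.6.1.4.1.9.": "cisco",
    "1.3.6.1.4.1.1991.": "foundry",
    "1.3.6.1.4.1.2636.": "juniper",
}

# MIB-II objects that any SNMP agent should answer, whatever the vendor.
COMMON_OIDS = {
    "sysDescr": "1.3.6.1.2.1.1.1.0",
    "sysUpTime": "1.3.6.1.2.1.1.3.0",
}

# Placeholders only (assumption): each vendor exposes CPU load under its own
# enterprise subtree, and the exact object must come from that vendor's MIBs.
VENDOR_CPU_OBJECT = {
    "cisco": "<cisco CPU object from vendor MIB>",
    "foundry": "<foundry CPU object from vendor MIB>",
    "juniper": "<juniper CPU object from vendor MIB>",
}


def vendor_from_sysobjectid(sys_object_id):
    """Map a sysObjectID string to a vendor name, or 'unknown'."""
    oid = sys_object_id.strip().lstrip(".") + "."
    for prefix, vendor in ENTERPRISE_PREFIXES.items():
        if oid.startswith(prefix):
            return vendor
    return "unknown"


def poll_plan(sys_object_id):
    """Return (vendor, objects to poll): common OIDs plus any vendor extra."""
    vendor = vendor_from_sysobjectid(sys_object_id)
    plan = dict(COMMON_OIDS)
    if vendor in VENDOR_CPU_OBJECT:
        plan["cpu"] = VENDOR_CPU_OBJECT[vendor]
    return vendor, plan
```

Even in this toy form, the per-vendor table is where the maintenance burden sits: every new vendor or hardware family means another entry someone has to research and keep current.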
Cisco talks about handling multiple vendor gear, but their Network Analysis Module page -- the sort of engine, if you will, for their monitoring solutions -- also seems to say that it monitors traffic exclusively for Cisco Catalyst Switches and Routers.
I am intrigued by Cisco's forays into network management. But until there's a little more direct evidence, it seems like the story they're telling about managing heterogeneous environments is a bit more spin than substance.
I think there is room for improvement in the Network and Systems Management arena, but I wonder about gargantuan companies' interest in general solutions. It's too easy to sow small technical problems, so that all the parts work much better when they come from the one vendor and maddeningly almost (but not quite) work for everybody else.
This is one of the advantages of Open Source. Because of the way it's structured, it's to everyone's advantage to work with as many other systems as possible. That incentive just doesn't exist for commercial products, particularly as the behemoths take over more and more of any one domain and start to spill over into new domains.
Posted by Harper Mann on August 1, 2006 12:17 PM
June 19, 2006 | Comments: (0)
Are SANs really as infallible as vendors would have you believe?
The original rationale behind the enterprise SAN and fiber channel fabric market was to put in place a world-class infrastructure for protecting data. By dissolving the direct-attached relationship between servers and storage, the idea was that when servers failed, the storage would still be accessible.
But studies about downtime causes are steadily surfacing data that suggests that SANs themselves go on the fritz relatively frequently. This recent vendor-produced white paper ("Why Email Fails") by MessageOne, for example, cites SAN failures as one of the leading overall causes of messaging system failures, and suggests that compared to other causes of failures, SAN outages tend to lead to a considerable amount of down time.
So what are some of the common afflictions in SAN technology? Jon Toigo, principal of Toigo Partners, recently enlightened me:
"There are a number of problems with SAN technology that are giving enterprise IT pros problems. One is the poor set of standards upon which they are based. You could fill a whole library with fiber channel standards. They've been developed primarily by players in the industry. And the really interesting thing about their standards is that you can build a switch to the letter of the standard with absolute certainty that the switch will not work with competitor switch, which is also built to the letter of the standard. Now, you figure it out. When is a standard not a standard? When has the storage standards process been hijacked by the vendor community? I once referred to a fiber channel standards group as the Taliban of the storage industry, and I got in trouble for it. Somebody responded that the comment was over the top, but someone else noted that it was also a little unfair to the Taliban.
Secondly, SAN technology is prone to a lot of human factors in the configuration and set-up ... failures that are based on the fact that people may not understand how to create a zone or they may not understand how to properly configure their SAN. There are lots of configurations and complexities that require ongoing fine-tuning and optimization, and create a lot of dependencies. Which, of course, means lots of little things that can break.
The other common criticism of SANs is that they are extremely difficult to manage. If you have a heterogeneous environment -- meaning you've got disk arrays from multiple vendors -- it is almost impossible to manage a SAN. And there have been ongoing efforts, like SMI, at the storage networking industry association, to try to come up with the universal holy grail of management, except that it's all held hostage by the vendors. To a vendor, making management easy means that you also make it easy to deploy his competitor's product in the same storage fabric as his, and he doesn't like that idea. So the storage vendors' participation in management standards has been halting at best. For end users, it's tough to see when a SAN problem is building, and it's really tough to quickly troubleshoot and rectify the problem when it occurs. So the only way around it, to get a little more efficiency and a little more resiliency out of the SAN, is to buy all the pieces from a single vendor. So while the price of compute has dropped (on a per gigabyte basis) at the rate of 50% per year since the mid-'90s ... the price of a storage array made up of a bunch of commodity disks has actually accelerated at a rate of 125% per year."
I completely agree with Jon's points. SAN vendors have sold a "bill of goods." SAN complexity is through the roof ... possibly two orders of magnitude over conventional storage, which makes them nearly impossible to manage. Most failures are due to configuration changes, not hardware or software bugs or breakage. Add that to the extra virtual file system layer needed to manage SAN requests -- which is on the order of a whole other OS with the concomitant problems with races and resources -- and you've got even more trouble.
The big SAN vendors want it complex so they can make the case for how expensive it all is. If you look at the per disk cost for SAN approved disks, they are on the order of $3000 or $4000 each, plus the extra charges for going > 1TB and similar. As an IT director, I don't see any value in paying more just because I went over some arbitrary value like 1TB. It's also galling to pay thousands of dollars for the same damn disk that cost $150 bought raw from the manufacturer.
Posted by Harper Mann on June 19, 2006 07:39 AM
June 05, 2006 | Comments: (0)
Open Source Network Management Tool You Should Care About: NeDi
Documenting the network and keeping tabs on the equipment in your environment is the equivalent of doing the dishes or taking out the trash -- i.e., the everyday household chore minutiae that get pretty tiresome over the long haul. It's even worse when network hardware fries and you don't have your configuration backed up. What's needed is an automated way to keep your network inventory and collect the configurations so recovery from problems can be just a few clicks away.
In recent blog entries, I've discussed some compelling open source tools for network monitoring and management -- and without question, one of my favorites is one called NeDi ("Network Discovery Suite"). NeDi was created by Remo Rickli, and pulls SNMP values, using the Cisco Discovery Protocol -- and makes it really easy to visualize the environment and present it on the front-end (with PHP).
I've been working with this tool in customer environments for a number of years, and it's as solid at retrieving and displaying info about what's on the network as any commercial product out there. It's also faster than most. It runs at night and picks up any network node configuration changes and stores them.
About a year ago, Paul Venezia did a nice curtain raiser article that explained what makes NeDi so good. When Venezia wrote this article, the 1.0 version of NeDi hadn't come out, so I'd just add that it has now added new features:
* Improved the calendar usage in the frontend. Date and Time can be selected in Devices- and Nodes-List now.
* Monitoring overhauled
* Devices-Table is grouped by rooms (if available).
* New Vlan Information in Devices-Status
* Changed some icons to be more intuitive
Posted by Harper Mann on June 5, 2006 04:47 PM
May 22, 2006 | Comments: (0)
False Positives in IT Equal Wasted $$$
What do the "type-ahead" features on many mobile devices and the "auto formatting" feature in Microsoft Word have in common? They're both infuriating when you're in a hurry to complete a message or doc, and the thing is urging you in a direction you don't want or need to go. For all the intelligence they build into software, humans still tend to know exactly what they want before the machine does, and sometimes the "AI" stuff can be more annoying than productive.
I think the enterprise IT equivalent to these types of consumer AI snafus is when false positives trigger unwanted alerts or events.
We're all familiar with the email spam filtering problem. If you put the filters on strong enough to keep the spam off, you're also invariably going to block some valid messages.
Then you have the content monitoring systems, where keeping proprietary company info from being divulged electronically is the impetus ... and intelligent agents block certain emails from being transmitted. False positives with these types of systems are extremely disruptive to business productivity. Btw: here's an interesting Computerworld article about content monitoring systems.
The new class of intrusion detection systems, meanwhile, is getting more sophisticated at blocking unauthorized users and putting them into honey pots -- where they get locked out. But as the mousetraps get better, it becomes tougher to enable the people on the "white list" to consistently have the access they need, and the configuration complexities are increasing. This recent Network World article talks about common false positives specific to Wi-Fi intrusion monitoring.
And with the big network and systems monitoring tools, the annoyances typically manifest themselves in the form of "false positive" alerts rolling in for events that are not important. When the help desk gets pinged with too many irrelevant or insignificant alerts, you have noise that may block out the REAL problems or situations that need attention.
The bottom line across all intelligent agents and alerting systems is that they're only as good as the human touch on the back end that's fine-tuning them. Each requires constant input -- alerting the system to new resources in an environment; correcting false positives or false negatives as they happen so the system can 'learn'; etc. So while organizations are sold on the autonomic / automated functionality of these systems, each typically requires a significant tax in the form of human labor for babying it along and teaching it about the desired result.
Posted by Harper Mann on May 22, 2006 12:15 PM
May 18, 2006 | Comments: (0)
Open Source Giving Customers More Leverage Against Management Vendors
Last week the Financial Times weighed in on the fundamental flaws in software licensing models, and the fact that open source and other innovation is giving customers some leverage to end the abuse.
One related trend that I'm seeing -- for the monitoring and management products -- is that customers are also sick of commercial iron tools that put the burden of complex engineering and integration work on the customer. As many of the big systems software vendors have grown and added to their product lines via acquisitions, they add robust, fancy agents to their product lines that may provide some additional functionality ... but increase the complexity of installation / configuration / ongoing management for the customers, and tend to be used by only a small fraction of customers.
There was a very good article from Nicholas Hoover at InformationWeek that explained the related growing pains for CA Unicenter ... and the article gave a great snapshot of the new pressures that customers are placing on their systems monitoring / management tools.
Posted by Harper Mann on May 18, 2006 05:45 PM
May 17, 2006 | Comments: (0)
Tracking Application Performance Getting Trickier ...
At the recent Interop event, I was really amazed by how many exhibitors are hawking application performance solutions. Literally, you can't walk more than 50 feet in any direction on the exhibit floor without seeing some booth personnel dressed like a pit crew member, waving a checkered flag and yelling into a megaphone about application speed, reliability, uptime, etc.
Now what's interesting to me is the way that development trends these days are going to change the game with respect to how application performance is monitored and managed, and how infrequently I heard these new trends being discussed in these vendors' pitches.
Consider a few of today's hot application development trends, and how they are likely to impact the way that app performance is monitored in the future:
AJAX -- Ann Bednarz at Network World had an interesting write-up yesterday where she canvassed a number of folks about the practical considerations for implementing AJAX. One of the points made near the end is that because of the predictive nature of AJAX and how it fetches extra data in anticipation of demand, AJAX apps tend to put extra drag on the network. AJAX is one of the really hot application development approaches today (especially for web applications), so we're likely to hear more and more discussion about the performance issues associated with deploying rich AJAX applications.
MASH-UPS -- With XML and Web services, we're seeing many more corporate developers today that are writing applications that call a bunch of applications on the back-end. The mash-up craze so far has been pretty consumer-focused, but it's starting to creep into enterprise application development discussions as well. So the implication is that as more applications start to communicate with more databases and security systems, the sheer number of subcomponents that one must consider when evaluating application performance goes up considerably.
SOA -- InfoWorld has given a ton of discussion to service-oriented architectures, and while there are a wide range of opinions about how wholeheartedly SOA is being deployed by customers today -- there's no question that monolithic apps are being replaced by services. As Nemertes analyst Andreas Antonopoulos pointed out at Interop, however, with the flexibility of SOAs also come some new management issues. If instead of one 4-tier web app for CRM or ERP you are now running a collection of 100,000+ web services that create a composite application -- how do you manage these with the big, top-down management apps that so many customers use today?
While at Interop, I also enjoyed NetworkPhysics' "It's not the Network" T-shirts -- which point to the fact that whenever something goes wrong for the user, the poor network guy is typically the first one that gets blamed when the finger pointing begins. But with the explosion of subcomponents (and connections between these components) in the typical IT environment today, chances are that the finger pointing is going to continue to get more and more complex as we move ahead.
The classic, network-centric approach to monitoring applications is to leverage sentries to look at packets going by on the network to determine a specific application's latency and performance stats. This approach originated with RMON (remote network monitoring management information base) in RFC 1757 in the IETF, and it requires a good characterization of the architecture of your system, and understanding the relationship between the given application's connections to servers, databases, authentication systems, provisioning systems, and any other touch points that have I/O and could potentially be a bottleneck.
Developers often also run "synthetic transactions," where you drive the GUI through an associated user activity, monitor the execution of the task, and note any problems before the application goes live. My experience has been that roughly between 1/3 and 1/2 of enterprises run synthetic transactions for integration testing before deploying new applications. But typically they aren't run too often once the app is in production, because they can affect network performance (and they're often brittle: if you change one minor variable in your network, you can break the test).
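For readers who haven't seen one, a synthetic transaction boils down to a scripted sequence of user steps that is timed and checked. The sketch below is a minimal illustration in Python, not any particular vendor's tool. The function name, the step names and the `max_seconds` budget are all assumptions, and each step is just a callable that a real test would replace with GUI or HTTP automation.

```python
import time


def run_synthetic_transaction(steps, max_seconds):
    """Run (name, action) steps in order and time each one.

    A step fails if its action raises or takes longer than max_seconds.
    The run stops at the first failed step, the way a real user would be stuck.
    """
    results = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            error = None
        except Exception as exc:  # any failure in the step counts as a failed step
            error = str(exc)
        elapsed = time.perf_counter() - start
        ok = error is None and elapsed <= max_seconds
        results.append({"step": name, "ok": ok, "seconds": elapsed, "error": error})
        if not ok:
            break
    return results
```

The brittleness mentioned above shows up right here: every step hard-codes an assumption about the environment, so a renamed field or a moved server breaks the script even when real users are fine.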
Posted by Harper Mann on May 17, 2006 02:31 PM
May 12, 2006 | Comments: (0)
IT trouble often starts with inconsistency in process
A few years ago, I attended an ITIL (IT Infrastructure Library) training session at a major financial services company with >$3-billion annual IT spend. Each of the line of business IT managers being trained had about 100 people working under them, and each had multimillion-dollar IT budgets.
So at that time, one of the first steps of ITIL training was to ask these folks what the process was for some common scenarios. For example, if they needed a new Exchange server for their group -- who did they order it from, who installed it internally, and how long would the whole process take, start-to-finish?
And what we found was that all nine of them had different answers ... there was a complete lack of consistency across the processes.
The point of ITIL is to get your IT organized in such a way that you're as streamlined as a McDonald's burger assembly line. When someone needs something, that's a service that you provide to them. Getting an Exchange server up and running, for example, should be a very well-baked service, with very exact costs known.
ITIL also draws an important distinction between "incidents" and "problems." Time and again, IT groups work to fix incidents rather than problems -- for example, tinkering away at an FTP server (that no one is using internally), when an Exchange server is down (a 'service' that might be worth more than $50k to that type of business).
The ITIL principles suggest that the proper way to run an IT organization is like a service provider, and large enterprises today are increasingly charging individual business units for IT services. These so-called "operational level agreements" are getting more common every day.
Posted by Harper Mann on May 12, 2006 01:08 PM
May 10, 2006 | Comments: (0)
Systems and Network Dashboards Need to Get with the 'Blink' Principle
In Malcolm Gladwell's most recent book -- "Blink: The Power of Thinking Without Thinking" -- he explains the power of the first moments of human perception, and how often more information is gleaned in those initial moments than in the subsequent thought and analysis that follows.
Coincidentally, I picked up "Blink" at the same time that I also happened to be reading "Information Dashboard Design -- The Effective Visual Communication of Data." This book -- authored by Stephen Few -- highlights a number of ways that dashboards fall short in their implicit goal of presenting a lot of information to the user in a way that's intuitive and digestible. For example, there may be inadequate context on the data ... or there may be excessive detail ... or there may be useless decoration on a display that's simply distracting. Few is an admirer of Tufte, who developed many interesting graphics and web design principles that remain popular today.
I think there are principles in both books that are often lost on the folks that design systems and network monitoring / management dashboards. The main point of a dashboard shouldn't be to show CPU utilization on disk #12 in the server room, or other minutiae of low-level hardware information. The business guys don't care. What they care about is -- is this network up or down? Is this application performing fast or slow?
And particularly with the emergence of commodity hardware sprawl, where we have 8,000 Linux boxes in a datacenter -- how much do you care about one going down, when you can easily just plug in a new one and replace it?
So the point is that while the low level information is important to the IT guys in the trenches, IT dashboards have a long way to go in terms of putting the information up-front, and providing the type of first-glance indicators that make these tools useful to the business audience. With most dashboards, the tools themselves continue to relegate us to chasing rabbits down a hole, thinking about bits and bytes, while the critical information is lost in quantity. Dashboards should be about the bigger picture surfaced with enough context to guide what needs to be done now.
Here are a few additional URLs to dashboard design principles / discussions from Stephen Few that I stumbled upon that you might find useful for further reading, if the topic interests you:
http://www.perceptualedge.com/about.htm
http://www.b-eye-network.com/blogs/few/
http://www.perceptualedge.com/examples.htm#
Posted by Harper Mann on May 10, 2006 04:58 PM
May 02, 2006 | Comments: (0)
Cacti Brings Great Reporting Functionality to Open Source Monitoring
A few weeks ago, I pointed out RRDTool, a powerful open source technology for storing and graphing network monitoring data.
Collecting data is all well and good, but you also have to have good reporting -- which is why there's so much excitement today around another great open source tool called Cacti.
According to the project founders: "Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices."
Rumors on the Cacti project forum have it that on the docket are some additional plug-ins that will enable automated email notifications about network performance, as well as new "thresholding" capabilities that trigger alarms and notifications. The other exciting thing about Cacti is its ability to scale to big environments. Because it batches its operations, and because the polling happens locally -- you get high performance monitoring / reporting for distributed environments.
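As a rough illustration of what "thresholding" means in practice, here is a small Python sketch. It is not the Cacti plug-in's code or configuration. The function name and the `min_consecutive` rule are assumptions, chosen to show one common way of keeping a single spike from paging anyone.

```python
def alarm_points(samples, high, min_consecutive=3):
    """Return the indexes at which an alarm would fire.

    An alarm fires once, on the sample that completes a run of
    min_consecutive readings above the high threshold. The run counter
    resets as soon as a reading drops back to or below the threshold.
    """
    alarms = []
    run = 0
    for index, value in enumerate(samples):
        run = run + 1 if value > high else 0
        if run == min_consecutive:
            alarms.append(index)
    return alarms
```

With a threshold of 90 and `min_consecutive=3`, a lone reading of 95 is ignored, while three readings in a row above 90 raise exactly one alarm rather than one per sample.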
One of the criticisms of open source network monitoring solutions to date (compared to their proprietary counterparts, such as Openview, CA Unicenter, etc.) has been that the reporting is not up to snuff. With Cacti, the reporting functionality in open source-based network monitoring has taken a giant leap forward.
Posted by Harper Mann on May 2, 2006 04:38 PM
May 01, 2006 | Comments: (0)
Network Weathermap -- A Free Monitoring Tool You Should be Using
I'm down here all week at the Interop event, where conference sessions are already underway, and the exhibition floor opens up tomorrow.
One of the really cool new open source tools that's on display this year is called Network Weathermap.
According to the Network Weathermap web site: "Network Weathermap is a perl tool that displays in a visual way the utilization of the network links of your network. The required data are acquired from graphs created by the MRTG package and are displayed as two ways colored arrows on a map representing the logical topology of the network. The resulted image is presented in a web page using extra DHTML and JavaScript code for web-over pop-ups, based on the OverLib JavaScript library."
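The core idea in that description, turning a link's traffic rate into a colored arrow for each direction, can be sketched in a few lines. The Python below is an illustration only, not Network Weathermap's Perl code. The color bands, function names and the choice to cap utilization at 100% are all assumptions.

```python
# Assumed color bands: (upper bound in percent, color name).
BANDS = [(25.0, "green"), (50.0, "yellow"), (75.0, "orange"), (100.0, "red")]


def utilization_percent(bits_per_second, capacity_bps):
    """Link utilization as a percentage of capacity, capped at 100."""
    if capacity_bps <= 0:
        raise ValueError("capacity_bps must be positive")
    return min(100.0, 100.0 * bits_per_second / capacity_bps)


def color_for(percent):
    """Pick the first band whose upper bound covers the percentage."""
    for upper, color in BANDS:
        if percent <= upper:
            return color
    return BANDS[-1][1]


def link_arrows(in_bps, out_bps, capacity_bps):
    """Colors for the two arrows drawn on one link, one per direction."""
    return {
        "in": color_for(utilization_percent(in_bps, capacity_bps)),
        "out": color_for(utilization_percent(out_bps, capacity_bps)),
    }
```

For a 100 Mbit/s link carrying 20 Mbit/s inbound and 80 Mbit/s outbound, this gives a green inbound arrow and a red outbound one, which is the at-a-glance reading the map is after.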
At Interop, I installed Network Weathermap to monitor the InteropNet -- the event network that delivers wireless, VoIP, and high speed internet access throughout the huge exhibition center here at Mandalay Bay.
If you go to this link, you can check it out. I'll apologize in advance -- there are so many layers of security on top of the InteropNet that this particular link will be down periodically throughout the week (it works perfectly here on the local subnet).
What you're looking at is a core topology for the network that shows the connection to the Internet and the firewalls and the main core routers inside. At the bottom of the page is a clickable link that shows the nodes in the NOC that are doing the monitoring, as well as some performance stats on them. There's also a subnet map that you get when you click on show floor and the PEDs -- and you can get detailed analysis of the interfaces that are on all of the routers and equipment.
If you were to set up this sort of Visio-style map of a big enterprise network with an Openview or similar, it would cost you a ton of money. I think one of the themes that's really going to surface at this year's event is that there are some VERY compelling open source monitoring tools out there that can be easily plugged in, and that can perform under the rigors of just about any environment.
I'll be reporting back on some other cool open source monitoring tools throughout the week, here from the show.
Posted by Harper Mann on May 1, 2006 01:56 PM
April 25, 2006 | Comments: (0)
Open Source Tools: "data" versus "information"
In a recent Businessweek article, an IBM exec pointed out that "Today, the amount of information we produce increases by about 800 megabytes per year for every man, woman, and child on the planet." The article goes on to point out enterprises' ongoing need to translate "unstructured data" into truly useful "information."
This need to translate raw data into actual information is becoming a pretty critical issue for open source network and systems management projects today too.
Say you're looking for "information" on an open source network monitoring tool that would be good for your environment. The logical first step would be to go to SourceForge. So go there and search for "network monitor."
You get a ton of results, but which of these tools is the best for your environment? Good luck figuring that out, based on the search results. The most popular and proven open source network monitoring tool -- Nagios -- doesn't even appear in the first screen of the retrieved search results. In the open source discovery process, it's easy to find a lot of data, but not that easy to find the exact information that you need.
The old complaint around open source was that the documentation was lousy. People were quick to complain that open source is an engineering phenomenon -- that the documentation was pointed towards open source hobbyists, not enterprise end users. For the end user that was used to unpacking a Dell box and finding crystal clear quick-start guides and troubleshooting menus, open source documentation was a little frightening.
The new news around open source documentation is that with the popularization of Wikis and other group sharing tools, the documentation has become dynamic. Documents even weeks old can be out of date given the speed of open source projects. Documentation is often written in places where user comments greatly enhance its usability. Project participants are actively submitting end user-oriented feedback, new entries and fixes to existing entries, which clarifies the documents and makes them user-friendly. Wikis also play nicely with documentation processes such as indexing, which results in more standardized, intuitive information.
Open source has a new documentation problem today: its own popularity. The enterprise IT pro needs to have a better method to understand what's relevant for their unique requirements, and to let the rest of the data / noise fall away.
Posted by Harper Mann on April 25, 2006 03:16 PM
April 18, 2006 | Comments: (0)
Open source network monitoring tools you should care about: MRTG and RRDtool
In today's "mash-up" application development craze, the innovation is being driven largely by the fact that APIs are now more open and accessible, and presentation layer technologies such as AJAX are creating very compelling new ways to visualize data streams from multiple services. Developers can get to services more easily, and with XML they can deliver the data in much more dynamic ways via web applications.
Similar trends are driving software development innovation today in network monitoring. Tools such as MRTG (Multi-Router Traffic Grapher) and RRD (Round Robin Database) make it possible to more easily collect data from a greater number of devices on the network, and convert the data into XML for easy consumption on the front end.
According to Alex van den Bogaerdt, who wrote a very good tutorial on RRD -- "MRTG started as a tiny little script for graphing the use of a university's connection to the Internet. MRTG was later used as a tool for graphing other data sources including temperature, speed, voltage, number of printouts and the like."
Because MRTG is free under the GNU GPL, network professionals started using it to poll network devices, retrieve MIB (Management Information Base) values over SNMP (Simple Network Management Protocol), and use Perl scripts to post the results as graphs on web pages. Because MRTG is so good at polling devices and producing graphs, it quickly became widely used not only by the open source folks cobbling their own solutions together, but also by very large proprietary vendors such as HP who, according to this site, borrow from some of MRTG's capabilities for OpenView.
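To make the polling step a little more concrete, here is a minimal Python sketch of the arithmetic behind a traffic graph: take two readings of an interface byte counter and turn the difference into a rate. This is an illustration only, not MRTG's actual code. The function name `counter_rate`, the sample readings, and the 300-second interval are assumptions made up for the example, and the 32-bit counter width is assumed as well.

```python
def counter_rate(prev, curr, interval_s, counter_bits=32):
    """Average units per second between two samples of a wrapping counter.

    Hypothetical helper for illustration; it assumes the counter wrapped
    at most once between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction gives the right delta even if the counter
    # rolled over from its maximum value back to zero.
    delta = (curr - prev) % modulus
    return delta / interval_s


# Two made-up octet counter readings taken 300 seconds apart.
bytes_per_second = counter_rate(1_000_000, 4_750_000, 300)
bits_per_second = bytes_per_second * 8
print(f"{bits_per_second:.0f} bits/s")
```

The modular subtraction is the one design choice worth noting: without it, a counter that rolls over between polls would show up as a huge negative spike on the graph.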
In network monitoring, the whole goal is to be able to monitor the right things. You don't want too much data, or you'll be flooded. Too little, and you miss important info. So there's this very fine line you have to walk to "right size" your monitoring system.
With MRTG and RRDtool (creator Tobias Oetiker's next generation MRTG, which extends the scalability and functionality), polling devices and getting just the information you want has become much, much easier. And because they use open C APIs and the info is dumped into XML format, these open source tools are interoperable with just about everything.
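The "round robin" in Round Robin Database refers to fixed-size storage where the newest sample overwrites the oldest, so the data store never grows. The following Python sketch shows that idea in its simplest form. It is a toy under stated assumptions, not RRDtool's file format or API: the class name `RoundRobinArchive` and its `update` and `fetch` methods are hypothetical names chosen for this example.

```python
class RoundRobinArchive:
    """Fixed-size store that overwrites its oldest sample when full.

    Hypothetical illustration of the round-robin idea only.
    """

    def __init__(self, slots):
        if slots <= 0:
            raise ValueError("slots must be positive")
        self._values = [None] * slots
        self._next = 0    # index the next sample will be written to
        self._count = 0   # number of slots holding real samples

    def update(self, value):
        """Store a sample, replacing the oldest one if the archive is full."""
        self._values[self._next] = value
        self._next = (self._next + 1) % len(self._values)
        self._count = min(self._count + 1, len(self._values))

    def fetch(self):
        """Return the stored samples ordered from oldest to newest."""
        size = len(self._values)
        start = (self._next - self._count) % size
        return [self._values[(start + i) % size] for i in range(self._count)]


archive = RoundRobinArchive(slots=3)
for sample in [1, 2, 3, 4, 5]:
    archive.update(sample)
print(archive.fetch())  # the two oldest samples have been overwritten
```

Because the storage is sized up front, disk usage stays constant no matter how long the poller runs, which is part of what helps this style of tool stay lightweight.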
Posted by Harper Mann on April 18, 2006 12:33 PM
December 15, 2005 | Comments: (0)
Catching Sasser and Zotob at InteropNET with Network Physics
Dwight Barker, Director of Product Management at Network Physics, shared with me today his story of looking for virus attacks across InteropNET. Dwight is part of the volunteer NOC team and has been busy the past few days tracking all the TCP flows from beginning to end across the network. Using the Network Physics MP2000 appliance running the Netsensory OS, he's been able to time stamp a bunch of key metrics along the way and track failed connections and potential viruses.
Many of the machines running on the InteropNET, including servers on InteropNET and the laptops exhibitors and attendees bring to the show, are typically behind firewalls and virus scanning systems when on their home networks. When these machines connect to the network at Interop, they're now on the public Internet without any protection and are vulnerable to attack.
Worms like Zotob, Blaster and Sasser exploit well-known wide area file services on ports 135, 139 and 445 in Windows servers and clients. Once a worm gets on a system, it really only knows its own IP address. So worms start their attacks by blasting out attempted connections to any and all subclass addresses on the network.
In Dwight's case today, he noticed a suspect with address 130.128.80.60 attempting and failing connections on many 130.128.x addresses. He was able to use one of the built-in templates, which the company calls "Insights," to look for and chart extreme spikes in attempted connections.
Each of the show booths at Interop is assigned a business group in the Network Physics system by IP address, so the spikes can be viewed by exhibitor. Dwight can aggregate the failed connections over time and catch the spikes that indicate a denial of service attack or vulnerability attack. In one case a single IP was flooding the network on port 135. The offending IP address turned out to be a laptop in a vendor booth infected with a Sasser worm. The InteropNET team tracked down the booth by IP address and, to the exhibitor's surprise, showed up to find the specific laptop and clean it.
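The kind of check described above, aggregating failed connections per source and looking for outliers, can be sketched in a few lines of Python. This is a simplified illustration and not how the Network Physics appliance works internally. The record layout `(source, destination, port, succeeded)`, the function name `find_scanners`, and the threshold of 20 distinct failed targets are all assumptions made up for this example.

```python
def find_scanners(flows, min_failed_targets=20):
    """Return sources whose failed connections hit many distinct targets.

    flows: iterable of (src_ip, dst_ip, dst_port, succeeded) tuples.
    Hypothetical helper; the threshold is an arbitrary example value.
    """
    failed_targets = {}
    for src, dst, port, succeeded in flows:
        if not succeeded:
            # Track distinct (address, port) pairs so repeated retries
            # against one host do not look like a scan.
            failed_targets.setdefault(src, set()).add((dst, port))
    return {
        src: len(targets)
        for src, targets in failed_targets.items()
        if len(targets) >= min_failed_targets
    }


# Made-up flow records: one host failing against many addresses on
# port 135, and one ordinary host with a single failed connection.
flows = [("130.128.80.60", f"130.128.1.{i}", 135, False) for i in range(1, 31)]
flows.append(("130.128.20.5", "130.128.1.1", 445, False))
flows.append(("130.128.20.5", "130.128.1.2", 80, True))

print(find_scanners(flows))
```

Counting distinct targets rather than raw failures is a deliberate choice here: a worm probing a subnet touches many addresses, while a misconfigured client usually keeps retrying the same one.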
During the course of two days, the InteropNET team caught five infected machines and walked the aisles to clean machines.
Posted by Michael Baum on December 15, 2005 07:03 PM
December 14, 2005 | Comments: (0)
The IT Troubleshooter @ Interop NYC
This week the IT Troubleshooter is hanging out at the InteropNET network operations center (NOC) in New York. You can check out the NOC through one of the four live webcams. InteropNET is the world's largest temporary network, built as a living lab of best-of-breed technologies. Fourteen vendors including AGN, APC, Avaya, Citrix, Computer Associates, Cyclades, Extreme Networks, Fluke Networks, Infoblox, Network General, Gigamon Systems, Juniper, Splunk and Quest have committed gear, software and people to run Internet access for the Interop trade show.
The NOC consists of twenty crack system administrators and a handful of help desk support staff working together to set-up, manage and troubleshoot the InteropNET systems. A total of 60 people make up the complete InteropNET team.
The network installation starts on-site at 6:00 am Friday and is delivered by Monday in time for the beginning of the show. Troubleshooting in this type of just-in-time environment can be pretty arduous given the timeframes and the range of technologies to integrate. The network uses 6000 feet of fiber and 20,000 feet of CAT 5 cable. The 802.11a/b/g wireless network has to cover about 200,000 square feet. Internet access to the outside world is a 45Mb DS3 circuit. To read more about the InteropNET mission and the journey towards its existence, read an interview with Glenn Evans, Interop's Lead Network Engineer.
These savvy system administrators have some great stories about cutting edge troubleshooting problems and techniques. Over the next few days I'll be bringing you some of their tales. Stay tuned.
Posted by Michael Baum on December 14, 2005 09:27 AM