Grid Meter » January 2006

January 30, 2006 | Comments: (0)

Grid's about data management, not just CPU horsepower

Although the tide is slowly turning -- perhaps in large part due to the mass of press regarding the EMC / Acxiom Deal -- there is still a "Grid means raw compute power" feel to a good percentage of the marketing put forth by Grid vendors. I keep hearing the prevalent equation that Grid equals better utilization for high performance computing, which is hardly a problem these days with current commodity computing hardware.

It's not that the enterprise IT guys are stupid and just 'don't get it.' It's that Grid has been marketed to answer problems that aren't even remotely near the top of enterprise IT pros' fix lists. According to Steve Tuecke, CEO of Univa:

"For many large organizations, their environment includes multiple data centers, each a distinct IT island. While each datacenter may be well managed as far as compute power goes -- and there's no need for better utilization (they're happy to just buy more commodity boxes) -- they're faced with this 'Wild West' in terms of managing the data between those different islands.

There are common scenarios where enterprises have a distributed organization with large data sharing and distribution needs -- anywhere from replicating data between data centers and clusters, to better flow management, to improving collaboration among a distributed team, and better analysis. It's the tying together of these multiple IT islands, without ripping and replacing existing infrastructure, that has these enterprises interested in Grid.

If a company has a group of users in the U.S. with one large set of data, and another group in Japan with other large sets of data -- it sometimes may not be practical to move the data, but it may be possible to instead run jobs against that data remotely. These are the scenarios where Grid has thrived in the past, in research and science -- where you're commonly faced with this decision whether to move the compute or move the data. It's this need to tie together distinct IT islands for computation, as well as the data, with security that overlays on existing, often hairy security environments (without rip and replace)."
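
The "move the compute or move the data" decision Tuecke describes can be roughed out with a back-of-the-envelope calculation. The sketch below is only an illustration: the data set size, job size and link speed are hypothetical figures, not numbers from Univa, and the model ignores protocol overhead, result sizes and security constraints.

```python
def transfer_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Idealized time to move size_bytes over a link (no protocol overhead)."""
    return size_bytes * 8 / link_bits_per_sec


def what_to_move(data_bytes: float, job_bytes: float, link_bits_per_sec: float) -> str:
    """Return 'compute' if shipping the job is faster than shipping the data."""
    data_time = transfer_seconds(data_bytes, link_bits_per_sec)
    job_time = transfer_seconds(job_bytes, link_bits_per_sec)
    return "compute" if job_time < data_time else "data"


# Hypothetical figures: a 10 TB data set in Japan, a 50 MB job bundle,
# and a 1 Gbit/s link between the two sites.
DATA_BYTES = 10e12
JOB_BYTES = 50e6
LINK_BPS = 1e9

hours_to_move_data = transfer_seconds(DATA_BYTES, LINK_BPS) / 3600
print(f"Moving the data: about {hours_to_move_data:.1f} hours")
print(f"Better to move the {what_to_move(DATA_BYTES, JOB_BYTES, LINK_BPS)}")
```

In practice the answer also depends on how much output the job produces and whether compute is actually available next to the data, which this sketch leaves out.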


Over the next couple of weeks, I'll be taking a close look at enterprise data management issues, and including perspectives from other Grid pioneers.

Posted by Greg Nawrocki on January 30, 2006 09:55 AM


January 25, 2006 | Comments: (0)

LAMP may be just what Grid is looking for

In Monday's blog I mentioned William Fellows' observation that, "If grids can find a place in one of the open source stacks, such as LAMP, it would undoubtedly help further adoption." Again, William hits the nail smack on the head.

But what does that mean? I must confess this thought has occupied my brain for the last few days. As is generally the case with grid, this is going to mean many different things, but the one thing I keep coming back to is scalability.

A common theme running through an interesting article from this past summer is that LAMP is really great for proofs of concept and small-scale enterprise systems, but needs quite a bit of "hands on" work when applied to large enterprise systems.

The article includes a good quote from Patrick Lohr of iStockPhoto: "PHP saves me money when I'm a little business but ultimately it costs me more when I'm a big business," he complains, adding that whereas enterprise-class systems often force you into best practice, LAMP products let you hack together quick systems. "We've proven that we're successful - now, how do we scale this thing?"

But scalability of what? Well, resources of course. LAMP stands for Linux/Apache/MySQL/PHP (although the "P" is often debated, it is not all that relevant in this example). The two common taxonomies that we often use in discussions of Grid computing are data grids and compute grids. So in the case of LAMP, resources are compute based in the Apache application and data based in the MySQL application. Grid experts caution us not to think of data and compute Grids separately, but as one animal. LAMP may indeed be just what Grid was looking for.

Posted by Greg Nawrocki on January 25, 2006 09:01 AM


January 23, 2006 | Comments: (0)

451's William Fellows: Grids to Disappear Into Stack

The 451 Group's William Fellows is always good for a dose of reality around the evolution (or lack thereof, in certain areas) of enterprise Grid. The new issue of GRIDtoday includes a lengthy Q&A with Fellows - who calls out a number of specific challenges that remain for enterprise Grid computing adoption.

Fellows points out "the standards that do exist today are not relevant and there is no evidence that standards are being used in implementations or that any product is being built using them. If grids can find a place in one of the open source stacks, such as LAMP, it would undoubtedly help further adoption."

Last week, Peter Yared from ActiveGrid talked to Open Resource about the concept of a "lightweight architecture" as a big looming shift for enterprise application development. Yared defined a lightweight architecture as "running straightforward, usually open source, software stacks with service oriented API's on large clusters of commodity machines." This has been the preferred architecture for big Internet players like Google, Amazon and Yahoo - Yared says - and mainstream enterprise is starting to lean towards the model as well.

Grid computing is a complex animal, as are operating systems, web servers, scripting languages and databases. Yet it is still rather trivial for those with even a small amount of sysadmin knowledge to get the latter four, which make up the LAMP stack, up and running. This is not the case with Grid. In many OS distributions one can simply pop open a GUI, check a box, and presto, LAMP or major components thereof are up and running.

Like the electrical grid, from which Grid computing took its name, Grid will have succeeded when no one notices that it is there, yet any interruption in service causes quite a stir.

Posted by Greg Nawrocki on January 23, 2006 08:38 AM


January 17, 2006 | Comments: (0)

Data management on the Grid, and the GridFTP standard

In the Grid community, there's a popular expression that "access to the data is as important as access to the compute resources."

And no Globus Toolkit subcomponent is more central to Grid data access issues than the Globus implementation of GridFTP (File Transfer Protocol). Globus' GridFTP is a high-performance data transfer protocol and software suite optimized for the gamut of data access issues -- from bulk file transfer, to the nitty gritty details of getting the data out of complex storage systems within virtual organizations on the Grid, and pretty much every data requirement in between.

The FTP protocol actually originated with the ARPAnet community all the way back in 1971 (here's a link to a good historical synopsis). FTP has seen many new specification twists and turns through the years.

In 1973, a number of initial 'requests for comment' (RFCs) for FTP specs were published. The version that perhaps signaled the maturity of the protocol arrived in 1985, when Jon Postel and Joyce Reynolds (of ISI) authored RFC 959. RFC 959 included extensions to FTP to further "1) promote sharing of files (computer programs and/or data), 2) to encourage indirect or implicit (via programs) use of remote computers, 3) shield a user from variations in file storage systems among hosts, and 4) to transfer data reliably and efficiently."

FTP became a pervasive protocol with the arrival of the commercial Internet. But as Grid computing usage accelerated in e-Science in the late '90s, new challenges arose for Grid users who needed to access different storage systems between virtual organizations. Storage systems had become increasingly customized to serve specific user needs -- and the FTP protocol in its existing form was unable to reconcile this explosion of incompatible, disparate systems for accessing data.

So in 2001, the GGF and Globus Alliance authored the GridFTP protocol, which better navigates different types of storage systems, has a number of compelling new parallel and striped data transfer capabilities, and includes various new instrumentation and TCP buffer features.
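
To give a feel for what "parallel" transfer means, here is a minimal sketch of the general idea: carve one large file into byte ranges so that several streams can each move a piece at the same time. This is not the GridFTP wire protocol or any Globus API; the function name `split_ranges` and the even-split policy are assumptions made purely for illustration.

```python
from typing import List, Tuple


def split_ranges(file_size: int, streams: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover file_size bytes exactly once.

    Hypothetical helper: each pair could be handed to one transfer stream.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams must be >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:  # skip empty ranges when the file is smaller than the stream count
            ranges.append((offset, length))
        offset += length
    return ranges


for offset, length in split_ranges(10, 4):
    print(f"stream reads {length} bytes starting at offset {offset}")
```

A real implementation also has to reassemble the pieces in order at the destination and cope with a stream failing midway, which is where much of the actual engineering effort goes.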

Today, by default, the Globus implementation of GridFTP will work on any storage device that has a POSIX file system for the storage, and TCP/IP for the network.

"It doesn't matter whether you're running RAID or not, EXT3 versus XFS, PVFS or GPFS," said Bill Allcock, technology coordinator at Argonne National Laboratory, and one of the authors of GridFTP, both the protocol (developed in the GGF) and the Globus implementation. "We work fine on all of those. The one caveat is that certain configuration parameters can have a much larger impact on some of those than the others. For instance, GPFS wants big reads -- they want large sequential reads. Whereas PVFS wants you to match whatever the stride size is. But Globus GridFTP will work on all of them, just out of the box, regardless of system type."

Today, Globus GridFTP has pervasive use in the e-Science Grid community. The high energy physics community in particular has been a huge user from the start. A notable recent use was by the Relativistic Heavy Ion Collider (RHIC) community at Brookhaven, which used Globus GridFTP to sustain 600 megabytes per second of data transfer (from Long Island, New York, to Japan) over 11 days.
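
To put the RHIC figure in perspective, a quick calculation gives the total volume moved. It assumes the 600 megabytes per second was held for the full 11 days and uses decimal units (1 TB = 1,000,000 MB); both are assumptions, since the original report does not spell them out here.

```python
RATE_MB_PER_S = 600        # sustained rate quoted for the RHIC transfer
DAYS = 11                  # duration quoted for the transfer
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the rate was constant for the whole period.
total_mb = RATE_MB_PER_S * DAYS * SECONDS_PER_DAY
total_tb = total_mb / 1_000_000   # decimal terabytes

print(f"Total moved: {total_mb:,} MB, or roughly {total_tb:.0f} TB")
```

Under those assumptions the transfer comes to a little over half a petabyte.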

The British Broadcasting Corporation (BBC) meets its frequent large-file demands (the typical broadcast hour today requires 280 GB for all pre-processed media streams) with GridFTP. Here's a link to some compelling work they're doing with the Belfast e-Science Centre for that effort.

Posted by Greg Nawrocki on January 17, 2006 10:01 PM


January 17, 2006 | Comments: (0)

More extraterrestrial Grid efforts underway

With the Stardust sample return capsule safely on the ground, NASA and researchers at the University of California at Berkeley are launching a new SETI-like Grid for analyzing the flotsam and jetsam retrieved from the farthest reaches of our solar system and beyond.

According to the release:

"Westphal and his colleagues at the Space Sciences Laboratory have created a 'virtual microscope' that will allow anyone with an Internet connection to scan some of the 1.5 million pictures of the aerogel for tracks left by speeding dust." So this is more a human ocular instrumentation Grid than the traditional computational and data Grids most frequently referenced.

"To ensure that the volunteer scanners know what they're doing, each must pass a test where he or she is asked to find the track in a few test samples." And like a traditional Grid there appear to be methods of resource verification and monitoring and discovery of resources. Albeit a bit more manual a process than most Grid middleware would tout!

"Like SETI@home, which is the world's largest computer, we hope Stardust@home will also be a large computer, though more of a neural network, using brains together to find these grains," said Bryan Mendez of the Center for Science Education at the Space Sciences Laboratory.

When it comes to the Grid, we're all just resources ...

Posted by Greg Nawrocki on January 17, 2006 07:17 AM


January 13, 2006 | Comments: (0)

What's Microsoft got up its sleeves for Grid Computing in '06?

Microsoft has, for quite some time, been a great unknown in Grid computing. An early supporter of the Globus Toolkit, they have since been rather quiet in the areas of technical and high performance computing, until late last year.

The Bill Gates keynote at Supercomputing 2005 in Seattle was basically a "here we are" type of announcement. Gates primarily focused on his vision for compute Grids (to be expected, since it was at the Supercomputing event), but he also highlighted a vision for scientific workflow and was quick to point out that Microsoft was a driving force behind XML and Web Services standards.

The Financial Services market was one of the first to embrace Grid computing as a usable tool for analysis. They remain one of the primary drivers. The fact is the reference standard application and data user interface for this market is still the spreadsheet, which can be loosely translated to Microsoft Excel. While the practicality of using the interpreted math engine of a spreadsheet to perform mass calculations is suspect, I doubt we're going to get the average financial analyst to start parallel programming in C. A Grid driven version of Microsoft Excel would indeed be a killer app.

There have long been many smaller players actively providing Grid and Grid-like products for Windows operating systems. Digipede, for example, has recently received accolades for its Digipede Network Grid computing solutions for the Microsoft .NET platform. It is not hard for one to imagine Microsoft leveraging these companies in any way its massive bankroll desires. We'll possibly start to see some of this MS Grid ecosystem unfold in the first half of '06, when Windows Compute Cluster Server 2003 ships (MS has already announced a partnership with Platform around the release).

Tony Hey and Fabrizio Gagliardi both joined Microsoft last year. Tony is an internationally recognized leader in parallel computing and has been a key driver in Web Services standards, especially as they relate to Grid computing. Fabrizio is taking leave from CERN where he served as Project Director for EGEE, a pan European Grid effort for e-science. No shortage of talent at Microsoft -- but they are excellent additions for extra Grid credibility and expertise.

Does Microsoft "get" Grid computing beyond compute grids and cycle scavenging? Will Microsoft support and embrace the standards that have been so important to the evolution of Grid computing thus far, or will they simply redefine Grid as they see fit?

Many questions still remain, and admittedly the above is a fairly disjointed collection of observations with no real central theme. So how does this warrant a position on a 2006 watch list? Five hundred pound gorilla jokes aside, Microsoft obviously has something brewing in the area of Grid computing, and they have a history of proving they can't be ignored.

Posted by Greg Nawrocki on January 13, 2006 05:19 AM


January 12, 2006 | Comments: (0)

#5 of "6 Grid vendors to watch in '06" - Cisco Systems

Distributed resources depend on a network. And not just any network, but one that can bridge architectures and protocols seamlessly. The bridge on the Cisco logo is not coincidental; the company's first products were indeed network bridges between disparate systems, a service their hardware provides to this day. (For trivia buffs: since Cisco is HQ'd in San Jose, most people don't connect the dots that 'Cisco' is short for San Francisco, and that their logo is based on the Golden Gate Bridge.)

One could also argue that the general purpose protocol of TCP/IP wouldn't be what it is today without the physical implementations of it manifested in Cisco hardware.

A useful quality of experience -- which is vital to Grid -- implies a level of reliability and simplicity. While Cisco does indeed make hardware that is quite complex and requires a great deal of know-how to configure and operate -- my experience has been (and I've worked with some obscure Cisco hardware) that it always seems to work as one would expect it to. And with Cisco's purchase of Linksys and subsequent guidance of that product line, Cisco hardware is nearly as household an item as a toaster.

As previously mentioned in this blog, Bob Aiken -- Director of Academic Research and Technology Initiatives at Cisco -- noted that we're seeing a "blurring of the boundaries between operating systems, networks and middleware." And about a month ago, Network World's management beat journalists Denise Dubie and Phil Hochmuth broke the news that Cisco is announcing new 'application aware' management products, which would allow customers to "monitor and measure application performance on a network."

Cisco truly does put the "intelligence in the network", and the convergence of virtualization, loosely-coupled services, systems management and dynamic provisioning capabilities is changing the role of the network from the mere transport of IP packets to central nervous system for the IT infrastructure. With Cisco's "Application-Oriented Networking" product line and an active "Server Networking and Virtualization" group, I predict we will see "Grid" and "Cisco" mentioned in the same sentence with greater frequency in 2006.

Posted by Greg Nawrocki on January 12, 2006 08:02 AM


January 11, 2006 | Comments: (0)

Metering NetApp's Grid efforts in '06

2005 was a big year for the emerging storage virtualization market. As enterprise continued to embrace Xen's hypervisors and VMware's virtual machines for server virtualization / partitioning -- the industry also got a big taste of the future of storage virtualization, with new product releases by EMC (Invista) and NetApp (V-Series).

Clearly, these two are going to be clawing at each other's throats as they vie for storage virtualization market share in 2006.

I'm really intrigued to watch the evolution of NetApp's Data OnTap GX -- the operating system that's driving their virtualization product evolution. OK, I may have fallen for their marketing hype, hook, line and sinker -- but the fact that these guys are drilling down to the OS level tells me that they are venturing into uncharted waters with respect to the complexities of what providing virtualized data really means. It's more than an application and more than middleware, it's a new way of thinking about process control, device management, file management and networking, all those elements that we learn about in the first few chapters of any book on operating system concepts.

Another key concept is information management, how files and other system resources are accessed and controlled. NetApp has mentioned that another key component of the slimmed down first release of Data OnTap GX is its unified namespace capability. Clearly they see that multiple ways to access the same data and aggregations of data presented in a unified manner are the key to virtualization. It's going to be interesting to see how this is done and if they are conforming to standards where the rubber meets the road.

As I said a couple of months ago, I will be interested to meter what aspects of this operating system NetApp decides to expose, both from a functional aspect and in terms of open source. If the OnTap GX operating system is effective and becomes a commonly accepted method of data virtualization, it could prove revolutionary to enterprise Grid adoption.

Posted by Greg Nawrocki on January 11, 2006 07:46 AM


January 10, 2006 | Comments: (0)

Univa - hot Enterprise Grid vendor to watch in '06

Univa is a rather obvious addition to the 2006 watch list.

With the recent beta announcement of Univa Globus Enterprise (PDF), the company is officially opening the doors for business and is the first Grid start-up assuming the services / support / distribution model role for the open source Globus Toolkit (similar to Red Hat's role supporting Linux).

Not only will Univa satisfy the proverbial "throat to choke" requirement for enterprise Grid end users -- they are developing their own GT features and extensions that will make the Globus Toolkit truly "enterprise ready."

Even so, Univa understands that Grids are never as simple as popping in a disk and letting it go. To that effect they have built a world-class professional services organization that will be instrumental in the uptake of their software and the proliferation of Grids in general. As an additional benefit, professional services are also a key feedback mechanism to product development. In a nascent market loosely defined as "Grid computing," where there is still debate about exactly what "Grid computing" means, such feedback into product development is absolutely essential. For this reason, there are a ton of enterprise Grid users, and more importantly, potential users that are watching Univa with great interest.

There is also the fact that the founders of Univa are the definitive pioneers / inventors of the Globus Toolkit. Steve Tuecke, Ian Foster and Carl Kesselman were also key players in laying the groundwork for the standards on which most Grids are built -- and bar none, no one "gets" Grids (and enterprise requirements thereof) to the extent that these guys do.

So what does adding a value-add layer, and providing enterprise-class support and professional services for the open source Globus Toolkit, really mean?

The survey conducted at GlobusWORLD 2005 asked users, "What are your greatest concerns about implementing the Globus Toolkit?" One of the major concerns that arose was that there was no commercial support for the Globus Toolkit. Now there is.

There is little question as to how pervasive the Globus Toolkit is in the research and academic communities. It is often referred to as the de facto standard for Grid computing middleware. Now that commercial support is no longer an issue, I believe that it is only a matter of time before we see a similar pervasiveness in enterprise. The analogy has been made before, but the similarities to the Red Hat - Linux story are too coincidental to be dismissed.

As for competition, with the Univa development of UGE as a catalyst for enterprise uptake of the Globus Toolkit and strong ties with Grid computing heavies like IBM - it's not going to be easy to overtake these guys as the leading services / support / distribution provider for enterprise Grids.

Posted by Greg Nawrocki on January 10, 2006 07:02 AM


January 06, 2006 | Comments: (0)

Eyes on EMC's Grid Computing Efforts in '06

Yesterday's acquisition of Acxiom's Grid software by EMC was perhaps a harbinger of other big EMC Grid announcements to come in '06.

Acxiom's home-grown Grid software has been the envy of other data processing vendors, and the Grid community has a high opinion of the talents of Acxiom's Terry Talley (chief architect) and Alex Dietz (CIO). While they've secured remarkable results with their Grid over the last few years, they've been relatively quiet about the intricacies of how it works, because of the competitive advantage it brings to the organization. But the performance has been impressive.

"Historically, when we ran our software on conventional platforms, we'd jump through hoops to get a 5% gain in performance on a particular application," said Alex Dietz, CIO at Acxiom, in an interview with Ian Foster last year. "With [our] grid, we go 10 times faster, and we could go 100 times faster, if we decided to. The incremental scalability of grid blows your mind."

But the Acxiom partnership is just one of many reasons why the Grid community is really interested in EMC's Grid efforts in 2006.

For starters, as Grid technology evolves, the industry is moving away from mere compute Grids, and towards data Grids -- where data virtualization and storage challenges could elevate EMC to one of the key outspoken players for the industry.

Pushing on Grid directions is also a way for EMC to stay competitive with IBM. The rivalry between the two was well-documented in '05. Network-Attached Storage was one of the key areas in which these two vendors battled, and for the Grid community, the issue of coordinated data sharing (including navigating firewalls and pulling data out of storage devices at the edge) is another frontier area for the industry that will see a lot of activity in 2006.

EMC's had some interesting personnel additions over the last couple of years that make it clear they're serious about Grid. Ian Baird (from Platform) is the company's CTO for Grid and Utility Computing Solutions. Jeff Nick (former on demand heavy hitter from IBM) is the company's CTO. EMC isn't a company that's just fashionably turning up the dial on Grid for PR purposes -- they're in it for the long haul.

And another critical reason why EMC will be fascinating to watch in Grid discussions in '06 is their ownership of VMware.

There's currently considerable work going on in the open source Grid community (i.e., within the Globus Alliance) to facilitate virtual machines in Grid computing environments.

The first step of the virtualization evolution was on a single box -- on an SMP, for example. Today, a lot of enterprises are using virtualization to manage a cluster of virtual machines. The next stage is figuring out all of the workload, data distribution and security issues that will allow virtual machines to become participants across enterprise Grids. This is more challenging than when they're just homogenized away inside a single cluster with a single file system and a single interconnect.

One of the areas where virtual machines are known to be weak is in handling a lot of I/O (disk or network I/O). Performance issues arise for VMs in I/O-heavy scenarios, and they just don't tend to behave as consistently as they do on bare hardware. But VMs in a Grid will have a very specific requirement for highly optimized I/O -- so this should be an interesting evolution area for virtual machines in '06 and beyond, and EMC is in a great position to lead the industry with VMware.

Posted by Greg Nawrocki on January 6, 2006 07:51 AM


January 04, 2006 | Comments: (0)

Platform - Interesting Grid Vendor to Watch in '06

Back in 1993 -- in an interview with Computerworld ("Sharing Software Ekes More Life Out of Older Unix Boxes") -- Platform CEO Songnian Zhou introduced his company's Load Sharing Facility (LSF) breakthrough as "a network operating system that makes it easier to share compute resources across a heterogeneous Unix network." Today, Platform's LSF has become the most widely used job scheduler in enterprise Grid production (Linux and Unix) environments.

Platform wrapped up '05 with the release of the Enterprise Grid Orchestrator -- a new platform for the integration and management of heterogeneous resources in a Grid environment. 2006 should be a very interesting year for the Toronto-based company, which partnered with Microsoft for LSF to offer some key functionality in the Windows Compute Cluster Server release later this year.

As a company, Platform has received zero dollars in venture funding through the years -- every dollar has been derived from customer sales, according to Zhou, who sees a number of factors today that indicate a quickly maturing Grid industry.

"The killer app for grid is not the components of applications -- ERP, CRM -- but the connection of all of these components of enterprise applications to form business processes," says Zhou. "Industry analysts are consistently saying that in order to support SOA as the architecture for applications, you need a 'service oriented infrastructure.' So how do you provide this service-oriented infrastructure to deliver resources where and when needed, based on cost and demand rather than just fixed assets? The Grid provides a very solid set of core technologies for SOI."

Platform is looking beyond the traditional Grid application set. However, instead of simply looking down another alley they are thinking in terms of application aggregation, the commonalities that exist between different application sets and where this all comes together. This type of thinking is going to be critical in the coming year.

Posted by Greg Nawrocki on January 4, 2006 08:20 PM


January 04, 2006 | Comments: (0)

6 vendors to watch in enterprise Grid in '06

John Kenneth Galbraith said "Economics is extremely useful as a form of employment for economists." I've heard similar comments from IT pros regarding Grid computing.

It's clear that the mainstream enterprise IT community is not going to take Grid seriously until there are end user deployments and demonstrable results on a much grander scale than what's been demonstrated so far.

It may be a bit cliche, but I truly believe that 2006 is going to be a make-or-break year for Grid Computing -- and it's my prediction that it will be the former rather than the latter. We are finally starting to apply Grid beyond the same old HPC cycle scavenging duties that it was originally conceived for. The importance of data grids and the significance of virtualized environments for both data and execution space are more frequently the topic of Grid discussion. These are the Grid directions that are going to be of most significance to the enterprise community.

I'm also encouraged by the interesting Grid technologies being introduced by the vendor community. So I present my top 6 companies to watch, in no particular order, in enterprise Grid in the coming year:

Platform Computing
Univa
Network Appliance
EMC
Cisco
Microsoft

In the spirit of the New Year cliffhanger, over the next six days I'll dive into each of these companies and examine why their Grid efforts will be interesting to watch in 2006.

Posted by Greg Nawrocki on January 4, 2006 08:24 AM

