No sane person would ever think of organizing a warehouse based only on the external attributes of each box while ignoring its contents. Yet that's the challenge many storage administrators and IT managers face when trying to prepare their storage repositories for compliance and security, not to mention daily operations.
Data classification is an all-too-obvious remedy for information storage ills, but manually inspecting and categorizing each archive is a physical impossibility for companies with terabytes of data and millions of files. Moreover, there is no easy way to store the additional metadata gathered from a manual inspection of each file.
Enter EMC Infoscape, a new application that offers the tools to automatically inspect files from just about any storage system and to categorize them according to flexibly defined rules. Using those rules, customers can automatically move files across different tiers of Celerra storage or analyze the rules' impact through a rich set of pre-built reports.
I got a look at Infoscape last week in EMC's Houston lab. The most striking aspect of Infoscape is its comprehensive classification structure, which includes, in addition to content categories, service levels (logical containers that group files with similar handling requirements), domains (such as sales, marketing, or engineering), and lifecycles (milestones in the life of a file).
But the foundation of the solution is the ability to create rules that classify data automatically. For example, Infoscape rules could automatically flag any file that contains Social Security numbers or credit card numbers as confidential and as a candidate for encryption. Infoscape doesn't encrypt data itself, but it's easy to imagine other applications (from EMC or a third party) using its metadata to trigger encryption.
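To make the idea concrete, here is a minimal sketch of what such a content rule might look like. It does not use Infoscape's actual rule syntax; the function names, the tag names, and the choice of a regular expression for SSNs plus a Luhn checksum to weed out random digit strings posing as card numbers are all my own assumptions.

```python
import re

# Hypothetical content rule for illustration only; this is not
# Infoscape's rule language or API.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Double every second digit, counting from the rightmost one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_content(text: str) -> set:
    """Return the (hypothetical) classification tags for a file's text."""
    tags = set()
    has_ssn = SSN_PATTERN.search(text) is not None
    has_card = any(
        luhn_valid(re.sub(r"[ -]", "", m.group()))
        for m in CARD_PATTERN.finditer(text)
    )
    if has_ssn or has_card:
        tags.update({"confidential", "encryption-candidate"})
    return tags
```

Note that the rule only produces tags; as with Infoscape itself, acting on the "encryption-candidate" tag would be another tool's job.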
Similarly, Infoscape allows admins to assign files to a certain service level. For example, flagging files owned by a specific team or person as "mission critical" could result in a snapshot being taken at least every three hours. Here again, the snapshot itself is handled outside of Infoscape.
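The split between classifying and acting is easier to see in a sketch. The following is a hypothetical model, not Infoscape's: the owner names, the "standard" level, and its 24-hour snapshot gap are invented for illustration. Only the three-hour figure comes from the example above.

```python
from dataclasses import dataclass
from datetime import timedelta


# Hypothetical model of a service level: a name plus the longest gap
# allowed between snapshots. Not Infoscape's data model.
@dataclass(frozen=True)
class ServiceLevel:
    name: str
    max_snapshot_gap: timedelta


MISSION_CRITICAL = ServiceLevel("mission critical", timedelta(hours=3))
STANDARD = ServiceLevel("standard", timedelta(hours=24))  # assumed default

# Hypothetical owners whose files are treated as mission critical.
CRITICAL_OWNERS = {"finance-team", "jsmith"}


def assign_service_level(owner: str) -> ServiceLevel:
    """Classification step: map a file's owner to a service level."""
    return MISSION_CRITICAL if owner in CRITICAL_OWNERS else STANDARD


def snapshot_due(level: ServiceLevel, since_last: timedelta) -> bool:
    """Enforcement step, performed by a separate snapshot tool that reads
    the service-level metadata: True once the allowed gap has elapsed."""
    return since_last >= level.max_snapshot_gap
```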
Lifecycle rules would allow you to apply storage policies based on how recently a file was accessed or modified. For example, you could set a rule that flags files that haven't been accessed in the last three months as "tier 2."
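A rough equivalent of that lifecycle rule, written against ordinary file timestamps, might look like this. It is only a sketch: approximating "three months" as 90 days and calling everything else "tier 1" are my assumptions, and Infoscape's own rule engine is not shown.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

# "Three months" approximated as 90 days; an assumption for illustration.
IDLE_THRESHOLD = timedelta(days=90)


def lifecycle_tier(last_access: datetime, now: datetime) -> str:
    """Return "tier 2" for files idle longer than the threshold.
    "tier 1" is an assumed label for everything else."""
    return "tier 2" if now - last_access > IDLE_THRESHOLD else "tier 1"


def tier_for_path(path: str, now: Optional[datetime] = None) -> str:
    """Apply the rule to a real file using its last-access timestamp."""
    st = os.stat(path)
    last_access = datetime.fromtimestamp(st.st_atime, tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return lifecycle_tier(last_access, now)
```

One caveat: last-access times are unreliable wherever access-time updates are disabled or deferred (NTFS with last-access updates turned off, or Linux `noatime`/`relatime` mounts), so a real rule might fall back on modification time.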
Obviously, different departments within your company might want different rules, so Infoscape allows you to define organizational domains and create separate rulesets for each.
How do you bring together all these different rules under a common umbrella? In Infoscape lingo you create a taxonomy, which serves as the wrapping paper for all of the above. In essence, a taxonomy is where you define a category and assign a service level and a lifecycle to it.
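As a mental model, a taxonomy can be pictured as a lookup table, kept per domain, that binds each category to a service level and a lifecycle. The sketch below is hypothetical: the schema, domain names, categories, and values are invented for illustration and do not reflect how Infoscape stores taxonomies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TaxonomyEntry:
    """Hypothetical taxonomy entry: a category bound to a service level
    and a lifecycle. Not Infoscape's actual schema."""
    category: str
    service_level: str
    lifecycle: str


# One taxonomy per organizational domain; all names and values are
# illustrative assumptions.
TAXONOMIES: Dict[str, Dict[str, TaxonomyEntry]] = {
    "sales": {
        "customer-data": TaxonomyEntry(
            "customer-data", "mission critical", "tier 2 after 90 days idle"
        ),
    },
    "engineering": {
        "design-docs": TaxonomyEntry(
            "design-docs", "standard", "tier 2 after 180 days idle"
        ),
    },
}


def policy_for(domain: str, category: str) -> Optional[TaxonomyEntry]:
    """Look up how a classified file should be handled in a domain.
    Returns None when the domain or category has no entry."""
    return TAXONOMIES.get(domain, {}).get(category)
```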
Infoscape can scan files (and index them if instructed to do so) from just about any place they may reside, but support is currently limited to Microsoft environments. This first release is also limited to files, which leaves out e-mail and structured data sources such as Oracle databases. Moreover, moving files automatically to different storage tiers is possible only across EMC Celerra devices.
Even with those exclusions, deploying Infoscape will be a complex endeavor, tantamount to a complete re-assessment of your data structure. It will also require re-discovering many of the business rules that are hidden in your applications.
It's no mere coincidence that EMC is offering a parallel consulting service: for many customers, that outside help will be desperately needed. From what I have seen so far, Infoscape is a good practical start toward the ILM (information lifecycle management) promised land that EMC and others have been preaching. ILM is a good place to go, and Infoscape points the way. But it won't be a short trip or an easy one.
EMC Infoscape
EMC
Available: September 18, 2006
Cost: Starts at $125,000 plus $9,000 per terabyte
Verdict: Easy to use but difficult to deploy is a fair characterization of my first experience with Infoscape. The application has a pleasant, browser-based interface and a rich toolbox for automatically classifying files according to sophisticated rules. However, Infoscape is limited to classification; other storage solutions will be needed to execute on its promise. Further, consulting services may be needed to implement it.
Posted by Mario Apicella on September 18, 2006 01:56 PM








