Storage Adviser | Mario Apicella » WP from Infortrend: why SATA plus RAID5 puts your data at risk

June 02, 2006 | Comments: (0)

WP from Infortrend: why SATA plus RAID5 puts your data at risk

I have reached the conclusion that I'll never see the end of my reading backlog: there are so many interesting topics and, unfortunately, so little time. If anyone has found a way to read (and understand) things faster, I am all ears.

Regardless, part of my catching up with the world this week was reading a white paper on RAID6 from Infortrend, which I wholeheartedly recommend to anyone who is using or thinking of using SATA drives.

Why? Because there are some risks involved in using RAID 5 with those large drives that could affect the integrity of your data.

Wait! I was treading too lightly there, so let me rephrase: YOU COULD LOSE ALL YOUR DATA using RAID5 and SATA drives.

You may have heard this before (and so have I), but you should still read that paper. It's not exactly a new topic, but one certainly worth understanding better.

Which brings me back to the WP from Infortrend, which explains the why and the how with detached accuracy and professionalism, using the most effective human language: math. For those who are not comfortable with equations, the WP has crystal clear tables and charts.

Update: The author of the WP is Ted Pang, director of technical research for Infortrend.

Once again, RAID5 plus SATA is a bad idea. Actually, I will go out on a limb and say RAID5 plus ANY large drive is a bad idea. Do yourself a favor and read that WP from Infortrend to find out why.

Now, what's my next reading assignment?

Posted by Mario Apicella on June 2, 2006 06:38 AM


COMMENTS




I remember reading almost exactly the same white paper when 9.1G SCSI drives first came out - close to ten years ago. 9.1G! On one disk! Amazing! You lose two of those and you are *screwed*. Well, guess what - the new SATA drives have an MTBF similar to that of the old 4.3/9.1 SCSI's, and those seemed to work pretty well - many of them are still in use. (Original 9.1G Seagate Hawk: 800,000-hour MTBF, 5 year life, 5 year warranty. New Seagate SATA NL35XX for SAN's: 1,000,000-hour MTBF, 7 year life, 5 year warranty.)

Raid5 with an automatic hot standby pretty much solves the issues raised in the white paper (i.e. human error, slow response to drive replacement, etc). And unlike Raid6, Raid5+hot standby is supported by pretty much every new server out there...

dateline 2016: Don't use the new 4TB drives in Raid5...see white paper for why...

JON

Posted by: Jkimball at June 2, 2006 09:11 AM

WOLF!!!

Posted by: John at June 2, 2006 11:18 AM

It really impacts your credibility when you write "LOOSE" instead of "LOSE". Nevertheless, good information. Thank you.

Posted by: Mike Duffy at June 2, 2006 11:32 AM

MTBF isn't what hurts you with large drives; you have to worry about non-recoverable read errors.

Fibre channel drives typically have a non-recoverable read error rate of 1 sector per 10^15. SATA drives typically have a non-recoverable read error rate of 1 sector per 10^14. So you are 10 times more likely to have a non-recoverable read error on a SATA drive compared to a FC drive. That assumes the drives are the same size!

During a RAID5 reconstruct, the system may read every sector on all the surviving disks. If any one of the surviving disks has a non-recoverable read error, the reconstruct will fail. It's the same as having a double disk failure.

If you do the math, during a RAID5 reconstruct, a 144GB FC disk has a 1 in ~3.5 million chance of a non-recoverable read error. A 500GB SATA drive has a 1 in ~100,000 chance.

That is where the problem lies.

Posted by: Glenn Duzy at June 2, 2006 11:55 AM
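
Glenn's figures can be reproduced with a few lines of arithmetic. The sketch below is an illustration only, not something taken from the white paper. It assumes decimal gigabytes, 512-byte sectors, and that the quoted rate means one unreadable sector per 10^15 (or 10^14) sectors read, which is the reading that yields his numbers. The function name `ure_odds` and its parameters are made up for this example. Drive datasheets often quote the rate per bits read instead, so the `unit` argument lets you see how much that reading changes the odds.

```python
import math

SECTOR_BYTES = 512  # assumption: 512-byte sectors


def ure_odds(capacity_bytes, reads_per_error, unit="sectors"):
    """Return N such that reading the whole disk once has about a
    1-in-N chance of hitting a non-recoverable read error.

    reads_per_error is the quoted rate, e.g. 1e15 for "1 per 10^15".
    unit says whether that rate counts sectors read or bits read.
    """
    if unit == "sectors":
        units_read = capacity_bytes / SECTOR_BYTES
    elif unit == "bits":
        units_read = capacity_bytes * 8
    else:
        raise ValueError("unit must be 'sectors' or 'bits'")
    # P(at least one error) = 1 - (1 - 1/rate) ** units_read,
    # computed with log1p/expm1 to stay accurate for tiny probabilities.
    p = -math.expm1(units_read * math.log1p(-1.0 / reads_per_error))
    return 1.0 / p


# Per-sector reading of the rates (matches the figures in the comment above):
print(f"144GB FC:   1 in {ure_odds(144e9, 1e15):,.0f}")  # about 3.56 million
print(f"500GB SATA: 1 in {ure_odds(500e9, 1e14):,.0f}")  # about 102,000

# Per-bit reading of the same rates, for comparison:
print(f"144GB FC:   1 in {ure_odds(144e9, 1e15, unit='bits'):,.0f}")
print(f"500GB SATA: 1 in {ure_odds(500e9, 1e14, unit='bits'):,.0f}")
```

These are odds for a single surviving disk. A reconstruct has to read every surviving disk, so the chance of hitting at least one error grows with the number of disks in the array.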

The burning question I was left with after reading the PDF article wasn't the SATA to Raid-5 relationship. . . After reading statements like:
"However, these affordable options also pose their own risks as SATA drives fail more often than Fibre Channel (FC) or SCSI drives."
Really??? Why? Is SATA circuitry inherently more sensitive and failure prone than FC/SCSI? Or are these drives (typically) built as low-end consumer drives, with cheaper platters and drive hardware than the (typical) FC/SCSI drive?
If so, then perhaps it's more a question of "I just don't want to use SATA drives AT ALL yet, until they're more reliable in the future, regardless of any raid considerations."

Posted by: John K. Humkey at June 2, 2006 12:07 PM

Noted and corrected. Thank you for catching that, Mike.

Posted by: Mario Apicella at June 2, 2006 12:11 PM

Not technical enough for you? Here is what David Schwaderer had to say (sent by e-mail):

Oh please, I find the Infortrend RAID 6 white paper pathetic. It does not even touch on what the so-called "RAID 6 multiplication" is and how to perform the matrix inversion calculations for recovering double data loss.

A real one I wrote for Veritas that completely explains all RAID 6 foundations is posted on the Veritas Architect Network at:

http://www.veritas.com/Vrt/offer?a_id=13418

Thanks for all considerations,
David Schwaderer

The past is not dead. In fact, it's not even past. - William Faulkner
The future is already here. It's just not evenly distributed - William Gibson, "Neuromancer" Author

Posted by: Mario Apicella at June 2, 2006 12:46 PM
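
For readers wondering what the "RAID 6 multiplication" and "matrix inversion" mentioned in David's e-mail look like in practice, here is a minimal sketch. It is not taken from either white paper. It assumes one widely used construction: P is the plain XOR of the data blocks, and Q is a weighted sum in the Galois field GF(2^8), built here with the polynomial 0x11d and generator 2. All function names are invented for the example. Recovering two lost data blocks means solving a 2x2 linear system over that field, which is the "matrix inversion" step.

```python
# Log/antilog tables for GF(2^8), polynomial x^8+x^4+x^3+x^2+1 (0x11d).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements (this is the 'RAID 6 multiplication')."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by b in the field; b must be nonzero."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]


def parity(blocks):
    """Return (P, Q): P = XOR of D_i, Q = XOR of (2^i * D_i)."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = EXP[i]  # 2 raised to the i-th power in the field
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild data blocks x and y (given as None in blocks) from P and Q."""
    size = len(p)
    known = [bytes(size) if b is None else b for b in blocks]
    p_part, q_part = parity(known)
    gx, gy = EXP[x], EXP[y]
    dx, dy = bytearray(size), bytearray(size)
    for j in range(size):
        a = p[j] ^ p_part[j]  # equals Dx ^ Dy
        b = q[j] ^ q_part[j]  # equals gx*Dx ^ gy*Dy
        # Solve the 2x2 system: Dx = (b ^ gy*a) / (gx ^ gy), Dy = a ^ Dx
        dx[j] = gf_div(b ^ gf_mul(gy, a), gx ^ gy)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)


data = [b"RAID", b"SATA", b"disk", b"data"]
p, q = parity(data)
damaged = [data[0], None, data[2], None]  # two data drives lost
print(recover_two(damaged, 1, 3, p, q) == (data[1], data[3]))
```

The sketch only covers the loss of two data blocks. Losing P or Q, alone or together with one data block, is a simpler case handled with the same building blocks.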

geez. i haven't heard that much fire-and-brimstone preaching since i went to my girlfriend's church! this WP makes it sound like SATA drives fail weekly.

Posted by: david at June 2, 2006 12:54 PM

I can't help but notice that the RAID6 "standard" is presented by Infortrend, the very company raising the alarm about RAID 5's "vulnerabilities." Like most WPs I see, this one is a self-serving advertisement masquerading as objective content.

Not to say that RAID6 isn't cool and useful, I'd just rather hear it from a truly objective source, like an independent university research lab.

Posted by: Ray Martin at June 2, 2006 01:52 PM

OK,,, let's try this...
Raise your hand if your small business can afford a SAN. OK, no hands.. well, that makes sense...
Let's try this. Raise your hand if your small business can afford a TB of space on a RAID 5 SATA-based server.
WOW! everyone's raising their hands! imagine that!
As sales materials and scare tactics go, the above paper is well written. Technically, I've had more SCSI discs fail than SATA drives, so does that mean I should put together a whitepaper stating that SCSI drives are less reliable? Thinking about it now, I've NEVER had a SATA drive fail yet, and most of the workstations here have them...
I know they fail, I'm not trying to say that. I just have to stop and laugh at the scare tactics, and hang my head in disgust for those who fall for it.

Posted by: Michael Roark at June 5, 2006 11:28 AM

Sorry about the late post to this thread ... a dual drive failure in a RAID 5 array, while possible, does not seem probable. Due diligence with tape backups is the safety net.

Posted by: Chuck at June 5, 2006 11:48 AM

Quote: "Technically, I've had more SCSI discs fail than SATA drives..."

I think the logic of this statement is flawed.

Of course you have. So have I. How long have SCSI drives been out and how many do you have in your organization? Now, how long have SATA drives been out? I'm guessing it's probably not long enough yet to fully realize the MTBF.

I'm not dissing SATA drives. On the contrary, I think they're great. They're fast (compared to IDE drives), have massive storage volumes, and they're reasonably priced. All in all, a great buy for those who otherwise might not be able to spring for SCSI or FC drives.

Posted by: L. A. at June 6, 2006 07:05 AM

I am not taking sides (for now) in this interesting debate, but I want to acknowledge John's one-word comment, probably the most concise and effective dismissal I have seen in a long time.

Too bad I cannot agree with you, John.

Also, Chuck, you're welcome to post anytime. There is no time limit here.

One may or may not like the WP from Infortrend, but other vendors are also raising a red flag, and not only with SATA drives but with all large drives.

Take for example NetApp RAID-DP: Isn't that a very close relative of RAID 6?

I don't dislike SATA drives, but a longer rebuild time means a prolonged exposure to the possibility of a second drive failing.

If Glenn's math is correct (and I believe it is), that's the problem in a nutshell.

AFAIK, nobody is saying not to use SATA drives, but if you do, having RAID 6 is a good idea.

Anyway, wherever you stand, keep those comments coming.

Thank you!

Posted by: Mario Apicella at June 6, 2006 12:25 PM

The math on large vs. small drives is wrong. The failures break into several types:

head to surface - actual failure to write or read on a sector. Proportional to the number of sectors read/written, which is the data size plus the parity. RAID-6 has more parity writes.

head motion errors - total failure or positioning errors. Large drives tend to have less head motion, since the cylinder size (sectors x heads) is larger and more data are read before a physical seek.

circuit failures - proportional to the number of drives; larger drives mean fewer parts to fail.

human error - I think the "human error" is more likely with more drives, and unlikely overall. I have machines running about 70TB total and have yet to see the human error, although I do see lots of drive failures.

I don't see any evidence that large drives are more likely to fail, and RAID-6 adds both writes and drives, so failure is more likely although far better tolerated. Note that RAID-5 with a hot spare is almost as good: after the rebuild, a second drive failure would not lose data.

Posted by: Bill Davidsen at June 7, 2006 06:09 AM

I don't disagree with the arguments you raise in your post, Mario, nor even the content in the Infortrend whitepaper (although they do have an odd-sounding name for their company) but would you be a little more helpful and tell us:

What hardware vendors currently support RAID6?
What software solutions exist that support RAID6?

I'd be perfectly happy to implement this, except, well, it's not an option on 99% of the hardware RAID solutions out there. Software support? Well, I'm too busy writing code right now to play around building large arrays. That's what you guys do at InfoWorld, right?

Posted by: Henry Mason at June 7, 2006 07:43 AM


Bill, thank you for your post, but I don't think the problem is that large drives are more likely to fail.

My understanding is that it takes longer to rebuild a large drive when it fails, and during that time (we are talking hours and hours) the LUN is vulnerable.

If during that time one of the cleaning people bumps the handle of the broom into one of the remaining drives, data kaputt!

Intel has a cute animation on this that even I can understand: http://intel.com/design/storage/flash/base.htm

Posted by: Mario Apicella at June 7, 2006 08:51 AM

It's difficult to disagree with your comments, Henry. After all, the world is not always what we wish it was.

Reminds me that years ago you couldn't buy a car that had seat belts or airbags. Then people began asking (demanding?) for safer vehicles.

As a reporter I can only point my finger, but people who actually buy stuff have the power to be more convincing.

Sorry, I am not going to make a list here of who is offering RAID 6.

However, if your vendors don't, ask them why. Maybe the answer will be: "... because nobody is asking for it". I have heard that before, haven't you?

Posted by: Mario Apicella at June 7, 2006 09:20 AM

A couple of points:

1 - Rebuild time on a 500GB drive is several hours IF the array isn't busy. I've seen it take 14 hours.

2 - HP, NetApp, Adaptec and MANY other vendors offer RAID-6; they just call it different things.

3 - I've seen dual drive RAID failures too. Once in an EMC Symmetrix before the rebuild from hot spare completed.

4 - Modern drives use embedded servo on EVERY surface and don't use the cylinder concept. Moving the heads is actually faster than selecting another head and re-aligning.

Posted by: Howard Marks at June 7, 2006 10:38 AM
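
The exposure during a long rebuild can be put into rough numbers. The sketch below is a back-of-the-envelope illustration, not a figure from the white paper. It assumes drives fail independently at a constant rate given by their MTBF, borrows the 14-hour rebuild and the 1,000,000-hour MTBF quoted in this thread, and uses a hypothetical array with 7 surviving drives. The function name is made up for the example.

```python
import math


def second_failure_probability(surviving_drives, rebuild_hours, mtbf_hours):
    """Chance that at least one surviving drive fails during the rebuild.

    Assumes independent drives with a constant failure rate of
    1 / mtbf_hours each (an exponential failure model).
    """
    combined_rate = surviving_drives / mtbf_hours
    # 1 - exp(-rate * time), computed with expm1 for small values
    return -math.expm1(-combined_rate * rebuild_hours)


# Assumed example: 7 surviving drives, 14-hour rebuild, 1,000,000-hour MTBF
p = second_failure_probability(7, 14, 1_000_000)
print(f"about 1 in {1 / p:,.0f}")  # roughly 1 in 10,000
```

Under this model the risk scales almost linearly with rebuild time, so doubling the rebuild window roughly doubles the exposure. The model covers whole-drive failures only. Non-recoverable read errors during the rebuild are a separate risk.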

just a follow up - the unrecoverable read error on the old Seagate Hawks (4.3/9.1G) was "less than 1x10^14". i still say the new SATA's are an acceptable replacement for older SCSI's

In our tests the 200G SATA1's easily outran the old 9.1's in a complete rebuild. Combine the narrow 10Mbps bus with the slow rotation and seek times, and it grinds away all night...

My point was, a modern raid5+hot standby SATA array, which is available in any new server, is at least as reliable as the older array it replaces...I think that is still valid.

JON

Posted by: Jkimball at June 7, 2006 01:12 PM

I give up. What does SATA have to do with this versus any other drive type?

Posted by: Dennis at June 8, 2006 08:58 AM

One wonders when multiple drive failures are symptomatic of another problem. How many "double" disk failures actually occur within the same short time period, versus single disk failures which go untreated for an inordinate amount of time only for another drive to crap out later on?

If a drive goes today but isn't replaced and another goes next month, is that a double disk failure?

Posted by: Mark at June 12, 2006 12:24 PM

Reply to Dennis:

HDDs, like most products, are built to a price point. The market position for SATA HDDs is in the mass-market 'value' category, while the market position for FC, SCSI, and presumably SAS HDDs is in the high-reliability 'enterprise' category. As such, the FC/SCSI/etc HDDs have more robust mechanics and electronics, while the SATA/ATA/etc HDDs have cheaper, slightly less robust mechanics and electronics. This allows the SATA/ATA HDDs to be sold much cheaper than the FC/SCSI HDDs.

While there is a minority of individuals in the community of interface standards developers (ANSI/INCITS T10 [SCSI/SAS], T11 [FC], T13 [ATA/SATA]) that would disagree, there is really no inherent technical issue with (e.g.) SATA that makes it less reliable than (e.g.) FC.

Posted by: Joe Breher at June 19, 2006 09:42 AM
