Google analysis of hard disk failures
Feb. 18th, 2007 08:49 am
Google has released the paper that I've been wanting to talk about for months. This is a really interesting read. They collected S.M.A.R.T. data from hard disks in use over several years and learned some very interesting things. Among them: running hard drives hot now and then doesn't seem to produce as many failures as one would think. If you really want to predict failure, well, you'll have to read the paper.
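One of the paper's headline findings is that certain S.M.A.R.T. counters (scan errors and sector reallocations, for instance) correlate strongly with failure, while temperature mostly doesn't. Here's a toy sketch of what acting on that might look like; the attribute names and the zero-tolerance threshold are my own illustration, not anything from the paper or from a real S.M.A.R.T. library:

```python
# Toy sketch: flag drives whose S.M.A.R.T. counters suggest elevated
# failure risk. The attribute names below are illustrative stand-ins
# for the counters the paper found predictive (scan errors,
# reallocations); the thresholds are hypothetical.

def at_risk(smart: dict) -> bool:
    """Return True if any predictive counter is nonzero."""
    predictive = (
        "scan_errors",
        "reallocated_sectors",
        "offline_reallocations",
        "probational_sectors",
    )
    return any(smart.get(attr, 0) > 0 for attr in predictive)

healthy = {"scan_errors": 0, "reallocated_sectors": 0, "temperature_c": 45}
suspect = {"scan_errors": 3, "reallocated_sectors": 12, "temperature_c": 38}

print(at_risk(healthy))  # False
print(at_risk(suspect))  # True
```

Note that the hot drive here is the healthy one: per the paper, a high temperature reading alone isn't a useful predictor, which is why this sketch ignores it.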
no subject
Date: 2007-02-18 02:57 pm (UTC)

Might explain why we've had so many hard drive failures over the past few years.
no subject
Date: 2007-02-18 04:19 pm (UTC)

Morgan Stanley ran their rooms just at the edge of what is usually considered to be a good temperature - 65F or so. So much equipment was failing that Sun paid for an environmental study that found several problems, one of which was that they were putting their Enterprise-class machines on open "dunkin donuts" style racks next to each other. As one went down the line, the side-vented heated exhaust of one system was being sucked into the next one. By the time you got to the fifth machine in the line, internal machine temperatures were routinely over 150F. I was shown one system board where the components had literally melted off the board. They ended up putting cardboard baffles between the systems to vent the air properly.
The Sunfire architecture has almost NEBS-compliant temperature resiliency built in - perhaps partially as a result of that problem, which ended up costing Sun a heinous amount of money in replaced hardware. So my thought is that there might be a brand-name association with drive heat resiliency as well.
no subject
Date: 2007-02-18 07:19 pm (UTC)