Conference Report: 2007 LISA: No Terabyte Left Behind

Today's final technical session was Andrew Hume's invited talk, "No Terabyte Left Behind." There's a dilemma: space is cheap, so users want, get, and use more of it. This leads to all sorts of interesting problems, such as how to partition the disk and how to back it up, especially as desktops approach a terabyte. Traditional tools don't scale; dump, for example, takes 2.5 days to back up 250GB. Making the space available from servers can be problematic too, given local or networked file systems and the associated problems with network bandwidth. We've talked about these issues before, but there are still no good solutions.
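As a rough back-of-the-envelope check (my arithmetic, not a figure from the talk), that dump number works out to an effective throughput of barely over a megabyte per second, which is why it can't keep up with terabyte desktops:

    # Effective throughput of dump at the rate quoted above
    # (my arithmetic, not Andrew's): 250GB in 2.5 days.
    size_bytes = 250e9
    seconds = 2.5 * 24 * 3600
    rate = size_bytes / seconds            # ~1.16 MB/s
    print(f"dump throughput: {rate / 1e6:.2f} MB/s")
    # At that rate, a 1TB desktop would take 10 days to back up.
    print(f"1TB backup: {1e12 / rate / 86400:.0f} days")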

Let's take a hypothetical example of recording a TiVO-like service without any programming wrappers. Recording every channel all the time, in both standard and high definition, generates about 1.7 petabytes per year, even assuming no new channels get added. This is too big for the desktop, so we'd need to use space in the machine room: a 2U or 3U generic RAID unit at 2-4TB/U costs up to $1,500/TB, and you'd need 133 of them per year. That works out to 16TB per square foot and 27 feet of aisle space per year, with modest power and cooling, but it's still a lot of money and floor space every year. We could be clever about access patterns, for example by moving older and less-accessed shows off to tape, or by keeping only the first 5 minutes of each show on disk and the rest on tape. A tape library (the example Andrew used was LTO-4: 800GB per tape, 120MB/s sustained write, 60-second access; a 2.5PB library costs $172/TB at 41TB per square foot, and expansion units run $7/TB at 79TB per square foot) could still provide every TV show on demand with no user-visible delays.
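To see why tape wins here, a quick sketch of the per-year arithmetic, using only the prices and densities quoted above (the comparison itself is mine, not from the talk's slides):

    # Rough yearly cost/footprint comparison for 1.7PB/year, using the
    # figures quoted above (my arithmetic, not from the talk).
    tb_per_year = 1.7 * 1000

    # Disk: generic RAID at up to $1,500/TB and 16TB per square foot.
    disk_cost = tb_per_year * 1500
    disk_sqft = tb_per_year / 16

    # Tape: LTO-4 expansion units at $7/TB and 79TB per square foot.
    tape_cost = tb_per_year * 7
    tape_sqft = tb_per_year / 79

    print(f"disk: ${disk_cost:,.0f}/yr, {disk_sqft:,.0f} sq ft/yr")
    # disk: $2,550,000/yr, 106 sq ft/yr
    print(f"tape: ${tape_cost:,.0f}/yr, {tape_sqft:,.0f} sq ft/yr")
    # tape: $11,900/yr, 22 sq ft/yr

Two orders of magnitude cheaper and a fifth of the floor space. Sounds good, right?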

Wrong. It gets worse when you realize that the media is not infallible. Ignoring the issues with tape (such as oxide decay, hardware becoming obsolete, and so on), we've got problems with disks.

Here's the reality of actually using disks, networks, and tapes at scale: things go bad, trust nothing, and assume everything is out to get you. You don't always get back what you put out. Compute a checksum for a file every time you touch it, even when it's read-only. Yes, it's paranoid, but it's necessary if you really care about data integrity, especially on disk and tape. Andrew is seeing a failure rate of about one uncorrectable, undetected error every 10 terabyte-years, even in untouched, static files.
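The "checksum on every touch" discipline might look something like this minimal sketch (my own illustration, not Andrew's actual tooling), which keeps a SHA-256 digest in a sidecar file next to each data file and refuses to trust any read that doesn't match:

    # Minimal sketch of checksum-on-every-touch (illustrative only).
    import hashlib
    from pathlib import Path

    def digest(path: Path) -> str:
        """SHA-256 of a file, read in 1MB chunks."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def record(path: Path) -> None:
        """After any write, store the digest in a sidecar file."""
        Path(str(path) + ".sha256").write_text(digest(path))

    def verified_read(path: Path) -> bytes:
        """On any read, fail loudly if the file no longer matches."""
        expected = Path(str(path) + ".sha256").read_text().strip()
        data = path.read_bytes()
        if hashlib.sha256(data).hexdigest() != expected:
            raise IOError(f"silent corruption detected in {path}")
        return data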

As disk use grows, everyone will see this problem more often: uncorrectable, undetected errors are real, they scale with the amount of data you keep, and we need tools and practices that address them.
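To put that rate in context, a quick extrapolation (mine, not Andrew's) against the hypothetical TV archive above:

    # Extrapolating the observed rate (one undetected error per
    # 10 terabyte-years) to the hypothetical 1.7PB/year archive.
    # My arithmetic, not a figure from the talk.
    rate = 1 / 10                  # errors per terabyte-year
    archive_tb = 1.7 * 1000        # first year's data alone
    print(f"expected silent errors/yr: {archive_tb * rate:.0f}")
    # ~170 silently corrupted files per year from year one's data
    # alone, and the total grows as the archive accumulates.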




Last update Feb01/20 by Josh Simon (<jss@clock.org>).