
October 2005

October 28, 2005

Beware of Cyanide Gas

I'm at Storage Networking World (SNW) in Orlando this week, and I really enjoyed a talk that I just saw by Dave Federspiel about how to destroy data. The talk was part of the tutorial track, so it wasn't about his company's products, but about how to destroy data in general. (For the record, he works at Data Security, Inc., which makes degaussers that magnetically wipe data from disk and tape, but the SNIA tutorials do a good job of staying vendor neutral, so you wouldn't have known that from his talk.)

Hopefully everyone knows that when you delete a file on a computer, it's barely gone at all. The directory entry for the data may be gone, but the blocks themselves are still on disk, and there are a variety of tools you can use to read them.

Overwriting the data on disk can help against casual intruders, but it won't stop people who are serious about recovering secret data - like the NSA, for instance. There is a tool called an MFM (Magnetic Force Microscope) that lets you read the magnetic forces directly off the surface of a disk platter. Disk heads don't always track to exactly the same location, so you can often see a shadow of old bits just to the side of the new track. In some cases you can read multiple versions going back in time. Also, disk drives can automatically remap bad blocks. In that case, you'll overwrite the replacement block, but the original block remains - perhaps flawed but still mostly readable.
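For the casual-intruder case, a single overwrite pass before deletion looks roughly like the following Python sketch. The function names `overwrite_in_place` and `overwrite_and_delete` are hypothetical, and the sketch assumes an ordinary file system that rewrites a file's blocks in place. As the paragraph explains, it does nothing about sectors the drive has remapped, and copy-on-write or journaling file systems may also leave old copies elsewhere.

```python
import os

def overwrite_in_place(path: str, passes: int = 1) -> None:
    """Overwrite a file's existing bytes with random data.

    Caveat: this only rewrites the file's *logical* blocks. Remapped bad
    sectors, and old copies kept by copy-on-write or journaling file
    systems, can still hold the original bytes.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)  # write in 1 MiB pieces
                f.write(os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())  # push this pass past the OS cache to the device

def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite the file, then remove its directory entry."""
    overwrite_in_place(path, passes)
    os.remove(path)
```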

You might think that shooting a bullet through a disk - or hammering it to bits - would make it unreadable, but the NSA can apparently read useful data off of a platter that has been cut into chunks only 1/25th of an inch across. You have to hammer on a disk for a long time to get it into chunks that small.

Using encryption and throwing away the keys works pretty well. The weak links there are the strength of the encryption algorithm and making sure that there are no extra copies of the keys.

Degaussing devices can also work well, but it isn't enough to use a random big magnet. That leaves plenty of data for the MFM to read. To really delete the data you have to go through a specific sequence of magnetic field strengths. The details depend on what kind of magnetic media you are using.

Another technique is to grind off the surface of the platter. Apparently that works pretty well, but you have to be careful because the chemicals are carcinogenic. Also, if the grinder isn't maintained, chunks may start to come off that are bigger than 1/25th of an inch.

Or you can incinerate the drive, which also works well, but beware of the cyanide gas that is released.

October 26, 2005

Saving Puppies in Washington, D.C.

Last Wednesday I spent the day in D.C. visiting Senators and Representatives (or more often their staffers), to talk about data regulation broadly, and data privacy specifically -- how to protect credit card numbers and other personal information. There are at least half a dozen consumer data privacy bills in committee right now, so the timing was good. Afterwards, I had a chance to talk with some members of the media.

I had two goals: first, to share the concerns that I heard from our large financial customers earlier in the week (see "Hacker hits up to 8M Credit Cards"); and second, to hear how Congress is thinking.

I was impressed with Congresswoman Zoe Lofgren's knowledge of encryption. When I worked at MIPS Computer in 1986 as the junior-most computer programmer, I was frustrated with US export law. It was illegal for us to ship standard Unix encryption out of the country, even though it was based on the "Enigma Code" that Germany used and that Alan Turing helped crack during World War II. We had to do painful work-arounds to ship MIPS Unix out of the country. That made no sense to me, but I guess that's what happens when you put the Bureau of Alcohol, Tobacco and Firearms in charge of cryptography. (Cryptography was considered to be a "munition".) Lofgren was part of a two-person Congressional team, one Republican and one Democrat, who fixed all that. (Nice to see bipartisanship actually solve a problem.)

That change is related to our mission of promoting encryption for credit card protection. It used to be that export regulations made it tricky for US companies to develop and ship good encryption, and tricky for global companies to deploy encryption broadly. Now businesses, both inside the US and abroad, have access to high-powered military-grade encryption.

Our visits had two flavors: people representing locations where NetApp has offices, and people who are sponsoring privacy protection bills. The two types of visits had completely different feels. The first type was very friendly. "Hello Mr. Employer in my district or state. Wonderful to meet you. How can I help?"

The second type was much more interesting, at least from my technical/engineering perspective. The staffers responsible for the legislation weren't always the most technical people, but they were clearly quite familiar with the issues, including the use of encryption to protect data. Some bills specifically identified encryption as a practice that should be used, and others did not, but even for bills that weren't explicit, staffers indicated that "Of course banks should encrypt customer data." And they were interested in understanding the effect of their legislation on the corporations (our customers) that would need to implement it.

It seems unlikely that any of the bills will reach the floor this year, but I think there's a good chance that something will pass next year because both Republicans and Democrats want better protection for consumers. As Kevin Brown, the VP of Marketing at Decru (a NetApp company) likes to say, "Protecting consumer data is like saving puppies. Who is going to argue against saving puppies?"

October 21, 2005

"Hacker hits up to 8M Credit Cards"

This week I spent a day in New York in a group session with a dozen of our largest financial customers.

Probably everyone has noticed all the headlines lately about lost and stolen credit cards. "Bank loses a million customer records." "Hacker hits up to 8M credit cards." This has obviously become a hot topic for banks and other financial institutions, and I got to hear first hand how they are thinking about the issue.

The reason for these headlines is that California and New York have passed disclosure laws that require companies that lose confidential consumer data to notify the customers and the public. The laws have had their desired effect, because an issue that had been simmering in the background for years has suddenly become national news.

Some of our customers have had headlines written about them, and I can tell you that this is painful for them. I learned that one customer spent tens of millions of dollars in the cleanup required for just one lost backup tape. The cost included figuring out which of their customers' credit cards had been lost, notifying those customers, and then paying for a year's worth of credit reports to help those customers track whether anyone was using the stolen credit card numbers. One follow-up study of people who were notified that their personal information had been lost found that 20% of the people had already stopped doing business with that company and another 40% were considering it.

You can be sure that these companies are highly motivated to solve this problem. It goes beyond the money to deal with a particular lost tape; they worry about the cost to their reputation. As I said, the disclosure laws are having their desired effect.

Part of their challenge is the conflict between protecting data and keeping it secret. The best way to protect data is to make multiple copies and send them offsite. One of our customers said, "You've got to understand, I've got ten thousand backup tapes at Iron Mountain. We're not talking about a small problem here." Another customer said, "That's nothing - I've got a hundred thousand tapes at Iron Mountain." Yet another customer said, "I've got a Six Sigma quality program in place, but even if I meet my quality targets, with so many tapes, I'm still going to lose 10 or 15 tapes a year."
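The Six Sigma comment is worth a quick calculation. Six Sigma is conventionally defined as 3.4 defects per million opportunities; this sketch assumes, hypothetically, that each tape shipment counts as one opportunity, and works backward to how many shipments a year would produce 10 to 15 lost tapes even at that quality level. The function names are illustrative.

```python
SIX_SIGMA_DPMO = 3.4  # conventional Six Sigma target: defects per million opportunities

def expected_losses(shipments_per_year: float, dpmo: float = SIX_SIGMA_DPMO) -> float:
    """Expected lost tapes per year, assuming each shipment is one 'opportunity'."""
    return shipments_per_year * dpmo / 1_000_000

def shipments_for_losses(losses_per_year: float, dpmo: float = SIX_SIGMA_DPMO) -> float:
    """Shipments per year that would yield this many losses at the given defect rate."""
    return losses_per_year * 1_000_000 / dpmo

for losses in (10, 15):
    print(f"{losses} lost tapes/year ~ {shipments_for_losses(losses):,.0f} shipments/year")
```

Under that assumption, the customer's 10 to 15 lost tapes correspond to roughly 2.9 to 4.4 million tape shipments a year: a per-shipment loss rate that sounds negligible still adds up at that volume.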

The bottom line was that pretty much everyone in the room had plans to encrypt their backup tapes at a minimum.

October 18, 2005

Typing and Talking for the Rest of My Life

It's getting harder and harder to fill up a disk drive.

If I type for the rest of my life, I won't come close to filling a disk drive. Let's optimistically say I live 50 more years, and let's say that I type 12 hours a day at 60 words a minute - at about six bytes per word, that comes to about 4.7 GB of data, which barely puts a dent in a large ATA disk drive these days.

If I talk for the rest of my life, the stored audio won't even fill a disk drive. If I use a 4 kbps codec for telephone-quality audio, then the same 50 years of 12-hour days yields 394 GB. That still doesn't fill Seagate's largest drive.
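The arithmetic behind both estimates fits in a few lines of Python. The assumptions are noted in the code: about six bytes per typed word (five letters plus a space), 365-day years, a 4,000 bit-per-second codec, and decimal gigabytes (10^9 bytes).

```python
# Assumptions for this estimate: ~6 bytes per typed word (five letters plus a
# space), 365-day years, and decimal units (1 GB = 10**9 bytes).
YEARS = 50
HOURS_PER_DAY = 12
DAYS = YEARS * 365

def typing_bytes(wpm: int = 60, bytes_per_word: int = 6) -> int:
    """Bytes typed over the whole period at a steady words-per-minute rate."""
    words = wpm * 60 * HOURS_PER_DAY * DAYS
    return words * bytes_per_word

def audio_bytes(bits_per_second: int = 4_000) -> int:
    """Bytes of compressed audio recorded over the same hours."""
    seconds = HOURS_PER_DAY * 3600 * DAYS
    return seconds * bits_per_second // 8

print(f"typing:  {typing_bytes() / 1e9:.1f} GB")  # typing:  4.7 GB
print(f"talking: {audio_bytes() / 1e9:.0f} GB")   # talking: 394 GB
```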

So how do big storage users go about filling racks of disk drives? I have a theory that there are only three ways to generate "Really Big Data". They are:

  1. Generate data by computer.
    • People can't type that fast, but computers sure can. Good examples in this category are computer aided design and Hollywood animation and special effects. Compilers are also a good example. Type in the smallest program you can think of, and then check out how big an executable the compiler spits out.

  2. Get millions of people all typing at once.
    • One person can't type that fast, but a million can. Yahoo!'s e-mail is a good example. Last I heard, Yahoo! had 750 million e-mail accounts. (My Engineering background compels me to admit that only 250 million of those are active accounts - the others have apparently been abandoned.) Other examples would be the transaction records of lots of ATMs or the access logs of an active web site.

  3. Sample the real world.
    • Typing and talking are slow, but start snapping high quality digital photos or shooting digital movies and you can chew up disk space fast. Commercial examples include seismic data for oil and gas exploration, medical imaging and satellite data.
For really big data, combine more than one of these techniques. Most cash machines these days take photos of people as they withdraw their money: sample the world times millions of people. Oil exploration is another good example. Oil companies start with a seismic image of the ground (sample the world) and then blow it up to many times that size as they analyze the data with seismic processing tools from companies like Landmark Graphics.

I'd be curious to hear an example of "Really Big Data" that doesn't fall into one of these categories, but so far I haven't found one.

October 13, 2005

I'm Going to Keep Locking My House

After my post about using encryption to delete data by "throwing away the keys", I got an e-mail from a reader arguing that destroying the keys does not equal deleting the data.

In summary, the reader argued that encryption that is unbreakable today may be cracked in the future, especially when you consider potential breakthroughs like quantum computing. Although the reader didn’t mention it, potential breakthroughs in theoretical mathematics (such as faster ways to factor large numbers into their prime factors) could also make some codes easier to break than they are today.

I once met a man who advises people on home security. He said, "There is no absolute protection against a determined intruder. But most thieves don’t target a particular house - they go after the easiest house on the street. If the door on one house is locked, they move on to the next." Likewise, I’m going to argue that the "throw away the keys" technique is safe enough to be very useful.

But first, let me acknowledge that the reader makes a fair point. The data really is "still there" in some sense, even if it’s impossible for anyone to read it today, and technology for cracking codes is likely to improve. In fact, some people have argued that you could send a secret "data time capsule" into the future by using a key just strong enough to defeat ten years’ worth (or whatever) of Moore’s law improvements. Of course, Moore’s law isn’t really a "law", but as long as you don’t need the capsule to open on a precise date, the idea works fine.
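The "data time capsule" idea can be made concrete with a little arithmetic. This sketch assumes, as a modeling choice rather than anything from the post, that an attacker's guessing speed grows smoothly, doubling every `doubling_years` (two years by default), and that the attacker must try every key. The starting speed `ops_per_year_log2` is a hypothetical input, not a measured figure, and the function names are illustrative.

```python
import math

def bits_eroded(years: float, doubling_years: float = 2.0) -> float:
    """Brute-force margin (in bits) lost after `years` of attacker speed doubling."""
    return years / doubling_years

def years_to_search(key_bits: float, ops_per_year_log2: float,
                    doubling_years: float = 2.0) -> float:
    """Years until an attacker who starts at 2**ops_per_year_log2 guesses per year,
    and speeds up continuously, has tried all 2**key_bits keys."""
    k = math.log(2) / doubling_years  # continuous growth rate of attacker speed
    rate0 = 2.0 ** ops_per_year_log2
    # Cumulative work W(T) = rate0 * (e**(k*T) - 1) / k; solve W(T) = 2**key_bits.
    return math.log1p(2.0 ** key_bits * k / rate0) / k

def key_bits_for_capsule(years: float, ops_per_year_log2: float,
                         doubling_years: float = 2.0) -> float:
    """Key length whose exhaustive search takes `years` under the same model."""
    k = math.log(2) / doubling_years
    work = 2.0 ** ops_per_year_log2 * math.expm1(k * years) / k
    return math.log2(work)

print(bits_eroded(10))                          # 5.0
# Hypothetical attacker who can try 2**60 keys per year today:
print(round(key_bits_for_capsule(10, 60), 1))   # 66.5
```

Under this model, once the horizon is long, every additional `doubling_years` of protection costs about one extra key bit, which is why the time capsule only works if you accept a fuzzy opening date.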

Here are some reasons I think cryptographic deletion is still useful:

  1. Today, most backup tapes have no protection whatsoever. And we have zero ability to delete data from backup tapes in warehouses. Anything is better than what we have now.
  2. Most crypto exploits are partial. For instance, the problems recently reported in SHA-1 reduce its collision resistance from 80 bits to 60-something - the data is less secure but still protected. (Doubling the key length helps dramatically.)
  3. For many applications, protecting data for a few decades is plenty. To use my "tapes in a warehouse" example, we might use cryptographic deletion to protect data for a decade or two, but then physically destroy the tapes after that. Maybe we can re-read the tapes and re-write the data we want to keep every ten years, but we don’t have to do it every day as the data ages.
  4. The Department of Defense trusts military-grade encryption for the country’s most sensitive data. I say if it’s good enough for top-secret military data, it’s good enough for Social Security numbers.
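Point 2 is easy to quantify, because each bit of security halves an attacker's work. This sketch uses 63 as an illustrative stand-in for "60-something" and relies on the standard birthday bound: a hash with an n-bit output offers about n/2 bits of collision resistance, which is where SHA-1's 80 bits come from (its output is 160 bits). The function names are hypothetical.

```python
def birthday_bound_bits(output_bits: int) -> int:
    """Generic collision resistance, in bits, of an ideal hash with this output size."""
    return output_bits // 2

def work_reduction(generic_bits: int, attack_bits: int) -> int:
    """How many times less work an attack needs than the generic bound."""
    return 2 ** (generic_bits - attack_bits)

sha1_generic = birthday_bound_bits(160)   # 80 bits for SHA-1's 160-bit output
print(work_reduction(sha1_generic, 63))   # 131072, i.e. about 130,000x less work
# Doubling a brute-force key from n to 2n bits squares the work: 2**(2n) == (2**n)**2.
print(2 ** (2 * 80) == (2 ** 80) ** 2)    # True
```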
The bottom line is that for many applications, there is simply no practical alternative to cryptographic deletion, and lacking any practical solution, what people are doing is nothing at all. To me, the question isn’t whether cryptographic deletion is perfect - the question is whether it’s better than anything else we’ve got.

In summary, even though locks aren’t perfect, I’m going to keep locking my house.

October 05, 2005

No Data = No Beer

Last week I mentioned that I always learn something interesting when I ask CIOs, "What bad thing happens if you can’t access your data?" One of our German salesmen (I was in Germany last week) gave me this example:

Hi Dave,

I have an example which is more specific to Germany (and it fits well because of Oktoberfest season). We just had a well-known German brewery in our demolab this week. And their IT-Manager said: "If we cannot access our data, 10 minutes later we have to stop filling bottles." The reason behind this is: they have to be able to identify every bottle of beer (when produced, which batch, etc) for legal reasons.

Bottom line: no data -- no beer. Biiiiiig issue in Germany!

Best regards from Stuttgart,

Gregor

Never mind airplanes and drugs, THIS is serious.

October 02, 2005

Lock It Up And Throw Away The Key

I’m in Europe this week to talk to the press about data regulations and about our Decru acquisition, and also to visit customers. When I look at government regulations of data, they tend to fall into three very broad categories. Regulations about:

#1) Data that companies must keep

#2) Data that companies must keep secret (or private)

#3) Data that companies must delete

A banking transaction is an example of all three. If you deposit a check, the bank must keep a record of the transaction for seven years (10 years in Germany and Italy). The bank must keep the data private - unless the government wants to investigate your taxes or something. After seven years, the bank must delete the data.
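The retention rule in this example is simple enough to express directly. The sketch is hypothetical: the country codes, the function names, and the choice to move a February 29 record to February 28 are assumptions; the seven- and ten-year periods are the ones quoted in the paragraph.

```python
from datetime import date

DEFAULT_RETENTION_YEARS = 7
RETENTION_YEARS = {"DE": 10, "IT": 10}  # Germany and Italy, per the example

def deletion_date(recorded: date, country: str) -> date:
    """Date on which a transaction record must be deleted."""
    years = RETENTION_YEARS.get(country, DEFAULT_RETENTION_YEARS)
    try:
        return recorded.replace(year=recorded.year + years)
    except ValueError:
        # Recorded on Feb 29 and the target year is not a leap year.
        return recorded.replace(year=recorded.year + years, day=28)

def obligation(recorded: date, country: str, today: date) -> str:
    """Which obligations apply to the record on a given day."""
    if today < deletion_date(recorded, country):
        return "keep, and keep private"
    return "delete"

print(obligation(date(1998, 3, 1), "US", date(2005, 10, 2)))  # delete
print(obligation(date(1998, 3, 1), "DE", date(2005, 10, 2)))  # keep, and keep private
```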

It is obvious that encryption can help keep data secret. It is less obvious that encryption can help delete data. Consider the example of a company that makes weekly backup tapes and sends them offsite. After a few years, the record of your check deposit may be saved on hundreds of different tapes. How can you delete that data after seven years, especially given that it sits on tape next to other data that must not yet be deleted?

Every encryption system has "keys" that are used to encrypt data, and to decrypt it later. The easiest way to delete data is to throw away the key. This works even if you have lots of copies on lots of tapes stored in multiple warehouses. The most common question I get about throwing away the keys is this: "Is the data really gone? Is it legally considered to have been deleted?" The answer is yes! That’s exactly what the Department of Defense’s DoD 5015.2 Certification means. If you throw away the key, you can legally consider the data to have been deleted, at least as far as the Department of Defense is concerned.
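Here is a minimal Python sketch of "throw away the key", assuming one key per retention bucket (say, the year a record was created) so that records sharing a tape can expire at different times. The class and function names are hypothetical, and so is the cipher: to stay within the standard library it uses a toy keystream built from SHA-256 in counter mode plus an HMAC-SHA-256 tag. That illustrates the idea but is not production cryptography; a real system would use a vetted cipher such as AES and a managed key store.

```python
import hashlib
import hmac
import os

NONCE_LEN, TAG_LEN = 16, 32

def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from one master key.
    return hmac.new(key, label, hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY cipher: SHA-256(key || nonce || counter) blocks, later XORed with the data.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + body + tag

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was modified or the key is wrong")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

class RetentionKeyStore:
    """Hypothetical key store holding one random key per retention bucket."""

    def __init__(self) -> None:
        self._keys: dict = {}

    def seal(self, bucket, plaintext: bytes) -> tuple:
        if bucket not in self._keys:
            self._keys[bucket] = os.urandom(32)
        return bucket, encrypt(self._keys[bucket], plaintext)

    def unseal(self, record: tuple) -> bytes:
        bucket, blob = record
        if bucket not in self._keys:
            raise LookupError(f"key for bucket {bucket!r} was destroyed")
        return decrypt(self._keys[bucket], blob)

    def shred(self, bucket) -> None:
        """Deleting the key makes every copy of every record in the bucket unreadable."""
        self._keys.pop(bucket, None)

store = RetentionKeyStore()
tape = [store.seal(1998, b"check deposit, 1998"), store.seal(2004, b"check deposit, 2004")]
offsite_copies = [list(tape) for _ in range(3)]  # same ciphertext on several tapes
store.shred(1998)                                # 1998 records reach end of retention
print(store.unseal(offsite_copies[2][1]))        # b'check deposit, 2004'
```

Because the key never lives on the tapes, shredding it reaches every offsite copy at once, including tapes already sitting in a warehouse, while records in other buckets on the same tapes stay readable.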

Subscribe to Dave's Blog

© NetApp, Inc.  |  "Safe Harbor" Statement