Typing and Talking for the Rest of My Life
If I type for the rest of my life, I won't come close to filling a disk drive. Let's optimistically say I live 50 more years, and let's say that I type 12 hours a day at 60 words a minute. That comes to about 788 million words, or roughly 4.7 GB of data at six bytes per word, which barely puts a dent in a large ATA disk drive these days.
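For anyone who wants to check the typing arithmetic, here is a minimal Python sketch. It assumes about six bytes per word (five letters plus a space, one byte per character), 365-day years and decimal gigabytes (1 GB = 10^9 bytes). The function name lifetime_typing_bytes is just an illustrative choice.

```python
# Back-of-the-envelope estimate of a lifetime of typing.
# Assumptions: ~6 bytes per word (5 letters + 1 space, 1 byte per character),
# 365-day years, and decimal units (1 GB = 10**9 bytes).

YEARS = 50
DAYS_PER_YEAR = 365       # ignores leap days
HOURS_PER_DAY = 12
WORDS_PER_MINUTE = 60
BYTES_PER_WORD = 6        # assumed average, including the trailing space


def lifetime_typing_bytes(years=YEARS, hours_per_day=HOURS_PER_DAY,
                          wpm=WORDS_PER_MINUTE, bytes_per_word=BYTES_PER_WORD):
    """Return the total number of bytes typed over the given span."""
    minutes = years * DAYS_PER_YEAR * hours_per_day * 60
    words = minutes * wpm
    return words * bytes_per_word


total = lifetime_typing_bytes()
print(f"{total:,} bytes = {total / 1e9:.1f} GB")  # 4,730,400,000 bytes = 4.7 GB
```

Changing bytes_per_word shows how sensitive the estimate is to that one assumption; even doubling it only gets to about 9.5 GB.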
If I talk for the rest of my life, the stored audio still won't fill a disk drive. If I use a 4 kbit/s codec for telephone-quality audio, then the same 50 years of 12-hour days yields 394 GB. That still doesn't fill Seagate's largest drive.
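The audio figure follows the same pattern, just counted in seconds and bits instead of minutes and words. A minimal sketch, assuming a constant-bitrate 4 kbit/s codec running for every waking hour counted above, 365-day years, and decimal gigabytes; lifetime_audio_bytes is a hypothetical name used only for illustration.

```python
# Back-of-the-envelope estimate of a lifetime of recorded speech.
# Assumptions: constant 4 kbit/s bitrate, 365-day years,
# and decimal units (1 GB = 10**9 bytes).

YEARS = 50
DAYS_PER_YEAR = 365           # ignores leap days
HOURS_PER_DAY = 12
CODEC_BITS_PER_SECOND = 4_000


def lifetime_audio_bytes(years=YEARS, hours_per_day=HOURS_PER_DAY,
                         bitrate=CODEC_BITS_PER_SECOND):
    """Return the bytes needed to store every second of talking."""
    seconds = years * DAYS_PER_YEAR * hours_per_day * 3600
    return seconds * bitrate // 8   # 8 bits per byte


total = lifetime_audio_bytes()
print(f"{total:,} bytes = {total / 1e9:.0f} GB")  # 394,200,000,000 bytes = 394 GB
```

Because the codec bitrate is the only storage parameter, the result scales linearly with it: a codec at twice the bitrate would need twice the space.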
So how do big storage users go about filling racks of disk drives? I have a theory that there are only three ways to generate "Really Big Data". They are:
- Generate data by computer.
- Get millions of people all typing at once.
- Sample the real world.
- People can't type that fast, but computers sure can. Good examples in this category are computer-aided design, Hollywood animation, and special effects. Compilers are also a good example. Type in the smallest program you can think of, and then check out how big an executable the compiler spits out.
- One person can't type that fast, but a million can. Yahoo!'s e-mail is a good example. Last I heard, Yahoo! had 750 million e-mail accounts. (My engineering background compels me to admit that only 250 million of those are active accounts; the others have apparently been abandoned.) Other examples would be the transaction records of lots of ATMs or the access logs of an active web site.
- Typing and talking are slow, but start snapping high quality digital photos or shooting digital movies and you can chew up disk space fast. Commercial examples include seismic data for oil and gas exploration, medical imaging and satellite data.
I'd be curious to hear an example of "Really Big Data" that doesn't fall into one of these categories, but so far I haven't found one.




