
May 2007

May 31, 2007

My Philosophy of Language (Why Google is My Word Usage Guide)

Linguists have two rival views of human language: prescriptive and descriptive.

 

    Descriptive says: Never mind what's "right" or "wrong", here is how people actually speak. Language is defined by people, so let's describe that.

    Prescriptive says: Here is how people should speak. People who speak differently are simply wrong. They are damaging the language and should be corrected.

Prescriptive is fine for grammar school teachers, but in my opinion, if you want to communicate effectively in the real world of grownups, you should figure out how other people use words and use them the same way. If people are hopelessly confused about what a word means, then it's best to avoid it unless you are prepared to lead a campaign to educate the masses. (People who read my blog on whether iSCSI is a form of NAS (here and here) won't be surprised to hear that I am a descriptivist.)

I love the American Heritage Dictionary (see my book list) because it is descriptive and often backs me up when people try to correct my grammar. For instance, I use data as a singular word, but some prescriptivists insist that it is the plural of the Latin word datum. American Heritage, on the other hand, reports that 77% of its usage panel now accepts data as a singular word. Language changes. If it didn't, we'd all still be speaking Latin, Sanskrit, or Proto-Indo-European. (Proto-Indo-European is the ancestor of both Latin and Sanskrit, and American Heritage has a dictionary of Proto-Indo-European word roots. My favorite is deru or dreu, which is the root of words like tree, true, durable, druid, and even tryst - presumably because trysts happen under trees.)

Google is great for probing today's usage. In my recent blog on data deduplication, I had to decide whether to use dedup, de-dup, dedupe, or de-dupe. I checked with Google and got these results:

    dedup: 35,400 matches
    de-dup: 75,200 matches
    dedupe: 116,000 matches
    de-dupe: 789,000 matches

De-dupe is the winner by a landslide, so that's what I used. To get accurate results, you have to use the exact phrase feature of advanced search, because otherwise Google treats the hyphen as a space.
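As a rough illustration, the comparison can be scripted. This is a minimal sketch in Python that uses the match counts listed above as a fixed snapshot; the variable names are made up for the example, and nothing here queries Google.

```python
# Match counts copied from the list above; a snapshot, not live data.
match_counts = {
    "dedup": 35_400,
    "de-dup": 75_200,
    "dedupe": 116_000,
    "de-dupe": 789_000,
}

# The most common spelling is the key with the largest count.
winner = max(match_counts, key=match_counts.get)

# Share of all counted matches that the winning spelling accounts for.
total = sum(match_counts.values())
winner_share = match_counts[winner] / total

print(f"{winner}: {winner_share:.0%} of {total:,} matches")
```

Looking at the share as well as the raw winner shows whether the lead is a landslide or a near tie, which matters when deciding whether to go against the majority.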

You don't always have to go with the majority. Perhaps you want to invent a new phrase, or make a point. For instance, NetApp officially refers to iSCSI as IP SAN, even though Google shows iSCSI winning by 4,940,000 to 637,000. We want to make the point that people can use iSCSI in place of Fibre Channel SAN for surprisingly high-end applications. On the other hand, we always use iSCSI nearby so that people will know what we are talking about.

Sometimes you want a unique name. Last week my wife and I had a baby girl. Her name is Mira Hitz, which has no matches now but will as soon as I post this!

May 23, 2007

How Data De-Duplication Fits into our Master Plan

Let me explain how our data de-duplication announcement this week fits into our long-term strategy. One blogger described our goal as making Data Domain the "next entrée on NetApp's dinner plate". Actually, de-dupe is part of a much higher-level strategy.

To summarize the announcement, we now support data de-duplication on all of our storage systems. (It takes a license.) If the same block of data is present in two different LUNs or files, the storage system spots this and saves space by keeping just one copy. For two years this functionality has been available for backups using SnapVault for NetBackup, but now people can enable de-dupe for any data on any NetApp storage system.
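The idea of keeping just one copy of a repeated block can be sketched in a few lines of Python. This is only an illustration of block-level de-duplication in general, not a description of how NetApp storage systems implement it. The 4096-byte block size, the use of SHA-256 as a fingerprint, and every function name are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedupe_blocks(volumes):
    """Store each distinct block once.

    volumes maps a name (file or LUN) to its bytes. Returns a block
    store keyed by fingerprint and, per name, the ordered list of
    fingerprints needed to rebuild the data.
    """
    store = {}       # fingerprint -> block bytes (one physical copy)
    block_maps = {}  # name -> list of fingerprints (logical layout)
    for name, data in volumes.items():
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            store.setdefault(fingerprint, block)  # keep first copy only
            refs.append(fingerprint)
        block_maps[name] = refs
    return store, block_maps


def read_back(store, refs):
    """Rebuild the original bytes from the shared block store."""
    return b"".join(store[fp] for fp in refs)


# Two files that share their first block.
shared = b"A" * BLOCK_SIZE
files = {
    "file1": shared + b"B" * BLOCK_SIZE,
    "file2": shared + b"C" * BLOCK_SIZE,
}
store, block_maps = dedupe_blocks(files)
logical_blocks = sum(len(refs) for refs in block_maps.values())
physical_blocks = len(store)
print(f"{logical_blocks} logical blocks stored as {physical_blocks}")
```

The sketch trusts the fingerprint alone to decide that two blocks are identical. A production system would more plausibly confirm a match by comparing the actual bytes before sharing a block.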

In some cases, like nightly backups of the same data, de-dupe can yield compression ratios as high as 50-to-1, although 10-to-1 or 20-to-1 are more common. Other cases, like user home directories, may save 40% or less. It all depends on how redundant the data is. De-dupe helps customers buy less storage, use less power, cooling, and floor space in their data centers, and – in the end – save money. (See here to understand why helping customers buy less storage is a good strategy for NetApp.)
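Ratios and percentages describe the same savings in different units, which makes the two hard to compare at a glance. A small Python sketch of the conversion follows; the function names are invented for the example.

```python
def percent_saved(ratio):
    """Percent of space saved for a reduction ratio of ratio-to-1."""
    return (1 - 1 / ratio) * 100


def ratio_from_percent_saved(percent):
    """Reduction ratio (x-to-1) equivalent to a given percent saved."""
    return 1 / (1 - percent / 100)


for ratio in (10, 20, 50):
    print(f"{ratio}-to-1 saves {percent_saved(ratio):.0f}% of the space")

print(f"40% saved is about {ratio_from_percent_saved(40):.2f}-to-1")
```

Going from 10-to-1 to 50-to-1 only moves the savings from 90% to 98%, while a 40% saving corresponds to a ratio of less than 2-to-1.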

Buying less storage is the small picture. The big picture is that we want to help customers create a disk-based copy for all of their primary storage.

Many customers already create disk-based copies for mission critical data, to ensure business continuity in case of disaster, but we believe the trend is to create disk-based copies for everything. Tape-based backup just isn't keeping pace with improvements in disk drives. Plus, compliance and discovery for litigation are creating new requirements that tape drives could never meet.

Interesting things start to happen when you create a disk-based copy of everything. Instead of doing searches on primary storage, which could hurt performance, why not search the secondary copy? If the people running decision support systems want their own copy of a critical database, why not clone the secondary instead of paying for a whole new copy? Why not create lots of cloned copies for the test and development team preparing to upgrade to the next version of Oracle or SAP? When you create a copy of everything, and add functionality like snapshots and clones, what you end up with is a smart copy infrastructure that can completely change the way you think about data management.

This won't happen overnight. We understand that. But anything that helps people reduce the cost of creating copies helps us achieve our vision more quickly. In the short run, data de-duplication helps customers save space and save money, but what's more important is that by reducing the cost of copies, it helps us achieve our master plan.

May 17, 2007

I Was Interviewed by Frontier Journal – Hear it On-Line

I was recently interviewed by Ed Zhang at the Frontier Journal. Here is the mp3.

You can hear why venture capitalists wouldn't fund us until after we actually had paying customers. (Angel investors funded us all the way to first product ship.) You can hear about Mike Malcolm, who is the third founder of NetApp, along with James Lau and me. (Mike left in 1995, after we hired Dan Warmenhoven to replace him as CEO. Mike is a genius and is still doing interesting stuff.) I talk about my role at NetApp as "company philosopher".

They also have a collection of interviews with Steve Wozniak, Vinton Cerf, Richard Stallman, Jimmy Wales and many more, in case you don't want to listen to me.

May 11, 2007

Pop Quiz: Should You be a Late Adopter or an Early Adopter?

Technology follows a predictable adoption life-cycle. First the innovators try a new technology. They are crazy and will try anything, just for the fun of it. Then come the early adopters. They typically have a problem so hard that they must take risks to solve it. Next are the early majority, the late majority, and finally the traditionalists – the “quill pen” folks, who avoid change if at all possible.

Startups are full of people who love new technology, and they assume that customers do as well. The reality is that many companies, especially large enterprises, would rather avoid change, and for good reason. Change is disruptive. Change upsets people. Change requires new expertise. Change might fail, and it often costs more than you expect.

There are lots of good reasons to be a late adopter, so I’ve come up with two questions to help customers figure out where they should be on the adoption curve:

    Question #1: Can existing products do what I need?

If you have no problem, why change? High-tech companies always seem to have new problems. They want to simulate a bigger chip than ever before, or render more orcs on a battlefield, so they are comfortable being early adopters, and they have learned to do it well. Part of their business model is to manage the risks of early adoption.

Low-tech companies occasionally have problems that existing products can’t solve. For instance, people are considering disk-to-disk backup because tape-based backup isn’t keeping up with the growth in storage. The problem is, many companies are not used to being early adopters. It makes them uncomfortable, and they don’t have the skills to manage it well.

    Question #2: Are IT costs too high?

If existing products do what you need, and IT costs are not too high, why on earth would you change? When considering IT costs, it often makes sense to ask what percentage of total costs are IT related.

Suppose you are an oil company, and you have spent $12 billion on an oil refinery. Now you are trying to figure out whether to spend one million dollars on the IT infrastructure to support it, or ten million. Without knowing anything else about the problem, I can tell you the answer. Spend ten million; spend twenty; who cares! That is such a low percentage of the overall budget that it doesn’t matter. Just don’t ever let that refinery go down. Or chip fab. Or battleship.
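The percentages behind that answer are easy to check. The Python sketch below treats the $12 billion refinery cost as the whole budget, which is a simplifying assumption, and the function name is made up for the example.

```python
def it_share(it_spend, total_cost):
    """IT spending as a fraction of the total cost."""
    return it_spend / total_cost


refinery_cost = 12_000_000_000  # $12 billion, from the example above

for it_spend in (1_000_000, 10_000_000, 20_000_000):
    share = it_share(it_spend, refinery_cost)
    print(f"${it_spend:,} is {share:.3%} of the refinery cost")
```

Even the twenty million dollar option comes to well under a quarter of one percent of the refinery's cost.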

On the other hand, consider Oracle’s On Demand business. Instead of buying Oracle software and running it on their own equipment, customers outsource to Oracle. Oracle buys the servers, buys the storage, and manages the software. The Austin Data Center where Oracle does this is the world’s largest installation of Dell/Linux and NetApp storage. I don’t know what percentage of their overall costs are IT related, but I know it’s huge. Existing products could solve their problems, but the cost would be exorbitant, so Oracle has been very aggressive with technologies like NFS and Linux for enterprise applications. They have led the way in making these technologies safe for others.

What’s most important is that you make a conscious decision between early and late. Sometimes early is better, and sometimes late is better, but at least you should ask the question! Use my two questions to figure out what makes sense for your business, or your particular project.
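The two questions amount to a small decision rule, sketched here in Python. The function name, the return strings, and the way the examples from this post are encoded as yes/no answers are all assumptions made for illustration; real decisions involve more judgment than two booleans.

```python
def adoption_stance(existing_products_suffice, it_costs_too_high):
    """Apply the two questions and suggest a place on the curve.

    Late adoption is suggested only when existing products do the
    job and IT costs are acceptable; otherwise early adoption is
    worth considering.
    """
    if existing_products_suffice and not it_costs_too_high:
        return "late"
    return "consider early"


# Oil refinery: existing products work, and IT is a tiny share of cost.
refinery = adoption_stance(True, False)

# Outsourced hosting: existing products work, but at exorbitant cost.
hosting = adoption_stance(True, True)

# Chip simulation: existing products cannot handle the new problem.
chip_design = adoption_stance(False, False)

print(refinery, hosting, chip_design, sep=" | ")
```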

A big part of NetApp’s strategy is to enable customers to change when it’s convenient for them. Unified storage is a great example. Our storage systems support Fibre Channel SAN along with NFS and iSCSI, which means that customers can start with traditional SAN for their database applications, if that’s what makes them comfortable, but they can change to Ethernet storage whenever it makes sense. (Ethernet storage often brings significant savings.) Alternatively, they can start with NFS or iSCSI, secure in the knowledge that they can easily upgrade to Fibre Channel SAN if their requirements become more demanding.

By doing our innovation in a single, unified storage architecture, we create an environment that lets customers adopt new technology when they are ready. Early if that is appropriate, or late if that makes more sense. Either way, NetApp’s approach makes it easy, because the new technology is part of the same architecture that the customer has already installed.

In other words, NetApp enables change, but we don’t force change down our customers’ throats.

May 04, 2007

My Offensive iSCSI Blog (My Philosophy of Communication)

It seems my blog on whether iSCSI is SAN or NAS is offending people.

 

In that blog I said, “Many technical people are offended by the idea that iSCSI might be NAS.” Sure enough! Here are two technical people offended by my post. (At least I know my audience.)

In some ways, that blog was more about a philosophy of communication than about iSCSI.

When I’m communicating badly, it’s often because I don’t understand how my audience thinks. The words and ideas I’m using don’t mean the same thing to them. But if I take the time to hear and understand their worldview, then I can speak their language and communicate better. Sometimes I even change my own worldview.

That blog is the story of what happened when I took the time to understand how business people think about their storage.

When I first encountered business people who thought that iSCSI was NAS, I reacted just like Mario and Marc: I thought they were idiots and I wagged my shaming finger at them. But when I learned that their worldview is based on infrastructure, capital expenses, and organizational structure, I realized that to them, iSCSI really is more like NAS than SAN.

The idea of iSCSI as SAN is incompatible with their worldview, and if I speak that way, I will confuse them. I might accept Marc’s finger of shame if I were harming my audience, but in fact I am helping them by using words in ways that clarify rather than mislead.

Does this mean that business trumps technology? That we should categorize iSCSI as NAS? No! That would be equally misleading to the technical people. iSCSI is a block-based protocol, so in terms of how you manage it, and how it interacts with applications – key parts of the technologist’s worldview – iSCSI is very much like SAN.

I consider business and technology to be equally important. Since categorizing iSCSI either way will be confusing, I refuse to categorize it at all. I simply list all three names instead of just two: SAN and NAS and iSCSI.

Plato had advice on this subject: “Why should we dispute about names when we have realities of such importance to consider?”

Sometimes I think that technical people do themselves a disservice, when talking with business people, because they focus too much on technical details that don’t matter to their audience, instead of focusing on the issues that do. Don’t waste time on irrelevant categorization; spend time on how to save money with iSCSI. In this case, audience trumps speaker.

© NetApp, Inc.  |  "Safe Harbor" Statement