
August 2007

August 31, 2007

What Killed The Storage Service Providers?

Storage Service Providers (SSPs) were common back in the dot-com days. The idea was that instead of buying your own disk drives, you could buy disk space from an SSP and access it over the internet, or over some kind of metropolitan area network.

Many people believe that the SSPs died out because corporations want to keep their data nearby and will never allow it to be stored offsite. (I followed Jon Rath’s blog to two articles about the possible resurgence of SSPs.)

I don’t think offsite data was the main problem. I believe that SSPs died because their business model evaporated out from under them.

To understand the original SSP business model, you have to think back to the crazy days of the dot-com boom, when the SSPs were popular. Companies were growing so fast that they just couldn’t keep up with the hiring. Money was plentiful, but IT staff was scarce. Everyone believed that there was a limited time to make a land grab, so they couldn’t afford to slow down.

In that environment, the SSP business model was: We charge more, but you’ll pay it anyway, because you can’t hire IT people yourself. Part of the reason that startups couldn’t hire IT staff themselves was that SSPs and other IT-centric startups were luring them in with pre-IPO stock options. Why be in a supporting role at a product-centric startup when you could run the show at an IT-centric startup?

The model fell apart after the dot-com crash, because dollars became scarce, potential customers no longer needed to grow fast, and there were plenty of unemployed IT people around to hire. “We charge more so you can grow fast” lost its appeal.

In response, SSPs tried to change their business model. They claimed, “We can save you money because of economies of scale.” The problem is, they never actually demonstrated that. Helping your customers grow very quickly, but at a high cost, is very different from helping your customers save money. The SSPs just weren’t able to make the change.

Part of the cost problem is that the bandwidth required to feed a big disk array is expensive. It’s true that bandwidth keeps getting cheaper, but then again, disk drives keep getting bigger. Some SSPs focused on outsourcing storage within co-location facilities where customers would also host their servers. This solved the bandwidth problem, but left SSPs vulnerable when the co-lo model collapsed. The final blow was that many of the SSPs’ biggest customers were the failing dot-com companies.
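To put rough numbers on the bandwidth problem, here is a back-of-the-envelope sketch of how long it takes to move a full array's worth of data over a wide-area link. The 10 TB capacity, the 45 Mbps link and the 80% link efficiency are hypothetical assumptions chosen for illustration, not figures from any actual SSP.

```python
def transfer_days(capacity_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move capacity_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the raw link speed actually achieved.
    Uses decimal units: 1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second.
    """
    bits = capacity_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 86_400  # seconds per day

# Hypothetical example: a 10 TB array behind a 45 Mbps link.
print(f"{transfer_days(10, 45):.1f} days")  # prints "25.7 days"
```

Doubling both the capacity and the link speed gives exactly the same answer, which is the point of the paragraph: cheaper bandwidth doesn't help much if the disks keep growing just as fast.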

In summary, their business model evaporated, their technique for dealing with bandwidth cost collapsed, and their best customers went bankrupt. (“Other than that, Mrs. Lincoln, how was the play?”) So it’s true that the SSPs flamed out, but that doesn’t prove that the storage outsourcing model can never work.

I don’t buy the part about corporations never allowing data offsite. Ask people at many corporations where their voice mail is stored; they have no idea, and it doesn’t worry them. (Answer: Offsite in a phone company data center.) Ask them where their super-important legal documents are kept. (Answer: Offsite in an Iron Mountain warehouse.)

Of course, you only let critical data go offsite if you really trust the provider, which makes this a difficult business for startups.

One way to get around the bandwidth problem is to outsource the whole application, instead of just the storage. There is usually less bandwidth between the application and the user’s eyeball than there is between the application and the storage. This is exactly what Oracle does with its On Demand business, which has been quite successful, and SAP does likewise with its Managed Services offering. Another approach is to store less critical data, like photos at Shutterfly, or personal e-mail at Yahoo! and Google. For disaster recovery copies, the data must be offsite anyway, so it takes just as much bandwidth to mirror to your own remote storage as it does to mirror to outsourced storage. Iron Mountain is getting into this business.

I doubt it will ever be cost effective (cheap enough bandwidth, fast enough bandwidth) for SSPs to outsource all disk drives, and I’m sure there is some data that big corporations will want to keep close to home, but there are already many situations where it makes sense to outsource the management of important corporate data. I predict this trend will keep growing.


August 21, 2007

Oracle Optimizes Its Database for NFS

NFS has become critical to data center grid environments. As a result, Oracle has optimized its code specifically for NFS. Instead of relying on the operating system, Oracle’s Direct NFS Client generates NFS requests directly from the database.

Direct NFS was inspired by experience at Oracle’s Austin Data Center. Oracle uses NFS to run its applications on tens of thousands of Linux servers accessing many petabytes of NetApp storage. In 2005 they had 12,000 Linux servers and 3 petabytes of NetApp storage. Today’s numbers aren’t public, but they are much larger.

When an operating system capability becomes sufficiently important, Oracle pulls it into the database. Memory management became critical, so Oracle said, “Just give me the raw pages, and I’ll manage them myself.” Disk caching became critical, and Oracle said, “Just give me the raw disk blocks, and I’ll cache them myself.” Now NFS has become critical, so Oracle says, “Just give me a raw TCP/IP socket, and I’ll generate NFS requests myself.” 
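To make “generate NFS requests myself” concrete, here is a minimal sketch of what a user-space program hands to a raw TCP socket for the simplest possible NFS request: the NFSv3 NULL procedure, a no-op that works like a ping. This is not Oracle’s Direct NFS code, which isn’t shown in this post; it only illustrates the wire format, using the ONC RPC call header and TCP record marking from the public RPC and NFSv3 specifications. The xid value is arbitrary.

```python
import struct

NFS_PROGRAM = 100003   # ONC RPC program number assigned to NFS
NFS_V3 = 3
NFSPROC3_NULL = 0      # no-op procedure, handy as a "ping"
RPC_CALL = 0           # msg_type value for a call (a reply is 1)
RPC_VERSION = 2
AUTH_NONE = 0

def build_nfs_null_call(xid: int) -> bytes:
    """Return a TCP-framed ONC RPC call message for NFSv3 NULL."""
    # Call header: XDR encodes each field as a big-endian 32-bit unsigned int.
    body = struct.pack(
        ">6I", xid, RPC_CALL, RPC_VERSION, NFS_PROGRAM, NFS_V3, NFSPROC3_NULL
    )
    # Credential and verifier: AUTH_NONE flavor with a zero-length body each.
    body += struct.pack(">4I", AUTH_NONE, 0, AUTH_NONE, 0)
    # TCP record mark: high bit = last fragment, low 31 bits = fragment length.
    record_mark = struct.pack(">I", 0x80000000 | len(body))
    return record_mark + body

msg = build_nfs_null_call(xid=0x1234)
print(len(msg), msg[:8].hex())  # prints "44 8000002800001234"
```

Writing these 44 bytes to a TCP connection on port 2049 of an NFS server would normally get back a short reply. A real client goes on to send procedures like READ and WRITE, each with its own XDR-encoded arguments, but the framing is the same.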

Steve Kleiman has argued that as Oracle becomes more sophisticated, the operating system becomes little more than a device driver framework that gives the database raw access to the hardware. That sheds new light on Oracle’s Unbreakable Linux program.

What exactly does Oracle gain from Direct NFS? The primary benefits are simplicity and performance. 

It’s simpler because you don’t have to worry about how to configure NFS. What timeouts should you use? What caching options? It doesn’t matter. Oracle looks at how you have NFS configured to figure out where the data lives, but aside from that, your settings don’t matter. Oracle takes control.
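Here is one way to picture “looks at how you have NFS configured to figure out where the data lives”: find the NFS mount that contains a datafile and read off the server and export. This is a hypothetical helper, not Oracle’s code; the function name and the sample mount entries are made up, and the input lines follow the Linux /proc/mounts layout.

```python
from typing import List, Optional, Tuple

def find_nfs_source(path: str, mount_lines: List[str]) -> Optional[Tuple[str, str]]:
    """Return (server:export, mount_point) for the NFS mount containing path.

    Each line uses the /proc/mounts layout:
    "<source> <mount point> <fstype> <options> <dump> <pass>".
    """
    best = None
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 3 or fields[2] not in ("nfs", "nfs4"):
            continue  # skip local file systems and malformed lines
        source, mount_point = fields[0], fields[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Nested mounts: the longest (most specific) mount point wins.
            if best is None or len(mount_point) > len(best[1]):
                best = (source, mount_point)
    return best

mounts = [
    "filer1:/vol/oradata /u02/oradata nfs rw,hard,tcp 0 0",
    "filer2:/vol/oralogs /u02/oradata/logs nfs rw,hard,tcp 0 0",
    "/dev/sda1 / ext3 rw 0 0",
]
print(find_nfs_source("/u02/oradata/logs/redo01.log", mounts))
```

Once the client knows the server and export, the NFS mount options on the host no longer matter, because the requests are built by the application itself.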

It even works with Windows. Just mount the data that Oracle needs using a CIFS share, and Oracle figures out the location of the data and accesses it via NFS. (CIFS is great for home directory sharing, but it isn’t designed for database workloads.) 

Performance is better because Oracle bypasses the operating system and generates exactly the requests it needs. Data is cached just once, in user space, which saves memory – no second copy in kernel space. Oracle also improves performance by load balancing across multiple network interfaces, if they are available.
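The post doesn’t say which load-balancing policy Oracle uses, so the sketch below simply assumes round robin, the most basic way to spread requests across several network interfaces. The class name and the interface addresses are hypothetical.

```python
import itertools
from typing import Iterable, List, Tuple

class PathBalancer:
    """Hypothetical round-robin dispatcher over several network paths."""

    def __init__(self, paths: Iterable[str]):
        self._paths = list(paths)
        if not self._paths:
            raise ValueError("need at least one network path")
        self._cycle = itertools.cycle(self._paths)

    def next_path(self) -> str:
        # Hand out interfaces in turn: 1, 2, ..., n, 1, 2, ...
        return next(self._cycle)

    def assign(self, requests: Iterable[str]) -> List[Tuple[str, str]]:
        # Pair each outgoing request with the interface that will carry it.
        return [(req, self.next_path()) for req in requests]

balancer = PathBalancer(["10.0.1.5", "10.0.2.5"])
print(balancer.assign(["read-1", "read-2", "read-3"]))
```

A production client would also need to notice when a path fails and stop sending on it; round robin alone only spreads the load.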

For more technical details on Direct NFS, check out this article by Kevin Closson. He works for PolyServe, which is a NetApp competitor, but technically speaking, he talks good sense. I also recommend this article, by NetApp’s John Elliott, comparing Oracle performance over Fibre Channel, NFS and iSCSI. 

NetApp has been closely involved in Direct NFS from the very beginning. Peter Schay came up with the idea while he worked for Oracle’s “Linux Program Office”. He wanted to simplify things for Oracle customers running on Linux, many of whom were hosted on Oracle’s On-Demand environment at the Austin Data Center. He worked closely with NetApp engineers to prototype and test the idea. The Oracle ST team used his functional specification to develop the production version of Direct NFS now shipping in 11g. (Today Peter works for NetApp.)

I love how NFS has evolved over the past couple of decades. Twenty years ago, it provided file sharing to small engineering workgroups; today it provides the data backbone for some of the world’s largest data centers. What is it about NFS that has allowed it to make this transition? Why would Oracle choose to build it directly into its database? That’s the topic for another post!

August 08, 2007

Think of a Will as a Program You Can Only Test By Dying

Both legal documents and computer programs are written in a language that looks somewhat like English, but isn’t. You may recognize many words, but you are sadly mistaken if you think that fluency in English translates into LEGAL or COBOL.

The smallest “useful” computer program simply prints “Hello World!”. It does almost nothing, so most of the program is overhead. In C, it takes 53 characters of program to print 12 bytes of text – an overhead factor of 4.4.

I’ve been reading LEGAL this week, because some friends of mine are writing their will. I agreed to be the trustee in case both parents die while the kids are still young. It occurred to me that the “hello world” of wills is this:

Leave everything to my spouse. If s/he is dead, then split it evenly among my kids.
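For comparison, here is the same 83-byte idea written as an actual program. The function name, its parameters and the decision to raise an error when there is no surviving heir are all hypothetical choices made for this sketch.

```python
from typing import Dict, List

def distribute_estate(estate: float, spouse_alive: bool, kids: List[str]) -> Dict[str, float]:
    """The "hello world" will: everything to the spouse, else split among the kids."""
    if spouse_alive:
        return {"spouse": estate}
    if not kids:
        # The one-line will is silent on this case, which is exactly the
        # kind of gap that LEGAL's extra clauses exist to cover.
        raise ValueError("no surviving spouse or kids: the will does not say")
    share = estate / len(kids)
    return {kid: share for kid in kids}

print(distribute_estate(300_000, False, ["Ann", "Ben", "Cal"]))
```

Unlike a will, this one can be tested without anybody dying.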

This is pretty much what my friends’ will said, but to express these 83 bytes of idea took 18,700 bytes of LEGAL, for an overhead factor of 225. That is, LEGAL is 51 times less efficient than C.
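The overhead arithmetic is easy to check. The 53-character C program size and the 18,700-byte will size are taken from the text, not measured here; the sketch just redoes the division.

```python
def overhead_factor(program_bytes: int, payload_bytes: int) -> float:
    """Bytes of program written per byte of useful output (or idea)."""
    return program_bytes / payload_bytes

c_factor = overhead_factor(53, len("Hello World!"))  # 53 / 12
will = "Leave everything to my spouse. If s/he is dead, then split it evenly among my kids."
legal_factor = overhead_factor(18_700, len(will))    # 18,700 / 83

print(f"C: {c_factor:.1f}  LEGAL: {legal_factor:.0f}  ratio: {legal_factor / c_factor:.0f}")
# prints "C: 4.4  LEGAL: 225  ratio: 51"
```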

Why is LEGAL such a shitty language?

For starters, it doesn’t use modern techniques like subroutines or standard libraries. Consider this phrase from my friends’ will:

any and all household goods, furniture, furnishings, utensils and supplies, paintings, pictures, glass, silver, papers, rugs, china, books, linens, objects of art and other, similar articles of tangible personal property which I may own at the time of my death, any wearing apparel, jewelry and personal effects which I may then own and any interest which I may then possess in any automobiles

Wouldn’t it be easier to define ALL_MY_STUFF as a macro in a standard library? Instead, lawyers cut-and-paste big chunks of text from other legal documents. Part of the problem is that LEGAL was invented thousands of years ago, long before there were computers to help with mundane issues, like formatting and making sure that parentheses match up properly. LEGAL would be so much easier to read if it used indentation and parens to make the structure clear, rather than subtle rules of comma placement.
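Here is roughly what that “standard library” approach might look like: the list of belongings is defined once and every clause refers to it by name, so a forgotten item gets fixed in one place instead of in every copied paragraph. The names ALL_MY_STUFF and bequeath are hypothetical, and the item list is paraphrased from the clause quoted above.

```python
# Hypothetical "standard library" definition, written once and reused.
ALL_MY_STUFF = (
    "household goods", "furniture", "furnishings", "utensils and supplies",
    "paintings", "pictures", "glass", "silver", "papers", "rugs", "china",
    "books", "linens", "objects of art",
    "similar articles of tangible personal property",
    "wearing apparel", "jewelry", "personal effects",
    "any interest in any automobiles",
)

def bequeath(beneficiary: str, items=ALL_MY_STUFF) -> str:
    """Render one clause by referring to the shared definition."""
    return f"I leave to {beneficiary} " + ", ".join(items) + "."

print(bequeath("my spouse"))
```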

In their defense, lawyers are legitimately afraid to make changes, because there is no way to debug or test a legal document. Think of a will as a program that you can only test by dying. If the program is wrong, your heirs could lose their inheritance, or they could be tied up for years in court.

When writing in a language that is impossible to test, it is only prudent to make the smallest change possible. When you see a complex, bizarre phrase in a legal document, there’s a good chance that it was added after somebody lost a legal battle. “Oops – we listed glass, china, silver and linens but we forgot to specify utensils. Also, let’s add ‘similar articles of tangible personal property’ just in case we missed anything else.” Imagine the outrage of the children cheated of their utensils! Once a phrase has been tested in court, no lawyer dare change it. As a result, the ancient heritage of LEGAL shows through. Chunks of text may have been copied thousands of times, over hundreds of years. You can even see bits of Latin sprinkled throughout.

Of course, there is one additional reason that legal documents are so long: Many lawyers are paid by the hour.



© NetApp, Inc.  |  "Safe Harbor" Statement