January 05, 2009

Memories of a different life

In 1996 I graduated from Brown University to go work at SGI on what was a rather interesting technical problem: how to make hybrid scheduling work.

Just to recap: in the early ’90s, the UNIX vendors were adding threading to their kernels, and there was intense debate over what the right model was:


In the 1 on 1 model, each user thread had its own kernel thread. In the N on 1 model, all of the user threads were multiplexed on one kernel thread. Finally, in the M on N model, many user-level threads were multiplexed on a smaller number of kernel threads.

The N on 1 model was the “simplest” to implement, but it did not allow the application to exploit multiprocessors. One property of the N on 1 model was that synchronization between different threads did not require a context switch.

Although 1 on 1 was simple enough to implement, it had what at the time seemed like serious issues. The two most discussed were the memory needed to maintain the kernel data structures and the performance cost of making every synchronization operation a kernel operation.

So the M on N model was promoted as a compromise between the N on 1 model and the resource-intensive 1 on 1 model. To implement the M on N model, two schedulers were required: a kernel scheduler and a user-level thread scheduler. The central notion was that the kernel scheduler would schedule processors and the user-level thread scheduler would assign threads to the available processors, represented by the kernel threads. The nice property of this model is that when synchronization was required between user-level threads, the performance would be as good as in the N on 1 model.

The challenge of this model was that the thing scheduling the physical resources, the kernel scheduler, had incomplete information about what the user-level threads were doing and was prone to making mistakes. The cost of those mistakes was idle processors and therefore worse performance.
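To make the coordination problem concrete, here is a toy model, not SGI's implementation, built on simplifying assumptions: one kernel thread per processor, and no scheduler activations. When a user thread makes a blocking system call, the kernel thread carrying it blocks too, and the user-level scheduler is never told, so that processor sits idle even though other user threads are runnable. The function names are hypothetical.

```python
from collections import deque

def busy_processors_m_on_n(num_kernel_threads, user_threads, blocked):
    """Toy M-on-N model: processors doing useful work in one time slice.

    Assumptions: one kernel thread per processor; each blocked user thread
    pins the kernel thread it was running on, and the user-level scheduler
    cannot reclaim that kernel thread.
    Returns (busy processors, runnable user threads left waiting).
    """
    blocked_here = set(user_threads) & set(blocked)
    run_queue = deque(t for t in user_threads if t not in blocked_here)
    free_kernel_threads = max(num_kernel_threads - len(blocked_here), 0)
    busy = 0
    while free_kernel_threads and run_queue:
        run_queue.popleft()          # user-level scheduler dispatches a thread
        free_kernel_threads -= 1
        busy += 1
    return busy, len(run_queue)

def busy_processors_one_on_one(num_processors, user_threads, blocked):
    """In 1 on 1, the kernel sees every thread and fills every processor it can."""
    runnable = [t for t in user_threads if t not in set(blocked)]
    return min(num_processors, len(runnable))

threads = [f"t{i}" for i in range(8)]
print(busy_processors_m_on_n(4, threads, {"t0", "t1"}))
print(busy_processors_one_on_one(4, threads, {"t0", "t1"}))
```

With 8 user threads, 4 processors, and 2 threads blocked in the kernel, the M on N model keeps only 2 processors busy while 4 runnable threads wait; the 1 on 1 model keeps all 4 busy.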

Ultimately, a variety of attempts were made to fix the problem, including scheduler activations, scheduler-aware synchronization, and nano-threads, but none were good enough to stop the ultimate triumph of the 1 on 1 model.

It was the need to co-ordinate between the two schedulers that eliminated the advantages of the M on N model.

One could almost argue that this was a computer science boondoggle.

So what does this have to do with storage?

At the end of the day, the lesson I learned from the whole experience is that multiple layered resource schedulers scheduling the same resources are difficult to make work.

If you look at the storage stack, what you’ll see is multiple distinct storage layers, each managing virtualized pools of storage that it has no visibility into. What’s worse, the layer that actually does the physical resource management has no visibility into what the resources are being used for.

So to me, it looks like the storage stack has evolved over the last 20 or so years to resemble the hybrid thread scheduling model.

And that is enough to give me reason to pause and think.

December 21, 2008

Shut Down for the Holidays

This is my last post for 2008.

To all of my readers, be happy and be safe.

And I can’t wait to get going in 2009.


December 16, 2008

SANscreen 5.0

Today NetApp announced the latest version of our SANscreen product, SANscreen 5.0. This is a big release that enhances what I think is the best heterogeneous SRM product.

Delivering a service oriented view of storage

We are watching storage evolve from an interesting set of devices with capabilities into a service-driven infrastructure. What do I mean by service-driven infrastructure? Prosaically, it's the aggregation of a set of capabilities (things like performance, reliability, protocol, and cost) plus a commitment that the storage will provide those capabilities, also known as the Service Level Agreement.

For some companies this has already happened. For others, it's a work in progress.

One of the key challenges is how to provide the visibility into the storage tier to ensure that the services promised are the services delivered. 

SANscreen, originally built by Onaro, has continued to evolve and extend its core value proposition: the ability to quickly and easily see what is going on throughout the storage infrastructure. Using an agentless infrastructure, SANscreen is able to see more, faster and more reliably than any other tool.

Building on top of this very efficient data collection mechanism, SANscreen allows storage teams to monitor the service levels of their storage infrastructure.

So what's new?

The latest release extends SANscreen in three interesting ways:

  1. A data warehousing infrastructure that allows storage architects to slice and dice their data.
  2. Extending SANscreen to iSCSI infrastructures.
  3. Delivering the kind of analytic tools necessary to execute a storage tiering strategy.

iSCSI and SANscreen 

It's more than just an FC world out there, and any SRM tool needs to understand iSCSI infrastructure. And SANscreen is no different.

Storage architects who have used SANscreen understand how providing near-real-time visibility into the infrastructure makes it possible to manage storage sprawl. Onaro got its start clarifying the complex SAN fabrics that were put together over the last decade.

An increasingly important part of the storage landscape is iSCSI. An increasingly challenging part of deploying iSCSI is how to manage the fabric and the storage attached to that fabric.

With the latest release of SANscreen, NetApp is providing a tool that, although it does not provide an end-to-end fabric view, does provide insight into the fabric end-points.

Storage Tiering 

When you're flush, time is money, so you spend money to save time. When you're short on cash you spend time to save money. The trick is to get enough bang for the buck for the time you spend.

Storage analysts talk about the need for a comprehensive storage tiering strategy that spans the entire enterprise, and then stare dumbly as storage architects ask the inevitable question: how? Heterogeneous SRM tools never really gathered enough information to be useful, and homogeneous tools only told you how to use their own stuff effectively.

And don't get me started on the question of FC vs. iSCSI. Never mind what real FC is; how about the basic question of how much FC and how much iSCSI do I need?

To produce a reasonable tiering strategy you need complete, up-to-date, comprehensive data and the analytics necessary to do something with that data. SANscreen has always had the comprehensive, up-to-date and complete part covered, now we added the analytics.

What exactly do I mean by analytics?

  1. Provide reports into how capacity is being used.
  2. Map capacity used by business group to some notion of dollars per terabyte.

Using both of these mechanisms it's possible to understand the current behavior in your storage infrastructure, and more importantly change it.
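As a rough illustration of the second mechanism, here is a sketch that rolls up capacity by business group, prices it per tier, and reports the blended $/TB across the whole infrastructure. The tier names, prices, and allocations are hypothetical and do not reflect SANscreen's actual data model.

```python
# Hypothetical $/TB for each tier; real figures would come from your own cost model.
PRICE_PER_TB = {"captive": 20000.0, "shared": 8000.0, "capacity": 2000.0}

def chargeback(allocations):
    """allocations: list of (business_group, tier, terabytes) tuples.

    Returns (cost per business group, blended $/TB across all allocations).
    """
    cost_by_group = {}
    total_cost = 0.0
    total_tb = 0.0
    for group, tier, tb in allocations:
        cost = tb * PRICE_PER_TB[tier]
        cost_by_group[group] = cost_by_group.get(group, 0.0) + cost
        total_cost += cost
        total_tb += tb
    blended = total_cost / total_tb if total_tb else 0.0
    return cost_by_group, blended

costs, blended = chargeback([
    ("finance", "captive", 10),
    ("finance", "capacity", 40),
    ("marketing", "shared", 25),
])
print(costs, blended)
```

Re-running the rollup after moving data between tiers is one simple way to check whether the global $/TB actually moved toward the goal.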

Let's be precise: if you want to drive toward a particular global $/TB, understanding what's going on is the first step. But once you have that information, it's really important to act on it. And once you've acted on it, see what happened.

I call this the find, act, verify cycle of storage management.

Too many tools make the find and verify phases an unending adventure. Make those two phases quick and painless, and you'll spend more time doing and less time trying to figure out what to do. The net result: you'll get more efficiency out of your infrastructure.

What's really interesting is that when you combine SANscreen's ability to monitor storage SLAs with the new tiering tools, it becomes possible to correctly identify storage that is on the wrong tier. Even better, if storage is moved to the wrong tier, the storage group can discover that and move it back without going through the dreaded cycle: users scream at the storage group, the storage group fixes the problem, users scream at the storage group for creating the problem.
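One way to picture combining SLA monitoring with tiering: compare each volume's measured behavior against the SLA of the tier it sits on. A volume that misses its SLA is a candidate to move up; a volume using only a small fraction of what its tier is provisioned to deliver is a candidate to move down. The field names, thresholds, and the 25% headroom rule are assumptions for illustration, not how SANscreen decides.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    max_latency_ms: float   # latency the tier's SLA promises (hypothetical)
    iops_per_tb: float      # IOPS the tier is provisioned to deliver per TB

@dataclass
class Volume:
    name: str
    tier: Tier
    observed_latency_ms: float
    observed_iops: float
    size_tb: float

def review(volumes, headroom=0.25):
    """Return (too_slow, over_provisioned) lists of volume names."""
    too_slow, over_provisioned = [], []
    for v in volumes:
        if v.observed_latency_ms > v.tier.max_latency_ms:
            too_slow.append(v.name)          # SLA missed: candidate to move up
        elif v.observed_iops < headroom * v.tier.iops_per_tb * v.size_tb:
            over_provisioned.append(v.name)  # paying for IOPS it never uses
    return too_slow, over_provisioned

shared = Tier("shared", max_latency_ms=10.0, iops_per_tb=1000.0)
print(review([
    Volume("db", shared, 15.0, 5000, 2),
    Volume("archive", shared, 3.0, 100, 4),
    Volume("app", shared, 5.0, 800, 1),
]))
```

Running the same review again after a move is the "verify" step: a volume that shows up in the too-slow list right after being demoted was moved to the wrong tier.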

But for any tiering strategy to be effective, the consumer has to be involved. And that's where SANscreen's new chargeback capabilities come into play.


If you measure behavior, and show how the measured behavior is deviating from the goal, you'd be surprised at how quickly the behavior moves towards the goal.

For more information check out the website where you'll find a couple of new whitepapers that describe the product in more detail.

December 15, 2008

Backup, the new storage tiers and real snapshots

When I first described my new terminology for storage tiers, Tim Burlowski asked how backup and recovery fit into that model.

It turns out that that is an interesting question.

Backup, the only working ILM/HSM solution

Step 10 feet away, and you realize that backup is HSM.

The backup process creates cold data: the backup image. The backup software then moves the backup copy to a cheaper tier of storage. When a restore is performed, the backup software moves the backup copy from the cheaper tier back to a more expensive tier.
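The analogy can be sketched in a few lines: backup creates a cold copy and demotes it to a cheaper tier, and restore promotes it back. The class and attribute names are purely illustrative and do not model any particular backup product.

```python
class BackupAsHSM:
    """Toy model of backup viewed as hierarchical storage management."""

    def __init__(self):
        self.primary = {}     # expensive tier: live data
        self.cheap_tier = {}  # cheaper tier: cold backup images

    def backup(self, name):
        # Create the cold copy (the backup image) and migrate it down a tier.
        self.cheap_tier[name] = self.primary[name]

    def restore(self, name):
        # Migrate the cold copy back up to the expensive tier.
        self.primary[name] = self.cheap_tier[name]

hsm = BackupAsHSM()
hsm.primary["db"] = b"good data"
hsm.backup("db")
hsm.primary["db"] = b"corrupted"   # something goes wrong on primary
hsm.restore("db")
print(hsm.primary["db"])
```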

The question on the table, then, is where does this form of HSM still make sense?

Captive IOPS

One obvious place where HSM as part of the backup process makes sense is the backup of data on a Captive IOPS tier. That tier is built using very expensive storage, and storing cold data that will probably never be accessed there seems odd and a waste of money.

But that's not necessarily true.

If the Captive tier is built using disk drives, and the application has a high IOPS data density, a lot of the capacity of the disks is unused. The cheapest place to store backup copies, in that case, would be on the same disks you're using for primary data.
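The arithmetic behind that claim: when a tier is sized for IOPS rather than capacity, the drive count is set by the IOPS requirement, and the leftover capacity is effectively free space for backup copies. The per-drive figures (300 GB, 180 IOPS) are assumed round numbers for a fast enterprise drive, not measurements, and RAID overhead is ignored.

```python
import math

def spare_capacity_gb(dataset_gb, dataset_iops, drive_gb=300, drive_iops=180):
    """Size a disk tier and report the capacity left over.

    drive_gb and drive_iops are assumed per-drive figures; RAID overhead ignored.
    Returns (drives needed, unused GB).
    """
    drives_for_capacity = math.ceil(dataset_gb / drive_gb)
    drives_for_iops = math.ceil(dataset_iops / drive_iops)
    drives = max(drives_for_capacity, drives_for_iops)  # the scarcer resource wins
    return drives, drives * drive_gb - dataset_gb

print(spare_capacity_gb(1000, 18000))
```

For a 1 TB dataset that needs 18,000 IOPS, the IOPS requirement dictates 100 drives, leaving about 29 TB (roughly 97% of raw capacity) unused under these assumptions.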

If the Captive tier is built using SSDs, as with EMC's approach, then it makes a lot of sense to use a backup solution to move backup images off the flash onto cheaper storage. If the application has a low IOPS data density, then a better approach is to use a flash cache in front of disk.

If the Captive tier is built using a flash cache in front of disk, the cheapest place to store the backup copies may be on the disk drives.

More generally, the backup methodology for a captive tier made up of flash cache and disk will be the same as the backup methodology for the capacity tier.

So what about the capacity tier and backup?

In the general case, the capacity tier will be backed up much in the same way that it always has been.

A capacity tier that has real snapshots can eliminate the HSM part of the backup process.

In fact, in an earlier series on backup I proved that for a VMware infrastructure, the cheapest place to store backup copies was on the same disks that stored the active data.

Now I want to generalize that claim to all storage.

If you consider IOPS density, disk drives must contain large amounts of cold data. A subset of the cold data in a data center is the backup data. That data can either be stored on a separate set of disks or on the same disks that are being used to serve IOPS.

To store the data on the same disks, the following things need to be true:

  1. Creating the backup image must consume minimal IOPS.
  2. The existence of the backup image must impose no performance penalty.
  3. The system must be able to store a large number of images.
  4. The storage must be configured using some form of RAID-6 or RAID 1-0.
  5. The backup images must be cost effective to store.
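To see why conditions 3 and 5 are plausible for snapshots that share unchanged blocks, compare the space needed to keep k full copies with k snapshots that retain only changed blocks. The 2% daily change rate is an assumption for illustration, and the model ignores metadata and blocks that change on several days.

```python
def full_copy_tb(dataset_tb, copies):
    # Each full backup image stores the entire dataset again.
    return dataset_tb * copies

def snapshot_tb(dataset_tb, copies, daily_change_rate=0.02):
    # Each snapshot pins only the blocks overwritten since the previous one
    # (simplifying assumption: every changed block is distinct).
    return dataset_tb * daily_change_rate * copies

print(full_copy_tb(10, 30), snapshot_tb(10, 30))
```

Under these assumptions, thirty daily images of a 10 TB dataset cost about 6 TB as snapshots versus 300 TB as full copies.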

It turns out that for NetApp systems all five are true.

So as disk capacities expand, storage architects who have the option to use real snapshots will stop moving data off of the capacity tier to some cheaper tier.

Said in a slightly more poetic way:

The final resting place for data on disk will be the disk where the data first got created.

The net effect will be a much simpler and more cost effective storage infrastructure to support backup.

One minor addendum, added after I wrote this post:

As Martin G says in the comments below, having the backup copies on the local disks doesn't protect you from site failures. So, of course, some form of storage replication is required. Thankfully NetApp has a cool technology called SnapMirror that replicates all of the snapshots to a remote destination in a space efficient way.

In an earlier post on VMware and backup I show how all of this fits together.


I should have included this need to replicate the data in my original blog post. So I am rectifying that error now.

December 09, 2008

MAD Blog: Chuck, there you go again.

Today is a fun day.

Chuck Hollis has taken me to task for not understanding what real Fibre Channel is. And he claims that because he's been selling FC for longer, we should trust him.

Fine, I don't know what real Fibre Channel is and, frankly, I don't care.

But guess what, I've been at a company selling unified storage for a lot longer than EMC has.

So, appealing to my seniority, much like Chuck appeals to his seniority in the FC space: the NX4 is not real unified storage. The value proposition of the NX4 is not that of unified storage, no matter how hard he and EMC wish it were.

December 03, 2008

MAD Blog: Onaro is not as much rubbish!

I couldn't help but laugh when I read Martin's blog about ECC.

When I first started working in the storage management space, I tried to understand who the competition was and why they were successful.

It was only after I experienced ECC in all of its pointless glory that I truly appreciated how much opportunity for innovation there was in this space. And it was precisely because of that opportunity that I worked in storage management for about 5 years. One of the things we tried to do at NetApp was build products our customers wanted to use: products that solved real problems. I wish I could say we had the perfect product, but that would be foolish. I can say that a lot of customers really liked what they got. I hope some of them didn't think it was rubbish...

But I was delighted to read in a later post that Onaro may be useful. That was indeed high praise from someone so obviously and justifiably frustrated with the state of the art in vendor SRM tools.

When I first met the Onaro guys, I was genuinely impressed with what they had. This was a very smart team, with a very cool product underpinned by some very interesting technology, that solved a problem that, frankly, EMC and ECC should have solved years ago.

To this day, Chuck's comments about NetApp's Onaro acquisition remind me of this parable.

Back from two days in New York

Just spent two days in New York talking to customers about Disk to Disk backup.

Had a great meal at artisanal cheese.

Bought a stuffed animal at the last remaining FAO Schwarz.

December 01, 2008

Flash and the new storage classification model

So I'll be out of the office this week visiting some customers, which means my blog writing may slow down.

In an earlier post I defined some new storage tiers, which I repeat here for clarity of exposition.

Two IOPS tiers:

  • Captive (used to be Captive IOPS)
  • Shared (used to be Shared IOPS)

And one capacity tier:

  • Capacity (used to be Capacity Efficient)

If you followed my post from last week, you'll remember that I made the case for disk being the media of choice for any capacity tier, because the IOPS density of disk matches the economics of this tier.

Flash, unlike disk, has an IOPS density that is closer to 1 than to 0. That suggests that Flash belongs elsewhere...

Captive and Shared Tiers Go Flash?

Chuck Hollis has made the case that Captive and Shared tiers will go Flash. Ironically enough, I agree with Chuck but I disagree on significant details.

EMC has positioned flash only as stable storage. Chuck has argued that the entire dataset of Captive and Shared tiers belongs on SSDs. I think that's wrong.

NetApp has positioned flash as both a new layer in the memory hierarchy and stable storage and intends to deliver both, and I, rather unsurprisingly, think that's right.   

Why the difference?

If you read my post on IOPS data density, you'll recall that I argued that some pretty important datasets actually have a very low IOPS data density. What that means is that the dataset requires a medium that has an IOPS density that is closer to zero than to one.

But flash has an IOPS density that is closer to one than to zero...

Following that logic, you would have to conclude that disk, not flash, is the right medium for those datasets.
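A small cost model makes the logic concrete: buy enough devices to satisfy both the capacity and the IOPS a dataset needs, and compare media. All per-device figures and prices below are hypothetical, chosen only to show the shape of the tradeoff, not 2008 (or current) market prices.

```python
import math

# Hypothetical per-device figures for illustration only.
DISK = {"gb": 300, "iops": 180, "price": 300.0}
FLASH = {"gb": 100, "iops": 5000, "price": 3000.0}

def cost(dataset_gb, dataset_iops, device):
    """Cost of serving a dataset when whichever resource runs out first sets the device count."""
    n = max(math.ceil(dataset_gb / device["gb"]),
            math.ceil(dataset_iops / device["iops"]))
    return n * device["price"]

# Low IOPS data density: 3 TB that needs only 500 IOPS.
print(cost(3000, 500, DISK), cost(3000, 500, FLASH))
# High IOPS data density: 100 GB that needs 18,000 IOPS.
print(cost(100, 18000, DISK), cost(100, 18000, FLASH))
```

With these assumed numbers, disk is 30 times cheaper for the low-density dataset, while flash wins for the high-density one.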

Which suggests that Flash as Stable Storage is not the right answer for every byte of every dataset that gets deployed on Captive storage tiers!   

So where does Flash fit in?

If you read my post on The Principle of Uniform Resource Latency, you'll see that I argue that it's not the absolute performance that matters but the worst-case performance. Storage arrays provide a significant amount of memory to cache read and write operations. If an entire dataset fits into the memory of either the client system or the storage array, then the performance of the system is defined by the memory operations. On the other hand, if the dataset does not fit into main memory, then performance is constrained by the ability to push data to and from disk.

So now we have three kinds of datasets:

  1. Low IOPS data density
  2. High IOPS data density but fit into memory
  3. High IOPS data density but don't fit into memory
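The three buckets above can be written as a simple classification. The threshold separating low from high IOPS data density is a hypothetical parameter; the text does not give a number.

```python
def classify(iops_per_gb, dataset_gb, cache_gb, low_density_threshold=1.0):
    """Place a dataset in one of the three buckets.

    low_density_threshold (IOPS per GB) is a hypothetical cut-off.
    cache_gb is the memory available in the client and/or storage array.
    """
    if iops_per_gb < low_density_threshold:
        return "low IOPS data density"
    if dataset_gb <= cache_gb:
        return "high IOPS data density, fits in memory"
    return "high IOPS data density, does not fit in memory"

print(classify(0.1, 5000, 64))
print(classify(50, 32, 64))
print(classify(50, 500, 64))
```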

And only one of those datasets, the third, appears to be a candidate for Enterprise Flash Drives (EFDs). But it's not that simple.

Remember, the goal is to improve performance. Putting the dataset on an EFD is one approach; another is to put enough memory, like a PAM module, in the storage array to avoid having to go to disk.

Which brings me to NetApp's flash strategy.

  1. Provide Flash as an alternative to disk for applications that really require that kind of storage.
  2. Provide Flash as a cheaper read cache for every other application
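A read cache in front of disk can be sketched as a least-recently-used (LRU) map: hits are served from the cache, and misses go to disk and populate the cache. This is a generic LRU sketch for illustration, not how PAM is actually implemented; the class name and block-level interface are assumptions.

```python
from collections import OrderedDict

class ReadCache:
    """LRU read cache in front of a slower backing store (e.g. disk)."""

    def __init__(self, capacity_blocks, backing_store):
        self.capacity = capacity_blocks
        self.backing = backing_store        # dict: block id -> data
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]          # slow path: go to disk
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used block
        return data

disk = {i: f"block{i}" for i in range(10)}
cache = ReadCache(2, disk)
for b in [0, 1, 0, 2, 1]:
    cache.read(b)
print(cache.hits, cache.misses)
```

A cache only helps when the working set is small relative to its size, which is why the same mechanism suits many applications that don't need their entire dataset on flash.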

So the role of Flash?

In my opinion, flash will play two roles:

  1. Be a memory cache
  2. Be stable storage

Because of the economics of (1), flash will probably play a role in all storage tiers, including capacity tiers. As for (2), I believe the role will be a lot more limited than we imagine today.

Which wins out, of course, is a matter of conjecture today, but we'll definitely know the answer in a couple of years.

November 25, 2008

The principle of uniform resource latency

Just before I commented on Karl Dohm's quixotic quest to prove the NetApp sales team were snake oil salesmen, I was talking about IOPS data density.

In that post, I threw out the following statement:

I'll also state that applications expect consistent performance rather than variable performance and perform worse as variability increases.

This is a very important and fundamental concept, and it underlies a lot of how I think Flash and storage tiers will shake out. So I will spend a little bit of time on this point.

So let's begin at the beginning.

Within a company, which is a conglomeration of humans working toward a single set of goals, consistency is highly prized. Consistency makes long-term planning possible. Variability makes long-term planning impossible.

The software systems that exist to support human beings must, therefore, fit into a world where people expect consistency and predictability.

As a result, software solutions have been designed and architected around providing consistent behavior.

Over time, software architectures have evolved to the point where some layer of software, typically the application, expects all the resources it uses to have the same performance.

What do I mean?

Well an application that uses a database is designed around the assumption that regardless of what field in the database is accessed, there is some maximum time the database will take to respond. The application is then designed to work given that maximum. When the application gets better behavior from the database, the user of the application is pleasantly surprised, but at no time does the user of the application experience worse performance than the expected maximum time.

In practice, what this means is that application authors expect the underlying software (database, operating system, file system, storage, server, network) to provide uniform access to the resources they manage. Vast amounts of innovation are invested in software to provide a sandbox that provides the illusion of performance uniformity.

From a narrow and simplistic storage system point of view, this implies that along with IOPS there is an expectation of uniform latency, which is why benchmarks like SPC and SFS measure latency. And just in case I forget to say this, providing uniformly good latency is not just a SAN thing; it's a NAS thing too…
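Two storage systems can have the same average latency and very different worst-case behavior, which is why an application designed around a maximum response time cares about the tail. The latency samples below are made up for illustration, and the percentile uses the simple nearest-rank definition.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile, p in 0..100."""
    ordered = sorted(samples)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds for 100 requests.
uniform = [5.0] * 100              # every request takes 5 ms
spiky = [1.0] * 95 + [81.0] * 5    # mostly fast, occasionally very slow

print(statistics.mean(uniform), percentile(uniform, 99))  # 5.0 5.0
print(statistics.mean(spiky), percentile(spiky, 99))      # 5.0 81.0
```

Both systems average 5 ms, but an application budgeting for its worst case has to plan for 81 ms on the second one.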

And why all this effort?

Because of the simplicity of the programming model it enables. To provide that simplified model, a significant amount of complexity and cost is introduced throughout the infrastructure, but as an industry we have chosen to make that tradeoff.

So how to provide uniform resource access?

At one extreme, Seymour Cray remarked, “You can't fake what you don't have,” and designed systems to provide uniformly excellent performance. At the other extreme, you have the web, which promises nothing.

What's important to note is that between spending a lot of money and doing nothing there are a lot of well-established in-between places, also known as caching, that make it possible to provide excellent average latency at a fraction of the worst-case cost.
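The caching tradeoff fits in one formula: with hit ratio h, cache latency t_c and disk latency t_d, mean latency is h·t_c + (1 - h)·t_d, while the worst case stays at t_d unless every access hits. The 0.1 ms and 8 ms defaults are assumed figures for illustration.

```python
def cached_latency(hit_ratio, cache_ms=0.1, disk_ms=8.0):
    """Return (mean latency, worst-case latency) in ms for a single cache level.

    cache_ms and disk_ms are assumed, illustrative latencies.
    """
    mean = hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms
    worst = disk_ms if hit_ratio < 1 else cache_ms
    return mean, worst

print(cached_latency(0.9))
print(cached_latency(1.0))
```

A 90% hit ratio brings the mean to about 0.89 ms, but the worst case is still 8 ms; only a dataset that fits entirely in the cache gets uniformly low latency.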

In my next post in this series, I'll consider (and this time I promise) Flash and the new storage tiers.

MAD Blog: HP not letting it go ...

There is a specter haunting HP's Karl Dohm: the specter of NetApp's sales force snookering customers about the real value of NetApp storage.

Patrick mentions the true test, i.e. that there are many happy NetApp customers who are running Exchange.  There is truth to this of course, but it isn't a good basis of comparison because every major array vendor has happy Exchange customers.  However, its reasonable to say that these installations can't know what they don't know. 

Karl cannot accept the notion that the value NetApp brings to the market is real. He first tried to prove that Exchange cannot run on NetApp:

The problems we are talking about here are the core of WAFL, and are clearly not easy to fix - or they would be already fixed.  NetApp is not unique is having problems of course, all array vendors have their strong and weak points.  But to assert that WAFL has no weaknesses around fragmentation, performance, and capacity utilization defies common sense.  The old wounds are there for a reason.

Karl then had to backpedal and admit:

I'm not saying you can't run Exchange successfully with NetApp.  In fact I'm sure you can.  The question looking for an answer is whether the user gets good value in choosing NetApp to run Exchange.

So reading Pat Cimprich's latest response was particularly gratifying.

Look, the FUD that HP is currently throwing out there is the same age-old FUD that EMC has been throwing out for a while.

After 10 years of watching an artiste like EMC spread FUD, watching this ham-fisted attempt at spreading the same FUD is, well, embarrassing. And having to respond to it is, well, irritating.

© NetApp, Inc.  |  "Safe Harbor" Statement