NetApp today announced product integration with CommVault, Quest and Mimosa in the archival space. What makes this announcement interesting is that it demonstrates the value of a unified storage architecture. Earlier this year, I defined the properties of such an architecture, but never really articulated the customer value. That's what I hope to do today.
So what is an archive?
Ask people what an archive is, and you get a hundred different answers.
In practice, an archive is whatever you want it to be. But there are a couple of key properties that are meaningful and relevant to this discussion.
- An archive contains the last remaining copy of the data.
- Finding data within an archive must be fast enough.
- Data in an archive will survive for a very long time.
- The probability of accessing any bit of data in an archive is very low, but the archive as a whole will be routinely accessed.
- Getting the data out of the archive into a format that can be accessed is very important.
- An archive must survive operational, hardware and site failures.
Two kinds of archive: passive and active
There are really two kinds of archive: passive and active. A passive archive is one that is created through the pre-existing backup process. As I observed in an earlier post, the backup process is used to create archives by cranking up the retention period. In a tape world, where the cost of storing tape is very low, this has been a remarkably successful technology, at the expense of large amounts of wasted space.
However, the time to find data and the time to recover data push IT architects to look at disk for some or all of their archive needs.
But the old model of archiving everything just doesn't work. If you archive everything forever, you end up retaining lots of data, and that costs money: the disks have to keep spinning, and even if they can be spun down, they still consume rack space.
So we have seen the emergence of active archives, like CommVault, Quest and Mimosa, where specific pieces of data are moved from wherever they are to the archive. In this kind of archive, the move is either a cut and paste or a copy and paste of the data. The choice between the two is not meaningful for the rest of this discussion.
Architecture of an archive solution
At its core, every archive solution has four basic components: an application you are archiving, a database, a content index and a repository. Looking at the storage, the application, database and content index require fast performance, whereas the repository requires only good-enough performance but, more importantly, must be affordable and reliable. Some archives will also require compliance-based storage to meet regulatory requirements.
A picture of such a solution would look like this:
Each colored box represents a specific class of storage that needs to be deployed and managed.
But in addition, each class of storage must be protected against site failures, so the real picture looks like this:
And what the picture does not show is the complex co-ordination required to ensure that the archive and application are available on the DR site.
The challenges of a traditional storage architecture
Looking at the last picture, it's pretty clear that a fairly complex storage infrastructure must be created.
Using traditional schemes, you either rely on a single architecture that is stretched across all of your tiers, or bite the complexity bullet of managing four distinct tiers with four distinct management paradigms and four distinct replication schemes.
But even if you could bite off the four architectures, such an approach has a hidden cost called over-provisioning. If the underlying disk cannot be effectively load balanced across different uses, then every time you guess wrong about how much a tier needs, disk sits idle, consuming power and cooling. Any unused disk represents lost money. Given the cost of disk and the rate at which disk capacity grows, buying more disk than you need is horribly inefficient.
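To make the over-provisioning cost concrete, here is a minimal sketch that adds up the idle capacity when each tier is bought separately against a forecast. The tier names follow the four components described earlier; every capacity figure is a made-up number for illustration, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    provisioned_tb: float  # capacity bought up front, based on a forecast
    used_tb: float         # capacity actually consumed


def stranded_capacity(tiers):
    """Return idle TB per tier. Idle space in one silo cannot help another."""
    return {t.name: t.provisioned_tb - t.used_tb for t in tiers}


# Hypothetical numbers, for illustration only.
tiers = [
    Tier("application", 10.0, 9.0),
    Tier("database", 5.0, 2.0),
    Tier("content index", 5.0, 4.5),
    Tier("repository", 100.0, 60.0),
]

idle = stranded_capacity(tiers)
total_idle = sum(idle.values())
total_bought = sum(t.provisioned_tb for t in tiers)

for name, tb in idle.items():
    print(f"{name}: {tb:.1f} TB idle")
print(f"total: {total_idle:.1f} TB idle of {total_bought:.1f} TB bought")
```

The point of the sketch is only that the idle terabytes are locked inside their own tier: spare space in the repository cannot absorb an unexpected shortfall in the database tier.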
Unified Storage Architecture
NetApp, on the other hand, has a different answer. Our unified storage architecture delivers the benefits of right-sized performance and availability and cost to your archive infrastructure.
The picture below shows a single controller for clarity, but you could imagine multiple controllers if you needed additional performance or capacity.
The operational simplicity of a single storage infrastructure with a single management paradigm is obvious.
What's not so obvious is that there is a real cost advantage to this scheme. Through the power of things like thin provisioning, flexible volumes and the ability to select which datasets are or are not deduplicated, the underlying space can be effectively load balanced across the different uses. As a result, the amount of storage required can be right-sized rather than over-provisioned.
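As a rough illustration of why a shared pool can be right-sized, the sketch below compares buying disk per tier against buying one thin-provisioned pool. It is a simplified model, not a description of how any NetApp feature is implemented: the forecasts, the usage figures and the 20% headroom are all assumptions chosen for the example, and deduplication is ignored.

```python
def siloed_purchase(forecast_tb):
    """Each tier gets its own disks, sized to its own forecast."""
    return sum(forecast_tb.values())


def pooled_purchase(used_tb, headroom=0.2):
    """One shared pool: physical capacity covers actual use plus headroom.

    In this model volumes are thin provisioned, so a volume's logical size
    (its forecast) reserves no physical space until data is written.
    """
    return sum(used_tb.values()) * (1 + headroom)


# Hypothetical figures in TB, for illustration only.
forecast_tb = {"application": 10.0, "database": 5.0,
               "content index": 5.0, "repository": 100.0}
used_tb = {"application": 9.0, "database": 2.0,
           "content index": 4.5, "repository": 60.0}

siloed = siloed_purchase(forecast_tb)
pooled = pooled_purchase(used_tb)

print(f"siloed purchase: {siloed:.1f} TB")
print(f"pooled purchase: {pooled:.1f} TB")
print(f"difference:      {siloed - pooled:.1f} TB")
```

The headroom is a single buffer shared by all four uses, so a wrong guess about one tier is absorbed by the pool instead of requiring a separate safety margin per tier.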
And what about data protection?
The real power of a unified storage architecture, however, comes into play when you consider the operational challenges of protecting the archive.
Consider this archive application which is running:
At some point in time you want to replicate the archive to a remote site. Using our SnapMirror technology, a single mechanism and a single management process create the remote replica:
And what we don't show here is that the remote archive can be made up of different hardware to further reduce the cost of the overall solution.
This approach of replicating a set of snapshots as a cascade is, you guessed it, called cascading snapshots ...