April 30, 2009

My Storage Systems TextBook

Three years ago I was asked to teach a graduate course on Storage Systems at University of California as a visiting faculty member. As I was preparing the curriculum, I realized that there was no good textbook on storage systems. By Storage Systems I mean the area below a database system. There are many textbooks on database systems, communication systems, programming languages and operating systems, but in my opinion there is no good textbook on storage systems. So, I planned the course so that each week's lectures would cover the content of one chapter of such a textbook. Here is an outline that can be used by prospective authors of a storage systems textbook:

Storage Architectures (block storage, file storage, object storage, SAN file systems, clustered systems): These lectures discussed the different types of storage architectures. We also discussed the trade-offs between these different types of architectures.

Storage Devices (Storage Controllers, Disks, Tapes, SSDs, Optical Devices): These lectures described the architectural and operational details of these different devices (and the variants in each device type). We also discussed the relative tradeoffs between these different device types.

File Systems: These lectures first dealt with the basics of a file system, and then discussed the design choices made by different file systems such as GPFS, GFS, WAFL, EXT3 etc.

Storage Protocols (parallel SCSI, FCP, iSCSI, SATA, SAS, NFS, CIFS, pNFS, different RDMA protocols, WebDAV): These lectures dealt with different types of storage protocols. Once again we discussed the trade-offs between the different types of protocols.

Storage Protection Mechanisms (RAID algorithms, checksum techniques, Redundancy components in the controller, scrubbing techniques, DR services, long term data preservation): In these lectures we covered the different ways to protect oneself from disk failures, head failures, site failures, and also different ways to detect and correct data corruption.

Storage Efficiency Techniques (De-duplication, Compression, Thin Provisioning): I did not have time to cover this topic. However, storage efficiency has become an extremely important area of research.

Storage Management: In these lectures we covered the basic framework for monitoring, analyzing, planning and executing storage management tasks such as provisioning, backup/recovery, performance management etc. We also dealt with different types of storage virtualization techniques, and also storage management within the context of server virtualization at the host.

Storage Security/privacy: These lectures dealt with on-disk, on-wire, access control, authentication, provenance and trust issues. We also discussed the different types of possible threats and how to protect against them.

Storage Power Management: These lectures dealt with power management metrics, proactive (high density disks, efficient power sources, compression, de-dup) and reactive power management techniques (disks spin-down, shutdown), and also briefly dealt with data center cooling techniques.

Performance Enhancement (Caching, Pre-fetching, Log Structured file system, Data Layout, I/O scheduling): These lectures discussed different techniques for improving the overall I/O performance.

Workload Classification: I did not have time to cover this topic but the goal is to discuss the various types of workloads (HPC, DSS, OLTP, Archival etc) and their impact on the design of the storage systems with respect to performance.

So, as you can see, a lot was covered in that class. Most importantly, the students enjoyed the class (they gave it a positive evaluation), and many of them are doing very well, both working at good companies and publishing papers at reputed conferences as part of their PhD research. I am interested in getting feedback on what other topics should be covered if I write this book.


Storage Architectures for Clouds: One Size Does Not Fit All

 

Cloud computing is definitely here to stay. The key premise is that cloud providers can build large data centers cheaply and then rent compute and storage space out to small and medium sized businesses much more cheaply than what those businesses would have paid if they had built the data centers on their own. Moreover, the cloud providers are making money while helping the small and medium sized companies. Thus, it seems like a win-win situation.

There are different types of cloud providers like infra-structure cloud providers (Amazon, Google), application cloud providers (Salesforce), storage cloud providers (Amazon, Nirvanix) etc. So, the key question is: is there a one-size-fits-all type of back end cloud storage architecture for all of these different types of clouds? Is it cost effective for the cloud provider to always use a shared-nothing architecture (where storage is directly attached to each of the nodes) for content depot type of applications? Is it always cost effective for the cloud provider to keep 3 copies of every data item? There are certain non-technical reasons that have clouded (no pun intended) the thinking process and I would like to first articulate them before providing some technical analysis.

· Don’t use Google and Amazon’s technical solutions as the reference architecture: One thing I have noticed is that many other companies are trying to mimic the Google file system architecture. It is important to note that Google and Amazon’s business models give them the luxury of storing 3 copies of data. Similarly, since they have high computation needs of their own for their internal map-reduce applications, they are able to leverage that and have fewer disks per CPU in the nodes of their clusters.

· High cost does not mean bad architecture: The higher cost of the solutions being offered by traditional storage controller vendors does not mean that their storage architecture is flawed. The business models of these companies prevent them from slashing the prices of their boxes, but fundamentally these storage boxes are also made out of commodity components, and thus their COGS (cost of goods sold) is not that high. Thus, many of the cloud providers are getting away with storing 3 copies of data, and in some cases having few disks behind underutilized CPUs, because the total cost of the solution is still less than what they would have to pay to a storage controller vendor.
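The cost argument in the two points above can be made concrete with a little arithmetic. The following is only a sketch under stated assumptions: the prices per raw TB and the 8+2 parity group are hypothetical numbers chosen for illustration, not figures from any vendor or cloud provider.

```python
def raw_per_usable_replication(copies: int) -> float:
    """Raw bytes stored per usable byte when keeping `copies` full replicas."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def raw_per_usable_parity(data_disks: int, parity_disks: int) -> float:
    """Raw bytes stored per usable byte for a parity group (RAID-6 style)."""
    if data_disks < 1 or parity_disks < 0:
        raise ValueError("need at least one data disk and non-negative parity")
    return (data_disks + parity_disks) / data_disks


def cost_per_usable_tb(price_per_raw_tb: float, raw_per_usable: float) -> float:
    """Price paid for one usable TB, given the price of one raw TB."""
    return price_per_raw_tb * raw_per_usable


# Hypothetical prices: commodity nodes at 100 per raw TB with 3 copies,
# controller storage at 400 per raw TB with an 8+2 parity group.
commodity = cost_per_usable_tb(100.0, raw_per_usable_replication(3))
controller = cost_per_usable_tb(400.0, raw_per_usable_parity(8, 2))

# Raw price at which the controller would match the 3-copy commodity setup.
break_even_raw_price = commodity / raw_per_usable_parity(8, 2)
```

With these made-up numbers, three copies on commodity nodes cost 300 per usable TB against 500 for the controller, even though the controller needs only 1.25 raw TB per usable TB. The controller becomes the cheaper option once its raw price drops below 240 per TB, which is a pricing question rather than an architectural one.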

So, from a technical standpoint (with respect to space efficiency, CPU utilization etc), I don’t think one architecture is optimal for all types of workloads and cloud deployments. Let me briefly articulate the different types of requirements:

· Map-Reduce Applications: In these applications the CPU processing and memory needs of an application scale evenly along with the storage needs of the application. Thus, a shared-nothing architecture (with light-weight nodes with respect to the number of disks attached to a node) makes sense because the CPUs are being kept busy, and also because having the storage local to the processing node cuts down on network utilization.

· Content Depot Applications: In these applications objects (content) are stored in repositories. There are archival content depots (write once and read maybe) and active content depots (write once and read frequently). These applications warrant heavy-weight storage nodes. That is, there are thousands of disks behind a CPU complex to keep the CPU busy.

Thus, there is no single type of architecture that is appropriate for both of these workload types. Even with the emergence of flash, the light-weight shared-nothing cluster model is not useful if the application is not IOPS intensive. That is, flash is a good replacement for disks if the workload is random-IOPS intensive. Otherwise, if one just wants capacity for content depots, then heavy-weight cluster nodes with thousands of disks are still the way to go.
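To illustrate why the two workload types call for different node shapes, here is a rough sketch of CPU utilization as a function of disks per node. The per-disk CPU demand values are hypothetical (one full core per disk for a map-reduce style workload, 1/256 of a core per disk for a content depot); real numbers would have to be measured.

```python
def cpu_utilization(disks: int, cores_needed_per_disk: float, node_cores: int) -> float:
    """Fraction of a node's CPU kept busy by the data on its attached disks.

    `cores_needed_per_disk` is an assumed, workload-specific figure: how much
    CPU the application needs for the data held on one disk.
    """
    if disks < 0 or cores_needed_per_disk < 0 or node_cores < 1:
        raise ValueError("invalid node description")
    demand = disks * cores_needed_per_disk
    return min(1.0, demand / node_cores)


# Light-weight node (8 disks, 8 cores) running a map-reduce style workload.
map_reduce_light = cpu_utilization(8, 1.0, 8)

# The same light-weight node used as a content depot: the CPU sits idle.
depot_light = cpu_utilization(8, 1 / 256, 8)

# Heavy-weight node (2048 disks, 8 cores) used as a content depot.
depot_heavy = cpu_utilization(2048, 1 / 256, 8)
```

Under these assumptions the light-weight node is fully busy for map-reduce, but its CPU is well under 1% utilized as a content depot, and it takes a couple of thousand disks behind the same CPU complex to keep it busy.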

In conclusion, I have the following different technical recommendations for different folks:

· Cloud Providers: At the end of the day, if you use a single type of cloud storage architecture for all different types of workloads, then you will have inefficient deployments. There will be other hungrier cloud providers who will be willing to have different types of storage deployments for different workloads to gain better resource efficiencies, and thus, will be able to provide better pricing to their customers.

· Storage Vendors: It is important for storage vendors to be adaptive and provide storage solutions for both types of workloads. They need to provide cheap and deep solutions for content depot type of workloads, and they need to provide a distributed shared-nothing storage solution that can scale to thousands of nodes for the map-reduce type of environments (if they want to be the provider for those types of applications also).


February 28, 2009

2nd Annual NetApp University Day

The second annual NetApp University Day on February 24, 2009 was a big hit. Twenty professors from top-notch systems schools across the US and Canada visited NetApp. We had professors from Harvard, Duke, Wisconsin-Madison, CMU, Michigan, Stony Brook, UC Berkeley, UC Santa Cruz, UC San Diego, Waterloo, Toronto (student), Tennessee, Georgia Tech, Brown, UIUC, University of Illinois Chicago, Johns Hopkins, and Cornell attend this event. Some professors from MIT and Stanford also wanted to attend but could not due to scheduling conflicts. There were technical presentations by Steve Kleiman (Chief Scientist, NetApp) and by the CTO and Vice President of Engineering of a cloud provider (not naming them for privacy reasons). There were lively discussions on topics such as a) Flash, b) Cloud Computing, c) Virtualization and d) Storage Management. A lot of ideas were exchanged in the room and everyone learned something new. Scott Dawkins (VP, NetApp Advanced Technology Group) also gave a presentation on the NetApp university funding/relationship model. From NetApp's standpoint it was a great opportunity to get feedback from the professors about our university relationship model and our research direction. By the way, a byproduct of NetApp's university relationships is the joint publication of 3 papers with university co-authors at this year's FAST conference. Moreover, one of these papers won the best paper award at FAST 2009. Finally, whether or not we hire them or they accept our offers, most of the top graduate students from the above universities at least interview at NetApp.

Cloud Standards

I was asked by Alan Yoder to sub in as a co-chair at the 2009 FAST conference SNIA BOF on cloud computing. Mark Carlson from Sun was the primary chairperson at this event. After this event, here are some things I am thinking about:

· Involve non-traditional big boys: Any storage standardization effort without participation from non-traditional storage vendors like Google, VMWare and Amazon will not be that successful. Thus, it is important to actively lobby them to join SNIA and to take their input.

· Other Standards Initiatives: Any standard by SNIA has to work with the following other standards: a) OVF, a virtualization standard, b) SMI-S, a storage management standard, c) the Cloud Computing Interoperability Forum (CCIF), and other server and network management standards.

· Don’t standardize everything: It is not the right time to standardize everything. For example, the basic data access protocols from Amazon S3 have become the de facto data access standard that others are copying. However, other features like policy management, data auditing etc are not yet ready for standardization. That is, innovation should not be stifled by trying to prematurely standardize these features. [Mind you, policy management notions have been around for a while, but the previous efforts by SNIA in this area have not been successful].

· Government Regulations: Standards and technology by themselves will not be enough to bring order into the cloud space. Some government regulations are also necessary to protect customers with respect to providers going out of business, what the providers do with the customer data, and where and how they store it.

· End-End Standards: Ultimately the cloud storage related standards that SNIA comes up with need to interoperate with the overall server and network related cloud standards. For example, in some cases customers want to move their entire infra-structure from one data center to another data center (not just storage).

· Taxonomy: SNIA is correctly aiming to first come up with an agreed upon cloud storage taxonomy before trying to standardize different features. For example, there are different types of clouds like a) storage as a service clouds like Amazon S3, b) application as a service clouds (where the application is also provided by the cloud provider) like Salesforce.com, and c) infra-structure provider clouds like Amazon EC2. There are public clouds as well as private clouds.

February 04, 2009

Why NetApp is Number 1 for a Storage Researcher!!

You have all probably heard by now that NetApp has been selected as the best place to work in America by Fortune Magazine in 2009. This selection is primarily based on the feedback received from NetApp employees. Now, let me give you my perspective on why I think NetApp is the best company to work for in North America. I work in the Advanced Technology Group (ATG) at NetApp. Our group is part of the CTO office and it consists primarily of PhD and MSc graduates from top universities/research labs in the world. My perspective should be relevant for aspiring Masters and PhD graduate students who want an industrial research lab job. So, here are the top reasons why I think NetApp is number 1:

· Get to meet with real customers on a regular basis: Every ATG member gets numerous opportunities to interact with real customers. This is a great opportunity for us to learn about real customer problems. Thus, it is not difficult to find real customer problems to work on. Trust me, this also makes for great problem motivation when writing papers.

· Opportunity to work on a diverse range of problems in the systems area: Many students have the misconception that in a purely storage company they will not have a diverse range of problems to work on. We work on problems in areas such as storage management, how to leverage new hardware technologies, coding theory, new emerging architectures, and data mining algorithms. You name the emerging technical area (virtualization, Web 2.0, new memory technologies etc), and we are working on it.

· Opportunity to work with product groups and make product impact: Unlike some other research labs, the work you do has a very good chance of ending up in products. Ultimately, there is no greater feeling than seeing your work being used by actual customers. You get guidance and feedback from product group architects on a regular basis. Usually, you solve a problem and build a prototype. Subsequently, the product group leverages your experience and they, in turn, build products or make changes.

· Support for publishing in top conferences: NetApp employees get an opportunity to publish both in an internal journal and at top conferences. Our employees are collaborating with many professors from top notch US universities. We also have an excellent summer internship program where we get top students from top universities. NetApp has an excellent publication track record at reputed top conferences like FAST and USENIX.

· Opportunity to interact with professors and attend university retreats and conferences: NetApp employees get opportunities to attend university retreats and leading conferences. NetApp also hosts an annual University Day where top systems professors from across the world come to NetApp and meet with NetApp employees.

· ATG labs in multiple locations: We have labs in Boston, Raleigh, Sunnyvale California, and Bangalore India. Thus, people have the opportunity to work in a geographic area of their choice.

· Proactive management and top notch colleagues: Initially, I was very nervous about leaving IBM research labs after working there for many years and moving to NetApp. But believe me, NetApp management is very proactive and pragmatic. They work very hard to make sure that their employees are happy and productive. Last but most importantly, my co-workers are very positive and helpful. The atmosphere is very conducive to stimulating new ideas (a lot of patents are being filed by our team members) and to building complex prototypes.

January 06, 2009

The Birth of NetApp Blue Report

Our NetApp Advanced Technology Group has recently generated documents called Blue Reports that contain technical 1) background, 2) analysis and 3) guidance information on many emerging technology trends and threats. Many co-workers from different groups at NetApp have asked me about the thinking or rationale behind these blue reports. So, in this blog I will articulate the thinking behind these reports. I will first list the problems we are trying to solve and will then briefly describe the process and how we are trying to tackle these problems. The NetApp Blue Report and the process for generating it are trying to solve the following issues (in no particular order):

• Empower researchers: The one thing most PhD grads hate is being told what to work on. This is a common phenomenon that I have seen at multiple research labs. Therefore, it is very important to let researchers have a say in figuring out the project they should work on.

• Bring focus and prioritize research areas: One of the common problems faced by managers in research labs is that their employees want to work on all sorts of things. Many times it is very difficult to get synergy and momentum amongst the different projects. It is desirable to work on a cohesive set of projects that can build upon each other. Furthermore, it is also necessary to quickly prune ideas without investing too much time.

• Early involvement and better communication with product groups: One of the major problems that researchers in industrial labs face is getting buy-in from product group architects. The architects feel that the researchers do not have a good understanding of the problem area, and the researchers feel that the product architects do not have an appreciation for the novelty of the idea.

• Early dissemination of analysis/guidance information of an emerging technical trend: Nowadays, new ideas are discussed in blogs, technical conferences, customer visits, and during interactions with product architects. Most researchers spend a lot of time understanding the customer problems and technical trends, and then doing a survey of related literature, analyzing the technical trend or problem, and then proposing new project ideas. This information is very valuable for executives and other employees within a company to get a good technical understanding of an emerging area and also its impact on the company.

Briefly speaking, the blue report process consists of identifying a few new themes. The themes are identified by company leaders, researchers, and product architects based on emerging technology trends, competitor moves and customer problems. The notion of themes helps to bring focus with respect to what the group should work on. A set of researchers and relevant product group architects together form a theme team for each of the themes. This helps both to empower researchers and to get early involvement by key product architects. Each theme team concretely defines the theme and its scope, does a background survey of the work done in that area, and analyzes what the key problems are and how the theme affects the company. Finally, the theme team also proposes a prioritized set of project ideas. The output of the theme team’s work is a blue report. This report can be read by others in the company to get a better technical understanding of an emerging technical area/threat. After the blue reports are generated, research management prioritizes between the different themes and then allots resources to the top priority themes. Priority is determined based on company priorities, threat level, employee skill set etc. Generating a blue report typically takes around 3 months, and one can update it periodically, once every six months. Finally, I called it a Blue Report because I wanted a catchy, non-technical name. Since NetApp’s new logo is blue in color, I thought of calling this report a Blue Report.

Re-emergence of DAS Model (Part 2)

As I mentioned in the previous part of this blog, the primary-storage-at-host (DAS) model is definitely going to get more traction. This second part articulates the different models for storing the second copy of data. A second copy of data is required for availability (to overcome machine failures, site failures and data corruption), performance (load-balancing) and potentially for data integrity verification purposes. In this blog entry I am not focusing on the performance reasons for having the second copy. The two prevalent models analyzed here are the peer-peer and host-storage controller models. In the peer to peer model, primary storage is backed up at other peer nodes, whereas in the host-storage controller model, the primary host storage is backed up at the storage controller. A peer is a light node with only 4 to 8 disks and it runs application code, whereas a storage controller can support hundreds of disks and usually does not run application software. One could easily replace a storage controller with a storage cloud infra-structure. Furthermore, one could potentially also use tapes instead of disk based storage to store the second copy.

Now, I will articulate which choice for the second copy of the data makes sense under which circumstances:

·        Case for Peer-Peer: For Map-Reduce applications, there is a need for a lot of processing nodes. Nowadays, in commodity configurations, most of these nodes also come packaged with some storage. If this storage is not being fully utilized for primary storage, then the peers can use the available space to store other peers’ backup data. If the CPU/Memory requirements of the application don’t scale, then pursuing a peer to peer approach just for storing backup data will not be cost effective because of the low CPU/Memory utilization at the nodes. Similarly, if the cost of the storage media at the hosts is higher than the cost of the non-host storage media, then it will also not make economic sense to store the second copy at the host.

·        Case for Peer-Peer: Some companies have argued that a peer to peer setup consisting of commodity parts is more cost effective than using storage controllers. In some cases they have argued that it is cheaper to even have multiple copies of data in the peer to peer setup, rather than storing data in a storage controller. This is primarily the case if one is storing their data in Tier-1 storage controllers with full hardware redundancy and specialized non-commodity hardware. Furthermore, there is a difference in the cost to create a storage controller and the price the vendor charges. Thus, if a vendor uses commodity hardware components, then the vendor has the flexibility of reducing the price markup. In some cases due to their business models storage controller vendors will not be able to reduce the price.

·        Case for Storage Controller: If cost is really the issue, one could create a storage controller using commodity parts. That is, there are advantages to having a higher number of disks behind a CPU complex (as in a storage controller) than is the case in a host, because this prevents low utilization of the CPU resources.

·        Case for Storage Controller: One could argue that one can optimize the data layout format in a storage controller to focus on storage efficiency instead of performance. Furthermore, one can focus on operational efficiency by using very dense shelves, and also have the ability to power down/shut down unused disks. It will be very difficult to realize these features in a peer to peer environment where the peers are storing both primary data as well as the second copy.

·        Case for Cloud Storage: For a small to medium size company, the operational efficiency benefits (power, space, storage management) make putting the second copy in a cloud managed by someone else an attractive alternative. Internally, the cloud provider could employ either a peer to peer or a storage controller storage model. However, one has to trade off cost/operational efficiency against the privacy, wide-area network performance and availability concerns associated with cloud storage. Many organizations are experimenting with putting their archival storage into a cloud. There are other cost related reasons why small to medium organizations would also want to put their primary copy in a cloud (that is a separate topic for discussion).

·        Case for tape based storage: Based on the access pattern and purpose of the second copy, it could potentially be stored on tape. For example, people store backup data on tapes. People also store archival data on tapes when the probability of retrieving it is close to zero. However, disk densities are approaching tape densities, and so, unless the tape cost is significantly lower, disk will be the preferred storage medium for the second copy due to the ability to perform random I/O, and due to the packaging of the disk head with the disk platters (that is, one does not have to worry about whether a particular head can read a particular cartridge).

Thus, as can be seen, people will choose different options based on the situation.
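One way to read the cases above is as a small decision procedure. The sketch below is just my encoding of that reading. The field names, the order in which the cases are checked, and the fall-through to a storage controller are all assumptions made for illustration, not a definitive rule.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical description of a deployment, one flag per case above."""
    cpu_and_memory_scale_with_storage: bool   # e.g. map-reduce workloads
    spare_capacity_on_hosts: bool             # host disks not full of primary data
    host_media_costlier_than_shared: bool     # host storage media is the pricier one
    rarely_read: bool                         # retrieval probability close to zero
    tape_much_cheaper_than_disk: bool
    small_or_medium_company: bool
    accepts_cloud_privacy_and_wan_tradeoffs: bool


def second_copy_target(s: Situation) -> str:
    """Pick a home for the second copy; the ordering of checks is an assumption."""
    if s.rarely_read and s.tape_much_cheaper_than_disk:
        return "tape"
    if (s.cpu_and_memory_scale_with_storage
            and s.spare_capacity_on_hosts
            and not s.host_media_costlier_than_shared):
        return "peer-to-peer"
    if s.small_or_medium_company and s.accepts_cloud_privacy_and_wan_tradeoffs:
        return "cloud storage"
    return "storage controller"


# A map-reduce cluster with spare disk space on its nodes.
cluster = Situation(True, True, False, False, False, False, False)
choice = second_copy_target(cluster)
```

For the example cluster this returns "peer-to-peer". Flipping `spare_capacity_on_hosts` to False makes the same deployment fall through to "storage controller".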

December 01, 2008

Resurgence of Direct Attached Storage Model

Hey, isn’t 2009 just around the corner, and why am I still talking about DAS (direct attached storage)?  The shared storage (NAS/SAN) proponents argued that NAS/SANs were desirable because a) they de-coupled compute purchasing from storage purchasing and b) one could consolidate storage administration for multiple applications by making the applications share the storage infra-structure, and thus, reduce operational costs by having dedicated storage administrators.

In the DAS model, the local storage attached to an application server is only accessible to that particular application server, whereas, in the NAS/SAN model, multiple application servers can access the common storage. Nowadays, the DAS model is making a comeback for the following important reasons:

  • Application Vendors are providing DAS solutions: Many application vendors are encouraging their customers to use direct attached storage (as an appliance) instead of using shared storage to reduce hardware costs. The application vendors are providing replication functionality to overcome box failures.  The idea behind this approach is that the application administrator will do end-end management (both application and storage) of that box.
  • Emergence of Flash Storage: With the emergence of Flash technology, one could potentially have enough flash at the host so that one can fit the entire working set of an application in the flash storage at the host. This will definitely help to cut down on network latency. Furthermore, Flash is beginning to provide a competitive IOPs/Dollar equation.
  • Emergence of Map-Reduce (web-indexing, data mining etc) Applications: The CPU, memory and disk requirements of these applications scale evenly. Therefore, it makes sense for these applications to pursue a DAS model. Some of these architectures pursue an asymmetric meta-data server model (as in the Hadoop File System).
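The IOPs/Dollar point in the flash bullet above can be illustrated with a small sizing sketch. The device capacities, IOPS figures and prices below are hypothetical round numbers, picked only to show how the cheaper medium flips depending on whether the workload is bound by IOPS or by capacity.

```python
import math


def devices_needed(capacity_gb: int, iops: int,
                   dev_capacity_gb: int, dev_iops: int) -> int:
    """Devices required to satisfy both the capacity and the IOPS demand."""
    by_capacity = math.ceil(capacity_gb / dev_capacity_gb)
    by_iops = math.ceil(iops / dev_iops)
    return max(by_capacity, by_iops)


def media_cost(capacity_gb: int, iops: int,
               dev_capacity_gb: int, dev_iops: int, dev_price: int) -> int:
    """Total media price for a workload on one device type."""
    return devices_needed(capacity_gb, iops, dev_capacity_gb, dev_iops) * dev_price


# Hypothetical devices: (capacity in GB, random IOPS, price per device).
DISK = (1000, 200, 100)
FLASH = (100, 20000, 500)

# Random-IOPS-intensive working set: 500 GB at 40,000 IOPS.
hot_on_disk = media_cost(500, 40000, *DISK)
hot_on_flash = media_cost(500, 40000, *FLASH)

# Capacity-oriented content depot: 50,000 GB at 1,000 IOPS.
depot_on_disk = media_cost(50000, 1000, *DISK)
depot_on_flash = media_cost(50000, 1000, *FLASH)
```

With these assumed numbers the hot working set costs 20,000 on disk (200 spindles bought purely for IOPS) versus 2,500 on flash, while the content depot costs 5,000 on disk versus 250,000 on flash.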

Now, some important questions that need to be answered are: 1) should these DAS boxes back up their data onto a shared secondary storage box that provides storage efficiency via de-dup/compression, power savings, search/indexing, disaster recovery etc? Or 2) should these DAS boxes be connected to each other in a peer-peer model and back up their data at other peers? When would one want to use the former approach, when is the latter approach desirable, and when is a combination of the approaches desirable? I will analyze the answers to these questions in my next posting.

November 30, 2008

Storage Vendor Requirements for Cloud Computing

Currently, there is a lot of hype regarding clouds. There are many application level (Salesforce, Google, Oracle), compute processing level (Amazon EC2), and storage level (Amazon S3, Nirvanix) public cloud providers. The public cloud providers are primarily targeting small and medium sized businesses. There are also service companies like IBM that are aspiring to provide private cloud computing solutions to enterprise level customers with multiple locations and data centers. Finally, there are many computer companies like VMWare (VCloud), EMC (Atmos), IBM (BlueCloud) etc that are aspiring to provide hardware and software to public and private cloud providers.

The basic objective of a cloud provider is to provide a web service/SLA based interface to a combination of hardware and software resources. Moreover, the cloud provider provides the necessary management support and is able to dynamically adapt the supply of hardware or software resources based on the user demand. In the past, people have advocated grid/utility computing paradigms that also provided similar benefits. People are trying to analyze the differences/similarities between cloud computing and grid/utility computing, but in our opinion this analysis is very subjective in nature and it does not provide much value-add.

In this blog entry, I will try to list all of the important cloud friendly attributes that need to be provided by a storage vendor. These features can be leveraged by storage clouds, or indirectly via application clouds.

Requirements

Cloud User Requirements

Interface requirements:
  • Non-POSIX Interface
  • Web Services Based Interface
  • Search Capability
  • Ability to attach Meta-Data
  • Transaction Support
  • Partial file loading
  • Basic file I/O commands

Policies/SLAs requirements:
  • Data Availability (protection from various types of failures)
  • Performance
      - Workloads
      - Some applications require fast read throughput
      - Some applications are archival in nature
      - There are minimal read/write or write/write conflicts
  • Reliability
  • Security
  • Spin Down
  • Data Copies and Placement
  • Object De-Dup
  • Object Versioning

Geographic Multi-Site Requirements:
  • Global Namespace
  • Global Policy Scope/Engine
  • Data accessible from anywhere
  • Integration with Edge Caches

Management Requirements:
  • Reporting/Chargeback

Application Level Cloud Requirements on Storage

Application level clouds will be more popular than storage only clouds. Therefore, it is very important to have good integration with applications like Exchange, VMWare, Oracle, SAP.

Interop-Heterogeneity requirements:
  • Leverage/Incorporate existing legacy resources into cloud
  • Clouds containing heterogeneous resources
  • Data Migration to clouds and from clouds

Additional Cloud Provider Requirements

Global Efficiency Requirements:
  • Global Storage Efficiency
  • Global Resource Utilization Efficiency
  • Physical Space Efficiency
  • Power Usage Efficiency

Management Requirements:
  • Change Management
  • Capacity Planner
  • Provisioning
  • Global DR Setup
  • Reporting

Multi-Tenancy (ensuring there are secure partitions for the different tenants)
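To make a few of the cloud user interface requirements listed above more tangible (a non-POSIX put/get interface, attached meta-data, search, partial loading and object versioning), here is a toy in-memory sketch. The class name `ToyObjectStore` and its method signatures are invented for illustration. It is not the API of any real cloud storage service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectVersion:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical key/object interface: no directories, no open/seek/close."""

    def __init__(self) -> None:
        self._objects: Dict[str, List[ObjectVersion]] = {}

    def put(self, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> int:
        """Store a new version of `key` and return its 1-based version number."""
        versions = self._objects.setdefault(key, [])
        versions.append(ObjectVersion(data, dict(metadata or {})))
        return len(versions)

    def get(self, key: str, version: Optional[int] = None,
            start: int = 0, end: Optional[int] = None) -> bytes:
        """Return bytes [start, end) of a version (the latest by default)."""
        versions = self._objects[key]
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key!r} has no version {version}")
        return versions[version - 1].data[start:end]

    def search(self, **wanted: str) -> List[str]:
        """Keys whose latest version carries all of the given meta-data pairs."""
        return sorted(
            key for key, versions in self._objects.items()
            if all(versions[-1].metadata.get(k) == v for k, v in wanted.items())
        )


store = ToyObjectStore()
first = store.put("report", b"hello world", {"owner": "alice"})
second = store.put("report", b"HELLO", {"owner": "bob"})
tail_of_first = store.get("report", version=1, start=6, end=11)  # partial load
```

A real service would add the policy, multi-site and management requirements from the rest of the list, none of which this sketch attempts.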

October 23, 2008

Top Storage Management Challenges

In this blog entry, I will articulate the key technical trends that will drive the
architectures of future storage management software. I have been thinking about this
topic for the past few days and so I decided to post my thoughts. Some of the key
trends that will influence storage management architectures are:

* Application Driven Management: Traditionally, server, storage, fabric and
application management tasks have been treated separately. However, going
forward, most of these management tasks will be driven by business requirements
via application level policies. Application vendors (like Oracle with Automatic
Storage Management), hypervisor vendors (like VMware with VirtualCenter) and
server vendors (like IBM with IBM Director) are starting to provide storage
management as an integral part of application, hypervisor and server management
respectively.
Thus, it is imperative for storage vendors to ensure that their management stacks
interoperate (via well defined APIs) and also integrate (GUI, database, agents)
with these different management stacks. 

* End-to-End SLA Management/Policy-Based Management/Autonomic
Management: With an increase in a) the number of devices in the data center, b)
the number of tunable knobs on hardware and software resources, c) the number
of competing applications with different workload characteristics sharing the
storage infrastructure, and d) the number of hardware and software layers between
the application server and the storage controller, manual planning and manual
SLA enforcement have become tedious and error-prone processes. Thus, there is
a need for both pro-active policy-based planning tools (for provisioning, disaster
recovery setup etc) and reactive SLA enforcement tools (like automatic migration,
workload throttling etc). However, it is important to still keep the administrator in
the loop. That is, plans should be presented to administrators, and they should be
given the opportunity to override or change them. It is also important to provide
end-to-end policy-based management. For example, a policy-based provisioning
tool should generate plans for host, fabric and storage resources. That is, figuring
out which storage controller or storage pool to use is just part of the required
solution. The generated plan should also determine the number of paths in the
fabric, how the devices should be zoned, the number of switch hops etc. Finally,
with the emergence of cloud computing, SLA management has become an
even more important problem because customers using the cloud want SLA
guarantees.

* Dynamic Data Centers: The state of a data center is constantly changing
because of the constantly changing business (application) requirements. That is,
changes in the number of users, in capacity and performance requirements,
and in security and availability requirements result in a data center that is
in a constant state of flux. In general, in order to have a dynamic data center one
needs to have sophisticated change management tools. In complex data center
environments, it is wise to pro-actively perform “What-if” analysis before
actually making the changes. Re-active problem debugging, after making a
change, is difficult and time consuming. Thus, there is a need for pro-active
planning and analysis tools.

* Increase in scale with respect to data centers, devices, management objects
and meta-data: The amount of data getting digitized and stored persistently is
increasing at a very fast pace. Furthermore, the number of copies for a data item is
also increasing. Finally, due to compliance requirements, and for sentimental
reasons, data items are being kept around for longer periods of time. All of these
factors are contributing towards an increase in the number of objects being
managed (in the range of billions of objects) and also towards an increase in the
number of devices. Furthermore, most enterprises have multiple data centers for
disaster recovery and latency reduction reasons. In addition, the amount of meta-
data associated with an object is also increasing because of system, inferred and
user provided meta-data. Thus, it is important to have a scalable management server
design (either a monolithic meta-data server or a federation of meta-data servers)
that is able to handle billions of objects and thousands of devices, both with
respect to GUI design and with respect to data store design. The data
management architecture also needs to be able to perform planning operations
across multiple data centers. It is also very important to re-visit the notion of
sending meta-data from storage devices to the meta-data server because this data
shipping approach will not scale.

* Primary Storage not at the Storage Controller: Storage at the host (direct attached
storage, or DAS) is being advocated by application vendors (like Microsoft, Oracle etc)
and is also being leveraged by Web 2.0 companies which want to scale CPU,
memory and disks together (co-locate applications and storage). The DAS
architecture will become more prevalent as flash gains popularity.
With the increased capacity of HDDs and SSDs, a large portion of the application
working set will fit at the hosts. Thus, primary storage will now exist at hosts.
With primary storage being present at hosts, one can envision a) a peer to peer
DAS architecture or b) a client-server architecture where the DAS box is the
client and a storage controller is the backup box or c) a combination of both types
of architectures. The management software provider needs to determine whether
to pursue a distributed (where there is a management server on each host) or a
centralized (where management meta-data is shipped to a centralized management
server) management architecture. Furthermore, it is not clear whether the host
side storage should be managed by application management software or storage
management software or both. It is important to assess the benefits of each of
these different alternatives.

* Storage Management as a Service: There is a limit on the amount of storage that
can be managed by a single storage administrator. This upper bound is improving
with the use of storage management tools. However, it is still not keeping pace
with the enterprise data consumption rates. Thus, enterprises are constantly in
search of experienced storage administrators. Furthermore, not all storage
management tasks are performed at the same frequency. For example,
capacity planning tasks are performed at a lower frequency than data provisioning
tasks. Hence, it makes sense for an organization to outsource either all or some of
the storage management tasks. In many cases, the vendor providing storage
management services will not have storage management personnel dedicated to
just one customer. Instead, the vendor would like to either remotely access the storage
management meta-data, or ship the storage management meta-data from the
customer site to their site, in order to perform the management tasks. Storage
management solution vendors need to ensure that their management solutions
handle the security and scalability issues of remote storage management.

* Heterogeneous Management: In order to ensure that they don’t get locked into
a single vendor, in the future data center operators will definitely have storage
boxes from multiple storage vendors. Data center operators do not like using many
different storage management tools to manage the storage from the different
vendors. If a single piece of management software can manage storage
from the different vendors, that software will become a key control point. SMI-S
standards are still evolving. They provide many basic profiles that help to perform
basic monitoring and control operations. However, many of the vendors have
their own extensions to the basic SMI-S profile for the various resources, and
many of the advanced features (different types of copy services operations) are
not available via SMI-S interfaces. Thus, there is a lot of effort required to build a
heterogeneous management framework.
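
To make the end-to-end, policy-based provisioning idea from the SLA management item above a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not a description of any shipping product: the `Pool`, `Policy` and `Plan` classes, their fields, the first-fit pool selection and the 80% utilization threshold are all hypothetical. The sketch generates a plan that covers storage (pool), fabric (paths, switch hops, zone) and host, runs a simple "What-if" check against a copy of the current state, and applies nothing until an administrator approves or overrides the plan.

```python
import copy
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Pool:
    name: str
    capacity_gb: int
    used_gb: int
    max_iops: int
    switch_hops: int  # hops between the requesting host and this pool


@dataclass
class Policy:
    capacity_gb: int
    min_iops: int
    min_paths: int = 2        # redundant fabric paths to configure
    max_switch_hops: int = 2


@dataclass
class Plan:
    host: str
    pool: str
    paths: int
    switch_hops: int
    zone: str
    approved: bool = False    # nothing is applied until an administrator approves


def generate_plan(host: str, policy: Policy, pools: List[Pool]) -> Optional[Plan]:
    """Return a plan for the first pool that satisfies the policy, else None."""
    for pool in pools:
        free_gb = pool.capacity_gb - pool.used_gb
        if (free_gb >= policy.capacity_gb
                and pool.max_iops >= policy.min_iops
                and pool.switch_hops <= policy.max_switch_hops):
            return Plan(host=host, pool=pool.name, paths=policy.min_paths,
                        switch_hops=pool.switch_hops,
                        zone=f"zone_{host}_{pool.name}")
    return None


def what_if(plan: Plan, policy: Policy, pools: List[Pool],
            max_utilization: float = 0.8) -> List[str]:
    """Apply the plan to a copy of the pool state and report any problems."""
    trial = {p.name: copy.deepcopy(p) for p in pools}  # live state is untouched
    target = trial[plan.pool]
    target.used_gb += policy.capacity_gb
    utilization = target.used_gb / target.capacity_gb
    warnings = []
    if utilization > max_utilization:
        warnings.append(f"{target.name} would reach {utilization:.0%} utilization")
    return warnings


def review(plan: Plan, **overrides) -> Plan:
    """Administrator step: approve the plan, optionally changing some fields."""
    return replace(plan, approved=True, **overrides)


if __name__ == "__main__":
    pools = [Pool("pool_a", 1000, 900, 5000, 1),
             Pool("pool_b", 2000, 1000, 20000, 2)]
    policy = Policy(capacity_gb=700, min_iops=10000)
    plan = generate_plan("host1", policy, pools)
    print(plan)
    print(what_if(plan, policy, pools))
    print(review(plan, paths=4))
```

A real planner would of course search over many more constraints (zoning rules, multipathing drivers, replication targets), but the shape stays the same: a policy goes in, a complete plan comes out, the plan is analyzed before any change is made, and the administrator has the final word.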
