In this blog entry, I will articulate the key technical trends that will drive the
architectures of future storage management software. I have been thinking about this
topic for the past few days and so I decided to post my thoughts. Some of the key
trends that will influence storage management architectures are:
* Application Driven Management: Traditionally, server, storage, fabric and
application management tasks have been treated separately. However, going
forward, most of these management tasks will be driven by business requirements
via application level policies. Application vendors (like Oracle Automatic
Storage Management), hypervisor vendors (like VMware VirtualCenter) and
server vendors (like IBM Director) are starting to provide storage management as
an integral part of application, hypervisor and server management respectively.
Thus, it is imperative for storage vendors to ensure that their management stacks
interoperate (via well-defined APIs) and also integrate (GUI, database, agents)
with these different management stacks.
* End-to-End SLA Management/Policy-Based Management/Autonomic
Management: With an increase in a) the number of devices in the data center, b)
the number of tunable knobs on hardware and software resources, c) the number
of competing applications with different workload characteristics sharing the
storage infrastructure, and d) the number of hardware and software layers between
the application server and the storage controller, manual planning and manual
SLA enforcement have become tedious and error-prone processes. Thus, there is
a need for both proactive policy-based planning tools (for provisioning, disaster
recovery setup etc.) and reactive SLA enforcement tools (like automatic migration,
workload throttling etc.). However, it is important to still keep the administrator in
the loop. That is, plans should be presented to administrators, and they should be
given the opportunity to override or change them. It is also important to provide
end-to-end policy-based management. For example, a policy-based provisioning
tool should generate plans for host, fabric and storage resources. That is, figuring
out which storage controller or storage pool to use is just part of the required
solution. The generated plan should determine the number of paths in the fabric,
how the devices should be zoned, the number of switch hops etc. Finally, with
the emergence of cloud computing, SLA management has become an
even more important problem because customers using the cloud want SLA
guarantees.
* Dynamic Data Centers: The state of a data center is constantly changing
because of the constantly changing business (application) requirements. That is,
changes in the number of users, in capacity and performance requirements,
and in security and availability requirements result in a data center that is
in a constant state of flux. In general, in order to have a dynamic data center one
needs to have sophisticated change management tools. In complex data center
environments, it is wise to proactively perform “What-if” analysis before
actually making the changes. Reactive problem debugging, after making a
change, is difficult and time-consuming. Thus, there is a need for proactive
planning and analysis tools.
* Increase in scale with respect to data centers, devices, management objects
and meta-data: The amount of data getting digitized and stored persistently is
increasing at a very fast pace. Furthermore, the number of copies for a data item is
also increasing. Finally, due to compliance requirements, and for sentimental
reasons, data items are being kept around for longer periods of time. All of these
factors are contributing towards an increase in the number of objects being
managed (in the range of billions of objects) and also towards an increase in the
number of devices. Furthermore, most enterprises have multiple data centers for
disaster recovery and latency reduction reasons. In addition, the amount of meta-
data associated with an object is also increasing because of system, inferred and
user-provided meta-data. Thus, it is important to have a scalable management server
design (either a monolithic meta-data server or a federation of meta-data servers)
that is able to handle billions of objects and thousands of devices both with
respect to GUI design as well as with respect to the data store design. The data
management architecture also needs to be able to perform planning operations
across multiple data centers. It is also very important to revisit the notion of
sending meta-data from storage devices to the meta-data server because this data
shipping approach will not scale.
* Primary Storage not at the Storage Controller: Storage at the host (direct-attached
storage) is being advocated by application vendors (like Microsoft and Oracle)
and is also being leveraged by Web 2.0 companies which want to scale CPU,
memory and disks together (co-locate applications and storage). DAS
architectures will become more prevalent as flash gains popularity.
With the increased capacity of HDDs and SSDs, a large portion of the application
working set will fit at the hosts. Thus, primary storage will now exist at hosts.
With primary storage being present at hosts, one can envision a) a peer-to-peer
DAS architecture or b) a client-server architecture where the DAS box is the
client and a storage controller is the backup box or c) a combination of both types
of architectures. The management software provider needs to determine whether
to pursue a distributed (where there is a management server on each host) or a
centralized (where management meta-data is shipped to a centralized management
server) management architecture. Furthermore, it is not clear whether host-side
storage should be managed by application management software or storage
management software or both. It is important to assess the benefits of each of
these different alternatives.
* Storage Management as a Service: There is a limit on the amount of storage that
can be managed by a single storage administrator. This upper bound is improving
with the use of storage management tools. However, it is still not keeping pace
with the enterprise data consumption rates. Thus, enterprises are constantly in
search of experienced storage administrators. Furthermore, not all storage
management tasks are performed at the same frequency. For example,
capacity planning tasks are performed at a lower frequency than data provisioning
tasks. Hence, it makes sense for an organization to outsource either all or some of
the storage management tasks. In many cases, the vendor providing storage
management services will not have storage management personnel dedicated to
just one customer. Instead, they would like to either remotely access the storage
management meta-data, or ship the storage management meta-data from the
customer site to their site, in order to perform the management tasks. Storage
management solution vendors need to ensure that their management solutions
handle the security and scalability issues of remote storage management.
* Heterogeneous Management: In order to ensure that they don’t get locked into
a single vendor, data center operators will, in the future, almost certainly have
storage boxes from multiple storage vendors. Data center operators do not like
using many different storage management tools to manage the storage from the
different vendors. If a single piece of management software can manage storage
from the different vendors, that software will become a key control point. SMI-S
standards are still evolving. They provide many basic profiles that help to perform
basic monitoring and control operations. However, many of the vendors have
their own extensions to the basic SMI-S profile for the various resources, and
many of the advanced features (different types of copy services operations) are
not available via SMI-S interfaces. Thus, there is a lot of effort required to build a
heterogeneous management framework.
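
To make the end-to-end, policy-based provisioning point above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names Policy, Pool, Plan, generate_plan, what_if and review, and all of their fields, are hypothetical and do not come from any product or from SMI-S. The sketch picks a storage pool that satisfies the application policy, records the fabric side of the plan (paths, hops, zone), projects the effect of the plan without changing anything, and lets the administrator override the result.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Policy:
    """Application-level requirements (hypothetical fields)."""
    capacity_gb: int
    min_paths: int
    max_hops: int


@dataclass(frozen=True)
class Pool:
    """A storage pool as seen from one host (hypothetical model)."""
    name: str
    free_gb: int
    paths_to_host: int  # usable fabric paths between the host and the pool
    hops: int           # switch hops on those paths


@dataclass(frozen=True)
class Plan:
    """An end-to-end plan: storage choice plus fabric settings."""
    pool: str
    paths: int
    hops: int
    zone: str


def generate_plan(host: str, policy: Policy, pools: List[Pool]) -> Optional[Plan]:
    """Return a plan that meets the policy, or None if no pool qualifies."""
    candidates = [
        p for p in pools
        if p.free_gb >= policy.capacity_gb
        and p.paths_to_host >= policy.min_paths
        and p.hops <= policy.max_hops
    ]
    if not candidates:
        return None
    # Assumption: prefer the qualifying pool with the most free capacity.
    best = max(candidates, key=lambda p: p.free_gb)
    return Plan(pool=best.name, paths=policy.min_paths, hops=best.hops,
                zone=f"zone_{host}_{best.name}")


def what_if(plan: Plan, policy: Policy, pools: List[Pool]) -> Dict[str, int]:
    """Project free capacity per pool if the plan were applied (no side effects)."""
    return {
        p.name: p.free_gb - (policy.capacity_gb if p.name == plan.pool else 0)
        for p in pools
    }


def review(plan: Plan, **overrides) -> Plan:
    """Administrator in the loop: accept the plan or override selected fields."""
    return replace(plan, **overrides)


if __name__ == "__main__":
    pools = [Pool("pool_a", 500, 2, 1), Pool("pool_b", 2000, 4, 2)]
    policy = Policy(capacity_gb=1000, min_paths=2, max_hops=3)
    plan = generate_plan("host1", policy, pools)
    if plan is not None:
        print(plan)
        print(what_if(plan, policy, pools))
        print(review(plan, paths=4))
```

A real tool would obviously need a much richer model (zoning rules, performance models, copy services), but the shape is the same: the plan covers host, fabric and storage together, the what-if step is read-only, and the administrator has the final word.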

Problem 1; there are more knobs than dials on a lot of this stuff. We need to get (as an industry) a better class of dial; less technical and more business focussed. Then we need a better class of knob. One knob to one dial. Hey, if you can't measure it, you can't manage it, right?!
Posted by: Alex McDonald | October 24, 2008 at 01:21 AM