Since the publication of the Google File System paper at SOSP 2003, there has been a lot of interest among Web 2.0 companies in forgoing the traditional storage controller (server) based NAS and SAN architectures in favor of clusters of servers with direct attached storage at each server node. This storage model can be disruptive for the traditional storage vendors because Google and Amazon are not only using this architecture internally, but are also offering hosted storage services based on it. In this blog, I will try to analyze whether the world is really coming to an end for the traditional storage controller vendors, and what exactly the trade-offs are between traditional storage controller functionality and the commodity server based model being pursued by the Web 2.0 vendors. The Web 2.0 architecture consists of commodity servers, each with either 2 or 4 direct attached SATA disks. They run a clustered file system, and typically also provide an object interface to their storage infrastructure.
Google and other Web 2.0 companies are making the following basic assumptions with respect to their environments:
- The workloads are primarily append only. That is, there are no read/write or write/write locking conflicts, and they do not try to optimize performance for random workloads (like OLTP). Usually, they don’t have to deal with multiple types of workloads.
- Initially, Google had a need for the CPU cycles on the server nodes. However, other Web 2.0 companies are still pursuing the same architecture even if they don’t really have the need for the CPUs because the cost per Gigabyte is much lower than the traditional storage controllers. The Web 2.0 companies are also beginning to increase the server to disk ratio.
- They need a scalable multi-site file system architecture that can support thousands of nodes.
- Different Web 2.0 applications have different levels of replica consistency requirements.
- Many of the Web 2.0 companies prefer an object interface model. That is, they want to put objects into, and get objects from, persistent storage.
- Many of these Web 2.0 applications can tolerate data loss because a) they can always re-create the data and b) they do not charge their customers any money for the storage services, so they employ a “Buyer Beware” policy. Most of the Web 2.0 architectures employ a 3-copy replication model to provide availability. Beyond the original, the first replica is a local copy on a different server, and the second replica is a remote copy. The 3-copy replica mechanism is adequate for ensuring availability. They employ check-summing functionality at the file system level to detect data corruption.
- They employ home-grown storage/infrastructure management tools.
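The put/get object model, the 3-copy replication scheme, and the check-summing described in the list above can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions, not how Google or any other company implements it: the names `Node`, `ReplicatedStore`, and `CorruptObject` are hypothetical, replica placement is fixed by hand, and SHA-256 stands in for whatever checksum a real file system would use.

```python
import hashlib


class CorruptObject(Exception):
    """Raised when stored data no longer matches its checksum."""


class Node:
    """Hypothetical commodity server with direct attached storage."""

    def __init__(self, name, site):
        self.name = name
        self.site = site
        self.blobs = {}  # key -> (data, checksum)

    def write(self, key, data):
        # Store the checksum next to the data so corruption can be detected on read.
        self.blobs[key] = (data, hashlib.sha256(data).hexdigest())

    def read(self, key):
        data, checksum = self.blobs[key]
        if hashlib.sha256(data).hexdigest() != checksum:
            raise CorruptObject(f"checksum mismatch for {key!r} on {self.name}")
        return data


class ReplicatedStore:
    """Object interface that keeps three copies of every object."""

    def __init__(self, primary, local_replica, remote_replica):
        # Assumed placement: original, a copy on a different local server, a remote copy.
        self.nodes = [primary, local_replica, remote_replica]

    def put(self, key, data):
        for node in self.nodes:
            node.write(key, data)

    def get(self, key):
        # Return the first copy that is present and passes its checksum.
        for node in self.nodes:
            try:
                return node.read(key)
            except (KeyError, CorruptObject):
                continue
        raise KeyError(key)
```

In this sketch a read falls through to the next replica when a copy is missing or fails its checksum, which is the availability property the 3-copy model is meant to provide. A real system would also re-replicate the damaged copy, which is omitted here.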
Even though the current storage controllers can satisfy the above set of requirements, the Web 2.0 companies are not using the traditional storage controller boxes for cost reasons. The storage controller companies typically provide the following added functionality that is seemingly not required by the Web 2.0 companies:
- Provide good performance for OLTP type workloads. More importantly, since they are general purpose storage boxes, they have to provide decent performance for a wide variety of workloads. They typically have very large read caches and NVRAM to provide fast write performance.
- Provide support for different NAS/SAN protocol interfaces.
- Provide high availability (it varies from Tier-1 boxes that have, for practical purposes, zero down time, to Tier-2 boxes that also have decent high availability). High end controllers are usually configured in active-active configurations to provide an RTO/RPO of zero. The 3-copy availability model could potentially provide the same level of availability. However, the high end controllers try to provide this in a space efficient manner, and they also try to provide fast rebuild support. More importantly, they use higher reliability hardware components to reduce the number of failures/rebuilds and the periods of potential exposure in comparison to using commodity parts.
- Provide varying types of continuous and point-in-time data copy services. These services help to provide protection from disasters and also help to quickly test applications before they are brought on-line.
- Provide thin-provisioning support. That is, volume space is allotted on-demand.
- Provide remote debugging/diagnostic support. Most of the high end controllers come with customer support through which the vendor can remotely diagnose failures, and also predict potential failures.
- The storage controller model allows for a de-coupling of server and storage purchasing decisions. That is, customers don’t have to purchase servers if they really only want more storage capacity. But the cost (and warranty service) of commodity servers is so attractive that the Web 2.0 companies do not mind leaving the CPUs on the servers underutilized (that is, using the servers only for their attached storage).
- Provide storage resource management software that provides monitoring, planning, analysis and workflow based action enforcing mechanisms. Currently, the Web 2.0 vendors are developing their own customized management applications because most of the storage management products primarily support the hardware from the specific storage vendor. Recently, with the emergence of storage management standards, storage management software vendors are beginning to provide basic management support for hardware from other vendors.
- Provide virtualization, encryption, data de-duplication, and compression functionality. This functionality is currently not considered to be high priority by the Web 2.0 vendors. However, one could argue that data de-duplication and compression can help to reduce the overall cost of the Web 2.0 company data infrastructure.
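To make the cost argument for data de-duplication and compression in the last item concrete, here is a minimal sketch of a content-addressed chunk store. It is an assumption-laden illustration rather than any vendor's design: `DedupStore` and `CHUNK_SIZE` are hypothetical names, chunks are fixed-size (real products often use variable-size chunking), and `zlib` stands in for whatever compressor a product would use.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes


class DedupStore:
    """Stores each distinct chunk once, compressed, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}   # digest -> compressed chunk bytes
        self.objects = {}  # object key -> ordered list of chunk digests

    def put(self, key, data):
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only chunks not seen before consume new space.
            if digest not in self.chunks:
                self.chunks[digest] = zlib.compress(chunk)
            digests.append(digest)
        self.objects[key] = digests

    def get(self, key):
        # Reassemble the object from its chunk list.
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.objects[key])

    def stored_bytes(self):
        # Physical space used by chunk data (metadata is ignored in this sketch).
        return sum(len(c) for c in self.chunks.values())
```

Writing the same object under two keys stores its chunks only once, so `stored_bytes()` grows with the amount of unique data rather than with the number of copies written.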
In conclusion, I would like to make the following key points:
- I believe that the Web 2.0 companies have correctly recognized that their needs can be adequately satisfied by a cheaper commodity based storage architecture.
- The traditional storage vendors can potentially de-couple their software from their proprietary hardware and provide the subset of functionality (in a lighter version of their product) that is required by the Web 2.0 customers to reduce costs. In other words, the storage companies can provide software-only solutions to the Web 2.0 customers.
- There are many other customers that require good performance and high availability (for most companies, this applies to their mission critical data). The current Web 2.0 architectures do not target these markets, and therefore, there is still a need for the boxes from the traditional storage vendors. In a nutshell, the traditional storage companies will continue to be in business.
- There is also a need to design storage systems in which software characteristics, such as the replication consistency model or the buffer management mechanism, can be re-configured based on policies. The business case will dictate whether the traditional storage vendors will pursue these options.
- The Web 2.0 companies can definitely use the Google storage model to provide storage hosting support for archival types of workloads. Since this is an important class of workloads, I anticipate that most of the traditional storage vendors will either get into the storage hosting business themselves, or will provide software based solutions that work with commodity hardware, which in turn will be used by storage hosting service companies.
