Hey, isn’t 2009 just around the corner? So why am I still talking about DAS (direct attached storage)? Proponents of shared storage (NAS/SAN) argued that it was desirable because a) it decoupled compute purchasing from storage purchasing, and b) it let multiple applications share one storage infrastructure, which consolidated storage administration under dedicated storage administrators and thereby reduced operational costs.
In the DAS model, the storage attached to an application server is accessible only to that server, whereas in the NAS/SAN model multiple application servers can access common storage. Nowadays, the DAS model is making a comeback for the following reasons:
- Application Vendors are providing DAS solutions: Many application vendors are encouraging their customers to use direct attached storage (as an appliance) instead of shared storage, in order to reduce hardware costs. These vendors provide replication functionality to survive box failures. The idea behind this approach is that the application administrator does end-to-end management (both application and storage) of that box.
- Emergence of Flash Storage: With the emergence of flash technology, a host could potentially carry enough local flash to hold an application's entire working set. Requests served from local flash avoid the network round trip to shared storage, which cuts latency. Furthermore, flash is beginning to offer competitive IOPS per dollar.
- Emergence of Map-Reduce (web-indexing, data mining etc.) Applications: The CPU, memory and disk requirements of these applications scale together, so adding servers that each bring their own disks grows all three at once. Therefore, it makes sense for these applications to pursue a DAS model. Some of these architectures use an asymmetric metadata server model (as in the Hadoop Distributed File System).
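To make the asymmetric metadata server idea a bit more concrete, here is a minimal Python sketch (assuming Python 3.9+). It is a toy, not how HDFS or any real system is implemented: the names `DataNode`, `MetadataServer`, `write_file` and `read_file`, the round-robin placement, and the tiny block size are all made up for illustration. The point it tries to show is the asymmetry: one server only tracks where blocks live, while the data itself sits on each node's own direct attached storage.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical server whose blocks live on its own direct attached disks."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)

    def write_block(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


@dataclass
class MetadataServer:
    """Hypothetical metadata server: tracks block locations, stores no file data."""
    nodes: list[DataNode]
    block_map: dict[str, list[tuple[str, DataNode]]] = field(default_factory=dict)

    def allocate(self, filename: str, num_blocks: int) -> list[tuple[str, DataNode]]:
        # Round-robin placement, chosen only to keep the sketch short (an assumption).
        placement = [
            (f"{filename}#{i}", self.nodes[i % len(self.nodes)])
            for i in range(num_blocks)
        ]
        self.block_map[filename] = placement
        return placement

    def locate(self, filename: str) -> list[tuple[str, DataNode]]:
        return self.block_map[filename]


def write_file(meta: MetadataServer, filename: str, data: bytes, block_size: int = 4) -> None:
    """Split data into blocks, ask the metadata server where each goes, write to the nodes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for (block_id, node), chunk in zip(meta.allocate(filename, len(chunks)), chunks):
        node.write_block(block_id, chunk)


def read_file(meta: MetadataServer, filename: str) -> bytes:
    """Look up block locations once, then read each block directly from its node."""
    return b"".join(node.read_block(block_id) for block_id, node in meta.locate(filename))
```

In this sketch a client would build a `MetadataServer` over a few `DataNode` objects, call `write_file`, and later get the same bytes back from `read_file`. Only the location lookup goes through the metadata server; the block reads and writes go straight to the boxes that hold the disks. Replication, which a real system would need to survive a box failure, is deliberately left out.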
Now, some important questions that need to be answered are: 1) should these DAS boxes back up their data onto a shared secondary storage box that provides storage efficiency via de-dup/compression, power savings, search/indexing, disaster recovery etc.? Or 2) should these DAS boxes be connected to each other in a peer-to-peer model and back up their data to other peers? When would one want to use the former approach, when is the latter approach desirable, and when is a combination of the two desirable? I will analyze the answers to these questions in my next posting.
