Ever since I wrote about how deduplication is redefining backup for VMware environments, I have been wondering whether I could describe the economics of backup in a set of equations simple enough to explain why this makes sense.
So here's an attempt.
Let's assume we have a dataset D.
And the size of the dataset is Size(D).
And the cost of storing a byte of data is CostPerByte.
Then the cost of storing D is Size(D) * CostPerByte, and we'll call that value CostToStore.
The cost of storing a full backup copy of D then is CostToStore.
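To make the definition concrete, here is a minimal sketch in Python. The function name and the sample numbers (a 1 TB dataset at a made-up price per byte) are my own assumptions for illustration, not figures from any real storage system.

```python
def cost_to_store(size_bytes: int, cost_per_byte: float) -> float:
    """CostToStore = Size(D) * CostPerByte."""
    if size_bytes < 0 or cost_per_byte < 0:
        raise ValueError("size and cost per byte must be non-negative")
    return size_bytes * cost_per_byte


# Hypothetical inputs: a 1 TB dataset at 3e-11 dollars per byte.
size_of_d = 10**12
price_per_byte = 3e-11

# One full backup copy of D costs the same as storing D itself.
print(f"CostToStore: about ${cost_to_store(size_of_d, price_per_byte):.2f}")
```

With these assumed inputs, one full backup copy costs roughly 30 dollars.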
Now the value of a backup is really hard to derive. The problem is that the value of a backup declines over time because it becomes increasingly unlikely that a restore will take place from that backup.
Furthermore, the notion of value is very subjective. So rather than introduce some magic constants, let's make the simplifying assumption that CostToStore is a good proxy for the value of the data. After all, if the value of the data is low, then the CostToStore should also be low.
Okay so then:
ValueOfBackup(Age) = CostToStore/Age
This equation says that, for any backup image, the value of the backup is proportional to the cost of storing it and inversely proportional to its age.
Now that I have all of the terms, I can write an equation for the number of backups that can be stored:
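Here is a small sketch of how that value decays. The post doesn't pin down a unit for Age, so I'm assuming it is counted in days starting at 1 (the formula is undefined at Age = 0); the function name and the 30 dollar CostToStore are also just illustrative assumptions.

```python
def value_of_backup(cost_to_store: float, age: float) -> float:
    """ValueOfBackup(Age) = CostToStore / Age."""
    if age <= 0:
        # Assumption: Age starts at 1, since dividing by zero is undefined.
        raise ValueError("age must be positive")
    return cost_to_store / age


# Hypothetical CostToStore of 30 dollars, with Age measured in days.
for age_in_days in (1, 2, 7, 30):
    value = value_of_backup(30.0, age_in_days)
    print(f"day {age_in_days:>2}: value is about ${value:.2f}")
```

Under these assumptions the backup is worth its full CostToStore on day 1, half of it on day 2, and only a thirtieth of it by day 30.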
NumberOfBackups = TotalAmountOfMoney/CostToStore
And here is where the arbitrage comes in. For a fixed amount of money, the only way to store more backup copies is to reduce the CostToStore.
So expanding the equation a little bit:
NumberOfBackups = TotalAmountOfMoney/(Size(D)*CostPerByte)
And we'll call it The Backup Cost Model.
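A quick sketch of the model shows the two levers it exposes. Everything numeric here is a made-up assumption (the budget, the dataset size, the prices), and the tenfold size reduction is only a stand-in for what something like deduplication might achieve, not a measured ratio.

```python
def number_of_backups(total_money: float, size_bytes: int,
                      cost_per_byte: float) -> float:
    """NumberOfBackups = TotalAmountOfMoney / (Size(D) * CostPerByte)."""
    cost_to_store = size_bytes * cost_per_byte
    if cost_to_store <= 0:
        raise ValueError("CostToStore must be positive")
    # Returned as a ratio; in practice you would round down to whole copies.
    return total_money / cost_to_store


# Hypothetical budget and dataset.
budget = 300.0      # TotalAmountOfMoney, in dollars
size_of_d = 10**12  # Size(D): 1 TB

baseline = number_of_backups(budget, size_of_d, 3e-11)
cheaper_storage = number_of_backups(budget, size_of_d, 1.5e-11)
smaller_copies = number_of_backups(budget, size_of_d // 10, 3e-11)

print(f"baseline:          about {baseline:.1f} copies")
print(f"half the price:    about {cheaper_storage:.1f} copies")
print(f"a tenth the bytes: about {smaller_copies:.1f} copies")
```

With the same budget, halving CostPerByte roughly doubles the number of copies, and shrinking the bytes stored per copy by a factor of ten buys roughly ten times as many.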
In my next post I'll discuss why this is a good model.
