After muddling around for six posts, I finally got to a point where I think I have something really interesting to talk about.
So let's see. My original point was that there exists a single equation that describes the tradeoff between the cost of storing data and restore time, and that captures the notion of RPO. And I succeeded with The Backup-Restore Cost Model.
But what the model did not account for was that a particular infrastructure could be too costly to store all of the copies of data.
Faced with that cost, coupled with the steadily diminishing probability that a restore will ever be needed, backup and application architects look to alternative infrastructures.
The reasoning behind that is simple enough. Although the DownTimeCostPerMinute stays constant, the probability of needing to look at an older copy diminishes with time. In effect, the expected DownTimeCostPerMinute declines as the copy gets older.
Because of that, customers naturally gravitate to a cheaper infrastructure with a different CostPerByte and a different RTO. In fact, this reality, that the expected DownTimeCostPerMinute declines over time, remains the justification for tape.
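To make the "expected" part concrete, here is a minimal sketch. The model so far does not say how the probability of needing an old copy falls off, so the exponential decay and the 30-day half-life below are purely my assumptions for illustration, as are the function names.

```python
def restore_probability(age_days, half_life_days=30.0):
    """Assumed (hypothetical) chance that a copy of this age is ever restored.

    Exponential decay is an illustrative choice, not part of the model.
    """
    return 0.5 ** (age_days / half_life_days)


def expected_downtime_cost_per_minute(age_days, downtime_cost_per_minute,
                                      half_life_days=30.0):
    """DownTimeCostPerMinute weighted by the chance the copy is needed."""
    return restore_probability(age_days, half_life_days) * downtime_cost_per_minute


if __name__ == "__main__":
    # Hypothetical constant downtime cost of 50 per minute.
    for age in (0, 30, 90, 365):
        print(age, expected_downtime_cost_per_minute(age, 50.0))
```

Whatever decay curve you pick, the shape of the argument is the same: the real cost per minute never changes, but the weight you put on it shrinks as the copy ages.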
Alright, so how do we capture this?
It turns out that modeling disk to tape or disk to disk is easy, but modeling disk to disk to tape is harder. And that D2D2T model will have to wait a bit.
It also turns out that the model is the same whether you are using Disk or Tape as your secondary target; the only thing that changes is the constants.
Disk to Tape or Disk to Disk Architecture Model
So let's assume that there are only two infrastructures: the primary Disk and one secondary target, which is either Tape (Disk to Tape) or another Disk (Disk to Disk).
The cost of the primary Disk is: Size(D)*CostPerByte + RTO*DownTimeCostPerMinute.
To save some typing we'll call that PrimaryCost.
We will call the backup target cost SecondaryCost. It has the same form, just with the secondary target's own CostPerByte and RTO.
The goal, of course, is to keep N copies while minimizing restore time and minimizing the amount of money spent. We will assume that backup copies on the primary are both more expensive and allow faster restores.
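Here is a small sketch of the two per-copy costs. The formula is the one above; every number is made up for illustration, and using a lower expected downtime cost per minute for the secondary is my assumption, based on the argument that older copies are less likely to be needed.

```python
def copy_cost(size_bytes, cost_per_byte, rto_minutes, downtime_cost_per_minute):
    """Size(D)*CostPerByte + RTO*DownTimeCostPerMinute for one copy."""
    return size_bytes * cost_per_byte + rto_minutes * downtime_cost_per_minute


# Hypothetical constants: a 1 TB data set.
SIZE_BYTES = 1_000_000_000_000

# Primary disk: pricier per byte, fast restore (10 minutes).
primary_cost = copy_cost(SIZE_BYTES, 1e-10, 10, 50.0)

# Secondary (tape or cheaper disk): cheaper per byte, slow restore (240 minutes),
# with an assumed lower expected downtime cost because the copies are older.
secondary_cost = copy_cost(SIZE_BYTES, 2e-11, 240, 1.0)

if __name__ == "__main__":
    print("PrimaryCost:", primary_cost)
    print("SecondaryCost:", secondary_cost)
```

Swapping tape for a second disk tier only changes the constants passed in, not the function.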
Given that, we have
N = PrimaryCopies + SecondaryCopies
We also have this equation, where M is the total amount available to spend.
M = PrimaryCost * PrimaryCopies + SecondaryCost * SecondaryCopies
We can then trivially combine the equations:
N - PrimaryCopies = SecondaryCopies
M = PrimaryCost * PrimaryCopies + SecondaryCost * (N - PrimaryCopies)
M = N * SecondaryCost + PrimaryCopies * (PrimaryCost - SecondaryCost)
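Since M, N, SecondaryCost, and PrimaryCost are fixed, we can analytically determine how many copies are kept on the primary and how many on the secondary:
PrimaryCopies = (M - N * SecondaryCost) / (PrimaryCost - SecondaryCost)
SecondaryCopies = N - PrimaryCopies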
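As a quick sanity check, here is a sketch that rearranges the combined equation to PrimaryCopies = (M - N * SecondaryCost) / (PrimaryCost - SecondaryCost) and solves it. The function name, the rounding down to whole copies, and the sample numbers are all my assumptions, not part of the model.

```python
import math


def split_copies(m, n, primary_cost, secondary_cost):
    """Return (PrimaryCopies, SecondaryCopies) for budget m and n total copies.

    Rounds PrimaryCopies down to a whole number so the spend stays within m.
    """
    if primary_cost <= secondary_cost:
        raise ValueError("model assumes the primary is the more expensive tier")
    if m < n * secondary_cost:
        raise ValueError("budget cannot hold n copies even on the secondary")
    exact = (m - n * secondary_cost) / (primary_cost - secondary_cost)
    primary = min(n, math.floor(exact))  # cannot keep more than n copies
    return primary, n - primary


if __name__ == "__main__":
    # Hypothetical: 10 copies, PrimaryCost 600, SecondaryCost 260, budget 3960.
    print(split_copies(3960, 10, 600, 260))
```

Rounding down is the conservative choice: every copy moved from primary to secondary lowers the total, so the floor never overshoots the budget.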
