What's the worst time to test your DR strategy?
Why, in the middle of a DR event.
Why do I ask?
Well, a piece of hardware in my laptop failed. The failure showed up as a blue-screen-of-death. In my past experience, a blue-screen-of-death has more to do with the OS being corrupted than with the hardware failing (and sometimes the two events are related). So my strategy has been to recover from an image on a different hard-drive and roll my backups forward.
As I started the re-image process, I realized that I didn't actually know if the re-image would succeed. I had never tested the DR process to make sure I had not screwed something up.
Part of the reason I never did the test is that I would need a third hard-drive. If you're testing that the DR image is good, you don't want to copy the DR image over the original image. And another hard-drive costs money, and reconfiguring hardware and copying data take time.
And that made me appreciate FlexClone even more as part of regular DR management.
Using FlexClone, I can create a third copy very quickly and at very low cost. That third copy can be mounted, validated, and, if it works, deleted.
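The create, validate, delete cycle can be sketched in a few lines. This is only an illustration under loud assumptions: an ordinary file copy into a temporary directory stands in for the clone (it is not FlexClone and has none of its speed or space savings), and the function names `sha256_of`, `manifest`, and `validate_scratch_copy` are made up for this example. It also only proves the copy is readable and byte-identical; a real DR test would go further and actually mount or boot the restored image.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate_scratch_copy(backup_dir: Path) -> bool:
    """Copy the backup to a scratch area, compare checksums, then clean up.

    The original backup is never written to; the scratch copy is deleted
    when the TemporaryDirectory context exits, whether or not it validated.
    """
    expected = manifest(backup_dir)
    with tempfile.TemporaryDirectory() as scratch:
        copy_root = Path(scratch) / "dr-test"
        shutil.copytree(backup_dir, copy_root)
        return manifest(copy_root) == expected
```

The point of the shape is that the test never touches the original or the DR image itself, which is exactly what the third copy buys you.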
As I sat there watching the data get copied, I wondered whether I was in a fantastical world of pain.
The good news is that I am not.
There is still the small problem of rolling my backups forward, but that's a manageable bit of pain.
But there is the small issue of proper DR procedures.
Just having a DR copy is not enough. The other piece is to have a DR strategy that is written down. In the middle of a DR event you can't think straight. All you want to do is get back to running as fast as possible. You're stressing out that you'll lose all your data. You're worried about lost productivity, and so on.
So you end up doing whatever kooky idea comes first rather than methodically figuring things out.
Well, it turns out that my problem could have been solved by:
- Examining the BSOD details to see if there is a specific object that is failing
- Searching Google for that object
- Seeing what corrections are recommended
If I had done that rather than jumping to re-imaging, I would have realized that a simple enough fix was to pull my hard-drive, put it in a USB case, delete a single offending file, and carry on.
I can assure you my new DR strategy is written down, and one of the key items is: don't start until after you've drunk some coffee.
