November 19, 2009

You have $1,000 to spend...with whom will it be spent?

I'm in the midst of a decision, personally.  I'm going to invest in my house and finish off my basement.  Now many of you have done something similar - either you've invested in your home or you have made some large purchase and you did your due diligence prior to spending that money.  

What are some of the things to consider when you are spending this $1,000 you have to spend?  Let's say we're going to spend it on a surround sound system.  Here's a list, in no particular order, of what I consider.  There may be more, but this is a good starting point.

  1. Quality
  2. Features
  3. Company viability
  4. Integrity of the company representatives
  5. References
  6. Cost to Value

Well, I do what most of you do - I start with an internet search to find providers for the product or solution I'm looking to purchase.  Then I streamline the list by identifying which product offerings meet the must-have and nice-to-have features I'm looking for.  Probably the most important part of my buying decision is talking to people who have purchased similar products and, more importantly, the product that I'm looking at purchasing.  Basically I look for references - good and bad - I want to know it all.

You get the idea...we all do our "due diligence" when we're about to spend our hard earned cash.

Which is why I think it is funny when I hear people say "well, nobody ever gets fired for buying from XYZ company".

If it was YOUR money - would you have spent it that way?

I think more and more customers are finding NetApp provides the highest value overall - and our Q2 results are proof.

And truth be told...I did know someone who was fired for buying from XYZ company.  He was the CIO and oddly enough he made that statement a week earlier in a meeting.

-Chapa signing off...

November 18, 2009

Primary Storage Colors - Episode 1

 

Primary Storage Colors

you're IT

 

 

EXT.  Downtown Chicago – Day

 

A high AERIAL SHOT of the city features Navy Pier prominently in the foreground, then travels along the shoreline NORTH to the CHICAGO RIVER, following up to Chicago’s LOOP.

 

EXT.  147 W Wacker – Office Building – Day

 

Frank, an IT Director for a high profile accounting firm, is walking along Wacker to his office building after a late afternoon lunch on Friday.  His head spinning from the last series of back to back meetings starting at 7:30am with Fred the CIO and Terry the CFO of his company.

 

He’s got family duty responsibilities this weekend, he can’t let another weekend get interrupted by his work – he won’t hear the last of it…He has to figure out how to get all of this done with fewer resources.

 

Voices eerily echoing:

 

  Backup and Recovery

 

  Archive and Compliance

 

  Disaster Recovery

 

  Disaster Avoidance

 

  Business Interruption(s)

 

  Business Continuity

 

  Continuity of Operations

 

  Data Protection

 

  Data Management

 

  Single pane of glass!!??!!!

 

He swats at the air as though trying to push away these painful, pesky voices.  The voices are those of his boss, his boss’s boss, resellers, vendors and anyone else with an opinion on his work.

 

INT.  Frank’s cubicle.  Enters Kathy DeLeon, System Administrator.  His “right hand woman”, backup admin and friend for 10 years.

 

    KATHY

    Frank you look like you just got run over by a cab.  You okay?

 

    FRANK

    What the heck is happening?  I can’t take all this stress. 

    I can’t get these voices out of my head.

 

    KATHY

    What voices?  Did you eat lunch?  One too many “dews” today Frankie?

 

    FRANK

    Nah, I keep hearing Fred and Terry’s voices talking about the action items I have around backup and recovery, disaster recovery, archive, replication, encryption, you name it.  It’s crazy, I’m crazy – geez, and here it is 2:30 in the afternoon and I have to make sure everything is square before the weekend.  If I get called again this weekend by operations telling me a backup failed or someone can’t find a file, can’t access a system or whatever – my family is going to disown me.

 

    KATHY

    Oh, that reminds me – some guy from STIS stopped by looking for you – I met with him briefly and told him you weren’t available.  Said he wanted to talk to you about your plans this year for storage.

 

    FRANK

    ARRRGHHH!  You’re kidding, right?  Ugh…if he calls back – tell him I moved to Iceland – no, Greenland, I hear it’s colder there than Iceland.  I could use it with all the heat I’m getting from our CIO and CFO!

 

FADE to EXT.  AERIAL SHOT from FRANK’s Building “up the river” – LATE AFTERNOON

 

FADE to INT. FRANK’s BEDROOM – NIGHT

 

FRANK awakes in his own bed in a cold sweat – not sure the time, the day or the condition of his backups…

 

............

 

 

Do you feel like you just lived a part of your own life?  Was it like a dream sequence of an IT geek’d-out horror flick come true?

 

How many times are these terms Backup and Recovery, Archive and Compliance, Disaster Recovery, Disaster Avoidance, Business Interruption(s), Business Continuity, Continuity of Operations, Data Protection, Data Management, Single pane of glass used to describe what needs to be done in your environment? 

 

More importantly how many people does this represent from a functional perspective? 

 

For me it describes at least three or four separate functional areas.

 

Let’s start out by looking at the “stuff we create”.

 

Back to the Basics

We have a hard drive and we create something we like and want to keep.  So we save it. 

 

Pretty basic, huh?

 

We save it to “My Documents” (or /usr/home/dchapa for you Unix/Linux fans) – after about 100 of these “creative” moments we have a “My Documents” (or home directory) folder that is huge and not very organized.  We decide it’s time to get on top of things and create a folder under C:\ called MyStuff.  Underneath that we create other folders that mean something to us like Docs, Xls and Pics.  We then move all of our documents to Docs, our spreadsheets to Xls and our pictures to, you guessed it, Pics. 

 

That’s called Data Management.  Congratulations you’re on your way to becoming an admin.
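
For the Unix/Linux fans who would rather script it, here is a minimal sketch of that same tidy-up. The extension-to-folder mapping and the function name organize are my own assumptions for illustration; only the folder names Docs, Xls and Pics come from the example above.

```python
import shutil
from pathlib import Path
from typing import Dict

# Hypothetical mapping from file extension to folder name; adjust to taste.
FOLDERS = {
    ".doc": "Docs", ".docx": "Docs", ".txt": "Docs",
    ".xls": "Xls", ".xlsx": "Xls",
    ".jpg": "Pics", ".png": "Pics",
}

def organize(source: Path, target: Path) -> Dict[str, int]:
    """Move files from source into target/<folder> based on their extension.

    Returns how many files landed in each folder. Files with an
    unrecognized extension are left where they are.
    """
    moved: Dict[str, int] = {}
    for item in sorted(source.iterdir()):
        if not item.is_file():
            continue
        folder = FOLDERS.get(item.suffix.lower())
        if folder is None:
            continue
        dest_dir = target / folder
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(item), str(dest_dir / item.name))
        moved[folder] = moved.get(folder, 0) + 1
    return moved
```

Calling organize(Path("My Documents"), Path("MyStuff")) would sort the pile into the three folders and report the counts.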

 

Alright you get the picture – if you’re reading my blog – you’re several steps past the basics – but sometimes it is just that, “the basics” that is required for us to define the terms we are using and the context in which they are used.

 

When it comes right down to it, everything we do digitally is data management - whether we are good at it or not or whether we have the tools for it or not.

 

I think we should create an organization chart that depicts where each of these terms should “report to” – in my experience that always helped during my planning or in re-capturing my life.

 

Here’s the way I remember things being organized in the past.

 

The IT Manager or Director had several people with different responsibilities.

 

Functional Area                          Person with Primary Responsibility
Data backup                              Backup Admin (huge area of responsibility)
Archive/Security/Compliance              Security Officer
Disaster Recovery/Business Continuity    Risk Management
Disk management                          Storage Admin
Infrastructure                           Network Admin

  

So, if you can relate to the character in our IT horror flick – how do YOU manage the stuff you create?

 

NetApp has a new ad campaign: “OR has been replaced by AND.  So why not settle for everything?”

 

I’ll take a more detailed look at the AND I’m presenting here for Data Protection in future blogs and we’ll see how our stressed out IT Director, Frank, can overcome the challenges he’s faced with.  So stay tuned for our next segment of Primary Storage Colors. 

 

The good news for our IT Director is NetApp has his AND covered.

 

Chapa, signing off...

October 29, 2009

What's your view?

I've been in the storage industry for a long time - 20+ years.  Over that time, I've been on the customer side, pre-sales Systems Engineer, Regional Sales Mgr, Consultant, Technologist, Marketing and Strategy.

I've seen a lot of changes and yet, I've seen a lot stay the same.

My approach when talking with customers is not to attempt to solve their technical issues, although that is the ultimate goal, but rather to identify how this technical deficiency impacts their business.

What are the business drivers that have led your company to look for a solution? 

I think too often some companies, and even "independent consultants" lose that focus and immediately want to sell or recommend a product or worse yet, milk your consulting budget for all that it has to give. 

I'm curious about something...

  1. What are the key business drivers that YOU are seeing (whether a customer, consultant or technology overseer) in the market today? (feel free to list as many as you'd like)
  2. Have you determined the impact overall to your business should things remain "business as usual"?
  3. How do you see technology addressing those business challenges or drivers?

I'm going to cut this blog short, by my standards, and just leave it at that...feel free to email me with your responses or post them as comments to this blog. I will have more questions later...but I think this is enough to spark some conversations.

Chapa signing off...

October 28, 2009

Tears of Recovery, Part II

On Monday, I posted a blog called Tears of Recovery.  In that post, a former consulting client of mine contracted me to redesign their backup architecture.  The three key areas with the highest priority and criticality were

  1. ClearCase
  2. MS SQL
  3. Exchange

Where we left off was at my final deliverable.  I wrote an Operations Guide for the backup/recovery solution, trained the Network Operations Center personnel and the various admins who would be managing the backup infrastructure across the country and offered several observations based on decisions they had made along the way.  I felt strongly if these areas were left unchecked, over time they would have exposure and undue risk.

Just to recap - here are the three top priority "red flags" I highlighted.

  1. In order to save costs, they chose NOT to buy maintenance on the NEW library they purchased 
  2. They chose NOT to implement the SLA that I suggested which included regular TEST recoveries of critical business application data
  3. They chose NOT to run the duplication scripts that I wrote for them to create secondary backup copies for offsite storage

Now several months after I had completed this work, including the knowledge transfer, this client experienced a critical "system" failure.  This failure was further exacerbated by the number of failed or partial backup jobs.  So instead of being able to restore from the previous night's backup they had to go back a bit further (1 week).  However, when they attempted to restore that backup it failed.  I believe the error message they kept getting was rather ambiguous.  The bottom line was they couldn't recover.

Timeline

Day 1: Storage used for Exchange failed. (for the record, this wasn’t NetApp storage but from a big box pusher vendor)

As I mentioned, when the failure occurred (about 2:00am on a Monday morning) – the NOC personnel called the SysAdmin for Exchange and the recovery process began.  As you would expect, there was an attempt to bring the storage back online thinking there was just a “glitch” somewhere.  These attempts took several hours as much of this was done by the onsite NOC personnel and relayed back to the Admin via phone.

By 7:00am, the SysAdmin and his team were in the data center working on the next step to recovery, the tape backup.  Searching for the most recent backup was fairly painless; the pain however came when they noticed the most recent backup was over a week old.  As I mentioned above, the failed or partially successful backup jobs plagued them during the previous week.
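
A week of failed jobs is the kind of thing a simple daily check can surface long before a restore is needed. The following is only a sketch under my own assumptions: the function name stale_backups, the client names and the one-day threshold are all hypothetical, and the timestamps would have to come from whatever job history your backup application exposes.

```python
from datetime import datetime, timedelta

def stale_backups(last_success, now, max_age=timedelta(days=1)):
    """Return the clients whose last successful backup is older than max_age.

    last_success maps a client name to the datetime of its most recent
    fully successful backup (partial jobs should not count).
    """
    return sorted(
        name for name, finished in last_success.items()
        if now - finished > max_age
    )

# Example: one client has not had a good backup in eight days.
now = datetime(2009, 10, 26, 7, 0)
history = {
    "exchange01": now - timedelta(days=8),
    "sql01": now - timedelta(hours=6),
}
print(stale_backups(history, now))
```

Anything that shows up in that list is a conversation to have on a Tuesday afternoon rather than at 2:00am on a Monday.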

What made this worse is this Exchange server had some very high level managers’ and directors’ mailboxes on it – being down meant reduced communication.

When they attempted the restore from this backup – the job would start, the backup application would communicate with the tape library – requesting a particular barcode to be mounted in an available tape drive and the restore job would begin.  The restore would get through the first tape and request the mount of the second tape and that’s when the read error would occur.

RESTORE FAILED

Remember this red flag I pointed out earlier?

  • They chose NOT to run the duplication scripts that I wrote for them to create secondary backup copies for offsite storage

Had they cloned or attempted to clone the backup tapes, this may have given them a clue there was something amiss with their solution and they could have taken other actions to remedy the situation PRIOR to a CRITICAL FAILURE.  Unfortunately they were well past the point of no return and were in REACTIVE mode.

They tried the restore again, with the same results.  Believe it or not this went into Tuesday morning when they pulled together a “tiger team” to determine what was going to be their next move.

On a whiteboard, one of the managers started writing down all of the vendors who had products in their Exchange environment.

1.    Switch Vendor

2.    Server Vendor

3.    Storage Vendor

4.    Software Vendors (including backup software, Microsoft naturally for Exchange and operating system vendors)

5.    Tape Library Vendor

6.    Consultant (uh, that would be me)

Then someone had the idea to call in each one of these vendors, tell them the situation and enlist their help to fix the problem.  When they called my company they were told that I was out sick – I had been ill with a fever of 102.  

I'll never forget when my cell phone rang and I heard my partner on the other end saying “they need you and are willing to pay whatever you want to come help them resolve the problem.”  

Well, let me tell you there’s nothing like saying “willing to pay you whatever you want” to get someone out of bed with a fever.  I loaded up on pain relievers and headed off to the site.

When I arrived I was amazed at how unorganized everything was – basically this is what was told to all the vendors before I arrived, “go and find the needle in the haystack”.

Firmware was being updated, patches were being applied, tape drives were being tested/replaced and I was gathering log information from the backup application.  All of this was happening in parallel. 

Remember…

  • They chose NOT to implement the SLA that I suggested which included regular TEST recoveries of critical business application data

Had they documented a test plan, detailed recovery process and tested this process, the chances are extremely high that they would have uncovered the issues during this test.
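
To make "test your recoveries" a bit more concrete, here is a minimal sketch of the verification half of a test restore: restore a sample to a scratch location, then compare it file by file against the source. The function names are hypothetical and this says nothing about how any particular backup application performs the restore itself; it only checks the result.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(original: Path, restored: Path) -> list:
    """Return the relative paths that are missing or differ after a restore.

    An empty list means every file under original was found under
    restored with identical contents.
    """
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems
```

Run on a schedule against a rotating sample of critical data, a non-empty result is the early warning this client never got.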

About half a day into my analysis I uncovered what I believed to have been the problem.  I ran over to the tape library service person and asked if he had removed/replaced the drives.  He sure did, the drives had been removed.  I asked what the problems were with the drives he found.  He outlined the problems, but I focused on one in particular.  One of the drives was out of calibration just slightly and couldn’t be brought back within spec so he replaced the drive.  I asked him if it was logical drive 10 and he responded in the affirmative. 

The service technician told me if they had purchased the service contract all of this would have been covered. 

  • In order to save costs, they chose NOT to buy maintenance on the NEW library they purchased 

What I had uncovered was the tape cartridge that failed with the read error had originally been written using logical drive 10 – the drive it had been mounted to repeatedly in attempts to restore was logical drive 6.  My belief (which I’ll never be able to confirm) is drive 10 was just enough out of alignment to make it impossible for any other drive to read what it had written but not far enough to fail during the mount and write process.  Unfortunately we’ll never know.

Incidentally, after all the drives were replaced – the restore still failed in the EXACT same spot – which seemed to confirm my assumption.  

As a last resort the client sent the backup tapes and array off to a data recovery service which could read that data, recovering it to some portable media which was eventually shipped back to the client.

Five days after the initial failure the client received the portable media they had supplied to the recovery service with the recovered data on it.  Incidentally, these were all individual .pst files that had to be merged back into Exchange.  Since it was late on a Friday, the SysAdmin copied the data to a storage array they had offline (because it was having problems and not production ready – as he told me) – after he confirmed all the .pst files were there – he put the portable media on the shelf to be ‘re-used’ by whoever needed it and went home. 

Does this sound like a Looney Tunes cartoon?  Isn’t this the part where Wile E. Coyote gets the anvil dropped on his head, followed by the crate?

Saturday morning rolls around; the SysAdmin comes in with his coffee in hand and begins the long process of bringing the Exchange server back online by merging all of the .pst files.  However – overnight two drives failed in this RAID-5 storage array.  All of the data copied to this array the night before was LOST. 
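
Why does a second failed drive make the difference?  RAID-5 keeps a single parity block per stripe, so it can rebuild exactly one missing block and no more.  Here is a deliberately simplified sketch of that idea, assuming one stripe with XOR parity; real arrays rotate parity across the drives and work on much larger blocks, and the function names here are my own.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(stripe, parity):
    """Rebuild a stripe in which lost data blocks are marked as None.

    With single parity, one lost block can be recomputed from the
    survivors; two or more cannot.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("two or more blocks lost: single parity cannot recover")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired

data = [b"mail", b"box1", b"pst!"]
parity = xor_blocks(data)
print(rebuild([b"mail", None, b"pst!"], parity))  # one drive lost: recoverable
```

Pass in a stripe with two None entries and the function raises, which is the software equivalent of what the SysAdmin found on Saturday morning.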

How do I know?  

I got a phone call – “what do I do?”  Well first thing I said was, restore from your backup - you have a pristine backup environment now – nearly everything is brand new. 

More tears of recovery…alas, he never backed it up.

Luckily for him the portable media was still on the shelf and still had the data he needed…


7 steps


Lessons learned

  • Pre-planning saves time and personal effort
  • Understand the impact to your business and invest accordingly
  • Apply business continuity strategies as it pertains to the value of the data
  • Test your plans
  • Maintain/Update your plans
  • Test again
  • DON'T PANIC

I have presented and written about this subject extensively over the last 15 years - I'm still asked today, about the book I co-authored in 2003, if I will ever expand on the DR Planning and Business Impact Analysis Planning sections.  Irrespective of the time that has passed since that book was published, the need still exists.

If you've been reading my blog you know that I have talked about NetApp's tiers of recovery - stay tuned for "Tears into Tiers" where I'll take this customer's environment and show what the customer experience would have been by taking advantage of the NetApp technology available today.

Chapa signing off...

PS.  The cost for all of this?  It was in the $100,000 range.  All I know for sure is what I billed them, and it was far less than $100K.

October 27, 2009

How long since you heard a busy signal?

A busy signal?  For those of you who are young and don't remember, a busy signal is what you would get when you called someone, but they were on the phone talking to someone else.  Yes, that's right - no call waiting.

What about that yellow rotary phone with the extra long cord that hung in the kitchen?  Growing up my family called it the "kitchen phone".  We also had a black squatty phone in the family room, and we called it the "family room phone".  Funny, even though we had two phones, only one person could be reached at a time, but we had TWO phones.  It was for convenience really more than it was for availability.  

We don't have those problems anymore...we have call waiting, voicemail, text messaging, email to our phones and heck we can even do the whole VoIP thing with some companies out there even in our own home.  We don't experience a busy signal when we call someone anymore (yes we still get that 'fast busy' but that's a whole different story) - in fact if we can't reach that person through any of those ways I just mentioned, we try all of the social media sites we are connected to them through and start "pinging" them there.  If we still get no response, we assume something must be wrong, and say "I hope they are okay".

Ha, yeah I could have a lot of fun with this one, but to bottom line it all it means that we don't like waiting and we don't like busy signals.

Same is true when it comes to our data - we don't like getting a "busy signal"...how horrible is it when you're trying to generate a report, finish early so you can get a jump on the weekend, or just simply trying to start your day and you can't access the "systems" you work on?

We want or rather perhaps more strongly, we demand that data be available to us all the time.  Continuous Availability.  

I can hear the wheels turning - some of you are saying, "wait in his last blog about his new role he said he's focusing on data protection - how is continuous availability data protection?"

Sidebar: Data Protection implies that you must have the data available to protect it - in my consulting practice back in Chicago I focused a great deal on the availability of data.  The first line of defense (protection) is ensuring that your data is as available as possible within the context of the impact the company would feel if that data was inaccessible.  Data Protection is a large umbrella term which includes data availability to help maintain the continuity of your business operations.

So back to the busy signal idea - what is continuous availability?  I already have HA - why do I need this new "term" for availability?

First let's look at what is continuous availability from NetApp.  Take a look at this NetApp Play-by-play hosted by Mark Welke to learn how continuous availability is different from high availability AND how they both can work together.

This is all about MetroCluster and how you can transform your current HA environment into a continuously available environment.  No busy signals...

I'll blog more on MetroCluster - how it can be used, the added benefits beyond the surface and why I think this adds tremendous value.  For now, I would like to just introduce you to the basics of what MetroCluster is and down the road we'll look at various deployment options for MetroCluster.

And by the way, you don't have to have NetApp storage either, you can have EMC, HP, etc.  You can use NetApp V-series to bring all of this functionality to these other vendors' storage as well.

Food for thought...

Chapa signing off...

October 26, 2009

Tears of Recovery - 12 Bar Blues in E, follow me now...

Sweet home Chicago, home of Buddy Guy, Koko Taylor and the Blues Brothers just to name a few.  

It's where I grew up and lived most of my adult life, where I was a Data Protection Consultant and more specifically a Backup/Recovery Consultant.  It's that experience with a particular client that inspires me to write this blog.  

Tears of Recovery, sometimes you just want to sing the blues baby.


I had been contracted as the architect for a fresh, new backup/recovery solution for a large Telecommunication company in the Midwest.  A mixture of Unix and Windows platforms.  They were running one of the more popular backup applications that supported both platforms quite well.  The primary areas of a critical nature were the following:

  1. Exchange
  2. ClearCase
  3. MS SQL

The less critical were user home directories and non-critical functional areas.  Doesn't sound like much - but when you consider the magnitude these three areas consisted of, it started to feel quite daunting.

Now for the storage - what storage did they possess at that time?

  1. Sun
  2. NetApp
  3. EMC (very little)

The NetApp sales rep was trying to convince me to put our ClearCase VOBs on the NetApp storage - I resisted because of the challenges inherent with CC at that time: if the network hiccuped or went away altogether, ClearCase would hang because of the way NFS was used for the mount (hard was the default back in those days).  However, I was eventually convinced that moving our VOBs to the NAC (wow am I old or what - that's what we called the 'boxes' back then) would be a good idea, with the caveat mentioned.  We also leveraged snapshots to protect the VOBs so we didn't have to be so concerned with 'downtime' of ClearCase as was the case previously with the other storage. 

Okay on with the blog...the point above is that we had the VOBs protected and we had the ability to recover quite nicely in the event of loss or corruption.

MS-SQL - many of the company's business-critical databases were in MS-SQL, including the Security Office primary database.  What was in that database, I don't know - I wasn't "allowed" to know, but I was given the keys to the kingdom to back it up. Funny how that works.  In any case, backups were a bit of a challenge, so we decided back then we would use SQL Backtrack - worked just fine, integrated very well with the backup application, so that was done from an architectural perspective.

And now Exchange - well, this was the BIGGEST most IMPORTANT component of all.  Without Exchange the company lost all electronic means of communication.  That means

  1. No email from website traffic to inside sales
  2. No email from department to department, peer to peer
  3. No email blasts out to the customer install base

Basically the company would be cut off unless they used the telephone to reach thousands of customers...which would be unrealistic.  So Exchange, like today, is a critical, very critical component to the business success.  Impact if this was down would be HUGE.  I had to pay special attention - very close attention to how we were going to protect and ensure recoverability of this application.

"Brick level" backups (several of you just grabbed your chest - it's okay...there are alternatives today...this is just a story of what once WAS...deep breaths, deep breaths...) well, they took too long.  Great idea and concept, but for a company that had as many mailboxes and EXCH servers as this company did - unrealistic.  So we opted for the INFOSTOR backup - the only other option available to us at the time.  Again, we took our time with this one - running through POCs, test environment runs, restores, etc.  I wanted to make sure this was solid before turning over the keys.

Okay, so we got this down from a software perspective, we knew what we were using as the backup application and what agent technology we needed for the various critical applications we were protecting - now we needed to look at the backup hardware (they weren't ready for disk based backup just yet - that was phase II).

There was a heavy debate on the Tape library they wanted to select and more specifically which tape media.  They finally decided on DLT7000 since that was the media they had previously and wanted to maintain some compatibility.  After placing all the orders for all the new servers from a company in the south, as well as their arrays for exchange - we started implementing the backup solution.  

  1. ClearCase - that was easy.
  2. MS SQL - once we got it configured, it worked like a champ.
  3. Exchange - the crown jewels, one server at a time was added - from the least important to the most important.

After several weeks of planning, purchasing and implementing - we were done.  Everything was working like magic - well not magic, but like a lot of hard work and good planning.  I remember spending many hours there the first night we went live to make sure everything went as planned - which it did.  

As my final deliverable, I wrote an OpsGuide for the backup/recovery solution, trained the NOC personnel and the various admins who would be managing the infrastructure across the country and included several "red flag" observations that I felt if left unchecked over time would leave them exposed.

What were these red flag observations?  I'm glad you asked.

  1. In order to save costs, they chose NOT to buy maintenance on the NEW library they purchased 
  2. They chose NOT to implement the SLA that I suggested which included regular TEST recoveries of critical business application data
  3. They chose NOT to run the duplication scripts that I wrote for them to create secondary backup copies for offsite storage

So...are you starting to "get" the title of this blog now?  

They experienced a failure several months after I had turned over the "keys", they experienced backup failures that went unchecked, they experienced restore failures from the backups that did complete.

Yes, their eyes were filled with tears of recovery or rather an inability to recover.

Tune in tomorrow for the "rest of the story"

Chapa signing off...

PS.  Buddy Guy was born in Louisiana and Koko Taylor in Tennessee - but they had a huge impact on the Chicago Blues.  And dear Koko Taylor, the Queen of Chicago Blues, passed away this year at the age of 80 on 6/3/09, may she rest in peace.

October 21, 2009

My new role

I've been at NetApp for almost 2 years as the Director of Backup/Recovery Solutions Marketing - yes that's right I said Marketing and I'm damn proud of it too.  During that time we accomplished a great deal from internal training to external awareness.  The team I managed included:

  • Nathan Moffitt, Sr Mgr
  • Doug Hammer, Product Marketing Manager
  • Lynda Black, Product Marketing Manager

Together we delivered several webcasts, TechTalks, TechChats, and even a few articles.  Nathan Moffitt and I recently co-authored "Can you benefit from integrated data protection", a piece that explores more than just backup, but Data Protection as a whole.  We've done a great deal over the last nearly two years.  However, just as the seasons change - so do our roles and responsibilities.  

As I enter this new season I reflect on how much I have enjoyed leading this team but I'm very excited about the future.  In my new role, I will have the opportunity to speak on more than just Backup and Recovery - but Data Protection from end to end.  

End to end Data Protection - isn't that backup and DR?  

Well, not necessarily - especially when you are talking about NetApp.  In my blog "My favorite mistake" I wrap up that message talking about NetApp's approach to a unified or integrated data protection strategy.  One that includes three tiers of recovery, continuous availability, archive and security.  These areas as well as integration with our partners and others are where my energies will be focused as I continue to evangelize the importance of data protection planning and how NetApp solutions integrate.  

Stay tuned to the Bar and Grill - even though it's more than just backup now...I have a lot to say.

Follow me on twitter too

Chapa signing off

October 20, 2009

SNW Phoenix

I've been fortunate enough to attend SNW this year in Phoenix, AZ.

I haven't been to a "trade show" in a long time; it's been over a year.   I summed up this entire show in one word, CONSOLIDATION. Think about that word, when you consolidate things in your wallet, what do you do?  You get rid of things you don't need anymore, you reduce the number of credit cards you use and maybe you reduce the number of individual pictures of your family to carrying just a single family shot.

When you get rid of what you don't need, let's call that being Green.

Reduce the number of credit cards you use, Deduplication.

Carrying a single family picture versus the individual pictures, you got it - that's called Virtualization.

And what about cloud?  Well that would be our cell phone...okay, that was a stretch, but you get my point hopefully.

In any case, what I heard this year at SNW about these key areas all comes together in the one word, Consolidation.

1. Green - Reducing power and cooling, storage footprint

2. Deduplication - Efficient storage for backup.  Primary Storage needs deduplication and some are there, some are not.

3. Virtualization - Moving fast, this is probably the most disruptive solution area that drives the other two.  However, the challenges customers have are not around whether virtualization is the direction to go but rather whether virtualization is just as easy as taking my old storage and redeploying it.

4. Cloud - I heard more than once, "cloud is our data center".

I'm only stating what I heard during the week at SNW talking with analysts, customers, vendors and hotel staff (okay, they didn't have a clue, but they did direct me to the nearest restroom).  However, most of what I heard does resonate with where I have seen things moving for some time now.  As the economy isn't returning quite as quickly as some have hoped, customers are searching with much more sincerity for solutions that will have a very short return on investment and a lower total cost of ownership.  And I hate to sound like a broken record, but it's the "more with less" concept again.

At NetApp we have several bloggers who have touched on those key areas time and time again. 

Gilda Farvid

Larry Freeman

Tim Russell

Dave Hitz (of course)

Val Bercovici

Jay Kidd

And the list goes on...

My point being that NetApp has and will continue to focus on key technologies to enable customers to achieve their goals and initiatives - not based on what we FEEL but based on what we KNOW.  There's a lot of "KNOW" in the NetApp ranks, something I would look for in a company when I was on the customer side.

So, was SNW worth it?

Yes indeed, it proved to be a very good show in my humble opinion.  I'm looking forward to the next one.

Chapa signing off...

October 13, 2009

My favorite mistake

A while back I asked the question “do you still use an IMSAI 8080?” – The basis of that blog was essentially why are we still doing backup/recovery like we did in the 1980-somethings?

This week – I’m going to jump into that topic with a bit more depth.  As I’m typing this I’m listening to Sheryl Crow’s song “My Favorite Mistake” – completely random song selected by my iPod but I think it fits with some of our continued decisions around data protection and backup/recovery specifically.

“Backup is my favorite mistake” – I think I’ll make the t-shirts and sell them to support me into retirement. 

Backup isn’t a mistake per se – but the way in which we continue to protect data can be problematic.  Let’s look at some of the technologies that have been introduced to help us along.

 

Technologies

If we are honest with ourselves, we will all agree that backup to disk was one of our favorite mistakes.  For those of us who still needed to cut tape (and I know there are some of you still out there) – we needed to move that data from the primary disk to the tape as quickly as possible – but the jobs were starting to fail, or were not completing before the production window opened up, and the backup jobs had to be cancelled so as not to impact performance for our ‘customers’.  

Backup to disk, old JBOD, became one of the trends for Open Systems.  However, we STILL needed to move that data from that old JBOD to tape.  So, what we thought was going to streamline our backup operation – actually started to become a bigger problem – especially if that disk reached 100% capacity and backup jobs started to fail.  We needed a true disk cache – but it wasn’t available.  

Truth be told – I was doing the whole disk staging thing with NetBackup way before it was on their roadmap.  Yes – it’s true, I wrote a script to do disk caching or staging.

The script was created out of necessity…Back in 1999, one of my clients needed to speed up their backup and tape just wasn’t going to “cut it” (pun intended) - so we introduced disk.  Fine idea – the backup jobs didn’t suffer the shoe shining phenomenon because it was random access media and therefore would have a positive impact on our success rate. 

The critical backup jobs finished without issue - the challenge we faced was now moving that data from disk to tape before the disk hit 100%.  This disk was our “cache” – so what I did was write a UNIX script that ran from the master server and not the media server that was writing the jobs so I wouldn’t impact the backup performance. 

The script would look for completed jobs that have been written to this disk storage unit.  If those jobs were complete – I would modify the .f file (for those non-NetBackup users it’s one of the ‘control files’ of the image database) of that particular backup image so the master server would be the temporary “owner” of that job.

 

NOTE: Back then the NetBackup Media server that wrote the image was the only server that could “read” the image.  Once VRTS implemented the alternate read host option with bpduplicate (bpduplicate -altreadhost), you could do the same thing my script was doing.

 

After that, I would start a duplication job to copy that image on disk to an image on tape.  Once successful, I would change the ownership back to the original media server and suddenly we would have TWO copies of the backup image.  One existed on disk and was owned by the originating Media Server; the other was on tape and was owned by the Master Server.
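For anyone who wants to picture the flow, here is a minimal Python sketch of the steps just described.  It is not the original UNIX script and it does not call NetBackup at all: the BackupImage record, the stage_to_tape function and the duplicate callback are hypothetical names made up for illustration, and "ownership" is just a field on a record standing in for the control-file edit.

```python
from dataclasses import dataclass, field


@dataclass
class BackupImage:
    """Hypothetical stand-in for one backup image on the disk storage unit."""
    image_id: str
    owner: str        # server currently allowed to read the image
    complete: bool    # True once the backup job has finished
    copies: list = field(default_factory=lambda: ["disk"])


def stage_to_tape(images, master="master", duplicate=lambda image: True):
    """Copy finished disk images to tape, borrowing ownership while doing so.

    `duplicate` is a hypothetical callback that performs the disk-to-tape
    copy and returns True on success.
    """
    staged = []
    for image in images:
        # Skip jobs that are still running or already have a tape copy.
        if not image.complete or "tape" in image.copies:
            continue
        original_owner = image.owner
        image.owner = master              # temporary ownership swap
        try:
            if duplicate(image):
                image.copies.append("tape")
                staged.append(image.image_id)
        finally:
            image.owner = original_owner  # hand the image back
    return staged
```

One assumption worth calling out: the sketch hands ownership back even when the copy fails.  That is a choice made for the sketch, not a claim about how the 1999 script behaved.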

This meant that recoveries were very quick - since they came from disk, letting us shorten our RTO.


That held only when the recovery came from the disk.  Another factor with disk-based backup is that it's a finite storage medium, which meant another script had to be written.  It ran from cron, waking up every X minutes to check the capacity of the disk (df -k).  If usage was above Y%, it would start looking for images that existed on the disk storage unit AND that had two copies.  It would then expire the OLDEST of those images until disk usage came down below Z%, at which point the script was done until the next time the condition caused it to act.
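The clean-up rule can be sketched in a few lines of Python.  Again, this is an illustration under stated assumptions rather than the original script: DiskImage, images_to_expire and the parameter names are hypothetical, high_pct and low_pct play the roles of Y and Z, and disk usage is computed from the image sizes instead of from df -k.

```python
from dataclasses import dataclass


@dataclass
class DiskImage:
    """Hypothetical record for one image held on the disk storage unit."""
    image_id: str
    size_gb: float
    backup_time: int  # e.g. epoch seconds; a smaller value means older
    copies: int       # 2 means the image exists on disk and on tape


def images_to_expire(images, capacity_gb, high_pct, low_pct):
    """Return the ids of the images to expire from disk, oldest first."""
    used_gb = sum(image.size_gb for image in images)
    # Do nothing unless usage is above the high-water mark (Y%).
    if used_gb / capacity_gb * 100 <= high_pct:
        return []
    expired = []
    for image in sorted(images, key=lambda i: i.backup_time):
        # Stop as soon as usage is below the low-water mark (Z%).
        if used_gb / capacity_gb * 100 < low_pct:
            break
        # Only images that also have a second copy are safe to expire.
        if image.copies >= 2:
            expired.append(image.image_id)
            used_gb -= image.size_gb
    return expired
```

In a real job the used figure would come from the file system (Python's shutil.disk_usage is one way to read it), and the function would be called on a timer the way cron woke up the script.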

It was disk staging before disk staging was even a thought in anyone’s mind (at least I don't think they were thinking of disk staging in 1999).

Now let’s look at what disk did to alleviate my client's pain.

Before

Client->Backup Server->Tape

  • Didn't hit all of the backup windows
  • Tape performance became a bottleneck
  • Started interleaving data on to tape (sacrificing the recovery time)

 

After

Client->Backup Server->Disk->Script on Backup Server to Clone Images to Tape->Script on Backup Server to remove images on disk.

  • 100% success for all critical backup jobs
  • Tape no longer a bottleneck
  • Interleaving data not necessary, improving recovery time (when on disk)
  • Cumbersome implementation with scripts, upkeep, consulting dollars
  • Still needed to drive data from disk to tape

Shortly thereafter, backup appliances started to turn up – ones that were “purpose-built” specifically for backup.  They were supposed to be a much better solution than the old JBOD you had lying around – but you still needed a consultant who knew how to write scripts to move from this “backup appliance” to tape.  As the name specifies – these are backup appliances, not backup accelerators.  

How did we do?

Well, as you can see the "after" picture was quite compelling - however, look at the steps it took.  The increased complexity, the dependency on the consultant to maintain the scripts, the additional management touch points.  While we solved one problem, we introduced yet a whole series of other issues that were really NON-issues, until something went wrong.

At that point in 1999/2000, I started looking at other ways to protect their data that would bring the protection mechanisms much closer to the point of creation/modification versus a batch process.  What we really needed was a unified approach - but nothing was readily available in 1999/2000.

Things are much more complicated today with virtual infrastructures and cloud coming along - how are we to manage these entities while providing a better level/quality of service?

Fact is, NetApp is the only vendor today that offers array-based data protection - not just backup, but data protection.  Application/VI integrated local operational Backup, Replication Based Backup, Continuous Availability, Disaster Recovery, Archive and Security.  Unified through the platform and the software layer, the NetApp Solutions provide the most comprehensive approach to protect, retain and recover critical business intelligence (data) under a Unified Data Protection model.  This extends beyond FAS; it includes V-Series as well.  



“Backup is my favorite mistake”

Bottom line is this – Why oh why must we continue these favorite mistakes?  Why not get a new t-shirt:

“Backup? No way, I move forward”

Chapa signing off

 

 

September 02, 2009

Do you still use an IMSAI 8080?

My guess is you're not using an IMSAI 8080 - and if you are please crack open another "Dew" for me and invite me over to see it.  If you don't know what one is - consider this your lucky day, I've just enlightened you.  8" floppy drive was optional.

Believe it or not, I played with one of those when I started selling computers at ComputerLand of Arlington Heights, IL - Store #11 in 1983...


way back when I was 16 years old selling computers on the "sales floor" - we call that retail today.

when I was told, I'd never land a Corporate account (word to the wise - don't ever tell me I can't do something :-)

when just before my 17th birthday I landed my first Corporate account and sold 150 Compaq "Portables" to Ernst and Whinney.  Yes, back when there were the Big 8 Accounting firms.

when I earned top salesman in the Midwest for IDEAssociates' Spiff program selling their multi-function cards.  Memory, serial AND parallel ports.  I know, I know, radical stuff.

Way back in the 1980-somethings.  The IMSAI was "made famous", if you can call it fame, in what movie?

Anyone know, anyone, anyone, Bueller?  Please on the honor system - no googling the answer.  Okay you can click here if you can't wait til the end of the blog.

Bottom line, I'm not 16 anymore, E&W is now E&Y, the Big 8 has gone by the wayside, ComputerLand of Arlington Heights, IL, Store #11 is a distant memory and as far as I know IDEAssociates is no longer around.

However - data was being created back then and data needed to be backed up.  So there were a host of ways we did that with the "island PCs" that were in our possession back in the early 80's.  

Most people backed up to floppy.  I sold a lot of those Dysan Diskettes just for that purpose (the first disk based backup for "open systems").  In the late 80's more PC based networks were popping up and required more than a box of Dysan disks to copy the data.  I remember backing up my Novell Network with nbackup to a Bernoulli box with removable media.  While this proved to be acceptable for our Novell Network, it would only hold us for maybe six months.  By that time I was looking for a new, more formal backup tool.  I had narrowed it down to two:

Cheyenne ArcServe 4.0

Palindrome Backup Director

Both were quoted with tape backup - oddly enough the VP I worked for at the time choked on the $18,000 price tag the two quotes came in with, both using a Tecmar QIC tape drive.  Once I implemented the backup software and we were automatically backing up successfully to tape - I felt that life couldn't get much better.  Incidentally - I thought that same thing when I was 16 1/2 and about to close my first Corporate Account.

Reality is, our situations change, technology changes and we need to adapt or get left behind.  I'm not using an IMSAI 8080 to write this blog - as a matter of fact, I'm not using a Compaq Portable either - no, I'm using a laptop that weighs less than 2 lbs and has more processing power in the video card than the IMSAI had in its entire box.  

So why are you still backing up data in the same way, same fashion as you did in 1980-something?

I ran a consulting firm in Chicago, IL focused on data protection and data availability.  Let's be straight, my primary business practice was around Backup and Recovery, and the biggest challenge my clients faced back then was meeting the tightening backup windows.  These weren't small companies either, but large companies.  Large Insurance companies, Financial institutions, Telecommunication companies, etc.

Now fast forward 15 years and guess what?  IDC, InfoPro, and most of the other research firms still rank "meeting backup windows" as one of the top 5 challenges faced by customers today when it comes to backup and recovery.

Unless you have forever and a day - you need to do something differently to protect your data - so you can meet your SLAs and the backup window specifically.  I've been saying this for almost 20 years now - and I'll say it again here "if we're to be serious about protecting the data in our eco-system, we have to look at moving all the data without having to move all that data over and over and over again." 

What does that look like?  


Consider this -

1. operational backup (with snapshots) for high speed local recovery

2. replicated, block level incremental backups for long term recovery (local or remote to core)

3. duplication (to use a backup term) or mirroring offsite for disaster recovery

4. file locking for archive and compliance and encryption for security
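To make item 2 a bit more concrete, here is a small Python sketch of the block level incremental idea: work out which blocks changed since the last copy and move only those.  It is a generic illustration, assuming fixed-size blocks compared by hash; the function names and the block size are hypothetical, and nothing here reflects how any particular product tracks changed blocks.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096):
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes, block_size: int = 4096):
    """Return the indexes of blocks that are new or differ from the last copy."""
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    return [
        index
        for index, digest in enumerate(new)
        if index >= len(old) or old[index] != digest
    ]
```

Only the blocks at the returned indexes would need to travel to the backup target, which is the point of not moving all that data over and over again.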

Keep coming back to my blog to check out what I write about - also, if you haven't done so already, check out the Data Protection Community for great information, blogs and articles written by some of the best in the industry.

I'll be taking a look at how this may look in your environment and what technologies you should be looking at to help your efforts.  

Until then,

Chapa signing off

PS.  The movie was Wargames with Matthew Broderick
