Talk about proof of concept. While there's always stress when a server crashes, this week we clearly saw proof of concept for both our BDR Solution and our Managed Service approach to IT management and support. The hard part is seeing the positives while you're dealing with a crashed server. Now that the dust has settled a little, we can see success from failure.
A little background - this week, the Exchange Server at one of our clients crashed (they're about a 30-user company with 2 locations). It was a complete hardware failure (the motherboard failed and needed to be replaced). The timing couldn't have been more interesting - we were in the midst of the project to replace their server. If we had started the project a week earlier, we wouldn't even be talking about this. So, here's how it played out...
The first success - the server crash occurred after working hours one night and our monitoring system alerted us that the server was down. Our on-call staff sprang into action immediately, reviewing the situation and then calling the client via his emergency contact number to let him know that there was a problem and we were on it. We performed some initial triage that night, and first thing the next morning we were out there dealing with things.
The second success - luckily (no, not luckily, but actually due to proper planning), we had implemented our BDR Solution for them about 6 months ago, so not only were we able to restore the Exchange Server, we had a complete image of it and had it back up and running ON the BDR server within 4 hours that next day. Now keep in mind, this was a 200GB Information Store! For those of you who don't know, this is HUGE! A majority of our clients run Info Stores smaller than 35GB. Ours is about 70GB. Historically, it would easily have taken us two days to get it back up and running if we had to do a manual rebuild of the server and restore of the Info Store. Before the BDR, I can't remember EVER getting a crashed Exchange Server back up and running in production the same day! While the server was significantly slower than we anticipated (mostly due to the size of the Info Store), it was running and the client was functional. While we were dealing with this, our procurement staff were expediting the replacement parts for the downed server. Two days later we had the parts, the Dell technician had replaced everything, the server was back up, and we then transferred the active image back onto the server (with minimal downtime).
The third success - since this client is one of our Managed Service clients at our "ProtectIT" service level, all of our time involved in this response was covered! This means it didn't cost them anything more than their standard ongoing monthly fee. From a customer happiness standpoint, this is a huge win.
While it's understandable that the client wasn't happy having to go through a situation like this, and there are always things we can improve on our side, they do have peace of mind. This peace of mind comes from knowing that the decisions they made months ago to implement the BDR and go with our Managed Service approach prevented this situation from being significantly more painful, more costly, and certainly more disruptive to their business. Seems to me this was a great example of success from failure. Proof of concept indeed!