by Brian on March 24, 2011

Recently, we had a customer with a RAID 5 server that had a failed drive. Nothing out of the ordinary. The nice thing about RAID 5 is that one drive can fail and your server purrs along as if nothing is wrong. Your server stays running, and the only people who know there is an issue are the folks at the server management company.

One of our brilliant technicians logged into the server for a regular checkup and found that drive number 3 was in a FAILED status. We ordered a replacement drive, and I quickly went on-site to replace it. Although the server continues to operate after a single drive failure, you always want to act rapidly to get the failed drive replaced. The danger is that if a second drive fails before the first one is replaced, your server ceases to function. At that point a new disk array needs to be set up and the server must be restored from your latest backup. This causes downtime for the customer and is to be avoided at all costs.

So it was a warm spring morning when I showed up to replace the failed drive. I removed the failed drive and inserted the new one. I told the server to rebuild the RAID 5 array and rebooted. But the server did not come back up. Instead, it greeted me with “Drive not found, please insert bootable disk”. NOT the message you want to see after rebuilding a RAID array.
After some troubleshooting, I discovered that when I started the RAID 5 rebuild with the new disk, a second disk began to fail, which left the entire disk array unbootable. The data was still on two of the disks (and SOME data had been synced to the third), but the server could not boot from the array.
I took the server back to our shop and began working out the best way to restore it to its peak operating potential.
I tried several recovery programs to restore the MBR (master boot record), thinking the server would boot up normally once that was repaired. None of them worked. Finally I gave in and decided that we would have to reinstall Windows on the server. Instead of restoring from the backup disks (which would have cost one day of data), I tried an unconventional method of data recovery: RAID 5 data recovery. I used USB slave adapters to plug drive 0 and drive 1 into our recovery server, then ran a program called File Scavenger against them. The program pulled up all of the server data as if nothing was wrong. It worked beautifully.
So I reinstalled Windows, copied the customer's data over from the recovery directory, and got them up and running with NO data loss.
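
For anyone wondering why two surviving drives were enough to read everything back: RAID 5 stores a parity block alongside the data in each stripe, and that parity is a byte-wise XOR of the data blocks. Below is a minimal Python sketch of the idea, assuming a three-drive array and made-up block contents. The function name `xor_blocks` and the sample data are purely illustrative, and this says nothing about how File Scavenger itself works internally.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe on a hypothetical three-drive array:
# two data blocks plus one parity block (sample contents are made up).
block_a = b"customer"
block_b = b"invoices"
parity = xor_blocks(block_a, block_b)

# Lose any one of the three blocks and the other two rebuild it.
rebuilt_a = xor_blocks(block_b, parity)
rebuilt_b = xor_blocks(block_a, parity)
```

The same arithmetic shows why a second failure is fatal: with only one block of a stripe left, there is nothing to XOR it against.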
