If you are reading this page, you may have had a RAID failure and be keen to find a quick solution. You can jump to our RAID Recovery Services page, find details about RAID recovery software on our Recovery Software page, or jump to our Do’s, Don’ts and Recovery Steps below.
What is RAID?
RAID stands for “Redundant Array of Independent Disks”. The original term was Redundant Array of Inexpensive Disks. The word inexpensive was used because, at the time, high-capacity hard drives were very expensive, so it was cheaper to put several lower-cost drives into an enclosure and add controlling software and hardware to spread the data over multiple disks. Interestingly, the old cost benefits of RAID have become popular again in recent years, with RAID 0 (see link below) used to join 2 cheaper SSDs together to give higher capacity.
The words Redundant and Array are also a bit confusing. Array is just a way of describing multiple disks linked together.
The word Redundant is a bit of a contradiction in two ways. In everyday life, redundant usually means that an item is no longer required, but in computing and engineering, redundant or redundancy refers to having extra components, or a higher specification than the bare minimum, to deal with unexpected events. For example, a bridge might have extra beams, bracing and struts. These are redundant under normal use but become critical when abnormal, severe stresses are put on the bridge, such as an earthquake or a vehicle impact. This is when the redundant components become indispensable.
In most modern systems the redundant disks keep a backup of the original data; that is, the same data is copied to more than one disk.
The second contradiction is that some RAID systems offer only extra capacity or performance, without any redundancy. RAID 0, for example, keeps no duplicate copy of the data. See the descriptions of RAID 0 and RAID 1 on our RAID page in the Onsite Backup Section.
Some older devices used only one disk, but in modern systems files are split into sections which are then distributed across all the drives in the RAID set. Because the technology combines multiple disk components into one logical unit, it can offer increased storage capacity, better fault tolerance and improved performance.
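To make the idea of splitting files into sections concrete, here is a minimal Python sketch of striping. It is an illustration only, not how any particular RAID controller works: the function names `stripe` and `unstripe`, the tiny chunk size and the use of in-memory byte strings in place of real disks are all assumptions made for the example.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[bytes]:
    """Split data into fixed-size chunks and deal them round-robin across disks.

    Hypothetical helper for illustration; real controllers work on raw blocks.
    """
    if num_disks < 1 or chunk_size < 1:
        raise ValueError("num_disks and chunk_size must be at least 1")
    disks = [bytearray() for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        chunk_index = start // chunk_size
        disks[chunk_index % num_disks] += data[start:start + chunk_size]
    return [bytes(d) for d in disks]


def unstripe(disks: list[bytes], chunk_size: int) -> bytes:
    """Reassemble the original data by reading chunks back in the same order."""
    out = bytearray()
    chunk_index = 0
    while True:
        disk = disks[chunk_index % len(disks)]
        offset = (chunk_index // len(disks)) * chunk_size
        chunk = disk[offset:offset + chunk_size]
        if not chunk:  # ran past the end of the data
            break
        out += chunk
        chunk_index += 1
    return bytes(out)


if __name__ == "__main__":
    data = b"ABCDEFGHIJ"
    disks = stripe(data, num_disks=2, chunk_size=4)
    print(disks)                            # each "disk" holds only part of the file
    print(unstripe(disks, 4) == data)       # correct drive order restores the data
    print(unstripe(disks[::-1], 4) == data) # swapped drive order does not
```

The last line of the demo shows why the order of the drives matters: reading the same chunks back with the disks swapped produces scrambled data rather than the original file.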
Three Benefits of RAID:
A RAID Array helps with data management in one, two or all three of the following ways:
- Increased disk capacity: eg 2 x 1 TB disks linked together to create a virtual 2 TB disk
- Increased performance: data is distributed over multiple disks, so it can be read from and written to several disks at the same time, reducing access and response times.
- Duplicate storage of data on multiple disks, reducing the risk of data loss.
So do not assume that every RAID level keeps a duplicate of your data, though most modern RAID systems do. Although RAID is not a replacement for traditional backup measures, it can protect small businesses from hardware failure. It does not, however, protect against malware, theft, natural disasters or user error.
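The trade-off between capacity and duplicate storage can be shown with some simple arithmetic. The sketch below is a rough illustration assuming equal-sized disks and the standard definitions of RAID 0 (striping), RAID 1 (mirroring) and RAID 5 (striping with one disk's worth of parity); the function name `usable_capacity_tb` is hypothetical.

```python
def usable_capacity_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity of an array of equal-sized disks (illustrative sketch)."""
    if level == 0:
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_tb          # all space is usable, no redundancy
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb                  # every disk holds the same data
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb    # one disk's worth of space holds parity
    raise ValueError(f"RAID level {level} is not covered by this sketch")


if __name__ == "__main__":
    print(usable_capacity_tb(0, 2, 1.0))  # 2 x 1 TB striped
    print(usable_capacity_tb(1, 2, 1.0))  # 2 x 1 TB mirrored
    print(usable_capacity_tb(5, 4, 1.0))  # 4 x 1 TB with parity
```

With two 1 TB disks, RAID 0 gives the virtual 2 TB disk described above, while RAID 1 gives only 1 TB because the second disk is spent on the duplicate copy.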
1 Disk or 2 Disk Redundancy
1-disk or 2-disk redundancy refers to the number of disks that can fail while the array remains usable. With 2-disk redundancy, 2 disks can fail without loss of data, though the redundancy (backup) function is lost until at least one disk is replaced.
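The rule above can be written as a small decision function. This is only a sketch of the reasoning in this section; the function name `array_state` and the wording of the states are made up for the example and do not correspond to any controller's status messages.

```python
def array_state(redundancy: int, failed: int) -> str:
    """Describe an array given its disk redundancy and number of failed disks."""
    if redundancy < 0 or failed < 0:
        raise ValueError("redundancy and failed must not be negative")
    if failed == 0:
        return "healthy"
    if failed < redundancy:
        return "degraded, still protected"
    if failed == redundancy:
        return "degraded, no protection left"
    return "array failed"


if __name__ == "__main__":
    for failed in range(4):
        print(failed, array_state(redundancy=2, failed=failed))
```

For an array with 2-disk redundancy, the second failure leaves the data readable but unprotected, and a third failure takes the array down.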
Why can 2 or more disks fail at the same time?
When a business or individual selects a RAID array with data redundancy, they assume their data will be safe, and it is often a big shock when the array fails and data is lost. Just as the saying goes that you can be sure of 2 things in life, death and taxes, when it comes to a hard disk you can be sure that “IT WILL FAIL”. All hard disks will fail at some stage. Every manufacturer should publish a specification called “Mean Time Between Failures” (MTBF) or “Mean Time To Failure” (MTTF), which could be up to 800,000 hours (91 years). Taken at face value, this means the average drive, used in the average way, should last 91 years. In reality, however, no drive is ‘average’, and the real figure drops considerably when high loads are put on the disk or when environmental conditions are poor.
The reason 2 disks appear to fail at the same time in a RAID system is that when one disk fails, an increased load is placed on the other disks in the array as the RAID controller works from, and rebuilds data across, the remaining disks. That extra load can cause another disk to fail, and the user then needs a good RAID Recovery Service to retrieve the data.
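The arithmetic behind the 91-year figure, and why it says little about a whole array, can be sketched as follows. The calculation assumes a constant failure rate and disks that fail independently of each other, which is a textbook simplification and, as noted above, optimistic for disks under rebuild load. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF figure in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_probability(mtbf_hours: float) -> float:
    """Chance that one disk fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def any_disk_fails(mtbf_hours: float, disks: int) -> float:
    """Chance that at least one of several independent disks fails within a year."""
    p = annual_failure_probability(mtbf_hours)
    return 1.0 - (1.0 - p) ** disks


if __name__ == "__main__":
    print(round(mtbf_years(800_000)))         # 91
    print(annual_failure_probability(800_000))
    print(any_disk_fails(800_000, disks=8))
```

Even under these generous assumptions, the chance of seeing a failure somewhere in the array grows with every disk added, which is one reason a long MTBF figure should not be read as a guarantee.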
Always keep a spare disk ready for failure
If your data is really critical to you and you accept that your RAID system will never offer 100% data security, you should always keep a compatible spare disk for your RAID array. When 1 disk fails, you then have the easy solution of installing the spare immediately, rather than waiting for delivery or installation.
Causes of RAID Failure
Because RAID is complex, a number of things can lead to array failure:
- Human Error
- Software Problems
- Transmission Errors
- Hardware Malfunctions
Although RAID arrays are generally stable, they can break down from time to time. Common symptoms and causes of a breakdown include:
- Drive(s) not booting/missing files
- Clicking drives (physical damage within a drive)
- RAID controller failure
- Damaged data striping
- Corrupt partition table
- RAID device not starting
- Inaccessible boot device
- Failure during RAID 5 rebuild
- Multiple hard disk failure
- Electronic damage
- Parity loss or damage
- Accidental deletion of partition data
- Accidental formatting or re-initialization
- Component Failure
Continuing to run the system in degraded mode can cause additional disks within the array to fail. Even the most sophisticated RAID configurations fail, so it is important to create backups on other media or with an Online Backup Service. If the system does fail, a professional data recovery specialist may be able to recover the data.
Do’s and Don’ts
When the RAID system stops functioning, it is important to turn it off immediately. Never use data recovery software on striped data, and exercise extreme caution while attempting to repair or rebuild the array. Do not attempt repairs if there are any unusual failure symptoms or error messages. The best way to keep a system recoverable is to shut it off as soon as failure symptoms occur, because operating a system with failure symptoms can complicate the data recovery process. For best results, contact a RAID data recovery service immediately.
Fully document the configuration during initial setup, especially the main drives, and maintain the drive sequence by tagging the physical drive units. After taking a full backup, test the subsystem’s ability to recover from a drive failure by removing one of its drives while it is running. Understand the fundamental concepts behind RAID and recognize that if more drives have failed than the array can tolerate (as few as two), there is no possibility of regaining access to the data without the help of a data recovery expert. Back your system up regularly. Be familiar with your RAID documentation and be able to distinguish between RAID systems. Replace a failed hard drive immediately and fully understand what “fault tolerance” means for your specific subsystem. When you begin in-house troubleshooting on failed RAID equipment, be meticulous.
What to do when you experience a RAID failure:
- Be aware of strange noises and grinding
- Turn off the computer immediately
- Never attempt to recover the data yourself using off-the-shelf software
- Always take a clone of the failed media before attempting repairs
- Always employ a data recovery professional to recover critical data
- Always employ a data recovery professional with Class 100 clean room facilities
- Check power cords and connections regularly
- Always label the drives in the RAID array with their positions
- Back up the drive after failure
There are many things not to do when dealing with RAID failure:
- Never use data recovery software on a hard drive with symptoms of mechanical failure
- Never recover data to the same physical hard drive
- Do not shake or rattle the drive
- Never open a hard drive
- Never change the PCB of a hard disc
- Do not run “CHKDSK” or “Vrepair”
- Do not continue a rebuild after more than one drive has failed at the same time
- Never interchange the hard disks in the RAID array
- Do not attempt to rebuild a RAID unless all drive members are present and fully functional
- Never ignore a RAID data storage subsystem fault warning
- Do not attempt to clean the hard drive
- Do not place the drive in a refrigerator to cool it
For water damage:
- Do not attempt to clean or dry the hard disk
- Never attempt to dry by exposing to heat, such as a hairdryer
- Do not allow the hard drive to dry out during shipping
- Do not attempt to operate visibly damaged devices
Proper recovery protocols are important to prevent data loss or damage to file system structures. A RAID system needs to be rebuilt after losing a hard disk, and access to data can be lost during this process if a second disk fails or the RAID controller stops functioning properly. Because of the way data is stored across multiple disks, RAID recovery is very specialized, and RAIDs are amongst the most complex media devices to recover data from. Different manufacturers also tend to use bespoke configuration applications, which can add further complexity. As a result, there are many steps involved in a professional RAID recovery:
- Disconnect and label all disks, cables and ports
- Be careful when handling the disks
- When delivering to a repair specialist, protect the disks carefully
- The lab will discover why the data is inaccessible, suggest further procedures, and estimate the cost and turnaround time
- The lab will also determine the extent of any mechanical damage
- The lab will determine:
  - Technical and logical parameters of the disk array
  - The original configuration
  - Parameters necessary for the reconstruction
- Typically an email is sent to the client regarding a file listing
- Working copy:
  - Backup copies will be made of all hard disks
  - A virtual image of the array will be created so no further work is done on the original disks
  - Any repairs to the file system will be made at this time
  - A copy will be made once there is a functional virtual image of the array
- If there is any doubt about the reliability of any RAID disks, the customer will receive the copy on new media.
State-of-the-art recovery tools and techniques are essential, but it is the experience of the recovery service engineers and the software coders that makes the difference. A wide range of media issues can be treated, including controller damage, multiple hard drive failures, hard drive firmware damage and file corruption. Many people are unaware that a good data recovery service can often restore data after fire damage.
Selecting a RAID Recovery Services company
When choosing a company to recover data from a RAID array, it is important to choose one that is familiar with the specific manufacturer. Different servers store data differently and there are different techniques and storage procedures that need to be followed.
Ensure that the service provider has a Class 100 clean room. This ensures that the surrounding air is free of contaminants, so that disks can be opened up for the necessary repair work. Otherwise, contaminants will land on the surface of the exposed platters, causing further damage and data loss.
Also compare their checklist to the procedures we list above. We suggest that companies refrain from using IT support companies to rebuild broken arrays and retrieve their data. While they are typically great at dealing with everyday “help desk” situations, IT support companies tend to be out of their depth with complex RAID data recovery, and in many instances have recommended the incorrect course of action. This could lead to permanent data loss and costly consequences for your organisation.