IT Lecture Notes by Mark Kelly, McKinnon Secondary College
Last changed: June 27, 2002 9:47 AM

 

RAID

Redundant Array of Inexpensive (or "Independent") Disks is a series (RAID0 to RAID5) of increasingly reliable and expensive ways of organising multiple physical hard disks into groups ("arrays") that work as a single logical disk. Each logical drive appears to the operating system as a single physical drive, thanks to the efforts of the RAID controller. There are hardware and software RAID controllers, but the software version adds strain to the CPU and is slower than a hardware controller.

RAID comes in several flavours, levels zero through five. (If you want to get picky, there are others: the full list is 0, 1, 1E, 2, 3, 4, 5 and 5E, plus the spanned levels 00, 10, 1E0 and 50.) Different flavours let you choose between performance, protection, and storage capacity.

A hot-spare drive is a hard disk drive in a server that is defined for automatic use in the event of a drive failure. If a drive fails, the system can automatically switch to the hot-spare drive, and the data from the dead drive is reconstructed on the hot-spare drive.

Each level is optimized for various capabilities, including improved performance of read or write operations, and improved data availability through redundant copies or parity checking. Features of different RAID levels can be combined to get the benefits of both.

"Parity", mentioned below, is an error-checking feature: when the data is saved, a special calculation based on the contents of the data is made and saved with the data. When data is later loaded, the calculation is made again and compared to the saved result. If they are different, it is a sign that the data has become corrupted and recovery measures can be taken.
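
As a rough illustration of the check just described, the sketch below keeps a single even-parity bit with a piece of data and recalculates it on loading. The function names `parity_bit` and `is_intact` are made up for this example, and a real RAID controller works on whole disk blocks rather than short byte strings.

```python
def parity_bit(data: bytes) -> int:
    """Return 0 if the data contains an even number of 1 bits, else 1."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def is_intact(data: bytes, saved_parity: int) -> bool:
    """Recompute the parity and compare it with the value saved earlier."""
    return parity_bit(data) == saved_parity


# Save: calculate the parity and keep it with the data.
original = b"RAID"
saved = parity_bit(original)

# Load: a single flipped bit changes the parity, so the damage is noticed.
damaged = bytes([original[0] ^ 0b00000001]) + original[1:]
print(is_intact(original, saved))  # True
print(is_intact(damaged, saved))   # False
```

Note that one parity bit only notices an odd number of flipped bits; two flipped bits cancel each other out and slip through.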

RAID-0 is a high-performance/low-availability level. It provides basic disk striping without parity protection to catch errors, so while throughput is high, no redundancy is provided. It is relatively inexpensive. If one disk in the array happens to fail, all data in the array is unavailable. Striping spreads data across each disk in the array for improved performance.

STRIPING

Striping is the practice of spreading data over multiple disk drives. It allows greater performance because drives can seek and deliver data simultaneously, rather than one drive having to do all the work by itself.
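
To make the idea concrete, here is a minimal sketch of striping in which each "disk" is just a Python list and chunks are dealt out in turn. The names `stripe` and `unstripe` are hypothetical, and a real controller stripes fixed-size blocks of bytes rather than single letters.

```python
def stripe(chunks, disk_count):
    """Deal chunks out to the disks in turn, like dealing cards."""
    disks = [[] for _ in range(disk_count)]
    for index, chunk in enumerate(chunks):
        disks[index % disk_count].append(chunk)
    return disks


def unstripe(disks):
    """Read the chunks back in their original order."""
    total = sum(len(disk) for disk in disks)
    return [disks[i % len(disks)][i // len(disks)] for i in range(total)]


disks = stripe(["A", "B", "C", "D", "E", "F"], disk_count=3)
print(disks)            # [['A', 'D'], ['B', 'E'], ['C', 'F']]
print(unstripe(disks))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Because neighbouring chunks sit on different disks, the three disks can each fetch their share at the same time.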

 

 

RAID-0 REPORT CARD
Performance: best. There is a significant performance advantage over a single disk. Multiple reads or writes are done simultaneously with multiple disks, rather than a read or write to a single disk. Reads/writes are overlapped across all disks.
Protection: poor. If one disk fails, all data is lost, and all disks must be reformatted. Data could be restored across the array from a tape or diskette backup, if available.
Capacity: N. Where N is the number of disks, the capacity is N. RAID 0 writes blocks of data to each drive in the array. It cannot be extended once it is full.

 

RAID-1 is a disk-mirroring strategy for high availability. All data is written twice to separate drives. The cost per megabyte of storage is higher, of course, but if one drive fails, normal operations can continue with the duplicate data. If the RAID device permits hot-swapping of drives, the bad drive can be replaced without interruption.

Mirroring and Duplexing. Disk mirroring duplicates the data from one disk onto a second disk using a single disk controller. Disk duplexing is the same as mirroring, except that the disks are attached to separate disk controllers, such as two SCSI adapters.

RAID-1 REPORT CARD
Performance: good. Write performance is somewhat reduced, because both drives in the mirrored pair must complete the write operation. A read request can be handled by either disk. The drive in the pair that is less busy is issued the read command, leaving the other drive to perform another read operation.
Protection: good. If either disk fails, a copy of the data is still available on the other disk. If a disk controller fails while duplexing, the data can still be accessed through the other controller and disk.
Capacity: N/2. Where N is the number of disks, the capacity is N divided by 2.
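
The mirroring behaviour in this report card can be sketched roughly as follows. `MirroredPair` is a hypothetical class for illustration only: each disk is a dictionary, and the `busy` counters simply stand in for however a real controller measures which drive is less busy.

```python
class MirroredPair:
    """Two in-memory 'disks' that always hold the same blocks."""

    def __init__(self):
        self.disks = [{}, {}]          # block number -> data, one dict per disk
        self.busy = [0, 0]             # reads currently queued on each disk
        self.failed = [False, False]

    def write(self, block, data):
        # Every write goes to both disks, which is why writes are slower.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def pick_disk(self):
        # Choose the working disk with the fewest queued reads.
        working = [i for i in (0, 1) if not self.failed[i]]
        if not working:
            raise OSError("both disks in the mirror have failed")
        return min(working, key=lambda i: self.busy[i])

    def read(self, block):
        return self.disks[self.pick_disk()][block]


pair = MirroredPair()
pair.write(7, "payroll")
pair.busy[0] = 3           # pretend disk 0 is busy with other reads
print(pair.pick_disk())    # 1
pair.failed[1] = True      # disk 1 dies
print(pair.read(7))        # payroll
```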

 

RAID-1E (also called Hybrid or Enhanced RAID 1, and closely related to RAID 10) stripes the data across the disks with mirroring. In other words, it combines RAID-0 and RAID-1, keeps two copies of the striped data, and is fairly popular. Striping increases throughput, and simultaneous reads from the two copies will reduce the performance drag caused by writing everything twice.

It needs three or more disks. The first stripe is the data stripe, and the second stripe is the mirror (copy) of the first data stripe, shifted one drive along. It is also called a mirrored stripe, because a complete stripe of data is mirrored to another stripe within the set of disks.

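Suppose a file is saved as 6 chunks on a three-disk array laid out this way: disk 1 holds chunks 1, 3', 4 and 6', disk 2 holds 2, 1', 5 and 4', and disk 3 holds 3, 2', 6 and 5' (a ' marks a mirror copy). When the user wants to load the file, the RAID system collects the necessary 6 chunks, puts them together and gives them to the user. Imagine disk 3 suddenly blows up. Chunks 3, 2', 6 and 5' are lost - or are they? If you look at the remaining two drives, the system can still find the 6 chunks that make up the entire file.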
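
The three-disk example can be checked with a short sketch. The function names `raid1e_layout` and `surviving_chunks` are hypothetical, disks are numbered from 0 in the code (so index 2 is "disk 3"), and the layout follows the shifted-mirror rule described above rather than any particular controller's implementation.

```python
def raid1e_layout(chunk_count, disk_count=3):
    """Return one list per disk naming the chunks it holds (copies end in ')."""
    disks = [[] for _ in range(disk_count)]
    for start in range(0, chunk_count, disk_count):
        stripe = list(range(start + 1, min(start + disk_count, chunk_count) + 1))
        # Data stripe: the chunks go on disks 0, 1, 2, ...
        for position, chunk in enumerate(stripe):
            disks[position].append(str(chunk))
        # Mirror stripe: the same chunks, shifted one disk along.
        for position, chunk in enumerate(stripe):
            disks[(position + 1) % disk_count].append(f"{chunk}'")
    return disks


def surviving_chunks(disks, dead_disk):
    """Chunk numbers still readable after one disk is lost."""
    found = set()
    for index, disk in enumerate(disks):
        if index != dead_disk:
            found.update(int(name.rstrip("'")) for name in disk)
    return sorted(found)


disks = raid1e_layout(6)
print(disks[2])                    # ['3', "2'", '6', "5'"]
print(surviving_chunks(disks, 2))  # [1, 2, 3, 4, 5, 6]
```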

After the disk death, the RAID system alerts the system manager that a disk failure has happened, the system manager slides in a new hard disk to replace the dead one, and the RAID system automatically restores the contents of the dead drive onto the replacement drive. If there is a hot-spare drive, the system will automatically switch over to it, restore the lost data onto it and carry on using the spare.

The more hard disks you have in the RAID system, and the more times data is redundantly saved, the more your system becomes crash-proof. Also, performance will tend to improve with more disks: when the number of disks in an array is doubled, server throughput will improve by about 50 percent until other bottlenecks occur.
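
As a back-of-the-envelope illustration of that rule of thumb, the sketch below multiplies throughput by 1.5 for every doubling of the disk count. The function name and the starting figure of 100 units are invented for the example, and the estimate ignores the "other bottlenecks" that eventually cap real systems.

```python
import math


def estimated_throughput(base_throughput, base_disks, disks):
    """Apply 'about 50 percent more per doubling' to a known starting point."""
    doublings = math.log2(disks / base_disks)
    return base_throughput * 1.5 ** doublings


# Starting from a hypothetical 100 units of throughput on 2 disks:
print(round(estimated_throughput(100, 2, 4), 1))  # 150.0
print(round(estimated_throughput(100, 2, 8), 1))  # 225.0
```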

RAID-1E REPORT CARD
Performance: good. Faster than RAID1, but slower than RAID0. The data is striped across an odd number of disks. Each write has to be repeated to accomplish the mirroring, which slows performance. For example, with three disks, on the first write S1, S2, and S3 are written to disks 1, 2, and 3, respectively. On the second write, S3', S1', and S2' are written to disks 1, 2, and 3, respectively (same data mirrored and shifted one disk).
Protection: good. If any disk fails, the data is still available on the other disks.
Capacity: N/2. Where N is the number of disks, the capacity is N divided by 2.

RAID-2 performs disk striping at the bit level (bit-interleaved data striping with a Hamming code) and uses one or more disks to store parity information. It can be fast for sequential applications such as graphics modeling, but it is not used very often because it is generally considered slow and expensive. It is almost never used with PC-based systems.

RAID-3 uses data striping, generally at the byte level, and uses one disk to store parity information. Striping improves the throughput of the system, and using only one disk per set for parity information reduces the cost per megabyte of storage. Striping data in small chunks provides excellent performance when transferring large amounts of data, because all disks operate in parallel. Two disks must fail within a set before data would become unavailable. Because the data is interleaved in such small pieces, all drives must be accessed to retrieve one record. RAID-3 is therefore best for large sequential reads and poor for random transactions: it is faster than a single disk but significantly slower than RAID 0 or RAID 1 in random environments.

RAID-4 stripes data in larger chunks, which provides better performance than RAID-3 when transferring small amounts of data. It uses block-interleaved data striping with one parity disk. It is best for large sequential I/O but has poor write performance, and it is faster than a single drive but significantly slower than RAID 0 or RAID 1.

RAID-5 stripes data in blocks sequentially across all disks in an array and writes parity data on all disks as well. By distributing parity information across all disks, RAID-5 eliminates the bottleneck sometimes created by a single parity disk. RAID-5 is increasingly popular and is well suited to transaction environments.

RAID-5 REPORT CARD
Performance: good. RAID 5 is preferred for smaller block transfers. Typically smaller block transfers are used in network files.
Protection: good. If any disk fails, the data can be recovered by using the data from the other disks along with the parity information.
Capacity: N-1. Where N is the number of disks, the capacity is N minus 1.
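
The recovery described under "Protection" can be sketched in a few lines. These notes do not spell out the calculation; the sketch assumes the usual choice, which is that the parity block is the bitwise XOR of the data blocks in the stripe. The helper name `xor_blocks` and the four-byte blocks are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
parity = xor_blocks(data)            # parity block on a fourth disk

# The disk holding data[1] fails: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

This works because XOR-ing a value in twice cancels it out, so whatever is left after combining the survivors with the parity must be the missing block.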

RAID can be high-speed because the separate hard disks can all work at once: e.g. instead of a single hard disk sending 5 chunks one at a time, 5 hard disks could each send one chunk simultaneously, making data retrieval far faster. RAID 5 combines striping with parity rather than RAID1's mirroring, so the array of hard disks not only has the data sprinkled across it, but also holds parity information from which the contents of a failed disk can be rebuilt.

RAID 5 yields the lowest I/O throughput of the three RAID strategies because of the additional checksum calculation and write operations required. In general, I/O throughput with RAID 5 is 30 to 50 percent lower than with RAID 1.

Also RAID 5, with its controller, usually costs more to implement than Hybrid RAID1. RAID5 requires one parity unit per stripe.

RAID 5 stripes data across all disks at the same time. Parity is interleaved with the data rather than stored on a dedicated drive.
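
One way to picture the interleaving is to print which disk holds the parity block for each stripe. The sketch below is only an illustration: `raid5_layout` is a hypothetical name, and the rotation order (parity starting on the last disk and moving left) is an assumption, since controllers differ in the order they use.

```python
def raid5_layout(stripe_count, disk_count):
    """Return one row per stripe; 'P' marks the disk holding that stripe's parity."""
    rows = []
    block = 1
    for stripe in range(stripe_count):
        # Assumed rotation: parity moves one disk to the left on each stripe.
        parity_disk = (disk_count - 1 - stripe) % disk_count
        row = []
        for disk in range(disk_count):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


# Each printed row is one stripe; each column is one disk.
for row in raid5_layout(stripe_count=4, disk_count=4):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Over four stripes every disk takes one turn at holding parity, so no single disk becomes the bottleneck.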

RAID 5E uses a distributed hot spare disk, so it works with a minimum of four disks. Protection: very good. Capacity: N-2. Where N is the number of disks, the capacity is N-2 (one for parity and one for spare).
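
The capacity figures quoted in these notes can be collected into one small helper for comparison. The function name `usable_disks` is made up, and the result is measured in "disks' worth" of space, assuming every disk in the array is the same size.

```python
def usable_disks(level, n):
    """Usable capacity, in disks' worth of space, for an array of n equal disks."""
    formulas = {
        "0": lambda n: n,        # striping only, nothing set aside
        "1": lambda n: n / 2,    # everything stored twice
        "1E": lambda n: n / 2,   # striped and mirrored
        "5": lambda n: n - 1,    # one disk's worth of parity
        "5E": lambda n: n - 2,   # one for parity and one for the spare
    }
    return formulas[level](n)


# Compare the levels for a hypothetical four-disk array.
for level in ("0", "1", "1E", "5", "5E"):
    print(level, usable_disks(level, 4))
```

Multiply the result by the size of one disk to get the usable space in megabytes or gigabytes.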

Spanned Arrays (RAID x0): Spanned arrays (or composite RAID levels) are RAID arrays that are joined together to form larger RAID arrays.


© Mark Kelly 2001
