IT Lecture Notes
by Mark Kelly, McKinnon Secondary College
Last changed: June 27, 2002 9:47 AM
RAID
A Redundant Array of Inexpensive (or "Independent") Disks is a series (RAID 0 to RAID 5) of increasingly reliable and expensive ways of organising multiple physical hard disks into groups ("arrays") that work as a single logical disk. Each logical drive appears to the operating system as a single physical drive, thanks to the efforts of the RAID controller. There are hardware and software RAID controllers, but the software version puts extra load on the CPU and is generally slower than a hardware controller.
RAID comes in several flavours. The basic levels run from zero through five, but if you want to get picky there are others: 0, 1, 1E, 2, 3, 4, 5 and 5E, plus the spanned levels 00, 10, 1E0 and 50. Different flavours let you choose between performance, protection, and storage capacity. Each level is optimised for various capabilities, including improved performance of read or write operations, and improved data availability through redundant copies or parity checking. Features of different RAID levels can be combined to get the benefits of both.
A hot-spare drive is a hard disk drive in a server that is set aside for automatic use in the event of a drive failure. If a drive fails, the system can automatically switch to the hot-spare drive, and the data from the dead drive is reconstructed on the hot-spare drive.
"Parity", mentioned below, is an error-checking feature: when the data is saved, a special calculation based on the contents of the data is made and saved with the data. When the data is later loaded, the calculation is made again and compared to the saved result. If they are different, it is a sign that the data has become corrupted and recovery measures can be taken. In the RAID levels that store parity, the same information also lets the array rebuild the contents of a failed disk.
RAID-0 is a high-performance, low-availability level. It provides basic disk striping, which spreads data across every disk in the array for improved performance, but it has no parity protection to catch errors, so while throughput is high, no redundancy is provided. It is relatively inexpensive. If one disk in the array happens to fail, all data in the array is unavailable.
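The striping idea can be sketched in a few lines of Python. This is only an illustration, not how a real controller works: the function names `stripe` and `unstripe`, the 2-byte chunk size and the use of Python lists to stand in for disks are all assumptions made for the example.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Deal fixed-size chunks of data round-robin across num_disks 'disks'."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the chunks back in round-robin order."""
    out = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


# Hypothetical example: 10 bytes, 3 disks, 2-byte chunks.
disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
restored = unstripe(disks)
```

Because every chunk exists on exactly one disk, deleting any one of the three lists makes it impossible to rebuild the original data, which is why RAID-0 offers no protection.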
RAID-1 is a disk-mirroring strategy for high availability. All data is written twice, to separate drives. The cost per megabyte of storage is higher, of course, but if one drive fails, normal operations can continue with the duplicate data. If the RAID device permits hot-swapping of drives, the bad drive can be replaced without interruption.
Mirroring and duplexing: disk mirroring duplicates the data from one disk onto a second disk using a single disk controller. Disk duplexing is the same as mirroring, except that the disks are attached to separate disk controllers, such as two SCSI adapters.
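As a rough sketch of mirroring, the toy class below writes every block to two disks, keeps reading after one of them fails, and copies the data back when the failed disk is replaced. The class name `MirroredPair`, its methods and the use of dictionaries as disks are hypothetical, chosen only for illustration.

```python
class MirroredPair:
    """Toy RAID-1 pair: each 'disk' is a dict mapping block number to bytes."""

    def __init__(self) -> None:
        self.disks = [{}, {}]
        self.healthy = [True, True]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to both disks (a failed disk is skipped).
        for i in range(2):
            if self.healthy[i]:
                self.disks[i][block] = data

    def read(self, block: int) -> bytes:
        # Read from the first disk that is still working.
        for i in range(2):
            if self.healthy[i]:
                return self.disks[i][block]
        raise IOError("both disks in the mirror have failed")

    def fail(self, i: int) -> None:
        # Simulate a dead drive: its contents are gone.
        self.healthy[i] = False
        self.disks[i] = {}

    def replace(self, i: int) -> None:
        # Swap in a new drive and rebuild it from the surviving copy.
        other = 1 - i
        if not self.healthy[other]:
            raise IOError("no surviving disk to rebuild from")
        self.disks[i] = dict(self.disks[other])
        self.healthy[i] = True


pair = MirroredPair()
pair.write(0, b"payroll")
pair.fail(0)                      # first disk dies
still_readable = pair.read(0)     # served from the second disk
pair.write(1, b"rosters")         # work continues on the survivor
pair.replace(0)                   # new disk is rebuilt from the mirror
```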
RAID-1E (also called Hybrid or Enhanced RAID 1) stripes the data across the disks with mirroring. In other words, it combines RAID-0 striping with RAID-1 mirroring, much as RAID 10 does, keeps two striped copies of the data, and is fairly popular. It should not be confused with RAID 6, which protects data with two sets of parity rather than with mirroring. Striping increases throughput, and simultaneous reads from the two copies reduce the performance drag caused by writing everything twice. It needs three or more disks. The first stripe is the data stripe, and the second stripe is the mirror (copy) of the first data stripe, shifted one drive. It is also called a mirrored stripe, because a complete stripe of data is mirrored to another stripe within the set of disks.
For example, take a file split into 6 chunks and stored on a three-disk array, where a primed number such as 2' is the mirror copy of chunk 2. When the user wants to load the file, the RAID system collects the necessary 6 chunks, puts them together and gives them to the user. Imagine disk 3 suddenly blows up. Chunks 3, 2', 6 and 5' are lost - or are they? If you look at the remaining two drives, the system can still find the 6 chunks that make up the entire file.
After the disk death, the RAID system alerts the system manager that a disk failure has happened. The system manager slides a new hard disk in to replace the dead one, and the RAID system automatically restores the contents of the dead drive onto the replacement drive. If there is a hot-spare drive, the system will automatically take it over, restore the lost data onto it and use the spare.
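The layout described above (a data stripe followed by a copy of it shifted one drive along) can be worked out with a short sketch. The function names `raid1e_layout` and `recoverable_chunks` are made up for this example, and the layout rule is a simplified reading of the description, not a vendor's exact algorithm.

```python
def raid1e_layout(num_chunks: int, num_disks: int) -> list[list[str]]:
    """Return, for each disk, the chunk labels it holds (2' = mirror of chunk 2)."""
    disks = [[] for _ in range(num_disks)]
    for start in range(1, num_chunks + 1, num_disks):
        stripe = list(range(start, min(start + num_disks, num_chunks + 1)))
        # Data stripe: one chunk per disk, left to right.
        for pos, chunk in enumerate(stripe):
            disks[pos].append(str(chunk))
        # Mirror stripe: the same chunks, shifted one drive along.
        for pos, chunk in enumerate(stripe):
            disks[(pos + 1) % num_disks].append(f"{chunk}'")
    return disks


def recoverable_chunks(disks: list[list[str]], failed: int) -> set[int]:
    """Chunk numbers still available (as data or mirror) after one disk fails."""
    found = set()
    for i, disk in enumerate(disks):
        if i != failed:
            found.update(int(label.rstrip("'")) for label in disk)
    return found


layout = raid1e_layout(num_chunks=6, num_disks=3)
# Disk 3 is index 2 in the list; it holds chunks 3, 2', 6 and 5'.
survivors = recoverable_chunks(layout, failed=2)
```

With disk 3 removed, the other two disks between them still hold chunks 1 to 6, so the whole file can be rebuilt.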
RAID-2 performs disk striping at the bit level and uses one or more disks to store error-checking information (a Hamming code). RAID-2 is not used very often because it is considered to be slow and expensive. In summary: bit interleave data striping with Hamming code - fast for sequential applications such as graphics modelling - almost never used with PC-based systems.
RAID-3 uses data striping, generally at the byte level, and uses one disk to store parity information. Striping improves the throughput of the system, and using only one disk per set for parity information reduces the cost per megabyte of storage. Striping data in small chunks provides excellent performance when transferring large amounts of data, because all disks operate in parallel. The array survives the failure of one disk; two disks must fail within a set before data becomes unavailable. In summary: byte interleave data striping with parity - access to all drives is needed to retrieve one record - best for large sequential reads - poor for random transactions - faster than a single disk but significantly slower than RAID 0 or RAID 1 in random environments.
RAID-4 stripes data in larger chunks (blocks), which provides better performance than RAID-3 when transferring small amounts of data. In summary: block interleave data striping with one parity disk - best for large sequential I/O, but poor write performance - faster than a single drive but significantly slower than RAID 0 or RAID 1.
RAID-5 stripes data in blocks sequentially across all disks in an array and writes parity data on all disks as well. By distributing parity information across all disks, RAID-5 eliminates the bottleneck sometimes created by a single parity disk. RAID-5 is increasingly popular and is well suited to transaction environments.
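Parity in RAID-3, RAID-4 and RAID-5 is commonly computed as a byte-by-byte XOR of the data blocks in a stripe, which is what allows a lost block to be rebuilt from the survivors. The sketch below illustrates that idea with distributed parity. The helper names (`xor_blocks`, `write_stripe`, `rebuild`) and the particular rule used to rotate the parity position are assumptions for the example; real controllers use their own layouts.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: list[bytes], stripe_no: int) -> tuple[list[bytes], int]:
    """Lay out one stripe: one block per disk, with the parity block's
    position rotating from stripe to stripe (an assumed rotation rule)."""
    num_disks = len(data_blocks) + 1
    parity_disk = (num_disks - 1 - stripe_no) % num_disks
    stripe = list(data_blocks)
    stripe.insert(parity_disk, xor_blocks(data_blocks))
    return stripe, parity_disk


def rebuild(stripe: list[bytes], lost_disk: int) -> bytes:
    """Recover the block on the failed disk by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_disk]
    return xor_blocks(survivors)


# Hypothetical four-disk array: three data blocks plus one parity block per stripe.
stripe0, parity0 = write_stripe([b"AAAA", b"BBBB", b"CCCC"], stripe_no=0)
stripe1, parity1 = write_stripe([b"DDDD", b"EEEE", b"FFFF"], stripe_no=1)
recovered = rebuild(stripe0, lost_disk=1)   # pretend disk 2 has died
```

Because the parity block sits on a different disk in each stripe, no single disk has to absorb every parity write, which is the bottleneck that RAID-5 avoids.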
RAID can be high-speed because the separate hard disks can all work at once: e.g. instead of a single hard disk sending 5 chunks one at a time, 5 hard disks could each send one chunk simultaneously, making data retrieval far faster.
RAID 10 combines striping with RAID 1's mirroring, so the array of hard disks not only has the data sprinkled across it, but there are matching disks maintaining an exact copy of the first disks.
RAID 5 yields the lowest I/O throughput of the three RAID strategies because of the additional checksum calculation and write operations required. In general, I/O throughput with RAID 5 is 30 to 50 percent lower than with RAID 1. Also, RAID 5, with its controller, usually costs more to implement than Hybrid RAID 1. RAID 5 requires one parity unit per stripe, so where N is the number of disks, the capacity is N-1. RAID 5 stripes data across all disks at the same time, and parity is interleaved with the data rather than stored on a dedicated drive.
RAID 5E uses a distributed hot spare disk, so it works with a minimum of four disks. Protection: very good. Capacity: where N is the number of disks, the capacity is N-2 (one disk's worth for parity and one for the spare).
Spanned Arrays (RAID x0): spanned arrays (or composite RAID levels) are RAID arrays that are joined together to form larger RAID arrays.
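The capacity rules can be put into a small calculator. This is a sketch under stated assumptions: all disks are the same size, RAID 0 uses every disk for data, RAID 1 stores everything twice, RAID 5 gives up one disk's worth of space to parity and needs at least three disks, and RAID 5E gives up two (parity plus spare) and needs at least four. The function name `usable_capacity` and the 100 GB figures are hypothetical.

```python
def usable_capacity(level: str, num_disks: int, disk_size_gb: float) -> float:
    """Usable space in GB for an array of identical disks (simplified rules)."""
    if level == "0":
        data_disks = num_disks            # striping only, no redundancy
    elif level == "1":
        data_disks = num_disks / 2        # every block is stored twice
    elif level == "5":
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        data_disks = num_disks - 1        # one disk's worth of parity
    elif level == "5E":
        if num_disks < 4:
            raise ValueError("RAID 5E needs at least four disks")
        data_disks = num_disks - 2        # parity plus distributed hot spare
    else:
        raise ValueError(f"level {level!r} is not covered by this sketch")
    return data_disks * disk_size_gb


# Four hypothetical 100 GB disks under each level.
capacities = {level: usable_capacity(level, 4, 100) for level in ("0", "1", "5", "5E")}
```

The comparison shows the trade-off described above: the more protection a level offers, the less of the raw disk space is left for data.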
© Mark Kelly 2001