
RAID Solutions for Linux

RAID, short for Redundant Array of Inexpensive Disks, is a method whereby information is spread across several disks, using techniques such as disk striping (RAID level 0) and disk mirroring (RAID level 1) to achieve redundancy, lower latency and/or higher read/write bandwidth, and recoverability from hard-disk crashes. More than six different RAID levels have been defined. The folks at DPT provide a RAID Primer as part of their technology article series. Another primer can be found in the technology article series from Storage Computer. See also another Quick Reference for RAID Levels and Mike Neuffer's What Is RAID? intro.
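As a rough worked example of the trade-offs: given four 9 GB disks, RAID 0 stripes them into a single 36 GB device with no redundancy; RAID 1 mirrors them in pairs, leaving 18 GB usable; and RAID 5 yields (N-1) x 9 GB = 27 GB of usable space while tolerating the failure of any single disk.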


Linux RAID Solutions

There are four types of RAID solutions available to Linux users: software RAID, outboard DASD boxes, inboard bus-to-bus converters, and RAID disk controllers.

Software RAID
Pure software RAID implements the various RAID levels in the kernel disk (block device) code. Pure-software RAID offers the cheapest possible solution: not only are expensive disk controller cards or hot-swap chassis not required, but software RAID works with cheaper IDE disks as well as SCSI disks. With today's fast CPUs, software RAID performance can hold its own against hardware RAID in all but the most heavily loaded systems. The current Linux software RAID is becoming increasingly fast, feature-rich and reliable, making many of the lower-end hardware solutions uninteresting. Expensive, high-end hardware may still offer advantages in management, reliability, dual-hosting, hot-swap capability and the like, but it is no longer required for low-end, casual deployment.
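By way of illustration, here is a minimal sketch of a software-RAID setup, assuming the 0.90-style raidtools (older md tool versions use a different configuration format, and all device names here are hypothetical). A three-disk RAID-5 array is described in /etc/raidtab and then initialized with mkraid:

   raiddev /dev/md0
       raid-level            5
       nr-raid-disks         3
       persistent-superblock 1
       chunk-size            32
       device                /dev/sda1
       raid-disk             0
       device                /dev/sdb1
       raid-disk             1
       device                /dev/sdc1
       raid-disk             2

   mkraid /dev/md0           # build the array described in /etc/raidtab
   mke2fs /dev/md0           # put an ext2 file system on the new device
   mount /dev/md0 /mnt/raid  # mount it like any other disk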

Outboard DASD Solutions
Outboard DASD (Direct Access Storage Device, an old IBM mainframe term) solutions are separate boxes that come with their own power supply, provide a cabinet/chassis for holding the hard drives, and appear to Linux as just another SCSI device. In many ways, these offer the most robust RAID solution. Most boxes provide hot-swap disk bays, where failing disk drives can be removed and replaced without turning off the power. Outboard solutions usually offer the greatest choice of RAID levels: RAID 0, 1, 3, 4 and 5 are common, as well as combinations of these levels. Some boxes offer redundant power supplies, so that the failure of one power supply will not disable the box. Finally, with Y-SCSI cables, such boxes can be attached to several computers, allowing high availability to be implemented: if one computer fails, another can take over operations.

Because these boxes appear as a single drive to the host operating system, yet are composed of multiple SCSI disks, they are sometimes known as SCSI-to-SCSI boxes. Outboard boxes are usually the most reliable RAID solutions, although they are usually also the most expensive (e.g., some of the cheaper offerings from IBM are in the twenty-thousand-dollar ballpark).

Inboard DASD Solutions
Similar in concept to outboard solutions, there are now a number of bus-to-bus RAID converters that will fit inside a PC case. These come in several varieties. One style is a small disk-like box that fits into a standard 3.5-inch drive bay and draws power from the power supply in the same way that a disk would. Another style plugs into a PCI, ISA or MicroChannel slot, using that slot only for electrical power (and the space it provides).

Both SCSI-to-SCSI and EIDE-to-EIDE converters are available. Because these are converters, they appear as ordinary hard drives to the operating system and do not require any special drivers. Most such converters seem to support only RAID 0 (striping) and RAID 1 (mirroring), apparently due to size and cabling restrictions.

The principal advantages of inboard converters are price, reliability, ease of use and, in some cases, performance. The usual disadvantages are the lack of RAID-5 support, the lack of hot-plug capability, and the lack of dual-ended operation.

RAID Disk Controllers
RAID disk controllers are adapter cards that plug into the ISA/EISA/PCI bus. Just like regular disk controller cards, a cable attaches them to the disk drives. Unlike regular disk controllers, RAID controllers implement RAID on the card itself, performing all operations necessary to provide the various RAID levels. Just as with outboard boxes, the Linux kernel does not know (or need to know) that RAID is being used. However, just like ordinary disk controllers, these cards must have a corresponding device driver in the Linux kernel to be usable.
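As a generic sanity check (not specific to any one vendor's card), the kernel's boot messages and SCSI device list will show whether the controller's driver was found and loaded:

   dmesg | more          # look for the controller's detection banner
   cat /proc/scsi/scsi   # list the SCSI devices the kernel has attached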

If the RAID disk controller has a modern, high-speed DSP/controller on board and a sufficient amount of cache memory, it can outperform software RAID, especially on a heavily loaded system. However, using an old controller on a modern, fast 2-way or 4-way SMP machine may easily prove to be a performance bottleneck compared to a pure software-RAID solution. Some of the performance figures below provide additional insight into this claim.

Current Linux Software Status

Software RAID 0, 1, 4, 5
The md Multi-Device kernel driver, included as a standard part of the 2.0.x kernels, provides RAID-0 (disk striping) and multi-disk linear-append spanning support in the Linux kernel. The RAID-1, -4 and -5 modules are a standard part of the latest 2.1.x kernels; patches are available for the 2.0.x kernels and the earlier 2.1.x kernels. The code itself appears stable, although some newer features, such as hot reconstruction, should still be considered alpha- or beta-quality code.

The RAID patches can be applied to kernels 2.0.29 and higher, and to 2.1.26 through 2.1.62 (2.1.x versions newer than this come with the RAID code built in). Please avoid kernel 2.0.30; it has serious memory management, TCP/IP masquerading and ISDN problems. Mirroring-over-striping is supported, as are other combinations: RAID 1, 4 and 5 can be put on top of other RAID-1/4/5 devices, or over the linear or striped personalities; linear and striping over RAID 1, 4 or 5 are not supported.

Please note that many of the 2.1.x series development kernels have problems; with these, the latest RAID patches from the ALPHA directory at http://ftp.kernel.org/pub/linux/daemons/raid/ need to be applied.
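Once the md driver is active, array state can be read from /proc/mdstat, and the raidtools utilities handle disk replacement. A minimal sketch, assuming the 0.90 raidtools and a hypothetical failed disk /dev/sdc1 in /dev/md0 (keep in mind the hot-reconstruction caveat above):

   cat /proc/mdstat                   # shows personalities, active arrays and resync progress
   raidhotremove /dev/md0 /dev/sdc1   # detach the failed disk from the running array
   raidhotadd /dev/md0 /dev/sdc1      # add the replacement; reconstruction starts automatically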

JFS
A journalled file system is under development. There is a mailing list; to subscribe, send the following:
mail majordomo@majordomo.ibasys.net
subscribe linux-ljfs

LVM
A Logical Volume Manager for Linux is in development and, while not feature complete, is reported to be quite usable.

The LVM concept is available on several Unixes and on OS/2. It provides an abstraction of the physical disks that makes the handling of large file systems and disk arrays easier to administer. It does this by grouping sets of disks (physical volumes) into a pool (volume group). The volume group can in turn be carved up into virtual partitions (logical volumes) that behave just like ordinary disk block devices, except that (unlike disk partitions) they can be dynamically grown, shrunk and moved about without rebooting the system or entering maintenance/standalone mode. A file system (or a swap space, or a raw device) sits on top of a logical volume. LVM utilities simplify adding, moving and removing hard drives by abstracting away the file system mount points (/, /usr, /opt, etc.) from the hard drive devices (/dev/hda1, /dev/sdb2, etc.). Note that if you have only a few disks (one to four), the difficulty of learning LVM may outweigh any administrative benefit gained.

Currently, Linux LVM only supports RAID-linear and RAID-0. Support for other RAID levels is planned.
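To make this concrete, here is a minimal sketch of an LVM session using the Linux LVM command names (the volume-group and device names are hypothetical, and the tools are still in development, so check their documentation for your version's exact syntax):

   pvcreate /dev/sdb1 /dev/sdc1        # initialize the disks as physical volumes
   vgcreate vg00 /dev/sdb1 /dev/sdc1   # pool them into a volume group
   lvcreate -L 4G -n home vg00         # carve out a 4 GB logical volume
   mke2fs /dev/vg00/home               # put a file system on it
   lvextend -L +2G /dev/vg00/home      # later, grow the volume (the file system must be resized separately)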

Hot-Plug Support
Linux supports "hot-plug" SCSI in the sense that SCSI devices can be removed and added without rebooting the machine. The procedures for this are documented in the SCSI Programming HOWTO. From the command line, the commands are:
echo "scsi remove-single-device host channel ID LUN" > /proc/scsi/scsi
echo "scsi add-single-device host channel ID LUN" > /proc/scsi/scsi
Don't confuse this ability with the hot-plug support offered by vendors of outboard RAID boxes.
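For example, to cycle the disk at host adapter 0, channel 0, SCSI ID 2, LUN 0 (illustrative values; substitute your own), the above commands become:
echo "scsi remove-single-device 0 0 2 0" > /proc/scsi/scsi
echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi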

Disk Management
I don't understand how RAID management is done under Linux. How are failed disks reported? How are intermittent errors reported? Do I have to comb through syslogd reports to get this info? How do I get disk access statistics? How do I go about tuning my application for RAID? See the section at the bottom of this page, though.

The Software-RAID package comes with some tools. Similar tools do not yet seem to be available for hardware RAID solutions.

Hardware Controllers

A hardware controller is a PCI or ISA card that mediates between the CPU and the disk drives via the I/O bus. Hardware controllers always need a device driver to be loaded into the kernel so that the kernel can talk to the card. Note that there are some devices (listed in the "Outboard RAID Vendors" section below) that only draw power from the PCI/ISA bus, do not use any of the signal pins, and do not require a (special) device driver. This section lists only those cards that use the PCI/ISA bus for actually moving data.

Vendors supported under Linux:

BigStorage
BigStorage offers a broad line of storage products tailored for Linux.

ICP Vortex
ICP Vortex offers a full line of disk array controllers. Drivers are a standard part of the 2.0.x and 2.2.x kernels; the boot diskettes for most major Linux distributions will recognize an ICP controller. Initial configuration can be done through the on-board ROM BIOS.

ICP Vortex also provides the GDTMonitor management utility. It provides the ability to monitor transfer rates, set hard drive and controller parameters, and hot-swap and reconstruct defective drives. For sites that cannot afford to take down and reboot a server in order to replace failed disks or do other maintenance, this utility is a gotta-have feature. As of January 1999, this is the only such program that I have heard of for a Linux hardware RAID controller, and this feature alone immediately elevates ICP above the competition.

A RAID Primer (PDF) and Manuals; see Chapter K for GDTmon.

Syred
Syred offers a series of RAID controllers. Their sales staff indicated that they use RedHat internally, so the Linux support should be solid.

BusLogic/Mylex
BusLogic/Mylex offers a series of SCSI controllers, including RAID controllers. BusLogic has been well known for its early support of SCSI on Linux. The latest drivers for these cards are being written and maintained by Dandelion Digital.

DPT
Look for the SmartCache [I/III/IV] and SmartRAID [I/III/IV] controllers from Distributed Processing Technology, Inc. There are two drivers: Michael Neuffer's eata_dma driver, which is a part of the standard Linux kernel distribution, and Dario Ballabio's eata driver (both are discussed in the product reviews below).

Outboard RAID Vendors

There are many outboard box vendors, and, in theory, they should all work with Linux. In practice, some SCSI boxes support features that SCSI cards don't, and vice versa, so buyer beware. Note: some outboard controllers are not true stand-alone, external boxes with external power supplies, but are small devices that fit into a standard drive bay and draw power from the system power supply. Others are shaped as PCI or ISA cards, but use the PCI/ISA slots only to draw power, and do not use the signal pins on the bus. All of these devices need some other disk controller (typically, the stock, non-RAID controller that came with your box) to communicate with. The upside to such a scheme: no special device drivers are required. The downside: there are even more cards, cables and connectors that can fail.
www.raidweb.com

Arco Computer Products
Arco Computer offers the DupliDisk EIDE-to-EIDE converter for RAID-1 (mirroring). Three versions are available: one that fits into an ISA slot, one that plugs directly into an IDE connector, and one that fits into a drive bay.

DILOG
DILOG offers the 2XFR SCSI-to-SCSI RAID-0 product.

Dynamic Network Factory
Dynamic Network Factory specializes in larger arrays.

LAND-5
LAND-5 offers several products. These appear to be stand-alone SCSI-attached boxes, and require no special Linux support. See www.land-5.com

StorComp
Storage Computer is an early pioneer in RAID. See also their Product Sheet.

Disk Array Management Software

Most controllers can be configured and managed via brute force: by rebooting the machine and descending into the on-card BIOS or possibly DOS utilities to reconfigure, exchange and rebuild failed drives. However, for many system operators, rebooting is a luxury that is not available. For these sites and servers, there is a real need for configuration and management software that will not only report on a variety of disk statistics, but also raise alarms when there is trouble and allow failed drives to be disabled, swapped out and reconstructed, with all of this done without taking the array offline and without halting any server functions. Currently (January 1999) I am aware of only one vendor that provides this capability: ICP Vortex.

ICP Vortex (New Listing)
ICP Vortex provides the GDTMonitor management utility for its controllers. The utility provides the ability to monitor transfer rates, set hard drive and controller parameters, and hot-swap and reconstruct defective drives.

BusLogic
BusLogic offers the Global Array Manager, which runs under SCO Unix and UnixWare. Thus, a port to Linux is at least theoretically possible. Contact your sales representative.

StorComp
Storage Computer offers an SNMP MIB for storage management. MIBs being what they are, any SNMP tool on Linux should be able to use this to query and manage the system; on the other hand, this makes for a rather low-level, (very) hard-to-use solution. See also their white paper on storage management.

DPT
DPT provides management software with their cards. The distribution includes SCO binaries. Thus, a port to Linux is at least theoretically possible. Contact your sales representative.

Product Reviews

The following product reviews were submitted by readers of this web page. Note that little effort has been made to verify their accuracy or to filter out malicious submissions.

Manufacturer: DPT
Model Number: PM3334UW/2 (two-channel "SmartRAID IV")
Number of disks, total size, raid-config: Two RAID-5 groups, one on each SCSI channel, each array consisting of nine non-hot-swappable 9 GB disk drives. The ninth drive on each RAID group is designated as a "hot spare". One channel also includes a Quantum DLT 7000 backup tape device.
On-controller memory: 64 MB, as four non-parity, non-ECC, non-EDO 60 ns 16 MB single-sided 72-pin SIMMs
Months in use: 10 months in heavy use.
OS kernel version, vendor and vendor version: 2.0.32, RedHat Linux 5.0
Use (news spool, file server, web server?): File server (directories for developers)
Support (1=bad, 5=excellent or n/a didn't try support): 3
Performance (1=very dissatisfied 5=highly satisfied or n/a): 4
Reliability (1=terrible 5=great or n/a no opinion): 4
Installation (1=hard, 5=very easy) (includes s/w install issues): 3
Overall satisfaction (1 to 5): 4
Comments: Regarding DPT support: Try DPT's troubleshooting web pages first. DPT's support staff does respond to e-mail, typically within one to two working days, and they do sometimes come up with interesting suggestions for work-arounds to try. But in my admittedly limited experience with DPT support staff as an end-user, unless you're truly stuck you're more likely to find a work-around to your problems before they do.

Regarding DPT documentation: The SmartRAID IV User's Manual is better than some documentation I've seen, but like most documentation it's nearly useless if you encounter a problem. The documentation does not appear to be completely accurate as regards hot spare allocation. And unsurprisingly, the printed documentation does not cover Linux.

Regarding DPT PM3334UW/2 installation: The following combinations of SCSI adapters and motherboards did not work for us:

  • DPT PM3334UW/2 + Adaptec 2940UW on an Intel Advanced/Endeavor Pentium motherboard.
  • DPT PM3334UW/2 + Mylex BT-950R on an Intel Advanced/Endeavor Pentium motherboard.
  • DPT PM3334UW/2 + NCR 53c815 on an ASUS P5A-B Pentium motherboard.
The following combinations of adapters and motherboards did work for us:
  • DPT PM3334UW/2 + Mylex BT-950R on an ASUS TX97-E Pentium motherboard.
  • DPT PM3334UW/2 + Adaptec 2940UW on an ASUS TX97-XE Pentium motherboard.
  • DPT PM3334UW/2 + Mylex BT-950R on an ASUS P2B Pentium-II motherboard.

Symptoms of non-working combinations may include that the Windows-based DPT Storage Manager application reports "Unable to connect to DPT Engine" or "There are no DPT HBAs on the local machine".

Regarding the DPT Storage Manager application: The Windows-based DPT Storage Manager application version 1.P6 must have all "options" installed, or it cannot run. Some variant of this application is required in order to build RAID groups.

The DPT Storage Manager application is dangerous--if you click on the wrong thing the application may immediately wipe out a RAID group, without confirmation and without hope of recovery. If you are adding a RAID group, you are advised to disconnect physically any other RAID groups on which you do not plan to operate, until you have finished working with the Storage Manager application. There is no Linux version of the Storage Manager application (or any other DPT run-time goodies) available at present.

Regarding Michael Neuffer's version 2.70a eata_dma.o driver for Linux: The eata_dma driver does appear to work, with the following minor problems:

  1. "Proc" information from, e.g. "cat /proc/scsi/eata_dma/1" is mostly incorrect, and therefore it is unlikely that one would be able to detect an alarm condition from it, among other things.
  2. The driver does not properly support more than two RAID groups. If you have three RAID groups, you will not be able to use the last RAID group--a device name of the form /dev/sdXN--that the driver reports at boot time.
  3. The -c (check) option in mke2fs falsely reports problems; omit that option when making a file system on a RAID group.
Another eata device driver exists, eata.o (as opposed to eata_dma.o), written by Dario Ballabio, but at the time of this writing I have not tried the eata.o driver. In the Red Hat 5.0 distribution, the eata.o driver is present in /usr/src/linux/drivers/scsi/.

Miscellaneous issues: if a hot spare is available, experiments appear to show that it is not possible to detect when the hot spare has been deployed automatically as the result of a drive failure. If a hot spare is not available, then an audible alarm sounds (earsplittingly) when a drive fails.

Author: Jerry Sweet (jsweet@irvine.com)
Date: November 10, 1998


Manufacturer: DPT
Model Number: 3334UW
Number of disks, total size, raid-config: 3 x 9 GB => 17 GB (RAID 5)
On-controller memory: 64 MB parity, non-ECC
Months in use: 3 months, 2 weeks in heavy use
OS kernel version, vendor and vendor version: 2.0.30, RedHat Linux 4.2
Use (news spool, file server, web server?): File server (home directories)
Support (1=bad, 5=excellent or n/a didn't try support): n/a
Performance (1=very dissatisfied 5=highly satisfied or n/a): 4
Reliability (1=terrible 5=great or n/a no opinion): 4
Installation (1=hard, 5=very easy) (includes s/w install issues): 3
Overall satisfaction (1 to 5): 4
Comments: Works nicely, and installation was easy enough in DOS; they even have a Linux icon included now. What I would really benefit from is dynamic partitioning a la AIX, but that is a file system matter as well.

If the kernel crashes on mkfs.ext2 right after boot, try generating some traffic on the disk (dd if=/dev/sdb of=/dev/null bs=512 count=100) before making the file system. (Thanks Mike!) (Ed. note: this is a well-known Linux 2.0.30 bug; try using 2.0.29 instead.)

Author: Oskari Jääskeläinen (osi@fyslab.hut.fi)
Date: October 1997

Bonnies

The following figures were submitted by interested readers. No effort has been made to verify their accuracy or the test methodology. These figures might be incorrect or misleading; use at your own risk!
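For reference, figures like these come from the classic Bonnie benchmark, typically invoked along these lines (the scratch directory and machine label here are hypothetical, and the -s file size should be several times physical RAM, or the test measures the buffer cache rather than the disks):

   bonnie -d /mnt/raid -s 512 -m testhost   # 512 MB test file, results labeled "testhost"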
The following was submitted by the DILOG manufacturer:

Linux 2.1.58 with an Adaptec 2940 UW card, two IBM DCAS drives, and the DiLog 2XFR:

   -------Sequential Output-------- ---Sequential Input--
   -Per Char- --Block--- -Rewrite-- -Per Char- --Block---
   K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU
    8392 99.0 13533 61.2  5961 48.9  8124 96.4 15433 54.3

Same conditions, one drive only:

   -------Sequential Output-------- ---Sequential Input--
   -Per Char- --Block--- -Rewrite-- -Per Char- --Block---
   K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU
    6242 72.2  7248 32.4  3491 25.1  7356 84.2  7864 25.2

John Morris (jman@dejanews.com) has submitted the following:

The following are comparisons of hardware and software RAID performance. The test machine is a dual-P2, 300MHz, with 512MB RAM, a BusLogic Ultra-wide SCSI controller, a DPT 3334UW SmartRAID IV controller w/64MB cache, and a bunch of Seagate Barracuda 4G wide-SCSI disks.

These are very impressive figures, highlighting the strength of software RAID!




              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU

(DPT hardware RAID5, 3 disks)
DPT3x4G  1000  1914 20.0  1985  2.8  1704  6.5  5559 86.7 12857 15.6  97.1  1.8

(Linux soft RAID5, 3 disks)
SOF3x4G  1000  7312 76.2 10908 15.5  5757 20.2  5434 86.4 14728 19.9  69.3  1.5

(DPT hardware RAID5, 6 disks)
DPT6x4G  1000  2246 23.4  2371  3.4  1890  7.1  5610 87.3  9381 10.9 112.1  1.9

(Linux soft RAID5, 6 disks)
SOF6x4G  1000  7530 76.8 16991 32.0  7861 39.9  5763 90.7 23246 49.6 145.4  3.7

(I didn't test DPT RAID5 w/8 disks because the disks kept failing,
even though it was the exact same SCSI chain as the soft RAID5, which
returned no errors; please interpolate!)

(Linux soft RAID5, 8 disks)
SOF8x4G  1000  7642 77.2 17649 33.0  8207 41.5  5755 90.6 22958 48.3 160.5  3.7

(Linux soft RAID0, 8 disks)
SOF8x4G  1000  8506 86.1 27122 54.2 11086 58.9  6077 95.9 27436 62.9 185.3  4.9

Tomas Pospisek maintains additional benchmarks at his Benchmarks page.

Ram Samudrala (me@ram.org) reports the following:

Here's the output of the Bonnie program on a DPT 2144 UW with 16 MB of cache and three 9 GB disks in a RAID-5 setup. The machine is a dual-processor Pentium Pro running Linux 2.0.32. For comparison, the Bonnie results for the IDE drive on that machine are also given, as are some hardware RAID figures for a Mylex controller on a DEC/OSF1 machine (KSPAC) with an array of twelve 9 GB disks. (Note that the test size is rather small: at 100 MB, it tests memory performance as well as disk performance.)


    -------Sequential Output-------- ---Sequential Input--  --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block---  --Seeks---
 MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec  %CPU Machine
100  3277 32.0  6325 23.5  2627 18.3  4818 44.8 59697 88.0  575.9  16.3 IDE
100  9210 96.8  1613  5.9   717  5.8  3797 36.1 90931 96.8 4648.2 159.2 DPT RAID
100  5384 42.3  5780 18.7  5287 42.1 12438 87.2 62193 83.9 4983.0  65.5 Mylex RAID


Copyright (c) 1996-1999 Linas Vepstas, All Rights Reserved
All trademarks are property of their respective owners.
Last updated January 1999 by Linas Vepstas (linas@linas.org)