RAID Solutions for Linux
RAID, short for Redundant Array of Inexpensive Disks, is a method of spreading
information across several disks, using techniques such as disk striping (RAID
level 0) and disk mirroring (RAID level 1) to achieve lower latency, higher
bandwidth for reading and/or writing, redundancy, and recoverability from
hard-disk crashes. More than six different RAID levels have been defined. The
folks at DPT provide a
RAID Primer as part of their technology
article series. Another primer
can be found in the technology article
series from Storage Computer. See also the
Quick Reference for RAID Levels and Mike Neuffer's
What Is RAID? intro.
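As a quick illustration of the trade-offs: two 9 GB drives striped as RAID 0
appear as a single 18 GB drive with no redundancy, while the same two drives
mirrored as RAID 1 provide only 9 GB but survive the loss of either drive;
four 9 GB drives in RAID 5 provide 27 GB (one drive's worth of capacity goes
to parity) and tolerate the failure of any single drive.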
Linux RAID Solutions
Several types of RAID solutions are available to Linux users:
software RAID, outboard DASD boxes, inboard bus-to-bus converters, and RAID disk controllers.
- Software RAID
- Pure software RAID implements the various RAID levels in the kernel
disk (block device) code. Pure-software RAID offers the cheapest
possible solution: not only are expensive disk controller cards and
hot-swap chassis not required, but software RAID works with cheaper
IDE disks as well as SCSI disks. With today's fast CPUs, software
RAID performance can hold its own against hardware RAID in all but
the most heavily loaded systems. The current Linux Software RAID
is becoming increasingly fast, feature-rich and reliable, making
many of the lower-end hardware solutions uninteresting. Expensive,
high-end hardware may still offer advantages in management,
reliability, dual-hosting, hot-swap, etc., but it is no longer required
for low-end, casual deployment. (A minimal configuration sketch
appears after this list.)
- Outboard DASD Solutions
- DASD (Direct Access Storage Device, an old IBM mainframe term) units are
separate boxes that come with their own power supply, provide a
cabinet/chassis for holding the hard drives, and appear to
Linux as just another SCSI device. In many ways, these offer the
most robust RAID solution. Most boxes provide hot-swap disk bays,
where failing disk drives can be removed and replaced without
turning off power. Outboard solutions usually offer the greatest
choice of RAID levels: RAID 0, 1, 3, 4 and 5 are common, as well as
combinations of these levels. Some boxes offer redundant power
supplies, so that a failure of a power supply will not disable
the box. Finally, with Y-SCSI cables, such boxes can be attached to
several computers, allowing high availability to be implemented, so
that if one computer fails, another can take over operations.
Because these boxes appear as a single drive to the host operating
system, yet are composed of multiple SCSI disks, they are sometimes
known as SCSI-to-SCSI boxes. Outboard boxes are usually the
most reliable RAID solutions, although they are usually the most
expensive (e.g. some of the cheaper offerings from IBM are in
the twenty-thousand dollar ballpark).
- Inboard DASD Solutions
- Similar in concept to outboard solutions, there are now a number of
bus-to-bus RAID converters that will fit inside a PC case. These come
in several varieties. One style is a small disk-like box that
fits into a standard 3.5 inch drive bay, and draws power from
the power supply in the same way that a disk would. Another style
will plug into a PCI, ISA or MicroChannel slot, and use that slot
only for electrical power (and the space it provides).
Both SCSI-to-SCSI and EIDE-to-EIDE converters are available. Because
these are converters, they appear as ordinary hard-drives to the
operating system, and do not require any special drivers. Most
such converters seem to support only RAID 0 (striping) and 1
(mirroring), apparently due to size and cabling restrictions.
The principal advantages of inboard converters are price, reliability,
ease-of-use, and in some cases, performance. Disadvantages are usually
the lack of RAID-5 support, lack of hot-plug capabilities, and the lack
of dual-ended operation.
- RAID Disk Controllers
- Disk Controllers are adapter cards that plug into the ISA/EISA/PCI bus.
Just like regular disk controller cards, a cable attaches them to
the disk drives. Unlike regular disk controllers, the RAID controllers
will implement RAID on the card itself, performing all necessary
operations to provide various RAID levels. As with outboard boxes,
the Linux kernel does not know (or need to know) that RAID is being used.
However, just like ordinary disk controllers, these cards must have a
corresponding device driver in the Linux kernel to be usable.
If the RAID disk controller has a modern, high-speed DSP/controller
on board, and a sufficient amount of cache memory, it can outperform
software RAID, especially on a heavily loaded system. However, using
an old controller on a modern, fast 2-way or 4-way SMP machine may
easily prove to be a performance bottleneck as compared to a pure
software-RAID solution. Some of the performance figures below provide
additional insight into this claim.
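To make the software-RAID option concrete, here is a minimal configuration
sketch. It assumes the newer (0.90-style) raidtools and three example SCSI
partitions; the device names, chunk size and RAID level are illustrative
only, and the exact directives may differ slightly between raidtools releases.
   # /etc/raidtab -- a three-disk RAID-5 array (example devices only)
   raiddev /dev/md0
       raid-level            5
       nr-raid-disks         3
       nr-spare-disks        0
       persistent-superblock 1
       chunk-size            32
       device                /dev/sda1
       raid-disk             0
       device                /dev/sdb1
       raid-disk             1
       device                /dev/sdc1
       raid-disk             2

   mkraid /dev/md0            # initialize the array described in /etc/raidtab
   mke2fs /dev/md0            # create an ext2 file system on the array
   mount /dev/md0 /mnt/raid   # mount it like any ordinary disk
   cat /proc/mdstat           # check the status of all md devices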
Current Linux Software Status
- Software RAID 0, 1, 4, 5
- The md Multi-Device kernel module, included as a standard
part of the v2.0.x kernels, provides RAID-0 (disk striping) and
multi-disk linear-append spanning support in the Linux kernel.
The RAID 1, 4 and 5 kernel modules are a standard part of
the latest 2.1.x kernels; patches are available for the 2.0.x kernels
and the earlier 2.1.x kernels. The code itself appears stable, although
some newer features, such as hot-reconstruction, should still be
considered alpha- or beta-quality code.
The RAID patches can be applied to kernels v2.0.29 and higher, and to
2.1.26 through 2.1.62 (2.1.x versions newer than this come with the
RAID code built-in).
Please avoid kernel 2.0.30; it has serious memory management,
TCP/IP masquerading and ISDN problems.
Mirroring-over-striping is supported, as well as other combinations
(RAID 1, 4 or 5 can be put on top of other RAID 1, 4 or 5 devices, or
over the linear or striped personalities; linear and striped arrays over
RAID 1, 4 or 5 are not supported).
Please note that many of the 2.1.x series development kernels have
problems, and thus the latest RAID patches from the ALPHA directory
at
http://ftp.kernel.org/pub/linux/daemons/raid/ need to be applied.
- JFS
- A journalled file system is under development. There is a mailing
list; to subscribe, send mail to majordomo@majordomo.ibasys.net with the
message body:
subscribe linux-ljfs
- LVM
- A Logical Volume Manager
for Linux is in development and, while not feature complete, is reported
to be quite usable.
The LVM concept is available on several Unix variants and on OS/2.
It provides an abstraction of the physical disks that makes
the handling of larger file systems and disk arrays easier to administer.
It does this by grouping
sets of disks (physical volumes) into a pool (volume group).
The volume group can in turn be carved up into virtual partitions
(logical volumes) that behave just like the ordinary disk block
devices, except that (unlike disk partitions) they can be dynamically
grown, shrunk and moved about without rebooting the system or entering
into maintenance/standalone mode. A file system (or a swap space, or
a raw device) sits on top of a logical volume.
LVM utilities usually simplify adding, moving
and removing hard drives, by abstracting away the file system mount
points (/, /usr, /opt, etc.) from the hard
drive devices (/dev/hda1, /dev/sdb2, etc.).
Note that if you have only a few disks (1-4), the difficulty of learning
LVM may outweigh any administrative benefits that you gain.
Currently, Linux LVM only supports RAID-linear and RAID-0. Support for
other RAID levels is planned. (A short command sketch appears after
this list.)
- Hot-Plug Support
- Linux supports "hot-plug" SCSI in the sense that SCSI devices can be
removed and added without rebooting the machine. The procedures for
this are documented in the
SCSI Programming HOWTO. From the command line, the commands are
echo "scsi remove-single-device host channel ID LUN" > /proc/scsi/scsi
echo "scsi add-single-device host channel ID LUN" > /proc/scsi/scsi
where host, channel, ID and LUN are the numbers identifying the device.
Don't confuse this ability with the hot-plug support offered by
vendors of outboard RAID boxes.
- Disk Management
- I don't understand how RAID management is done under Linux.
How are failed disks reported? How are intermittent errors reported?
Do I have to comb through syslogd reports to get this info?
How do I get disk access statistics? How do I go about tuning my
application for RAID? See the section at the bottom of this page, though.
The Software-RAID package comes with some tools; similar tools
do not yet seem to be available for hardware RAID solutions.
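Here is the LVM command sketch promised above. It is a minimal example only;
the volume group name, logical volume name, devices and sizes are all
illustrative, and the utilities shipped with the Linux LVM package may differ
slightly from release to release.
   pvcreate /dev/sda3 /dev/sdb2          # mark partitions as physical volumes
   vgcreate my_vg /dev/sda3 /dev/sdb2    # pool them into a volume group
   lvcreate -L 500M -n my_lv my_vg       # carve out a 500 MB logical volume
   mke2fs /dev/my_vg/my_lv               # put a file system on the logical volume
   mount /dev/my_vg/my_lv /opt
   lvextend -L +250M /dev/my_vg/my_lv    # later: grow the volume without repartitioning
                                         # (the file system must then be resized as well)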
Hardware Controllers
A hardware controller is a PCI or ISA card that mediates between the CPU
and the disk drives via the I/O bus. Hardware controllers always need
a device driver to be loaded into the kernel, so that the kernel can
talk to the card. Note that there are some devices (listed in the
"Outboard RAID Vendors" section below) that only draw power from
the PCI/ISA bus, but do not use any of the signal pins, and do not
require a (special) device driver. This section lists only those cards
that use the PCI/ISA bus for actually moving data.
Vendors supported under Linux:
- BigStorage
- BigStorage offers a broad
line of storage products tailored for Linux.
- ICP Vortex
- ICP Vortex offers a
full line of disk array controllers. Drivers are a standard part of the
2.0.x and 2.2.x kernels; the boot diskettes for most major Linux
distributions will recognize an ICP controller. Initial configuration
can be done through on-board ROM BIOS.
ICP Vortex also provides the
GDTMonitor management utility. It provides the ability to
monitor transfer rates, set hard drive and controller parameters,
and hot-swap and reconstruct defective drives. For sites that cannot
afford to take down and reboot a server in order to replace
failed disks or do other maintenance, this utility is a must-have
feature. As of January 1999,
this is the only such program that I have heard of for
a Linux hardware RAID controller, and this feature alone immediately
elevates ICP above the competition.
A RAID Primer (PDF) and
Manuals; see Chapter K for GDTmon.
- Syred
- Syred offers a series of
RAID controllers. Their sales staff indicated that they use RedHat
internally, so the Linux support should be solid.
- BusLogic/Mylex
- Buslogic/Mylex offers a series
of SCSI controllers, including RAID controllers. BusLogic
has been well known for their early support of SCSI on Linux.
The latest drivers for these cards are being written & maintained by
Dandelion Digital.
- DPT
- Look for the SmartCache [I/III/IV] and SmartRAID [I/III/IV]
controllers from
Distributed Processing Technology, Inc.
Note that one must use the EATA-DMA driver, which is a part of
the standard Linux kernel distribution.
There are two drivers:
- EATA-DMA: The driver for all EATA-DMA compliant (DPT) controllers.
- EATA-PIO: The driver for the very old PM2001 and PM2012A from DPT.
Outboard RAID Vendors
There are many outboard box vendors, and, in theory, they should all
work with Linux. In practice, some SCSI boxes support features that
SCSI cards don't, and vice-versa, so buyer beware. Note: some
outboard controllers are not true stand-alone, external boxes with
external power supplies, but are small devices that fit into a standard
drive bay, and draw power from the system power supply. Others are
shaped as PCI or ISA cards, but use the PCI/ISA slots only to draw
power, and do not use the signal pins on the bus. All of these devices
need some other disk controller (typically, the stock, non-RAID controller
that came with your box) to communicate with. The upside to such
a scheme: no special device drivers are required. The downside: there
are even more cards, cables and connectors that can fail.
- www.raidweb.com
- Arco Computer Products
- Arco Computer offers
the DupliDisk EIDE-to-EIDE converter for RAID-1 (mirroring).
Three versions are supported: one that fits into an ISA slot,
one that fits into an IDE slot, and one that fits into a drive
bay.
- DILOG
- DILOG offers the
2XFR SCSI-to-SCSI
RAID-0 product. Features:
- Fits into a 3.5 inch drive bay.
- Certified by an IBM SIT lab to inter-operate with Linux.
- Dynamic Network Factory
- Dynamic
Network Factory specializes in larger arrays.
- LAND-5
- LAND-5 offers several products. These appear to be stand-alone
SCSI-attached boxes, and require no special Linux support.
See www.land-5.com
- StorComp
- Storage Computer is an early
pioneer in RAID. See also their
Product Sheet.
They offer:
- Multi-hosting, allowing multiple CPUs to access the disks
through SCSI.
- SNMP management MIBs.
Disk Array Management Software
Most controllers can be configured and managed via brute force, by
rebooting the machine and descending into on-card BIOS or possibly DOS
utilities to reconfigure, exchange and rebuild failed drives. However,
for many system operators, rebooting is a luxury that is not available.
For these sites and servers, there is a real need for configuration and
management software that will not only report a variety of disk
statistics, but also raise alarms when there is trouble, allow failed
drives to be disabled, swapped out and reconstructed, and allow all this
to be done without taking the array offline or halting any server
functions. Currently (January 1999) I am aware of only one vendor that
provides this capability: ICP-Vortex.
- ICP Vortex (New Listing)
- ICP Vortex provides the
GDTMonitor management utility for its controllers.
The utility provides the ability to
monitor transfer rates, set hard drive and controller parameters,
and hot-swap and reconstruct defective drives.
- BusLogic
- Buslogic offers the
Global Array Manager which runs under SCO Unix and UnixWare.
Thus, a port to Linux is at least theoretically possible.
Contact your sales representative.
- StorComp
- Storage Computer offers an
SNMP MIB
for storage management. MIBs being what they are, any
SNMP tool on Linux should be able to use this to query and manage
the system. However, MIBs being what they are, this is also a rather
low-level, (very) hard-to-use solution.
See also a
white paper on
storage management.
- DPT
- DPT provides management software with their cards. The distribution
includes SCO binaries.
Thus, a port to Linux is at least theoretically possible.
Contact your sales representative.
Product Reviews
The following product reviews were submitted by readers of this web
page. Note that little effort has been made to verify their accuracy
or to filter out malicious submissions.
Manufacturer: DPT
Model Number: PM3334UW/2 (two-channel "SmartRAID IV")
Number of disks, total size, raid-config: Two RAID-5 groups, one on each
SCSI channel, each array consisting of nine non-hot-swappable 9 GB
disk drives. The ninth drive on each RAID group is designated as a
"hot spare". One channel also includes a Quantum DLT 7000 backup
tape device.
On-controller memory: 64 MB as qty 4 non-parity, non-ECC, non-EDO 60ns
16MB single-sided 72-pin SIMMs
Months in use: 10 months in heavy use.
OS kernel version, vendor and vendor version: 2.0.32, RedHat Linux 5.0
Use (news spool, file server, web server?): File server (directories for developers)
Support (1=bad, 5=excellent, or n/a didn't try support): 3
Performance (1=very dissatisfied, 5=highly satisfied, or n/a): 4
Reliability (1=terrible, 5=great, or n/a no opinion): 4
Installation (1=hard, 5=very easy; includes s/w install issues): 3
Overall satisfaction (1 to 5): 4
Comments:
Regarding DPT support:
Try DPT's troubleshooting web pages first.
DPT's support staff does respond to e-mail, typically within
one to two working days, and they do sometimes come up with
interesting suggestions for work-arounds to try.
But in my admittedly limited experience with DPT support staff
as an end-user, unless you're truly stuck you're more likely
to find a work-around to your problems before they do.
Regarding DPT documentation:
The SmartRAID IV User's Manual is better than some documentation I've
seen, but like most documentation it's nearly useless if you encounter
a problem. The documentation does not appear to be completely
accurate as regards hot spare allocation.
And unsurprisingly, the printed documentation does not cover Linux.
Regarding DPT PM3334UW/2 installation:
The following combinations of SCSI adapters and motherboards did not work
for us:
- DPT PM3334UW/2 + Adaptec 2940UW on an Intel Advanced/Endeavor Pentium
motherboard.
- DPT PM3334UW/2 + Mylex BT-950R on an Intel Advanced/Endeavor Pentium
motherboard.
- DPT PM3334UW/2 + NCR 53c815 on an ASUS P5A-B Pentium motherboard.
The following combinations of adapters and motherboards did work for us:
- DPT PM3334UW/2 + Mylex BT-950R on an ASUS TX97-E Pentium motherboard.
- DPT PM3334UW/2 + Adaptec 2940UW on an ASUS TX97-XE Pentium motherboard.
- DPT PM3334UW/2 + Mylex BT-950R on an ASUS P2B Pentium-II motherboard.
Symptoms of non-working combinations may include that the Windows-based
DPT Storage Manager application reports "Unable to connect to DPT Engine"
or "There are no DPT HBAs on the local machine".
Regarding the DPT Storage Manager application:
The Windows-based DPT Storage Manager application version 1.P6 must
have all "options" installed, or it cannot run. Some variant of this
application is required in order to build RAID groups.
The DPT Storage Manager application is dangerous--if you click
on the wrong thing the application may immediately wipe out a RAID group,
without confirmation and without hope of recovery.
If you are adding a RAID group, you are advised to disconnect physically
any other RAID groups on which you do not plan to operate, until you
have finished working with the Storage Manager application.
There is no Linux version of the Storage Manager application (or any
other DPT run-time goodies) available at present.
Regarding Michael Neuffer's version 2.70a eata_dma.o driver for Linux:
The eata_dma driver does appear to work, with the following minor
problems:
- "Proc" information from, e.g. "cat /proc/scsi/eata_dma/1" is
mostly incorrect, and therefore it is unlikely that one would be able
to detect an alarm condition from it, among other things.
- The driver does not properly support more than two RAID groups.
If you have three RAID groups, you will not be able to use the last
RAID group--a device name of the form /dev/sdXN--that the driver
reports at boot time.
- The -c (check) option in mke2fs falsely reports problems;
omit that option when making a file system on a RAID group.
Another eata device driver exists, eata.o (as opposed to eata_dma.o),
written by Dario Ballabio, but at the time of this writing I have
not tried the eata.o driver. In the Red Hat 5.0 distribution,
the eata.o driver is present in /usr/src/linux/drivers/scsi/.
Miscellaneous issues: if a hot spare is available, experiments
appear to show that it is not possible to detect when the hot spare
has been deployed automatically as the result of a drive failure.
If a hot spare is not available, then an audible alarm sounds (earsplittingly)
when a drive fails.
Author: Jerry Sweet (jsweet@irvine.com)
Date: November 10, 1998

Manufacturer: DPT
Model Number: 3334UW
Number of disks, total size, raid-config: 3 x 9 GB => 17 GB (RAID 5)
On-controller memory: 64 MB parity, non-ECC
Months in use: 3 months, 2 weeks in heavy use
OS kernel version, vendor and vendor version: 2.0.30, RedHat Linux 4.2
Use (news spool, file server, web server?): File server (home directories)
Support (1=bad, 5=excellent, or n/a didn't try support): n/a
Performance (1=very dissatisfied, 5=highly satisfied, or n/a): 4
Reliability (1=terrible, 5=great, or n/a no opinion): 4
Installation (1=hard, 5=very easy; includes s/w install issues): 3
Overall satisfaction (1 to 5): 4
Comments:
Works nicely, and installation was easy enough in DOS; they even have
a Linux icon included now. What I would really benefit from would
be dynamic partitioning a la AIX, but that is a file system matter as
well.
If the kernel crashes on mkfs.ext2 right after boot, try generating
some traffic on the disk (dd if=/dev/sdb of=/dev/null bs=512 count=100)
before making the file system. (Thanks Mike!) (ed note: this is a
well known Linux 2.0.30 bug; try using 2.0.29 instead).
Author: Oskari Jääskeläinen (osi@fyslab.hut.fi)
Date: October 1997
Bonnies
The following figures were submitted by interested readers.
No effort has been made to verify their accuracy or the
test methodology. These figures might be incorrect or
misleading. Use at your own risk!
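For reference, figures in this format are typically produced by the Bonnie
benchmark with a command along the following lines (the directory, file size
and machine label here are examples only):
   bonnie -d /mnt/raid -s 1000 -m SOF6x4G
   # -d names a directory on the array under test, -s the test file size in MB
   # (it should be much larger than RAM), and -m the label printed in the output.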
The following was submitted by DILOG, the manufacturer:
Linux 2.1.58 with an Adaptec 2940 UW card, two IBM DCAS
drives and the DiLog 2XFR:
-------Sequential Output-------- ---Sequential Input--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block---
K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU
8392 99.0 13533 61.2 5961 48.9 8124 96.4 15433 54.3
Same conditions, one drive only:
-------Sequential Output-------- ---Sequential Input--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block---
K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU
6242 72.2 7248 32.4 3491 25.1 7356 84.2 7864 25.2
John Morris (jman@dejanews.com) has submitted the following:
The following are comparisons of hardware and software RAID
performance. The test machine is a dual-P2, 300MHz, with 512MB
RAM, a BusLogic Ultra-wide SCSI controller, a DPT 3334UW
SmartRAID IV controller w/64MB cache, and a bunch of Seagate
Barracuda 4G wide-SCSI disks.
These are very impressive figures, highlighting the strength
of software RAID!
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU
(DPT hardware RAID5, 3 disks)
DPT3x4G 1000 1914 20.0 1985 2.8 1704 6.5 5559 86.7 12857 15.6 97.1 1.8
(Linux soft RAID5, 3 disks)
SOF3x4G 1000 7312 76.2 10908 15.5 5757 20.2 5434 86.4 14728 19.9 69.3 1.5
(DPT hardware RAID5, 6 disks)
DPT6x4G 1000 2246 23.4 2371 3.4 1890 7.1 5610 87.3 9381 10.9 112.1 1.9
(Linux soft RAID5, 6 disks)
SOF6x4G 1000 7530 76.8 16991 32.0 7861 39.9 5763 90.7 23246 49.6 145.4 3.7
(I didn't test DPT RAID5 w/8 disks because the disks kept failing,
even though it was the exact same SCSI chain as the soft RAID5, which
returned no errors; please interpolate!)
(Linux soft RAID5, 8 disks)
SOF8x4G 1000 7642 77.2 17649 33.0 8207 41.5 5755 90.6 22958 48.3 160.5 3.7
(Linux soft RAID0, 8 disks)
SOF8x4G 1000 8506 86.1 27122 54.2 11086 58.9 6077 95.9 27436 62.9 185.3 4.9
Tomas Pospisek maintains additional benchmarks at his
Benchmarks page.
Ram Samudrala (me@ram.org) reports the following:
Here's the output of the Bonnie program, on a DPT 2144 UW with
16 MB of cache and three 9 GB disks in a RAID 5 setup.
The machine is a dual-processor Pentium Pro
running Linux 2.0.32. For comparison, the Bonnie results for the IDE
drive on that machine are also given, as are some hardware RAID
figures for a Mylex controller on a DEC/OSF1 machine (KSPAC) with
an array of twelve 9 GB disks. (Note that the test size is rather
small; at 100 MB it measures memory performance as well as disk.)
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU Machine
100 3277 32.0 6325 23.5 2627 18.3 4818 44.8 59697 88.0 575.9 16.3 IDE
100 9210 96.8 1613 5.9 717 5.8 3797 36.1 90931 96.8 4648.2 159.2 DPT RAID
100 5384 42.3 5780 18.7 5287 42.1 12438 87.2 62193 83.9 4983.0 65.5 Mylex RAID
Copyright (c) 1996-1999 Linas Vepstas, All Rights Reserved
All trademarks are property of their respective owners.
Last updated January 1999 by Linas Vepstas
(linas@linas.org)