Converting a Linux System to Raid1

One of the machines I manage was running Linux on a Compaq Proliant server with a SMART Array (6042) controller and two external cabinets. The system booted off a SCSI disk on the built-in Adaptec controller, and the disks in the external cabinets were configured with one disk per logical disk.

Why this machine had a SMART Array (i.e. hardware RAID) controller at all is a tribute to public purchasing rules, and quite another story. The initial hardware situation looked like this:

What was desired was complete redundancy from the SCSI controllers through to the cabinets; this was the level of hardening deemed appropriate for this system. The desired configuration was this:

The system used LVM volumes to allow the database filesystems to grow if desired. The final system would be made up of:

  • Two disks as a raid1 (mirrored) pair, containing various filesystems, notably /
  • Eight logical volumes implemented on raid1 (mirrored) disk pairs. These mirror pairs were to be in different cabinets and connected to different SCSI controllers. In this way, the system would survive the loss of a physical cabinet (e.g. a power supply fault).
This was accomplished in two steps:
  • Migrating all the LVM volumes to volume groups implemented on raid1, and
  • Migrating the boot disk to a raid1 set.

Migrating Volume Groups to Software Raid

Implementing Volume Groups on software raid is quite straightforward, and since I was using identical disks things were easy. To accomplish this you need a copy of the Linux raidtools installed. The steps are (a consolidated sketch of the whole sequence follows the list):

  • Partition a disk with one partition (primary or extended). Make the partition type 'fd', Linux software RAID. It is very important that the partition type is not 8e (LVM), as the partitions must be identified as part of a RAID volume and not an LVM volume at boot time. Setting the partition type to LVM will result in vgscan finding the partitions before the RAID volume starts, and the LVM volumes will then be migrated to one of the bare partitions, corrupting the underlying raid volume.
  • Run sfdisk -d <device> > partition_map. This dumps the first disk's partition table to a file so it can be copied to the second device.
  • Run sfdisk <device2> < partition_map. This partitions the second disk identically to the first.
  • Edit /etc/raidtab and create the mirror set. Add the lines
     raiddev /dev/md10
       raid-level            1
       nr-raid-disks         2
       nr-spare-disks        0
       chunk-size            4
       persistent-superblock 1
       device                /dev/cciss/c0d1p1
       raid-disk             0
       device                /dev/cciss/c1d1p1
       raid-disk             1
    You need to adjust the device names to suit your situation.
  • Run raidstart /dev/md10. This will start the mirror set. Use cat /proc/mdstat to watch the raid volume synchronize.
  • Create the LVM volume.
    • pvcreate /dev/md/10
    • vgcreate vg00 /dev/md/10 (vgcreate will not let you use the symbolic link /dev/md10).
    • Use vgdisplay -v vg00 to determine the number of extents available. You'll see something like this:
      --- Volume group ---
      VG Name vg00
      VG Access read/write
      VG Status available/resizable
      VG # 6
      MAX LV 256
      Cur LV 0
      Open LV 0
      MAX LV Size 2 TB
      Max PV 256
      Cur PV 0
      Act PV 0
      VG Size 33.88 GB
      PE Size 32 MB
      Total PE 1084
      Alloc PE / Size 0 / 0
      Free PE / Size 1084 / 33.88 GB
      VG UUID SmrIfl-iH3x-HNCp-l0Qn-T1gN-PU7d-Y4oqtb
    • lvcreate -l 1084 vg00 (1084 is the Total PE value from above). This makes /dev/vg00/lvol1.
    • vgcfgbackup vg00. Ensure that the volume group definition is saved.
  • Create the filesystem: mkfs -t xfs /dev/vg00/lvol1
  • Edit /etc/fstab and mount /dev/vg00/lvol1 where desired.
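
Putting the steps above together, here is a rough sketch of the whole sequence for one mirror pair, using the same example names as above (/dev/cciss/c0d1 and /dev/cciss/c1d1 as the disks, /dev/md10 for the mirror, vg00 for the volume group). The /data mount point is just a placeholder; adjust everything to your own hardware:

     # copy the partition table from the first disk to the second
     sfdisk -d /dev/cciss/c0d1 > partition_map
     sfdisk /dev/cciss/c1d1 < partition_map

     # after adding the /dev/md10 stanza to /etc/raidtab, start the mirror
     # (some raidtools versions need 'mkraid /dev/md10' to initialize a new array)
     raidstart /dev/md10
     cat /proc/mdstat                     # watch the resync progress

     # layer LVM on top of the mirror
     pvcreate /dev/md/10
     vgcreate vg00 /dev/md/10
     vgdisplay -v vg00 | grep 'Total PE'  # note the extent count
     lvcreate -l 1084 vg00                # creates /dev/vg00/lvol1
     vgcfgbackup vg00

     # filesystem and mount point
     mkfs -t xfs /dev/vg00/lvol1
     mount /dev/vg00/lvol1 /data          # and add the matching /etc/fstab entry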

In my situation, half of the disks were already in volume groups with data on them, and those volume groups used the whole, unpartitioned disks rather than partitions. This meant that I had to copy the data elsewhere (I used the other disks, as there was enough free space), dismount the current volume, remove the current logical volume with lvremove, deactivate the volume group (vgchange -a n vgXX) and vgremove the volume group. I did the migration over two days, changing the mount points as I went and ending up with only the mirrored LVM volumes.
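
For one of those whole-disk volume groups, the tear-down looked roughly like this (vgXX, lvol1 and the /database1 mount point are placeholders for the real names):

     # copy the data somewhere with enough free space first, then:
     umount /database1                # dismount the filesystem
     lvremove /dev/vgXX/lvol1         # remove the logical volume
     vgchange -a n vgXX               # deactivate the volume group
     vgremove vgXX                    # remove the volume group
     # the disk can now be partitioned with type 'fd' and rebuilt
     # as half of a raid1 pair as described above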

Migrating the Boot Disk

The disk containing /, /usr, etc. had no partitions in LVM volumes. I decided to keep it that way and just mirror the disk. Being conservative, I had a spare disk, which allowed me to move all my filesystems to the mirror pair and keep my original boot disk just in case. You can migrate an existing disk without a spare, since when a mirror is started all the data from one partition is copied to the other; how to accomplish this is explained here. I still suggest having a spare disk, but I'm pretty conservative that way.
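
If you do go without a spare, the usual raidtools approach is to build the mirror in degraded mode: list the slot that will eventually hold the current boot disk as a failed-disk in /etc/raidtab, copy the data onto the half-mirror, boot from it, and only then add the original disk back in. A rough sketch follows; the device names are illustrative and this describes the general technique rather than what I actually did:

     raiddev /dev/md0
       raid-level            1
       nr-raid-disks         2
       nr-spare-disks        0
       chunk-size            4
       persistent-superblock 1
       device                /dev/scsi/host0/bus0/target1/lun0/part5
       raid-disk             0
       device                /dev/scsi/host0/bus0/target0/lun0/part5
       failed-disk           1    # the disk currently in use

     mkraid /dev/md0                # creates the mirror with one half missing
     # ...copy the data, update fstab and lilo, reboot onto the mirror, then
     # change failed-disk to raid-disk in /etc/raidtab and add the old disk:
     raidhotadd /dev/md0 /dev/scsi/host0/bus0/target0/lun0/part5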

First we need to partition the disks that will be in the mirror:

  • Make a file with the current partition table: sfdisk -d /dev/scsi/host0/bus0/target0/lun0/disc > disk_part.txt
  • Partition the new disks (see the sketch after this list):
    • sfdisk /dev/scsi/host0/bus0/target1/lun0/disc < disk_part.txt
    • sfdisk /dev/scsi/host0/bus0/target2/lun0/disc < disk_part.txt
  • Change the partition types to be 'fd' (Use the 't' command in fdisk)
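
In sketch form, with the devfs paths used on this system:

     # dump the boot disk's partition table and apply it to both new disks
     sfdisk -d /dev/scsi/host0/bus0/target0/lun0/disc > disk_part.txt
     sfdisk /dev/scsi/host0/bus0/target1/lun0/disc < disk_part.txt
     sfdisk /dev/scsi/host0/bus0/target2/lun0/disc < disk_part.txt

     # verify the layout, then set each partition's type to 'fd' with fdisk ('t')
     sfdisk -l /dev/scsi/host0/bus0/target1/lun0/disc
     sfdisk -l /dev/scsi/host0/bus0/target2/lun0/disc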

Now we need to create the mirror sets and create the filesystems:

  • Edit /etc/raidtab and define your RAID device. For /dev/md0 you add entries to raidtab like this:
     raiddev /dev/md0
       raid-level            1
       nr-raid-disks         2
       nr-spare-disks        0
       chunk-size            4
       persistent-superblock 1
       device                /dev/scsi/host0/bus0/target1/lun0/part5
       raid-disk             0
       device                /dev/scsi/host0/bus0/target2/lun0/part5
       raid-disk             1
  • Run raidstart /dev/md0. Check /proc/mdstat to watch the raid volumes synchronize.
  • Make the filesystem on the raid device: mkfs -t ext3 /dev/md?
  • Copy your data to the raid volume (see the sketch after this list).
    • Mount the raid volume at a temporary location: mkdir /mnt/newdisk; mount /dev/md0 /mnt/newdisk
    • Copy the data (here we are copying /usr): cd /usr; find . -depth -xdev -print | cpio -pd /mnt/newdisk
    • Dismount the mirror set: umount /mnt/newdisk
  • Edit /etc/fstab and change the mount device to be /dev/md? instead of /dev/scsi....
    It also helps to let lilo know that it is now dealing with a mirror set. The top of my lilo.conf looks like this:
    boot=/dev/md0
    raid-extra-boot="/dev/scsi/host0/bus0/target1/lun0/disc,/dev/scsi/host0/bus0/target2/lun0/disc"
    map=/boot/map
    vga=normal
    default="2422-28entmd"
    keytable=/boot/us.klt
    prompt
    nowarn
    timeout=100
    message=/boot/message
    menu-scheme=wb:bw:wb:bw

    image=/boot/vmlinuz-2.4.22-28production
    label=2422-28entmd
    root=/dev/md0
    read-only
    optional
    vga=normal
    append=" devfs=mount acpi=ht resume=/dev/sda6 splash=silent"
    initrd=/boot/initrd-2.4.22-28_md_production.img
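
Taken together, the mkfs and copy steps above look like this for a single filesystem. It is shown here for /usr on /dev/md0 as in the example; the md number and mount point will differ for each filesystem you move:

     mkfs -t ext3 /dev/md0
     mkdir -p /mnt/newdisk
     mount /dev/md0 /mnt/newdisk
     cd /usr
     find . -depth -xdev -print | cpio -pd /mnt/newdisk
     umount /mnt/newdisk
     # then point the /usr line in /etc/fstab at /dev/md0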

Setting Up Booting of the RAID Volume

With all the data migrated to raid sets, you now need to enable a few things for your system to boot successfully from the raid set. Two things are needed: your initrd must preload the md driver, and LILO must know that the boot disk is a raid volume (this allows booting with a failed disk).

To create an initrd image with the md driver preloaded, run: mkinitrd --preload=md --preload=loop --preload=xfs --preload=XXX imagename. Which modules you need to preload has to be determined for each system: for starters you must preload any modules needed to access the boot disks, then you must preload the filesystems. Note that the loop device is needed to remount your root filesystem read-write if it is initially mounted read-only at boot.
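
As an illustration only, on this system the command was along these lines; the module list, image name and kernel version string are specific to each machine, and the SCSI driver modules may already be pulled in via /etc/modules.conf:

     mkinitrd --preload=md --preload=loop --preload=ext3 --preload=xfs \
              /boot/initrd-2.4.22-28_md_production.img 2.4.22-28production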

Now you need to edit /etc/lilo.conf and make a new section that is a copy of your current preferred boot section, with the initrd changed to the one you just made. Since I'm keeping my old boot disk just in case the sky falls, I edited lilo.conf on the new mirror pair mounted under /mnt/newroot. You also need to change the boot= line to /dev/md0 and add a raid-extra-boot line (as in the lilo.conf above), which tells lilo to update the boot sectors of both disks. Now, to run lilo, you need a full /dev tree and /proc mounted. Since I use devfs this meant doing:

  • mount -t devfs none /mnt/newroot/dev
  • mount -t proc none /mnt/newroot/proc
Now we can run lilo, telling it to use the mirror volume as the root: lilo -r /mnt/newroot -v.
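
In full, with the new mirror pair mounted at /mnt/newroot (the mount point is whatever you chose), the sequence was roughly:

     mount /dev/md0 /mnt/newroot           # if not already mounted
     mount -t devfs none /mnt/newroot/dev  # lilo needs a full /dev tree...
     mount -t proc none /mnt/newroot/proc  # ...and /proc
     lilo -r /mnt/newroot -v               # reads lilo.conf from the new root
     umount /mnt/newroot/proc /mnt/newroot/dev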

With my 18 disks, the LILO version that came with Mandrake 9.2 took a segmentation fault and dumped core. The solution was to download and compile LILO version 22.6, which fixes a problem with more than 16 disks.

Reboot and Go

Now we're all ready to reboot onto the mirror disks. Shut down the system, pull the original boot disk and reboot. Lilo will start and you will see your new default mirror entry boot. When the system comes up, log in and type /sbin/mount. All non-LVM volumes should be mounted on /dev/md? devices.

All volume groups should be using the md devices. Run /sbin/vgdisplay -v and verify that all volume groups are using /dev/md? instead of a physical disk.
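
A quick way to check everything at once (the grep patterns are just one way of picking out the device names):

     cat /proc/mdstat                        # every mirror present and in sync ([UU])
     /sbin/mount | grep /dev/md              # non-LVM filesystems on md devices
     /sbin/vgdisplay -v | grep 'PV Name'     # every PV should be a /dev/md device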

A note on volume group naming

Volume groups can be named anything at all; however, it is best to have a scheme that will allow you to identify your volume groups in your administration scripts. I use the name vgXX (where XX is a two-digit number starting at 00). This allows for 100 volume groups, which is probably enough for most situations; using hexadecimal digits would raise this to 256 volume groups.

As with volume groups, logical volumes can be named anything. Some people like to name their logical volumes after the mount points they are used for; /dev/vg00/var would be mounted at /var, for instance. I prefer to keep the generic lvolX names, as things inevitably change and sticking to generic names keeps things neat over time. Mount points should be documented separately in your configuration documentation.