Creating an encrypted RAID5 with LVM in Debian 8


It was already critical: one of the RAID1 disks had failed, and the SMART seek error counter of the second one kept climbing. No reallocated sectors though, luckily. I guess the seek mechanism was broken in this one. Or in both. The first disk had a controller problem that did not let me connect it anymore.

As a side note, the two broken disks are Seagate Barracuda 7200s, and those two are already replacement disks! The two before them also went down pretty quickly. The three new disks are Western Digital WD30EFRX, which are marketed as 'suitable for continuous operation'. Maybe the Seagates are really not made for long running hours, dunno.

WARNING: Playing with disks and data always comes with risks! Back up your data onto external devices and store them somewhere safe. It is good to have a disconnected backup device that cannot be reached by out-of-control, happily-disk-formatting software!

Creating a new RAID5 with mdadm

After connecting the new disks, they are available for use in the system. The first step is to bundle those three into one RAID array with mdadm. In my configuration, the new disks are sdc, sdd and sdf (sde being the old Seagate). Now, the initial creation:

$ sudo mdadm --create --verbose /dev/md1 --level=5 --raid-devices=3 /dev/sdc /dev/sdd /dev/sdf
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 512K
mdadm: size set to 2930135040K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.

$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sdf[3] sdd[1] sdc[0]
      5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.0% (835364/2930135040) finish=350.6min speed=139227K/sec
      bitmap: 22/22 pages [88KB], 65536KB chunk

md0 : active raid1 sde[1]
      1953383360 blocks super 1.2 [2/1] [_U]

unused devices: <none>

The creation of /dev/md1 is now in progress. What does mdadm say?

$ sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time :
     Raid Level : raid5
     Array Size : 5860270080 (5588.79 GiB 6000.92 GB)
  Used Dev Size : 2930135040 (2794.39 GiB 3000.46 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time :
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 0% complete

           Name : :1  (local to host)
           UUID :
         Events : 23

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       3       8       80        2      spare rebuilding   /dev/sdf

Fine. /dev/sdf is marked as a spare at creation time. This seems okay and is described in one of the linked resources. We now have an md array available for further configuration; you do not need to wait until the array is fully built.
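
If you want to keep an eye on the rebuild while you continue, two quick ways (purely a convenience, not required for the setup):

# Refresh the mdstat output every few seconds
$ watch -n 5 cat /proc/mdstat

# Or just look at the rebuild status line
$ sudo mdadm --detail /dev/md1 | grep 'Rebuild Status'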

Side note: if you happen to reboot while the creation is still in progress, you might find your array marked as 'degraded' and/or 'recovering' while nothing actually happens. The creation process can be resumed by either writing to the array or simply setting it read-write (it is most probably marked as read-only, too): sudo mdadm --readwrite /dev/md1. The recovery process should now continue.
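
As a rough sketch of that revive step (assuming the array really did come up read-only or auto-read-only):

# Check whether the array shows up as (auto-)read-only and idle
$ cat /proc/mdstat

# Switch it to read-write; the rebuild should pick up again
$ sudo mdadm --readwrite /dev/md1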

Encrypting the new array device

Now comes the time to encrypt the newly created device /dev/md1. I am using LUKS here, encrypting the whole array and putting LVM on top. You can also do it the other way around, for example if you do not want to encrypt the whole disk.

So, I am now going to format the array with LUKS, secure the volume with a passphrase, generate a keyfile and add this keyfile to the volume. Further info below.

# Create the encrypted volume; -y asks for the passphrase twice to verify it
$ sudo cryptsetup -v -y luksFormat /dev/md1

$ sudo dd if=/dev/urandom of=/root/storage2_keyfile bs=1024 count=4
4+0 records in
4+0 records out
4096 bytes (4.1 kB) copied, 0.000374909 s, 10.9 MB/s

# Only root may read the keyfile
$ sudo chmod 0400 /root/storage2_keyfile

$ sudo cryptsetup luksAddKey /dev/md1 /root/storage2_keyfile
Enter any passphrase:

Checking the LUKS header:

$ sudo cryptsetup luksDump /dev/md1
LUKS header information for /dev/md1

Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha1
Payload offset: 4096
MK bits:        256
MK digest:
MK salt:
MK iterations:  163250
UUID:

Key Slot 0: ENABLED
    Iterations:             653060
    Salt:
    Key material offset:    8
    AF stripes:             4000
Key Slot 1: ENABLED
    Iterations:             630541
    Salt:
    Key material offset:    264
    AF stripes:             4000
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED
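
All key slots live in this header, so it does not hurt to keep a copy of it on a separate, safe device. This is an extra precaution, not a required step of the setup; the target filename is just an example:

# Back up the LUKS header; store the file off this machine
$ sudo cryptsetup luksHeaderBackup /dev/md1 --header-backup-file /root/storage2_luks_header.img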

You can now test opening the container with the keyfile via sudo cryptsetup -v -d /root/storage2_keyfile luksOpen /dev/md1 storage2_crypt. This should create a mapper device at /dev/mapper/storage2_crypt without asking for a passphrase. If you want the container to be unlocked automatically at system boot, you also need to add the md array to /etc/crypttab:

$ sudo vim /etc/crypttab
storage2_crypt  /dev/disk/by-uuid/<uuid_of_md>  /root/storage2_keyfile  luks
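
The <uuid_of_md> placeholder stands for the UUID of the raw md device (not of the opened mapper device). One way to look it up:

# Show the UUID of the LUKS container on the md device
$ sudo cryptsetup luksUUID /dev/md1

# Or list all block devices by UUID
$ ls -l /dev/disk/by-uuid/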

Why did I add a keyfile that is stored on the same machine, you may wonder? The reason for the whole encryption step is only to make sure that broken disks I have to send back in the future do not contain any usable data. The disk with the faulty controller I mentioned earlier cannot be sent back, because there is no way left to shred the data it contains.
Is it a performance loss? I don't know yet, but I guess not. The WD disks are only 5400 rpm, so they are not made for heavy disk load anyway, and decryption is pretty easy for modern CPUs.
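
If you want numbers instead of a guess, cryptsetup ships a small benchmark that shows what the CPU manages with the cipher used above (aes-xts):

# In-memory cipher benchmark, no disk access involved
$ sudo cryptsetup benchmark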

Setting up the LVM physical volume, volume group and logical volumes

Step 3. We will now define the encrypted volume /dev/mapper/storage2_crypt as an LVM physical volume, add this PV to a newly created volume group and add logical volumes (similar to partitions) to this group.
The first volume I will format as ext4 with a size of 2.5 terabytes; it will take over for the dying disk as soon as we are finished. Finally, I add the ext4 volume to /etc/fstab.

# Define opened encrypted volume as a physical volume
$ sudo pvcreate -v /dev/mapper/storage2_crypt

# Add the physical volume to a volume group
$ sudo vgcreate storage2_vg /dev/mapper/storage2_crypt

# Create a logical volume inside this new group with size 2.5T and name 'files'
$ sudo lvcreate -L 2.5T -n files -v storage2_vg

# Format the new lv as ext4 with 0% space reserved for root
$ sudo mkfs.ext4 -v -m 0 -L "storage2_files" /dev/storage2_vg/files

# Setup mount points you are gonna use
$ sudo mkdir -p /mnt/storage2/{files,files2,kvm}

# Add the new ext4 formatted partition to /etc/fstab
$ sudo vim /etc/fstab
/dev/mapper/storage2_vg-files   /mnt/storage2/files/    ext4    auto,rw 0       0
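
Before the old array can go, the data obviously has to be copied over. A minimal sketch, assuming the old volume is still mounted at /mnt/storage_old (that path is just an example, adjust it to your setup):

# Mount the new volume via the fresh fstab entry
$ sudo mount /mnt/storage2/files

# Copy everything over, preserving permissions, ownership, ACLs and xattrs
$ sudo rsync -aHAX --progress /mnt/storage_old/ /mnt/storage2/files/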

Removing the old RAID1 array

In my setup we also want to remove the old, broken RAID1 array. It is an LVM directly on top of an md array.

# Unmount all logical volumes
$ sudo umount /dev/mapper/storage_old-volume

# Set the volume group as unavailable (-a n)
$ sudo vgchange -v -a n storage_old

# Stop the array
$ sudo mdadm --stop /dev/md0

The physical disk can now be removed.
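
If the leftover member disk is going to be reused (or should simply not be auto-assembled again), its RAID superblock can be wiped, and any ARRAY line for md0 in /etc/mdadm/mdadm.conf should go as well. A sketch, assuming sde is the remaining old member:

# Remove the md superblock so the disk is no longer recognised as a RAID member
$ sudo mdadm --zero-superblock /dev/sde

# Drop the old ARRAY entry (if present) and refresh the initramfs
$ sudo vim /etc/mdadm/mdadm.conf
$ sudo update-initramfs -u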

Bonus: Using SMART to watch your disks

While setting up the new array I regularly checked the SMART values of all disks involved:

$ for i in {c,d,e,f}; do \
echo -e "\nsd$i"; \
sudo smartctl -a /dev/sd$i | grep -E '(Raw_Read|Reall.*Sect|Seek_Err)' \
; done

Luckily, the new WDs do not show any errors so far. The Seagate, on the other hand... well:

sde
  1 Raw_Read_Error_Rate     0x000f   111   100   006    Pre-fail  Always       -       31981824
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   064   060   030    Pre-fail  Always       -       2958967

And later, after copying all the data:

sde
  1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail  Always       -       61330528
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   064   060   030    Pre-fail  Always       -       2963591

Further resources