Coffee Space



Software RAID


TL;DR

Set up RAID 10 on something that apparently doesn’t support RAID. It works, but it is slower than expected.

Problem

So I recently ran out of space on my 128GB M.2 SSD, my additional 1TB internal HDD and every other medium I could get my hands on. By this point I had deleted everything I could several times over, compressed everything that could be compressed and filled every memory stick available to me.

Of course there was nothing of real importance on the memory sticks (only things like podcasts [1]), but the idea of having anything even remotely important on such a failure-ridden medium made me nervous.

I wanted something with some redundancy, so I looked at NAS boxes, but they had some problems:

  1. I’m poor, and they cost a lot for how much storage they offer. 1TB was something like $200 NZD and 4TB was something like $350 NZD - with zero redundancy. Then you actually have to buy the NAS itself.
  2. A few of them rely on proprietary software for backups, etc. Proprietary software has bitten me enough times that I’m very cautious with such things. If I can’t feasibly go into the source code and fix it myself, it had better be easily replaceable with something else!
  3. Rumour has it the ‘N’ in ‘NAS’ stands for network - and out here I don’t really have a decent network infrastructure. Everything runs off a very dusty router via WiFi and I don’t really have the space to build out some mini Ethernet-based network right now.

So a custom solution it is - what could possibly go wrong?

Hardware

After looking around all the usual suspects, I found this:

Features:

  • MAIWO K305BU3S 2.5 / 3.5-inch SATA 4 slot hard drive base
  • Supports 2.5 / 3.5-inch SATA hard disk dock
  • Support 6TB read and write
  • USB3.0 high-speed output

Specifications:

  • Brand: MAIWO
  • Model: K305BU3S
  • Output interface: USB3.0
  • Color: Black
  • Product material: ABS plastic
  • Product weight: 338g
  • Product size: 163X126X87 MM
  • Operating system: Windows XP / Vista / 7 / 8 / 10, Mac, Linux
  • Support hard disk: 2.5" / 3.5" SATA HDD / SSD

Package includes:

  • 1 * Four Slots HDD SSD Enclosure
  • 1 * Power Adapter
  • 1 * USB 3.0 Cable
  • 1 * User Manual

So the question is: Can it even RAID?

it cannot raid :(

Apparently it cannot be used for RAID… Perfect! I’ll make this into RAID storage then!

But this is only the drive holder; we need some actual drives… As it happens, the University regularly upgrades its computers, and some were scheduled for e-waste. I tried my luck and the engineers were happy to let me have four 500GB Western Digital drives!

Drives, yaay!

After securing the drives, I made the purchase for the drive bay thing. It arrived in just two weeks!

Device

Initial impressions are that it was actually well packed and exactly as advertised (a nice change!). The plastic is a little more flexible than you would want and the drives could do with a little more support, but the base is well built.

Device from the site

How To RAID?

So at this point, I hadn’t actually figured out how I would get the RAID itself to work. I saw some explanations showing that it was possible to get an Ubuntu server to boot from software RAID, but nothing about using RAID as an external device.

My fallback plan was always to just run them as individual drives and run a copy script across them occasionally (overnight). Slow, but it should work.

Then I saw a StackExchange post about making RAID out of USB drives - sounds cool! Not only is it possible, the commands are actually very simple - too simple.

I cross-referenced it with the instructions at Digital Ocean, which seemed much better and more exhaustive.

Next I had to decide which type of RAID would be best for me. There is a lot of confusing information out there that doesn’t all match up… I read this, which seemed to suggest that RAID 10 is striping plus mirroring: speed and redundancy, but only half the storage space. This configuration should survive the death of any single drive.
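To make the capacity and redundancy claims concrete, here is a minimal sketch of the arithmetic for Linux md’s RAID 10 "near" layout, assuming 2 copies (what mdadm ended up using, per the "2 near-copies" in the status output further down). In that layout the copies of each chunk sit on consecutive member devices, so with four drives the mirrored pairs are devices 0+1 and 2+3. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of md RAID 10 "near" layout arithmetic (names are hypothetical).
# Copy j of chunk c lives on member device (c * copies + j) % devices.

# Usable capacity: $1 = member devices, $2 = size of each, $3 = copies
raid10_usable() {
  echo $(( $1 * $2 / $3 ))
}

# Member devices holding chunk $1, for $2 devices and $3 copies
near_devices() (
  c=$1 n=$2 k=$3 j=0 out=""
  while [ "$j" -lt "$k" ]; do
    out="$out $(( (c * k + j) % n ))"
    j=$(( j + 1 ))
  done
  echo "${out# }"
)

# Succeeds if every chunk keeps at least one copy when the device
# indices given after $1 (devices) and $2 (copies) have failed.
raid10_survives() (
  n=$1 k=$2
  shift 2
  failed=" $* "
  c=0
  while [ "$c" -lt "$n" ]; do      # the placement repeats every n chunks
    alive=0 j=0
    while [ "$j" -lt "$k" ]; do
      dev=$(( (c * k + j) % n ))
      case $failed in
        *" $dev "*) ;;             # this copy is on a failed drive
        *) alive=1 ;;
      esac
      j=$(( j + 1 ))
    done
    # every copy of chunk c is gone; exit only leaves this subshell
    [ "$alive" -eq 1 ] || exit 1
    c=$(( c + 1 ))
  done
  exit 0
)
```

With the four 500GB drives, raid10_usable 4 500 2 gives 1000 (GB), in line with the 976508928 1K blocks (about 1.0 TB) that mdadm reports. raid10_survives 4 2 0 2 succeeds while raid10_survives 4 2 0 1 fails: the array always survives one dead drive, and survives two only if they are not in the same mirrored pair.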

Making RAID

So now to run some commands…

Plug the drives in to see what they… Oh, two still have data on them - oops. After removing all partitions…
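Removing old partitions can also be done from the command line. Below is a minimal sketch using wipefs from util-linux, which erases partition-table, filesystem and RAID signatures from a device. The wipe_members function and its DRY_RUN switch are hypothetical conveniences, and a real run is destructive, so double-check the device names first.

```shell
#!/bin/sh
# Sketch: clear old partition-table, filesystem and RAID signatures from
# whole disks before giving them to mdadm. DESTRUCTIVE for real runs.
# wipe_members and DRY_RUN are hypothetical; wipefs is from util-linux.
wipe_members() (
  for dev in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "sudo wipefs --all $dev"    # only show what would run
    else
      sudo wipefs --all "$dev"
    fi
  done
)
```

Running it with DRY_RUN=1 first prints the commands so they can be checked before anything is erased.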

Then I moved the computer slightly to adjust for light and bang, the drives were gone. WTF? After much panic, I moved my screen slightly again and they were back. Huh. Wiggle the USB connector… On… Off… On… Off… Ah! It seems the USB cable it came with is awful despite looking good. Pulling the USB plug halfway out makes a solid connection… Good enough until I get a replacement cable, I guess.

Next, find the disks; they should be /dev/something. Mine were /dev/sdc, /dev/sdd, /dev/sde and /dev/sdf. (I used the “Disks” GUI and gparted just to be sure.)
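Without the GUI, lsblk can list whole disks together with their transport type. A small sketch, assuming the output of lsblk -d -n -o NAME,SIZE,TRAN as input (TRAN goes last because it can be empty); usb_disks is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list whole disks attached over USB.
# Expects `lsblk -d -n -o NAME,SIZE,TRAN` output on stdin; TRAN is last
# because it can be empty (e.g. for loop devices). usb_disks is hypothetical.
usb_disks() {
  awk '$NF == "usb" { print "/dev/" $1 }'
}
```

lsblk -d -n -o NAME,SIZE,TRAN | usb_disks should then list the enclosure drives. A 500GB drive shows up as roughly 465.8G because lsblk reports binary units.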

Now to create the array: sudo mdadm /dev/md10 --create --level=10 -n 4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
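For reference, here is a sketch of the same create command with the defaults spelled out. --layout=n2 and --chunk=512 correspond to what /proc/mdstat reports below (2 near-copies, 512K chunks), so passing them explicitly should not change the result. The create_raid10 wrapper and its DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: the RAID 10 create command with mdadm's defaults written out.
# --layout=n2 / --chunk=512 match "2 near-copies" / "512K chunks" in
# /proc/mdstat. create_raid10 and DRY_RUN are hypothetical.
create_raid10() (
  md=$1
  shift
  # all words are expanded before `set` runs, so "$#" and "$@" still
  # refer to the member devices here
  set -- sudo mdadm --create "$md" --level=10 --layout=n2 --chunk=512 \
    --raid-devices="$#" "$@"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
)
```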

And it’s done. Huh? Run cat /proc/mdstat… OK it’s still going.

md10 : active raid10 sdf[3] sde[2] sdd[1] sdc[0]
      976508928 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      [>....................]  resync =  0.5% (4909568/976508928) finish=1325.0min speed=12220K/sec
      bitmap: 8/8 pages [32KB], 65536KB chunk

It’ll actually take most of a day (finish=1325.0min is about 22 hours). I checked online, and apparently this experience is standard. I can wait, as long as it works…
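The ETA in that status line is roughly the remaining blocks divided by the current speed. Here is a minimal sketch that recomputes it from /proc/mdstat with awk, assuming the resync (or recovery) line format shown above; resync_eta_hours is a made-up name. For the line above it gives (976508928 - 4909568) / 12220 ≈ 79509 seconds, about 22.1 hours, consistent with finish=1325.0min.

```shell
#!/bin/sh
# Sketch: estimate remaining resync time in hours from /proc/mdstat text
# on stdin. Assumes the "(done/total)" and "speed=NK/sec" fields shown
# in the status output above. resync_eta_hours is hypothetical.
resync_eta_hours() {
  awk '
    /resync|recovery/ {
      done = 0; total = 0; speed = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^\([0-9]+\/[0-9]+\)$/) {          # e.g. (4909568/976508928)
          f = $i; gsub(/[()]/, "", f); split(f, p, "/")
          done = p[1] + 0; total = p[2] + 0
        } else if ($i ~ /^speed=[0-9]+K\/sec$/) {   # e.g. speed=12220K/sec
          s = $i; gsub(/[^0-9]/, "", s); speed = s + 0
        }
      }
      # blocks are 1K and speed is K/sec, so the ratio is in seconds
      if (speed > 0) printf "%.1f\n", (total - done) / speed / 3600
    }'
}
```

Usage: resync_eta_hours < /proc/mdstat, or simply watch cat /proc/mdstat to follow it live. The speed figure changes as the resync goes on, so treat the result as a rough estimate.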

24 Hours Later

Success!

Next I create a filesystem on the combined drive: sudo mkfs.ext4 -F /dev/md10, then a mount point: sudo mkdir -p /mnt/md10, and then mount it: sudo mount /dev/md10 /mnt/md10.
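mkfs.ext4 can be told about the RAID geometry through its stride and stripe_width extended options. It normally picks these up from the md device by itself, so the sketch below is only a cross-check of the arithmetic, assuming 4 KiB ext4 blocks and the 512K chunks and 2 near-copies reported by /proc/mdstat: stride = 512 / 4 = 128 blocks, and stripe_width = stride × data-bearing drives = 128 × (4 / 2) = 256. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: ext4 RAID hints for mkfs.ext4 -E (helper name is hypothetical).
# $1 = md chunk size (KiB), $2 = ext4 block size (KiB),
# $3 = member devices, $4 = copies of each chunk
ext4_raid_opts() (
  stride=$(( $1 / $2 ))            # filesystem blocks per chunk
  width=$(( stride * $3 / $4 ))    # blocks per full stripe of unique data
  printf '%s\n' "-E stride=$stride,stripe_width=$width"
)
```

sudo mkfs.ext4 -F $(ext4_raid_opts 512 4 4 2) /dev/md10 would pass these explicitly, and sudo dumpe2fs -h /dev/md10 shows the RAID stride and stripe width the filesystem actually got.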

Awesome! Copy some 600MB file to it, takes a few seconds (not amazing). Turn it off, turn it back on - it’s still there!

Next we want the array to be assembled automatically: sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf. And we’re done! There are some other steps, but I don’t want to boot from the array or anything, so I don’t care.
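The mdadm.conf entry takes care of assembling the array; mounting it at boot also needs an /etc/fstab line. A minimal sketch, assuming the /mnt/md10 mount point from above and a filesystem UUID read with sudo blkid -s UUID -o value /dev/md10. Keying on the UUID rather than /dev/md10 avoids trouble if the array ever comes up under a different name (such as /dev/md127), and nofail keeps boot from stalling when the USB enclosure is unplugged. fstab_entry is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: build an /etc/fstab line for the array's filesystem.
# $1 = filesystem UUID, $2 = mount point. fstab_entry is hypothetical.
# "nofail" lets boot continue when the USB enclosure is not connected;
# the trailing "0 0" disables dump and boot-time fsck for this entry.
fstab_entry() {
  printf 'UUID=%s %s ext4 defaults,nofail 0 0\n' "$1" "$2"
}
```

The line can be appended with fstab_entry "$(sudo blkid -s UUID -o value /dev/md10)" /mnt/md10 | sudo tee -a /etc/fstab. On Debian/Ubuntu it is also common to run sudo update-initramfs -u after editing mdadm.conf, so the early-boot copy of the config matches.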

Benchmark

Results

Hmmmmmmmm, that doesn’t look like USB3.0! That looks like USB2.0 speeds… Maybe a newer proper cable will help.

More to investigate at a later date! For now it is sufficient to extend my storage.

Edit: I got a new USB3.0 cable and we’re in the same place. So either my laptop is unable to utilize the full speed of USB3.0, or the drive controller is unable to run faster under RAID.

New results

On the plus side, the new cable is less prone to random disconnects, so there’s that.
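One way to narrow down whether the laptop or the enclosure is the bottleneck is to check the speed the USB link actually negotiated: lsusb -t prints it at the end of each device line (480M is USB 2.0, 5000M is USB 3.0 at 5 Gbit/s). A minimal sketch that pulls out that figure for mass-storage devices; the function names are hypothetical, and the exact line format varies between usbutils versions.

```shell
#!/bin/sh
# Sketch: read the negotiated link speed of USB mass-storage devices.
# Expects `lsusb -t` output on stdin; the speed is the last field of
# each device line. Both function names are hypothetical.
usb_storage_speeds() {
  awk '/Mass Storage/ { print $NF }'
}

# Translate a speed figure from lsusb -t into a USB generation
usb_generation() {
  case $1 in
    12M)           echo "USB 1.1 (full speed)" ;;
    480M)          echo "USB 2.0 (high speed)" ;;
    5000M)         echo "USB 3.x, 5 Gbit/s (SuperSpeed)" ;;
    10000M|20000M) echo "USB 3.x, 10/20 Gbit/s (SuperSpeed+)" ;;
    *)             echo "unknown" ;;
  esac
}
```

If lsusb -t | usb_storage_speeds reports 480M with the enclosure on a USB 3.0 port, the link itself has fallen back to USB 2.0; if it reports 5000M, the limit is somewhere else.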

Edit2: Big brain moment. The reason it’s so slow is related to that 15ms access time - strange, isn’t it? It doesn’t take that long to read or write some information via USB? Unless…

First consider a normal drive setup: an HDD connected via SATA. Writing files is relatively fast if you are writing large blocks, but you pay heavily for writing to random locations, as you have to wait for the drive to spin to the right place and then move the head to the right track. This seek time isn’t so long in reality, and the CPU spends tonnes of time waiting around in any case.

Next consider the use case where you have four drives multiplexing on one connection without RAID. Your read/write times are still roughly the same and you pay a similar cost for random read/writes.

Now add in the complexity of RAID 10, where you both mirror and stripe. When you write a block of some size, you now have to wait for disk 0 to spin to the correct place, then wait for disk 1 to spin to the right place for the mirror, then wait for disk 2 to spin to the right place for the stripe, then wait for disk 3 to spin to the right place to mirror the stripe. That’s an awful lot of time! And remember this is likely a blocking operation: the drive doesn’t have a cache, so you need to be told that it is ready to receive more data!
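To put rough numbers on this explanation, here is a toy model of it (an assumption, not a measurement): each write waits for one positioning delay per drive it touches, either one after another as described above, or overlapped as a caching controller might manage. With the 15ms access time and four drives, that is 60ms per write (about 16.7 writes per second) versus 15ms (about 66.7). In the near-2 layout, a write that fits inside one 512K chunk only lands on one mirrored pair, so 2 drives may be the more realistic case for small writes. writes_per_sec is a hypothetical name.

```shell
#!/bin/sh
# Toy model (an assumption, not a measurement) of the explanation above.
# $1 = access time per drive (ms), $2 = drives touched per write,
# $3 = "serial" (wait for each drive in turn) or "parallel" (overlapped).
# Prints writes per second. writes_per_sec is a hypothetical name.
writes_per_sec() {
  awk -v a="$1" -v d="$2" -v m="$3" 'BEGIN {
    per_write = (m == "serial") ? a * d : a    # ms spent per write
    printf "%.1f\n", 1000 / per_write
  }'
}
```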

I suspect what hardware RAID will do is cache these writes and send to all the disks at the same time. This way you don’t have to wait for any one drive to complete. With some knowledge about the data being requested (i.e. size, location, etc) it could even minimize read time by either:

I would have to look up exactly how it works, but I suspect people smarter than me in this area would make such optimizations if there was even a 0.5% speed benefit.

A simple solution to speed up this entire setup for writes would be to use cached SSDs, so that the blocking time is minimized when writing to any one drive. I guess the real lesson here is that software RAID really can’t be as fast as hardware RAID.

Anyway, for now it is fine. It’s still multiple times faster than my internet connection or any buffer I want to read the data into. Hopefully the backups won’t take years!

EDIT: Now read part 2.


  1. If you’re asking: “How many podcasts could this person possibly have?” - probably enough to run an engineering-focussed radio station 24/7 for a few years.