So you want to build a Linux-HA based iSCSI SAN


    Like you and so many before, I surveyed the internet for the current standard in a Linux-HA based SAN. I came across many HowTos and YouTube videos from individuals documenting their attempts. There was a lot of information, much of it duplicated and a lot of it out of date. Some issues with GlusterFS involving repairs on systems with large files seem to have been addressed in the latest release. But I’m jumping ahead, so let’s get back on track. I really wanted an easy system that anyone can set up, and that meant a GUI. Fortunately I found LCMC, a wonderful GUI for setting up Pacemaker clusters on top of Corosync or Heartbeat. It’s still in active development. I taught myself the LCMC interface before I dove into the different replication technologies available.

DRBD

    I first ran into DRBD (Distributed Replicated Block Device) on the internet, and it is what most people are using.

    It’s quick and easy to set up, and if you don’t do it right it will corrupt all your data. From my reading, it’s what a lot of third-party SAN developers use as the back end of the products they sell. The Linux distro I used was Ubuntu Server (no GUI front end). In my own experimentation I set up DRBD two ways: as a raw block device that I exported as an iSCSI target, and with an ext4 file system on top, in which I created files with dd (the Unix command) and exported those files over iSCSI. I first used IET, then switched to SCST after I learned a little more. I know, it gets complicated. I will document all the different steps and the pitfalls I ran into.


    The issue I had with DRBD is that you have to worry about running into a split-brain situation. That’s the scenario where both SAN units (primary and secondary) go online independently and you don’t know which copy of the data is the correct one to keep. You also run into corruption if two systems attempt to write to the same file. I started looking for something else and ran into someone who had set up a SAN with GlusterFS.
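
    To make the split-brain scenario above a little more concrete, here is a minimal toy model in Python. It is only an illustration under my own assumptions: the `Replica` class, the node names and the prefix check are all made up for this sketch, and this is not how DRBD itself tracks or detects divergence.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one SAN node holding a copy of the data (hypothetical)."""
    name: str
    role: str = "secondary"
    log: list = field(default_factory=list)  # writes this node has accepted

    def write(self, block: str) -> None:
        # Only a node that believes it is primary accepts writes.
        if self.role != "primary":
            raise PermissionError(f"{self.name} is not primary")
        self.log.append(block)


def replicate(src: Replica, dst: Replica) -> None:
    """Healthy link: the secondary mirrors every write from the primary."""
    dst.log = list(src.log)


def is_split_brain(a: Replica, b: Replica) -> bool:
    """True when neither log is a prefix of the other, meaning both sides
    accepted writes that the other side never saw."""
    shorter, longer = sorted((a.log, b.log), key=len)
    return longer[:len(shorter)] != shorter


node_a = Replica("san-a", role="primary")
node_b = Replica("san-b")

# Normal operation: one primary, writes are mirrored to the secondary.
node_a.write("block-1")
replicate(node_a, node_b)
healthy = is_split_brain(node_a, node_b)

# The replication link fails and node_b is promoted while node_a is still primary.
node_b.role = "primary"
node_a.write("block-2a")
node_b.write("block-2b")
diverged = is_split_brain(node_a, node_b)

print(healthy, diverged)
```

    Once both nodes have accepted different writes, nothing in the data itself says which copy to keep, which is exactly the decision you are left with after a real split-brain.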


    He had great performance, but later on wrote that it wasn’t ready for prime time (all of this will be referenced in future posts). He did write that he used IET as his iSCSI target software. I looked into the issue further and discovered that IET isn’t suitable for VMware hosts or Windows Hyper-V because it doesn’t support Persistent (SCSI-3) Reservations. So then I looked into how to install SCST, which is what everyone right now recommends for VMware and Hyper-V. This is where it gets annoying. The SCST drivers aren’t included in the default Linux kernel, which means you have to recompile the kernel to add them. When I tried it, it took hours just to recompile the kernel. A friend of mine told me that was always a headache he had with Linux that he never had with BSD.


    So now I had reliable iSCSI target software I could use. I went back and retried DRBD using LCMC for the setup, and things worked well. I also tried manual setups of Pacemaker/Heartbeat just as a proof of concept. I then went ahead and tried my most elaborate SAN setup to date: a four-node SAN.


    Two nodes would be used to create the GlusterFS volume. The other two nodes would mount the GlusterFS volume, and SCST would publish an iSCSI target backed by a binary file (I called it lun0.bin) that I created with dd. BTW, all of this was built on an ESXi box with 1.5 TB of local storage.
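
    As a rough sketch of what creating that backing file amounts to, here is a Python equivalent of a dd-style "make me a file of N bytes" step. The function name, the 16 MiB demo size, and the temp directory are assumptions for illustration only; in the setup described above the file would live on the mounted GlusterFS volume and be far larger, and the exact dd options I used are not shown here.

```python
import os
import tempfile


def create_backing_file(path: str, size_bytes: int, sparse: bool = True) -> int:
    """Create a fixed-size file to act as a LUN backing store (hypothetical helper).

    sparse=True only sets the file length, so blocks are allocated lazily.
    sparse=False writes real zeros, similar in spirit to dd from /dev/zero.
    Returns the resulting file size in bytes.
    """
    with open(path, "wb") as f:
        if sparse:
            f.truncate(size_bytes)
        else:
            chunk = b"\0" * (1024 * 1024)  # write in 1 MiB pieces
            remaining = size_bytes
            while remaining > 0:
                n = min(len(chunk), remaining)
                f.write(chunk[:n])
                remaining -= n
    return os.path.getsize(path)


# Demo in a temp directory; the real file would sit on the Gluster mount.
demo_dir = tempfile.mkdtemp()
lun_path = os.path.join(demo_dir, "lun0.bin")
lun_size = create_backing_file(lun_path, 16 * 1024 * 1024)
print(lun_path, lun_size)
```

    A sparse file is quick to create but only consumes space as the initiator writes to it, while a fully written file reserves the space up front. Which one suits you depends on whether you would rather over-commit storage or know the space is really there.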

    The most time-consuming part was the Linux kernel rebuilds. The GlusterFS nodes only had one virtual NIC each, and the iSCSI head units had two virtual NICs each (one for GlusterFS and one to publish iSCSI targets). This setup actually ran rather well, and I didn’t have to worry too much about split-brain: if the initial iSCSI head node didn’t release the binary file (lun0.bin for example), the other node couldn’t mount it, and the Gluster client would report back that the file was locked and in use. GlusterFS is meant to be used as a NAS, not a SAN, but I engineered it to perform the SAN function. There are many stories about GlusterFS stealing all the bandwidth to rebuild a Gluster volume. I’ll cover those in more detail when I get to the GlusterFS module writeup.
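
    The "file is locked and in use" behaviour can be imitated on a single machine with an ordinary advisory lock. The sketch below is only an analogy under my own assumptions: it uses Python’s `fcntl.flock` on a local temp file (Unix only), the `try_claim` helper is hypothetical, and it does not show how SCST or the Gluster client actually hold or report the lock.

```python
import fcntl
import os
import tempfile


def try_claim(path: str):
    """Open path and take an exclusive, non-blocking advisory lock.

    Returns the open file object on success, or None if another
    opener already holds the lock. (Hypothetical helper.)
    """
    f = open(path, "rb+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


# Stand-in for lun0.bin; the real file would be on the shared volume.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
os.close(fd)

head_a = try_claim(demo_path)        # first head unit claims the file
head_b = try_claim(demo_path)        # second head unit is refused
first_ok = head_a is not None
second_blocked = head_b is None

head_a.close()                       # closing the file releases the lock
head_b = try_claim(demo_path)        # now the second head unit can take over
failover_ok = head_b is not None

head_b.close()
os.remove(demo_path)
print(first_ok, second_blocked, failover_ok)
```

    The point is the ordering: the standby can only take over once the active head unit has let go of the file, so two heads never serve the same LUN at once. That helps, but it is not a substitute for proper fencing.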

    In the coming days I’ll begin to write up the modules in the order that I found most efficient. VMs work great as a test bed since you can clone the VM after you rebuild the Linux kernel. I plan to write the modules in the following order:

1) SCST install on Ubuntu Server (you may need to Google instructions for other distros). SCST goes on whatever system finally provides iSCSI targets to your network.

2) LCMC, which also includes DRBD as part of its setup.

3) GlusterFS and the Gluster client. I’ll cover both building everything on a single node and keeping the storage nodes separate from the iSCSI head units.

    I have to stress again that all of this is a test setup and I didn’t use STONITH in any of the setups. You should consider looking into setting up STONITH (even for GlusterFS iSCSI Head Units) if you don’t want to suffer split-brain or corrupted data.

