openais: an alternative to clvm with cman

I’ve been battling a lot of problems lately with cman, part of Red Hat Cluster Suite. Specifically, the fencing tool (fenced) is pretty much junk when you try to use it with Xen dom0s. After much searching and gnashing of teeth I happened upon this mailing list post. The promise there is that you can compile clvm against openais and get a cluster-aware LVM which doesn’t require the rest of Red Hat Cluster Suite (and its crappy documentation, crappy fencing, and general all-around crappiness). A little more searching turned up this web site from Olivier Le Cam, which did pretty much 90% of the work for me.

After some testing I’m happy to say it appears to work smashingly. What follows is a somewhat more complete version of how to achieve the same results on Debian Lenny. Enjoy :)


The first thing to do is get the Debian sources for clvm and modify them to use openais. After that we will build new packages from that source, then set up openais on our cluster nodes.

Install all the dependencies we need to compile clvm

root@host:~# apt-get build-dep clvm
root@host:~# apt-get install libopenais-dev

Now download the source files and cd into our working directory

root@host:~# cd /usr/src/
root@host:/usr/src# apt-get source clvm
root@host:/usr/src# cd lvm2-2.02.39

Now we’ll modify a few files in the source:

  • The first is debian/clvm.init. You’ll need to remove any references to cman or cluster.conf. You can download an already edited version here.
  • The next is debian/control. Modify the dependencies (lvm2 without the version number, openais in place of cman) and modify the comments accordingly. A pre-edited version is here.
  • The last file is debian/rules. Replace cman with openais in the configure options, and add the path to the openais libs. Again, a pre-made version is here.
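
The cman-to-openais swap in debian/rules can also be scripted. What follows is only a minimal sketch: it assumes the stock rules file passes a literal --with-clvmd=cman to configure, and the function name switch_clvmd_backend is made up for illustration. It does not add the LDFLAGS line, which still has to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: swap the clvmd cluster backend in a debian/rules file.
# Assumes the file contains a literal "--with-clvmd=cman" configure option.
switch_clvmd_backend() {
    rules_file=$1
    backend=${2:-openais}
    # Refuse to continue if there is nothing to replace
    grep -q -- '--with-clvmd=cman' "$rules_file" || {
        echo "no --with-clvmd=cman option found in $rules_file" >&2
        return 1
    }
    # GNU sed in-place edit, as used elsewhere in this article
    sed -i "s/--with-clvmd=cman/--with-clvmd=$backend/" "$rules_file"
}
```

From the lvm2 source directory this would be called as: switch_clvmd_backend debian/rules openais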

For clarity’s sake, here is the actual code block from debian/rules:

$(STAMPS_DIR)/setup-deb: SOURCE_DIR = $(BUILD_DIR)/source
$(STAMPS_DIR)/setup-deb: DIR = $(BUILD_DIR)/build-deb
$(STAMPS_DIR)/setup-deb: $(STAMPS_DIR)/source
        rm -rf $(DIR)
        cp -al $(SOURCE_DIR) $(DIR)
        cd $(DIR); \
        ./configure CFLAGS="$(CFLAGS)" \
                LDFLAGS="-L/usr/lib/openais" \
                $(CONFIGURE_FLAGS) \
                --with-optimisation="" \
                --with-clvmd=openais \
                --enable-readline
        touch $@

The last thing to do is update the internal version number of the clvm package and add some comments to the changelog:

root@host:/usr/src/lvm2-2.02.39# dch -i

Now go ahead and compile the package:

root@host:/usr/src/lvm2-2.02.39# dpkg-buildpackage -rfakeroot -uc -b
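
If the build stops right away with "debian/rules: Permission denied" and exit status 126, the shell found the file but could not execute it. A likely cause is that debian/rules lost its execute bit after being replaced or edited. Here is a small pre-flight sketch; the function name check_rules_executable is hypothetical, and it may not catch other causes such as a source tree on a noexec mount.

```shell
#!/bin/sh
# Hypothetical pre-flight check before running dpkg-buildpackage.
# Returns 0 when debian/rules under the given source tree is executable,
# 1 when it exists but is not executable, 2 when it is missing.
check_rules_executable() {
    src_dir=${1:-.}
    rules="$src_dir/debian/rules"
    if [ ! -f "$rules" ]; then
        echo "missing: $rules (wrong directory?)" >&2
        return 2
    fi
    if [ ! -x "$rules" ]; then
        echo "not executable: $rules (try: chmod +x $rules)" >&2
        return 1
    fi
    return 0
}
```

Run it from the source directory as: check_rules_executable . && dpkg-buildpackage -rfakeroot -uc -b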

After the compilation completes you should have some shiny new .deb files in /usr/src. The one we are interested in is clvm_2.02.39-7.1_i386.deb (the actual version of yours may vary depending on what you put in the Debian changelog in the previous step).

So now that we’ve got our custom version of clvm compiled, it’s time to move on to the cluster nodes. On each node in the cluster, do the following…

Install openais and add a user for it:

root@node:~# apt-get install openais
root@node:~# mkdir -p /etc/ais
root@node:~# adduser --no-create-home --disabled-password --disabled-login --gecos openAIS ais

Now create the following config in /etc/ais/openais.conf. This is the most basic config you can have. All you need to do is set 192.168.1.0 to be your actual network address.

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
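
If you are unsure what the network address is for a given node, it is the node's IP address ANDed with its netmask, octet by octet. The following is a small sketch of that calculation; the function name network_address is made up for illustration, and it assumes plain dotted-quad IPv4 input without leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: derive a bindnetaddr value from an IPv4 address and
# netmask by AND-ing each octet of the address with the matching mask octet.
network_address() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}
```

For example, network_address 192.168.1.37 255.255.255.0 prints 192.168.1.0, which is the value used in the config above.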

openais does not include a proper Debian init script, so you can download one here and save it as /etc/init.d/openais. After that is done, add it to the proper runlevels by issuing:

root@node:~# update-rc.d openais start 62 S . start 50 0 6 .

Now we can install our custom deb and turn on LVM clustering

root@node:~# dpkg -i clvm_2.02.39-7.1_i386.deb
root@node:~# sed -i 's/^    locking_type = 1$/    locking_type = 3/' /etc/lvm/lvm.conf
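
Note that the sed expression above only matches a line with exactly four leading spaces and a current value of 1, and it fails silently if nothing matches. Here is a slightly more forgiving sketch that tolerates any leading whitespace and then verifies the result; the function name set_locking_type is hypothetical, and it assumes GNU sed for the -i option.

```shell
#!/bin/sh
# Hypothetical helper: set locking_type in an lvm.conf-style file and verify it.
# Matches any leading whitespace and any current numeric value; commented-out
# lines (starting with #) are left alone.
set_locking_type() {
    conf_file=$1
    new_type=$2
    sed -i "s/^\([[:space:]]*locking_type[[:space:]]*=[[:space:]]*\)[0-9][0-9]*[[:space:]]*\$/\1$new_type/" "$conf_file"
    # Succeed only if an uncommented locking_type line now carries the new value
    grep -Eq "^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*$new_type[[:space:]]*\$" "$conf_file"
}
```

Usage would be: set_locking_type /etc/lvm/lvm.conf 3 || echo "locking_type was not updated"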

That’s it! Reboot your cluster nodes and they should all be cluster aware now :)

  • Comments (22)
  1. Dear author,
    I liked very much the excellent solution you published, so I installed it inside my 3-node cluster, hoping it will be soon embodied inside the lenny distro. Just like you, we need Xen and only Xen.

    Maybe due to my ignorance, I need to solve some runtime problems yet. I do not think they are related to your solution, but not knowing the reason why they happen and method I can use to have a better idea, I’d like to know your opinion, if it were possible. I know nothing about openais and little about clvm.

    It happens to me that commands like lvcreate and lvremove sometimes get blocked. The only workaround seems to be killing the process from another session, killing clvm and aisexec, and then starting openais and clvm again. After such an operation the chance that the command works and produces effects visible from the same or another node is high, even though not absolute. It is like a brutal manual fencing.

    I thought of a timeout problem for openais, but I have no info about its status and its operation. I also noticed that the installation created a directory /var/log/openais which is empty. There is no useful info in /var/log/syslog .

    What do you suggest to do ?

    Ciao and thank you for your patch.
    Ezio.

    • jcl
    • Oct 22nd. 2009 8:58am

    ezio,

    I have seen some of the same problems actually. I have discovered that it’s possible to recover from it by simply restarting the openais/clvm/lvm stack in the correct order. My best guess at this point is that there is some kind of poor interaction between openais and clvm but I haven’t been able to test it much.

    The problem is very intermittent too. Some times it will occur twice in one day. Other times I can go weeks and not have the issue at all.

    I have plans for trying to get to the bottom of it but I don’t have any more information than that at this time. If you discover anything, please post it here for others.

    Thanks!
    – jcl

  2. Hi jcl,

    my SAN is not in a stable state. I could not configure the multipath service against my active-passive controllers yet, since I am waiting for a multipath-compatible firmware upgrade, so my logs are full of errors because the nodes try to connect through FC connections to redundant devices which are not actually reachable. The drop in performance could be huge. Nevertheless I decided to test your environment, because I felt that your solution was excellent for our aims.

    Since the problem described is more frequent in my case, and my nodes are quite full of syslog errors due to the problems above, this could mean that the problem is related to timeout problems, performance of the SAN and ways to configure timeouts in the clvm and openais packages.

    Is /var/log/openais empty in your installation too?

    I ordered 2 little AoE’s too and am studying your article about it.

    Thanks.
    ezio

    • jcl
    • Oct 23rd. 2009 7:42am

    ezio,

    /var/log/openais is indeed empty on my installations. i’m not sure if you have to enable logging somehow in openais or not.

    there might be some way to configure openais/clvm timeout periods… i haven’t looked into that. it’s also possible that clvm assumes certain things about cman that aren’t true about openais and that is causing the problem. perhaps there have been improvements in the latest source code for both packages.

    another thing you could try, though it’s dangerous if you modify the cluster lvm metadata from different nodes at the same time, would be to put each openais instance into its own ring. that would trick clvm into thinking it had a quorate cluster since each node is, in effect, the only node in the cluster.

    – jcl

    • Xen Fan
    • Mar 10th. 2010 2:43pm

    Thanks for the great writeup, however I’m having difficulty building this on lenny. At the dpkg-buildpackage stage I get this error.

    /usr/src/lvm2-2.02.39# dpkg-buildpackage -rfakeroot -uc -b
    dpkg-buildpackage: warning: using a gain-root-command while being root
    dpkg-buildpackage: set CFLAGS to default value: -g -O2
    dpkg-buildpackage: set CPPFLAGS to default value:
    dpkg-buildpackage: set LDFLAGS to default value:
    dpkg-buildpackage: set FFLAGS to default value: -g -O2
    dpkg-buildpackage: set CXXFLAGS to default value: -g -O2
    dpkg-buildpackage: source package lvm2
    dpkg-buildpackage: source version 2.02.39-7.3
    dpkg-buildpackage: source changed by root
    dpkg-buildpackage: host architecture amd64
    fakeroot debian/rules clean
    /usr/bin/fakeroot: line 164: debian/rules: Permission denied
    dpkg-buildpackage: failure: fakeroot debian/rules clean gave error exit status 126

    Any ideas? Do you have a package anywhere for download? If I could get clvm going without all the other nonsense it would be amazing. Thanks!

    • jcl
    • Mar 10th. 2010 2:46pm

    Xen Fan,

    Looks like you’re trying to use fakeroot when you’re already root.

    • Xen Fan
    • Mar 10th. 2010 3:14pm

    Yeah, I was confused because you did the same thing above:

    root@host:/usr/src/lm2-2.02.39# dpkg-buildpackage -rfakeroot -uc -b

    I’m not clear on where I need to become another user. A regular user won’t be able to create the build dir in /usr/src.

    • jcl
    • Mar 10th. 2010 3:22pm

    Xen Fan,

    If you are already root, you don’t need to use -rfakeroot at all.

    • Xen Fan
    • Mar 10th. 2010 3:30pm

    Regardless of userid or the use of fakeroot, the dpkg-buildpackage gives a rules error.

    With fakeroot as root:

    fakeroot debian/rules clean
    /usr/bin/fakeroot: line 164: debian/rules: Permission denied
    dpkg-buildpackage: failure: fakeroot debian/rules clean gave error exit status 126

    Without fakeroot as root:

    debian/rules clean
    Can’t exec “debian/rules”: Permission denied at /usr/bin/dpkg-buildpackage line 475.
    dpkg-buildpackage: failure: debian/rules clean failed with unknown exit code -1

    With fakeroot as regular user:
    fakeroot debian/rules clean
    /usr/bin/fakeroot: line 164: debian/rules: Permission denied
    dpkg-buildpackage: failure: fakeroot debian/rules clean gave error exit status 126

    I appreciate your help and apologize for the annoyance, but not being a debian package guy I’ve been banging my head on this for a while. I can’t even figure out what it’s trying to do.

    • Rico
    • Apr 9th. 2010 5:01pm

    I compiled your how-to and the cluster works fine. But there is a problem with clvmd… Look at this log when I launch it in debug mode :

    # clvmd -d 1
    CLVMD[3e5cd770]: Apr 9 13:56:41 CLVMD started
    CLVMD[3e5cd770]: Apr 9 13:56:41 Our local node id is -1062714761
    CLVMD[3e5cd770]: Apr 9 13:56:41 Add_internal_client, fd = 7
    CLVMD[3e5cd770]: Apr 9 13:56:41 Connected to OpenAIS
    CLVMD[3e5cd770]: Apr 9 13:56:41 Cluster ready, doing some more initialisation
    CLVMD[3e5cd770]: Apr 9 13:56:41 starting LVM thread
    CLVMD[40800950]: Apr 9 13:56:41 LVM thread function started
    CLVMD[3e5cd770]: Apr 9 13:56:41 clvmd ready for work
    CLVMD[3e5cd770]: Apr 9 13:56:41 Using timeout of 60 seconds
    CLVMD[3e5cd770]: Apr 9 13:56:41 confchg callback. 1 joined, 0 left, 2 members
    File descriptor 4 left open
    File descriptor 5 left open
    File descriptor 6 left open
    WARNING: Locking disabled. Be careful! This could corrupt your metadata.
    CLVMD[40800950]: Apr 9 13:56:41 LVM thread waiting for work

    Did you see the WARNING ? The locking does not work, and I have no idea why ! my /etc/lvm/lvm.conf is modified to use internal locking, I can see lock/unlock requests in openais logs, but clvmd just doesn’t care about that !!!

    Any help will be appreciated :)

    • Olly
    • Jun 17th. 2010 4:37am

    Hi

    thx for your howto, but i have a problem to compile the clvm package. I got this error message

    make[3]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons/clvmd’
    make[3]: Entering directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons/clvmd’
    gcc -c -I../../include -I/usr/src/lvm2-2.02.39/debian/build/build-deb//include -DUSE_OPENAIS -D_REENTRANT -DHAVE_CONFIG_H -g -O2 -g -O2 -fPIC -Wall -Wundef -Wshadow -Wcast-align -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Winline -Wmissing-noreturn -Wformat-security -g -O2 -fPIC -Wall -Wundef -Wshadow -Wcast-align -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Winline -Wmissing-noreturn -Wformat-security -fno-strict-aliasing -g -O2 -fPIC -Wall -Wundef -Wshadow -Wcast-align -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Winline -Wmissing-noreturn -Wformat-security clvmd-command.c -o clvmd-command.o
    In file included from clvmd-command.c:74:
    clvmd-comms.h:79:29: error: openais/saAis.h: No such file or directory
    clvmd-comms.h:80:35: error: openais/totem/totem.h: No such file or directory
    In file included from clvmd-command.c:76:
    clvmd.h:32: error: ‘SA_MAX_NAME_LENGTH’ undeclared here (not in a function)
    make[3]: *** [clvmd-command.o] Error 1
    make[3]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons/clvmd’
    make[2]: *** [clvmd] Error 2
    make[2]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons’
    make[1]: *** [daemons] Error 2
    make[1]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb’
    make: *** [debian/stamps/build-deb] Error 2
    dpkg-buildpackage: failure: debian/rules build gave error exit status 2

    Any ideas about that?
    regards
    olly

    • Dave
    • Jun 28th. 2011 10:06pm

    Thanks for the great howto.
    As of lvm2-2.02.66 (possibly earlier) you must also install libcorosync-dev in order to build your modified clvm.

    I also suggest providing patches (using diff -u) rather than full file listings for the modified files, as this simplifies applying your changes to later source versions.

    Dave

    • Dave
    • Jun 29th. 2011 1:11am

    Users of dependency-managed run control will probably prefer this openais init script:

    #!/bin/sh
    #
    ### BEGIN INIT INFO
    # Provides: openais
    # Required-Start: $network $remote_fs $syslog
    # Required-Stop: $network $remote_fs $syslog
    # Default-Start: S
    # Default-Stop: 0 6
    # Short-Description: start and stop the openais cluster management daemon
    ### END INIT INFO
    #

    PATH=/sbin:/usr/sbin:/bin:/usr/bin
    #PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
    DESC="OpenAIS Cluster Management Daemon"
    NAME=openais
    DAEMON=/usr/sbin/aisexec
    SCRIPTNAME=/etc/init.d/openais

    FLAGS=

    test -f $DAEMON || exit 0

    set -e

    JOIN_TIMEOUT=15

    # Read configuration variable file if it is present
    [ -r /etc/default/$NAME ] && . /etc/default/$NAME

    case "$1" in
    start)
        echo -n "Starting $DESC: "
        start-stop-daemon --start --quiet -o --exec $DAEMON -- $FLAGS
        time=0
        # Wait until the node has joined (JOIN_TIMEOUT=0 means wait forever)
        while [ "$JOIN_TIMEOUT" -eq 0 ] || [ "$time" -lt "$JOIN_TIMEOUT" ] ; do
            sleep 1
            # Portable redirection; "&>" is a bashism and misbehaves under /bin/sh
            if openais-cfgtool -s >/dev/null 2>&1 ; then
                echo "$NAME."
                exit 0
            else
                echo -n " . "
                time=$(($time + 1))
            fi

        done
        echo "FAILED"
        exit 1
        ;;
    stop)
        echo -n "Stopping $DESC: "
        start-stop-daemon --stop --quiet -o --exec $DAEMON
        echo "$NAME."
        ;;
    reload|force-reload)
        echo "Reloading $DESC configuration files."
        start-stop-daemon --stop --signal 1 --quiet -o --exec $DAEMON
        ;;
    restart)
        echo -n "Restarting $DESC: "
        start-stop-daemon --stop --quiet -o --exec $DAEMON
        sleep 1
        start-stop-daemon --start --quiet -o --exec $DAEMON -- $FLAGS
        echo "$NAME."
        ;;
    *)
        N=/etc/init.d/$NAME
        echo "Usage: $N {start|stop|restart|reload|force-reload}" >&2
        exit 1
        ;;
    esac

    exit 0

    • Dave
    • Jun 29th. 2011 1:14am

    You should then also add openais to Required-Start and Required-Stop in clvm’s init script when updating the clvm deb config. That is, just replace cman on those lines with openais.

    • Dave
    • Jun 29th. 2011 2:13am

    And a more robust sed expression for updating /etc/lvm/lvm.conf would be:

    sed -i 's/^\([[:space:]]*locking_type = \)1$/\13/' /etc/lvm/lvm.conf

    • Dave
    • Jul 1st. 2011 3:26am

    Thought I should update that the openais init script and everything related to it are not required on Squeeze. Use corosync instead.

    • jcl
    • Jul 1st. 2011 10:48am

    Thanks for the excellent comments Dave :)

    • Dave
    • Jul 5th. 2011 2:46am

    Thanks for the kind words ;)

    I hit another issue with using a patched clvm on Debian Squeeze today which I have resolved using the following patch to /etc/init.d/corosync (thanks to Michael Schwartzkopff who has alerted the Debian corosync maintainer). As of corosync-1.2.1-4 this is not yet patched in the Debian package. I have provided the required change as a unified diff here for people who follow the above and hit this issue. In addition to this patch please add

    OPENAIS_SERVICES=yes

    to your /etc/default/corosync.

    --- /etc/init.d/corosync 2011-01-03 00:49:16.000000000 +1000
    +++ /etc/init.d/corosync.patched 2011-07-05 16:32:21.000000000 +1000
    @@ -31,6 +31,11 @@
    exit 0
    fi

    +if [ "$OPENAIS_SERVICES" = "yes" ]; then
    + export COROSYNC_DEFAULT_CONFIG_IFACE
    + : ${COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"}
    +fi
    +
    # Define LSB log_* functions.
    # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
    . /lib/lsb/init-functions

    • Dave
    • Jul 5th. 2011 2:49am

    To help Googlers find this page, the output you would receive when starting clvmd -d without the above patch would be similar to this:

    root@xen5.tst:/etc/corosync# clvmd -d
    CLVMD[7276f7a0]: Jul 5 16:27:12 CLVMD started
    CLVMD[7276f7a0]: Jul 5 16:27:12 Cannot initialise OpenAIS lock service: 12

    CLVMD[7276f7a0]: Jul 5 16:27:12 Can’t initialise cluster interface
    Can’t initialise cluster interface

    • Dave
    • Jul 10th. 2011 12:30am

    I’ve now also had success using clvm directly with corosync on Squeeze by compiling clvm with --with-clvmd=cman changed to --with-clvmd=corosync.

    The above patch to /etc/init.d/corosync is still needed if you plan to layer o2cb (OCFS2) on your cLVM as o2cb depends upon openais services.

    Sorry JCL – I really should create my own blog ;)

    Dave

    • ElRooto
    • May 25th. 2012 5:39pm

    So have you found that you don’t require fencing with clvmd then? What happens if the cluster goes split brain and then each tries to create a logical volume?

    • Dave
    • Nov 5th. 2012 10:34pm

    Of course you need fencing in any situation beyond initial lab testing. Incidentally I’ve had loss of cluster sync (resulting in 12-way split brain) several times in developing a 12 node clustered xen environment on top of this and miraculously suffered no disk corruption. But I also did not have any single-instance services (such as VMs) on top of lvm configured to start automatically if they disappeared from view of the cluster master.

RANDOM BITS FROM A LINUX ENGINEER