
Managing Percona XtraDB Cluster with Puppet

Submitted by Walter Heck on April 25, 2014

Last month I spoke at the Percona Live conference about MySQL and Puppet. There was a lot of interest in the talk, so I figured I'd write a blog post about it as well. I used the galera module I wrote as an example in the session, so this post will be specifically about Galera.

Prerequisites

Setting up VirtualBox

We use specific network settings for VirtualBox in our Vagrantfile, so we'll need to make sure it's configured properly. Inside VirtualBox, go to Preferences -> Network -> Host Only Networks (this is on a Mac; it may be different on other host OSes). Edit vboxnet0, or add it if the list is empty. Use the following settings to make sure your VMs use the IPs defined in the Vagrantfile:

Adapter tab:

  • IPv4 Address: 192.168.56.1
  • IPv4 Network Mask: 255.255.255.0
  • IPv6 Settings can stay on their defaults

DHCP tab:

  • tick "Enable server"
  • Server Address: 192.168.56.100
  • Server Mask: 255.255.255.0
  • Lower Address Bound: 192.168.56.101
  • Upper Address Bound: 192.168.56.254

DHCP is not strictly needed (we set static IPs in the Vagrantfile), but if you add other servers to your test setup later on, it's convenient to have them in the same subnet.
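If you prefer the command line, the same host-only network can be set up with VBoxManage. This is a sketch of the equivalent commands; the interface name vboxnet0 is what VirtualBox assigns on a Mac and may differ on your host:

```shell
# Create the host-only interface (VirtualBox names it vboxnet0, vboxnet1, ...)
VBoxManage hostonlyif create

# Give the host side the address from the Adapter tab
VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1 --netmask 255.255.255.0

# Configure and enable the DHCP server from the DHCP tab
VBoxManage dhcpserver add --ifname vboxnet0 \
  --ip 192.168.56.100 --netmask 255.255.255.0 \
  --lowerip 192.168.56.101 --upperip 192.168.56.254 --enable
```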

Getting the Puppet master up and running

Now that VirtualBox is ready for action, let's grab the code and fire up the Puppet master with Vagrant:


$ git clone https://github.com/olindata/olindata-galera-puppet-demo.git olindata-galera-demo
$ cd olindata-galera-demo/vagrant
$ vagrant up master
[..Wait a few minutes, grab coffee and read the rest of this post..]

Note that the vagrant up command throws errors here and there, but they are okay: they are corrected later in the master_setup.sh script. To check that everything completed, log into the master and check which process is listening on port 8140. This should be httpd. In addition, a puppet agent -t run should complete without problems:

$ vagrant ssh master
[vagrant@master ~]$ sudo su -
[root@master ~]# netstat -plant | grep 8140
tcp        0      0 :::8140                     :::*                        LISTEN      5305/httpd
[root@master ~]# puppet agent -t
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/etckepper_puppet.rb
Info: Caching catalog for master.olindata.vm
Info: Applying configuration version '1397908319'
Notice: Finished catalog run in 5.30 seconds

If the output of the commands is as shown above, the Puppet master is now ready for the agents to be brought up.

Bringing up the galera nodes

First node

The galera puppet module is quite nice, but it has one big caveat at the moment: bootstrapping a cluster. The problem is that when Puppet runs on a node, it has a hard time figuring out whether that node is the first node in a cluster (and thus needs to be bootstrapped) or whether it is joining an existing cluster. A solution would be a little script that checks all the nodes in the wsrep_cluster_address variable to see if any of them are already up, but that is neither very elegant (imperative checks like that are exactly what we try to avoid in Puppet) nor implemented at present.

Since most of the time we'll be adding nodes to an already existing cluster, we have opted to make that the default in the galera module. This in turn means that for this demo we need to bring up one VM first, bootstrap Galera on it, and then bring up the other nodes. (Note: elegant solutions to this problem are welcome in the comments!)
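For the curious, such a check could look something like the sketch below: parse the host list out of wsrep_cluster_address and probe each member's replication port (4567 by default). The function names are made up for illustration; this is not part of the galera module:

```shell
#!/bin/bash
# Sketch only: decide between bootstrapping and joining by probing the
# members listed in wsrep_cluster_address.

# Turn "gcomm://host1,host2:4567" into a space-separated host list.
parse_cluster_hosts() {
  echo "$1" | sed -e 's|^gcomm://||' -e 's/,/ /g'
}

# Return 0 (bootstrap) only if no listed member answers on the galera
# replication port; otherwise we should join the running cluster.
should_bootstrap() {
  local host
  for host in $(parse_cluster_hosts "$1"); do
    # nc -z just tests whether the port is open, with a 2 second timeout
    if nc -z -w 2 "${host%%:*}" 4567 2>/dev/null; then
      return 1
    fi
  done
  return 0
}

# Usage on a node (with the address from that node's my.cnf):
#   if should_bootstrap "gcomm://192.168.56.101,192.168.56.102"; then
#     service mysql bootstrap-pxc
#   else
#     service mysql start
#   fi
```

The race is obvious: two nodes starting at the same time could both decide to bootstrap, which is part of why the module doesn't attempt this.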

Let's start by bringing up the VM and SSHing in as root:


$ vagrant up galera000
Bringing machine 'galera000' up with 'virtualbox' provider...
==> galera000: Importing base box 'debian-73-x64-virtualbox-puppet'...
==> galera000: Matching MAC address for NAT networking...
==> galera000: Setting the name of the VM: vagrant_galera000_1397908792529_17411
==> galera000: Fixed port collision for 22 => 2222. Now on port 2200.
==> galera000: Clearing any previously set network interfaces...
==> galera000: Preparing network interfaces based on configuration...
    galera000: Adapter 1: nat
    galera000: Adapter 2: hostonly
==> galera000: Forwarding ports...
    galera000: 22 => 2200 (adapter 1)
==> galera000: Booting VM...
==> galera000: Waiting for machine to boot. This may take a few minutes...
    galera000: SSH address: 127.0.0.1:2200
    galera000: SSH username: vagrant
    galera000: SSH auth method: private key
    galera000: Error: Connection timeout. Retrying...
==> galera000: Machine booted and ready!
==> galera000: Checking for guest additions in VM...
==> galera000: Setting hostname...
==> galera000: Configuring and enabling network interfaces...
==> galera000: Mounting shared folders...
    galera000: /vagrant => /Users/walterheck/Source/olindata-galera-demo/vagrant
==> galera000: Running provisioner: shell...
    galera000: Running: /var/folders/4x/366j5zl15b1b4z7t6l7jf6zw0000gn/T/vagrant-shell20140419-2728-1wfc0hm
stdin: is not a tty
$ vagrant ssh galera000
Linux vagrant 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64
Last login: Wed Feb  5 12:49:09 2014 from 10.0.2.2
vagrant@galera000:~$ sudo su -
root@galera000:~#

Next up, we run the puppet agent on it. Note that since we have autosigning enabled on the Puppet master, the first run doesn't need to wait for a signed certificate. The puppet run will have some errors, but we can live with that:


root@galera000:~# puppet agent -t

In the output (too much to display here), you'll see red lines complaining that mysql could not be started:


Error: Could not start Service[mysqld]: Execution of '/etc/init.d/mysql start' returned 1:
Error: /Stage[main]/Mysql::Server::Service/Service[mysqld]/ensure: change from stopped to running failed: Could not start Service[mysqld]: Execution of '/etc/init.d/mysql start' returned 1:
Error: Could not start Service[mysqld]: Execution of '/etc/init.d/mysql start' returned 1:
Error: /Stage[main]/Mysql::Server::Service/Service[mysqld]/ensure: change from stopped to running failed: Could not start Service[mysqld]: Execution of '/etc/init.d/mysql start' returned 1:

This is not actually true: when you check for the mysql process after the puppet run, it's there:


root@galera000:~# ps aux | grep mysql
root      9881  0.0  0.0   4176   440 ?        S    05:05   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    10209  0.2 64.8 830292 330188 ?       Sl   05:05   0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/lib/mysql/galera000.err --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
root     12456  0.0  0.1   7828   872 pts/0    S+   05:11   0:00 grep mysql

Let's kill mysql first:


root@galera000:~# pkill -9ef mysql
root@galera000:~# ps aux | grep mysql
root     12475  0.0  0.1   7828   876 pts/0    S+   05:12   0:00 grep mysql

Next up, we bootstrap the cluster:


root@galera000:~# service mysql bootstrap-pxc
[....] Bootstrapping Percona XtraDB Cluster database server: mysqld[....] Please take a l[FAILt the syslog. ... failed!
 failed!

Somehow this thinks it failed, but it didn't. To make sure it worked, log into mysql and check the wsrep_cluster% status variables. They should look something like this:


mysql> show global status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 1                                    |
| wsrep_cluster_size       | 1                                    |
| wsrep_cluster_state_uuid | 7665992d-bc38-11e3-a2c4-9aefb5dea18a |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.00 sec)

Now that mysql is properly bootstrapped, we can run the puppet agent one more time and see it complete properly:


root@galera000:~# puppet agent -t
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/etckepper_puppet.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Caching catalog for galera000.olindata.vm
Info: Applying configuration version '1398437502'
Notice: /Stage[main]/Xinetd/Service[xinetd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Xinetd/Service[xinetd]: Unscheduling refresh on Service[xinetd]
Notice: /Stage[main]/Mcollective::Server::Config::Factsource::Yaml/File[/etc/mcollective/facts.yaml]/content:
--- /etc/mcollective/facts.yaml 2014-04-25 08:17:39.000000000 -0700
+++ /tmp/puppet-file20140425-17657-1jhcy3j  2014-04-25 08:23:25.000000000 -0700
@@ -63,7 +63,7 @@
   operatingsystemmajrelease: "7"
   operatingsystemrelease: "7.3"
   osfamily: Debian
-  path: "/usr/bin:/bin:/usr/sbin:/sbin"
+  path: "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
   physicalprocessorcount: "1"
   processor0: "Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz"
   processorcount: "1"

Info: /Stage[main]/Mcollective::Server::Config::Factsource::Yaml/File[/etc/mcollective/facts.yaml]: Filebucketed /etc/mcollective/facts.yaml to puppet with sum 3a6aabbe41f4023031295a8ac3735df3
Notice: /Stage[main]/Mcollective::Server::Config::Factsource::Yaml/File[/etc/mcollective/facts.yaml]/content: content changed '{md5}3a6aabbe41f4023031295a8ac3735df3' to '{md5}227af082af9547f423040a45afec7800'
Notice: /Stage[main]/Mysql::Server::Root_password/Mysql_user[root@localhost]/password_hash: defined 'password_hash' as '*55070223BD04C680F8BD1586E6D12989358B4B55'
Notice: /Stage[main]/Mysql::Server::Root_password/File[/root/.my.cnf]/ensure: defined content as '{md5}af3f5d93645d29f88fd907e78d53806b'
Notice: /Stage[main]/Galera::Health_check/Mysql_user[mysqlchk_user@127.0.0.1]/ensure: created
Notice: /Stage[main]/Galera/Mysql_user[sst_xtrabackup@%]/ensure: created
Notice: /Stage[main]/Galera/Mysql_grant[sst_xtrabackup@%/*.*]/privileges: privileges changed ['USAGE'] to 'CREATE TABLESPACE LOCK TABLES RELOAD REPLICATION CLIENT SUPER'
Notice: Finished catalog run in 4.65 seconds

Now that this is done, we're ready to move on to the other nodes.

Subsequent nodes

Next, we bring up the other three vagrant nodes. The output from vagrant up will look like this:


$ vagrant up galera001
Bringing machine 'galera001' up with 'virtualbox' provider...
==> galera001: Importing base box 'debian-73-x64-virtualbox-puppet'...
==> galera001: Matching MAC address for NAT networking...
==> galera001: Setting the name of the VM: vagrant_galera001_1398437027038_8689
==> galera001: Fixed port collision for 22 => 2222. Now on port 2201.
==> galera001: Clearing any previously set network interfaces...
==> galera001: Preparing network interfaces based on configuration...
    galera001: Adapter 1: nat
    galera001: Adapter 2: hostonly
==> galera001: Forwarding ports...
    galera001: 22 => 2201 (adapter 1)
==> galera001: Booting VM...
==> galera001: Waiting for machine to boot. This may take a few minutes...
    galera001: SSH address: 127.0.0.1:2201
    galera001: SSH username: vagrant
    galera001: SSH auth method: private key
    galera001: Error: Connection timeout. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
    galera001: Error: Remote connection disconnect. Retrying...
==> galera001: Machine booted and ready!
==> galera001: Checking for guest additions in VM...
==> galera001: Setting hostname...
==> galera001: Configuring and enabling network interfaces...
==> galera001: Mounting shared folders...
    galera001: /vagrant => /Users/walterheck/Source/olindata-galera-demo/vagrant
==> galera001: Running provisioner: shell...
    galera001: Running: /var/folders/4x/366j5zl15b1b4z7t6l7jf6zw0000gn/T/vagrant-shell20140425-19022-fiys2z
stdin: is not a tty

Do the same for galera002 and galera003, then log into galera001 and run puppet agent -t:

$ vagrant ssh galera001
Linux vagrant 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Feb  5 12:49:09 2014 from 10.0.2.2
vagrant@galera001:~$ sudo su -
root@galera001:~# puppet agent -t
Info: Creating a new SSL key for galera001.olindata.vm
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for galera001.olindata.vm
Info: Certificate Request fingerprint (SHA256): A2:FF:3B:6F:7C:BA:FF:5B:65:C7:36:6F:CF:D2:FD:10:50:7C:63:7E:26:F1:F5:06:54:B8:C5:E7:2D:E2:17:37
Info: Caching certificate for galera001.olindata.vm
Info: Caching certificate_revocation_list for ca
Info: Caching certificate for ca
Info: Retrieving plugin
Notice: /File[/var/lib/puppet/lib/puppet]/ensure: created
Notice: /File[/var/lib/puppet/lib/puppet/provider]/ensure: created
Notice: /File[/var/lib/puppet/lib/puppet/provider/database_user]/ensure: created
[..snip..]
Notice: /Stage[main]/Profile::Mysql::Base/Package[xtrabackup]/ensure: ensure changed 'purged' to 'latest'
Info: Class[Mcollective::Server::Config]: Scheduling refresh of Class[Mcollective::Server::Service]
Info: Class[Mcollective::Server::Service]: Scheduling refresh of Service[mcollective]
Notice: /Stage[main]/Mcollective::Server::Service/Service[mcollective]: Triggered 'refresh' from 1 events
Info: Creating state file /var/lib/puppet/state/state.yaml
Notice: Finished catalog run in 137.24 seconds

When the puppet agent run is finished, we do a similar round of pkill and service start:


root@galera001:~# pkill -9ef mysql
root@galera001:~# ps aux | grep mysql
root     12077  0.0  0.1   7828   876 pts/0    S+   08:28   0:00 grep mysql
root@galera001:~# service mysql start
[FAIL] Starting MySQL (Percona XtraDB Cluster) database server: mysqld[....] Please take a look at the syslog. ... failed!
 failed!

If you then look at the mysql error log, it will output something like this after a few seconds, indicating the node has joined our cluster:


root@galera001:~# tail /var/log/mysql/error.log
2014-04-25 08:29:00 12900 [Note] WSREP: inited wsrep sidno 1
2014-04-25 08:29:00 12900 [Note] WSREP: SST received: 7019fb90-cc8d-11e3-9540-1248cb76bdcb:6
2014-04-25 08:29:00 12900 [Note] WSREP: 0.0 (galera001): State transfer from 1.0 (galera000) complete.
2014-04-25 08:29:00 12900 [Note] WSREP: Shifting JOINER -> JOINED (TO: 6)
2014-04-25 08:29:00 12900 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.15-63.0'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release 25.5, wsrep_25.5.r4061
2014-04-25 08:29:00 12900 [Note] WSREP: Member 0 (galera001) synced with group.
2014-04-25 08:29:00 12900 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 6)
2014-04-25 08:29:00 12900 [Note] WSREP: Synchronized with group, ready for connections
2014-04-25 08:29:00 12900 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Next up is a little hack. There's a galera-specific dependency error in the mysql module: it tries to create the root user with a password before it writes that info to the ~/.my.cnf file (which the module's commands use to avoid needing a hard-coded root password). Since fixing the module is outside the scope of this article, we'll cheat a little bit. Create a /root/.my.cnf file like this:

root@galera001:~# cat .my.cnf
[client]
user=root
host=localhost
password='khbrf9339'
socket=/var/lib/mysql/mysql.sock

After that, the puppet agent run will complete successfully:


root@galera001:~# puppet agent -t
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/etckepper_puppet.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Caching catalog for galera001.olindata.vm
Info: Applying configuration version '1398437502'
Notice: /Stage[main]/Xinetd/Service[xinetd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Xinetd/Service[xinetd]: Unscheduling refresh on Service[xinetd]
Notice: Finished catalog run in 3.04 seconds

The last step is to restart the xinetd service one more time:


root@galera001:~# service xinetd restart
[ ok ] Stopping internet superserver: xinetd.
[ ok ] Starting internet superserver: xinetd.

Now, rinse and repeat the steps for galera001 on galera002 and galera003:
  • write the .my.cnf file
  • run puppet agent -t
  • pkill mysql, then start the service manually
  • run puppet agent -t again
  • service xinetd restart
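The list above can also be scripted from the host. This is a rough sketch, run from the vagrant/ directory; it simply replays the manual galera001 sequence (including the demo's root password) on the remaining nodes via vagrant ssh -c:

```shell
# Rough sketch: repeat the manual galera001 sequence on the other nodes.
for node in galera002 galera003; do
  vagrant ssh "$node" -c "
    sudo tee /root/.my.cnf >/dev/null <<'EOF'
[client]
user=root
host=localhost
password='khbrf9339'
socket=/var/lib/mysql/mysql.sock
EOF
    sudo puppet agent -t          # first run: installs mysql, errors expected
    sudo pkill -9ef mysql         # kill the non-clustered mysqld
    sudo service mysql start      # rejoin the cluster via SST
    sudo puppet agent -t          # second run: now completes cleanly
    sudo service xinetd restart   # restart the health-check listener
  "
done
```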

After all this is done, run puppet agent -t on all nodes one more time, especially on galera000, as it has an haproxy instance running on it that will help us load balance the connections. This haproxy automatically configures galera nodes as they come up, and a puppet agent run takes care of that.

HAProxy

This demo cluster comes with an haproxy instance running on galera000. Its HTTP status page should be accessible directly from the host, giving you insight into the status of all nodes. If you did all of the above successfully, the result should look like this:

Open a browser on your host and go to: http://192.168.56.100/haproxy?stats

[Screenshot: haproxy stats page]

We have created two listeners by default, with slightly different behaviour:

  • The first listener (galera_reader, port 13306) divides incoming queries round-robin over its backends. This can be used for all select queries.
  • The second listener (galera_writer, port 13307) always directs sessions to the same server, unless that one is unavailable. This can be used for all write traffic.

This setup assumes your application can split its traffic this way. This is common in applications that previously ran on classic asynchronous replication. If your app can't do this, start by sending all traffic to galera_writer, then slowly implement functionality that sends selects to galera_reader.
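To see the difference between the two ports, you can point a mysql client at them from the host. The user here is hypothetical; you'd need to create one on the cluster that is allowed to connect from the haproxy host:

```shell
# Reads: round-robin, so repeated queries should hop between backends
mysql -h 192.168.56.100 -P 13306 -u someuser -p -e 'SELECT @@hostname;'

# Writes: sticky, so this should keep answering from the same backend
mysql -h 192.168.56.100 -P 13307 -u someuser -p -e 'SELECT @@hostname;'
```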

Note that galera replication is synchronous, so in theory you can send your writes to any node. In practice, however, this is not so simple once concurrency goes up; that discussion is beyond the scope of this blog post.

Summary

You are now ready to send queries to the two ports on the haproxy node, and watch them be distributed over the galera cluster. Feel free to play around by shutting down certain nodes, then watch them come back up.

In a follow-up article I'll discuss the Puppet repository structure used for this demo.