A fault-tolerant Master-Slave cluster on PostgreSQL
Hello, Habr readers!
In this article I want to share my experience of deploying a Master-Slave cluster on PostgreSQL. Fault tolerance is achieved with pgpool-II (failover, online recovery).
pgpool is a great tool for scaling and load balancing between servers, and I think few people know that it can automatically fail over to the slave server when the master goes down, or that new capacity can be added to an already running cluster without shutting the whole cluster down.
Cluster layout and machine requirements
The figure shows a typical layout of a Master-Slave cluster.
The cluster should contain one master server (Master), at least one slave server (Slave), and one scaling node (Balancer).
A Linux distribution must be installed on each server (I used Red Hat 6.1), and the gcc compiler must be installed on the scaling node.
The versions used are PostgreSQL 9.0.1 and pgpool-II 3.0.5. You can use other versions of the DBMS and pgpool; in that case, refer to the documentation.
Setting up passwordless connections between the cluster servers
Online recovery and failover require passwordless remote connections over SSH. You need to create SSH keys for the postgres user and distribute the postgres user's public key to each of the servers.
Important! For online recovery it must be possible to open a remote session from within another remote session (i.e. the following hops must work over SSH without a password: scaling node to master server to slave server, and scaling node to slave server to master server).
For failover, you must create an SSH key for the root user on the scaling node and distribute it to the postgres user on the master and slave servers.
This step matters for everything that follows, so make sure you can connect from one server to another from within a remote session.
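A minimal sketch of this key setup, assuming the addresses used later in the article (192.168.100.2 is the master, 192.168.100.3 the slave). As the postgres user on each of the three servers, generate a key and copy it to the other servers:
$ ssh-keygen -t rsa
$ ssh-copy-id postgres@192.168.100.2
$ ssh-copy-id postgres@192.168.100.3
As root on the scaling node, do the same so that root there can reach the postgres user on the master and slave:
# ssh-keygen -t rsa
# ssh-copy-id postgres@192.168.100.2
# ssh-copy-id postgres@192.168.100.3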
Setting up streaming replication
First, open port 5432 (the standard PostgreSQL port) for incoming and outgoing traffic in iptables.
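A sketch of the corresponding rule, assuming the stock Red Hat iptables service:
# iptables -I INPUT -p tcp --dport 5432 -j ACCEPT
# service iptables save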
Edit the configuration file $PGDATA/postgresql.conf on the master server as follows:
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 2
wal_keep_segments = 32
#hot_standby = on
Note the importance of the last line: it will be used by the slave node's restore script, so it must be written exactly as shown above.
Then add the replication entries to $PGDATA/pg_hba.conf:
host replication postgres 192.168.100.2/32 trust
host replication postgres 192.168.100.3/32 trust
Here postgres is the database administrator who will carry out replication and other administrative tasks. These lines allow replication connections from both the slave and the master server.
Then restart the master server:
# service postgresql restart
Stop the slave server (if it was running):
# service postgresql stop
Now you can start replication.
On the master server, as the postgres user, create a backup of the database and send it to the slave server:
$ psql -c "SELECT pg_start_backup('stream');"
$ rsync -a /var/lib/pgsql/data/ 192.168.100.3:/var/lib/pgsql/data/ --exclude postmaster.pid
$ psql -c "SELECT pg_stop_backup();"
Then, on the slave, create the replication config $PGDATA/recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=192.168.100.2 port=5432 user=postgres'
trigger_file = 'failover'
The trigger_file parameter defines where PostgreSQL looks for the file whose appearance switches it into master mode. In this case, PostgreSQL looks for the file at $PGDATA/failover.
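For illustration: promoting the slave by hand amounts to creating this file as the postgres user on the slave, which is exactly what the automatic failover script shown later does:
$ touch /var/lib/pgsql/data/failover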
Next, you need to enable hot standby on the slave server:
$ sed -i 's/#hot_standby = on/hot_standby = on/' /var/lib/pgsql/data/postgresql.conf
Then start the slave server:
# service postgresql start
You can check that replication is active as follows:
$ ps aux | grep sender
It should output something like this:
2561 ? Ss 0:00 postgres: wal sender process postgres 192.168.100.3(33341) streaming 0/2031D28
Similarly, on the slave server:
$ ps aux | grep receiver
It will give you something like the following:
1524 ? Ss 0:00 postgres: wal receiver process streaming 0/2031D28
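Another way to verify that the slave keeps up is to compare WAL positions using the monitoring functions available in PostgreSQL 9.0 (a quick sketch, run locally as postgres on each server). On the master:
$ psql -c "SELECT pg_current_xlog_location();"
On the slave:
$ psql -c "SELECT pg_last_xlog_receive_location();"
The two values should stay close to each other while replication is healthy.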
General setup of the scaling node
Edit the configuration file /etc/pgpool-II/pgpool.conf:
# Set the full range of listening addresses
listen_addresses = '*'
# The connection settings to the database on the slave server
backend_hostname0 = '192.168.100.3'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/data'
# The connection settings to the database on the master server
backend_hostname1 = '192.168.100.2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/data'
# Use pool_hba.conf to authenticate clients
enable_pool_hba = true
Next, in /etc/pgpool-II/pool_hba.conf, add the client authentication entries:
host all all 127.0.0.1/32 trust
host all all 192.168.100.2/32 trust
host all all 192.168.100.3/32 trust
Restart pgpool:
# service pgpool restart
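To make sure pgpool is answering, you can connect to it locally on the scaling node; a sketch, assuming pgpool listens on its default port 9999 and that the backend servers' pg_hba.conf allows connections coming from the scaling node:
$ psql -h 127.0.0.1 -p 9999 -U postgres -d postgres -c "SELECT version();"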
Configuring automatic failover
The automatic failover mechanism works as follows:
- On the worker (master and slave) servers, the procedure pgpool-walrecrunning() is available; it tells pgpool which server is the master and which is the slave.
- pgpool connects to the worker servers remotely and checks whether the DBMS processes are alive. If the master server has failed, pgpool runs a script that promotes the slave node to master.
- pgpool then detaches the crashed node and restarts all client connections to it.
And now the setup:
On the scaling node, change pgpool's config /etc/pgpool-II/pgpool.conf:
# Script that is invoked when a server crashes
failover_command = '/etc/pgpool-II/failover.sh %d %H /var/lib/pgsql/data/failover'
# User that checks the status of the servers
health_check_user = 'postgres'
# Enable master/slave mode
master_slave_mode = true
# We have configured streaming replication, so set the sub-mode to 'stream'
master_slave_sub_mode = 'stream'
# Don't let pgpool send a copy of each query to the cluster nodes; that happens automatically through replication
replication_mode = false
# Enable load balancing so that read queries are distributed evenly between the servers
load_balance_mode = true
A bit more about the failover_command parameter. The script named there is passed the parameters %d, the ID of the failed node (the numeric suffix of the backend_hostname entries in pgpool.conf), and %H, the IP of the new master server.
The script itself, failover.sh:
#!/bin/bash
# ID of the failed node
FAILED_NODE=$1
# IP of the new master
NEW_MASTER=$2
# Path to the trigger file
TRIGGER_FILE=$3
# Node 0 is the slave (backend_hostname0 in pgpool.conf); if it failed, no promotion is needed
if [ $FAILED_NODE = 0 ];
then
echo "Slave server has failed"
exit 1
fi
echo "Master server has failed"
echo "New master server: $NEW_MASTER"
ssh -T postgres@$NEW_MASTER touch $TRIGGER_FILE
exit 0
This script should be created in the pgpool directory /etc/pgpool-II/ and given permissions 755.
Now we need to compile the pgpool procedure. In the pgpool source package, sql/pgpool-walrecrunning contains the source code of the procedure we need. Compiling requires the PostgreSQL header files; after that, make produces pgpool-walrecrunning.so and the SQL script pgpool-walrecrunning.sql that loads the procedure.
The shared library has to be copied to /usr/lib64/pgsql/ (the so-called $libdir) on each worker server, and the .sql file to /usr/share/pgsql/.
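A minimal sketch of these steps, assuming the pgpool-II 3.0.5 sources are unpacked in the current directory and the PostgreSQL development headers (postgresql-devel) are installed:
$ cd pgpool-II-3.0.5/sql/pgpool-walrecrunning
$ make
# cp pgpool-walrecrunning.so /usr/lib64/pgsql/
# cp pgpool-walrecrunning.sql /usr/share/pgsql/
Copy both files to every worker server in the same way (for example with scp).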
Load the procedure into the database on the master server:
psql -f /usr/share/pgsql/pgpool-walrecrunning.sql -d postgres
Loading this procedure on the slave server is not necessary: it will appear there through the replication configured earlier.
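Once loaded, the procedure can also be called by hand to see which role a node currently plays. Note that although the files are named with hyphens, the SQL function itself is registered as pgpool_walrecrunning() (with underscores); it should return true on the server that is receiving WAL (the slave) and false on the master. A quick check, run locally on each server:
$ psql -c "SELECT pgpool_walrecrunning();" -d postgres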
The status of the servers can be checked with the query SHOW pool_nodes;, issued from a psql client connected to pgpool on the scaling node. Example output:
hostname | port | status | lb_weight
-----------------------------------------------------
192.168.100.3 | 5432 | 2 | 0.500000
192.168.100.2 | 5432 | 2 | 0.500000
(2 rows)
Status 2 means that the server is up and available for queries. If a server fails, its status changes to 3.
You can test the automatic failover as follows (a sketch of the corresponding commands follows this list):
- Shut down the master server
- Run SHOW pool_nodes; on the scaling node
- Watch the pgpool log to confirm that the failover script was executed
- Make sure that after the script runs the slave server accepts write requests
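A minimal sketch of such a test, assuming pgpool listens on its default port 9999, the backends' pg_hba.conf allows connections from the scaling node, and the pgpool log location matches your logging setup. On the master server (192.168.100.2):
# service postgresql stop
On the scaling node:
$ psql -h 127.0.0.1 -p 9999 -U postgres -c "SHOW pool_nodes;"
$ tail /var/log/pgpool.log
The failed node should now show status 3. Finally, a write on 192.168.100.3 should succeed once it has been promoted:
$ psql -c "CREATE TABLE failover_test (id int);"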
Online recovery
This mechanism is probably the most complex to debug, but it is a powerful database administration tool. It is used in the following situation: there is a running cluster into which we want to reintroduce a previously crashed slave server, but the data stored on it no longer matches the data in the cluster. Online recovery lets us attach another slave server in real time, without stopping the cluster and without any extra configuration steps.
Online recovery works in the following way:
- The scaling node starts the recovery procedure for the slave server.
- This procedure runs a script on the master server that performs the base backup and sets up replication between the master and the slave.
- After the copy succeeds, the database on the recovered server is started remotely with the standard PostgreSQL utility pg_ctl.
- pgpool then detects the recovered slave server and includes it in the cluster.
Now for the setup.
Add the following lines to /etc/pgpool-II/pgpool.conf:
# User performing the recovery
recovery_user = 'postgres'
# This user's password
recovery_password = '123456'
# Script to run on the master; it must reside in the $PGDATA directory
recovery_1st_stage_command = 'basebackup.sh'
Add the password hash for postgres:
# pg_md5 123456 >> /etc/pgpool-II/pcp.conf
Here 123456 is the postgres password in clear text. In addition, the name of the user the hash belongs to must precede it, i.e. the file must contain a line of the form postgres:encrypted_password.
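For example, with the password 123456 the resulting line in /etc/pgpool-II/pcp.conf should look roughly like this (the hash is simply the MD5 of the clear-text password):
postgres:e10adc3949ba59abbe56e057f20f883e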
On the master node, create the script basebackup.sh as follows:
#!/bin/bash
# Path to the $PGDATA directory on the master server
PRIMARY_DATA=$1
# IP address of the slave server being brought into the cluster
SLAVE_IP=$2
# Path to the $PGDATA directory on the slave server
SLAVE_DATA=$3
# Determine the IP of the master server for the recovery.conf being generated
PRIMARY_IP=$(ifconfig eth0 | sed -n '2 {s/^.*inet addr:\([0-9.]*\) .*/\1/;p}')
# Directory for building the slave server's configuration
TMP_DIR=/var/lib/pgsql/tmp
# On the master server, remove old replication configs (in case the master was once a slave)
cd $PRIMARY_DATA
rm -f recovery.* failover
# Check whether hot_standby is still commented out in the master's postgresql.conf
cat postgresql.conf | grep '#hot_standby = on'
# If the commented line is not found (hot_standby is enabled), comment it out
if [ $? = 1 ]
then
sed -i 's/hot_standby = on/#hot_standby = on/' postgresql.conf
# Restart the master server
/usr/bin/pg_ctl restart -D $PRIMARY_DATA
fi
# Remotely stop the slave server
ssh -T postgres@$SLAVE_IP "/usr/bin/pg_ctl stop -D $SLAVE_DATA"
# Create a backup of the database on the master server
psql -c "SELECT pg_start_backup('Streaming Replication', true)" postgres
# Send it to the slave
rsync -a $PRIMARY_DATA/ $SLAVE_IP:$SLAVE_DATA/ --exclude postmaster.pid --exclude postmaster.opts
# Create a temporary directory with the configuration for the slave server
mkdir $TMP_DIR
cd $TMP_DIR
# Copy postgresql.conf and turn hot_standby on
cp $PRIMARY_DATA/postgresql.conf $TMP_DIR/
sed -i 's/#hot_standby = on/hot_standby = on/' postgresql.conf
# Create recovery.conf
echo "standby_mode = 'on'" > recovery.conf
echo "primary_conninfo = 'host=$PRIMARY_IP port=5432 user=postgres'" >> recovery.conf
echo "trigger_file = 'failover'" >> recovery.conf
# Remove the old replication configs from the slave server
ssh -T postgres@$SLAVE_IP rm -f $SLAVE_DATA/recovery.*
# Copy the new configs over
scp postgresql.conf postgres@$SLAVE_IP:$SLAVE_DATA/postgresql.conf
scp recovery.conf postgres@$SLAVE_IP:$SLAVE_DATA/recovery.conf
# Finish the backup
psql -c "SELECT pg_stop_backup()" postgres
# Remove the temporary config directory
cd ..
rm -fr $TMP_DIR
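For reference, pgpool runs this script on the master node with the three arguments described at the top of the script. When debugging, an equivalent manual call would look roughly like this (an illustration only, using the paths and IP from this article):
$ cd /var/lib/pgsql/data
$ ./basebackup.sh /var/lib/pgsql/data 192.168.100.3 /var/lib/pgsql/data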
I emphasize: this script must be located in the $PGDATA directory and have permissions 755.
On both the slave and the master server, create a script named pgpool_remote_start (exactly that name) in the $PGDATA directory with the following content:
#!/bin/bash
# pgpool_remote_start: remotely starts PostgreSQL on the recovered node
if [ $# -ne 2 ]
then
echo "Not enough arguments passed to the script"
exit 1
fi
# IP of the recovered (slave) server and its $PGDATA directory
SLAVE_IP=$1
SLAVE_DIR=$2
PGCTL=/usr/bin/pg_ctl
ssh -T $SLAVE_IP $PGCTL -w -D $SLAVE_DIR start 2>/dev/null 1>/dev/null < /dev/null &
This allows the DBMS processes to be started remotely.
Next, on the scaling node, you need to compile the stored procedure pgpool-recovery.so, whose source is located in sql/pgpool-recovery in the pgpool source package. As before, copy it to the worker servers and load the procedure into the database:
$ psql -f /usr/share/pgsql/pgpool-recovery.sql -d template1
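The build and copy steps mirror those for pgpool-walrecrunning; a minimal sketch, assuming the same source tree layout and target directories:
$ cd pgpool-II-3.0.5/sql/pgpool-recovery
$ make
# cp pgpool-recovery.so /usr/lib64/pgsql/
# cp pgpool-recovery.sql /usr/share/pgsql/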
The online recovery configuration is finished.
To enable the new slave server in the cluster, you must perform the following actions:
- Start the database on the new master node
- On the scaling node, run the server recovery command:
pcp_recovery_node 20 192.168.100.4 9898 postgres 123456 1
A few words about pcp_recovery_node. This command performs the recovery of a cluster node. Here 20 is the timeout (in seconds) for the pcp connection, 192.168.100.4 is the IP of the scaling node, 9898 is the port for pcp commands on the scaling node, postgres is the name of the user performing the recovery, 123456 is that user's password, and 1 is the ID of the node being restored. This completes the online recovery setup.
You can test the two mechanisms together as follows:
- Create a test database on the master server and make sure it is replicated to the slave
- Simulate a failure of the master server by shutting it down
- Make sure failover fires and the slave server becomes the new master
- Make some changes to the database on the former slave (now the master)
- Bring the failed former master back up and, using online recovery, turn it into a slave
The mechanisms described above thus protect the Master-Slave cluster and simplify the database administrator's job when recovering it.
P.S. I hope this post helps someone. Comments and additions are welcome! Thank you for your attention.