Linux How - To's - Free Consultancy: Postgresql 9.0 Replication

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Both InnoDB and PostgreSQL - as well as many other databases - use a technique called multi-version concurrency control (MVCC) to provide transaction isolation: transactions should not see the work of other, uncommitted transactions. MVCC means that, when a row is updated, the database stores both the old and new versions of the row.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Thursday, October 14, 2010

Postgresql 9.0 Replication - Step by Step

Postgresql 9.0 Streaming Replication - A Big Competition

Now Postgresql 9.0 has come with in-built replication features. NOW A BIG COMPETITOR FOR MYSQL

This method for those who wants to satisfy them by implementing Asynchronous mode of replicationWho wants Synchronous method, and then have to wait for 9.1 to come in existence.I tried to build a Postgresql Master-Slave replication system with new feature “Streaming Replication (SR)”, and this I have to say, this is very easy and stable method for asynchronous method for replication.While building the SR, I felt, this type of SR is like Mysql Master-slave replication, Slaves are getting updates from master from log files (binlog in cas e of mysql and xlog in postgres)

The above statement can be taken as rough idea when comparing Mysql and Postgresql replication method

Steps Included:

1.       Download and build, install Postgresql from source.

2.     Initialize database on master

3.       Some tweaks in postgresql.conf (On Master)

4.       Start backup mode on master

5.       Take snapshot of Data (/usr/local/postgres/data) directory on master

6.       Sync snapshot to slave and remove postmaster.pid on slave

7.       Some tweaks in postgresql.conf and recovery.conf (On Slaves)

8.       At end start the Database on Master and slave

Download the Database: (On Master)

# wget http://wwwmaster.postgresql.org/redir/198/h/source/v9.0.0/postgresql-9.0.0.tar.bz2
# tar xzf postgresql-9.0.0.tar.bz2

If you have fresh CentOS/other linux version, then maybe you have to download and install required packages… In my case I am Using CentOS 5.3 64 Bit

Run the below command as “ROOT”

# yum install -y glibc gcc libtermcap-devel readline-devel zlib-devel
# useradd postgres
# chown –R postgres:postgres postgresql-9.0.0
# su – postgres
# cd postgresql-9.0.0 (extracted directory)

Run the following command to compile and install Postgresql

# ./configure –prefix=/usr/local/postgres –with-perl
# gmake
# su –
# gmake install
# chown -R postgres:postgres /usr/local/postgres
# mkdir /usr/local/postgres/data
# chown postgres /usr/local/postgres/data
# su - postgres
# /usr/local/postgres/bin/initdb -D /usr/local/postgres/data

After the initializing the data directory, we have to tweak---

“/usr/local/postgres/data/postgresql.conf“ file

I am showing you required things only

listen_addresses = '*'
max_connections = 500
shared_buffers = 32MB
wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /usr/local/postgres/data/pg_xlogarch/%f'
max_wal_senders = 1 ####( this setting for 1 slave can connect with Master)
wal_keep_segments = 32
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'

# mkdir /usr/local/postgres/data/pg_xlogarch/
# su – postgres
# cd /usr/local/postgres/
# vi data/pg_hba.conf (add the below line)

host replication all 172.32.1.2/32 trust

Now start the Database Instance on Master:

As user postgres

# /usr/local/postgres/bin/pg_ctl -D /usr/local/postgres/data -l logfile start

After starting the database Instance, start the backup mode on DB Master

# cd/usr/local/postgres/bin/
# ./psql –c "SELECT pg_start_backup('initial backup for SR')" = database name for which you want replication

# cd /usr/local/postgres/

# tar -czf data.tar.gz data/

Exit from Backup mode
# /usr/local/postgres/bin/psql –c "SELECT pg_stop_backup()"

# rsync –avz data.tar.gz postgres@172.32.1.2:/usr/local/postgrsql/ (run this command after the slave’s Postgresql installation)

On SLAVE:

Same as master download and install Postgresql on slave

Just do not initialize the database:

As user Postgres
# cd /usr/local/postgrsql
# tar xzf data.tar.gz
# chown –R postgres:postgres /usr/local/postgres/data/pg_xlogarch

# cd data

# rm –rf postmaster.pid

# vi /usr/local/postgrsql/data/Postgresql.conf

In short:

listen_addresses = '*'          # Which IP address(es) to listen on
max_connections = 500        # (change requires restart)shared_buffers = 32MB    #min 128kB
hot_standby = ondatestyle = 'iso, mdy'lc_messages = 'en_US.UTF-8'   # locale for system error message
lc_monetary = 'en_US.UTF-8'        # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'          # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'

Now create a “recovery.conf” under /usr/local/postgres/data/ and put the below lines: (In Slave)

standby_mode = 'on'
primary_conninfo = 'host=172.32.1.1 port=5432 user=readonly'
restore_command = 'cp /usr/local/postgres/data/pg_xlog/%f %p'

That's it we are done with configurations.

Start the database instance on slave machine

# /usr/local/postgres/bin/pg_ctl -D /usr/local/postgres/data -l logfile start

If all goes well, the on master, you will see like this on Master:

LOG: replication connection authorized: user=readonly host=172.32.1.2 port=55811

And on slave’s log file (logfile)

LOG: streaming replication successfully connected to primary

If fails, check "logfile" and send me error (if in case you feel so)

New Addition: If you want to check, which pgsql server is Master or Slave, then run this command:

# /usr/local/postgres/bin/pg_controldata /usr/local/postgres/data/   (running on Master)
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5572911747953590263
Database cluster state:               in production (This shows, it is master server)
pg_control last modified:             Fri 06 May 2011 05:09:20 PM IST
Latest checkpoint location:           2/97482E60
............

On Slave:

Same Line will show as:

Database cluster state: in archive recoveryYou can also use below one: to check

Slave Status:

Postgres=# select pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
7B/35F312B0
(1 row)

isten_addresses = '*'          # what IP address(es) to listen on;
max_connections = 500                   # (change requires restart)
shared_buffers = 32MB                   # min 128kB
hot_standby = on
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting

26 comments:

AnonymousOctober 14, 2010 at 4:15 PM
great article... now i bet, mysql will feal in danger now.
ReplyDelete
Replies
sushMarch 5, 2011 at 10:34 PM
Nice one Manish. Your instructions are very clear.
ReplyDelete
Replies
AnonymousMarch 30, 2011 at 12:39 PM
Hi,
I have followed above steps in centos, it was working fine.
While testing in the REDHAT, we are getting “the replication is successful” in both master and slave. But in slave while connecting to the db, we are getting fatal error “database is not starting up “.

Below are contents of logfile

cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
LOG: redo starts at 0/2000020
LOG: record with zero length at 0/20000B0
cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
LOG: streaming replication successfully connected to primary
FATAL: the database system is starting up
FATAL: the database system is starting up

could you please help on this.
ReplyDelete
Replies
manish singhMarch 30, 2011 at 2:10 PM
One thing i forgot to writeup there...

On master: in pg_hba.conf file...
add this line:

host replication all 172.32.1.2/32 trust

Then restart the Master DB app and and on slave also....

@Anonymous:

I check the issue... will let you know... Just do one thing...
Make sure your Master and slave both must have "pg_xlogarch" directory and replication user must have privileges to replication....
ReplyDelete
Replies
AnonymousMarch 31, 2011 at 11:56 AM
Hi,
Thanks for ur reply.
pg_xlogarch directory is present in both master and slave.
under pg_xlogarch "000000010000000000000001" dir is created instead of "000000010000000000000002" This may be the reason for tht error. But we are not sure why some other dir has bee created.

Could please tell me wat privilege we have to give for pg_logarch directory. So tht i cross verify whether corresponding privilege is present or not.

Thanks
ReplyDelete
Replies
manish singhMarch 31, 2011 at 12:09 PM
@Anonymous

The permission should be of Postgres or what ever the user you have created for running Postgresql process.

one more thing... have you followed the same process as above. Above configuration is just only for 1 master and 1 slave.

Pl check Master is running perfectly or not

Send below report
On Master as well as on Slave

ps waux | grep postgres

ans also check log on Master and Slave....
ReplyDelete
Replies
AnonymousApril 1, 2011 at 3:32 PM
I have executed ps waux | grep postgres
in both master and slave, below are o/p of the same.

Master
================

-bash-4.1$ ps waux | grep postgres
root 14230 0.0 0.0 161352 1860 pts/30 S 14:36 0:00 su - postgres
postgres 14231 0.0 0.0 108364 1804 pts/30 S 14:36 0:00 -bash
postgres 14253 0.0 0.0 159956 9224 pts/30 S 14:37 0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
postgres 14255 0.0 0.0 160240 1848 ? Ss 14:37 0:00 postgres: writer process
postgres 14256 0.0 0.0 159956 932 ? Ss 14:37 0:00 postgres: wal writer process
postgres 14257 0.0 0.0 161040 2200 ? Ss 14:37 0:00 postgres: autovacuum launcher process
postgres 14258 0.0 0.0 112984 852 ? Ss 14:37 0:00 postgres: archiver process failed on 000000010000000000000001
postgres 14259 0.0 0.0 113008 968 ? Ss 14:37 0:00 postgres: stats collector process
postgres 14330 0.0 0.0 161084 2372 ? Ss 14:47 0:00 postgres: wal sender process postgres 10.128.16.54(36020) streaming 0/20001D0
postgres 14350 0.0 0.0 108084 1104 pts/30 R+ 14:49 0:00 ps waux
postgres 14351 0.0 0.0 103276 884 pts/30 S+ 14:49 0:00 grep postgres

==========================================

Slave
============

[postgres@bangalore data]$ ps waux | grep postgres
root 16577 0.0 0.0 161352 1860 pts/0 S 14:50 0:00 su - postgres
postgres 16578 0.0 0.0 108492 1940 pts/0 S 14:50 0:00 -bash
postgres 16616 0.2 0.0 160120 9196 pts/0 S 14:54 0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
postgres 16617 0.0 0.0 160236 1428 ? Ss 14:54 0:00 postgres: startup process recovering 000000010000000000000002
postgres 16620 0.0 0.0 160120 1096 ? Ss 14:54 0:00 postgres: writer process
postgres 16621 0.0 0.0 166660 1748 ? Ss 14:54 0:00 postgres: wal receiver process streaming 0/2000140
postgres 16625 8.0 0.0 108084 1100 pts/0 R+ 14:56 0:00 ps waux
postgres 16626 0.0 0.0 103276 880 pts/0 S+ 14:56 0:00 grep postgres
ReplyDelete
Replies
AnonymousApril 1, 2011 at 3:36 PM
Master's Logfile
=================

LOG: database system was shut down at 2011-04-01 14:27:37 IST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
LOG: archive command failed with exit code 1

===============================================

Slave_Logfile
==============

LOG: database system was shut down at 2011-04-01 14:27:37 IST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
WARNING: transaction log file "000000010000000000000001" could not be archived: too many failures
LOG: database system was interrupted; last known up at 2011-04-01 14:39:16 IST
LOG: entering standby mode
cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
LOG: redo starts at 0/2000020
LOG: record with zero length at 0/20000B0
cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
LOG: streaming replication successfully connected to primary
ReplyDelete
Replies
AnonymousApril 1, 2011 at 3:39 PM
I have checked the permissionof pg_xlogarch directory in both master and slave, it is postgres only.

Above are the attached logfile and report for ps waux | postgres.

Could you please help on the same.

Can u please let me know how to increase the debug level,so that more messages come in logfile, it will be very helpful for debugging.

Thanks.
ReplyDelete
Replies
manish singhApril 1, 2011 at 3:41 PM
Hi

as seen O/P of master log... have put recovery file under data on Master server?.. that seems like that....

That may be the problem... that is taking Master in recovery mode.. and bcz of this.. slave not able to connect.
ReplyDelete
Replies
manish singhApril 1, 2011 at 3:56 PM
CORRECTION:

as seen O/P of master log... have you put recovery file under data on Master server?.. that seems like that.... if so.. then pl. remove that... and restart the processes on master then slave.

That may be the problem... that is taking Master in recovery mode.. and bcz of this.. slave not able to connect.
ReplyDelete
Replies
AnonymousApril 1, 2011 at 4:04 PM
Thanks for the reply Manish.

recovery.conf file is present only in slave. It is not present in master m/c.
ReplyDelete
Replies
manish singhApril 1, 2011 at 4:39 PM
For debug... get original postgresql.conf.. there u can get the what to log, when to log and where to log... and so many other options too...

I still want the output of:
ls -lht /usr/local/pgsql/
ls -lht /usr/local/pgsql/data/

If you do not have issue.. can i get access to your system to get the main issue...

we can you teamviewer or like... so you can also view what is happening...
ReplyDelete
Replies
AnonymousApril 1, 2011 at 6:06 PM
slave's O/P
===========

-bash-4.1$ ls -lht /usr/local/pgsql
total 5.4M
drwx------. 13 postgres postgres 4.0K Apr 1 17:30 data
-rw-r--r--. 1 root root 5.4M Apr 1 17:15 data.tar.gz
drwxr-xr-x. 3 postgres postgres 4.0K Apr 1 16:37 lib
drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 16:37 bin
drwxr-xr-x. 5 postgres postgres 4.0K Apr 1 16:37 share
drwxr-xr-x. 6 postgres postgres 4.0K Apr 1 16:37 include
-bash-4.1$
-bash-4.1$
-bash-4.1$
-bash-4.1$ ls -lht /usr/local/pgsql/data
total 120K
-rw-------. 1 postgres postgres 4.0K Apr 1 17:30 logfile
drwx------. 2 postgres postgres 4.0K Apr 1 17:30 pg_notify
-rw-------. 1 postgres postgres 59 Apr 1 17:30 postmaster.opts
-rw-------. 1 postgres postgres 48 Apr 1 17:30 postmaster.pid
-rw-------. 1 postgres postgres 3.6K Apr 1 17:29 logfile1
-rw-r--r--. 1 postgres postgres 148 Apr 1 17:29 recovery.conf
drwx------. 2 postgres postgres 4.0K Apr 1 17:28 pg_stat_tmp
drwx------. 2 postgres postgres 4.0K Apr 1 17:28 global
-rw-------. 1 postgres postgres 18K Apr 1 17:26 postgresql.conf
-rw-r--r--. 1 postgres postgres 18K Apr 1 17:26 12
-rw-------. 1 postgres postgres 142 Apr 1 17:15 backup_label.old
drwx------. 3 postgres postgres 4.0K Apr 1 17:15 pg_xlog
drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 17:15 pg_xlogarch
drwx------. 6 postgres postgres 4.0K Apr 1 17:12 base
-rw-------. 1 postgres postgres 3.9K Apr 1 17:09 pg_hba.conf
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_clog
-rw-------. 1 postgres postgres 1.6K Apr 1 17:01 pg_ident.conf
drwx------. 4 postgres postgres 4.0K Apr 1 17:01 pg_multixact
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_subtrans
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_tblspc
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_twophase
-rw-------. 1 postgres postgres 4 Apr 1 17:01 PG_VERSION
-bash-4.1$

master's o/p
============

-bash-4.1$ ls -lht /usr/local/pgsql
total 5.4M
-rw-r--r--. 1 postgres postgres 5.4M Apr 1 17:15 data.tar.gz
drwx------. 13 postgres postgres 4.0K Apr 1 17:15 data
drwxr-xr-x. 3 postgres postgres 4.0K Apr 1 17:01 lib
drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 17:01 bin
drwxr-xr-x. 5 postgres postgres 4.0K Apr 1 17:01 share
drwxr-xr-x. 6 postgres postgres 4.0K Apr 1 17:01 include
-bash-4.1$ ls -lht /usr/local/pgsql/data
total 92K
drwx------. 2 postgres postgres 4.0K Apr 1 17:25 pg_stat_tmp
-rw-------. 1 postgres postgres 1.5K Apr 1 17:22 logfile
-rw-------. 1 postgres postgres 142 Apr 1 17:15 backup_label
drwx------. 3 postgres postgres 4.0K Apr 1 17:15 pg_xlog
drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 17:15 pg_xlogarch
drwx------. 2 postgres postgres 4.0K Apr 1 17:15 global
drwx------. 6 postgres postgres 4.0K Apr 1 17:12 base
drwx------. 2 postgres postgres 4.0K Apr 1 17:11 pg_notify
-rw-------. 1 postgres postgres 59 Apr 1 17:11 postmaster.opts
-rw-------. 1 postgres postgres 48 Apr 1 17:11 postmaster.pid
-rw-------. 1 postgres postgres 3.9K Apr 1 17:09 pg_hba.conf
-rw-------. 1 postgres postgres 18K Apr 1 17:08 postgresql.conf
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_clog
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_subtrans
-rw-------. 1 postgres postgres 1.6K Apr 1 17:01 pg_ident.conf
drwx------. 4 postgres postgres 4.0K Apr 1 17:01 pg_multixact
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_tblspc
drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_twophase
-rw-------. 1 postgres postgres 4 Apr 1 17:01 PG_VERSION
ReplyDelete
Replies
AnonymousApril 5, 2011 at 3:52 PM
Hi Manish,
Whether u are able to find anything? Could u please help us regarding the same issue.
ReplyDelete
Replies
manish singhApril 11, 2011 at 9:17 PM
I think you should fallow same again... from scratch.... your problem... i out of my hands... until i see that myself...

Remove the old installations and configuration...
ReplyDelete
Replies
AnonymousMay 5, 2011 at 3:56 PM
Hi,
We have a couple of queries,

===> How to find whether current postgres is running as master or slave?

====> What is the procedure to make the failed master as working slave?

Could you please clarify above queries.
ReplyDelete
Replies
manish singhMay 6, 2011 at 5:34 PM
Hi
Sorry for the late response... didn't checked the comment...

So... there are number of ways (Standard and Non-Standard) to check the PG server is Master or Slave

What i used is: this command

pg_controldata /usr/local/postgres/data/

if you Run PG Server the you will many lines... Check below Line in that:

Database cluster state: in production (4th Line) -----> This Means, that is Master Server, it has the capability of Read/Write Both

The same command will show this line, if on Slave

Database cluster state: in archive recovery

Other ways are....
If you run this command... if on server then you will get this result

ps aux | grep postgres
postgres 25868 0.0 0.0 476036 2492 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(32771) streaming 2/970D59E0
postgres 25869 0.0 0.0 476036 2496 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(53426) streaming 2/970D59E0
postgres 25870 0.0 0.0 476036 2496 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(39631) streaming 2/970D59E0
postgres 25871 0.0 0.0 476036 2496 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(36204) streaming 2/970D59E0
ReplyDelete
Replies
NenominalMay 19, 2011 at 11:22 PM
Hello there,

great tutorial for Postgresql replication. Thanks for taking from your time and share with us.
The problem is that I have the exactly same problem with "Anonymous" .
For few days now I'm trying to figure out what is the problem but without success.

I'm using Centos 5.6 and Postgresql 9.04.
I initialize the database on master and then I change the configuration for postgresql.conf.
I created a script for rsync and point in time backup:

#/bin/sh -x
#
# XXX We assume master and recovery host uses the same port number
PORT=5432
master_node_host_name=master
master_db_cluster=/var/database/pgsql/9.0/data
recovery_node_host_name=slave
recovery_db_cluster=/var/database/pgsql/9.0/data
tmp=/tmp/mytemp$$
trap "rm -f $tmp" 0 1 2 3 15

psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)"
rsync -av $master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster/ --exclude pg_xlog --exclude pg_xlogarch --exclude postgresql.conf --exclude postmaster.pid
ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done

cat > $tmp <standby_mode = 'on'
primary_conninfo = 'host=master port=5432 user=postgres'
trigger_file = '/tmp/pg_failover_trigger'
restore_command = 'cp /var/database/pgsql/9.0/data/pg_xlogarch/%f "%p"'
EOF

scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf

psql -p $PORT -c "SELECT pg_stop_backup()"

rsync -av $master_db_cluster/pg_xlog $recovery_node_host_name:$recovery_db_cluster/
rsync -av $master_db_cluster/pg_xlogarch $recovery_node_host_name:$recovery_db_

After that I make the modifications to postgresql.conf on "slave" and start it.
I got the same errors with not finding the files.
The good thing is that the servers seems they are in synced. I created a database on master and feel it with data and the shut it down and I saw that all that is well on "slave".
I installed and tried zillions times before a post it here :)
I'm out of ideas and I felt like I should ask for some help if anyone is able to give.
ReplyDelete
Replies
manish singhMay 24, 2011 at 9:52 PM
Sorry for my blunder...
actually by mistake i made a entry in recovery.conf

standby_mode = 'on'
primary_conninfo = 'host=172.32.1.1 port=5432 user=readonly'
restore_command = 'cp /usr/local/postgres/data/pg_xlogarch/%f "%p"'

but in restore_command the line must be:
restore_command = 'cp /usr/local/postgres/data/pg_xlog/%f "%p"'
ReplyDelete
Replies
NenominalMay 25, 2011 at 3:11 AM
Hi,

Thank you for this answer but I still have those errors:

LOG: database system was interrupted; last known up at 2011-05-25 00:05:36 EEST
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: entering standby mode
cp: cannot stat `/var/database/pgsql/9.0/data/pg_xlog/00000001000000000000000E': No such file or directory
LOG: streaming replication successfully connected to primary
LOG: redo starts at 0/E000020
LOG: consistent recovery state reached at 0/F000000
LOG: database system is ready to accept read only connections

I corrected the restore command as you said but I still don't get it why I have the errors.

Question: do I have to transfer pg_xlog and pg_xlogarch from the master to slave or the master will push the files after the slave
connects to it? I tried with and without sync of those folders on the slave but also with no luck.
ReplyDelete
Replies
manish singhMay 25, 2011 at 12:03 PM
@Nenominal

can you send postgresql.conf (from Master)and recovery.conf (slave's one)
ReplyDelete
Replies
NenominalMay 25, 2011 at 2:20 PM
@manish:

This is for master: http://pastebin.com/ZfT0DssS
This is for slave: http://pastebin.com/qNFXHmji
And recovery.conf :http://pastebin.com/yjJYtxhi

On the other hand, are you sure it is pg_xlog in recovery command. I read on wiki page from Postgresql and I found this:

restore_command = 'cp -f /var/lib/postgresql/data/archive/%f %p'

As far as I understand, "archive" folder is xlogarch in our case.

Thank you for taking your time to debug this with me :)
ReplyDelete
Replies
manish singhMay 25, 2011 at 3:17 PM
i also find the error message with one of my slave... (with old settings in post)

and also find, the issue (no such file .....) is totally harmless.... bcz of streaming nature of this replication... whenever you restart the slave... it tries to find a file

NOTE: the Update in my post is right.... pg_xlogarch is recovery on master is right... but on slave... by default the restore will set in pg_xlog ... you can prove this nature... by just list file which you got (not found) in log or stdout

so like this... Just on slave run this command as example

log:
cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000300000029': No such file or directory

# ls /usr/local/pgsql/data/pg_xlogarch/000000010000000300000029
ls: /usr/local/pgsql/data/pg_xlogarch/000000010000000300000029: No such file or directory

# ls -lht /usr/local/pg/dasqlta/pg_xlog/000000010000000300000029
-rw------- 1 postgres postgres 16M May 25 14:42 /usr/local/pgsql/data/pg_xlog/000000010000000300000029
ReplyDelete
Replies
NenominalMay 25, 2011 at 3:40 PM
Ok:). After some playing around and create and drop some test databases on master seems that the error is gone.
Anyway...as I said on some of my previous posts the databases seemed always in sync.
Great job and thank you for your time!
ReplyDelete
Replies
Reza FarasdakDecember 26, 2012 at 1:20 PM
Hi..
As i know, error "FATAL: the database system is starting up" is caused by Master Server still send the backup and "pg_stop_backup()" is not execute yet.

The problem solved when i do psql -c "SELECT pg_stop_backup()"

Thanks.
ReplyDelete
Replies

Add comment