-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Both InnoDB and PostgreSQL - as well as many other databases - use a technique called multi-version concurrency control (MVCC) to provide transaction isolation: transactions should not see the work of other, uncommitted transactions. MVCC means that, when a row is updated, the database stores both the old and new versions of the row.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Thursday, October 14, 2010

Postgresql 9.0 Replication - Step by Step



Postgresql 9.0 Streaming Replication - A Big Competition  



 Now Postgresql 9.0 has come with in-built replication features.  NOW A BIG COMPETITOR FOR MYSQL


This method for those who wants to satisfy them by implementing Asynchronous mode of replicationWho wants Synchronous method, and then have to wait for 9.1 to come in existence.I tried to build a Postgresql Master-Slave replication system with new feature “Streaming Replication (SR)”, and this I have to say, this is very easy and stable method for asynchronous method for replication.While building the SR, I felt, this type of SR is like Mysql Master-slave replication,  Slaves are getting updates from master from log files (binlog in cas e of mysql and xlog in postgres)

The above statement can be taken as rough idea when comparing  Mysql and Postgresql replication method

Steps Included:

1.       Download and build, install Postgresql from source.

2.       Initialize database on master

3.       Some tweaks in postgresql.conf (On Master)

4.       Start backup mode on master

5.       Take snapshot of Data (/usr/local/postgres/data) directory on master

6.       Sync snapshot to slave and remove postmaster.pid  on slave

7.       Some tweaks in postgresql.conf and recovery.conf (On Slaves)

8.       At end start the Database on Master and slave



Download the Database:  (On Master)

# wget http://wwwmaster.postgresql.org/redir/198/h/source/v9.0.0/postgresql-9.0.0.tar.bz2
# tar xzf postgresql-9.0.0.tar.bz2

If you have fresh CentOS/other linux version, then maybe you have to download and install  required packages…  In my case I am Using CentOS 5.3 64 Bit

Run the below command as “ROOT”
# yum install -y glibc gcc libtermcap-devel readline-devel zlib-devel
# useradd postgres
# chown –R postgres:postgres postgresql-9.0.0
# su – postgres
# cd postgresql-9.0.0  (extracted directory)


Run the following command to compile and install Postgresql

# ./configure –prefix=/usr/local/postgres –with-perl
# gmake
# su –
# gmake install
# chown -R postgres:postgres /usr/local/postgres
# mkdir /usr/local/postgres/data
# chown postgres /usr/local/postgres/data
# su  - postgres
# /usr/local/postgres/bin/initdb -D /usr/local/postgres/data


After the initializing the data directory, we have to tweak---
“/usr/local/postgres/data/postgresql.conf“ file
I am showing you required things only


listen_addresses = '*'        
max_connections = 500
shared_buffers = 32MB
wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /usr/local/postgres/data/pg_xlogarch/%f'
max_wal_senders = 1    ####( this setting for  1 slave can connect with Master)
wal_keep_segments = 32
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
# mkdir  /usr/local/postgres/data/pg_xlogarch/
# su – postgres
# cd /usr/local/postgres/
# vi data/pg_hba.conf (add the below line)

host    replication             all             172.32.1.2/32            trust


Now start the Database Instance on Master:

As user postgres


# /usr/local/postgres/bin/pg_ctl -D /usr/local/postgres/data -l logfile start

After starting the database Instance, start the backup mode on DB Master
# cd/usr/local/postgres/bin/
# ./psql –c "SELECT pg_start_backup('initial backup for SR')" = database name for which you want replication

# cd /usr/local/postgres/
# tar  -czf data.tar.gz        data/

 
Exit from Backup mode 
# /usr/local/postgres/bin/psql –c "SELECT pg_stop_backup()"

# rsync –avz  data.tar.gz postgres@172.32.1.2:/usr/local/postgrsql/  (run this command after the slave’s Postgresql installation)

On SLAVE:
 
Same as master download and install Postgresql on slave
 
Just do not initialize the database:

As user Postgres
# cd  /usr/local/postgrsql
# tar xzf data.tar.gz
# chown –R postgres:postgres  /usr/local/postgres/data/pg_xlogarch
# cd data
# rm –rf postmaster.pid

# vi  /usr/local/postgrsql/data/Postgresql.conf

In short:
listen_addresses = '*'          # Which IP address(es) to listen on
max_connections = 500        # (change requires restart)shared_buffers = 32MB    #min 128kB
hot_standby = ondatestyle = 'iso, mdy'lc_messages = 'en_US.UTF-8'   # locale for system error message
lc_monetary = 'en_US.UTF-8'        # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'          # locale for number formatting
lc_time = 'en_US.UTF-8'  # locale for time formatting
default_text_search_config = 'pg_catalog.english'


Now create a “recovery.conf under /usr/local/postgres/data/ and put the below lines: (In Slave)


standby_mode = 'on'
primary_conninfo = 'host=172.32.1.1 port=5432 user=readonly'
restore_command = 'cp /usr/local/postgres/data/pg_xlog/%f  %p'

 

That's it we are done with configurations.

Start the database instance on slave machine
# /usr/local/postgres/bin/pg_ctl -D /usr/local/postgres/data -l logfile start
If all goes well, the on master, you will see like this on Master:
LOG: replication connection authorized: user=readonly host=172.32.1.2 port=55811



And on slave’s log file (logfile)
LOG: streaming replication successfully connected to primary
If fails, check "logfile" and send me error (if in case you feel so)


New Addition: If you want to check, which pgsql server is Master or Slave, then run this command:

#  /usr/local/postgres/bin/pg_controldata /usr/local/postgres/data/   (running on Master)
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5572911747953590263
Database cluster state:               in production  (This shows, it is master server)
pg_control last modified:             Fri 06 May 2011 05:09:20 PM IST
Latest checkpoint location:           2/97482E60
............


On Slave:


Same Line will show as:
Database cluster state:               in archive recoveryYou can also use below one: to check


Slave Status:
Postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location
------------------------------
 7B/35F312B0
(1 row)


isten_addresses = '*'          # what IP address(es) to listen on;
max_connections = 500                   # (change requires restart)
shared_buffers = 32MB                   # min 128kB
hot_standby = on
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting

26 comments:

  1. great article... now i bet, mysql will feal in danger now.

    ReplyDelete
  2. Nice one Manish. Your instructions are very clear.

    ReplyDelete
  3. Hi,
    I have followed above steps in centos, it was working fine.
    While testing in the REDHAT, we are getting “the replication is successful” in both master and slave. But in slave while connecting to the db, we are getting fatal error “database is not starting up “.

    Below are contents of logfile

    cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
    LOG: redo starts at 0/2000020
    LOG: record with zero length at 0/20000B0
    cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
    LOG: streaming replication successfully connected to primary
    FATAL: the database system is starting up
    FATAL: the database system is starting up

    could you please help on this.

    ReplyDelete
  4. One thing i forgot to writeup there...

    On master: in pg_hba.conf file...
    add this line:

    host replication all 172.32.1.2/32 trust

    Then restart the Master DB app and and on slave also....

    @Anonymous:

    I check the issue... will let you know... Just do one thing...
    Make sure your Master and slave both must have "pg_xlogarch" directory and replication user must have privileges to replication....

    ReplyDelete
  5. Hi,
    Thanks for ur reply.
    pg_xlogarch directory is present in both master and slave.
    under pg_xlogarch "000000010000000000000001" dir is created instead of "000000010000000000000002" This may be the reason for tht error. But we are not sure why some other dir has bee created.


    Could please tell me wat privilege we have to give for pg_logarch directory. So tht i cross verify whether corresponding privilege is present or not.

    Thanks

    ReplyDelete
  6. @Anonymous

    The permission should be of Postgres or what ever the user you have created for running Postgresql process.

    one more thing... have you followed the same process as above. Above configuration is just only for 1 master and 1 slave.

    Pl check Master is running perfectly or not

    Send below report
    On Master as well as on Slave

    ps waux | grep postgres

    ans also check log on Master and Slave....

    ReplyDelete
  7. I have executed ps waux | grep postgres
    in both master and slave, below are o/p of the same.

    Master
    ================

    -bash-4.1$ ps waux | grep postgres
    root 14230 0.0 0.0 161352 1860 pts/30 S 14:36 0:00 su - postgres
    postgres 14231 0.0 0.0 108364 1804 pts/30 S 14:36 0:00 -bash
    postgres 14253 0.0 0.0 159956 9224 pts/30 S 14:37 0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
    postgres 14255 0.0 0.0 160240 1848 ? Ss 14:37 0:00 postgres: writer process
    postgres 14256 0.0 0.0 159956 932 ? Ss 14:37 0:00 postgres: wal writer process
    postgres 14257 0.0 0.0 161040 2200 ? Ss 14:37 0:00 postgres: autovacuum launcher process
    postgres 14258 0.0 0.0 112984 852 ? Ss 14:37 0:00 postgres: archiver process failed on 000000010000000000000001
    postgres 14259 0.0 0.0 113008 968 ? Ss 14:37 0:00 postgres: stats collector process
    postgres 14330 0.0 0.0 161084 2372 ? Ss 14:47 0:00 postgres: wal sender process postgres 10.128.16.54(36020) streaming 0/20001D0
    postgres 14350 0.0 0.0 108084 1104 pts/30 R+ 14:49 0:00 ps waux
    postgres 14351 0.0 0.0 103276 884 pts/30 S+ 14:49 0:00 grep postgres



    ==========================================

    Slave
    ============


    [postgres@bangalore data]$ ps waux | grep postgres
    root 16577 0.0 0.0 161352 1860 pts/0 S 14:50 0:00 su - postgres
    postgres 16578 0.0 0.0 108492 1940 pts/0 S 14:50 0:00 -bash
    postgres 16616 0.2 0.0 160120 9196 pts/0 S 14:54 0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
    postgres 16617 0.0 0.0 160236 1428 ? Ss 14:54 0:00 postgres: startup process recovering 000000010000000000000002
    postgres 16620 0.0 0.0 160120 1096 ? Ss 14:54 0:00 postgres: writer process
    postgres 16621 0.0 0.0 166660 1748 ? Ss 14:54 0:00 postgres: wal receiver process streaming 0/2000140
    postgres 16625 8.0 0.0 108084 1100 pts/0 R+ 14:56 0:00 ps waux
    postgres 16626 0.0 0.0 103276 880 pts/0 S+ 14:56 0:00 grep postgres

    ReplyDelete
  8. Master's Logfile
    =================

    LOG: database system was shut down at 2011-04-01 14:27:37 IST
    LOG: database system is ready to accept connections
    LOG: autovacuum launcher started
    cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
    cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
    cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
    LOG: archive command failed with exit code 1

    ===============================================

    Slave_Logfile
    ==============

    LOG: database system was shut down at 2011-04-01 14:27:37 IST
    LOG: database system is ready to accept connections
    LOG: autovacuum launcher started
    cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
    cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
    cp: cannot create regular file `/usr/local/pgsql/data/xlogarch/000000010000000000000001': No such file or directory
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: cp pg_xlog/000000010000000000000001 /usr/local/pgsql/data/xlogarch/000000010000000000000001
    WARNING: transaction log file "000000010000000000000001" could not be archived: too many failures
    LOG: database system was interrupted; last known up at 2011-04-01 14:39:16 IST
    LOG: entering standby mode
    cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
    LOG: redo starts at 0/2000020
    LOG: record with zero length at 0/20000B0
    cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000000000002': No such file or directory
    LOG: streaming replication successfully connected to primary

    ReplyDelete
  9. I have checked the permissionof pg_xlogarch directory in both master and slave, it is postgres only.

    Above are the attached logfile and report for ps waux | postgres.

    Could you please help on the same.


    Can u please let me know how to increase the debug level,so that more messages come in logfile, it will be very helpful for debugging.


    Thanks.

    ReplyDelete
  10. Hi

    as seen O/P of master log... have put recovery file under data on Master server?.. that seems like that....

    That may be the problem... that is taking Master in recovery mode.. and bcz of this.. slave not able to connect.

    ReplyDelete
  11. CORRECTION:

    as seen O/P of master log... have you put recovery file under data on Master server?.. that seems like that.... if so.. then pl. remove that... and restart the processes on master then slave.

    That may be the problem... that is taking Master in recovery mode.. and bcz of this.. slave not able to connect.

    ReplyDelete
  12. Thanks for the reply Manish.

    recovery.conf file is present only in slave. It is not present in master m/c.

    ReplyDelete
  13. For debug... get original postgresql.conf.. there u can get the what to log, when to log and where to log... and so many other options too...

    I still want the output of:
    ls -lht /usr/local/pgsql/
    ls -lht /usr/local/pgsql/data/


    If you do not have issue.. can i get access to your system to get the main issue...

    we can you teamviewer or like... so you can also view what is happening...

    ReplyDelete
  14. slave's O/P
    ===========


    -bash-4.1$ ls -lht /usr/local/pgsql
    total 5.4M
    drwx------. 13 postgres postgres 4.0K Apr 1 17:30 data
    -rw-r--r--. 1 root root 5.4M Apr 1 17:15 data.tar.gz
    drwxr-xr-x. 3 postgres postgres 4.0K Apr 1 16:37 lib
    drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 16:37 bin
    drwxr-xr-x. 5 postgres postgres 4.0K Apr 1 16:37 share
    drwxr-xr-x. 6 postgres postgres 4.0K Apr 1 16:37 include
    -bash-4.1$
    -bash-4.1$
    -bash-4.1$
    -bash-4.1$ ls -lht /usr/local/pgsql/data
    total 120K
    -rw-------. 1 postgres postgres 4.0K Apr 1 17:30 logfile
    drwx------. 2 postgres postgres 4.0K Apr 1 17:30 pg_notify
    -rw-------. 1 postgres postgres 59 Apr 1 17:30 postmaster.opts
    -rw-------. 1 postgres postgres 48 Apr 1 17:30 postmaster.pid
    -rw-------. 1 postgres postgres 3.6K Apr 1 17:29 logfile1
    -rw-r--r--. 1 postgres postgres 148 Apr 1 17:29 recovery.conf
    drwx------. 2 postgres postgres 4.0K Apr 1 17:28 pg_stat_tmp
    drwx------. 2 postgres postgres 4.0K Apr 1 17:28 global
    -rw-------. 1 postgres postgres 18K Apr 1 17:26 postgresql.conf
    -rw-r--r--. 1 postgres postgres 18K Apr 1 17:26 12
    -rw-------. 1 postgres postgres 142 Apr 1 17:15 backup_label.old
    drwx------. 3 postgres postgres 4.0K Apr 1 17:15 pg_xlog
    drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 17:15 pg_xlogarch
    drwx------. 6 postgres postgres 4.0K Apr 1 17:12 base
    -rw-------. 1 postgres postgres 3.9K Apr 1 17:09 pg_hba.conf
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_clog
    -rw-------. 1 postgres postgres 1.6K Apr 1 17:01 pg_ident.conf
    drwx------. 4 postgres postgres 4.0K Apr 1 17:01 pg_multixact
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_subtrans
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_tblspc
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_twophase
    -rw-------. 1 postgres postgres 4 Apr 1 17:01 PG_VERSION
    -bash-4.1$



    master's o/p
    ============


    -bash-4.1$ ls -lht /usr/local/pgsql
    total 5.4M
    -rw-r--r--. 1 postgres postgres 5.4M Apr 1 17:15 data.tar.gz
    drwx------. 13 postgres postgres 4.0K Apr 1 17:15 data
    drwxr-xr-x. 3 postgres postgres 4.0K Apr 1 17:01 lib
    drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 17:01 bin
    drwxr-xr-x. 5 postgres postgres 4.0K Apr 1 17:01 share
    drwxr-xr-x. 6 postgres postgres 4.0K Apr 1 17:01 include
    -bash-4.1$ ls -lht /usr/local/pgsql/data
    total 92K
    drwx------. 2 postgres postgres 4.0K Apr 1 17:25 pg_stat_tmp
    -rw-------. 1 postgres postgres 1.5K Apr 1 17:22 logfile
    -rw-------. 1 postgres postgres 142 Apr 1 17:15 backup_label
    drwx------. 3 postgres postgres 4.0K Apr 1 17:15 pg_xlog
    drwxr-xr-x. 2 postgres postgres 4.0K Apr 1 17:15 pg_xlogarch
    drwx------. 2 postgres postgres 4.0K Apr 1 17:15 global
    drwx------. 6 postgres postgres 4.0K Apr 1 17:12 base
    drwx------. 2 postgres postgres 4.0K Apr 1 17:11 pg_notify
    -rw-------. 1 postgres postgres 59 Apr 1 17:11 postmaster.opts
    -rw-------. 1 postgres postgres 48 Apr 1 17:11 postmaster.pid
    -rw-------. 1 postgres postgres 3.9K Apr 1 17:09 pg_hba.conf
    -rw-------. 1 postgres postgres 18K Apr 1 17:08 postgresql.conf
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_clog
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_subtrans
    -rw-------. 1 postgres postgres 1.6K Apr 1 17:01 pg_ident.conf
    drwx------. 4 postgres postgres 4.0K Apr 1 17:01 pg_multixact
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_tblspc
    drwx------. 2 postgres postgres 4.0K Apr 1 17:01 pg_twophase
    -rw-------. 1 postgres postgres 4 Apr 1 17:01 PG_VERSION

    ReplyDelete
  15. Hi Manish,
    Whether u are able to find anything? Could u please help us regarding the same issue.

    ReplyDelete
  16. I think you should fallow same again... from scratch.... your problem... i out of my hands... until i see that myself...

    Remove the old installations and configuration...

    ReplyDelete
  17. Hi,
    We have a couple of queries,

    ===> How to find whether current postgres is running as master or slave?

    ====> What is the procedure to make the failed master as working slave?

    Could you please clarify above queries.

    ReplyDelete
  18. Hi
    Sorry for the late response... didn't checked the comment...

    So... there are number of ways (Standard and Non-Standard) to check the PG server is Master or Slave

    What i used is: this command

    pg_controldata /usr/local/postgres/data/

    if you Run PG Server the you will many lines... Check below Line in that:

    Database cluster state: in production (4th Line) -----> This Means, that is Master Server, it has the capability of Read/Write Both


    The same command will show this line, if on Slave

    Database cluster state: in archive recovery


    Other ways are....
    If you run this command... if on server then you will get this result

    ps aux | grep postgres
    postgres 25868 0.0 0.0 476036 2492 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(32771) streaming 2/970D59E0
    postgres 25869 0.0 0.0 476036 2496 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(53426) streaming 2/970D59E0
    postgres 25870 0.0 0.0 476036 2496 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(39631) streaming 2/970D59E0
    postgres 25871 0.0 0.0 476036 2496 ? Ss Apr22 0:00 postgres: wal sender process pguser xxx.xxx.xxx.xxx(36204) streaming 2/970D59E0

    ReplyDelete
  19. Hello there,

    great tutorial for Postgresql replication. Thanks for taking from your time and share with us.
    The problem is that I have the exactly same problem with "Anonymous" .
    For few days now I'm trying to figure out what is the problem but without success.

    I'm using Centos 5.6 and Postgresql 9.04.
    I initialize the database on master and then I change the configuration for postgresql.conf.
    I created a script for rsync and point in time backup:

    #/bin/sh -x
    #
    # XXX We assume master and recovery host uses the same port number
    PORT=5432
    master_node_host_name=master
    master_db_cluster=/var/database/pgsql/9.0/data
    recovery_node_host_name=slave
    recovery_db_cluster=/var/database/pgsql/9.0/data
    tmp=/tmp/mytemp$$
    trap "rm -f $tmp" 0 1 2 3 15

    psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)"
    rsync -av $master_db_cluster/ $recovery_node_host_name:$recovery_db_cluster/ --exclude pg_xlog --exclude pg_xlogarch --exclude postgresql.conf --exclude postmaster.pid
    ssh -T $recovery_node_host_name mkdir $recovery_db_cluster/pg_xlog
    ssh -T $recovery_node_host_name chmod 700 $recovery_db_cluster/pg_xlog
    ssh -T $recovery_node_host_name rm -f $recovery_db_cluster/recovery.done

    cat > $tmp <standby_mode = 'on'
    primary_conninfo = 'host=master port=5432 user=postgres'
    trigger_file = '/tmp/pg_failover_trigger'
    restore_command = 'cp /var/database/pgsql/9.0/data/pg_xlogarch/%f "%p"'
    EOF

    scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf

    psql -p $PORT -c "SELECT pg_stop_backup()"

    rsync -av $master_db_cluster/pg_xlog $recovery_node_host_name:$recovery_db_cluster/
    rsync -av $master_db_cluster/pg_xlogarch $recovery_node_host_name:$recovery_db_


    After that I make the modifications to postgresql.conf on "slave" and start it.
    I got the same errors with not finding the files.
    The good thing is that the servers seems they are in synced. I created a database on master and feel it with data and the shut it down and I saw that all that is well on "slave".
    I installed and tried zillions times before a post it here :)
    I'm out of ideas and I felt like I should ask for some help if anyone is able to give.

    ReplyDelete
  20. Sorry for my blunder...
    actually by mistake i made a entry in recovery.conf

    standby_mode = 'on'
    primary_conninfo = 'host=172.32.1.1 port=5432 user=readonly'
    restore_command = 'cp /usr/local/postgres/data/pg_xlogarch/%f "%p"'

    but in restore_command the line must be:
    restore_command = 'cp /usr/local/postgres/data/pg_xlog/%f "%p"'

    ReplyDelete
  21. Hi,

    Thank you for this answer but I still have those errors:

    LOG: database system was interrupted; last known up at 2011-05-25 00:05:36 EEST
    LOG: creating missing WAL directory "pg_xlog/archive_status"
    LOG: entering standby mode
    cp: cannot stat `/var/database/pgsql/9.0/data/pg_xlog/00000001000000000000000E': No such file or directory
    LOG: streaming replication successfully connected to primary
    LOG: redo starts at 0/E000020
    LOG: consistent recovery state reached at 0/F000000
    LOG: database system is ready to accept read only connections



    I corrected the restore command as you said but I still don't get it why I have the errors.

    Question: do I have to transfer pg_xlog and pg_xlogarch from the master to slave or the master will push the files after the slave
    connects to it? I tried with and without sync of those folders on the slave but also with no luck.

    ReplyDelete
  22. @Nenominal

    can you send postgresql.conf (from Master)and recovery.conf (slave's one)

    ReplyDelete
  23. @manish:

    This is for master: http://pastebin.com/ZfT0DssS
    This is for slave: http://pastebin.com/qNFXHmji
    And recovery.conf :http://pastebin.com/yjJYtxhi

    On the other hand, are you sure it is pg_xlog in recovery command. I read on wiki page from Postgresql and I found this:

    restore_command = 'cp -f /var/lib/postgresql/data/archive/%f %p'

    As far as I understand, "archive" folder is xlogarch in our case.


    Thank you for taking your time to debug this with me :)

    ReplyDelete
  24. i also find the error message with one of my slave... (with old settings in post)

    and also find, the issue (no such file .....) is totally harmless.... bcz of streaming nature of this replication... whenever you restart the slave... it tries to find a file

    NOTE: the Update in my post is right.... pg_xlogarch is recovery on master is right... but on slave... by default the restore will set in pg_xlog ... you can prove this nature... by just list file which you got (not found) in log or stdout

    so like this... Just on slave run this command as example

    log:
    cp: cannot stat `/usr/local/pgsql/data/pg_xlogarch/000000010000000300000029': No such file or directory


    # ls /usr/local/pgsql/data/pg_xlogarch/000000010000000300000029
    ls: /usr/local/pgsql/data/pg_xlogarch/000000010000000300000029: No such file or directory

    # ls -lht /usr/local/pg/dasqlta/pg_xlog/000000010000000300000029
    -rw------- 1 postgres postgres 16M May 25 14:42 /usr/local/pgsql/data/pg_xlog/000000010000000300000029

    ReplyDelete
  25. Ok:). After some playing around and create and drop some test databases on master seems that the error is gone.
    Anyway...as I said on some of my previous posts the databases seemed always in sync.
    Great job and thank you for your time!

    ReplyDelete
  26. Hi..
    As i know, error "FATAL: the database system is starting up" is caused by Master Server still send the backup and "pg_stop_backup()" is not execute yet.

    The problem solved when i do psql -c "SELECT pg_stop_backup()"

    Thanks.

    ReplyDelete