1.3. Creating a Postgres-XL cluster

As mentioned in the architectural fundamentals, Postgres-XL is a collection of multiple components. It can be a bit of work to come up with your initial working setup. In this tutorial, we will show how one can start with an empty configuration file and use the pgxc_ctl utility to create your Postgres-XL cluster from scratch.

A few pre-requisites are necessary on each node that is going to be a part of the Postgres-XL setup.

The pgxc_ctl utility should be present in your PATH. If it is not there, it can be compiled from source.

$ cd $XLSRC/src/bin/pgxc_ctl
$ make install

We are now ready to prepare our template configuration file. The pgxc_ctl utility allows you to create three types of configuration. We will choose the empty configuration which will allow us to create our Postgres-XL setup from scratch. Note that we also need to set up the dataDirRoot environment variable properly for all future invocations of pgxc_ctl.

$ export dataDirRoot=$HOME/DATA/pgxl/nodes
$ mkdir $HOME/pgxc_ctl
$ pgxc_ctl
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /Users/postgres/pgxc_ctl/pgxc_ctl_bash --home
/Users/postgres/pgxc_ctl --configuration
/Users/postgres/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************

   Current directory: /Users/postgres/pgxc_ctl
PGXC$  prepare config empty
PGXC$  exit

The empty configuration file is now ready. You should now make changes to the pgxc_ctl.conf. At a minimum, pgxcOwner should be set correctly. The configuration file does contain USERi and HOME environment variables to allow easy defaults for the current user.

The next step is to add the GTM master to the setup.

$ pgxc_ctl
PGXC$  add gtm master gtm localhost 20001 $dataDirRoot/gtm

Use the "monitor" command to check the status of the cluster.

$ pgxc_ctl
PGXC$  monitor all
Running: gtm master

Let us now add a couple of coordinators. When the first coordinator is added, it just starts up. When another coordinator is added, it connects to any existing coordinator node to fetch the metadata about objects.

PGXC$  add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
PGXC$  add coordinator master coord2 localhost 30002 30012 $dataDirRoot/coord_master.2 none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2

Moving on to the addition of a couple of datanodes, now. When the first datanode is added, it connects to any existing coordinator node to fetch global metadata. When a subsequent datanode is added, it connects to any existing datanode for the metadata.

PGXC$  add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
PGXC$  add datanode master dn2 localhost 40002 40012 $dataDirRoot/dn_master.2 none none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2

Your Postgres-XL setup is ready now and you can move on to the next "Getting Started" topic.

Read on further, only if you want a quick crash course on the various commands you can try out with Postgres-XL. It is strongly recommended to go through the entire documentation for more details on each and every command that we will touch upon below.

Connect to one of the coordinators and create a test database.

$  psql -p 30001 postgres
postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# \q

Look at pgxc_node catalog. It should show all the configured nodes. It is normal to have negative node id values. This will be fixed soon.

$  psql -p 30001 testdb
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
(4 rows)

Let us now create a distributed table, distributed on first column by HASH.

testdb=# CREATE TABLE disttab(col1 int, col2 int, col3 text) DISTRIBUTE BY HASH(col1);
CREATE TABLE
testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES

Also create a replicated table.

testdb=# CREATE TABLE repltab (col1 int, col2 int) DISTRIBUTE BY
REPLICATION;
CREATE TABLE
testdb=# \d+ repltab
                       Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              | 
 col2   | integer |           | plain   |              | 
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: ALL DATANODES

Now insert some sample data in these tables.

testdb=# INSERT INTO disttab SELECT generate_series(1,100), generate_series(101, 200), 'foo';
INSERT 0 100
testdb=# INSERT INTO repltab SELECT generate_series(1,100), generate_series(101, 200);
INSERT 0 100

Ok. So the distributed table should have 100 rows

testdb=# SELECT count(*) FROM disttab;
 count 
-------
   100
(1 row)

And they must not be all on the same node. xc_node_id is a system column which shows the originating datanode for each row. Note that the distribution can be slightly uneven because of the HASH function

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

For replicated tables, we expect all rows to come from a single datanode (even though the other node has a copy too).

testdb=# SELECT count(*) FROM repltab;
 count 
-------
   100
(1 row)

testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |   100
(1 row)

Now add a new datanode to the cluster.

PGXC$  add datanode master dn3 localhost 40003 40013 $dataDirRoot/dn_master.3 none none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

Note that during cluster reconfiguration, all outstanding transactions are aborted and sessions are reset. So you would typically see errors like these on open sessions

testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request             <==== pgxc_pool_reload() resets all sessions and aborts all open transactions

testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)

Note that with new datanode addition, Existing tables are not affected. The distribution information now explicitly shows the older datanodes

testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: dn1, dn2

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# \d+ repltab
                       Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              | 
 col2   | integer |           | plain   |              | 
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: dn1, dn2

Let us now try to redistribute tables so that they can take advantage of the new datanode

testdb=# ALTER TABLE disttab ADD NODE (dn3);
ALTER TABLE
testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)

Let us now add a third coordinator.

PGXC$  add coordinator master coord3 localhost 30003 30013 $dataDirRoot/coord_master.3 none none
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: coordinator master coord3
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
 coord3    | C         |     30003 | localhost | f              | f                |  1638403545
(6 rows)

We can try some more ALTER TABLE so as to delete a node from a table distribution and add it back

testdb=# ALTER TABLE disttab DELETE NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
  352366662 |    42
 -700122826 |    58
(2 rows)

testdb=# ALTER TABLE disttab ADD NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)

You could also alter a replicated table to make it a distributed table. Note that even though the cluster has 3 datanodes now, the table will continue to use only 2 datanodes where the table was originally replicated on.

testdb=# ALTER TABLE repltab DISTRIBUTE BY HASH(col1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# ALTER TABLE repltab DISTRIBUTE BY REPLICATION;
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |   100
(1 row)

Remove the coordinator added previously now. You can use the "clean" option to remove the corresponding data directory as well.

PGXC$  remove coordinator master coord3 clean
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
 16397 | dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)

Let us try to remove a datanode now. NOTE: Postgres-XL does not employ any additional checks to ascertain if the datanode being dropped has data from tables that are replicated/distributed. It is the responsibility of the user to ensure that it's safe to remove a datanode. You can use the below query to find out if the datanode being removed has any data on it. Do note that this only shows tables from the current database. You might want to ensure the same for all databases before going ahead with the datanode removal. Use the OID of the datanode that is to be removed in the below query:

testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets |     nodeoids      
---------+---------------+----------+-----------------+---------------+-------------------
   16388 | H             |        1 |               1 |          4096 | 16385 16386 16397
(1 row)


testdb=# ALTER TABLE disttab DELETE NODE (dn3);
ALTER TABLE
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets | nodeoids 
---------+---------------+----------+-----------------+---------------+----------
(0 rows)

Ok, it is safe to remove datanode "dn3" now.

PGXC$  remove datanode master dn3 clean
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
(4 rows)

The pgxc_ctl utility can also help in setting up slaves for datanodes and coordinators. Let us setup a slave for a datanode and see how failover can be performed in case the master datanode goes down.

PGXC$  add datanode slave dn1 localhost 40101 40111 $dataDirRoot/dn_slave.1 none $dataDirRoot/datanode_archlog.1
PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode slave dn1
Running: datanode master dn2

testdb=# EXECUTE DIRECT ON(dn1) 'SELECT client_hostname, state, sync_state FROM pg_stat_replication';
 client_hostname |   state   | sync_state 
-----------------+-----------+------------
                 | streaming | async
(1 row)

Add some more rows to test failover now.

testdb=# INSERT INTO disttab SELECT generate_series(1001,1100), generate_series(1101, 1200), 'foo';
INSERT 0 100
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    94
  352366662 |   106
(2 rows)

Let us simulate datanode failover now. We will first stop the datanode master "dn1" for which we configured a slave above. Note that since the slave is connected to the master we will use "immediate" mode for stopping it.

PGXC$  stop -m immediate datanode master dn1

Since a datanode is down, queries will fail. Though a few queries may still work if the failed node is not required to run the query, and that is determined by the distribution of the data and the WHERE clause being used.

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  Failed to get pooled connections

testdb=# SELECT xc_node_id, * FROM disttab WHERE col1 = 3;
 xc_node_id | col1 | col2 | col3 
------------+------+------+------
  352366662 |    3 |  103 | foo
(1 row)

We will now perform the failover and check that everything is working fine post that.

PGXC$  failover datanode dn1

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  canceling statement due to user request
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    94
  352366662 |   106
(2 rows)

The pgxc_node catalog now should have updated entries. Especially, the failed over datanode node_host and node_port should have been replaced with the slave's host and port values.

testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
 16385 | dn1       | D         |     40101 | localhost | t              | t                |  -560021589
(4 rows)

PGXC$  monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2