Before anyone can access the database, you must start the database server. The database server program is called postgres. The postgres program must know where to find the data it is supposed to use. This is done with the -D option. Thus, the simplest way to start the server is:
$ postgres -D /usr/local/pgsql/data
which will leave the server running in the foreground. This must be done while logged into the PostgreSQL user account. Without -D, the server will try to use the data directory named by the environment variable PGDATA. If that variable is not provided either, it will fail.
Normally it is better to start postgres in the background. For this, use the usual Unix shell syntax:
$ postgres -D /usr/local/pgsql/data >logfile 2>&1 &
It is important to store the server's stdout and stderr output somewhere, as shown above. It will help for auditing purposes and to diagnose problems. (See Section 24.3 for a more thorough discussion of log file handling.)
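The redirection idiom above can be sketched with a harmless stand-in for the server binary, to show that both stdout and stderr end up in the log file:

```shell
# A stand-in for "postgres -D ... >logfile 2>&1 &":
# stdout and stderr are both redirected into the log file,
# and the trailing & puts the job in the background.
( echo "server started"; echo "a warning" >&2 ) >logfile 2>&1 &
wait          # wait for the background job to finish
cat logfile   # both lines were captured
```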
The postgres program also takes a number of other command-line options. For more information, see the postgres reference page and Chapter 19 below.
This shell syntax can get tedious quickly. Therefore the wrapper program pg_ctl is provided to simplify some tasks. For example:
pg_ctl start -l logfile
will start the server in the background and put the output into the named log file. The -D option has the same meaning here as for postgres. pg_ctl is also capable of stopping the server.
Normally, you will want to start the database server when the computer boots. Autostart scripts are operating-system-specific. There are a few distributed with PostgreSQL in the contrib/start-scripts directory. Installing one will require root privileges.
Different systems have different conventions for starting up daemons at boot time. Many systems have a file /etc/rc.local or /etc/rc.d/rc.local. Others use init.d or rc.d directories. Whatever you do, the server must be run by the PostgreSQL user account and not by root or any other user. Therefore you probably should form your commands using su postgres -c '...'. For example:
su postgres -c 'pg_ctl start -D /usr/local/pgsql/data -l serverlog'
Here are a few more operating-system-specific suggestions. (In each case be sure to use the proper installation directory and user name where we show generic values.)
For FreeBSD, look at the file contrib/start-scripts/freebsd in the PostgreSQL source distribution.
On OpenBSD, add the following lines to the file /etc/rc.local:
if [ -x /usr/local/pgsql/bin/pg_ctl -a -x /usr/local/pgsql/bin/postgres ]; then
    su -l postgres -c '/usr/local/pgsql/bin/pg_ctl start -s -l /var/postgresql/log -D /usr/local/pgsql/data'
    echo -n ' postgresql'
fi
On Linux systems, either add
/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data
to /etc/rc.d/rc.local or /etc/rc.local, or look at the file contrib/start-scripts/linux in the PostgreSQL source distribution.
When using systemd, you can use the following service unit file (e.g., at /etc/systemd/system/postgresql.service):
[Unit]
Description=PostgreSQL database server
Documentation=man:postgres(1)

[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0

[Install]
WantedBy=multi-user.target
Using Type=notify requires that the server binary was built with configure --with-systemd.
Consider carefully the timeout setting. systemd has a default timeout of 90 seconds as of this writing and will kill a process that does not notify readiness within that time. But a PostgreSQL server that might have to perform crash recovery at startup could take much longer to become ready. The suggested value of 0 disables the timeout logic.
On NetBSD, use either the FreeBSD or Linux start scripts, depending on preference.
On Solaris, create a file called /etc/init.d/postgresql that contains the following line:
su - postgres -c "/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data"
Then, create a symbolic link to it in /etc/rc3.d as S99postgresql.
While the server is running, its PID is stored in the file postmaster.pid in the data directory. This is used to prevent multiple server instances from running in the same data directory and can also be used for shutting down the server.
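Since the PID is always the first line of postmaster.pid, it can be extracted in scripts. A small sketch, using a fabricated sample file rather than a live data directory (real files contain further lines whose contents vary across versions, so only the first line is relied on):

```shell
# Fabricated sample of a postmaster.pid file (illustrative only).
cat > postmaster.pid <<'EOF'
12345
/usr/local/pgsql/data
EOF

# The first line is always the server's PID.
pid=$(head -1 postmaster.pid)
echo "server PID is $pid"   # → server PID is 12345
```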
As described in the previous chapter, Postgres-XL consists of various components. The minimum set of components is a GTM, a GTM-Proxy, a Coordinator, and a Datanode. You must configure and start each of them. The following sections describe how to configure and start them. pgxc_clean and GTM-Standby are described in the high-availability sections.
You should initialize each database that composes the Postgres-XL database cluster system. Both the Coordinator and the Datanode have their own databases, and you should initialize each of them. A Coordinator holds just the database catalog and temporary data; a Datanode holds most of your data.
First of all, you should determine how many Coordinators and Datanodes to run and where they should run. It is a good convention to run a Coordinator on each server where you run a Datanode. In this case, you should run a GTM-Proxy on the same server too. This simplifies the Postgres-XL configuration and helps keep the workload of each server even.
Both the Coordinator and the Datanode have their own databases, which are essentially PostgreSQL databases. They are separate, and you should initialize them separately.
The GTM provides the global transaction management feature to all the other components in a Postgres-XL database cluster. Because the GTM handles transaction requirements from all the Coordinators and Datanodes, it is highly advisable to run it on a separate server.
Before you start the GTM, you should decide the following:
Because the GTM receives all the requests to begin or end transactions and to refer to sequence values, you should run the GTM on a separate server. If you run the GTM on the same server as a Datanode or Coordinator, it becomes harder to keep the workload reasonably balanced.
Then, you should determine the GTM's working directory. Please create this directory before you run the GTM.
Next, you should determine the listen address and port of the GTM. The listen address can be either the IP address or the host name that receives requests from other components, typically a GTM-Proxy.
You can run more than one GTM in a Postgres-XL cluster. For example, if you need a backup GTM in a high-availability environment, you need to run two GTMs. You should give a unique GTM id to each of them. GTM id values begin with one.
When this is determined, you can initialize the GTM with the command initgtm, for example:
$ initgtm -Z gtm -D /usr/local/pgsql/data_gtm
All the parameters related to the GTM can be modified in gtm.conf, located in the data folder initialized by initgtm.
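As a sketch, a minimal gtm.conf might contain entries like the following. The parameter names and all values shown are illustrative assumptions; check them against the file initgtm actually wrote for your version:

```
nodename = 'gtm'               # name identifying this GTM
listen_addresses = '*'         # address(es) to listen on
port = 6666                    # port the GTM listens on
```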
Then you can start the GTM as follows:
$ gtm -D /usr/local/pgsql/data_gtm
where the -D option specifies the working directory of the GTM.
Alternatively, the GTM can be started using gtm_ctl, for example:
$ gtm_ctl -Z gtm start -D /usr/local/pgsql/data_gtm
A GTM-Proxy is not a mandatory component of a Postgres-XL cluster, but it can be used to group messages between the GTM and cluster nodes, reducing the workload and the number of packets exchanged over the network.
As described in the previous section, a GTM-Proxy needs its own listen address, port, working directory, and GTM-Proxy ID, which should be unique and begin with one. In addition, you should determine how many working threads to run. You should also use the GTM's address and port to start the GTM-Proxy.
Then, you first need to initialize a GTM-Proxy with initgtm, for example:
$ initgtm -Z gtm_proxy -D /usr/local/pgsql/data_gtm_proxy
All the parameters related to a GTM-Proxy can be modified in gtm_proxy.conf, located in the data folder initialized by initgtm.
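As a sketch, a gtm_proxy.conf might contain entries like the following. The parameter names, the host name, and all values are illustrative assumptions; verify them against the file initgtm wrote for your version:

```
nodename = 'gtm_proxy1'        # name identifying this GTM-Proxy
listen_addresses = '*'         # address(es) to listen on
port = 6666                    # port this proxy listens on
gtm_host = 'gtm-server'        # host where the GTM runs (placeholder)
gtm_port = 6666                # the GTM's port on that host
worker_threads = 1             # number of working threads
```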
Then, you can start a GTM-Proxy like:
$ gtm_proxy -D /usr/local/pgsql/data_gtm_proxy
where -D specifies the GTM-Proxy's working directory.
Alternatively, you can start a GTM-Proxy using gtm_ctl as follows:
$ gtm_ctl start -Z gtm_proxy -D /usr/local/pgsql/data_gtm_proxy
Before starting a Coordinator or Datanode, you must configure it. You can configure a Coordinator or Datanode by editing the postgresql.conf file located in its working directory, as specified by the -D option of the initdb command.
A Datanode is almost native PostgreSQL with some extensions. Additional options in postgresql.conf for the Datanode are as follows:
This value is not just the number of connections you expect to each Coordinator. Each Coordinator backend may connect to any of the Datanodes, so you should specify the total number of connections the whole set of Coordinators may accept. For example, if you have five Coordinators and each of them may accept forty connections, specify 200 as this parameter value.
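The arithmetic in the example above can be checked directly:

```shell
# Five Coordinators, each accepting up to forty client connections:
# a Datanode must be ready for all of them at once.
coordinators=5
per_coordinator=40
echo $((coordinators * per_coordinator))   # → 200
```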
Even though your application does not intend to issue PREPARE TRANSACTION, a Coordinator may issue it internally when more than one Datanode is involved. You should set this parameter to the same value as max_connections.
The GTM needs to identify each Datanode, as specified by this parameter. The value should be unique and start with one.
Because both a Coordinator and a Datanode may run on the same server, you may want to assign a separate port number to the Datanode.
Specify the port number of the GTM-Proxy, as given by the -p option of gtm_proxy or gtm_ctl.
Specify the host name or IP address of the GTM-Proxy, as given by the -h option of gtm_proxy or gtm_ctl.
For some joins that occur in queries, data from one Datanode may need to be joined with data from another Datanode. Postgres-XL uses shared queues for this purpose. During execution each Datanode knows if it needs to produce or consume tuples, or both.
Note that there may be multiple shared_queues used even for a single query. So the value should be set taking into account the number of connections the node can accept and the expected number of such joins occurring simultaneously.
This parameter sets the size of each shared queue allocated.
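Putting the Datanode options above together, the additions to postgresql.conf might look like this sketch. The parameter names (pgxc_node_name, gtm_host, gtm_port, shared_queues, shared_queue_size) and all values are illustrative assumptions; verify them against your version's documentation:

```
max_connections = 200              # all Coordinators combined (5 * 40)
max_prepared_transactions = 200    # same value as max_connections
pgxc_node_name = 'datanode1'       # unique name for this Datanode
port = 15432                       # separate port when sharing a server
gtm_host = 'localhost'             # GTM-Proxy address (its -h option)
gtm_port = 6666                    # GTM-Proxy port (its -p option)
shared_queues = 64                 # queues for Datanode-to-Datanode joins
shared_queue_size = 256kB          # size of each shared queue
```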
Although Coordinators and Datanodes share the same binary, their configuration differs a little because of their different roles.
You don't have to take other Coordinators or Datanodes into account. Just specify the number of connections the Coordinator accepts from applications.
Specify at least total number of Coordinators in the cluster.
The GTM needs to identify each Coordinator, as specified by this parameter.
Because both a Coordinator and a Datanode may run on the same server, you may want to assign a separate port number to the Coordinator. It may be convenient to use the default PostgreSQL listen port.
Specify the port number of the GTM-Proxy, as given by the -p option of gtm_proxy or gtm_ctl.
Specify the host name or IP address of the GTM-Proxy, as given by the -h option of gtm_proxy or gtm_ctl.
Specify the port number that the pooler should use. This must not conflict with any other server ports used on this host.
A Coordinator maintains connections to Datanodes as a pool. This parameter specifies the maximum number of connections the Coordinator maintains. Specify the max_connections value of the remote nodes as this parameter value.
This is the minimum number of Coordinator-to-remote-node connections maintained by the pooler. Typically specify 1.
This parameter specifies how long to keep a connection alive. If a connection is older than this amount, the pooler discards it. This parameter is useful in multi-tenant environments where many connections to many different databases may be used, so that idle connections can be cleaned up. It is also useful for automatically closing connections occasionally in case there is some unknown memory leak, so that this memory can be freed.
This parameter specifies how long to wait until pooler maintenance is performed. During such maintenance, old idle connections are discarded. This parameter is useful in multi-tenant environments where many connections to many different databases may be used, so that idle connections can be cleaned up.
This parameter specifies the cost overhead of setting up a remote query to obtain remote data. It is used by the planner in costing queries.
This parameter is used in query cost planning to estimate the cost involved in row shipping and obtaining remote data based on the expected data size. Row shipping is expensive and adds latency, so this setting helps to favor plans that minimizes row shipping.
This parameter is used to get several sequence values at once from the GTM. This greatly speeds up COPY and INSERT SELECT operations where the target table uses sequences. Postgres-XL will not use this entire amount at once, but will increase the request size over time if many requests are made in a short time frame in the same session. After a short time without any sequence requests, the request size decreases back down to 1. Note that any setting here is overridden if the CACHE clause was used in CREATE SEQUENCE or ALTER SEQUENCE.
This is the maximum number of Coordinators that can be configured in the cluster. Specify the exact number if you do not plan to add more Coordinators while the cluster is running, or a greater number if you want to resize the cluster dynamically. It costs about 140 bytes of shared memory per slot.
This is the maximum number of Datanodes that can be configured in the cluster. Specify the exact number if you do not plan to add more Datanodes while the cluster is running, or a greater number if you want to resize the cluster dynamically. It costs about 140 bytes of shared memory per slot.
Enforce the usage of two-phase commit on transactions involving ON COMMIT actions or temporary objects. Usage of autocommit instead of two-phase commit may break data consistency so use at your own risk.
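Putting the Coordinator options above together, the additions to postgresql.conf might look like this sketch. The parameter names (pgxc_node_name, pooler_port, min_pool_size, max_pool_size, gtm_host, gtm_port) and all values are illustrative assumptions; verify them against your version's documentation:

```
max_connections = 100              # connections accepted from applications
max_prepared_transactions = 10     # at least the number of Coordinators
pgxc_node_name = 'coord1'          # unique name for this Coordinator
port = 5432                        # the default PostgreSQL port is convenient
pooler_port = 6667                 # must not clash with other ports in use
min_pool_size = 1                  # minimum pooled connections per node
max_pool_size = 100                # max_connections of the remote nodes
gtm_host = 'localhost'             # GTM-Proxy address (its -h option)
gtm_port = 6666                    # GTM-Proxy port (its -p option)
```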
Now you can start the central components of Postgres-XL, the Datanodes and Coordinators. If you're familiar with starting a PostgreSQL database server, this step is very similar.
You can start a Datanode as follows:
$ postgres --datanode -D /usr/local/pgsql/data
--datanode specifies that postgres should run as a Datanode. You may need to specify the -i option to postgres to accept TCP/IP connections, or edit pg_hba.conf, if the cluster runs its nodes on several servers.
You can start a Coordinator as follows:
$ postgres --coordinator -D /usr/local/pgsql/Datanode
--coordinator specifies that postgres should run as a Coordinator. You may need to specify the -i option to postgres to accept TCP/IP connections, or edit pg_hba.conf, if the cluster runs its nodes on several servers.
There are several common reasons the server might fail to start. Check the server's log file, or start it by hand (without redirecting standard output or standard error) and see what error messages appear. Below we explain some of the most common error messages in more detail.
LOG: could not bind IPv4 address "127.0.0.1": Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
FATAL: could not create any TCP/IP sockets
This usually means just what it suggests: you tried to start another server on the same port where one is already running. However, if the kernel error message is not Address already in use or some variant of that, there might be a different problem. For example, trying to start a server on a reserved port number might draw something like:
$ postgres -p 666
LOG: could not bind IPv4 address "127.0.0.1": Permission denied
HINT: Is another postmaster already running on port 666? If not, wait a few seconds and retry.
FATAL: could not create any TCP/IP sockets
A message like:
FATAL: could not create shared memory segment: Invalid argument
DETAIL: Failed system call was shmget(key=5440001, size=4011376640, 03600).
probably means your kernel's limit on the size of shared memory is smaller than the work area PostgreSQL is trying to create (4011376640 bytes in this example). Or it could mean that you do not have System-V-style shared memory support configured into your kernel at all. As a temporary workaround, you can try starting the server with a smaller-than-normal number of buffers (shared_buffers). You will eventually want to reconfigure your kernel to increase the allowed shared memory size. You might also see this message when trying to start multiple servers on the same machine, if their total space requested exceeds the kernel limit.
An error like:
FATAL: could not create semaphores: No space left on device
DETAIL: Failed system call was semget(5440126, 17, 03600).
does not mean you've run out of disk space. It means your kernel's limit on the number of System V semaphores is smaller than the number PostgreSQL wants to create. As above, you might be able to work around the problem by starting the server with a reduced number of allowed connections (max_connections), but you'll eventually want to increase the kernel limit.
If you get an “illegal system call” error, it is likely that shared memory or semaphores are not supported in your kernel at all. In that case your only option is to reconfigure the kernel to enable these features.
Details about configuring System V IPC facilities are given in Section 18.4.1.
Although the error conditions possible on the client side are quite varied and application-dependent, a few of them might be directly related to how the server was started. Conditions other than those shown below should be documented with the respective client application.
psql: could not connect to server: Connection refused
        Is the server running on host "server.joe.com" and accepting
        TCP/IP connections on port 5432?
This is the generic “I couldn't find a server to talk to” failure. It looks like the above when TCP/IP communication is attempted. A common mistake is to forget to configure the server to allow TCP/IP connections.
Alternatively, you'll get this when attempting Unix-domain socket communication to a local server:
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
The last line is useful in verifying that the client is trying to connect to the right place. If there is in fact no server running there, the kernel error message will typically be either Connection refused or No such file or directory, as illustrated. (It is important to realize that Connection refused in this context does not mean that the server got your connection request and rejected it. That case will produce a different message, as shown in Section 20.4.) Other error messages such as Connection timed out might indicate more fundamental problems, like lack of network connectivity.
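When diagnosing the Unix-domain-socket case, one quick check is whether the socket file exists at all; a sketch, assuming the compiled-in default socket path (which may differ on your installation):

```shell
# If no server is listening locally, the default socket file will not exist.
sock=/tmp/.s.PGSQL.5432
if [ -S "$sock" ]; then
    msg="socket present: a server appears to be listening"
else
    msg="no server socket at $sock"
fi
echo "$msg"
```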