Before we proceed, you should understand the basic PostgreSQL system architecture. Understanding how the parts of PostgreSQL interact will make this chapter somewhat clearer.
Postgres-XL, in short, is a collection of PostgreSQL database clusters that acts as if the whole collection were a single database cluster. Based on your database design, each table is either replicated or distributed among the member databases.
To provide this capability, Postgres-XL is composed of three major components: the GTM, the Coordinator, and the Datanode. The GTM is responsible for providing the ACID properties of transactions. The Datanode stores table data and handles SQL statements locally. The Coordinator handles each SQL statement from applications, determines which Datanodes are involved, and sends plans on to the appropriate Datanodes.
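The difference between a replicated and a distributed table, and the Coordinator's job of deciding which Datanodes a row belongs to, can be illustrated with a minimal Python sketch. The Datanode names, the `placement` function, and the CRC32-based hash are all assumptions made for illustration; they are not the hash function or the internal API that Postgres-XL itself uses.

```python
import zlib

# Hypothetical Datanode names, used only for this illustration.
DATANODES = ["dn1", "dn2", "dn3"]


def placement(strategy, key=None):
    """Return the Datanodes that would hold a row under the given strategy.

    "replicated":  every Datanode keeps a full copy of the table.
    "distributed": each row lives on exactly one Datanode, chosen here
                   by hashing the distribution key (illustrative hash only).
    """
    if strategy == "replicated":
        return list(DATANODES)
    if strategy == "distributed":
        digest = zlib.crc32(str(key).encode("utf-8"))
        return [DATANODES[digest % len(DATANODES)]]
    raise ValueError("unknown strategy: %r" % (strategy,))
```

In this toy model, `placement("replicated")` names every Datanode, while `placement("distributed", 42)` always names the same single Datanode for the same key, which is what lets a Coordinator send a statement only to the nodes that hold the relevant rows.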
You should usually run the GTM on a separate server, because the GTM has to handle transaction requirements from all the Coordinators and Datanodes. To group multiple requests and responses from Coordinator and Datanode processes running on the same server, you can configure a GTM-Proxy. The GTM-Proxy reduces the number of interactions with the GTM and the amount of data sent to it. The GTM-Proxy also helps handle GTM failures.
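The grouping idea behind GTM-Proxy can be sketched as a toy model in Python. Everything here is hypothetical: `ToyGTM`, `ToyProxy`, and the idea that a request simply asks for a transaction ID are simplifications for illustration, not the real GTM protocol or its message format.

```python
class ToyGTM:
    """Stand-in for the GTM: hands out increasing transaction IDs."""

    def __init__(self):
        self.next_xid = 1
        self.round_trips = 0  # how many messages reached the GTM

    def handle_batch(self, requests):
        # One network round trip serves every request in the batch.
        self.round_trips += 1
        xids = []
        for _ in requests:
            xids.append(self.next_xid)
            self.next_xid += 1
        return xids


class ToyProxy:
    """Stand-in for GTM-Proxy: collects local requests, forwards them together."""

    def __init__(self, gtm):
        self.gtm = gtm
        self.pending = []

    def submit(self, backend_id):
        # Queue a request from a local Coordinator or Datanode process.
        self.pending.append(backend_id)

    def flush(self):
        # Send all queued requests as one batch and map the replies back.
        batch, self.pending = self.pending, []
        if not batch:
            return {}
        return dict(zip(batch, self.gtm.handle_batch(batch)))
```

With three local processes submitting one request each, a single `flush()` reaches the toy GTM once instead of three times, which is the reduction in interactions described above.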
It is often good practice to run both a Coordinator and a Datanode on the same server, because you then don't have to worry about balancing the workload between the two, and you can often read data from replicated tables locally without sending an additional request over the network. You can have any number of servers where these two components are running. Because both the Coordinator and the Datanode are essentially PostgreSQL instances, you should configure them to avoid resource conflicts. It is very important to assign them different working directories and port numbers.
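As a rough illustration of the last point, the following Python sketch checks a planned layout for Coordinators and Datanodes that would collide on the same server. The node names, hosts, ports, and directory paths are invented for the example, and `find_conflicts` is a hypothetical helper, not a Postgres-XL tool.

```python
from collections import Counter

# Hypothetical layout: one Coordinator and one Datanode per server.
NODES = [
    {"name": "coord1", "host": "server1", "port": 5432, "datadir": "/data/coord1"},
    {"name": "dn1", "host": "server1", "port": 15432, "datadir": "/data/dn1"},
    {"name": "coord2", "host": "server2", "port": 5432, "datadir": "/data/coord2"},
    {"name": "dn2", "host": "server2", "port": 15432, "datadir": "/data/dn2"},
]


def find_conflicts(nodes):
    """Return (host, field, value) for every port or directory used twice on one host."""
    conflicts = []
    for field in ("port", "datadir"):
        counts = Counter((node["host"], node[field]) for node in nodes)
        for (host, value), count in counts.items():
            if count > 1:
                conflicts.append((host, field, value))
    return conflicts
```

The same port number on two different servers is not a conflict; only two instances on the same host sharing a port or a working directory are reported.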
Postgres-XL allows multiple Coordinators to accept statements from applications independently but in an integrated way. Any write made through one Coordinator is visible from every other Coordinator. Together they act as if they were a single database. The Coordinator's role is to accept statements, find which Datanodes are involved, send query plans on to the appropriate Datanodes if needed, collect the results, and write them back to applications.
The Coordinator does not store any user data. It stores only catalog data, which it uses to determine how to process statements and where the target Datanodes are, among other things. Therefore, you don't have to worry much about Coordinator failure. When a Coordinator fails, you can just switch to another one.
The GTM could be a single point of failure (SPOF). To prevent this, you can run another GTM as a GTM-Standby to back up the GTM's status. When the GTM fails, the GTM-Proxy can switch to the standby on the fly. This is described in detail in the high-availability sections.
As described above, the Coordinators and Datanodes of Postgres-XL are essentially PostgreSQL database servers. In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):
A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.
The user's client (frontend) application that wants to perform database operations. Client applications can be very diverse in nature: a client could be a text-oriented tool, a graphical application, a web server that accesses the database to display web pages, or a specialized database maintenance tool. Some client applications are supplied with the PostgreSQL distribution; most are developed by users.
As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine.
The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)
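The process-per-connection model can be sketched with a few lines of Python on a Unix-like system. This is only an illustration of the general forking pattern, not the postgres implementation; `serve`, `handle_client`, and the reply format are hypothetical names and choices made for the example.

```python
import os
import socket


def handle_client(conn):
    """Runs in the forked child: talk to one client, then finish."""
    with conn:
        data = conn.recv(1024)
        # Prefix the reply with the child's PID to show which process served it.
        conn.sendall(b"pid %d: " % os.getpid() + data)


def serve(listener, max_connections):
    """Accept connections and fork one child process per connection.

    Returns the PIDs of the children that were started.
    """
    child_pids = []
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: it no longer needs the listening socket.
            try:
                listener.close()
                handle_client(conn)
            finally:
                os._exit(0)  # never fall back into the parent's loop
        # Parent: the child owns the connection now, so close our copy
        # and go straight back to waiting for the next client.
        conn.close()
        child_pids.append(pid)
    for pid in child_pids:
        os.waitpid(pid, 0)  # reap finished children
    return child_pids
```

The parent never reads from or writes to a client connection: after `os.fork()` it closes its copy and returns to `accept()`, while each child serves exactly one client and then exits, mirroring the way client and server processes come and go.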