1.2. Architectural Fundamentals

Before we proceed, you should understand the basic PostgreSQL system architecture. Understanding how the parts of PostgreSQL interact will make this chapter somewhat clearer.

Postgres-XL, in short, is a collection of PostgreSQL database clusters which act as if the whole collection is a single database cluster. Based on your database design, each table is replicated or distributed among member databases.

To provide this capability, Postgres-XL is composed of three major components called the GTM, Coordinator and Datanode. The GTM is responsible to provide ACID property of transactions. The Datanode stores table data and handle SQL statements locally. The Coordinator handles each SQL statements from applications, determines which Datanode to go, and sends plans on to the appropriate Datanodes.

You usually should run GTM on a separate server because GTM has to take care of transaction requirements from all the Coordinators and Datanodes. To group multiple requests and responses from Coordinator and Datanode processes running on the same server, you can configure GTM-Proxy. GTM-Proxy reduces the number of interactions and the amount of data to GTM. GTM-Proxy also helps handle GTM failures.

It is often good practice to run both Coordinator and Datanode on the same server because we don't have to worry about workload balance between the two, and you can often get at data from replicated tables locally without sending an additional request out on the network. You can have any number of servers where these two components are running. Because both Coordinator and Datanode are essentially PostgreSQL instances, you should configure them to avoid resource conflict. It is very important to assign them different working directories and port numbers.

Postgres-XL allows multiple Coordinators to accept statements from applications independently but in an integrated way. Any writes from any Coordinator is available from any other Coordinators. They acts as if they are single database. The Coordinator's role is to accept statements, find what Datanodes are involved, send query plans on to the appropriate Datanodes if needed, collect the results and write them back to applications.

The Coordinator does not store any user data. It stores only catalog data to determine how to process statements, where the target Datanodes are, among others. Therefore, you don't have to worry about Coordinator failure much. When the Coordinator fails, you can just switch to the other one.

The GTM could be single point of failure (SPOF). To prevent this, you can run another GTM as a GTM-Standby to backup GTM's status. When GTM fails, GTM-Proxy can switch to the standby on the fly. This will be described in detail in high-availability sections.

As described above, the Coordinators and Datanodes of Postgres-XL are essentially PostgreSQL database servers. In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):

As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine.

The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (forks) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)