What is Postgres-XL?

In short

Postgres-XL is an open-source project that transparently adds both write scalability and massively parallel processing (MPP) to PostgreSQL. It is a collection of tightly coupled database components that can be installed on more than one system or virtual machine.

Write-scalable means that Postgres-XL can be configured with as many database servers as you want and can handle many more writes (updating SQL statements) than a single standalone database server could. You can have more than one database server, all providing a single database view. Any database update made through any database server is immediately visible to transactions running on the other servers. Transparent means that you do not necessarily need to worry about how your data is stored internally across multiple database servers. [1]

You can configure Postgres-XL to run on more than one machine. It stores your data in a distributed way, that is, partitioned or replicated depending on what is chosen for each table. [2] When you issue queries, Postgres-XL determines where the target data is stored and dispatches corresponding plans to the servers containing the target data.
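The routing idea in the previous paragraph can be illustrated with a small Python sketch. It is a toy model only: the node names, the CRC32-based hash, and the `target_nodes` helper are assumptions made for illustration and are not Postgres-XL's actual hash function or planner logic.

```python
import zlib

# Hypothetical node names, used only for this illustration.
NODES = ["datanode1", "datanode2", "datanode3"]

def node_for_key(key, nodes=NODES):
    """Pick the node that owns a distribution-key value (toy hash, not Postgres-XL's)."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return nodes[digest % len(nodes)]

def target_nodes(strategy, key=None, write=False, nodes=NODES):
    """Return the nodes a statement has to be sent to.

    strategy: "hash" for a distributed table, "replicated" for a replicated one.
    key:      value of the distribution key, if the statement pins one down.
    """
    if strategy == "replicated":
        # Every node holds a full copy: writes go everywhere,
        # reads can be served by any single node (the first one here, arbitrarily).
        return list(nodes) if write else [nodes[0]]
    if key is None:
        # No distribution-key value known: any node may hold matching rows.
        return list(nodes)
    return [node_for_key(key, nodes)]

single = target_nodes("hash", key=42)                # exactly one node
everywhere = target_nodes("hash")                    # all nodes
replica_write = target_nodes("replicated", write=True)  # all nodes
```

A lookup on the distribution key touches one node, while a statement without a usable key value, or a write to a replicated table, has to reach every node.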

In typical web systems, you can have as many web servers or application servers as you need to handle your transactions. In general, however, you cannot do this for a database server, because all changing data has to be visible to all transactions. Unlike other database cluster solutions, Postgres-XL provides this capability. You can install as many database servers as you like. Each database server provides a uniform view of the data to your applications. Any database update made through any server is immediately visible to applications connected to the database through other servers. This is one of the most important features of Postgres-XL.

The other significant feature of Postgres-XL is MPP parallelism. You can use Postgres-XL to handle workloads for Business Intelligence, Data Warehousing, or Big Data. In Postgres-XL, a plan is generated once on a coordinator and sent down to the individual data nodes. The plan is then executed with the data nodes communicating directly with one another: each node knows from which nodes it should expect to receive the tuples it needs and to which nodes it must ship tuples.
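As a rough illustration of data nodes shipping tuples to one another, the following Python sketch re-shuffles one table on a join key so that each node can then join purely local data. The table contents, the modulo-based placement, and the function names are hypothetical; this is not how Postgres-XL's executor is implemented.

```python
def redistribute(partitions, key_index):
    """Re-shuffle rows so that rows with equal key values land on the same node.

    partitions: list of row lists, one list per data node (rows are tuples).
    key_index:  position of the integer join key within each row.
    """
    n = len(partitions)
    incoming = [[] for _ in range(n)]
    for rows in partitions:            # each node scans only its local rows
        for row in rows:
            dest = row[key_index] % n  # toy placement rule: key modulo node count
            incoming[dest].append(row) # "ship" the row to the node owning the key
    return incoming

def local_join(left_rows, right_rows, left_key, right_key):
    """Plain hash join, executed independently on a single node."""
    index = {}
    for r in right_rows:
        index.setdefault(r[right_key], []).append(r)
    return [l + r for l in left_rows for r in index.get(l[left_key], [])]

# orders(order_id, customer_id), placed by order_id over two nodes
orders = [[(2, 11), (4, 10)], [(1, 10), (3, 11)]]
# customers(customer_id, name), placed by customer_id over the same two nodes
customers = [[(10, "ann")], [(11, "bob")]]

# Step 1: nodes exchange order rows so they are co-located with their customers.
shuffled = redistribute(orders, key_index=1)
# Step 2: every node joins its local rows; the results are simply concatenated.
result = sorted(
    row
    for node in range(len(customers))
    for row in local_join(shuffled[node], customers[node], 1, 0)
)
```

The point of the sketch is that no single node ever needs the whole table: once the rows have been exchanged, each node works on its own share in parallel.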

Postgres-XL's Goal

The ultimate goal of Postgres-XL is to provide database scalability with ACID consistency across all types of database workloads. That is, Postgres-XL should provide the following features:

Postgres-XL Key Components

In this section, we will describe the main components of Postgres-XL.

Postgres-XL is composed of three major components: the GTM (Global Transaction Manager), the Coordinator and the Datanode. Their features are given in the following sections.

GTM (Global Transaction Manager)

The GTM is a key component of Postgres-XL that provides consistent transaction management and tuple visibility control.

As described later in this manual, PostgreSQL's transaction management is based upon MVCC (Multi-Version Concurrency Control) technology. Postgres-XL extracts this technology into a separate component, the GTM, so that transaction management in every Postgres-XL component is based upon a single global state. Details will be described in Chapter 48.
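To make the idea of a single global transaction state more concrete, here is a minimal Python sketch of a central component that hands out transaction IDs and snapshots. The `ToyGTM` class, its method names, and the snapshot layout are assumptions for illustration; the sketch also ignores aborted transactions and is not the GTM's actual protocol.

```python
class ToyGTM:
    """Toy stand-in for a global transaction manager (illustrative only)."""

    def __init__(self):
        self.next_gxid = 1
        self.active = set()  # GXIDs of transactions still in progress

    def begin(self):
        """Assign a cluster-wide unique transaction ID."""
        gxid = self.next_gxid
        self.next_gxid += 1
        self.active.add(gxid)
        return gxid

    def commit(self, gxid):
        """Mark a transaction as finished (aborts are not modelled here)."""
        self.active.discard(gxid)

    def snapshot(self):
        """Describe which transactions are finished at this moment."""
        return {"xmax": self.next_gxid, "running": frozenset(self.active)}

def is_visible(writer_gxid, snapshot):
    """A row version is visible if its writer finished before the snapshot was taken."""
    return writer_gxid < snapshot["xmax"] and writer_gxid not in snapshot["running"]

gtm = ToyGTM()
t1 = gtm.begin()
t2 = gtm.begin()
gtm.commit(t1)
snap = gtm.snapshot()   # taken while t2 is still running
gtm.commit(t2)          # commits after the snapshot; snap does not change
```

Because every component would evaluate `is_visible` against the same snapshot, rows written by `t1` are visible and rows written by `t2` are not, regardless of which server performs the check.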

Coordinator

The Coordinator is the interface to the database for applications. It acts like a conventional PostgreSQL backend process; however, the Coordinator does not store any actual data. The actual data is stored by the Datanodes, as described below. The Coordinator receives SQL statements, obtains a Global Transaction ID (GXID) and a Global Snapshot as needed, determines which Datanodes are involved, and asks them to execute the statement (or a part of it). Each statement issued to the Datanodes is associated with the GXID and Global Snapshot, so that Multi-Version Concurrency Control (MVCC) properties extend cluster-wide.

Datanode

The Datanode actually stores user data. Tables may be distributed among Datanodes or replicated to all the Datanodes. The Datanode does not have a global view of the whole database; it just takes care of locally stored data. Incoming statements are examined by the Coordinator, as described above, and subplans are made. These are then transferred to each Datanode involved, together with a GXID and Global Snapshot as needed. A Datanode may receive requests from various Coordinators in separate sessions. However, because each transaction is uniquely identified and associated with a consistent (global) snapshot, each Datanode can execute each request properly in its own transaction and snapshot context.
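The division of labour between Coordinators and Datanodes can be sketched in Python as follows. All class names are hypothetical, a snapshot is simplified to the set of committed GXIDs, and each insert is treated as a single-statement transaction; the sketch only illustrates the data flow described above, not the real wire protocol.

```python
import itertools

class ToyDatanode:
    """Stores only its local rows; each row remembers the GXID that wrote it."""

    def __init__(self):
        self.rows = []  # list of (writer_gxid, value)

    def insert(self, gxid, value):
        self.rows.append((gxid, value))

    def scan(self, snapshot):
        # Return only values whose writer the snapshot regards as committed.
        return [value for gxid, value in self.rows if gxid in snapshot]

class ToyCluster:
    """Shared state standing in for the GTM: a GXID counter and the committed set."""

    def __init__(self, n_datanodes):
        self.gxids = itertools.count(1)
        self.committed = set()
        self.datanodes = [ToyDatanode() for _ in range(n_datanodes)]

class ToyCoordinator:
    """Holds no user data; routes statements to Datanodes with a GXID or snapshot."""

    def __init__(self, cluster):
        self.cluster = cluster

    def insert(self, key, value):
        gxid = next(self.cluster.gxids)  # obtain a global transaction ID
        nodes = self.cluster.datanodes
        nodes[key % len(nodes)].insert(gxid, value)  # only one Datanode is involved
        self.cluster.committed.add(gxid)  # single-statement transaction commits

    def select_all(self):
        # The same snapshot accompanies the request to every Datanode.
        snapshot = frozenset(self.cluster.committed)
        return sorted(v for dn in self.cluster.datanodes for v in dn.scan(snapshot))

cluster = ToyCluster(n_datanodes=2)
coord_a = ToyCoordinator(cluster)
coord_b = ToyCoordinator(cluster)
coord_a.insert(1, "written via A")
coord_b.insert(2, "written via B")
```

Both `coord_a.select_all()` and `coord_b.select_all()` return the same rows, which mirrors the single database view that applications get no matter which Coordinator they connect to.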

Postgres-XL Inherits From PostgreSQL

Postgres-XL is an extension to PostgreSQL and inherits most of its features.

It is an open-source descendant of PostgreSQL and its original Berkeley code. It supports a large part of the SQL standard and offers many modern features:

Also, similar to PostgreSQL, Postgres-XL can be extended by the user in many ways, for example by adding new

Postgres-XL can be used, modified, and distributed by anyone free of charge for any purpose, be it private, commercial, or academic, provided it adheres to the PostgreSQL License.

Notes

[1]

Of course, to get the most from Postgres-XL, you should take into account how tables are stored internally when you design the physical database.

[2]

To distinguish this from PostgreSQL's native partitioning, we refer to it as "distribution". In distributed database textbooks, this is often referred to as a "horizontal fragment" and, more recently, as "sharding".

[3]

Postgres-XL's foreign key usage has some restrictions. For details, see CREATE TABLE.

[4]

Postgres-XL does not support triggers in the current version. This may be supported in future releases.