Chapter 9. Database Server

Table of Contents

Introduction
PostgreSQL
Architecture
Deployment and uses
MySQL
Architecture
Deployment and uses
User management
Resources

Introduction

Databases are a prime subject for storing structured data. Be it as a backend for an LDAP system or DNS system, or as a structured storage for user data, or your most vital company secrets - databases offer a standardized method for storing and retrieving data. Thanks to wide driver support, many tools can interact with several database types (Oracle, SQL Server, PostgreSQL, MySQL, ...).

Databases are often, from an operating system point of view, not all that complicated: a few processes and some files are accessed. However, internally, this is an entire different story. Databases are a world by itself - you will need trained professionals to properly manage your database systems (DBAs, DataBase Administrators) who know the internals of the database product and can tune the database to react swiftly to the requests launched against it.

In this section, we focus on the setup of PostgreSQL and MySQL.

PostgreSQL

Although (somewhat) less popular than MySQL, the PostgreSQL database is in the author's (humble) opinion a more stable and successful database platform. It is gaining momentum with vendors and has commercial backing by several companies - just in case you are in dire need for paying for software ;-)

Architecture

Basically, postgresql is an almost monolithic system from a Unix/Linux point of view: not that many processes are involved. Of course, internally, that is a completely different thing.

Let's first look at how we would position the database in the reference architecture used in this book.

Figure 9.1. Load balanced setup

Load balanced setup

What we are aiming for is a high-available PostgreSQL solution: two (or more) PostgreSQL databases of which one is a master (we are not going to look for multi-master setups, even though they do exist) and where at least one target is synchronously replicated towards. The synchronous replication is needed to ensure that, in case of system failure, the standby database already holds the last successful commit and can be made master easily.

In front of the database setup, a load balancer is placed which has two main functions:

  • distribute read-only queries to the standby systems as well, lowering the load on the master database

  • provide seamless fail-over towards the standby database so applications do not notice the (short) interruption on the database service

To accomplish this, we will set up two PostgreSQL systems with hot standby, and a pgpool load balancer in front of it.

Feeds and flows - backup

There are two main flows identified in our architecture: backup and replication. Let's first take a look at the backup on.

Backups should be stored somewhere outside the direct vicinity of the database (different site even). We will use the pg_rman application and use an NFS mount to place the backup catalog (metadata about backups) and backed up files.

Figure 9.2. Backup architecture for a PostgreSQL setup

Backup architecture for a PostgreSQL setup

Feeds and flows - Replication

Next to the backup/restore, we also define standby databases. In this case, the needed WAL logs (translation logs of PostgreSQL) are shipped towards the standby systems.

Figure 9.3. Standby setups

Standby setups

In the depicted scenario, a standby server is receiving the archive logs and, depending of the type of standby,

  • waits (keeps logs on the file system) until the standby is started (for instance due to a fail-over), or

  • immediately applies the received WAL logs

If a standby waits, it is called a warm standby. If it immediately applies the received WAL logs, it is a hot standby. In case of a hot standby, the database is also open for read-only access. There are also methods for creating multi-master PostgreSQL clusters. In our presented architecture, the hot standby even has to signal to the master that it has received and applied the log before the commit is returned as successful to the client.

Of course, if you do not need this kind of protection on the database, it is best left disabled as it incurs a performance penalty.

Administration

PostgreSQL administration is mostly done using the psql command and related pg_* commands.

Monitoring

When monitoring PostgreSQL, we should focus on process monitoring and log file monitoring.

Operations

When you are responsible for managing PostgreSQL systems, then you will need to understand the internal architecture of PostgreSQL.

Figure 9.4. Internal architecture for PostgreSQL

Internal architecture for PostgreSQL

The master postgres server process listens on the PostgreSQL port, waiting for incoming connections. Any incoming connection is first checked against the settings in the Host Based Access configuration file (pg_hba.conf). Only if this list allows the connection will it go through towards the postgres database, where the rights granted to the user then define what access is allowed.


User management

PostgreSQL supports a wide number of authentication methods, including PAM support (Pluggable Authentication Modules), GSSAPI (including the Kerberos one), RADIUS, LDAP, etc. However, it is important to know that this is only regarding authentication, not authorization. In other words, you still need to create the roles (users) in the PostgreSQL server and grant them whatever they need.

Central user management in this case ensures that, if a person leaves the organization, his account can be immediately removed (or locked) so that authenticating as this user against the databases is no longer possible. However, you will still need a method for cleaning up the role definitions in the databases.

Security

Deployment and uses

Manual installation of the database server

We start by installing the dev-db/postgresql-server package on the system.

# emerge dev-db/postgresql-server

Next, edit the /etc/conf.d/postgresql-9.1 file to accommodate the settings of the cluster.

PGDATA="/etc/postgresql-9.1"
DATA_DIR="/var/lib/postgresql/9.1/data"
PG_INITDB_OPTS="--locale=en_US.UTF-8"

Now, create the cluster: temporarily assign a password to the postgres user (it will be asked during the configuration), and afterwards lock the account again.

# passwd postgres
# emerge --config dev-db/postgresql-server:9.1
# passwd -l postgres
# restorecon -Rv /var/lib/postgresql/9.1/data

To secure the cluster, we need to edit its configuration files before starting.

Let's first make sure that we can connect to the database remotely too (by default, it only works locally). Edit the postgresql.conf file (in /etc/postgresql-9.1) and set the listen_addresses to the interface(s) you want the database to be reachable on.

listen_addresses=::1 2001:db8:81::204:de89:3312

Next, edit pg_hba.conf to allow remote connections for IPv6:

host    all     all      2001:db8:81::1/96    md5

Start the cluster and set the admin password:

# rc-service postgresql-9.1 start
# psql -U postgres
postgres=# \password
postgres=# \q

With the password set, change the method in pg_hba.conf for the local (socket-based) connections from trust to password and reload the service

# rc-service postgresql-9.1 reload

MySQL

The MySQL database platform is a very popular database, especially for web-based applications. Many software titles support MySQL as a database backend natively, so it makes sense to provide support for MySQL in our reference architecture as well.

Architecture

Deployment and uses

The initial setup of MySQL on Gentoo is a breeze. Install the dev-db/mysql package on your system and you're good to go. Next, make sure that the following entries are set in your /etc/mysql/my.cnf file:

  • local-infile=0 (in the [mysqld] section) to disable loading data from a local file from within MySQL

Furthermore, secure the environment through the following actions:

  • Look in all users' home directories for a file called .mysql_history, and remove it. Next, create a symlink to /dev/null called .mysql_history. This is because the history file contains the history of all commands executed, which can include sensitive information.

Now, create the initial database with mysql_install_db and start the database:

# rc-service mysql start

Then, call mysql_secure_installation to properly secure the database further.

User management

Currently, MySQL uses internal tables for managing user accounts. Work is on the way to support other authentication services, like LDAP, using the MySQL plugin support. But for now, let's stick with the internal user tables.

For database users, I always tend to make distinction between (personal) accounts and (functional) application accounts.

  • Personal accounts are tied to a single person, and might even have DBA (or at least more) privileges

  • Functional application accounts are tied to an application server (or set of servers) and are used to access the database data

Making the distinction is pretty much for security reasons, but also because functional accounts might be better off with a different authentication approach.

Authentication methods

By default, MySQL uses password-based authentication (with the hashes of the database stored in the mysql.user table). The field is a SHA1 checksum of the binary representation of the SHA1 checksum of the password. For instance, if the password is "password", then you can generate the hash from the command line as follows:

$ echo -n password | openssl dgst -sha1 -binary | sha1sum

Or you can use the mysql function:

mysql> SELECT PASSWORD('password');

Knowing how the hash works also allows you to manage the hashes remotely if you want.

Resources

For more information, many excellent resources are available on the wide Internet.

  • pg_rman, the backup/restore utility used in this chapter