Databases are a prime choice for storing structured data. Be it as a backend for an LDAP or DNS system, as structured storage for user data, or as the vault for your most vital company secrets - databases offer a standardized method for storing and retrieving data. Thanks to wide driver support, many tools can interact with several database types (Oracle, SQL Server, PostgreSQL, MySQL, ...).
Databases are often, from an operating system point of view, not all that complicated: a few processes run and some files are accessed. Internally, however, it is an entirely different story. Databases are a world unto themselves - you will need trained professionals (DBAs, database administrators) to properly manage your database systems: people who know the internals of the database product and can tune the database to react swiftly to the requests launched against it.
In this section, we focus on the setup of PostgreSQL and MySQL.
Although (somewhat) less popular than MySQL, the PostgreSQL database is in the author's (humble) opinion a more stable and successful database platform. It is gaining momentum with vendors and has commercial backing by several companies - just in case you are in dire need of paying for software ;-)
Basically, PostgreSQL is an almost monolithic system from a Unix/Linux point of view: not that many processes are involved. Internally, of course, that is a completely different matter.
Let's first look at how we would position the database in the reference architecture used in this book.
What we are aiming for is a highly available PostgreSQL solution: two (or more) PostgreSQL databases, of which one is the master (we are not going to look at multi-master setups, even though they do exist), and at least one standby to which the master replicates synchronously. The synchronous replication is needed to ensure that, in case of system failure, the standby database already holds the last successful commit and can be promoted to master easily.
In front of the database setup, a load balancer is placed which has two main functions:
distribute read-only queries to the standby systems as well, lowering the load on the master database
provide seamless fail-over towards the standby database so applications do not notice the (short) interruption on the database service
To accomplish this, we will set up two PostgreSQL systems with hot standby, and a pgpool load balancer in front of it.
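A pgpool-II configuration for such a two-node setup could be sketched as follows. This is an illustrative fragment only: the hostnames db1 and db2 are placeholders, and parameter names should be verified against the pgpool.conf shipped with your installed pgpool-II version.

```
# pgpool.conf - illustrative fragment
listen_addresses = '*'
port = 5432

# Backend 0: the master database
backend_hostname0 = 'db1'
backend_port0 = 5432
backend_weight0 = 1

# Backend 1: the hot standby
backend_hostname1 = 'db2'
backend_port1 = 5432
backend_weight1 = 1

# Spread read-only queries across both nodes
load_balance_mode = on
```

Applications then connect to the pgpool port instead of to either database directly, which is what makes the fail-over transparent to them.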
There are two main flows identified in our architecture: backup and replication. Let's first take a look at the backup flow.
Backups should be stored somewhere outside the direct vicinity of the database (preferably even on a different site). We will use the pg_rman application and use an NFS mount to store the backup catalog (metadata about backups) and the backed-up files.
Next to backup/restore, we also define standby databases. In this case, the needed WAL logs (write-ahead logs of PostgreSQL) are shipped towards the standby systems.
In the depicted scenario, a standby server is receiving the archive logs and, depending on the type of standby,
waits (keeps logs on the file system) until the standby is started (for instance due to a fail-over), or
immediately applies the received WAL logs
If a standby waits, it is called a warm standby. If it immediately applies the received WAL logs, it is a hot standby. In case of a hot standby, the database is also open for read-only access. There are also methods for creating multi-master PostgreSQL clusters. In our presented architecture, the hot standby even has to signal to the master that it has received and applied the log before the commit is returned as successful to the client.
Of course, if you do not need this kind of protection on the database, it is best left disabled as it incurs a performance penalty.
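On the master, this behavior is driven by a few postgresql.conf settings. A sketch for PostgreSQL 9.1 follows; the standby name "standby1" is an assumption and must match the application_name the standby uses when connecting:

```
# postgresql.conf on the master - replication-related sketch
wal_level = hot_standby
max_wal_senders = 3

# Standbys listed here are replicated to synchronously
synchronous_standby_names = 'standby1'

# Set to 'local' or 'off' to drop the synchronous guarantee
# (and its performance penalty) when you do not need it
synchronous_commit = on
```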
When monitoring PostgreSQL, we should focus on process monitoring and log file monitoring.
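Log file monitoring can be as simple as periodically counting error-level messages in the server log. A minimal sketch (the log location and format are assumptions; adjust them to your log_directory and log_line_prefix settings):

```shell
#!/bin/sh
# Count FATAL/ERROR lines in a PostgreSQL server log.
# The log file to scan is passed as the first argument.
scan_pg_log() {
  grep -c -E '(FATAL|ERROR):' "$1"
}

# Example usage against a small sample log:
printf '%s\n' \
  'LOG:  database system is ready to accept connections' \
  'ERROR:  relation "missing" does not exist' \
  'FATAL:  password authentication failed for user "app"' > /tmp/pg_sample.log
scan_pg_log /tmp/pg_sample.log
# prints 2
```

A monitoring agent would run such a check on a schedule and alert when the count rises between runs.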
When you are responsible for managing PostgreSQL systems, then you will need to understand the internal architecture of PostgreSQL.
Figure 9.4. Internal architecture for PostgreSQL
The master postgres server process listens on the PostgreSQL port, waiting for incoming connections. Any incoming connection is first checked against the settings in the host-based authentication configuration file (pg_hba.conf). Only if this list allows the connection will it go through towards the postgres database, where the rights granted to the user then define what access is allowed.
PostgreSQL supports a wide number of authentication methods, including PAM (Pluggable Authentication Modules), GSSAPI (including Kerberos), RADIUS, LDAP, and more. However, it is important to know that this only covers authentication, not authorization. In other words, you still need to create the roles (users) in the PostgreSQL server and grant them whatever privileges they need.
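As an illustration, an LDAP-backed entry in pg_hba.conf could look as follows. The address range, LDAP server, and DN components are placeholders for your own environment:

```
# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   10.0.0.0/24   ldap ldapserver=ldap.example.com ldapprefix="uid=" ldapsuffix=",ou=people,dc=example,dc=com"
```

With this entry, PostgreSQL builds the DN from the prefix, the presented user name, and the suffix, and validates the password against the LDAP server - but the role itself must still exist in the database.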
Central user management in this case ensures that, if a person leaves the organization, his account can be immediately removed (or locked) so that authenticating as this user against the databases is no longer possible. However, you will still need a method for cleaning up the role definitions in the databases.
We start by installing the dev-db/postgresql-server package on the system.
# emerge dev-db/postgresql-server
Next, edit the /etc/conf.d/postgresql-9.1 file to accommodate the settings of the cluster.
PGDATA="/etc/postgresql-9.1"
DATA_DIR="/var/lib/postgresql/9.1/data"
PG_INITDB_OPTS="--locale=en_US.UTF-8"
Now, create the cluster: temporarily assign a password to the postgres user (it will be asked during the configuration), and afterwards lock the account again.
# passwd postgres
# emerge --config dev-db/postgresql-server:9.1
# passwd -l postgres
# restorecon -Rv /var/lib/postgresql/9.1/data
To secure the cluster, we need to edit its configuration files before starting.
Let's first make sure that we can connect to the database remotely too (by default, it only works locally). Edit the postgresql.conf file (in /etc/postgresql-9.1) and set the listen_addresses variable to the interface(s) you want the database to be reachable on.
Next, edit pg_hba.conf to allow remote connections for IPv6:
host all all 2001:db8:81::1/96 md5
Start the cluster and set the admin password:
# rc-service postgresql-9.1 start
# psql -U postgres
postgres=# \password
postgres=# \q
With the password set, change the method in pg_hba.conf for the local (socket-based) connections from trust to password, and reload the service:
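The relevant local entry in pg_hba.conf then looks like this (the earlier trust setting is shown as a comment for comparison):

```
# Before: anyone with local socket access was trusted
# local   all   all   trust

# After: a password is required even over the local socket
local   all   all   password
```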
# rc-service postgresql-9.1 reload
The MySQL database platform is a very popular database, especially for web-based applications. Many software titles support MySQL as a database backend natively, so it makes sense to provide support for MySQL in our reference architecture as well.
The initial setup of MySQL on Gentoo is a breeze. Install the dev-db/mysql package on your system and you're good to go. Next, make sure that the following entries are set in your /etc/mysql/my.cnf file:
local-infile=0 (in the [mysqld] section) to disable loading data from a local file from within MySQL
Furthermore, secure the environment through the following actions:
Look in all users' home directories for a file called .mysql_history, and remove it. Next, create a symlink to /dev/null called .mysql_history. This is because the history file contains the history of all commands executed, which can include sensitive information.
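The history cleanup for a single home directory can be sketched as follows. For illustration (and so the snippet is safe to run), a scratch directory created with mktemp stands in for the user's home; in practice you would loop over the home directories listed in /etc/passwd:

```shell
#!/bin/sh
# Scratch directory standing in for a user's home directory.
USER_HOME="$(mktemp -d)"

# Remove any existing history file, then pin the name to /dev/null
# so the MySQL client can no longer record executed statements.
rm -f "${USER_HOME}/.mysql_history"
ln -s /dev/null "${USER_HOME}/.mysql_history"
```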
Now, create the initial database with mysql_install_db and start the database:
# rc-service mysql start
Then, call mysql_secure_installation to properly secure the database further.
Currently, MySQL uses internal tables for managing user accounts. Work is under way to support other authentication services, like LDAP, using the MySQL plugin support. But for now, let's stick with the internal user tables.
For database users, I always tend to make a distinction between (personal) accounts and (functional) application accounts.
Personal accounts are tied to a single person, and might even have DBA (or at least more) privileges
Functional application accounts are tied to an application server (or set of servers) and are used to access the database data
Making the distinction is mostly for security reasons, but also because functional accounts might be better off with a different authentication approach.
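As an illustration of this distinction, a personal account might receive broad privileges but only from an administrative host, while a functional account gets only the data access the application needs. The user names, database name, and host patterns below are examples:

```sql
-- Personal account, restricted to the admin workstation
CREATE USER 'jdoe'@'admin.example.com' IDENTIFIED BY '...';
GRANT ALL PRIVILEGES ON appdb.* TO 'jdoe'@'admin.example.com';

-- Functional account for the application server subnet
CREATE USER 'webapp'@'10.0.0.%' IDENTIFIED BY '...';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'webapp'@'10.0.0.%';
```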
By default, MySQL uses password-based authentication, with the password hashes stored in the mysql.user table. The stored field is the SHA1 checksum of the binary representation of the SHA1 checksum of the password. For instance, if the password is "password", then you can generate the hash from the command line as follows:
$ echo -n password | openssl dgst -sha1 -binary | sha1sum
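MySQL stores this value in uppercase hexadecimal with a leading asterisk, so to reproduce the stored field exactly, the digest can be post-processed; a sketch:

```shell
#!/bin/sh
# Double SHA1: hash the password, then hash the binary digest again.
hash="$(printf '%s' password | openssl dgst -sha1 -binary | sha1sum | cut -d ' ' -f 1)"

# MySQL stores the result uppercased and prefixed with an asterisk.
printf '*%s\n' "${hash}" | tr '[:lower:]' '[:upper:]'
# prints *2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19
```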
Or you can use the mysql function:
mysql> SELECT PASSWORD('password');
Knowing how the hash works also allows you to manage the hashes remotely if you want.
For more information, many excellent resources are available on the wide Internet.
pg_rman, the backup/restore utility used in this chapter