Chapter 11. Configuration management with git and Puppet

Table of Contents

Introduction
Central configuration management, or federated?
About Puppet
About Git
Git with gitolite
Architecture
Using gitolite
Puppet
Architecture
Setting up puppet master
Setting up puppet clients
Working with Puppet
The power of Puppet's definitions
Resources

Introduction

When working in a larger environment, using a central configuration management environment allows you to quickly rebuild vital systems or create copies, as well as keep track of your inventory and services. By centralizing all information, decisions and impact analysis can be done quickly and efficiently. In this chapter, we will focus on using Puppet within the reference architecture.

Warning

TODO only use non-managed mode.

Servers pull in data (git pull) at scheduled times (job scheduler).

Content teams provide their modules through a puppet module server, which the admin teams then use. Admin teams use a single repository (or multiple ones if wanted, in which case the servers pull from "their" repository), so there is no need to combine systems.

Technologies include gitolite & puppet

Perhaps use TXT record for repository source per server?

Central configuration management, or federated?

Configuration management in a larger environment is not something that is easy to implement. It is a combination of deployment information, instance configuration management, historical information, parameter relations and more. In this reference architecture, the approach in the next figure is used.

Figure 11.1. Overview for configuration management


In this example, one team could be responsible for JBoss application server components (packages, documentation, third line support) but not have the responsibility for installing and managing JBoss AS. In this situation, this team delivers and manages their components, including an overlay (either for the team or shared with others). The system administrators configure their systems to add this particular overlay as well as configure their systems to properly set up JBoss AS by instantiating the modules developed by the first team on the systems.

Federated repositories

First of all, the reference architecture uses a federated repository approach. Teams with their own responsibility use their own repositories, managed in the way that is most efficient for those teams. In the figure, two "types" of teams are represented, but this is only an example:

  • development teams, who are responsible for providing the necessary packages, documentation and more. In case of a Gentoo-only installation environment, those teams will manage overlays (structured sets of ebuilds) for their components.

  • infrastructural teams, who are responsible for the infrastructure servers themselves. These teams manage their servers in their own repositories, which are checked out on the configuration management hubs. With the use of proper branch names, the configuration management hubs can check out testing branches and production branches for use on the target systems.

The use of federated repositories allows each team to work on their components in the most flexible manner. It also allows a reasonable access control on the various components: team 4 might not be granted write access on the system configuration for team 3 but can still read it (for its own internal testing), or perhaps even not read it at all.

Highly available hubs

The configuration management HUBs are set up in a highly available manner. Because they only contain checkouts of the repositories (and do not act as the master repositories themselves), we can easily scale this architecture. In the figure, it is shown as a multi-master (i.e. each HUB manages servers), load-balanced setup. However, other architectures can easily be implemented, such as a HUB for one site (data center) and a HUB for another site (another data center).

The hubs contain the CMDB information needed for deployment. In our reference architecture, this will be Puppet.

Version controlled or not

In many organizations, changes on systems should be version controlled. By using version control repositories (such as Git) this is implemented automatically, but it raises another question: how can teams pin a particular setting for one environment? This requirement has to be taken up with the specific solutions (such as Puppet and Git).

About Puppet

Puppet is a free software configuration management tool, written in Ruby and developed under the wings of the Puppet Labs company.

The idea behind Puppet is that administrators describe what a system should look like, and Puppet devises the strategy to bring the system to that state. This removes the burden of re-inventing the order of actions, as that is handled by Puppet. The declarative state of a system can also be applied to other systems easily, making it a breeze to create copies of existing systems.

Puppet definition structure

The strength of Puppet lies in its definitions. You have modules that help you define the state of a particular resource (or set of resources), including default values that fit your needs, and you have nodes that instantiate the modules and set specifics. The power lies in the inheritance and overrides that you can define. In this architecture, let's consider the following definition structure:

/etc/puppet/
+- manifests
|  +- site.pp  # Global definition 
|  +- nodes.pp # Sources in the node definitions further down
|  +- nodes
|  |  +- team1
|  |  |  +- nodes.pp
|  |  |  `- ...
|  |  `- teamN
|  `- patterns
|     +- pattern1
|     `- patternN
+- modules
|  +- module1
|  `- moduleN
`- environments
   +- manifests
   |  `- nodes
   +- environment1
   |  +- modules
   |  `- patterns
   `- environmentN

System administration teams have their repository in which they define the state of their systems. This repository is available on the Puppet master site under the /etc/puppet/manifests/nodes/<repo> location. Each repository contains a nodes.pp file that is sourced in from the main nodes.pp at /etc/puppet/manifests. Nodes inherit or include information from patterns (and perhaps modules, but mostly patterns).
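
As a minimal sketch (assuming the team directories are named as in the layout above), the main nodes.pp could simply source in each team's node definitions:

# /etc/puppet/manifests/nodes.pp
# Source in the node definitions provided by the team repositories
import "nodes/*/nodes.pp"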

Modules are developed independently as their own repositories, and made available under /etc/puppet/modules/<repo>. Modules are seen as building blocks of a particular technology and should have as few dependencies as possible.

Patterns are modules of a sort as well, but they combine the previously mentioned modules into well-structured classes. For instance, a pattern for a postgresql database server will include the postgresql module, but also a backup module (for instance for bacula), log rotation information, additional users (accounts for third line support), etc.

Finally, environments are specific (tagged) checkouts of the modules and patterns, and are used to provide a release-based approach. Whereas the main location could contain the master branch of all repositories (which is the production-ready branch), the environments can support preproduction checkouts (for instance an environment called "development" and one called "testing") or even post-production checkouts (specific releases). If your organization uses monthly releases, this could be an environment production-201212, denoting the December 2012 release. Note, however, that the environments also require the node definitions again, and that it is the Puppet agent that defines which environment to use.
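
As a sketch of how this could be configured in this Puppet version (exact paths depend on how you lay out the checkouts), the master's puppet.conf can define one section per environment, while each agent selects its environment in its own configuration:

# /etc/puppet/puppet.conf on the master (sketch)
[development]
  modulepath = /etc/puppet/environments/development/modules:/etc/puppet/environments/development/patterns
  manifest   = /etc/puppet/environments/manifests/site.pp

# /etc/puppet/puppet.conf on an agent that follows the development release
[agent]
  environment = development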

Alternative: Using packaged releases

If you do not like the use of environments as depicted above, you can also focus on packaged releases: teams still manage their code as they please, but eventually create packages (like ebuilds) which are versioned and can be deployed. This would then lead to packages that install modules with a version in their name:

/etc/puppet
`- modules
   +- postgresql92-1.0.0
   `- postgresql92-1.0.2

Other packages (like those providing the patterns) then depend on the correct versions. Thanks to dependency support in Gentoo, when no patterns (or nodes) are installed anymore that depend on a particular version, that version will be cleaned from the environment. And thanks to SLOT support, multiple deployments of the same package with different versions are supported as well.

In this reference architecture, I will not pursue this method - I find it a broken solution for a broken situation: in your architecture, pinning versions leads to outdated systems and a slowly progressing information architecture. By leveraging environments, this problem is less prominent.

About Git

Git is a distributed revision control system where developers have a full repository on their workstation, pushing changes to a remote origin location where others can then pull the changes from.

The use of Git is popular in the free software world as it has proven itself fruitful for distributed development (such as with the Linux kernel), which is perfect in our reference architecture.

Git with gitolite

Git by itself is a fairly modest versioning system. It doesn't support access controls out of the box, but instead relies on other systems (such as those used to provide access to git, like the access controls of the Linux operating system) to provide this important security feature.

To make access controls, as well as the management of the git repositories, easier to implement, various software projects have emerged. In this section, I'll focus on the gitolite software, which provides internal access controls (without needing separate accounts at the Linux operating system level) based on SSH keys.

Architecture

Gitolite uses a single operating system user, and abstracts access towards the git repositories it manages. Access towards git is done through SSH.

Because git provides every developer with their own, full repository, we will not set up a highly available architecture for git. Instead, we'll rely on its distributed model (and, of course, frequent backups).

Flows

Only one flow is identified, which is the backup.

Figure 11.2. Git and gitolite flows


Administration

Gitolite itself is managed through a git repository of its own. Only when things fail badly at that level as well will you need SSH access to the server.

Figure 11.3. Gitolite administration


Monitoring

To monitor git, we'll use a testing repository on the system and a remote git commit. This allows us to make sure that the service is running and available, and it can even be extended with some performance metrics (for instance, the time that a push of a small change takes).
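
A minimal sketch of such a probe (assuming a dedicated "monitor-test" repository exists and the monitoring user has a registered SSH key) could be:

#!/bin/sh
# Clone a small test repository, push a dummy commit and report how long it took
START=$(date +%s)
rm -rf /tmp/monitor-test
git clone -q git@git.internal.genfic.com:monitor-test.git /tmp/monitor-test || exit 2
cd /tmp/monitor-test || exit 2
date > timestamp
git add timestamp
git commit -q -m "Monitoring probe"
git push -q origin master || exit 2
END=$(date +%s)
echo "git clone/push round-trip took $((END - START)) seconds"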

Operations

Regular operations are the same as administration: users connect to the git server through git (over SSH).

Users

Users are managed through the gitolite repository.

Security

The secure access towards the repositories is handled by gitolite (through the configuration in the gitolite administration repository) and SSH.

Using gitolite

Installing & configuring gitolite

Installing gitolite in Gentoo is a breeze (just like with the majority of other distributions).

# emerge dev-vcs/gitolite

Next, copy the SSH key of the administrator(s) somewhere on the system, and then have the following command run as the git user (you can use "su - git -c" or "sudo -H -u git" for this), where /path/to/admin.pub is the public SSH key of the administrator:

git $ gl-setup /path/to/admin.pub

The gitolite configuration file is then opened, allowing you to configure gitolite.

For instance:

  • set $REPO_UMASK to 0027 to ensure the repositories are not readable (and especially not writeable) by other users on the git server (see the snippet below)
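
In the generated ~git/.gitolite.rc, this is a plain Perl assignment, along these lines:

# Excerpt from ~git/.gitolite.rc
$REPO_UMASK = 0027;    # restrict access for other users on the git server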

Finally, have the administrator check out the gitolite administrative repository on his (client) system.

$ git clone git@git.internal.genfic.com:gitolite-admin.git

Managing users

Users are managed through their public SSH keys. Once you have obtained a public SSH key for a user, commit it to the keydir/ location (inside the gitolite-admin repository), named after the user (like keydir/username.pub). The filename itself is used to identify the user. If a user needs to be removed, remove the key from the directory and push the changes to the administration repository. The changes take effect immediately.
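
For example, adding access for a (hypothetical) user eden could look like this:

$ cd gitolite-admin
$ cp /path/to/eden-key.pub keydir/eden.pub
$ git add keydir/eden.pub
$ git commit -m "Grant access to eden"
$ git push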

Managing repositories

To create a new repository, edit the conf/gitolite.conf file and add in (or remove) the repository, and identify the user or users that are allowed access to the repository. If you remove a repository configuration, you'll need to remove the repository from the Linux host itself as well (which isn't done through the gitolite-admin repository).

For instance, a snippet for a repository "puppet-was" to which the users john and dalia (both admins), jacob and eden have access:

repo puppet-was
  RW+ = john dalia
  RW  = jacob
  R   = eden

The gitolite documentation referred to at the end of this chapter has more information about the syntax and the abilities, including group support in gitolite.
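
As a small taste of that group support, the snippet above could be rewritten with a (hypothetical) @was-admins group:

@was-admins = john dalia

repo puppet-was
  RW+ = @was-admins
  RW  = jacob
  R   = eden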

Puppet

The puppet master hosts the configuration entries for your environment and manages the puppet clients' authentication (through certificates).

Architecture

The puppet architecture is fairly simple, which is also one of its strengths.

Flows

The following diagram shows the flows/feeds that interact with the puppet processes.

Figure 11.4. Flows towards and from puppet


The most prominent flow is the one with configuration updates. These updates come from one of the Git repositories and are triggered locally on the puppet master server itself.

Administration

As puppet is an administration tool by itself, it comes as no surprise that the actual administration of puppet is done using the system-specific interactive shells (i.e. through SSH).

Figure 11.5. Puppet administration


The main administration task in puppet is handling the certificates: system administrators request a certificate through the puppet client. The client connects to the master and sends the signing request, which is then queued. The puppet admin then lists the pending certificate requests and signs those he knows are valid. Once signed, the system administrator can retrieve the signed certificate and have it installed on the system (again through the puppet client), from which point on the system is known and can be managed by puppet.

Monitoring

When checking on the puppet operations, we need to make sure that

  • the puppet agent is running (or scheduled to run)

  • the puppet agent ran within the last xx minutes (depending on the frequency of the data gathering)

  • the puppet agent did not fail

We might also want to include a check that verifies that n consecutive runs do not keep applying changes (in other words, the configuration has to be stable after n-1 runs).
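
A minimal sketch of a check for the first two points (assuming the agent stores its state in /var/lib/puppet/state, the default location) could verify the age of the last run summary:

#!/bin/sh
# Warn when the last puppet agent run is older than 90 minutes
SUMMARY=/var/lib/puppet/state/last_run_summary.yaml
MAXAGE=5400

if [ ! -f "${SUMMARY}" ]; then
  echo "CRITICAL: puppet agent has never run on this system"
  exit 2
fi

AGE=$(( $(date +%s) - $(stat -c %Y "${SUMMARY}") ))
if [ ${AGE} -gt ${MAXAGE} ]; then
  echo "WARNING: last puppet run was ${AGE} seconds ago"
  exit 1
fi
echo "OK: last puppet run was ${AGE} seconds ago"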

Operations

During regular operations, the puppet agent frequently connects to the puppet master and sends all its "facts" (the state of the current system), from which the puppet master then devises how to update the system to match the configuration the system should be in.

Figure 11.6. Regular operations of puppet


The activities are, by default, triggered from the agent. It is possible (and we will do so later) to configure the agent to also listen for incoming connections from the puppet master. This allows administrators to push changes to systems without waiting for the agents to connect to the master.

User management

Puppet does not have specific user management features in it. If you want separate roles, you will need to implement them through the file access control mechanisms on the puppet master and/or through the repositories that you use as the configuration repository.

Security

Make sure that no resources can be accessed through Puppet that are otherwise not accessible by unauthorized people. As Puppet includes a (web-based) file server, we need to configure it properly so that unauthorized access is not possible. Luckily, this is the default behavior with a Puppet installation.

Setting up puppet master

Installing puppet master

The puppet master and puppet client itself are both provided through the app-admin/puppet package.

# equery u puppet
[ Legend : U - final flag setting for installation]
[        : I - package is installed with flag     ]
[ Colors : set, unset                             ]
 * Found these USE flags for app-admin/puppet-2.7.18:
 U I
 + + augeas              : Enable augeas support
 - - diff                : Enable diff support
 - - doc                 : Adds extra documentation (API, Javadoc, etc). It is \
                           recommended to enable per package instead of globally
 - - emacs               : Adds support for GNU Emacs
 - - ldap                : Adds LDAP support (Lightweight Directory Access Protocol)
 - - minimal             : Install a very minimal build (disables, for example, \
                           plugins, fonts, most drivers, non-critical features)
 - - rrdtool             : Enable rrdtool support
 + + ruby_targets_ruby18 : Build with MRI Ruby 1.8.x
 - - shadow              : Enable shadow support
 - - sqlite3             : Adds support for sqlite3 - embedded sql database
 - - test                : Workaround to pull in packages needed to run with \
                           FEATURES=test. Portage-2.1.2 handles this internally, \
                           so don't set it in make.conf/package.use anymore
 - - vim-syntax          : Pulls in related vim syntax scripts
 - - xemacs              : Add support for XEmacs

# emerge app-admin/puppet

Next, edit /etc/puppet/puppet.conf and add the following to enable puppetmaster to bind on IPv6:

[master]
    bindaddress="::"

You can then start the puppet master service.

# run_init rc-service puppetmaster start

Configuring as CA

One puppet master needs to be configured as the certificate authority, responsible for handing out and managing the certificates of the various puppet clients.

# cat /etc/puppet/puppet.conf
[main]
  logdir=/var/log/puppet
  rundir=/var/run/puppet
  ssldir=$vardir/ssl
[master]
  bindaddress="::"
  

Configuring as (non-CA) Hub

The remainder of puppet masters need to be configured as a HUB; for these systems, disable CA functionality:

# cat /etc/puppet/puppet.conf
[main]
  logdir=/var/log/puppet
  rundir=/var/run/puppet
  ca_server=puppet-ca.internal.genfic.com
[master]
  bindaddress="::"
  ca=false

Make sure no ssl directory is present anymore (remove it if needed):

# rm -rf $(puppet master --configprint ssldir)

Next, request a certificate from the CA for this master. In the --dns_alt_names parameter, specify all possible hostnames (fully qualified and not) that agents might use to connect to this particular master.

# puppet agent --test --dns_alt_names \
    "puppetmaster1.internal.genfic.com,puppet,puppet.internal.genfic.com"

Then, on the CA server, sign the request:

# puppet cert list
# puppet cert sign <new master cert>

Finally, retrieve the signed certificate back on the HUB:

# puppet agent --test

Repeat these steps for every HUB you want to use. You can implement load balancing by using round-robin DNS records for the master hostname (such as puppet.internal.genfic.com).
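
As an illustration (addresses and hostnames are hypothetical), the zone for internal.genfic.com would then simply contain one address record per HUB under the same name:

; Round-robin records for the puppet master hostname
puppet    IN  AAAA  2001:db8:81:21::15:1   ; puppetmaster1
puppet    IN  AAAA  2001:db8:81:21::15:2   ; puppetmaster2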

Configuring repositories

As per our initial example, we will need to pull from the repositories. Assuming that the git repositories are available at git.internal.genfic.com, we could do the following:

# cd /etc/puppet/
# git clone git://git.internal.genfic.com/puppet/manifests.git
# git clone git://git.internal.genfic.com/puppet/modules.git
# git clone git://git.internal.genfic.com/puppet/patterns.git

Of course, you can use nested repositories as well. For instance, if you have several administration teams, then inside the manifests repository a nodes directory would be available that is ignored by git (through the .gitignore file). As a result, anything inside that directory is not managed by the manifests.git project itself. So what we do is pull in the repositories of the various teams there:

# cd /etc/puppet/manifests/nodes
# git clone git://git.internal.genfic.com/teams/team1.git
# git clone git://git.internal.genfic.com/teams/team2.git

With the environments, a similar setup is used, but once cloned, we check out a specific branch. For instance, for the development environment, this would be the development branch:

# cd /etc/puppet/environments
# git clone --branch development git://git.internal.genfic.com/puppet/patterns.git

Because we just pull in changes as they come along, the git repositories remain the master of the data. With proper policies in place, you can easily have a simple script invoked by cron to update the various repositories.
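
A minimal sketch of such a script (repository locations as cloned above; the schedule and script location are up to you) could be:

#!/bin/sh
# Pull in the latest configuration data on the puppet master
for REPO in /etc/puppet/manifests /etc/puppet/modules /etc/puppet/patterns \
            /etc/puppet/manifests/nodes/* /etc/puppet/environments/*; do
  [ -d "${REPO}/.git" ] || continue
  (cd "${REPO}" && git pull --quiet)
done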

Adding additional modules

Puppet supports downloading and installing additional puppet modules from the Puppet forge location. To do so, you can use puppet module install <name>. For instance, to install the inkling/postgresql module:

# puppet module install inkling/postgresql
Preparing to install into /etc/puppet/modules ...
Downloading from http://forge.puppetlabs.com ...
Installing -- do not interrupt ...
/etc/puppet/modules
└─┬ inkling-postgresql (v0.3.0)
  ├── puppetlabs-firewall (v0.0.4)
  └── puppetlabs-stdlib (v3.2.0)

If you set up environments, you can even have a particular version of a module installed inside a specific environment by specifying it through --environment <yourenv>.
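
For instance, to install the module inside a (hypothetical) testing environment:

# puppet module install inkling/postgresql --environment testing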

Setting up puppet clients

Installing puppet client

The puppet client, just like the master, is provided by the app-admin/puppet package. During the installation, portage will also install augeas, which is a tool that abstracts configuration syntax and allows simple, automated changes on configuration files. Once installed, you can start the puppet client service:

# run_init rc-service puppet start

When started, the puppet client will try to connect to the server with hostname puppet. If the puppet master is hosted on a server with a different hostname, edit the /etc/puppet/puppet.conf file and add a server= entry inside the [agent] section. As we are using a load-balanced setup, we need to set a dedicated location for the certificate handling of the clients - only one server can act as the certificate authority. We can point to this server using the ca_server parameter:

# cat /etc/puppet/puppet.conf
[main]
  logdir=/var/log/puppet
  rundir=/var/run/puppet
  ssldir=$vardir/ssl

[agent]
  classfile=$vardir/classes.txt
  localconfig=$vardir/localconfig
  listen=true
  ca_server=puppet-ca.internal.genfic.com

You probably noticed that we also added the listen=true directive. This allows the puppet master to connect to the puppet clients as well (by default, the puppet clients connect to the master themselves). This can be interesting if you want to push changes to particular systems without waiting for the standard refresh period.

Now tell the client to create a certificate and send the signing request to the puppet master:

# puppet agent --test

On the puppet master, the certificate request is now pending. You can see the list of pending certificate requests with puppet cert list. Sign the certificate if you know it is indeed a valid request.

# puppet cert list
  "pg_db1.internal.genfic.com" (23:A5:2F:99:65:60:12:32:00:CA:FE:7F:35:2F:E2:3A)
# puppet cert sign "pg_db1.internal.genfic.com"
notice: Signed certificate request for pg_db1.internal.genfic.com
notice: Removing file Puppet::SSL::CertificateRequest pg_db1.internal.genfic.com at '/var/lib/puppet/ssl/ca/requests/pg_db1.internal.genfic.com.pem'

Once the request is signed, you can retrieve the certificate using the puppet agent command again.

# puppet agent

Configuring access

The SELinux policy loaded does not, by default, allow puppet to manage each and every file on the system. If you want this, you need to enable the puppet_manage_all_files boolean.
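
Enabling the boolean is a single command; the -P option makes the change persistent across reboots:

# setsebool -P puppet_manage_all_files on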

Working with Puppet

Learning the facts

When you are on a puppet-managed system, you can run facter to get an overview of all the facts that it found on the system. For instance, to get information on addresses:

# facter | grep address
ipaddress => 192.168.100.152
ipaddress6 => 2001:db8:81:22:0:d8:e8fc:a2dc
macaddress => 36:5b:94:e1:eb:0e

Not using daemon

If you do not want to use the puppet (client) daemon, you can run puppet from cron easily. Just have puppet agent run at the frequency you need.
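
A minimal sketch of such a cron entry (file location and schedule are assumptions), running the agent twice per hour without daemonizing:

# /etc/cron.d/puppet
0,30 * * * *  root  /usr/bin/puppet agent --onetime --no-daemonize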

Not using a puppet master

You can even use Puppet without a puppet master. In that case, the local system will need access to the configuration repository (which can be a read-only NFS mount or a local checkout of a repository).

In such a situation, you run puppet apply:

# puppet apply --modulepath /path/to/modules /path/to/manifests/site.pp

If you want more information about the changes that are made, you can ask puppet to log to a file (using -l logfile) or to be more verbose on screen (using --verbose).

Requesting an immediate check and update

You can ask the puppet (client) daemon to immediately check with the puppet master by sending a SIGUSR1 signal to the daemon (restarting the daemon also works).

# pkill -SIGUSR1 puppetd

If your puppet daemons are running with the "listen=true" setting, then you can also have the puppet master connect to the daemons and trigger an immediate check using the "kick" feature:

# puppet kick pg_db1.internal.genfic.com

Logging

Puppet logs its activities by default through the system logger.

# tail -f /var/log/syslog
puppet-master[4946]: Compiled catalog for puppet.internal.genfic.com in \
  environment production in 0.10 seconds
puppet-agent[6195]: (/Stage[main]/Portage/File[make.conf]/content) content \
  changed '{md5}be84feffe82bc2a37ffc721d892ef06a' to '{md5}5050fba3458f8eb120562db10834e0f1'
puppet-agent[6195]: Finished catalog run in 0.34 seconds

The power of Puppet's definitions

As I said before, Puppet's power comes from the ability to describe the state of a system and let Puppet decide how to reach that state. In this section, I give you an overview of how that is achieved. Note however that this is far from a crash course on Puppet - there are good resources for that online, and Puppet truly warrants an entire book on its own.

The site.pp file

The first file is puppet's manifests/site.pp file. This file is read by Puppet and thus acts as the starting point for all definitions. A basic definition for our architecture would be to include the patterns and the nodes:

import "patterns/*.pp"
import "nodes/*.pp"

A pattern file

Note

Patterns are specific to our Puppet implementation; the terminology is not used in Puppet by itself.

In a pattern file, we declare the settings as if it were a sort of "template" node for a particular service. You could have patterns for regular systems, for databases, for web application servers, for hardened/stripped servers, etc. The idea is that these patterns are then applied to the particular nodes, allowing simple, default installations (for instance image-driven ones) to be quickly transformed into proper deployments.

The pattern described below is just an example of a "general" state. You can build on top of this one, create other patterns that include this one, etc. Really, patterns apply wherever you think reuse is good and the setup is somewhat complex (if it were a single thing, it would be a module instead).

Such a pattern file starts from the following, simple layout (for instance manifests/patterns/general.pp):

class general {
  # Include your content here
}

Let's first include some modules (basic building blocks) - something we'll cover later:

# Manage our /etc/hosts file
include hosts
# setup portage settings
include myportage

Next, we want to make sure that an eix cache (database) is available on the system. If not, then Puppet might fail to install software itself as it relies on the eix database:

exec { "create_eix_cache":
  command => "/usr/bin/eix-update",
  creates => "/var/cache/eix/portage.eix",
}

In the above example, Puppet looks at the creates parameter, sees that the command would create the /var/cache/eix/portage.eix file, and then checks whether that file already exists on the system. If it doesn't, the command referenced by the command parameter is executed.

Let's also say that SSH must be installed (on Gentoo it is by default, but you never know when some wiseguy deleted it) and that the service must be running. This could be declared as follows:

package { ssh:
  ensure => installed,
  require => Exec['create_eix_cache'],
}

service { sshd:
  ensure => running,
  require => [
    File['/etc/ssh/sshd_config'],
    Package[ssh]
  ],
}

Here, Puppet learns that before it can ensure the sshd service is running, the /etc/ssh/sshd_config file must be available (this does mean we need to declare something that provides this file, as shown below) and the ssh package must be installed. And for the package to be installed, Puppet knows it first needs to check whether our earlier eix cache creation command has run.
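
A minimal sketch of such a declaration (assuming an ssh module that ships the configuration file) could be:

file { '/etc/ssh/sshd_config':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  # Assumes the file is provided by an "ssh" module on the master
  source  => 'puppet:///modules/ssh/sshd_config',
  require => Package[ssh],
  notify  => Service[sshd],
}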

The possibilities are almost endless. You can handle mount information too, like so:

mount { "/usr/portage/packages":
  ensure => mounted,
  device => "nfs_server.internal.genfic.com:gentoo/packages",
  atboot => true,
  fstype => "nfs4",
  options => "soft,timeo=30",
}

With this setting, Puppet will update the /etc/fstab file and then invoke the mount command.

A node file

Once a pattern is made, we can create a node (which is the term used for a target machine that is managed by Puppet).

In a simple form, this could be:

node 'pg1_db.internal.genfic.com' {
  include p_database
}

Here, we just declared that the node with hostname pg1_db.internal.genfic.com uses the pattern definition in p_database.pp. Of course, we could extend this further by including more information. The real power comes when the pattern uses certain settings (through variables) which are then set at the node level, as sketched below.
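
As a small sketch (the parameter name is hypothetical), the p_database pattern could expose a variable that each node sets to its own value:

# In patterns/p_database.pp
class p_database ($listen_address = 'localhost') {
  # ... use $listen_address when instantiating the postgresql module ...
}

# In the node definition
node 'pg1_db.internal.genfic.com' {
  class { 'p_database':
    listen_address => '2001:db8:81:22::db:1',
  }
}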

A module file

Modules are the building blocks of a Puppet-managed environment. They provide management features for a single, specific technology.

Say you need to manage a postgresql database; you can then search for (and use) a Puppet module for postgresql.

# puppet module search postgresql
Searching http://forge.puppetlabs.com ...
NAME                   DESCRIPTION                            
camptocamp-pgconf      A defined type to manage entries in Postgresql's configuration file
inkling-postgresql     NOTE: Transfer of Ownership
akumria-postgresql     Manage and install Postgresql databases and users
KrisBuytaert-postgres  Puppet Postgres module
puppetlabs-postgresql  Transferred from Inkling

# puppet module install puppetlabs-postgresql
Preparing to install into /etc/puppet/modules ...
Downloading from http://forge.puppetlabs.com ...
Installing -- do not interrupt ...
/etc/puppet/modules
└─┬ puppetlabs-postgresql (v1.0.0)
  ├── puppetlabs-firewall (v0.0.4)
  └── puppetlabs-stdlib (v2.6.0)

You can also manage your own, internal forge site from which modules can be downloaded.

When you have a module available, you can start using it in the same way as we did before. On the http://forge.puppetlabs.com site, you will also find usage examples for each module.
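
As an illustration (the exact class names and parameters are described on the module's forge page), setting up a database server with it could be as simple as:

node 'pg1_db.internal.genfic.com' {
  class { 'postgresql::server': }
}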

Resources

A humble list of excellent online resources.

For git and gitolite:

For puppet: