Ansible: docker client

This is my first ansible playbook, used to provision some linux boxes to run as docker platforms. It speeds up deployment and configuration management considerably, especially since there are 5 boxes at separate sites to manage.

--- #post install configuration for docker use
- hosts: localserver
  remote_user: root 
  become: yes
  become_method: su
  gather_facts: no
  connection: ssh
  tasks:

  - name: 'selinux permissive'
    lineinfile:
      dest: /etc/selinux/config
      regexp: '^SELINUX='
      line: 'SELINUX=permissive'

  - name: 'add docker ce repo'
    get_url:
      url: https://download.docker.com/linux/centos/docker-ce.repo
      dest: /etc/yum.repos.d/docker-ce.repo

  - name: 'update package list'
    yum: 
      update_cache: yes 
      name: '*' 
      state: latest

  - name: 'add packages'
    yum: 
      name: 
       - epel-release
       - yum-utils
       - device-mapper-persistent-data
       - docker-ce
       - python-pip
      state: latest

  - name: 'install docker-compose'
    pip:
      name: docker-compose
      state: latest

  - name: 'add centos to docker group'
    user:
      name: centos
      groups: docker
      append: yes

  - name: 'add daemon.json'
    copy:
      src: /mnt/c/Users/soops/playbooks/dockerServer/daemon.json
      dest: /etc/docker/daemon.json
      owner: root
      group: root
      mode: 0644

  - name: 'enable and restart docker'
    systemd: 
      name: docker
      enabled: yes
      state: restarted
      daemon_reload: yes

  - name: 'stop postfix'
    systemd:
      name: postfix
      enabled: no
      state: stopped

  - name: 'start portainer container'
    docker_container:
      name: portainer
      state: started
      restart_policy: always
      ports:
        - "9000:9000"
      docker_host: unix://var/run/docker.sock
      image: portainer/portainer
      command: --no-auth

  - name: 'website test'
    docker_container:
      name: testWebserver
      state: started
      ports:
        - "80:80"
      docker_host: unix://var/run/docker.sock
      image: httpd
      volumes:
        - /home/centos/html/:/usr/local/apache2/htdocs/
      

I learned a couple of things while hacking this together, once a friend who uses ansible professionally had reviewed it:

  • There’s a module for that. Avoid shell commands like the plague, and look up your command’s corresponding module. Shell commands rely on sequence; ansible is a declarative tool, so each task should be a test of state that stands alone, regardless of where it sits in the playbook.
  • Use -C (check mode) to test each task without changing anything on the managed hosts (see the example after this list).
  • Use --syntax-check to validate your playbook without executing it.
  • Install ansible with pip. I used the apt-get method on my Ubuntu for Windows and could only get 2.0.0.2. I kept getting syntax errors for statements that I knew to be correct, and after hours of staring at the screen I found out that it meant the module wasn’t supported in the ansible version being used. Installing ansible through pip gave me the current 2.7.6 version.
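
For reference, this is roughly how I run those two checks; the playbook and inventory file names (dockerServer.yml and hosts) are just placeholders for whatever you actually use:

# validate playbook syntax without contacting any hosts
ansible-playbook --syntax-check dockerServer.yml

# check mode: report what would change without changing anything on the hosts
ansible-playbook -C -i hosts dockerServer.yml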

I need to work on roles, triggers and variables, but this has vastly simplified my server deployment and configuration problem, and given me an added tool to deploy containers that I can try with Jenkins.

Running dhcpd as a container on Mac OS X: the problems

Well, this has been challenging.

My goal was to implement dhcpd on each school site’s time machine server to provide failover and reduce latency. I am using Mac Minis running OS X as the server platform, since we invested in them as Time Machine backups for critical staff laptops, and they also manage printer and copier queues for the print accounting service. I hit a couple of roadblocks that are forcing me to seriously consider investing a couple of grand in 4 linux boxes and just moving the whole container concept back to linux. Here are a couple of the issues I’ve been working through:

  • Docker for Mac is not for production. I am finding out that the app is designed for a single user, doesn’t respond to tinkering with the storage driver or other options, and I can’t run a swarm using Macs or Windows machines as nodes, so no docker services or that very cool mesh routing.
  • Running Docker for Mac as a production service. The app requires an interactive window session to run, so each time machine has to be set to automatically log in.
  • Freeing up ports 67 & 68 is difficult. I moved the network interface to a static IP, but some legacy services from the Server app kept taking over the ports I need to run dhcpd from a container. I had to remove the DHCP scope reference from the DHCP service, even though the service was turned off, and I had to disable NetBoot. This worked fine on OS X 10.10 – 10.13, but 10.14 will not release control of those ports no matter what I do, even after removing the Server app entirely (see the port check after this list). May have to take a hammer to it…
  • The container may not be handling all the traffic. I am testing a setup where the site’s router has two ip helper-address directives for each vlan, forwarding DHCP broadcast traffic to both the time machine and a central dhcpd server at a remote site, which is running on a trusty snowflake centos 6 installation, a treasured pet for many years. In reviewing the logs, the centos service appears to be handling nearly all of the DHCP traffic, while the container is only handling a fraction. I haven’t ruled out the time machine entirely, as I need to confirm how the switches and wifi controllers are handling the ip helper-address directives, but it is possible that the high volume of traffic (each site has over 500 devices across 5-7 subnets) may be too much for either the time machine’s interface or the container’s bridge network interface.
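
To see what is actually holding on to the DHCP ports on a given Mac, a quick check (lsof ships with macOS) looks something like this:

# list any processes bound to the BOOTP/DHCP ports
sudo lsof -nP -iUDP:67 -iUDP:68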

So, it has not been as easy as I had thought to implement Docker with Macs as nodes. However, I think I can use this proof of concept to perhaps fund some small linux boxes and build a proper cluster. We’ll see what happens…

Docker Certified Associate Exam – Passed!

After taking the prep course on Linux Academy and practicing on their cloud servers and my trusty raspberry pi cluster, I paid my $200 and took the test. It was more challenging than I thought, and while I passed, I would recommend reviewing the Docker website documentation and doing some additional reading. I just bought Docker Deep Dive so I can’t yet recommend it, but it looks like a useful reference.

So, on to my Jenkins certification!

Building a CI/CD Pipeline to solve a network control issue

As I was working on my automated disaster recovery processes, I realized I had two main problems to solve, one in the cloud and one on premise. The on-premise issue is one of a lack of failover and redundancy when it comes to network control services, specifically dhcpd and radiusd.

The networks of four separate physical sites depend on the connectivity of a single server, which in turn relies on a cloud-hosted ldap instance, so if the vpn or internet gateway goes offline, no authenticated access is available.

This setup made sense at the time because the critical systems could be managed from a central location by one person on a single machine, rather than investing in hardware that would be distributed and managed by personnel at each site. I had me a special pet server, and with webmin we manually configured and lovingly administered that little snowflake into one big ol’ single point of failure.

Time to butcher this pet and start farming cattle…

One of the school sites is moving to a new building, so we have an opportunity to rethink how the network is designed as well as integrate more updated technology paradigms such as infrastructure as code, containerization and continuous integration / continuous delivery. By adding Docker to each of the timemachine servers, we would be able to automate deployment of changes to the services (such as new dhcp static ip reservations), as well as add a bit of failover by having ip-helper directives at each router point not only to the local control server, but also to a failover server.

First phase of containerization: move dhcpd and radiusd to containers, then decommission the network controller and the cold standby server (not shown).

The next step will be to integrate a CI server (a Jenkins container with a persistent store) and build pipelines for each package. Changes to configuration will be written to the organization’s CodeCommit repository, and build triggers set up to automatically build and deploy the changes to the local timemachine containers. The container has been deployed, and each of the timemachines is set up as a jenkins build agent. Time to build that pipeline…

Adding the time machines as jenkins build slaves was very easy once I had pushed the ssh key to each of the clients and added the key to my Jenkins credentials. Under the Jenkins / Manage Jenkins / Manage Nodes / New Node menu I added the time machines as build slave nodes and launched the agent. I added a distinct label for each node and set the usage to “Only build jobs with label expressions matching this node.” One little wrinkle I did not expect was the need to copy the .ssh/known_hosts file to the jenkins_home directory in order to use the Host Key Verification Strategy (roughly as sketched below).
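
A rough sketch of that prep work; the node hostname is a placeholder, and the known_hosts copy goes into the bind-mounted jenkins_home so the jenkins user inside the container can read it:

# push the ssh key to a build node (hostname is a placeholder)
ssh-copy-id jenkins@timemachine-site1.local

# seed known_hosts so the Host Key Verification Strategy can verify the node
mkdir -p /var/jenkins_home/.ssh
cp ~/.ssh/known_hosts /var/jenkins_home/.ssh/known_hosts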

As a test pipeline, I will build a freestyle project for the radiusd image that will execute the following command line steps (a rough shell sketch follows the list):

  • pull the current radiusd config from the CodeCommit repository
  • build the radiusd image from a Dockerfile (also in repo)
  • push the image to the CodeCommit images repository
  • stop and delete the current container
  • start a container from the current image
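
A sketch of what those shell build steps might look like; the repository URL, image name and registry are hypothetical placeholders:

# pull the current radiusd config and Dockerfile (repo URL is a placeholder)
git clone https://git-codecommit.us-west-2.amazonaws.com/v1/repos/radiusd-config
cd radiusd-config

# build and push the image (image name/registry are placeholders)
docker build -t soops/radiusd:latest .
docker push soops/radiusd:latest

# replace the running container with one built from the fresh image
docker rm -f radiusd || true
docker run -d --name radiusd -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp soops/radiusd:latest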

This is a bit rough and ready, as there is no testing structure yet. Once I have the steps built, I’ll see about building a Jenkinsfile that incorporates a test build stage. I haven’t gotten to that part of the class yet 🙂

The next set of challenges will be to containerize the pcounter service, which I haven’t touched in about two years, to set up ldap replication between locally hosted containers and the main ldap server in the cloud, and to figure out how to automatically change the dns address in case of instance, tunnel or region failure.

Running radiusd as a container on Mac OS X: the problems

So far, the radiusd experiment is working, but as I try to expand the feature set I am hitting some roadblocks. At this point, I have 4 working instances of radiusd running and responding to radtest commands, but deploying everything is taking a lot of command line intervention.

The process I have so far is as follows:

  • copy archive of working raddb directory to each server running docker
  • decompress archive to same place
  • run the radiusd container in interactive mode, otherwise it shuts down and I can’t get the subsequent commands to work correctly:
docker run -it --name radiusd -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp  soops/radiusd bash
  • From a new terminal, copy the raddb directory contents to the container
docker cp Desktop/etc/raddb/. radiusd:/etc/raddb
  • Fix the ownership issue
docker exec radiusd chown -R radiusd:radiusd /etc/raddb
  • Start the service
docker exec radiusd service radiusd start
  • Tail the log to see if the requests are coming through
docker exec radiusd tail -f /var/log/radius/radius.log

Definitely the clumsy way to do it. Also, I have a weird situation where one instance logs just fine, while the others show nothing in the log beyond the initial 3 lines of radiusd saying it is ready to go. I need to get that figured out, since logging is kinda importante.

I tried to use the awslogs log driver so that Docker would send my logs to CloudWatch, but I can’t get the credentials to be recognized.

docker run -d --name radiusd -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp --log-driver=awslogs --log-opt awslogs-region=us-west-2 --log-opt awslogs-group=bccsDocker --log-opt awslogs-create-group=true soops/radiusd

Everything looks like it will work as I get a container ID, and then I get:

docker: Error response from daemon: failed to initialize logging driver: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors.

So far, it looks like an issue with Docker for Mac not exporting the aws credentials effectively. Every time I try to add directives to the daemon.json file via the app Preferences / Daemon window, it breaks the whole install and I have to reset and rebuild my images. So, this one is going to take some experimenting.

The other problem to solve is how to automate the deployment of all of these steps. At this point, there are a couple of ways to handle it:

  • A script that pulls the raddb directory from our private repository, copies it to each server, deletes the old container, rebuilds the new one, copies the directory, fixes the perms and starts the service. Workable but highly inefficient.
  • Try to get all of these Mac nodes to work together as a kubernetes cluster or a swarm, and run radiusd as a service. Better, but it adds a layer of complexity I am not ready to tackle just yet, although I’m taking a docker certification course, so it might be worthwhile to try the swarm.
  • Get Jenkins to handle the automation. This looks to be the best bet, so I’ll attack this next.

My First radiusd Container

This service is my first to containerize. At present, my school organization’s wifi authenticates via radius, and I host an instance of freeradius on our network control server, along with dhcp. This single point of failure is not a good thing, so radiusd is a good candidate to be converted to a container service that I can run on multiple nodes, perhaps a kubernetes cluster.

The first step was to build my docker image and get it running so it would take authentication requests over ports 1812 & 1813. I have a custom image from my Dockerfile, so I can skip all the install info, which is really just this:


yum install -y freeradius freeradius-utils

Also, my work installation is centos6, and a previous attempt to run radiusd on centos7 gave me too much trouble to work out the changes to the configs right now. So, my custom image is based on centos 6.

docker run -it --name myRadiusd -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp soops/radiusd bash

Next, I backed up the working /etc/raddb folder from my control server and copied it to my local system, then spun up an instance of my radiusd image on my Windows system. At first I tried to mount the local raddb folder as a shared volume, but it mapped as world-writable and the radiusd service wouldn’t start. I came upon the docker cp command and it did the trick.

docker cp raddb/. myRadiusd:/etc/raddb

This worked swimmingly 🙂 The service started without a hitch, and I had a working copy of my radiusd service from work. Now to test!

As mentioned above, the radiusd image is running on my Windows system, so I will test using radtest from a container in my pi cluster. I had to install some utilities:

yum install freeradius-utils hostname -y

The hostname package is there so the radtest script’s call to hostname would work.

One more thing to get ready: I added an entry for my test machine to the clients.conf file, taking the IP from the ignored-request entry in /var/log/radius/radius.log.

client radtest {
        ipaddr  = 172.17.0.1
        secret  = test
        nastype = other
}

I also added a test user to the /etc/raddb/users file with a super duper secure password…

test            Cleartext-Password := "securePassword"

From the pi-cluster container, I ran radtest:

radtest test securePassword 192.168.0.18 1812 test

I got back a response which, while it needs further troubleshooting, shows that the instance is responding.

[root@container-id /]# radtest -t chap test securePassword 192.168.0.18 1812 test
Sent Access-Request Id 216 from 0.0.0.0:45303 to 192.168.0.18:1812 length 75
User-Name = "test"
CHAP-Password = 0x0cd1072b900009f73e66b3e10600b13370
NAS-IP-Address = 172.17.0.3
NAS-Port = 1812
Message-Authenticator = 0x00
Cleartext-Password = "securePassword"
Received Access-Reject Id 216 from 192.168.0.18:1812 to 0.0.0.0:0 length 20
(0) -: Expected Access-Accept got Access-Reject

So, I just have to troubleshoot the reject message, but I suspect that has to do with the config being transplanted into my local network. The important part is that the radiusd instance is responding. Now I need to set it up as an alternate authentication option at work and give it a try…

My Nagios Installation

This was my second attempt at building a custom docker image with a Dockerfile. The first was to create a customized centos image that had the basic tools I regularly use, rather than just the standard minimal install you get with the centos:6 image.

FROM centos:6
MAINTAINER soops@ucla.edu
RUN yum update -y

#basic tools to make using docker livable
RUN yum install -y man wget git less netutils net-tools openssh-server openssh-clients initscripts sudo chkconfig tar

#replace systemctl to work around dbus issue
RUN git clone https://github.com/gdraheim/docker-systemctl-images.git
RUN cp docker-systemctl-images/files/docker/systemctl.py /bin/systemctl

This gave me a familiar environment rather than the bare minimal image, which is a personal preference. The systemctl replacement also lets me get around the dbus error that happens when I try to start services. I need to understand this part a bit better, but that's another problem for another time.

I took the nagios-core installation instructions and put them into another Dockerfile:

FROM soops/centosbase
MAINTAINER soops@ucla.edu
# from: https://support.nagios.com/kb/article/nagios-core-installing-nagios-core-from-source-96.html#CentOS
WORKDIR /root
RUN yum install -y gcc make unzip glibc glibc-common httpd php gd gd-devel perl postfix
RUN wget -O /root/nagioscore.tar.gz https://github.com/NagiosEnterprises/nagioscore/archive/nagios-4.4.1.tar.gz
RUN ls -al
RUN tar -zxvf /root/nagioscore.tar.gz
WORKDIR /root/nagioscore-nagios-4.4.1
RUN ./configure
RUN make all
RUN make install-groups-users
RUN usermod -a -G nagios apache
RUN make install
RUN make install-daemoninit
RUN chkconfig --level 2345 httpd on
RUN make install-commandmode
RUN make install-config
RUN make install-webconf
# adds nagiosadmin:password
RUN echo "nagiosadmin:\$apr1\$k.dy5xrT\$QUbNRLhX01U4tTvt6r8hd1" > /usr/local/nagios/etc/htpasswd.users
# these are not working; add service start commands to the docker run command?
CMD systemctl restart httpd &
CMD systemctl restart nagios &
# plugin installation
WORKDIR /tmp
RUN yum install -y gcc glibc glibc-common make gettext automake autoconf wget openssl-devel net-snmp net-snmp-utils epel-release
RUN yum install -y perl-Net-SNMP
# adding dev tools because of a "no gnu make" error
RUN yum groupinstall -y "Development Tools"
RUN yum install -y which
RUN wget --no-check-certificate -O nagios-plugins.tar.gz https://github.com/nagios-plugins/nagios-plugins/archive/release-2.2.1.tar.gz
RUN tar zxf nagios-plugins.tar.gz
WORKDIR /tmp/nagios-plugins-release-2.2.1
RUN ./tools/setup
RUN ./configure
RUN make
RUN make install
RUN systemctl start httpd &
RUN systemctl start nagios &
EXPOSE 80

The service httpd start and systemctl start httpd commands don’t work at build time, so I think I’ll add the start directives at the end of the docker run command and see if that works better.
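
A sketch of that idea, assuming the image is tagged soops/nagios, that service start behaves at run time the way it does interactively, and that the default Nagios Core log lives at /usr/local/nagios/var/nagios.log (tailing it just keeps the container in the foreground):

docker run -d --name nagios -p 8080:80 soops/nagios \
  /bin/bash -c "service httpd start && service nagios start && tail -f /usr/local/nagios/var/nagios.log"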

It was easier to set a default password for nagiosadmin than to figure out how to run the password command non-interactively; I will have to come back to that one.
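
For the record, htpasswd (from the httpd-tools package, if it isn’t already pulled in by httpd) has a batch flag that should make this non-interactive; the password here is obviously a placeholder:

# -c creates the file, -b takes the password on the command line instead of prompting
htpasswd -bc /usr/local/nagios/etc/htpasswd.users nagiosadmin 'password'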

I created a local folder and copied the /usr/local/nagios/etc/ contents into it. Now I can run my container and change the nagios configs without having to rebuild the image. At present, I still have to run it interactively so I can restart the httpd and nagios services when I make changes, but I'm getting closer to a complete solution.
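
The run command looks something like this, with the host-side path being whatever local folder holds the copied configs (hypothetical here):

docker run -it --name nagios -p 8080:80 \
  -v /home/soops/nagios-etc:/usr/local/nagios/etc soops/nagios bash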

I also tried this image out on my Raspberry Pi cluster, which taught me a valuable lesson: images need to be built on the architecture on which they will run. You can’t just pull an image built on an x86 linux or Windows system and expect it to run on a Pi. The great thing about having all the Dockerfiles in github was that it was easy to pull the repo down to the Pi, build new images with an rpi- prefix, and then run them on the cluster. Very cool.

My hard working pi-kubernetes cluster, which I have attached to a wifi enabled powerstrip. So, all I have to do is say “Echo, turn on cluster” and all the magic comes to life!

Still have to figure out how to pull the nagios.cfg and the .htaccess file from my local system to the container (COPY directive), but that should be pretty easy.

Next: My Terraform Scripts

My Jenkins Installation

The goal here is to set up a continuous integration server to control the build and deployment of the disaster recovery steps. If possible, I’d like to get a failure notice from a nagios container, trigger a build in a jenkins container, use a terraform container to trigger the deployment, and have a rollback method that will back up the DR region resources, tear them down and update the operational region’s resources. To do that, I need a Jenkins CI control server.

Should be pretty straightforward, right? There are a couple of steps here that were fun to discover:

  1. Sometimes, the latest jenkins image doesn’t work with some plugins.
    1. First, you can’t just pull jenkins, you have to pull jenkins/jenkins. This was my intro to the concept of docker repos, and you have to be careful to get the correct official package and not pull someone’s customized version.
    2. Not sure why, but when I first used the jenkins/jenkins:latest version, a lot of the plugins were broken. I had to use the :lts version, which worked.  Now the :latest version is working, but it was a good lesson on pulling different tagged versions to find the one that works best.
  2. Using a shared volume for installation file storage can be tricky where a Norton “smart” firewall is concerned. Even though I added the virtualized network to the firewall rules, I still have to stop my firewall for a few minutes when I start a container that uses a shared volume. Once the connection is established, the volume is accessible, but upon restart (which a Windows system loves to do every now and again, as a surprise), it won’t catch without intervention. I hope to figure that out one day, it would be great to have a container auto restart without my having to hold my finger on the firewall.
  3. I created a folder for the jenkins data and started the container with the following command: 
docker run -it --name myJenkins -p 8080:8080 -p 5000:5000 -v c:/Users/soops/jenkins_home:/var/jenkins_home jenkins/jenkins

This command created a container with a custom name, attached ports from the local machine to the container’s exposed ports, and bound the container’s /var/jenkins_home folder to the local file system’s folder. Once I solve the firewall problem I can recreate the container with --restart always so it will automatically start when the Docker Desktop app is started. The shared folder saves the configuration and local working folders so I don’t have to reinitialize Jenkins each time I start it.
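
That recreate step would presumably be the same run command with the restart policy added (and -d instead of -it, since it no longer needs a terminal):

docker rm myJenkins
docker run -d --restart always --name myJenkins -p 8080:8080 -p 5000:5000 \
  -v c:/Users/soops/jenkins_home:/var/jenkins_home jenkins/jenkins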

Later I’ll cover adding the necessary plugins and pipelines for my projects.

Next: My Nagios Installation

My Docker Installation

I’ve been a systems admin for more than a decade, but I completely missed the advent of containerization. My old organization could not afford the infrastructure for a virtual environment, so I bought cheap servers and cluttered the data center with them, trying to avoid a single point of failure, patching and loving each of my little snowflakes on a constant and time-consuming basis. When I learned about AWS, cloud virtualization was the perfect solution to our budget constraints, as servers were no longer a fixed cost and an inventory asset. However, learning about Docker has taught me to rethink how I deploy services.

My goal here is to use docker as a platform for specific services, rather than install software and configure individual software packages. In addition, the images and dockerfiles can be pushed to repositories and pulled to other docker installations, so the work I do here can be re-purposed to my work scenario.

My journey begins with the install on my local Windows system. I chose to use the Docker for Windows (also called Docker Desktop) application, which lets me run docker commands from the Windows command line. I also have the ubuntu subsystem on my Windows machine, since I do most of my work at the linux command line, but I chose not to use the ubuntu subsystem for Docker (although I do appreciate how Microsoft is supporting linux without having to run a virtualized environment). I’m not sure this was the best decision, but I’d rather not rebuild my entire machine right now 🙂

The biggest issue I have with this setup is the fact that I have yet to successfully allow Docker to connect to my local shared drives without shutting down my Norton “smart” firewall. Either I or the firewall is not all that smart, since I’ve followed the docs and added a rule for Docker, but I always have to turn off the firewall when I start a container that has a shared drive, such as my jenkins instance. It’s rather annoying.

I also have a Docker Desktop set up on my Mac, which doesn’t have the same firewall problems, but I rarely use it.

I set up a github account and a docker.com account so I can publish my dockerfiles and push my images, and I also installed git, this time as part of the ubuntu subsystem. I don’t want to spend time navigating a GUI for git, since I really only use about seven git commands on a regular basis (a typical round trip is sketched after the list).

  • git clone
  • git pull
  • git add *
  • git status
  • git commit -m “message goes here”
  • git push
  • git reset --hard HEAD
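
A typical round trip with those commands, with the repo name as a placeholder:

# grab the repo (URL is a placeholder), make changes, then push them back
git clone https://github.com/soops/dockerfiles.git
cd dockerfiles
git status
git add *
git commit -m "update nagios dockerfile"
git push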

 I’ve yet to figure out how to make branches work for me, since I am a dev team of one. Problem for another day.

So, I have docker at the command line, a directory for my scripts, and a repo for my images. Ready to rumble!

Here are some docker basics:

  • docker ps -a    –    gives a list of available containers and their status
  • docker rm <name>    –    deletes a docker container. The <name> can be its label or its container id. The automatically generated names can be a bit hilarious.
  • docker images    –    gives a list of pulled images.
  • docker rmi <name>   –    deletes a docker image.
  • docker system prune    –    deletes stopped containers, dangling images, unused networks and build cache.
  • docker run -it <repo/image>    –    start a container from an image.
  • docker run -it --name <myName> -p <local port>:<container port>/<protocol> <repo/image> <shell or command>    –    the general form I use most (example below).
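
As a concrete (made-up) example of that last form:

# run an interactive centos 6 container named sandbox, publishing host port 8080 to container port 80
docker run -it --name sandbox -p 8080:80/tcp centos:6 bash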

Next:  My Jenkins Installation