My Docker pi swarm

This was a lot of fun to figure out, and served as a test case for the dhcpd container. At the end of it I have a 7-node Raspberry Pi cluster that is accessed and controlled through one node over wifi, while the other nodes get their IPs from the dhcpd container.

Step 1: get the pi control node to run a static IP on wifi

My home wifi router is provided by the cable company, and it doesn’t have the ability to reserve IP addresses for specific MAC addresses, so my pi node IPs keep changing. Also, I found you can’t run a dhcpd service in a container on a host that runs dhcpcd as a client, since ports 67 and 68 would already be in use (kind of a duh moment, but anyways).

Researching the concept of static IP over wifi on a Pi was surprisingly challenging, but I found a decent link that explained it best:
https://raspberrypi.stackexchange.com/questions/37920/how-do-i-set-up-networking-wifi-static-ip-address/74428#74428

In summary, I needed to:

  • edit my /etc/network/interfaces file to set up the static IPs I would be using for each network segment: the wifi and the eth0 segment for the cluster nodes.
source-directory /etc/network/interfaces.d

auto lo
iface lo inet loopback

iface eth0 inet manual

allow-hotplug wlan0
iface wlan0 inet static
    address 192.168.0.38
    netmask 255.255.255.0
    gateway 192.168.0.1
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
  • disable the dhcpcd client, enable the classic networking service, and reboot.
sudo systemctl disable dhcpcd
sudo systemctl enable networking
sudo reboot
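
After the reboot, a quick sanity check with standard iproute2 commands confirms the static wlan0 address and the route out through the home router (192.168.0.1 is the gateway from the interfaces file above):

ip addr show wlan0
ip route show default
ping -c 3 192.168.0.1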

Step 2: get the dhcpd container working

First, I needed to set a static IP on my eth0 interface to serve the node cluster network. I chose 192.168.100.0/24 for my internal cluster network, and set 192.168.100.1 as the static IP for eth0. The simplest method is to add an ip= option to the end of the line in the /boot/cmdline.txt file. Here is mine as an example, which has a few extra options to support kubernetes on the pi. Just add ip=<your choice> to the end.

dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=PARTUUID=f6795000-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory ip=192.168.100.1

Once I rebooted the pi, the node was available over wifi and had a static IP on the eth0 interface.

Unfortunately, I was not able to get my custom centos container to work effectively on the pi cluster, so I ended up using a container image that another person built: 
https://hub.docker.com/r/prehley/rpi-dhcpd

This image worked out swimmingly. I created a dhcp folder on my pi control node, and inside it I wrote a simple dhcpd.conf with static IP reservations for my pi nodes:

#pi zone
subnet 192.168.100.0 netmask 255.255.255.0 {
    allow bootp;
    authoritative;
    option routers 192.168.100.1;
    option domain-name "pizpone";
    option domain-name-servers 1.1.1.1, 8.8.8.8;
    default-lease-time 3600;
    max-lease-time 21600;
    range dynamic-bootp 192.168.100.100 192.168.100.110;

    host p1 {
        hardware ethernet b8:27:eb:c8:cd:01;
        fixed-address 192.168.100.91;
        option domain-name-servers 1.1.1.1;
    }

    host p2 {
        hardware ethernet b8:27:eb:c8:cd:02;
        fixed-address 192.168.100.92;
        option domain-name-servers 1.1.1.1;
    }

    ... <truncated>
}

Of course, the above example is truncated to show the pattern. I’ve got six host nodes that will be served by this scope.

I’ll run this container as a local docker container rather than a docker service.

docker run -d --name dhcpd --net=host --restart always -v "$(pwd)/dhcp":/data prehley/rpi-dhcpd
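
To confirm leases are actually being handed out, the standard docker commands are enough, assuming the image sends dhcpd output to stdout; after editing dhcp/dhcpd.conf, a container restart picks up the change:

docker logs -f dhcpd       # watch for DHCPDISCOVER / DHCPOFFER lines as nodes boot
docker restart dhcpd       # reload after editing dhcp/dhcpd.conf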

This creates a container that mounts a volume for the config data and restarts if the node is restarted. Once the subordinate nodes are restarted (I have them all powered by a wifi-enabled surge protector, so it’s ‘Alexa, turn off cluster’ and ‘Alexa, turn on cluster’), everything comes up automagically! I can ssh into each node via the control node, and I disable the subordinate nodes’ wifi by renaming /etc/wpa_supplicant/wpa_supplicant.conf, which breaks their wifi config. There is likely a better way to do that.

Finally, I added /etc/hosts entries for the nodes on the control node. The nodes were set up for docker beforehand, by installing the docker package and adding the pi user to the docker group.
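
For reference, the hosts entries match the fixed addresses from the dhcpd.conf above (p3 through p6 follow the same pattern); appending them with a heredoc is just one way to do it:

sudo tee -a /etc/hosts <<EOF
192.168.100.91 p1
192.168.100.92 p2
EOF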

Step 3: Set the pi master node to masquerade traffic to the cluster network

I found the relevant steps in this article, but I skipped the dnsmasq part:
https://pimylifeup.com/raspberry-pi-wifi-bridge/

Essentially, I skipped down to step 11 to enable IP forwarding, by editing the /etc/sysctl.conf file and uncommenting net.ipv4.ip_forward=1

Next, I ran the following commands:

sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT
sudo sh -c "iptables-save > /etc/iptables.ipv4.nat"

Finally, I edited the /etc/rc.local file and added this line above the exit 0 line, to re-enable ip forwarding after reboot:

iptables-restore < /etc/iptables.ipv4.nat
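
A couple of quick checks after a reboot confirm that forwarding and the NAT rule survived (standard sysctl and iptables commands):

cat /proc/sys/net/ipv4/ip_forward             # should print 1
sudo iptables -t nat -L POSTROUTING -n -v     # should show the MASQUERADE rule on wlan0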

Step 4: get the swarm started

Very easy to do. At this point in my exploration, I wonder why kubernetes is the dominant container cluster architecture rather than docker swarm. I imagine it has to do with Google’s influence as the originator of the kubernetes project. At some point, I’ll have to do a comparison, but getting this pi swarm running was far easier than the pi-kubernetes cluster.

First, I had to undo my original swarm that was working over the wifi, so I ran ‘docker swarm leave’ on each node, and ‘docker swarm leave --force’ on the master node.

On the master node:

docker swarm init --advertise-addr 192.168.100.1

I copied the output, which is a join command for the slave nodes. I ran this on each node via ssh:

ssh -i myKey.pem pi@p1 'docker swarm join --token TOKEN-example-1234 192.168.100.1:2377'
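
Since all six subordinate nodes need the same join command, a small loop saves some typing; this assumes the nodes are named p1 through p6 in /etc/hosts as above, and the token is whatever docker swarm init printed:

for n in p1 p2 p3 p4 p5 p6; do
  ssh -i myKey.pem pi@$n 'docker swarm join --token TOKEN-example-1234 192.168.100.1:2377'
done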

Once all nodes had been joined to the swarm, it was easy to start a test service:

docker service create --name webtest -p 8080:80 --replicas 6 httpd
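
A few standard swarm commands, run from the control node, confirm that the replicas are running and the published port answers:

docker node ls                  # all seven nodes should show as Ready
docker service ps webtest       # where the six httpd replicas landed
curl http://192.168.100.1:8080  # default "It works!" page via the routing mesh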

By setting up the node network to route through the control node’s static IP and then forwarding it out the wifi, each of the slave nodes can pull the images needed to run the service.

This experiment helped me to understand how to run a dhcpd container, which I will next test in my school’s network, as well as how to create a docker swarm on the raspberry pi. Having built a successful swarm, I can attack the issue of running docker machines to create the swarm at work using the time machine servers as well as repurposing the current linux services as docker swarm nodes.

The next step would be to build a service that is more interesting than a basic web service, and access it through a load balancing nginx node. But that is a problem for another time.

Running radiusd as a container on Mac OS X: the problems

So far, the radiusd experiment is working, but as I try to expand the feature set I am hitting some roadblocks. At this point, I have 4 working instances of radiusd running and responding to radtest commands, but deploying everything is taking a lot of command line intervention.

The process I have so far is as follows:

  • copy archive of working raddb directory to each server running docker
  • decompress archive to same place
  • run the radiusd container in interactive mode, otherwise it shuts down and I can’t get the subsequent commands to work correctly:
docker run -it --name radiusd -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp  soops/radiusd bash
  • From a new terminal, copy the raddb directory contents to the container
docker cp Desktop/etc/raddb/. radiusd:/etc/raddb
  • Fix the ownership issue
docker exec radiusd chown -R radiusd:radiusd /etc/raddb
  • Start the service
docker exec radiusd service radiusd start
  • Tail the log to see if the requests are coming through
docker exec radiusd tail -f /var/log/radius/radius.log

Definitely a clumsy way to do it. Also, I have a weird situation where one instance logs just fine, while the others do not show anything in the log beyond the initial 3 lines of radius saying it is ready to go. I need to get that figured out, since logging is kinda importante.

I tried to use the awslogs logging driver so that Docker would send my logs to CloudWatch, but I can’t get the credentials to be recognized.

docker run -d --name radiusd -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp --log-driver=awslogs --log-opt awslogs-region=us-west-2 --log-opt awslogs-group=bccsDocker --log-opt awslogs-create-group=true soops/radiusd

Everything looks like it will work as I get a container ID, and then I get:

docker: Error response from daemon: failed to initialize logging driver: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors.

So far, it looks like an issue with Docker for Mac not exporting the aws credentials effectively. Every time I try to add directives to the daemon.json file via the app Preferences / Daemon window, it breaks the whole install and I have to reset and rebuild my images. So, this one is going to take some experimenting.

The other problem to solve is how to automate the deployment of all of these steps. At this point, there are a couple of ways to handle it:

  • A script that pulls the raddb directory from our private repository, copies it to each server, deletes the old container, rebuilds the new one, copies the directory, fixes the perms and starts the service. Workable but highly inefficient (a rough sketch follows this list).
  • Try to get all of these Mac nodes to work together as a kubernetes cluster or a swarm, and run radiusd as a service. Better, but adds a layer of complexity I am not ready to tackle, just yet, although I’m taking a docker certification course, so it might be worthwhile to try the swarm.
  • Get Jenkins to handle the automation. This looks to be the best bet, so I’ll attack this next.
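
For the first option, here is a rough sketch of what that script might look like, reusing the commands above. The repository URL, key handling and host list are placeholders, it assumes the raddb directory sits at the top of the repo, and it keeps the container alive with tail rather than an interactive bash — untested, just the shape of it:

#!/bin/bash
# Rough deploy sketch: repo and host names are placeholders.
set -e
REPO=git@example.com:ourorg/raddb-config.git   # placeholder private repo
HOSTS="mac1 mac2 mac3 mac4"                    # placeholder docker hosts

rm -rf /tmp/raddb-config && git clone "$REPO" /tmp/raddb-config

for h in $HOSTS; do
  scp -r /tmp/raddb-config/raddb "$h":/tmp/raddb
  ssh "$h" '
    docker rm -f radiusd 2>/dev/null || true
    docker run -d --name radiusd \
      -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp \
      soops/radiusd tail -f /dev/null
    docker cp /tmp/raddb/. radiusd:/etc/raddb
    docker exec radiusd chown -R radiusd:radiusd /etc/raddb
    docker exec radiusd service radiusd start
  '
done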

My First radiusd Container

This service is my first to containerize. At present, my school organization’s wifi authenticates via radius, and I host an instance of freeradius on our network control server, along with dhcp. This single point of failure is not a good thing, so radiusd is a good candidate to be converted to a container service that I can run on multiple nodes, perhaps a kubernetes cluster.

The first step was to build my docker image and get it running so it would take authentication requests over ports 1812 & 1813. I have a custom image from my Dockerfile, so I can skip all the install info, which is really just this:


yum install -y freeradius freeradius-utils

Also, my work installation is centos6, and a previous attempt to run radiusd on centos7 gave me too much trouble to work out the changes to the configs right now. So, my custom image is based on centos 6.

docker run -it --name myRadiusd -p 1812:1812 -p 1813:1813 -p 1812:1812/udp -p 1813:1813/udp soops/radiusd bash

Next, I backed up the working /etc/raddb folder from my control server and copied it to my local system. I spun up an instance of my radiusd image on my windows system. At first I tried to mount the local raddb folder as a shared volume, but it mapped as world-writable and the radiusd service wouldn’t start. I came upon the docker cp command and it did the trick.

docker cp raddb/. myRadiusd:/etc/raddb

This worked swimmingly 🙂 The service started without a hitch, and I had a working copy of my radiusd service from work. Now to test!

As mentioned above, the radiusd image is running on my Windows system, so I will test using radtest from a container in my pi cluster. I had to install some utilities:

yum install freeradius-utils hostname -y

The hostname package is in there so the radtest script’s hostname call would work.

One more thing to get ready: I added an entry for my test machine in the clients.conf file. I took the IP from the ignored-request entry in /var/log/radius/radius.log

client radtest {
    ipaddr = 172.17.0.1
    secret = test
    nastype = other
}

I also added a test user to the /etc/raddb/users file with a super duper secure password…

test            Cleartext-Password := "securePassword"

From the pi-cluster container, I ran radtest:

radtest test securePassword 192.168.0.18 1812 test

I got back a response which, while it needs further troubleshooting, shows that the instance is responding.

[root@container-id /]# radtest -t chap test securePassword 192.168.0.18 1812 test
Sent Access-Request Id 216 from 0.0.0.0:45303 to 192.168.0.18:1812 length 75
User-Name = "test"
CHAP-Password = 0x0cd1072b900009f73e66b3e10600b13370
NAS-IP-Address = 172.17.0.3
NAS-Port = 1812
Message-Authenticator = 0x00
Cleartext-Password = "securePassword"
Received Access-Reject Id 216 from 192.168.0.18:1812 to 0.0.0.0:0 length 20
(0) -: Expected Access-Accept got Access-Reject

So, just have to troubleshoot the reject message, but I suspect that has to do with being transplanted into my local network. The important part is that the radiusd instance is responding. Now I need to set it up as an alternate authentication offer at work and give it a try…
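
When I dig into the reject, FreeRADIUS’s debug mode should show exactly which module is rejecting the request. Something like this inside the container should do it (stopping the init-script instance first so the debug instance can bind the ports; container name as above):

docker exec -it myRadiusd bash -c 'service radiusd stop; radiusd -X'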

Cramming for the DevOps Pro Exam

Since I have a little time off right now, I’m making the most of my Linux Academy subscription by studying all kinds of technologies to help me get into DevOps.

Here’s a link to my profile, all the classes are under Cloud Credentials 🙂

https://linuxacademy.com/profile/show/user/name/linuxlsr

It has been a year since I completed the AWS Architect Professional certification, and it is time to get the DevOps Engineer Professional certification. I’m working through the Linux Academy tutorials and taking notes. I’m only about 40% through my first pass, but I expect to take the test in the next 30 to 60 days. I think it will help solidify my understanding of some concepts, and help with the job hunt.

I’d like to give a big shout out to one of the most active student community advocates out there, Broadus Palmer. He is always posting on LinkedIn, encouraging us to reach for our goals, and he has started a slack study group for the 100 or so of us trying to get the cert this holiday season. It helps to have friends push you along, and I am most grateful to Broadus.

More content to come, but gotta get back to studying!

My Nagios Installation

This was my second attempt at building a custom docker image with a Dockerfile. The first was to create a customized centos image that had the basic tools I regularly use, rather than just the standard minimal install you get with the centos:6 image.

FROM centos:6
MAINTAINER soops@ucla.edu
RUN yum update -y

#basic tools to make using docker livable
RUN yum install -y man wget git less netutils net-tools openssh-server openssh-clients initscripts sudo chkconfig tar

#replace systemctl to work around dbus issue
RUN git clone https://github.com/gdraheim/docker-systemctl-images.git
RUN cp docker-systemctl-images/files/docker/systemctl.py /bin/systemctl

This gave me a familiar environment with the tools I regularly use, which is a personal preference. The replacement systemctl also lets me get around the dbus error that happens when I try to start services. I need to understand this part a bit better, but that is another problem for another time.

I took the nagios-core installation instructions and put them into another Dockerfile:

FROM soops/centosbase
MAINTAINER soops@ucla.edu
# from: https://support.nagios.com/kb/article/nagios-core-installing-nagios-core-from-source-96.html#CentOS
WORKDIR /root
RUN yum install -y gcc make unzip glibc glibc-common httpd php gd gd-devel perl postfix
RUN wget -O /root/nagioscore.tar.gz https://github.com/NagiosEnterprises/nagioscore/archive/nagios-4.4.1.tar.gz
RUN ls -al
RUN tar -zxvf /root/nagioscore.tar.gz
WORKDIR /root/nagioscore-nagios-4.4.1
RUN ./configure
RUN make all
RUN make install-groups-users
RUN usermod -a -G nagios apache
RUN make install
RUN make install-daemoninit
RUN chkconfig --level 2345 httpd on
RUN make install-commandmode
RUN make install-config
RUN make install-webconf
# adds nagiosadmin:password
RUN echo "nagiosadmin:\$apr1\$k.dy5xrT\$QUbNRLhX01U4tTvt6r8hd1" > /usr/local/nagios/etc/htpasswd.users
# these are not working, add service start commands to docker run command?
CMD systemctl restart httpd &
CMD systemctl restart nagios &
# plugin installation
WORKDIR /tmp
RUN yum install -y gcc glibc glibc-common make gettext automake autoconf wget openssl-devel net-snmp net-snmp-utils epel-release
RUN yum install -y perl-Net-SNMP
# adding dev tools because "no gnu make" error
RUN yum group install "Development Tools"
RUN yum install -y which
RUN wget --no-check-certificate -O nagios-plugins.tar.gz https://github.com/nagios-plugins/nagios-plugins/archive/release-2.2.1.tar.gz
RUN tar zxf nagios-plugins.tar.gz
RUN cd /tmp/nagios-plugins-release-2.2.1/
WORKDIR /tmp/nagios-plugins-release-2.2.1
RUN ./tools/setup
RUN ./configure
RUN make
RUN make install
RUN systemctl start httpd &
RUN systemctl start nagios &
EXPOSE 80

The service httpd start and systemctl start httpd commands don’t work from inside the Dockerfile, since anything started during a RUN step doesn’t survive into the running container (and only the last CMD takes effect anyway). I think I’ll add the start directives at the end of the docker run command and see if that works better.

It was easier to set a default password hash for nagiosadmin than to figure out how to run the htpasswd command non-interactively; I’ll have to come back to that one.

I created a local folder and copied out the /usr/local/nagios/etc/ contents into it. Now I can run my container and change the nagios configs without having to rebuild my container. At present, I still have to run it interactively so I can restart the httpd and nagios services when I make changes, but getting closer to a complete solution.
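
Putting those pieces together, the run command I’m working toward looks roughly like this; the image name soops/nagios and the local folder name nagios-etc are my placeholders, and the mounted folder is the local copy of /usr/local/nagios/etc mentioned above:

docker run -it --name nagios -p 8080:80 \
  -v "$(pwd)/nagios-etc":/usr/local/nagios/etc \
  soops/nagios \
  bash -c "systemctl start httpd && systemctl start nagios && bash"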

I did also try this image out on my Raspberry Pi cluster, which taught me a valuable lesson: images need to be built on the architecture on which they run. You can’t just pull an image built on an x86 linux or Windows system and expect it to run on a Pi. The great thing about having all the docker files in github was that it was easy to pull the repo down to the Pi, build new images with an rpi- prefix, and then run them on the cluster. Very cool.

My hard-working pi-kubernetes cluster is attached to a wifi-enabled power strip, so all I have to do is say “Echo, turn on cluster” and all the magic comes to life!

Still have to figure out how to pull the nagios.cfg and the .htaccess file from my local system to the container (COPY directive), but that should be pretty easy.

Next: My Terraform Scripts

My Jenkins Installation

The goal here is to set up a continuous integration server to control the build and deployment of the disaster recovery steps. If possible, I’d like to get a failure notice from a nagios container, trigger a build in a jenkins container, use a terraform container to trigger the deployment, and have a rollback method that will back up the dr region resources, tear them down and update the operational region’s resources. To do that, I need a Jenkins CI control server.

Should be pretty straightforward, right? There are a couple of steps here that were fun to discover:

  1. Sometimes, the latest jenkins image doesn’t work with some plugins.
    1. First, you can’t just pull jenkins, you have to pull jenkins/jenkins. This was my intro to the concept of docker repos, and you have to be careful to get the correct official package and not pull someone’s customized version.
    2. Not sure why, but when I first used the jenkins/jenkins:latest version, a lot of the plugins were broken. I had to use the :lts version, which worked.  Now the :latest version is working, but it was a good lesson on pulling different tagged versions to find the one that works best.
  2. Using a shared volume for installation file storage can be tricky where a Norton “smart” firewall is concerned. Even though I added the virtualized network to the firewall rules, I still have to stop my firewall for a few minutes when I start a container that uses a shared volume. Once the connection is established, the volume is accessible, but upon restart (which a Windows system loves to do every now and again, as a surprise), it won’t catch without intervention. I hope to figure that out one day, it would be great to have a container auto restart without my having to hold my finger on the firewall.
  3. I created a folder for the jenkins data and started the container with the following command: 
docker run -it --name myJenkins -p 8080:8080 -p 5000:5000 -v c:/Users/soops/jenkins_home:/var/jenkins_home jenkins/jenkins

This command created a container with a custom name, attached ports from the local machine to the container’s exposed ports, and bound the container’s /var/jenkins_home folder to a folder on the local file system. Once I solve the firewall problem I can recreate the container with --restart always so it will automatically start when the Docker Desktop app is started. The shared folder saves the configuration and local working folders so I don’t have to reinitialize Jenkins each time I start it.
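
Once the firewall problem is solved, recreating it is the same command run detached with the restart policy added; the jenkins_home volume keeps all the configuration, so removing the old container is safe:

docker rm -f myJenkins
docker run -d --restart always --name myJenkins -p 8080:8080 -p 5000:5000 -v c:/Users/soops/jenkins_home:/var/jenkins_home jenkins/jenkins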

Later I’ll cover adding the necessary plugins and pipelines for my projects.

Next: My Nagios Installation

My Docker Installation

I’ve been a systems admin for more than a decade, but I completely missed the advent of containerization. My old organization could not afford the infrastructure for a virtual environment, so I bought cheap servers and cluttered the data center with them, trying to avoid a single point of failure, patching and loving each of my little snowflakes on a constant and time consuming basis. When I learned about AWS, cloud virtualization was the perfect solution to our budget constraints, as servers were no longer a fixed cost and an inventory asset. However, learning about Docker has taught me to rethink how I deploy services.

My goal here is to use docker as a platform for specific services, rather than install software and configure individual software packages. In addition, the images and dockerfiles can be pushed to repositories and pulled to other docker installations, so the work I do here can be re-purposed to my work scenario.

My journey begins with the install on my local Windows system. I chose to use the Docker for Windows (also called Docker Desktop) application, which lets me run docker commands from the Windows command line. I also have the ubuntu subsystem on my Windows machine, since I do most of my work at the linux command line, but I chose not to use the ubuntu subsystem for docker (although I do appreciate how Microsoft is supporting linux without having to run a virtualized environment). I’m not sure this was the best decision, but I’d rather not rebuild my entire machine right now 🙂

The biggest issue I have with this setup is that I have yet to successfully allow Docker to connect to my local shared drives without shutting down Norton’s smart firewall. Either I or the firewall is not all that smart, since I’ve followed the docs and added a rule for Docker, but I always have to turn off the firewall when I start a container that has a shared drive, such as my jenkins instance. It’s rather annoying.

I also have a Docker Desktop set up on my Mac, which doesn’t have the same firewall problems, but I rarely use it.

I set up a github account and a docker.com account so I can publish my dockerfiles and push my images, and also installed git, this time as part of the ubuntu subsystem. I don’t want to spend time navigating a GUI for git, since I really only use about seven git commands on a regular basis:

  • git clone
  • git pull
  • git add *
  • git status
  • git commit -m “message goes here”
  • git push
  • git reset --hard HEAD

 I’ve yet to figure out how to make branches work for me, since I am a dev team of one. Problem for another day.

So, I have docker at the command line, a directory for my scripts, and a repo for my images. Ready to rumble!

Here are some docker basics:

  • docker ps -a    –    gives a list of available containers and their status
  • docker rm <name>    –    deletes a docker container. The <name> can be its label or its container id. The automatically generated names can be a bit hilarious.
  • docker images    –    gives a list of pulled images.
  • docker rmi <name>   –    deletes a docker image.
  • docker system prune    –    cleans up stopped containers, unused networks, dangling images and build cache.
  • docker run -it <repo/image>    –    starts a container from an image interactively.
  • docker run -it --name <myName> -p <local port>:<container port>/<protocol> <repo/image> <shell or command>
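
For example, the last pattern with the httpd image used elsewhere on this site looks like this: local port 8080 maps to the container’s port 80, and bash replaces the default command so I land in a shell.

docker run -it --name webtest -p 8080:80/tcp httpd bash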

Next:  My Jenkins Installation

Automated Disaster Recovery

Time to apply all the stuff I have been learning.

My goal is to plan, develop, test and deploy an automated disaster recovery plan, and implement it twice: once for my personal account as a sanity check, and again in my school’s AWS environment. I’d like to have it auto deploy upon a region failure, and recover once the region is back up. I expect to do this using the tools I’ve been writing up here: Docker, Jenkins, Nagios and Terraform.

As I build each piece, I’ll link the write up below, and this site will serve as proof of concept and documentation. 

Here we go!

Hitting the “Books”

The last few months have been interesting and a bit tumultuous. My project ended suddenly, and my contracting agency couldn’t find me anything locally; everything offered meant moving out of state. I got called back for a 30-day contract to continue work on the project that was suddenly cancelled, and right now there is no work on the horizon. So, it is time to sharpen my tools, study some technologies in depth, take my next certification and get ready to find a proper job in the new year. Not having a steady job is making me rather nervous, but I will have faith and study every spare moment. Basically, dawn to dusk glued to Linux Academy 🙂

So far, I have been learning about Docker, Kubernetes, Ansible, Jenkins, Terraform and implementing a CI/CD pipeline. I plan on redeveloping my personal AWS environment into a set of case studies I can show during interviews, as well as implement these tools in my previous organization to eliminate single points of failure.

What an adventure!