My Azure Learning

As the new job requires me to be proficient in Azure and Terraform, I took the opportunity to practice deploying resources and take notes. It’s been really interesting and is an ongoing effort as I prepare for the first of 5 certifications in Azure. If you are interested, check out my GitHub repo.

Gigging as a contractor, those days are over…

The last two years as an IT contractor have been really intense, with both of my contracts ending early and leaving me with that uneasy unemployed feeling. My last assignment with a major media company was a great opportunity to get exposed to lots of design patterns and best practices. Life as a systems reliability engineer was exactly what I needed to see what operating at scale under enterprise conditions looks like. I got an opportunity to get better at Terraform and CloudFormation, use Chef and GitLab CI, build containers and deploy them to a Kubernetes cluster in AWS. Just brain-meltingly cool stuff!

Sadly, the pandemic shut down my time as an SRE prematurely, so it was back into the world of job hunting. Very scary, as the day I got my layoff notice the recruiters stopped calling: the talent pool became an ocean, with so many IT people being let go. My SRE team got cut 40%, and others I thought would be critical and un-cut-able are looking for work.

Luckily, I had maintained contact with a friend from a previous contract (at yet another major media company), and he turned me onto an opportunity in financial technology. I’m now working as a devops engineer (I know, it’s not a job title, but it is here) in Azure for a well-known bank. That is something of a sea change for me, but I’m looking forward to it, and it is a proper J.O.B. and not a contract, so it is pretty nice to have some semblance of security. I’m certainly blessed and grateful to be so lucky in these times of trial. Watching LinkedIn and Twitter has been really depressing for the last 3 months, as so many of my peers are scrambling to keep working while tech companies fold all around us. Time to focus, improve my skills, and keep swinging for the fences.

SSM Parameter Store

As part of my disaster recovery process, I make a daily AMI snapshot of the servers and copy it to my disaster recovery target region. As that AMI ID changes each day, I need a way to get the current day’s ID into my CloudFormation template so that AMI can be used when creating a copy of our server in the new region. Short of using Python and Lambda to discover the AMI ID and recreate the template, there had to be a better way to do it.

Enter Parameter Store.

This is an AWS service that acts like a region-bound scratchpad, where you can store data and retrieve it from a few other services, one of which is CloudFormation.

My first step is to create the AMIs and store their IDs in SSM:

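# snippet: local_ami_list is a list of 'instance_id,instance_name' strings; the date is prepended to each name below.
# assumes boto3 EC2 clients ec2_local and ec2_dr already exist, plus `from time import sleep`.
# this section creates the AMIs, tags them, and adds to a list for copying to DR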
for line in local_ami_list:
    image_data_combined_list = line.split(',')
    #pprint(image_data_combined_list)
    local_instance_id = image_data_combined_list[0]
    local_instance_name = current_date_tag + '-' + image_data_combined_list[1]
    image = ec2_local.create_image(InstanceId=local_instance_id, Description=local_instance_name, DryRun=False,
                                    Name = local_instance_name, NoReboot=True)
    tag_image = ec2_local.create_tags(Resources=[image['ImageId']], Tags=[{'Key': 'Name', 'Value': local_instance_name},])
    entry = local_instance_name + ',' + image['ImageId']
    ami_list_to_copy.append(entry)

sleep(90)
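
The sleep(90) above is a fixed guess at how long the AMIs take to finish, and if an image is still pending when the copy starts, the copy is likely to fail. A minimal sketch of an alternative, assuming the same kind of boto3 EC2 client, is to use the client's `image_available` waiter. The helper names `wait_for_images` and `ids_from_entries` are made up for illustration, and the delay and attempt counts are placeholder values.

```python
def ids_from_entries(entries):
    """Pull the AMI IDs out of 'name,ami-id' strings like those in ami_list_to_copy."""
    return [entry.split(',')[1] for entry in entries]


def wait_for_images(ec2_client, image_ids, delay=15, max_attempts=40):
    """Block until every AMI in image_ids reaches the 'available' state.

    ec2_client is expected to be a boto3 EC2 client; the waiter polls
    describe_images every `delay` seconds, up to `max_attempts` times.
    """
    waiter = ec2_client.get_waiter('image_available')
    waiter.wait(ImageIds=list(image_ids),
                WaiterConfig={'Delay': delay, 'MaxAttempts': max_attempts})
```

In the script above this would stand in for the sleep, for example `wait_for_images(ec2_local, ids_from_entries(ami_list_to_copy))`.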

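# this snippet copies the AMIs to the DR region
from botocore.exceptions import ClientError

for line in ami_list_to_copy:
    ami_list_combined_data = line.split(',')
    local_ami_name = ami_list_combined_data[0]
    local_ami_id = ami_list_combined_data[1]
    try:
        image_copy = ec2_dr.copy_image(Description=local_ami_name, Name=local_ami_name, SourceImageId=local_ami_id,
                                        SourceRegion=local_region, DryRun=False)
        entry = local_ami_name + ',' + image_copy['ImageId']
        dr_ami_list.append(entry)
    except ClientError as err:
        # the except branch was cut from this snippet; this stand-in just reports the failure and moves on
        print('copy of ' + local_ami_id + ' failed: ' + str(err))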
# this snippet is a bit of a kludgy hack, but it gets me in the ballpark. Anonymized for my protection.
# needs `import re` and `import boto3` at the top of the script

# 5. lists AMIs in the DR region and writes the current day's AMI IDs to SSM parameters for further use in cf-scripts

sv1_ami_parameter = '/org/env/ec2/ServerName1/ami'
sv2_ami_parameter = '/org/env/ec2/ServerName2/ami'
sv3_ami_parameter = '/org/env/ec2/ServerName3/ami'
sv4_ami_parameter = '/org/env/ec2/ServerName4/ami'
sv5_ami_parameter = '/org/env/ec2/ServerName5/ami'
current_ami_list = []

ssm_dr = boto3.client('ssm',region_name=dr_region)
dr_amis = ec2_dr.describe_images(Owners=['self'])
for ami in dr_amis['Images']:
    match = re.search(current_date_tag, str(ami['Name']))
    if match:
        entry = str(ami['Name']) + ',' + str(ami['ImageId'])
        current_ami_list.append(entry)

for ami in current_ami_list:
    line = ami.split(',')

    match1 = re.search('ServerName1', line[0])
    match2 = re.search('ServerName2', line[0])
    match3 = re.search('ServerName3', line[0])
    match4 = re.search('ServerName4', line[0])
    match5 = re.search('ServerName5', line[0])
    if match1:
        set_parameter = ssm_dr.put_parameter(Name=sv1_ami_parameter,
                                            Value=line[1],
                                            Type='String',
                                            Overwrite=True)
    elif match2:
        set_parameter = ssm_dr.put_parameter(Name=sv2_ami_parameter,
                                            Value=line[1],
                                            Type='String',
                                            Overwrite=True)
    elif match3:
        set_parameter = ssm_dr.put_parameter(Name=sv3_ami_parameter,
                                            Value=line[1],
                                            Type='String',
                                            Overwrite=True)

    elif match4:
        set_parameter = ssm_dr.put_parameter(Name=sv4_ami_parameter,
                                            Value=line[1],
                                            Type='String',
                                            Overwrite=True)

    elif match5:
        set_parameter = ssm_dr.put_parameter(Name=sv5_ami_parameter,
                                            Value=line[1],
                                            Type='String',
                                            Overwrite=True)
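
The five near-identical branches above could be collapsed into one loop. This is only a sketch of that idea, not what the script actually runs: `SERVER_NAMES`, `parameter_for` and `publish_ami_ids` are names invented here, and the SSM client is passed in so the logic can be tried without touching AWS. Like the elif chain, the first matching server name wins.

```python
# Server names as they appear in the anonymized parameter paths above.
SERVER_NAMES = ['ServerName1', 'ServerName2', 'ServerName3',
                'ServerName4', 'ServerName5']


def parameter_for(ami_name, server_names=SERVER_NAMES):
    """Return the SSM parameter path for the first server name found in ami_name, else None."""
    for server in server_names:
        if server in ami_name:
            return '/org/env/ec2/' + server + '/ami'
    return None


def publish_ami_ids(ssm_client, entries):
    """Write each 'name,ami-id' entry to its parameter and return the paths written.

    Entries whose name matches no known server are skipped.
    """
    written = []
    for entry in entries:
        ami_name, ami_id = entry.split(',')
        path = parameter_for(ami_name)
        if path is None:
            continue
        ssm_client.put_parameter(Name=path, Value=ami_id,
                                 Type='String', Overwrite=True)
        written.append(path)
    return written
```

With the variables from the snippet above, the call would be `publish_ami_ids(ssm_dr, current_ami_list)`. Adding a sixth server then means adding one name to the list rather than another branch.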

As you can see, the put_parameter method of the boto3 SSM client stores the value as a plain-text String. To retrieve it in a CloudFormation template, you reference it as a parameter:

Parameters:

  sv1:
    Description:  'pre-baked AMI copied from ops region, ID retrieved from SSM Parameter Store'
    Type: 'AWS::SSM::Parameter::Value<String>'
    Default: '/org/env/ec2/ServerName1/ami'

# then reference it in the Resources ec2 instance code block as the ImageId.

No more hard coded values, or a need to dynamically generate the script on a daily basis.

You can encrypt values and store them as SecureStrings. However, to retrieve them you will need an understanding of the version number, and I’ve yet to figure that out. Once I do that, then I can more securely store usernames and passwords and avoid hard coding them. So, very cool indeed! (And now I know the answer to an interview question that I bombed!)
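
On the boto3 side, at least, storing and reading a SecureString does not need a version number. Here is a minimal sketch, assuming a boto3 SSM client and the account's default KMS key; the helper names and the parameter path in the usage note are hypothetical. The put_parameter response does include a Version field, which I'm assuming is the version number mentioned above.

```python
def store_secret(ssm_client, name, value):
    """Store value as an encrypted SecureString and return the new version number."""
    response = ssm_client.put_parameter(Name=name, Value=value,
                                        Type='SecureString', Overwrite=True)
    return response['Version']


def read_secret(ssm_client, name):
    """Return the decrypted value of the latest version of the parameter."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response['Parameter']['Value']
```

Usage would look like `version = store_secret(ssm_dr, '/org/env/db/password', 'not-a-real-password')` followed by `read_secret(ssm_dr, '/org/env/db/password')`. The caller needs IAM permission to use the KMS key as well as the parameter.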

First Jenkins success: building disaster recovery stacks

Now that I have most of the disaster recovery scripts built (still testing and solving little issues, but getting close!), I thought I’d give Jenkins a try. I’ve built a freestyle job and used the AWS CloudFormation plugin, pointed it at the root stack in S3, and voila! It builds. A couple of things to figure out:

  • I’m not having any luck auto-building from commits. I need to work out a process that pulls the code and syncs it to my S3 bucket so that Jenkins can build it.
  • As I am using a Jenkins container, I don’t have the awscli installed and can’t script the commands. So far, building my own Jenkins install with all the tools built in seems the way to go, but if there are CloudFormation plugins, there should be something similar to what I need. That, or build it myself (hmm…..).
  • I can’t get an update to work through Jenkins. The build throws an error about a badly formatted template when I try an update, but that update works fine if I do a manual stack update. If I delete the stack and build again, it works fine. Weird….
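
One possible way around the missing awscli for the sync step is a small Python script run from the job after checkout. This is a rough sketch and rests on assumptions: that Python and boto3 can be installed in the Jenkins container, that templates end in .yaml, .yml or .json, and that the helper names, bucket and prefix are placeholders of my own.

```python
import os


def template_keys(local_dir, prefix=''):
    """Yield (local_path, s3_key) for every template file under local_dir."""
    for root, _dirs, files in os.walk(local_dir):
        for filename in sorted(files):
            if filename.endswith(('.yaml', '.yml', '.json')):
                local_path = os.path.join(root, filename)
                relative = os.path.relpath(local_path, local_dir)
                # S3 keys always use forward slashes, whatever the local OS uses
                yield local_path, prefix + relative.replace(os.sep, '/')


def upload_templates(s3_client, local_dir, bucket, prefix=''):
    """Upload every template under local_dir to the bucket; return the keys uploaded."""
    uploaded = []
    for local_path, key in template_keys(local_dir, prefix):
        s3_client.upload_file(local_path, bucket, key)
        uploaded.append(key)
    return uploaded
```

From a build step this might be called as `upload_templates(boto3.client('s3'), 'templates', 'my-template-bucket', 'dr/')`. Unlike `aws s3 sync`, it uploads everything each time and never deletes stale objects.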

So, more troubleshooting ahead. I’m still working on why the load-balanced instances keep initializing and then get replaced; I think it is a health check setting… Back at it!