Downsizing EBS Volumes

Finding the trade-off between performance and cost has been a bit of a challenge. At my school district we run four instances and one database, and while reserved instances have been great for getting stronger servers with the CPU, memory, and network capacity we need on the cheap, finding the right balance of drive IOPS versus cost has been harder. My goal was to get our monthly bill down to about $150 instead of the $500 we were averaging, but even after the reserved instance purchases the bills have been running about $300, so I needed to look for more ways to lower costs.

Using the billing detail section of the console, I can see that my fixed costs were as follows:

  • Support – $29 per month. This is a percentage of the monthly bill, with $29 as the minimum. Not giving this up, not for love nor money. AWS support is amazing, and well worth the price. The response time is 24 hours at this (Developer) level, and they often take all of it to answer, but they take the time to help you learn where you went wrong. They even fixed a Postfix issue for me, even though it was clearly on me and not part of their purview. Can't say enough good things.
  • Workspaces – $7.25 per month for a Windows image we maintain for PowerSchool data exchange. If you leave it running, the cost doubles.
  • Route 53 – $3. At $.50 per zone, we have 6. The utility of having our DNS managed in AWS far outweighs the cost of using Network Solutions, in my opinion.
  • Marketplace image – $7.44 a month. This is a TurnKey LDAP image @ $.01 an hour. There is a free one I'll change to later on down the road.

So, around $50 for a base. My dynamic costs are where it gets fun:

  • CloudFront – $5 – $50. We had an issue where a large software package was being pulled from the cloud instead of a local file share, so that got expensive quickly. We also had a spike of external installs for teacher laptops before the school year started, so a small increase was expected, but that was a lesson learned.
  • Data transfer – $5. Not a big deal.
  • EC2 EBS storage – $100. Ouch. This was a couple of large volumes created as provisioned IOPS, so a terabyte-plus of storage was sitting there while we only used about 100 gigabytes of data. Room for improvement.
  • EIPs – $1. We use four, so no issue.
  • EFS – $2.74. This is for our student web data storage, a bit less than 10 gigabytes.
  • RDS – $100. This is odd: at $.10 per gigabyte and $.20 per million requests, we had a lot more requests than I think we should have. We have a reserved instance here, so the cost should be a lot lower than that.
  • Route 53 – $.32. At $.40 per million queries, so no problem here.
  • S3 – $15. At $.023 per gigabyte plus $.005 per 1,000 PUTs, this should be fine. We use S3 for log and config storage, software package deployment through CloudFront, and backup of the student files on EFS, so this was expected.

So, two places to cut costs:

  • Go back to MySQL from Aurora and move the database to a non-public endpoint. I don't see anything in the data, but it looks like the database was getting banged on from outside, so I put a stop to that by moving out of the default VPC and creating a two-tier VPC with a set of private subnets. That should cut the requests down to only the essential ones from the web servers. You live, you learn, you get better. Thank goodness for CloudTrail and VPC Flow Logs letting me see where the traffic was coming from.
  • Reduce the EBS volume sizes.

Making EBS volumes smaller is a bit tricky, as there is no push-button way to do it in the console, and the fact that the volumes are root volumes on working servers meant I had to figure out some Linux magic. Luckily, I found a tutorial that made this possible: https://matt.berther.io/2015/02/03/how-to-resize-aws-ec2-ebs-volumes/  A big shout-out to the author, Matt Berther, for posting the most understandable tutorial on this issue of all the stuff on Google.

  1. I made snapshots of the volumes I wanted to shrink.
  2. I learned this the hard way: volumes are tied to an availability zone. If you create a volume in one zone, you can't attach it to an instance in another. Good pro tip!
  3. I spun up instances from the same AMIs, in the same availability zones, as the two servers that used the volumes: an Amazon Linux AMI and an Ubuntu 14.04 AMI. I found out the hard way that the root volumes are not interchangeable, which makes sense if you get some sleep, cut down on the caffeine and think about it. When creating the instances I made each volume 20GB gp2 (100 baseline / 3000 burst IOPS). This gives me the smaller volume size, but I have to make sure the volumes perform under load without overloading the I/O queue, and that will take some testing. If I need to, I can easily increase the volumes in the console. Volumes can easily get bigger; it's a challenge to make them smaller.
  4. I shut down the two new instances, made note of their root volume mapping, /dev/xvda for the AWS AMI and /dev/sda1 for the Ubuntu AMI.
  5. I detached those volumes and named them "target." There are a bunch of volumes in the EBS console, and they get confusing without a naming scheme.
  6. I created volumes from the snapshots of the large source volumes in the same availability zone as the target volumes, and labeled them "source."
  7. Now the fun part: when I was trying this out, I used a t2.nano instance to do the resizing and copy operations, and it was slooooowwww. The t2.nano was getting about 23MB per second of throughput, and one of the source volumes had about 10GB of data, so it took nearly an hour. So, I decided to use the server instances themselves to do the copy operations, and that 160MB per second of throughput came in very handy. The tutorial does a great job of explaining how to use e2fsck, resize2fs, and dd, so I won't repeat it here; note, though, that its step 9 is handled by my steps 3 and 4.

It was a challenge to use the tutorial's calculations to figure out the block count, so I made notes to make sure I didn't fat-finger the entries:
studentServicesSource /dev/sdf (/dev/xvdf1): 641235 4K blocks → 2564940 KB ÷ 12288 = 208.735 → round up to 210
mdm source volume (/dev/xvdg1): 15322108 4K blocks → 61288432 KB ÷ 12288 = 4987.66 → round up to 5000

Taking the first volume, there were 641235 4K blocks. I multiplied that by 4 to get the size in kilobytes. The tutorial's formula was count * 4 / (16 * 1024) for 16M blocks, but it made more sense to me to just divide the kilobyte count by 12288 to get a count of 12M blocks, then round up a bit. The mdm volume was much bigger, so that took a bit longer. I've used dd a bunch in the past, mostly creating USB boot sticks for Linux installs, but I didn't know the switches. The Amazon Linux version of dd had a status=progress switch which gave me good progress data, but the Ubuntu version didn't support it and I didn't want to try piping it through pv. That copy operation took a while and was a bit of a mystery until it was complete; I just watched top until the process disappeared. Maybe next time.
Note: the source mapping is different from the tutorial because I attached the drives to the server in the wrong order, so I had to adapt or risk wiping out my source volume. Good thing you can't run dd against the mounted root volume, or I could have wiped my server's boot drive.
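
For my own notes, here is a rough sketch of the copy sequence, following the tutorial's approach with the numbers from above. The device names are from my setup, and the target partition shown is a placeholder, so adjust before borrowing any of it.

# shrink the source filesystem down to its minimum size first
sudo e2fsck -f /dev/xvdf1
sudo resize2fs -M /dev/xvdf1

# copy 210 blocks of 12M from the shrunk source onto the 20GB target
# (/dev/xvdk1 is a placeholder for whatever device the target volume attached as;
#  status=progress worked on the Amazon Linux dd for me, drop it on Ubuntu 14.04)
sudo dd if=/dev/xvdf1 of=/dev/xvdk1 bs=12M count=210 status=progress

# then check the copied filesystem and grow it to fill the 20GB target
sudo e2fsck -f /dev/xvdk1
sudo resize2fs /dev/xvdk1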

Once the target volumes were resized and fscked, I attached them to the server instances on the correct root device mappings and spun the servers back up. Everything worked, so I tested things out, created new AMIs of the servers, and deleted the old source volumes. Thank you, Amazon, for cheap S3 storage for snapshots!

So, I was able to reduce my EBS costs by 96% as a baseline. If I need more IOPS, I can make the volumes larger, or change them into provisioned IOPS (io1) volumes and reattach them to the servers. We'll see how next month's bill looks, but at this point I think I am on track for $100 a month, with room to grow if we need more throughput.
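
Since these are now plain gp2 volumes, the newer Elastic Volumes feature should let me make that kind of change in place from the CLI; a minimal sketch (the volume ID and the size/IOPS numbers are placeholders, not anything I have actually run against these servers):

# grow a volume and convert it to provisioned IOPS without detaching it
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 100 --volume-type io1 --iops 1000

# watch the modification work through its optimizing state
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0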

CloudTrail: there’s a log entry for that

I haven't worked in an environment that had any compliance or data retention policies (well, there was that MS Outlook IMAP/POP3 incident, but thankfully there was a backup), so a lot of this stuff I make up as I go along. So, today I discovered CloudTrail and I must say, it is pretty cool.

CloudTrail monitors your environment's API calls, so when someone does anything, there is a record of it. You get one "trail" for free, so I set one up to deliver to an S3 bucket. According to the pricing document (see, I'm learning, I checked the pricing first!) most environments cost less than $3 per month, so I think this is a good purchase.

To test it out, I turned on a NAT instance to see when the event would be logged. It took about 10 minutes for the event to get registered in the log, so some patience is in order.

The event was also copied in JSON format to my logs bucket, so once I find an app to un-minify the JSON and maybe build a parser, it will be useful. Until then, it is taking up space in a bucket, which has lifecycle policies to rotate the contents to Infrequent Access storage after 30 days, then off to Glacier for a bit more than a year before being chucked in the bit bucket. This will have to do until we document some more formal retention policies.
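
Those lifecycle rules are just a bucket setting; from the CLI the equivalent looks roughly like this (the bucket name and the exact day counts are placeholders until we write the real retention policy down):

# move log objects to Infrequent Access at 30 days, Glacier at 60 days,
# and delete them after roughly 30 + 30 + 365 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-logs-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "rotate-logs",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 60, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 425}
    }]
  }'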

My AWS Seminar

On August 3rd, I had the opportunity to participate in a webinar for Amazon Web Services, presenting a case study in how our schools use AWS. Unfortunately, the recording crashed, so all I have is my script. If you are in the education sector and are interested in how we leverage AWS, I hope you find this useful.

Can you please let us know a little about yourself and your institution?

Welcome! I am the director of technology for Youth Policy Institute Charter Schools, or as we call it, YPICS. We are a charter management organization with about 1,000 students in grades 5 through 12, nearly 100 adult staff, two middle schools, one high school, and a small central office / training center. We serve the Pacoima and Pico Union communities of Los Angeles.

Our charter focuses on project-based, technology-integrated service learning, in order to produce students who are college ready, active citizens, and lifelong learners. We also have special education programs that focus on autism and on integrating students with special needs into the general education classroom. We have a one-to-one computer-to-student ratio at each school, and our equipment is predominantly Mac based, with more than 1200 Macs, 150 iPads, and 275 Chromebooks. We also have 8 Linux servers providing internal and external web publishing, mobile device management, print accounting and queue management, database services, and critical infrastructure services.


Previously I worked in technology support management at Kinkos and at the Design | Media Arts school at UCLA (Go Bruins!). I chose to work in the charter school movement to help students gain access to 21st century skills. When I joined YPICS in 2006, I was teaching visual arts and technology to 7th graders, mostly Dreamweaver, Photoshop, Illustrator and Blender, all the while setting up all the technology and managing a network of about 45 computers on retail hubs and a heavily oversubscribed DSL connection. We've come a long way since then. I left the classroom to manage the technology full time about 5 years ago, and we hired some part-time tech support about two years ago, since managing 800 machines was getting to be a bit much for me to handle and my high-school-aged son wouldn't allow me to "volunteer" him any more.


Can you talk about some of the specific challenges you faced leading up to using AWS?

My organization runs cheap and light on IT support, as most small charter schools must. This is much different from my experience at UCLA. I am a full-time, 12-month employee, and I have three very hard-working part-time college students who handle site tech support. My team supports more than 1600 computers, printers, phones, services, etc., so our support ratio is about 530 devices to 1, which is extremely high even at our level of automation. Also, our annual budget process is long and fraught with politics and uncertainty, especially at the school board, state, and federal levels. We spend as much money as possible on instruction and access devices, but putting money into infrastructure and support is a long, hard conversation with my COO that usually ends in a "no." So, finding a flexible and low-cost solution to a technology problem is paramount, because even though the answer is "no!" we still have to meet our charter objectives. Also, our schools are either in temporary facilities that are shared with other schools, or in portable bungalows, so keeping our data center (a small closet in an office) powered, secure, and cool can be a challenge.

I began to experiment with Amazon Web Services as a way to make our web servers more stable, secure, and scalable. We had public-facing websites hosted in the campus data center, which meant that server traffic affected our overall bandwidth and provided an attack vector into our network. We also host internal websites for student portfolios that serve as presentations of learning, and these sites took a huge performance hit on specific days and times as the students worked on their projects. Since all of these services were hosted on the same physical device, the performance was not enough to meet the instructional need. So, I was looking for a low-cost, scalable, easy-to-manage solution.

Please tell us what you are currently doing in AWS?
Over the last 18 months, we have implemented the following nine services:

  1. Virtualization. We run 4 different virtual servers and one database in AWS. I have my public-facing marketing web sites, our private student web sites, our mobile device management server, and our directory server. The student and directory servers are only accessible from the school's network, so they are secure from outside access. We run a MySQL database that feeds the servers, and it is also only accessible to the school network. The database and servers are automatically backed up nightly, so in case of failure or data corruption I can get them back in service in about 5 minutes and lose, at worst, a day's worth of data. Here is one example of the flexibility of AWS virtualization: our student server is only needed during the school day, so there is no reason to run it overnight. I was able to write a script that created my student web server at 6am each weekday morning and destroyed it at 6pm each afternoon, paying $.08 an hour for 12 hours a day and nothing for the server overnight, all done automagically (a rough sketch of that schedule follows this list).
  2. Load balancing. I was able to set up a virtual load balancer and auto scaling for my student server, which gives me a dynamic cost structure; I can scale up or down and pay only for what is necessary. I didn't have to invest in any load balancing hardware or try to figure it out in my live environment, which means I got my nights and weekends back.
  3. Route 53. We use AWS’ domain name service for domain management for our 6 domains. This allowed us to consolidate vendors and restructure our network to be more efficient for student use.
  4. EFS. The Elastic File System provides unlimited storage and we only pay for what we use, which at this point is a bit more than a cent per gigabyte per day, for our student server data.
  5. Scalable storage. We use the Simple Storage Service, aka S3, for archival, and at this point we pay about twelve cents a day for somewhere around 250 gigabytes of storage. We use it for automated backups of server configurations, student data, and logs. We also have a storage bucket that serves our mobile device management server so we can deploy software packages to managed clients outside the network, which is great for patching teacher machines when they are off campus. The flexible permissions in S3 allow me to secure the data very tightly.
  6. Reserved instances. We can buy server time up front at a dramatic discount and then only pay for data traffic to the servers and for storage. For example, I purchased three years of a large server instance for about $900, instead of having to invest in a hardware device. This has allowed me to fund the next three years of server operations at a significant discount; I have saved about 40% in server costs.
  7. Cost Explorer. The management console’s cost explorer allows me to drill down to the fine details of each of the services we are using on a day by day basis, and I can adjust my implementations to lower costs, or justify increases.
  8. Messaging. I am also using the simple notification service to alert me via email and text message if there are any server issues.
  9. Workspaces. We needed to transfer our student data from one information system (PowerSchool) to another (Illuminate), and the process required a Windows PC to be on 24/7 to synchronize the data. We don't have any spare PCs lying around, so I used the WorkSpaces service to create a Windows 7 desktop in the cloud, installed the converter scripts, and let it run. It cost me $7.25 a month for the instance, and all of $25 a month if I left it running 24/7. This let us easily migrate data changes from one SIS to the other without having to keep a physical machine plugged in and turned on, and I could easily and securely access it from my Mac laptop.
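
Here is a rough sketch of what that weekday schedule can look like with cron and the AWS CLI. My actual script built the student server from an AMI in the morning and tore it down at night; the simpler start/stop version below is just to show the idea, and the instance ID is a placeholder.

# /etc/cron.d/student-web-server (assumes a small always-on admin host with the AWS CLI configured)
# bring the student web server up at 6:00 AM, Monday through Friday
0 6 * * 1-5  ec2-user  aws ec2 start-instances --instance-ids i-0123456789abcdef0
# shut it down at 6:00 PM, Monday through Friday
0 18 * * 1-5  ec2-user  aws ec2 stop-instances --instance-ids i-0123456789abcdef0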

What efficiencies or benefits has this provided?

  1. Lower overall costs that can be managed on a daily / hourly basis, and lower facilities costs by reducing the power and air conditioning load in the data center.
  2. Push button management for deployment and disaster recovery measured in minutes rather than hours
  3. Less dependency on outside vendors and consultants, and less time managing server hardware
  4. Better security, flexibility, and scalability. By moving servers and data to the cloud and out of the closet in the main office, we are no longer tied to a physical data center as a single point of failure or locked into upfront hardware costs.
  5. Consolidation of services. We used to manage domains at Network Solutions, run servers and databases on premise, and store data on a bunch of separate 1 terabyte drives scattered throughout the organization. All of this is now virtualized in the cloud, and the only hardware at the school sites and in the data center is for critical network services such as content filtering, switching, WiFi control, and print queues.
  6. Most importantly, I have more time to focus on strategic initiatives rather than server maintenance. I get to keep my hair, sleep uninterrupted, spend my weekends away from campus, and have a deeper impact on our charter mission.

What advice would you have for other academic institutions looking to get started with the cloud?


Define a problem, and start small. You can use the free tier of services for the first year to test out your implementation. Once your proof of concept works, you can scale up and out. Use Cost Explorer and the billing details to keep a tight handle on your costs, and see if you are saving money and time. I think you'll find that moving to the cloud will increase your uptime and lower your overall costs. If that works, invest in a person who can become certified in AWS services (the classes and certification tests are much less expensive than Cisco classes, in my experience).

Possible projects:

  1. Create a website, maybe for a student project, like a leadership class or an instructional unit project.
  2. Migrate your data archive to inexpensive long-term storage.
  3. You can upload and automatically transcode video into different formats. Great for student projects or media for your marketing efforts.
  4. AWS has some great tools for big data storage and analytics.


On a side note, the AWS documentation is very well written and easy to understand, there are a lot of examples and use cases on the Internet, and I would be happy to answer any questions you might have afterwards.


Future implementations

  1. I am looking into the AppStream service to help us access an important website that has very specific Internet Explorer dependencies and is a challenge to use in our Mac environment.
  2. We are using the CodeCommit service to train my tech team to develop software in house rather than hosting it only on their test machines or in a public space such as GitHub. We retain any intellectual property and control access.
  3. I am going to implement the web application firewall to improve the security of our website. The WAF service can limit site access to specific geographic areas, so I can stop dealing with automated attacks from Eastern Europe and Asia.
  4. Finally, I am looking into converting our service infrastructure into code and deploying it with AWS CloudFormation, so I won't have to manage individual servers at all; the cloud will do it for me.

AWS Config is cool, but pricey for us cheapskates

While studying for the SA-Pro test, I reacquainted myself with AWS Config. This is a great tool: it helps you track changes to your environment, monitors policy compliance, all but butters your bread. So, I'm nosing around, walking my way through the console, setting up the bucket and the recording, and I figure I'll add some rules. There are 21 rules, so I figure what the heck, throw 'em all in, I'll sort them out later.

A little warning went off in my head, I think I made this mistake before… Yep, this will be a $42 mistake. At this point, it is $2 per month per active rule. So, there goes any savings I was shooting for this month, even with my $100 credit for answering a survey. Sorry, taxpayers, my bad. I'll get it back once I figure out how to shrink these root EBS volumes…

Why do the warnings come after the spells…?

However, as part of my daily billing review after I ripped them all out, I see the system only charged me for the rules that actually applied to my resources, so I only got hit with 9 rules for $18, plus some change for 39 evaluations.

The cool thing is the Dashboard, which gives me a chance to drill down into all my resources. After two years of experimentation and implementation, I'm up to 127 cosas! (Sorry, binge-watched Narcos, my Spanish has gotten more colorful.) I didn't think I had 14 network interfaces up when I only have 4 EC2 instances. A good place to do some exploration.

When you View All # Resources, you get a list of each resource with a timeline of changes and a link to manage that resource, so you can zero in on each thing and make sure it is needed, properly configured, tagged, etc.

Very useful!


CloudWatch for Server Logs, or what magic is this?

While reviewing AWS Security and Monitoring in Linuxacademy.com’s CSA-Pro course, I heard the instructor say that CloudWatch could be used to monitor application logs. I’ve only ever used CloudWatch to monitor metrics like CPUUtilization and DatabaseConnections, and only recently implemented custom metrics like DiskUtilization. Monitoring application logs looks like some next level sorcery! I’m in, let’s do this!

I used this blog post as the basis of my exploration (when I grow up, I want to be just like Jeff Barr, he’s my hero!):
https://aws.amazon.com/blogs/aws/cloudwatch-log-service/

Rather than create a new policy, I appended the sample policy statements to an existing CloudWatch policy. It took a little fiddling with a custom Stmt ID and keeping track of all the brackets and braces, but finally the policy validated.
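
From memory, the statements I folded in looked roughly like this; the Sid is just my custom label, and the actions follow the sample in the blog post, so treat this as a sketch rather than gospel:

{
  "Sid": "AllowCloudWatchLogsPush",
  "Effect": "Allow",
  "Action": [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "logs:DescribeLogStreams"
  ],
  "Resource": "arn:aws:logs:*:*:*"
}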

Instead of doing the manual install, I followed the instructions here and used the awslogs package.

Setting up the log group and log streams was a bit confusing. Using CloudWatch's Logs interface, I settled on building a log group for my school webserver, cleverly named webServer. I created two streams, messages and ypics.org.access_log. I didn't realize until much later that this step was completely unnecessary. ("They really should put the warnings before the spells"…)

In the /etc/awslogs/awslogs.conf file, I set up the stream definitions, leveraging the messages example and building one for the access_log. Started everything up on the server; it should all work, right?
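
For reference, my stream definitions ended up looking roughly like this. The access_log path and the second section's names are reconstructed from memory (your log paths will differ), the datetime_format has to match each log's timestamp format, and the package's default [general] section with its state_file stays as-is:

[/var/log/messages]
file = /var/log/messages
log_group_name = webServer
log_stream_name = webServer_messages
datetime_format = %b %d %H:%M:%S

[access_log]
file = /var/log/httpd/ypics.org.access_log
log_group_name = ypics.org.access_log
log_stream_name = webServer_access_log
datetime_format = %d/%b/%Y:%H:%M:%S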

Not so fast…

The dreaded “No Events Found” notice in “be calm, don’t hurt anyone” blue filled warning rectangle. Stop, it’s troubleshooting time… (Woah, Woah Woah)

In the /var/log/awslogs.log it looks like everything is fine:

2017-09-26 16:35:08,341 - cwlogs.push.publisher - INFO - 2572 - Thread-3 - Log group: webServer, log stream: webServer_messages, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1506443701000, 'start_position': 240553L, 'end_position': 240661L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1506443701000, 'start_position': 240553L, 'end_position': 240661L}, 'source_id': '6536545901daa6a75722ce388afbd37d', 'num_of_events': 1, 'batch_size_in_bytes': 133}
2017-09-26 16:36:56,721 - cwlogs.push.publisher - INFO - 2572 - Thread-5 - Log group: ypics.org.access_log, log stream: webServer_messages, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1506443811357, 'start_position': 2848118L, 'end_position': 2848201L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1506443811357, 'start_position': 2849033L, 'end_position': 2849183L}, 'source_id': 'aeb5ce683400f407f5adfd0cbdfe07bd', 'num_of_events': 8, 'batch_size_in_bytes': 1265}

At the command line, I checked to see if maybe the policy was messed up:

]$ aws cloudwatch list-metrics

Could not connect to the endpoint URL: "https://monitoring.us-west-2b.amazonaws.com/"

That doesn't look right; us-west-2b is an Availability Zone, not a region, so there is no such endpoint.

I think I broke the policy. I guess that means valid JSON doesn’t always mean correct JSON… Back to IAM! Time to build a custom policy…. No, that doesn’t feel right. Instead, I ran sudo aws configure and set the region, but no keys. That seemed to fix the issue with aws cloudwatch list-metrics, but still no data was showing up in the stream. Hmmm….

This is what I get for not reading the instructions. The issue was that /etc/awslogs/awscli.conf was pointing at us-east-1. As soon as I corrected the region and restarted the agent, the logs started pouring in.
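
For anyone else tripping over this, the file in question is tiny; after the fix, mine read approximately like this (followed by a sudo service awslogs restart):

# /etc/awslogs/awscli.conf
[plugins]
cwlogs = cwlogs
[default]
region = us-west-2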

Too easy! Time to put the config on the rest of my servers for my log viewing pleasure!


Rescheduled the Solutions Architect Pro Exam

In order to prepare for the SysOps Associate and the Solutions Architect Professional certifications, I intended to take two weeks of vacation and spend every available minute studying and practicing. However, I was told that I could not take vacation and had to be on call at least two hours a day. That still seemed like a generous arrangement, but I ended up spending much more time at work than I had thought, and the SysOps exam was more challenging than I expected. As I began to prepare for the SA-Pro, the linuxacademy.com lecturer was adamant that I not schedule the exam until 2-3 weeks after completing the course, so I could do the additional reading and practice tests. I had already booked the test, so the best compromise was to reschedule for a month from now. That gives me this week to finish the videos, and the rest of the time to practice. This blog is part of the reflection and practice process; I hope it might be interesting to some out there who are pursuing AWS certifications.

Here's the countdown (no stress!): https://daycalc.appspot.com/10/28/2017

Breadth of Knowledge at this point

So many AWS services and features, and more coming out every year at re:Invent. Here's a tally of what I know and what I want to know. The analogy of an iceberg's above/below water ratio comes to mind.

Use Regularly

  • EC2 – Elastic Compute Cloud
  • S3 – Simple Storage Service
  • EFS – Elastic File System (NFS)
  • RDS – Relational Database Service
  • VPC – Virtual Private Cloud
  • CloudFront (CDN)
  • Route 53 (DNS)
  • CodeCommit
  • CloudWatch
  • IAM
  • Simple Notification Service
  • WorkDocs
  • WorkSpaces

Experimented With

  • Lightsail
  • Elastic Beanstalk
  • Glacier
  • DynamoDB
  • CloudFormation
  • CloudTrail
  • Config
  • OpsWorks
  • Trusted Advisor
  • Inspector
  • Directory Service
  • AppStream 2.0

Want to Learn

  • EC2 Container Service (Docker)
  • Lambda
  • Storage Gateway
  • ElastiCache
  • Direct Connect
  • Certificate Manager
  • WAF & Shield
  • Kinesis
  • Step Functions
  • SWF
  • API Gateway
  • Elastic Transcoder
  • Simple Queue Service
  • Simple Email Service
  • WorkMail
  • Amazon Chime

For Fun

  • Glacier
  • Redshift
  • CodeStar
  • CodeBuild
  • CodeDeploy
  • CodePipeline
  • Lex
  • Amazon Polly
  • Rekognition

Extra

  • Batch
  • AWS Migration Hub
  • Application Discovery Service
  • Database Migration Service
  • Server Migration Service
  • Snowball
  • Service Catalog
  • Managed Services
  • Artifact
  • Amazon Macie
  • CloudHSM
  • Athena
  • EMR
  • CloudSearch
  • Elasticsearch Service
  • Data Pipeline
  • QuickSight
  • AWS Glue
  • Machine Learning
  • AWS IoT
  • AWS Greengrass
  • Amazon Connect
  • Amazon Gamelift
  • Mobile Hub
  • Cognito
  • Device Farm
  • Mobile Analytics
  • Pinpoint

My first VPC, my little buddy

While studying for the last of the associate certifications, I realized that if I was ever asked during an interview to log in and show what I have done, my AWS accounts would look like a kid’s toy chest, much played with but never organized. Creating a Virtual Private Cloud (VPC) is such a core skill, and it was finally time to rein in my personal account. I’ll tackle work on Monday.

I’ve used the VPC wizard before, but today was kind of a test for me, could I do it from memory? I have an adapted CloudFormation template for a private class C network, which is all I really need. However, I watched a re:invent video on VPC design and figured “why not use 65k IPs if they are free?” You never know, I might need a bigger network space one day, and it would be a pain to renumber it all.

So, the plan:
My region has three AZs, so I created three public subnets, for the day I might actually add a load balancer and a scaling group. Fun for another day. I also created three private subnets:

10.0.1.0/24 – public-2a
10.0.2.0/24 – public-2b
10.0.3.0/24 – public-2c
10.0.100.0/24 – private-2a
10.0.101.0/24 – private-2b
10.0.102.0/24 – private-2c

I did have a moment of fun and created a 10.0.0.0/22 subnet and reveled in all the room, but figured I could always redo the subnets later. The 251 usable IPs in a /24 are enough when I have one server instance and one database. I can dream big later.

First step: go to the console and create my first VPC. I didn't realize, until I saw a re:Invent video on how the networking works on the actual machines, that I can have separate VPCs with the same CIDR blocks. Coming from a Cisco switching understanding of LANs and VLANs, it blew my mind that the VPC ID is essentially just a tag added to the traffic to keep it separate from the rest of the bits and bytes. Still cleaning up bits of brain. Whoa.

Next step was to create the Internet Gateway (IGW) and attach it to my new VPC.

Next step was to create the routing table and associate the subnets with it. Right out of the gate, I added the 0.0.0.0/0 route and pointed it at the IGW. Wait a minute, now all the subnets can route out to the internet, so how is that even private? This is where our friends the NAT instance and the NAT gateway come in. Ideally, creating a NAT gateway is the right answer, since it scales and doesn't need managing or securing. However, it costs money, and I don't see a need to patch servers in the private subnets right now, because I don't have any. So, a NAT instance from the community AMI worked fine; I spun it up, disabled the source/destination check, and gave it an Elastic IP. So far, so good.

Next, I created a new routing table with the NAT instance as the gateway. It popped right up as an option; I didn't have to go looking for it. Once I had the two routing tables, I associated the public subnets with the IGW route table and the private subnets with the NAT instance route table, and then shut the NAT instance down.
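
I did all of this in the console, but the same steps map to a handful of CLI calls, roughly like the sketch below; every ID shown is a placeholder, and only one of the six subnets is shown.

# create the VPC (65k addresses) and one of the public subnets
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 10.0.1.0/24 --availability-zone us-west-2a

# create the Internet gateway and attach it to the VPC
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0

# public route table: default route out through the IGW, then associate a public subnet
aws ec2 create-route-table --vpc-id vpc-0123456789abcdef0
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0123456789abcdef0
aws ec2 associate-route-table --route-table-id rtb-0123456789abcdef0 --subnet-id subnet-0123456789abcdef0

# private route table: the default route points at the NAT instance instead
aws ec2 create-route --route-table-id rtb-0fedcba9876543210 --destination-cidr-block 0.0.0.0/0 --instance-id i-0123456789abcdef0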

Almost there! I created a webDMZ security group that tightened down the access to the web server instance, and created a privateToPublic group that allowed only traffic from the webDMZ. Next, I spun up a test instance in a private subnet and installed mysql to test connectivity. Of course, I had to restart the NAT instance to allow the install.

I had a little trouble figuring out why the MySQL traffic wasn't working between the public and private subnets. I understood that subnets in the same VPC could communicate with each other, but I had to explicitly add MySQL to the security group rules so the two groups could reach each other, as well as explicitly add the public subnets to the private subnets' Access Control List (ACL). I need to go back and understand that better.
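
What finally worked was referencing one security group from the other instead of using CIDR ranges, something along these lines (the group IDs are placeholders, with --group-id standing in for the privateToPublic group and --source-group for webDMZ):

# allow MySQL (3306) into the private tier only from members of the webDMZ group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 3306 \
  --source-group sg-0fedcba9876543210

# the network ACL on the private subnets also needs to allow 3306 from the
# public subnet CIDRs (10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24)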

The fun part was getting the database in RDS to work, as AWS introduced a new interface and I had some fun with Subnet Groups and Security Groups, but that is another story for another day.

So, my personal account is divested from the default VPC, and I feel like my toy box is all cleaned up and put away properly.