Finding the trade-off between performance and cost has been a bit of a challenge. At my school district we run four instances and one database, and while reserved instances have been great for getting servers with the CPU, memory, and network capacity we need on the cheap, finding the right balance of drive IOPS versus cost has been harder. My goal was to get our monthly costs down to about $150 instead of the $500 we were averaging, but even after the reserved instance purchases the bills have been running about $300, so I needed to look for ways to lower costs.
Using the billing detail section of the console, I could see that my fixed costs were as follows:
- Support – $29 per month. This is a percentage of the monthly bill; $29 is the minimum. Not giving this up, not for love nor money. AWS support is amazing and well worth the price. The response time is 24 hours at this (Developer) level, and they often take all of it to answer, but they take the time to help you learn where you went wrong. They even fixed a Postfix issue for me, even though it was clearly on me and not part of their purview. Can’t say enough good things.
- WorkSpaces – $7.25. For a Windows image we maintain for PowerSchool data exchange. If you leave it running, the cost doubles.
- Route 53 – $3. At $.50 per zone, we have six. The utility of having our DNS managed in AWS far outweighs the cost, compared to staying with Network Solutions, in my opinion.
- Marketplace image – $7.44 a month. This is a TurnKey LDAP image at $.10 an hour. There is a free one I’ll change to later on down the road.
So, around $50 for a base. My dynamic costs are where it gets fun:
- CloudFront – $5–$50. We had an issue where a large software package was being pulled from the cloud instead of a local file share, so that got expensive quickly. We also had a spike of external installs for teacher laptops before the school year started, so a small increase was expected, but that was a lesson learned.
- Data transfer – $5. Not a big deal.
- EC2 EBS storage – $100. Ouch. This was a couple of large volumes created with provisioned IOPS, so a terabyte-plus of storage was sitting there while we only used about 100 gigabytes of it. Room for improvement.
- EIPs – $1. We use four, so no issue.
- EFS – $2.74. This is for our student web data storage, a bit less than 10 gigabytes.
- RDS – $100. This is odd. At $.10 per gigabyte and $.20 per million requests, we had a lot more requests than I think we should have. We have a reserved instance here, so the cost should be a lot lower than that.
- Route 53 – $.32. At $.40 per million requests, no problem here.
- S3 – $15. At $.023 per gigabyte plus $.005 per 1,000 PUTs, this is fine. We use S3 for log and config storage, software package deployment through CloudFront, and backup of the student files on EFS, so this was expected.
So, two places to cut costs:
- Go back to MySQL from Aurora and move the database to a non-public endpoint. I don’t see anything in the data, but it looks like the database was getting banged on from outside, so I put a stop to that by moving out of the default VPC and creating a two-tier VPC with a set of private subnets (see the sketch after this list). That should cut the requests down to only the essential ones from the webservers. You live, you learn, you get better. Thank goodness for CloudTrail and VPC Flow Logs letting me see where the traffic was coming from.
- Reduce the EBS volume sizes.
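For my own notes, here is a rough sketch of the private-placement half of that, using the AWS CLI. The CIDRs, names, and IDs are placeholders, and this leaves out the routing tables, security groups, and engine-switch details, so treat it as an outline rather than a recipe:

```bash
# Rough outline of the two-tier VPC and private database placement
# (CIDRs, IDs, and names below are placeholders, not our real values).
aws ec2 create-vpc --cidr-block 10.0.0.0/16
# public tier for the webservers, private tier for the database, across two AZs
aws ec2 create-subnet --vpc-id vpc-PLACEHOLDER --cidr-block 10.0.1.0/24  --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-PLACEHOLDER --cidr-block 10.0.11.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-PLACEHOLDER --cidr-block 10.0.12.0/24 --availability-zone us-east-1b
# RDS wants a subnet group spanning at least two AZs for the private tier
aws rds create-db-subnet-group \
    --db-subnet-group-name db-private-subnets \
    --db-subnet-group-description "private database tier" \
    --subnet-ids subnet-PRIVATE-A subnet-PRIVATE-B
# take the database off its public endpoint
aws rds modify-db-instance \
    --db-instance-identifier school-db \
    --no-publicly-accessible \
    --apply-immediately
```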
Making EBS volumes smaller is a bit tricky, as there is no push-button way to do it in the console, and the fact that the volumes are root-level volumes on working servers meant that I had to figure out some Linux magic. Luckily, I found a tutorial that made this possible: https://matt.berther.io/2015/02/03/how-to-resize-aws-ec2-ebs-volumes/. A big shout-out to the author, Matt Berther, for posting the most understandable tutorial for this issue of all the stuff on Google.
- I made snapshots of the volumes I wanted to shrink.
- I learned this the hard way: volumes are tied to an availability zone. If you create a volume in one zone, you can’t attach it to an instance in another zone. Good pro tip!
- I spun up instances from the same AMIs, in the same availability zone as the two servers that used the volumes: an Amazon Linux AMI and an Ubuntu 14.04 AMI. I found out the hard way that the root volumes are not interchangeable, which makes sense if you get some sleep, cut down on the caffeine, and think about it. When creating the instances I made each root volume 20GB gp2 (100 baseline IOPS, bursting to 3,000). That gives me the smaller volume size, but I have to make sure the volumes perform under load without overloading the I/O queue, and that will take some testing. If I need to, I can grow the volumes easily in the console. They can easily get big; it’s a challenge to get small.
- I shut down the two new instances and made note of their root volume mappings: /dev/xvda for the Amazon Linux AMI and /dev/sda1 for the Ubuntu AMI.
- I detached those root volumes and named them “target.” There are a bunch of volumes in the EBS console, and they get confusing without a naming method.
- I created volumes from the snapshots of the large source volumes, in the same availability zone as the target volumes, and labeled them “source.”
- Now the fun part: when I was trying this out, I used a t2.nano instance to do the resizing and copy operations, and it was slooooowwww. The t2.nano was getting about 23MB per second of throughput, and one of the source volumes had about 10GB of data, so it took nearly an hour. So I decided to use the server instances themselves to do the copy operations, where 160MB per second of throughput came in very handy. The tutorial does a great job of explaining how to use e2fsck, resize2fs, and dd, so I won’t repeat all of it here (a condensed sketch follows below). However, the tutorial’s step 9 is handled by my steps 3 and 4.
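For reference, here is a condensed sketch of the snapshot-and-copy pass, roughly following the tutorial. The IDs and device names are placeholders (and, as I note further down, my drives actually ended up attached in a different order), so double-check what’s what before running anything:

```bash
# Condensed sketch of the shrink-and-copy pass (IDs and device names are placeholders).
# Snapshot the big source volume, then recreate it as a volume in the server's AZ.
aws ec2 create-snapshot --volume-id vol-BIGSOURCE --description "pre-resize copy"
aws ec2 create-volume --snapshot-id snap-PLACEHOLDER --availability-zone us-east-1a --volume-type gp2
# Attach the source copy and the small 20GB target to the working server.
aws ec2 attach-volume --volume-id vol-SOURCECOPY --instance-id i-SERVER --device /dev/sdf
aws ec2 attach-volume --volume-id vol-TARGET     --instance-id i-SERVER --device /dev/sdg
# On the server: check, shrink, copy, then grow the filesystem into the new volume.
sudo e2fsck -f /dev/xvdf1          # check the source filesystem first
sudo resize2fs -M -p /dev/xvdf1    # shrink it to minimum; note the "(4k) blocks" count it prints
sudo dd bs=16M if=/dev/xvdf1 of=/dev/xvdg1 count=COUNT status=progress
                                   # COUNT per the block math below; drop status=progress on Ubuntu 14.04
sudo resize2fs -p /dev/xvdg1       # expand the filesystem to fill the 20GB target
sudo e2fsck -f /dev/xvdg1          # sanity check before swapping it in as root
```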
It was a challenge to use the tutorial’s calculations to figure out the block count, so I made notes to make sure I didn’t fat-finger the entries:
| Volume | Device | 4k blocks | KB (×4) | ÷12288 | dd count |
|---|---|---|---|---|---|
| studentServicesSource | /dev/sdf (/dev/xvdf1) | 641235 | 2564940 | 208.735 | 210 |
| mdm source volume | (xvdg1) | 15322108 | 61288432 | 4987.66 | 5000 |
Taking the first volume, there were 641235 4k blocks. I multiplied that by 4 to get the size in kilobytes. The tutorial’s formula was count * 4 / (16 * 1024), but it made more sense to me to just divide the kilobyte figure by 12288 and round up a bit to get the dd count. The mdm volume was much bigger, so that took a bit longer. I’ve used dd a bunch in the past, mostly creating USB boot volumes for Linux install sticks, but I didn’t know all the switches. The Amazon Linux version of dd had a status=progress switch, which gave me good progress data, but the Ubuntu version didn’t support it and I didn’t want to try piping it through pv. That copy operation took a while and was a bit of a mystery until it was complete; I just watched top until the process disappeared. Maybe next time.
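Spelled out for the first volume, the arithmetic in my notes looks like this (whatever divisor you use has to match the block size you give dd, and rounding up just means dd copies a little extra, which is harmless):

```bash
# Count math for the studentServicesSource volume, reproducing my notes above.
BLOCKS=641235                          # "(4k) blocks long" figure reported by resize2fs -M
KB=$(( BLOCKS * 4 ))                   # 2564940 KB in the shrunken filesystem
echo $(( (KB + 12288 - 1) / 12288 ))   # rounds up to 209; I padded it to 210 to be safe
```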
Note: the source mapping is different from the tutorial because I attached the drives to the server in the wrong order, so I had to adapt or risk wiping out my source volume. Good thing you can’t run dd with a mounted root volume or I could have wiped my server volume.
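Note to self for next time: double-check which attached device is which before touching dd. Something like this would do it (the device names are just examples):

```bash
# Quick sanity check of attached devices before any destructive copy.
lsblk                    # the sizes make the big source and the 20GB target easy to tell apart
sudo blkid /dev/xvdf1    # filesystem UUID/label of what you're about to read from
sudo blkid /dev/xvdg1    # ...and of what you're about to overwrite
```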
Once the target volumes were resized and fscked, I attached them to the server instances on the correct root device mappings and spun them back up. Everything worked, so I tested them out, created new AMIs of the servers, and deleted the old source volumes. Thank you, Amazon, for cheap S3 storage for snapshots!
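The reattach itself is just a couple of calls once the device names are right. A sketch with placeholder IDs, using the root device mappings I noted earlier:

```bash
# Reattach the shrunken volumes as root devices (volume and instance IDs are placeholders).
aws ec2 attach-volume --volume-id vol-AMZN-TARGET   --instance-id i-AMZNSERVER   --device /dev/xvda
aws ec2 attach-volume --volume-id vol-UBUNTU-TARGET --instance-id i-UBUNTUSERVER --device /dev/sda1
aws ec2 start-instances --instance-ids i-AMZNSERVER i-UBUNTUSERVER
```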
So, I was able to reduce my EBS costs by 96% as a baseline. If I need more performance, I can make the volumes larger to get more baseline IOPS, or change them into provisioned IOPS (io2) volumes and reattach them to the servers. We’ll see next month how the bill looks, but at this point I think I am on track for $100 a month, with room to grow if we need more throughput.
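And if the smaller volumes do end up starved for I/O, the bump should just be a couple of calls rather than another copy marathon; the IDs and numbers here are placeholders, and I haven’t needed to run this yet:

```bash
# Grow a gp2 volume (more size also means more baseline IOPS) or switch it to provisioned IOPS in place.
aws ec2 modify-volume --volume-id vol-PLACEHOLDER --size 100
aws ec2 modify-volume --volume-id vol-PLACEHOLDER --volume-type io2 --iops 3000
```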