Picking the Right AWS Instance Types for You, Part 2

Version: Deadline 10

Introduction

Last month, we published a blog entry detailing the differences between the instance types available on AWS. This post is a follow-up to that one, so I suggest you give it a read here. This time, I’m not going to focus on the differences between the instance types. Instead, I’m going to focus on how to pick the right instance type for you.

One Size Fits All?

I’d love to be able to end this blog post right now by saying you should use ‘c4.2xlarge’ (a compute optimized instance with 8 CPUs and 15GB RAM) for all of your instances and you’ll never run into any problems and it’ll all be great. But I can’t. It turns out that picking an instance type requires a bit more finesse, fine-tuning and experimentation to achieve the best results. There isn’t a one-size-fits-all answer. The right instance type isn’t going to be the same for each studio, or even for each job within a studio. Instead, you’ll see the best results if you take the time to figure out which instance types are the right ones for the jobs that you’ll be rendering.

Too Big, Too Small, Just Right

To help choose an instance type that’s right for you, I highly recommend doing some experimentation. You’ll want to get a scene that is similar to the kinds of scenes you’ll be sending to the cloud. I’d grab a scene that has an average amount of geometry, assets, simulations, etc. I’d also recommend getting a test scene that you’re familiar with. That way you should be able to tell if rendering is taking longer than expected.

If you have an on-premise farm, take a few test frames from your scene and render them on a few of your local render nodes. In the Deadline Monitor, if you look at the Task panel for your test job, you can see the peak and average RAM and CPU usage for your frames.

ram_cpu_usage.png

Based on these numbers, you should be able to determine an instance type that will fit your job. In the above example, my Autodesk® Maya® render didn’t use a lot of RAM. It topped out at 20% but usually was sitting around 10%. However, the CPU was used more heavily, hitting up to 98% usage and averaging 38%. For these tests, I was using an 8 core machine with 15GB of RAM, so that 20% peak works out to roughly 3GB. Based on these numbers, I should look at scaling back on my RAM.

With all this in mind, I ran the same test scene using a few different instance types on AWS. In my sample case here, I tried a ‘c4.2xlarge’, ‘m4.xlarge’ and ‘c4.xlarge’. These instance types range from 4 CPUs to 8 CPUs and have 7.5GB to 16GB of RAM. The ‘m4.xlarge’ is a general purpose instance type, while the other two are compute optimized. I chose these types to compare the results I got from dialing the CPUs and RAM up and down.
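The RAM reasoning above can be sketched in a few lines of Python. This is only an illustration, not part of Deadline or AWS: the `InstanceType` class, the `peak_ram_gb` and `fits` helpers, and the 80% utilization ceiling are all names and values I made up for the example. The CPU and RAM figures are the ones quoted in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical container for the specs quoted in this post."""
    name: str
    cpus: int
    ram_gb: float


# The three instance types compared in this post.
CANDIDATES = [
    InstanceType("c4.2xlarge", 8, 15.0),
    InstanceType("m4.xlarge", 4, 16.0),
    InstanceType("c4.xlarge", 4, 7.5),
]


def peak_ram_gb(machine_ram_gb: float, peak_ram_percent: float) -> float:
    """Convert the peak RAM percentage from the Task panel into GB."""
    return machine_ram_gb * peak_ram_percent / 100.0


def fits(instance: InstanceType, needed_ram_gb: float,
         max_utilization: float = 0.8) -> bool:
    """True if the job's peak RAM stays under an assumed utilization ceiling."""
    return needed_ram_gb <= instance.ram_gb * max_utilization


# My local test machine had 15GB of RAM and the render peaked at 20%.
needed = peak_ram_gb(15.0, 20.0)
fitting = [i.name for i in CANDIDATES if fits(i, needed)]
print(f"Peak RAM: {needed:.1f}GB, candidates with headroom: {fitting}")
```

With the numbers from my test, all three candidates leave plenty of RAM headroom, which is why the interesting comparison between them ends up being CPU count and price.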

After the renders finished, I again looked at the RAM and CPU utilization on those machines. I also looked at total task time, startup time and render time, to see where I was spending the most time. If most of the time is spent on startup, it probably means the rendering application is taking a long time to launch. Startup time should go down after the first frame renders on an instance, so it’s recommended to run a handful of frames on each instance to keep that first-frame overhead from skewing your numbers.

You can see that in my second run-through, all the instances took a bit longer and used a bit more CPU/RAM than the first benchmark. Each frame was about 30s slower than the original render, but these instance types are also pretty inexpensive. From here, I need to decide if I want to run more tests with smaller instance types or stick with one (or more) of these instance types. Alternatively, I could do a test with larger instance types (16 CPUs) to see how quickly they finish my render. The trick is finding the right balance between performance and cost.
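One way to put a number on that balance is to work out a rough cost per frame for each instance type you benchmark. The sketch below is a simplification: the `cost_per_frame` helper is my own, the instance names, prices and task times are placeholders rather than real AWS pricing or my actual results, and it ignores things like startup time, billing granularity and licensing. Plug in the current on-demand prices and your own task times.

```python
def cost_per_frame(hourly_price_usd: float, task_seconds: float) -> float:
    """Rough on-demand cost of one frame: hourly price times hours spent."""
    return hourly_price_usd * task_seconds / 3600.0


# Placeholder benchmark data. These are NOT real prices or measured times.
benchmarks = {
    "instance-a": {"hourly_price_usd": 0.40, "task_seconds": 360.0},
    "instance-b": {"hourly_price_usd": 0.20, "task_seconds": 600.0},
}

costs = {name: cost_per_frame(**b) for name, b in benchmarks.items()}
cheapest = min(costs, key=costs.get)

for name, cost in sorted(costs.items()):
    print(f"{name}: ${cost:.4f} per frame")
print(f"Cheapest per frame: {cheapest}")
```

In this made-up example the slower instance still comes out cheaper per frame. Whether that trade is worth it depends on how quickly you need the frames back.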

ram_cpu_usage_2.png

It’s also a good idea to mix in some different categories of instance types. Start with General Purpose instances and then go with some Compute or Memory optimized ones depending on your needs. You can find a lot of detail on all of the instance types on this page. I like to use the on-demand pricing page because that lists every instance type on a single page and shows the on-demand price too.

Try to err on the side of caution when it comes to those average usage percentages. If you start hitting 80% or 90% consistently, it’s probably not a good idea to go with a less powerful instance type. If you do, it can result in reduced performance, longer render times or even failed renders.
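If you want to turn that rule of thumb into a quick check, something like the following would do. It assumes you have copied the average usage percentage for each test task out of the Task panel by hand. The `consistently_high` helper, the 80% threshold default and the "at least half of the tasks" cutoff are my own assumptions, so tune them to taste.

```python
def consistently_high(avg_usage_percents, threshold=80.0, min_fraction=0.5):
    """True if at least min_fraction of tasks averaged at or above threshold."""
    if not avg_usage_percents:
        raise ValueError("need at least one task sample")
    high = sum(1 for p in avg_usage_percents if p >= threshold)
    return high / len(avg_usage_percents) >= min_fraction


# Hypothetical per-task average CPU usage, in percent.
cpu_averages = [38.0, 41.0, 36.0, 85.0]

can_try_smaller = not consistently_high(cpu_averages)
print("Worth testing a smaller instance type:", can_try_smaller)
```

Run the same check on your RAM averages as well. If either one is consistently high, stay where you are or go bigger.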

If you don’t have an existing farm or on-premise machines that can effectively render your scenes, I suggest following the same approach as above to get your initial benchmarks using AWS instances. Start with a General Purpose instance type you think would work for your needs, and then go from there.

Rinse and Repeat

This process is going to require at least a couple of rounds of iteration and testing. Yes, you could pick the largest instance type and you won’t run into any issues, but that’s not going to be cost effective. Through repeated testing, you should be able to find an appropriate instance type for your jobs without paying for more processing power than you really need.

There are a lot of instance types to choose from, and it’s only through repeated testing that you’ll be able to tell which ones are right for you. Keep in mind that choosing an instance type is an ongoing process, and the instance type that is right for you may change. From time to time, AWS adds new generations of hardware. Even if you already have some instance types that work well for you, it’s usually worth benchmarking against new instance types when they become available. This should result in a much more efficient cloud farm.

Autodesk, Maya are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries.

All other brand names, product names, or trademarks belong to their respective holders.