Amazon Web Services (AWS) Releases New g2.8xlarge GPU Instance

Today AWS announced a long-awaited upgrade to their G2 family of instances: the g2.8xlarge. It is the big brother of the 2x, which was hitherto the only GPU-backed instance available on the AWS platform.

Here’s how the two compare in specifications:

| Instance | vCPU | ECU | Memory (GiB) | Instance Storage (GB) | EC2 Spot Price ($/hr) |
|---|---|---|---|---|---|
| g2.2xlarge | 8 | 26 | 15 | 60 SSD | $0.08 |
| g2.8xlarge | 32 | 104 | 60 | 2 x 120 SSD | $0.32 |

The four GPUs are the same NVIDIA GRID K520 found in the 2x instance and, as you can see from the numbers, the 8x is exactly four times larger in every respect. The indicative spot price at the time of writing was also roughly 4x that of the 2x.

In my previous post I ran a benchmark of the g2.2xlarge instances using the Distributed.net RC5-72 project, so I re-ran the same test on an 8x. You will not be surprised to learn that the results show a linear increase in the crunching keyrate, to roughly 1.7 GKeys/sec (previously 432 Mkeys/sec on the 2x).
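As a quick sanity check on the "four times larger" claim, here is a minimal Python sketch that computes the ratio for each spec in the table and projects the keyrate under perfectly linear scaling. The dictionary layout and function names are my own for illustration; the figures are the ones quoted in this post.

```python
# Specs as quoted in the comparison table in this post.
SPECS = {
    "g2.2xlarge": {"vcpu": 8, "ecu": 26, "memory_gib": 15,
                   "storage_gb": 60, "spot_usd_hr": 0.08},
    "g2.8xlarge": {"vcpu": 32, "ecu": 104, "memory_gib": 60,
                   "storage_gb": 2 * 120, "spot_usd_hr": 0.32},
}


def scaling_ratios(small, large):
    """Return large/small for every spec, rounded to 2 decimal places."""
    return {key: round(large[key] / small[key], 2) for key in small}


def projected_keyrate_mkeys(base_mkeys, gpu_count):
    """Project a keyrate assuming perfectly linear scaling per GPU."""
    return base_mkeys * gpu_count


ratios = scaling_ratios(SPECS["g2.2xlarge"], SPECS["g2.8xlarge"])
projected = projected_keyrate_mkeys(432, 4)  # 432 Mkeys/sec measured on the 2x

for spec, ratio in ratios.items():
    print(f"{spec}: {ratio}x")
print(f"Projected RC5-72 keyrate: {projected} Mkeys/sec")
```

Every ratio comes out at 4, and the projection is 1728 Mkeys/sec, in line with the roughly 1.7 GKeys/sec I measured.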

Is bigger better?

AWS’ fleet and pricing structure is generally linear: for an instance twice the size you pay twice the cost, both on spot and on-demand. The major difference, which is not very clearly advertised, is that network performance is greater for larger instances. AWS are vague as to what ‘Low’, ‘Moderate’, and ‘High’ mean in terms of raw speed (many others have tried to benchmark this), but for the largest instances it is explicitly stated as 10 Gigabit. If you assume a larger box is pumping out more data, it will need a network connection to match. By the same logic, you would also have to assume that an instance generating only a quarter as much data will be equally well served by a ‘Moderate’ connection.

A real world use case

In my day job I set up a variety of analyses of genetic data, supplemented by EC2 computation clusters (the recent AWS Whitepaper on Genomic Architecting in the Cloud is a really useful resource I can throw at scientists when they have questions). I investigated the viability of G2 instances and, for a specific analysis that was GPU-capable, it did indeed run roughly 3-4 times faster than the same job running on a single CPU core. The problem was memory: each job used roughly 3-5GiB, meaning I couldn’t run more than 3 or 4 jobs at once on a single g2.2x.

However, on an r3.8xlarge, a CPU instance with 32 cores and 244GiB of memory, I could run 32 concurrent jobs with memory to spare. Sure, the jobs took 30 minutes each instead of 10, but I could run 32 of them.

Then I drilled down on cost/benefit. The g2.2x was $0.08 per hour on spot, and the r3.8x was $0.32. That is four times as much per hour to run, but with roughly 10 times as many jobs running. It ended up being a no-brainer that a CPU instance was the way to go.
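If you want to run the same comparison for your own workload, the arithmetic can be sketched in a few lines of Python. This is a rough template rather than a definitive model. The function names are hypothetical, the prices and runtimes are the ones quoted above, and it assumes (which I did not measure rigorously) that jobs sharing an instance do not slow each other down.

```python
def jobs_per_hour(concurrent_jobs, minutes_per_job):
    """Completed jobs per hour, assuming jobs do not slow each other down."""
    return concurrent_jobs * 60 / minutes_per_job


def cost_per_job(spot_usd_hr, concurrent_jobs, minutes_per_job):
    """Spot cost in USD attributed to each completed job."""
    return spot_usd_hr / jobs_per_hour(concurrent_jobs, minutes_per_job)


# (spot price in $/hr, concurrent jobs, minutes per job), as quoted in this post.
scenarios = {
    "g2.2xlarge, 3 jobs": (0.08, 3, 10),
    "g2.2xlarge, 4 jobs": (0.08, 4, 10),
    "r3.8xlarge, 32 jobs": (0.32, 32, 30),
}

for name, (price, jobs, minutes) in scenarios.items():
    print(
        f"{name}: {jobs / price:.1f} concurrent jobs per $/hr, "
        f"{jobs_per_hour(jobs, minutes):.0f} jobs/hr, "
        f"${cost_per_job(price, jobs, minutes):.4f} per job"
    )
```

Jobs in flight per dollar and completed jobs per dollar are different metrics and need not agree, since the longer CPU runtime offsets part of the concurrency advantage. Spot prices also move around, so plug in current figures before drawing conclusions.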

Perhaps this is a poor example, because the capabilities of genetic analysis are badly limited by the tools available for the specific job, and it’s reasonably rare to find anything that is built for multi-threading, let alone something designed specifically to run on GPUs. The implementation of these analysis tools is a black box and we’re not software developers. Our tests were probably very bad exemplars for the power of a GPU instance, but they did show that a mere 15GiB RAM on the 2x just wasn’t anywhere near enough. 60GiB on the 8x is a little better, but in my use case it still wouldn’t offer me any additional benefit because I wouldn’t be able to leverage all of the GPUs I’m paying for (our software just isn’t good enough). FastROCS, the example cited in Jeff Barr’s AWS Blog announcement about the g2.8x, also mentions the 15GiB of the 2x being the limiting factor, so presumably they’re running jobs that can leverage more GPU in a single job without a proportional increase in memory usage.
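To make the memory constraint concrete, here is a small Python sketch of how many jobs fit in RAM on each G2 size and how much memory there is per GPU. It assumes the worst-case 5GiB per job from my analysis and ignores memory used by the OS, so treat the counts as upper bounds. The helper name is hypothetical.

```python
# GPU count and memory for each G2 size, as described in this post.
INSTANCES = {
    "g2.2xlarge": {"gpus": 1, "memory_gib": 15},
    "g2.8xlarge": {"gpus": 4, "memory_gib": 60},
}

JOB_MEMORY_GIB = 5  # assumption: worst case of the 3-5GiB range per job


def jobs_that_fit(memory_gib, job_memory_gib):
    """Upper bound on concurrent jobs, limited by RAM only (no OS overhead)."""
    return int(memory_gib // job_memory_gib)


for name, spec in INSTANCES.items():
    per_gpu = spec["memory_gib"] / spec["gpus"]
    fits = jobs_that_fit(spec["memory_gib"], JOB_MEMORY_GIB)
    print(f"{name}: {per_gpu:.0f}GiB per GPU, "
          f"up to {fits} jobs at {JOB_MEMORY_GIB}GiB each")
```

Memory per GPU is 15GiB on both sizes, so a job mix that is memory-bound on the 2x stays memory-bound per GPU on the 8x unless a single job can spread across several GPUs.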

The main benefit of one vertically-scaled box four times the size is speed. If your application can utilise four GPUs simultaneously within the memory limits then you could, for example, transcode a single video extremely quickly. If speed is the main factor in your service delivery then this is the instance for you. If, however, you’re running smaller, less time-critical jobs that the 2x can handle just as well, there’s little benefit here, unless you consider managing four separate GPU instances to be more of a hassle than running one four times the size. But then all of your GPU eggs are in one basket, and if the single instance goes down so does all of your capacity.

As with all AWS usage, your chosen method of implementation will depend on what’s right for you. This is going to be great news for a lot of people, but unfortunately not for me!