Amazon Web Services

Courtesy of the AWS subreddit, I was alerted to the fact that the recently updated AWS Service Terms (expanded to cover their new Lumberyard 3D game engine) now include this specific clause:

57.10 Acceptable Use; Safety-Critical Systems. Your use of the Lumberyard Materials must comply with the AWS Acceptable Use Policy. The Lumberyard Materials are not intended for use with life-critical or safety-critical systems, such as use in operation of medical equipment, automated transportation systems, autonomous vehicles, aircraft or air traffic control, nuclear facilities, manned spacecraft, or military use in connection with live combat. However, this restriction will not apply in the event of the occurrence (certified by the United States Centers for Disease Control or successor body) of a widespread viral infection transmitted via bites or contact with bodily fluids that causes human corpses to reanimate and seek to consume living human flesh, blood, brain or nerve tissue and is likely to result in the fall of organized civilization.

It’s good to know that AWS are accommodating all possible contingencies for social and economic failover.

Amazon Web Services, distributed.net

Amazon Web Services (AWS) Release new g2.8xlarge GPU instance

Today AWS announced a long-awaited upgrade to their G2 family of instances: the g2.8xlarge, the big brother of the 2x, which was hitherto the only GPU-backed instance available on the AWS platform.

Here’s how the two compare in specifications:

Instance   | vCPU | ECU | Memory (GiB) | Instance Storage (GB) | EC2 Spot Price ($/hr)
g2.2xlarge | 8    | 26  | 15           | 60 SSD                | $0.08
g2.8xlarge | 32   | 104 | 60           | 2 x 120 SSD           | $0.32

The four GPUs are the same NVIDIA GRID K520 units seen in the 2x instance, and as the numbers show, the 8x is exactly four times larger in every respect. The indicative spot price at the time of writing was also very close to four times the cost.
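
For anyone curious about trying one of these at the spot prices above, a request from the AWS CLI looks something like the sketch below. The AMI ID, key pair and security group are placeholders, and the bid price is just an example ceiling a little above the going rate; the request stays fulfilled for as long as the current spot price remains below your bid.

# Request a single g2.8xlarge on the spot market (placeholder AMI, key pair and security group)
aws ec2 request-spot-instances \
    --spot-price "0.40" \
    --instance-count 1 \
    --launch-specification '{
        "ImageId": "ami-xxxxxxxx",
        "InstanceType": "g2.8xlarge",
        "KeyName": "my-key-pair",
        "SecurityGroupIds": ["sg-xxxxxxxx"]
    }'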

In my previous post I benchmarked the g2.2xlarge instance using the distributed.net RC5-72 project, so I re-ran the same test on an 8x. You will not be surprised to learn that the results show a linear increase in the crunching keyrate to roughly 1.7 GKeys/sec (up from 432 Mkeys/sec on the 2x).

Is bigger better?

AWS’ fleet and pricing structure is generally linear: for an instance twice the size, you pay twice the cost, both in spot and on-demand. The major difference that is not very clearly advertised is that network performance is greater for larger instances. AWS are vague as to what ‘Low’, ‘Moderate’, and ‘High’ mean in terms of raw speed (many others have tried to benchmark this), but on the largest instances it is explicitly stated as 10 Gigabit. If you assume a larger box is pumping out more data, it will need a network connection to match; but you would also assume that an instance generating only a quarter as much data will be equally well served by a ‘Moderate’ connection.

A real world use case

In my day job I set up a variety of analyses on genetic data, supplemented by EC2 computation clusters (the recent AWS whitepaper on genomic architecting in the cloud is a really useful resource I can throw at scientists when they have questions). I investigated the viability of G2 instances, and a specific GPU-capable analysis did indeed run roughly 3-4 times faster than the same job on a single CPU core. The problem was memory: each job used roughly 3-5 GiB, meaning I couldn’t run more than 3 or 4 jobs on a single g2.2x at once.

However, on an r3.8xlarge – a CPU instance with 32 cores and 244 GiB of memory – I could run 32 concurrent jobs with memory to spare. Sure, the jobs took 30 minutes each instead of 10, but I could run 32 of them at once.

Then I drilled down into cost/benefit. The g2.2x was $0.08 on spot, and the r3.8x was $0.32: four times as much per hour to run, but with roughly ten times as many jobs running. It ended up being a no-brainer that a CPU instance was the way to go.

Perhaps this is a poor example, because genetic analysis is badly limited by the tools available for the specific job, and it’s reasonably rare to find anything built for multi-threading, let alone designed specifically to run on GPUs. The implementations of these analysis tools are black boxes and we’re not software developers. Our tests were probably poor exemplars of the power of a GPU instance, but they did show that a mere 15 GiB of RAM on the 2x just wasn’t anywhere near enough. 60 GiB on the 8x is a little better, but in my use case it still wouldn’t offer any additional benefit because I wouldn’t be able to leverage all of the GPUs I’m paying for (our software just isn’t good enough). FastROCS, the example cited in Jeff Barr’s AWS Blog announcement of the g2.8x, also mentions the 15 GiB of the 2x being the limiting factor, so presumably they’re running jobs that can leverage more GPU power in a single job without a proportional increase in memory usage.

The main benefit of one vertically-scaled box four times the size is speed. If your application can utilise four GPUs simultaneously within the memory limits then you could, for example, transcode a single video extremely quickly. If speed is the main factor in your service delivery then this is the instance for you. If, however, you’re running smaller, less time-critical jobs that the 2x can handle just as well, there’s little benefit here, unless you consider managing four separate GPU instances to be more hassle than running one four times the size. But then all of your GPU eggs are in one basket, and if that single instance goes down so does all of your capacity.

As with all AWS usage your chosen method of implementation will depend on what’s right for you. This is going to be great news for a lot of people – but unfortunately not me!

Amazon Web Services, cPanel WHM

cPanel Webserver on EC2 (or How I learned to stop worrying and love the cloud)

It’s been a long time coming but my company is now fully based on Amazon Web Services EC2 for our web hosting. It’s been a long journey to get here.

For more than 15 years we’ve cycled between a variety of providers who offered different things in different ways. First our sites were hosted by Gradwell, on a single shared server where we paid a per-domain cost for additional hosting. We left them in 2004 after 6 years and following a very brief flirtation with Heart Internet we moved to a now-defunct IT company called Amard. This gave us our first taste of CPanel as a way of centralising and easily managing our hosted sites. It was great – so great that it made moving to ServerShed (also defunct) in 2005 very easy.

This was our first dedicated server with CPanel and we enjoyed that greater level of control while still having the convenience of a largely self-managing service. In 2007 we hopped ship to PoundHost where we remained until last week. Three or four times in the last 8 years we’ve re-negotiated for a new dedicated server and battled with the various frailties of ageing hardware and changing software. Originally we had no RAID and downloaded backups manually, then software RAID0, then hardware RAID0, then most recently with S3 backups on top.

Historically ‘the cloud’ was a bit of an impenetrable conundrum. You knew it existed in some form, but didn’t really understand what it meant or how its infrastructure was set up. Access to the AWS Cloud in the early days, as far as I understand, was largely command-line (CLI) based and required a lot of knowledge to get going on the platform. It required a lot of mental visualisation and I’m sure the complexity would have been beyond me back then. Some services didn’t exist, others were in their infancy. Everything was harder.

In that respect I almost don’t mind being a bit late to this party. It’s only in the last two or three years that it seems the platform has been opened up to the less-specialised user. Most CLI functions have been abstracted into a pretty, functional web console. Services interact with each other more fluidly. Access controls that govern permissions to every resource in a really granular way have been introduced. Monitoring, billing, automated resource scaling, load balancing and a whole host of other features are now a reality and can boast really solid stability.

Even then, migration has not been a one-click process. cPanel do offer a VPS-optimized version of the Web Host Manager (WHM), and that’s the one we’re running. Its main boast is that it uses less memory than the regular version, apparently as a concession to the fact that a virtual machine is more likely to exist as a small slice of an existing server and won’t have as many resources allocated to it. It looked to be the best fit.

Then we needed to find a compatible Amazon Machine Image (AMI) on which to install it. It seemed to be largely a toss-up between CentOS and Red Hat Enterprise Linux. We’d used CentOS 6 in the past with good results (and unlike RHEL it requires no paid-up subscription), so we fired it up and slowly started banging the tin into shape. Compared to other set-ups there is very little AWS-specific documentation on setting up a cPanel environment, so we were mostly sustained by a couple of good articles by Rob Scott and Antonie Potgieter. cPanel themselves wrote an article on how to set up the environment, but it didn’t quite cover enough, and the screenshots and references there are already out of date. I will write a comprehensive overview of ‘How to Set up cPanel on EC2’ in another article, but to conclude here I will talk about what made this transition so tentative for us that we waited almost a year after setting up AWS before we went live with our main server on the platform:

Fear. Non-specific, but mostly of the unknown. It’s irrational, because when you’re buying managed hardware from resellers you never actually get to see the physical box you’re renting. It’s no more tangible to you than an ephemeral VM is, and yet there’s something oddly reassuring in knowing that your server is a physical brick of a thing loaded into a rack somewhere. If something goes wrong, an engineer that you can speak to on the phone can go up to it and plug in a monitor to see what’s happening. It’s not suddenly going to disappear without a trace.

Not so with the cloud. It’s all a logical abstraction. You don’t know (you’re not allowed to know) precisely where the data centres are. You don’t know how the internal configuration works. Infrastructure is suddenly just software configuration, and we all know how easy it is to make a mistake. Click the wrong box during set-up and you might have your root device deleted accidentally, or find you can terminate the server with all of your precious data because you didn’t enable Termination Protection. If you’re a little careless with your access keys and they become public, you’ll find your account has been compromised to run expensive clusters mining bitcoins at your expense.
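
As an aside, the termination worry at least is easy to guard against. Assuming you have the AWS CLI configured, something like the following (with a placeholder instance ID) switches Termination Protection on for a running instance, and the second command lets you confirm it:

# Enable termination protection on a running instance (instance ID is a placeholder)
aws ec2 modify-instance-attribute --instance-id i-xxxxxxxx --disable-api-termination

# Check the current setting
aws ec2 describe-instance-attribute --instance-id i-xxxxxxxx --attribute disableApiTermination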

Terrifying if you don’t know what you’re doing. So you investigate, learn slowly, play on the Free Tier. Make things, break things. Migrate one small site to make sure it doesn’t explode. Back it up, kill it, restore it, and be absolutely confident that this is really going to work.

And it does. For the first time I feel like we’re our own hosting provider. I spec the server, I buy it by the hour, I configure and deploy it, and Amazon is merely the ‘manufacturer’. Except that everything happens in minutes and seconds instead of weeks and days. Hardware failure is handled as simply as stopping and restarting the instance, whereupon it will redeploy on different hardware. If you’ve architected for failure as you should, backups can be spun up in less than ten minutes. If I get nervous about spiralling running costs I can just flip the off switch and the charges stop, or the Trusted Advisor can offer suggestions on how to run my configuration more efficiently.
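
That stop/start dance is a one-liner each way from the CLI too (again, the instance ID is a placeholder); while the instance is stopped you pay only for the EBS storage, not the instance hours:

# Stop the instance; the EBS volumes persist and instance-hour charges stop
aws ec2 stop-instances --instance-ids i-xxxxxxxx

# Start it again later; it will generally come back up on different underlying hardware
aws ec2 start-instances --instance-ids i-xxxxxxxx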

It’s telling that every company I’ve ever bought a physical server from is now pushing cloud-based offerings to its customers too. The era of procuring your own hardware is passing, replaced by more dynamic, more durable, more robust solutions.

You’d be forgiven for thinking this whole post was merely a sly advert for AWS, but I really am just a humble end user. Admittedly one that has been completely converted, and I haven’t even enumerated the full list of services that I now use, to say nothing of the others I haven’t yet had time to investigate. This is a great time to make the transition: adoption has been growing at a huge rate, and I think it’s going to skyrocket to even greater heights in the next few years.

If you don’t have a pilot for the journey, it’s time to train one.

Amazon Web Services

Amazon Web Services (AWS) now provides billing in local currency

Developers and businesses around the world will be breathing a huge sigh of relief today, as Amazon Web Services have finally announced the ability to have invoices generated and charged in one of 11 newly supported local currencies.

Since its launch in 2006, AWS has exclusively billed in US dollars worldwide, causing a currency-conversion and bank-fee headache for all but U.S. customers. For a multi-billion dollar, multi-national operation this feature is long overdue, and it brings AWS into line with other providers such as Microsoft Azure in offering bills in the currency you’re most familiar with.

The new list of supported currencies, available when using an eligible Visa or MasterCard to pay your bill, is:

  • Australian Dollars (AUD)
  • Swiss Francs (CHF)
  • Danish Kroner (DKK)
  • Euros (EUR)
  • British Pounds (GBP)
  • Hong Kong Dollars (HKD)
  • Japanese Yen (JPY)
  • Norwegian Kroner (NOK)
  • New Zealand Dollars (NZD)
  • Swedish Kronor (SEK)
  • South African Rand (ZAR)

This new preference can be set in your AWS ‘Billing and Cost Management’ account under ‘Account settings’. The changes take effect immediately and your estimated bill for the month is then presented to you in your local currency.

[Screenshot: the currency preference under ‘Account settings’ in the Billing and Cost Management console]

The one important factor to note is that AWS pricing remains firmly in dollars, and all prices presented to you at the point of purchase remain that way. Amazon appear to be offering an exchange rate that changes daily, so all estimates are based on that and will naturally be susceptible to currency fluctuations – so CFO headaches aren’t gone entirely! Presumably the final bill for the month is based on the exchange rate on the day it is issued, but compared to the existing method – getting hammered by your bank for currency conversion fees at terrible transfer rates – this is substantially preferable and will result in lower overheads for all non-US AWS customers.

Amazon Web Services

How to modify the ‘Delete on Termination’ flag for an EBS volume on a running EC2 instance

When launching an EC2 instance on Amazon Web Services, the root EBS volume is set to ‘Delete on Termination’ by default. Most of the time this is fine: you’d often rather make a snapshot of your drive so you can boot up multiple copies of the same instance, and if you’re a developer creating and terminating instances regularly it would be a nightmare to have orphaned EBS volumes cluttering up your account.

But what do you do when you have a running instance where the EBS was set to Delete and you’ve changed your mind and want to keep it? The AWS documentation is surprisingly vague about this, mostly talking about setting the flag on launch. So how do you do it for a running instance? And for that matter, how can you tell if you set an existing instance to Delete on Termination? It might have been a while and it’s hard to remember if you ticked that checkbox.

How to check the EBS ‘Delete on Termination’ flag

It’s a little buried. Go to your EC2 management console and click on ‘Instances’. Click on the instance you’re curious about, and then under the ‘Description’ tab, scroll down to ‘Block devices’, and click on the appropriate EBS volume. This will pop up an attribute box which will state the Delete on Termination flag. This seems to be the only place in the whole AWS console to check this information!

[Screenshot: the EBS volume attribute pop-up showing the Delete on Termination flag]
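
If you’d rather not click through the console, the same information is available from the AWS CLI (configured as per the note further down). Using the example instance ID from below, the response lists each attached volume along with its DeleteOnTermination value:

aws ec2 describe-instance-attribute --instance-id i-a3ef245 --attribute blockDeviceMapping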

 

How to modify the EBS ‘Delete on Termination’ flag

The only way to do this is with the AWS CLI; at the time of writing there’s no way to do it through the web console.

You can do this in one of two ways: either by specifying a JSON file containing the block device mapping changes you want to apply to the instance, or by passing the JSON in-line with escaped quotes.

aws ec2 modify-instance-attribute --instance-id i-a3ef245 --block-device-mappings file:///path/to/file.json

with a .json file in a format such as:

[
  {
    "DeviceName": "/dev/sda1",
    "Ebs": {
      "DeleteOnTermination": false
    }
  }
]

Or in-line, without using a .json file, like this:

aws ec2 modify-instance-attribute --instance-id i-a3ef245 --block-device-mappings "[{\"DeviceName\": \"/dev/sda1\",\"Ebs\":{\"DeleteOnTermination\":false}}]"

Note: You will need to configure the AWS CLI (run ‘aws configure’ from a terminal) either with user credentials attached to a policy that grants the appropriate EC2 permissions, or run the CLI on an instance that has an IAM role with those permissions.
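
As a rough sketch (not an official AWS-recommended policy), an IAM policy for those user credentials only needs the two EC2 actions used here; in practice you may want to scope the Resource down rather than using a wildcard:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceAttribute",
        "ec2:ModifyInstanceAttribute"
      ],
      "Resource": "*"
    }
  ]
}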

Now hopefully you’ll be able to save that EBS volume after changing your mind about the termination flag!

Amazon Web Services, distributed.net

Distributed.net RC5-72 on Amazon Web Services (AWS) EC2 Cluster – A modern solution to an old problem

History

Anyone kicking around the internet since the early days will have heard of distributed.net’s RC5-72 distributed computing project. Arising from RSA Labs’ Secret-Key Challenge, the project sought to utilise distributed computing power to perform a brute-force attack in an attempt to decrypt a series of secret messages encrypted with the RC5 block cipher. RSA Labs sponsored a $10,000 reward for the participant whose machine found the correct key for each of the challenges, starting at 56-bit encryption and scaling up through 64, 72, 80, 88, 96, 104, 112, 120, and 128 bits.

I started contributing to the 64-bit instance of the project (termed RC5-64) back in 2001, and with the combined computing power of around 300,000 participants, the key was cracked in July 2002 after almost five years of work. That project required testing 68 billion key blocks (2^64 keys in total) and found the correct key after searching 82.7% of the keyspace.

A new project to tackle the next message, encrypted with a 72-bit key, was started on December 2nd 2002. This one requires testing 1.1 trillion key blocks (2^72 keys in total) – a keyspace 256 times larger than that of the original project, which had taken 5 years to complete. After a few years it became apparent that this was going to take ‘a very long time’, and RSA Labs withdrew the challenges in May 2007, along with the $10,000 prizes. Shortly after this news, distributed.net announced that they would continue to run the project and would fund an alternative $4,000 prize.

Today

As of today the project has made it through 3.378% of the keyspace after 12 years of work. At the current rate the project anticipates hitting 100% in a mere 219 years.

You can imagine that so many years of effort have dulled the enthusiasm of its participants. The distributed.net statistics report some 94,000 unique participants in RC5-72 (significantly down on the 300,000 of the previous project), of whom only around 1,200 remain active. With the rise of other distributed computing projects in the early 2000s, such as SETI@home and Folding@home, this humble project has been rather forgotten by the internet at large, and the advent of Bitcoin and other e-currencies has led people to turn their spare processing power to more profitable ends.

And yet the RC5-72 project’s overall keyrate remains higher today than it has ever been. The reason is the development and widespread adoption of powerful GPUs in home computers. I’m the first to admit I don’t really understand the finer points of how computer hardware works, but GPUs turned out to be roughly 1,000 times faster than even the fastest commercial CPU at crunching through keys. Traditional CPU crunching now makes up less than 10% of the total daily production; the vast majority comes from ATI Stream GPU clients (70%), with the rest from NVIDIA CUDA (5%) and the more recent OpenCL client, which is supported on both ATI and NVIDIA hardware (14%).

As much as I salute distributed.net for continuing to maintain the project, run the supporting key servers that distribute the work, and keep the stats going, I have to say I’ve never seen any active effort to promote it. The website is rather dated and, to my memory, has the exact same design it had in 2001 when I first started. There are no social media buttons, no methods of incentivisation, and even some of the more basic things like keyrate graphs have been broken for months (or years) without anybody worrying about fixing them.

A possible solution

I know that it must be difficult to commit any real effort to something that has been a back-burner operation for almost 10 years, but the completionist in me is desperate to somehow mobilise the massive modern internet to attack the project with gusto and get it to 100% in a month. I know that’s a laughably ridiculous suggestion, because the scale of the work required is massive. If you had a reasonably powerful ATI graphics card that could churn through one billion keys per second, you might be able to crunch around 19,000 work units per day with the GPU dedicated to the task 24/7. At that rate a single card could expect to complete the project in the region of 160,000 years.

So 1,200 people aren’t going to get this done, even though the top 100 of them can pump out almost 10 million work units a day.

The advent of ‘the cloud’ has made potentially unlimited computing power available to anyone in the world – at a cost. Amazon Web Services (AWS) have the largest compute infrastructure of any provider and I’ve spent quite a while familiarising myself with the platform. It occurred to me that a key-crunching test of RC5-72 was in order.

The Test

For the basic test, I provisioned a compute-optimised c3.8xlarge instance running the traditional Linux x86 CPU client, and a GPU-backed g2.2xlarge instance running the CUDA 3.1 distributed.net client; I later also tested the OpenCL client on the same GPU instance.
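
For anyone wanting to replicate the test, getting the client going on an instance looks roughly like the sketch below. The download URL and archive name are placeholders; pick the appropriate Linux build from distributed.net’s download page for your instance type (the x86 client for the c3, the CUDA or OpenCL build for the g2):

# Fetch and unpack a dnetc client build (placeholder URL and filename)
wget http://www.distributed.net/download/dnetc-linux-amd64.tar.gz
tar xzf dnetc-linux-amd64.tar.gz
cd dnetc-linux-amd64

# On first run the client walks you through configuration (e-mail address for
# crediting completed work units, buffer sizes and so on), then starts crunching
./dnetc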

The keyrate and cost results were as follows:

Instance Type | dnetc Client           | Keyrate (Mkeys/sec) | EC2 Spot Price ($/hr)
c3.8xlarge    | v2.9109.518            | 180                 | $0.32
g2.2xlarge    | v2.9109.518 (CUDA 3.1) | 423                 | $0.08
g2.2xlarge    | v2.9109.520 (OpenCL)   | 432                 | $0.08

 

The OpenCL client was the winner, offering 432 million keys/sec for 8 cents an hour. It should be noted that this falls far short of the best recorded speed from a GPU, where an ATI Radeon HD 7970 can do a stunning 3.6 billion keys/sec – and that benchmark list is at least a year old, so there are probably cards out there that are even more powerful. Compared to that, a mere 0.43 billion keys/sec is only around 12% as fast.

The potential advantage of the AWS cloud is seemingly not its raw speed but its scale. I can’t run 10 graphics cards at home, but I can run 10 instances of the dnetc client. So that’s what I did for around 36 hours. The cost of 10 instances for one day equated to $0.80 x 24 = $19.20. Not bank-breaking, but quite a lot to achieve a total speed of 4.32 billion keys/sec. That single 24-hour effort put me at #26 in the top-100 rankings for the day, and the 87,000 work units completed bolstered my grand total to 1.4 million units over the 12 years I’ve been working on the project. A fairly hefty chunk relative to my total effort, but still a tiny drop in an enormous ocean.

Getting to 100%?

After a few calculations, I estimated that I would require around 346,000 nodes like this, running 24/7 for one year, to complete the project. Assuming I could maintain a spot price of 8 cents an hour, it would cost ‘only’ $232 million to provision the cloud to complete this project. I think it’s pretty unlikely I could crowd-source that from the internet in order to complete an old cryptography project, but there are other, simpler solutions.
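
For the curious, the back-of-the-envelope version of that calculation can be run straight from the shell. It uses the 432 Mkeys/sec figure from the benchmark table above; depending on rounding, and on whether you count the whole keyspace or only the ~96.6% still unsearched, it lands within a few percent of the figures quoted here.

# Rough scale of the problem: g2.2xlarge node-years needed to cover 2^72 keys,
# and what that costs at $0.08/hr on the spot market
awk 'BEGIN {
    keyspace = 2^72                        # total RC5-72 keys
    noderate = 432e6                       # keys/sec per g2.2xlarge (OpenCL client)
    nodeyear = noderate * 365 * 24 * 3600  # keys one node crunches in a year
    nodes    = keyspace / nodeyear
    printf "nodes running for a year: %.0f\n", nodes
    printf "spot cost at $0.08/hr:    $%.0f million\n", nodes * 365 * 24 * 0.08 / 1e6
}'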

1) Better GPUs. AWS aren’t the only cloud provider and one of the many others may provide GPU instances with better computational power – but none of them offer AWS’ novel ‘spot pricing’ model to get instances at rock-bottom prices, so this is unlikely to be cheaper.

2) Don’t use the cloud at all. Somehow compel millions of people to start running the client on their high-end graphics cards at home and at work too. Might happen… but not without a concerted social media effort, a refresh of the website, and some kind of modern, fun ‘game’-style incentivisation. Doing it purely for the fun of cryptographic curiosity is unlikely to motivate many casual users.

3) More bespoke supercomputers like this one. The Center for Scientific Computing and Visualization Research at the University of Massachusetts have built a bit of a Frankenstein supercomputer from old PlayStation consoles and AMD Radeon graphics cards. It’s seemingly used for other computational purposes, but the excess capacity is put into the RC5-72 project and has recently been churning out an eye-popping 1.2 million work units a day (more than 10% of the total work of the top 100) – cool, huh?

4) Maybe Amazon would like to take on the challenge directly to demonstrate the power of their cloud. They’ve done things like this in the past (such as creating the #72 supercomputer in the world, with 26,496 cores rated at 593 teraflops) and would presumably do it for free as a bit of a boast. But then there would also be the fear that the other tenacious users on the project would feel a bit cheated by having a giant multinational come along and solve the problem without them. Then again, it’s also possible that the joy of having it complete would outweigh the disappointment of not having achieved it personally.

This entire experiment and post was a bit of a nostalgic indulgence for me. RC5-72 has been a tempting Everest of distributed computing, and back in the day I wanted to be part of the pioneering team that conquered it. Now there are only a few of us left, and we haven’t even reached base camp yet. At this point I’d gladly hop on a helicopter to the summit, except I don’t know where to get one.