Cloud Server Outages Don’t Scare Me

Written by Greg Keller on June 3, 2014


The Sky Is Falling, the Sky Is Falling!

The sky isn’t really falling, but it can feel that way when your cloud provider fails. In recent years we’ve seen some spectacular cloud provider failures, each equivalent to taking down a very large traditional data center and having all of its backups (network, power, and cooling) fail at the same time.

These failures caused outages at some major online services, such as Netflix. That’s a scary thought: if a provider like Netflix can experience a serious outage that takes many hours to rectify, what chance do you have of surviving one?

Because a failure of this type is so expensive, it’s a foregone conclusion that the data center has been designed to be fault-tolerant, with one or more backups at each failure point. Unfortunately, the failures we’ve seen appear to result from software faults more often than from hardware faults. It’s more difficult to insulate against software failures, but not impossible.

Solution 1: Back up the site with AWS S3 buckets or others

The traditional solution to a data center failure, such as the destruction of a data center due to fire or weather, has been to keep a fully functional backup site. Data is synchronized between the production site and the backup site frequently, so that if the production site fails, the backup site can take over without impacting customers.

Having a live backup site used to be incredibly expensive when it meant duplicating your entire production infrastructure to a second site, where it would sit idle, waiting for a failure.

In the cloud, keeping a backup site is less expensive, though not free. But how do you do it?

The key is in creating a backup site in a geographically separate data center (or “region” in Amazon AWS parlance), and designing your network so that client traffic can be easily shunted to the backup site. As with traditional backup sites, you’ll need to ensure that your data is available at both sites and in sync between them.
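To make the idea of shunting traffic concrete, here is a minimal sketch of one possible approach: try sites in priority order and use the first one that passes a health check. The function name, the endpoint names, and the health check itself are assumptions made for illustration. In practice this decision is often made at the DNS or load balancer layer rather than in application code.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint that passes the health check.

    `endpoints` is ordered by priority: production first, backup after.
    `is_healthy` is a caller-supplied check (hypothetical here); a real one
    might request a status URL and verify the response.
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated the same as a failed check.
            continue
    # No site is currently able to serve traffic.
    return None
```

Passing the health check in as a parameter keeps the selection logic independent of how health is actually measured, which also makes the failover path easy to rehearse before a real outage.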

Amazon can help with this. S3 stores each object redundantly across multiple facilities within a region, and it can be configured to copy objects to a bucket in a second region. So, if your data lives in S3, you don’t have to build and run the replication machinery yourself. Of course, there’s a replication delay, so it can take some time for one region to see updates from work done in another. But still, it’s often better than managing this replication yourself.
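Because of that delay, it can be worth deciding ahead of time how far behind the backup site is allowed to be before you fail over to it. The sketch below shows one way to express that check. The function names and the five-minute tolerance used in the example are assumptions for illustration, not values from any provider.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(last_primary_write: datetime,
                    last_replicated_write: datetime) -> timedelta:
    """How far the backup copy trails the primary (never negative)."""
    return max(last_primary_write - last_replicated_write, timedelta(0))


def safe_to_fail_over(lag: timedelta, max_acceptable_lag: timedelta) -> bool:
    """True when the backup is fresh enough for the business to accept."""
    return lag <= max_acceptable_lag


# Example with made-up timestamps: the backup is 90 seconds behind.
primary = datetime(2014, 6, 3, 12, 0, 0, tzinfo=timezone.utc)
replica = datetime(2014, 6, 3, 11, 58, 30, tzinfo=timezone.utc)
lag = replication_lag(primary, replica)
ok = safe_to_fail_over(lag, max_acceptable_lag=timedelta(minutes=5))
```

How the two timestamps are obtained depends on your data store; the point is only that the tolerance is a business decision you can write down and test.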

Solution 2: Fast recovery and DevOps in the cloud

DevOps practices offer another way to solve the problem: quick, automated deployments. In this model, if you’ve got your data available, you can spin up an entirely new environment in a matter of minutes or hours and start serving requests from it.

This has the benefit that it doesn’t cost you extra money to pay for idle cloud instances: you can spin up the new environment only when it becomes necessary. If you take this even further, you can make your deployments cloud-agnostic, so that even if your whole provider fails, you can still rebuild in a new cloud in a matter of hours.
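One way to picture a cloud-agnostic deployment is to describe the environment as data and hide each provider behind a common interface. The following is a minimal sketch under that assumption. `ServerSpec`, `Provider`, `FakeProvider`, and `rebuild` are all hypothetical names invented for this example, and the fake provider only records what it was asked to create; it does not call any real cloud API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ServerSpec:
    """Provider-neutral description of one server in the environment."""
    name: str
    size: str


class Provider(ABC):
    """Interface each cloud provider adapter would implement."""

    @abstractmethod
    def create_server(self, spec: ServerSpec) -> str:
        """Create a server and return its identifier."""


class FakeProvider(Provider):
    """Stand-in provider used for illustration and dry runs."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.created: List[str] = []

    def create_server(self, spec: ServerSpec) -> str:
        server_id = f"{self.label}-{spec.name}"
        self.created.append(server_id)
        return server_id


def rebuild(provider: Provider, specs: Sequence[ServerSpec]) -> List[str]:
    """Recreate the whole environment, in order, on the given provider."""
    return [provider.create_server(spec) for spec in specs]


# The same environment description can be replayed against any provider.
ENVIRONMENT = [ServerSpec("web", "small"), ServerSpec("db", "large")]
```

With this shape, moving to a new cloud means writing one new adapter class rather than rewriting the deployment itself. Calling `rebuild(FakeProvider("backup"), ENVIRONMENT)` is a cheap way to rehearse the rebuild without paying for real instances.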

Of course, that hinges on your data being available, so you’ll need to plan ahead to ensure that you can access your data even if your provider completely fails. This is easier said than done, and can be expensive if you move a lot of data. A number of cloud backup services exist to help meet this need, but again, they’re expensive.

Solution 3: The Fully Distributed Service Model

If you can justify the cost, the most resilient solution is the fully distributed service model. In this solution, your services are distributed across multiple providers and multiple geographic locations. It is expensive and somewhat complex to build, and it costs more to run than a single-provider setup, but it provides additional scalability and much better uptime for your customers.

The Bottom Line with Cloud Providers

Only you can decide which path fits your business best. No solution is a one-size-fits-all path. You have to balance cost, complexity, and your level of risk tolerance to pick the right solution for you. JumpCloud’s Directory-as-a-Service® platform can help you recover quickly from an outage. Our cloud-based directory service exists outside of your infrastructure, so even if some of your systems are down, your users can still access the systems, applications, and network infrastructure that are still up. And when your equipment comes back online, you’ll be able to get everything back in order on the user management side immediately. The agents will sync back up with JumpCloud’s cloud infrastructure and pass along any user management updates.

If you would like to learn more about how JumpCloud’s Identity-as-a-Service platform can support your cloud infrastructure, drop us a note. Or feel free to give it a try. Your first 10 users are free forever.
