Saving Money on Batch Workloads in Public Cloud
Large companies have traditionally had an impressive list of batch workloads, which run at night, when people have gone home for the day. These include such things as application and database backup jobs; extraction, transform, and load (ETL) jobs; disaster recovery (DR) environment checks and updates; online analytical processing (OLAP) jobs; and monthly/ quarterly billing updates or financial “close”, to name a few.
Traditionally, with on-premise data centers, these workloads have run at night to allow the same hardware infrastructure that supports daytime interactive workloads to be repurposed, if you will, to run these batch workloads at night. This served a couple of purposes:
- It avoided network contention between the two workloads (as both are important), allowing the interactive workloads to remain responsive.
- It avoided data center sprawl by using the same infrastructure to run both, rather than having dedicated infrastructure for interactive and batch.
Things Are Different with Public Cloud
As companies move to the public cloud, they are no longer constrained by having to repurpose the same infrastructure. In fact, they can spin up and spin down new resources on demand in AWS, Azure or Google Cloud Platform (GCP), running both interactive and batch workloads whenever they want.
Network contention is also less of concern, since the public cloud providers typically have plenty of bandwidth. The exception of course is where batch workloads use the same application interfaces or APIs to read/write data.
So, moving to public cloud offers a spectrum of possibilities, and you can use one or any combination of them:
- You can run batch nightly using similar processes as you do in your online data centers, but on separate provisioned instances/virtual machines. This probably results in the least effort to moving batch to the public cloud, the least change to your DevOps processes, and perhaps saves you some money by having instances sized specifically for the workloads and being able to leverage cloud cost savings options (e.g., reserved instances);
- You can run batch on separately provisioned instances/virtual machines, but concurrently with existing interactive workloads. This will likely result in some additional work to change your DevOps processes, but offers more freedom and similar benefits mentioned above. You will still need to pay attention to application interfaces/APIs the workloads may have in common; or
- At the extreme end of the cloud adoptions spectrum, you could use cloud provider platform as a service (PaaS) offerings, such as AWS Batch, Microsoft Azure Batch or GCP Cloud Dataflow, where batch is essentially treated as a “black box”. A detailed comparison of these services is beyond the scope of this blog. However, in summary, these are fully managed services, where you queue up input data in an S3 bucket, object blob or volume along with a job definition, appropriate environment variables and a schedule and you’re off to races. These services employ containers and autoscaling/resource groups/instance groups where appropriate, with options to use less expensive compute in some cases. (For example, with AWS Batch, you have the option of using spot instances.)
The advantage of this approach is potentially faster time to implement and (maybe) less expensive monthly cloud costs, because the compute services run only at the times you specify. The disadvantages of this approach may be the degree of operational/configuration control you have; the fact, that these services may be totally foreign to your existing DevOps folks/processes (i.e., there is a steep learning curve); and it may tie you to that specific cloud provider.
A Simple Alternative
If you are looking to minimize impact to your DevOps processes (that is, the first two approaches mentioned above), but still save money, then ParkMyCloud can help.
Normally, with the first two options, there are cron jobs scheduled to kick-off batch jobs at the appropriate times throughout the day, but the underlying instances must be running for cron to do its thing. You could use ParkMyCloud to put parking schedules on these resources, such they are turned OFF for most of the day, but are turned ON just-in-time to still allow the cron jobs to execute.
We have been successfully using this approach in our own infrastructure for some time now, to control a batch server used to do database backups. This would, in fact, provide more savings than AWS reserved instances.
Let’s look at specific example in AWS. Suppose you have an m4.large server you use run batch jobs. Assuming Linux pricing in us-east-1, this server costs $0.10 per hour, or about $73 per month. Suppose you have configured cron to start batch jobs at midnight UTC and that they normally complete 1 to 1–½ hours later.
You could purchase a Reserved Instance for that server, where you either pay nothing upfront or all upfront and your savings would be 38%-42%.
Or, you could put a ParkMyCloud schedule where the instance is only ON from 11 pm-1 am UTC, allowing enough time for the cron jobs to start and run. The savings in that case would be 87.6% (including the cost of ParkMyCloud) without the need for a one year commitment. Depending on how many batch servers you run in your environment and their sizes, that could be some hefty savings.
Public cloud will offer you a lot of freedom and some potentially attractive cost savings as you move batch workloads from on premise. You are no longer constrained by having the same infrastructure serve two vastly different types of workloads — interactive and batch. The savings you can achieve by moving to public cloud can vary, depending on the approach you take and the provider/service you use.
The approach you take, depends on the amount of process change you’re willing to absorb in your DevOps processes. If you are willing to throw caution to the wind, the cloud provider PaaS offerings for batch can be quite compelling.
If you wish to take a more cautious approach, then we engineered ParkMyCloud to park servers without the need for scripting, or the need for you to be a DevOps expert. This approach allows you to achieve decent savings, with minimal change to your DevOps batch processes and without the need for Reserved Instances.
Originally published at www.parkmycloud.com on August 23, 2017.