Hopefully you have noticed that we have been eliminating and relaxing limits lately. We have done something nice for you around limits every release for the past few releases. (…so would it kill you to write every once in a while?) Still, there are limits that get in your way. One limit that seems to be particularly obstructive is the five concurrent Batch Apex job limit. This limit brings the unhappiness of not even allowing you to create a batch job if you are already running at the limit. It’s annoying enough that some of you have built your own mechanisms to enqueue more batch jobs, based on only the tools available to you, in ways that would make MacGyver proud.
We wanted to relax the heck out of the concurrent Batch Apex limit. At the same time, we wanted to do a lot more for you than that. We wanted to give you a much better experience with Batch Apex, and asynchronous processing in general.
Quit Holding Me Back
First, how did we get here? Why is this limit like this in the first place? Your asynchronous jobs all end up in a message queue, along with those of all of your multi-tenant neighbors. Our message queues have all sorts of concurrency controllers on them to make sure that no single org monopolizes the shared resources. However, the original approach when Batch Apex was built was to make sure that no org had non-operational jobs (jobs that could not actually run yet) sitting in the message queue.
The rationale behind this decision had to do with the longevity of a batch process. A batch process can iterate over millions of records, and can take hours to run. If your org had five of these long-running batch jobs in flight, anything you enqueued after that would be busy-waiting, due to the concurrency controller. As such, the limit was created to prevent you from adding a sixth job while the five long-running jobs were consuming resources, so that the waiting job wouldn’t needlessly consume resources of its own.
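To make the old behavior concrete, here is a minimal sketch in Python of a hard cap with no waiting area. It is a toy model, not how the platform is implemented: the names `LegacyBatchScheduler`, `BatchLimitExceeded`, and `MAX_CONCURRENT_BATCH_JOBS` are made up for illustration, and only the number five comes from the text above.

```python
MAX_CONCURRENT_BATCH_JOBS = 5  # the limit described above


class BatchLimitExceeded(Exception):
    """Raised when a job is submitted while the org is already at the limit."""


class LegacyBatchScheduler:
    """Toy model: a job is either accepted and active, or rejected outright."""

    def __init__(self):
        self.active = []  # jobs currently queued or running

    def submit(self, job_name):
        if len(self.active) >= MAX_CONCURRENT_BATCH_JOBS:
            # There is no waiting area, so the submission fails immediately.
            raise BatchLimitExceeded(f"cannot accept {job_name}: limit reached")
        self.active.append(job_name)

    def finish(self, job_name):
        self.active.remove(job_name)


scheduler = LegacyBatchScheduler()
for n in range(1, 6):
    scheduler.submit(f"job-{n}")

try:
    scheduler.submit("job-6")
    sixth_accepted = True
except BatchLimitExceeded:
    sixth_accepted = False
```

The sixth submission is refused rather than parked, which is exactly the part that hurts: the caller has to notice the failure and retry later on their own.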
In the world of queueing, you expect to enqueue lots of things and have them processed when the system is ready. @future behaves this way already. Shouldn’t Batch Apex?
We Will Fire, But We Will Never Forget!
Let’s say that you could enqueue a plethora of batch jobs at once. As mentioned, some of these can take a long time to run, so jobs might be stuck in the holding pen for quite some time. You don’t just want to know that you are stuck – who here likes the repeated “all operators are still busy; your call is still important to us” recording? You want to be able to see that holding pen, what’s in it, and how long you might need to be patient.
Knowledge isn’t everything. If we always ran first-in-first-out, you might end up with a dilemma. You could enqueue a very important job that needs to run now now NOW, but it would languish behind not only the running jobs, but behind all of the jobs that are waiting to run. Like when your pilot tells you that your flight, already an hour delayed, is 18th in line for takeoff.
Enter FlexQueue
You want more jobs enqueued, which means you need more visibility into the jobs in the queue, and more control over that queue. We have created FlexQueue to help address these requirements.
FlexQueue, which I am told is short for Flexible Queue, allows you to enqueue jobs beyond those that are running, and gives you access to the jobs which are waiting to run. You can look at the current queue order, so you’ll know what is going to run next when system resources are available. You can shuffle the queue, so that you could move that hyper-important job to the front such that it will be processed next. You could also shuffle jobs to the back if, say, they were enqueued by someone you don’t particularly like. (I don’t recommend this, since they could do the same to you. I’m just saying you could. Don’t tell them I told you so.)
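A rough sketch of those three operations (look at the order, move a job to the front, move a job to the back) might look like this in Python. The class and method names (`HoldingQueue`, `move_to_front`, `move_to_back`) and the job names are invented for illustration; this is not the FlexQueue API.

```python
class HoldingQueue:
    """Toy model of one org's waiting jobs, in the order they will run."""

    def __init__(self):
        self._jobs = []

    def enqueue(self, job_id):
        # New jobs join the back of the line by default.
        self._jobs.append(job_id)

    def order(self):
        # Return a copy so callers can inspect the queue without mutating it.
        return list(self._jobs)

    def move_to_front(self, job_id):
        self._jobs.remove(job_id)
        self._jobs.insert(0, job_id)

    def move_to_back(self, job_id):
        self._jobs.remove(job_id)
        self._jobs.append(job_id)


queue = HoldingQueue()
for job_id in ["nightly-cleanup", "data-export", "urgent-report"]:
    queue.enqueue(job_id)

queue.move_to_front("urgent-report")   # the hyper-important job runs next
queue.move_to_back("nightly-cleanup")  # this one can wait
current_order = queue.order()
```

Note that the queue only ever holds one org's jobs, so the shuffling stays within your own order of operations.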
The added flexibility here is in shuffling your own org’s order-of-operations, rather than the order of the overall multi-tenant queue. We are adding the notion of single-tenant to the multi-tenant queue. This feature wouldn’t have left the drawing board if you could shuffle jobs to the back from orgs you didn’t like!
The conceptual change is disconnecting items in our message queue from the actual job that needs to run. In the current architecture, your batch job is serialized and included directly in the MQ message. For FlexQueue, your job is serialized to a “holding pen”, and a “token” is enqueued in the message queue. While tokens are in the queue waiting for resources, the holding jobs can be reordered. When a token reaches the front for processing, it will take the first job in the holding area, whether or not that token was enqueued along with that job. (The implementation isn’t exactly like this, but conceptually this is how it works.) This disconnect allows the shuffling, and allows us to enqueue more jobs than just the five that are currently being run.
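The token idea is easier to see in a few lines of code. The sketch below is a conceptual model only (as noted above, the real implementation differs), and every name in it (`FlexQueueSketch`, `holding_pen`, `message_queue`, `process_next`) is hypothetical. The point it illustrates is that a token carries no job of its own: whichever token reaches the front simply claims the job at the head of the holding pen.

```python
from collections import deque


class FlexQueueSketch:
    """Conceptual model: jobs wait in a holding pen, tokens wait in the MQ."""

    def __init__(self):
        self.holding_pen = []         # reorderable list of waiting jobs
        self.message_queue = deque()  # FIFO of anonymous tokens
        self._next_token = 0

    def submit(self, job_id):
        # The job goes to the pen; only a token goes into the message queue.
        self.holding_pen.append(job_id)
        self.message_queue.append(self._next_token)
        self._next_token += 1

    def move_to_front(self, job_id):
        # Reordering touches the pen only; the tokens never move.
        self.holding_pen.remove(job_id)
        self.holding_pen.insert(0, job_id)

    def process_next(self):
        if not self.message_queue:
            return None
        # The token is consumed and claims whatever job is first in the pen,
        # regardless of which job the token was originally enqueued with.
        self.message_queue.popleft()
        return self.holding_pen.pop(0)


fq = FlexQueueSketch()
for job_id in ["A", "B", "C"]:
    fq.submit(job_id)

fq.move_to_front("C")  # shuffle while the tokens are still waiting
run_order = [fq.process_next() for _ in range(3)]
```

Here the token that was enqueued alongside job A ends up running job C, because C was moved to the front of the pen before any token was processed.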
I Would Like Some FlexQueue Please
Sounds good, right? I hope so! So when can you start using the FlexQueue?
For the Summer ’14 release, FlexQueue will be in pilot. This is a fundamental change to how we process Batch Apex, so we need to make sure that (a) we can scale to the volume of asynchronous work that you will throw at us and (b) we actually process all of your Batch Apex. We are pretty confident that we can and will, but we prefer to test such a mission-critical system with a pilot group and scale up from there. (To that end, we have spent quite some time building in an “eject” button, which allows you to go back to the current Batch Apex way, just in case.) If you would like to be involved in the pilot, please contact the person you tend to contact about these sorts of things.
FlexQueue should be generally available in the Winter ’15 release, assuming all goes well in the pilot.
I Would Like MORE FlexQueue Please
This is not the end of our plans. We are starting small, but we will be adding some theoretically awesome stuff to the FlexQueue, including priority levels and adding more than just Batch Apex to the FlexQueue. This is the part where the safe-harbor slide would appear in a pop-up window over your browser screen, but you have a pop-up blocker on, since it’s no longer 1999. So I say SAFE HARBOR.
The ability to shuffle is very helpful; it will get you out of a jam. However, if you needed to manually shuffle every time you enqueued the ReportMyBossNeeds batch job, you’d go bananas. If a job is always critical, you just want to specify that it is a high-priority job. This prioritization scheme adds more flexibility to the flexible queue.
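Since priority levels are still on the drawing board, the following is purely a guess at the shape of the idea, not a preview of any API. It is a minimal Python sketch in which the level names (`HIGH`, `NORMAL`, `LOW`) and the class `PriorityHoldingQueue` are assumptions: jobs run by priority first, and in arrival order within the same priority.

```python
import heapq
import itertools

# Hypothetical priority levels; a lower number runs sooner.
HIGH, NORMAL, LOW = 0, 1, 2


class PriorityHoldingQueue:
    """Toy model: order by priority, then by arrival within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves FIFO

    def enqueue(self, job_id, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def next_job(self):
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        return job_id


pq = PriorityHoldingQueue()
pq.enqueue("nightly-cleanup", LOW)
pq.enqueue("data-export")
pq.enqueue("ReportMyBossNeeds", HIGH)
pq.enqueue("second-export")

order = [pq.next_job() for _ in range(4)]
```

With something like this, ReportMyBossNeeds jumps the line every time without anyone shuffling it by hand.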
In addition, we will allow you to have the same visibility, shufflability, and flexibility for @future jobs. We are starting with Batch Apex because of the frustrating concurrency limit; @future does not have such a limit, so that pain is less acute. That said, the @future queue backs up with loads of jobs (which you can’t observe) and the queue can self-shuffle at hectic times, reordering your jobs (which you might not appreciate). We intend to apply all of the benefits listed above to @future. In addition, we are creating a new pattern that will live somewhere between @future and batch; this, too, will be made flexible.