Flex Your Batch Apex Muscles with FlexQueue

Hopefully you have noticed that we have been eliminating and relaxing limits lately. We have done something nice for you around limits every release for the past few releases. (…so would it kill you to write every once in a while?) Still, there are limits that get in your way. One limit that seems to be particularly obstructive is the five concurrent Batch Apex job limit. This limit brings the unhappiness of not even allowing you to enqueue a batch job if you are already running at the limit. It’s annoying enough that some of you have built your own mechanisms to enqueue more batch jobs, based on only the tools available to you, in ways that would make MacGyver proud.

We wanted to relax the heck out of the concurrent Batch Apex limit. At the same time, we wanted to do a lot more for you than that. We wanted to give you a much better experience with Batch Apex, and asynchronous processing in general.

Quit Holding Me Back

First, how did we get here? Why does this limit exist in the first place? Your asynchronous jobs all end up in a message queue, along with those of all of your multi-tenant neighbors. Our message queues have all sorts of concurrency controllers on them to make sure that no one org monopolizes shared resources. However, the original approach when Batch Apex was built was to make sure that no org had idle, non-running jobs sitting in the message queue.

The rationale behind this decision had to do with the longevity of a batch process. A batch process can iterate over millions of records and can take hours to run. If your org had five of these long-running batch jobs in flight, anything you enqueued after that would busy-wait, due to the concurrency controller. As such, the limit was created to keep you from adding a sixth job that would sit there needlessly consuming resources while the five long-running jobs did their work.

In the world of queueing, you expect to enqueue lots of things and have them processed when the system is ready. @future behaves this way already. Shouldn’t Batch Apex?

We Will Fire, But We Will Never Forget!

Let’s say that you could enqueue a plethora of batch jobs at once. As mentioned, some of these can take a long time to run, so jobs might be stuck in the holding pen for quite some time. It’s not enough just to know that you are stuck – who here likes the repeated “all operators are still busy; your call is still important to us” recording? You want to be able to see that holding pen: what’s in it, and how long you might need to be patient.

Knowledge isn’t everything. If we always ran jobs first-in-first-out, you could face a dilemma. You could enqueue a very important job that needs to run now now NOW, but it would languish not only behind the running jobs, but behind all of the jobs that are waiting to run. Like when your pilot tells you that your flight, already an hour delayed, is 18th in line for takeoff.

Enter FlexQueue

You want more jobs enqueued, which means you need more visibility into the jobs in the queue, and more control over that queue. We have created FlexQueue to help address these requirements.

FlexQueue, which I am told is short for Flexible Queue, allows you to enqueue jobs beyond those that are running, and gives you access to the jobs that are waiting to run. You can look at the current queue order, so you’ll know what is going to run next when system resources are available. You can shuffle the queue, so that you can move that hyper-important job to the front, where it will be processed next. You can also shuffle jobs to the back if, say, they were enqueued by someone you don’t particularly like. (I don’t recommend this, since they could do the same to you. I’m just saying you could. Don’t tell them I told you so.)
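To make the shuffling concrete, here is a minimal toy model in plain Python (not Apex, and not the actual platform API; the class and method names are invented for illustration) of the operations described above: inspect the current order, promote a job to the front, demote one to the back.

```python
from collections import deque

class HoldingQueue:
    """Toy model of an org's FlexQueue holding area.

    Hypothetical sketch only: the names and semantics here are
    illustrative assumptions, not the real platform API.
    """

    def __init__(self):
        self._jobs = deque()

    def enqueue(self, job_id):
        # New jobs join the back of the holding area.
        self._jobs.append(job_id)

    def order(self):
        # Current run order: the leftmost job runs next.
        return list(self._jobs)

    def move_to_front(self, job_id):
        # Promote a hyper-important job so it is processed next.
        self._jobs.remove(job_id)
        self._jobs.appendleft(job_id)

    def move_to_end(self, job_id):
        # Demote a job to the back of the line.
        self._jobs.remove(job_id)
        self._jobs.append(job_id)

q = HoldingQueue()
for j in ("nightly_sync", "report_my_boss_needs", "cleanup"):
    q.enqueue(j)
q.move_to_front("report_my_boss_needs")  # the boss comes first
q.move_to_end("nightly_sync")            # this one can wait
print(q.order())  # ['report_my_boss_needs', 'cleanup', 'nightly_sync']
```

Note that every operation here touches only this org’s own holding area, which is exactly the single-tenant-within-multi-tenant point made below.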

The added flexibility here is in shuffling your own org’s order-of-operations, rather than the order of the overall multi-tenant queue.  We are adding the notion of single-tenant to the multi-tenant queue. This feature wouldn’t have left the drawing board if you could shuffle jobs to the back from orgs you didn’t like!

The conceptual change is disconnecting items in our message queue from the actual jobs that need to run. In the current architecture, your batch job is serialized and included directly in the MQ message. With FlexQueue, your job is serialized to a “holding pen,” and a “token” is enqueued in the message queue. While tokens are in the queue waiting for resources, the jobs in the holding pen can be reordered. When a token reaches the front for processing, it takes the first job in the holding pen, whether or not that token was enqueued along with that job. (The implementation isn’t exactly like this, but conceptually this is how it works.) This disconnect allows the shuffling, and allows us to enqueue more jobs than just the five that are currently running.
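The token/holding-pen disconnect above can be sketched in a few lines of Python. This is purely a conceptual model of the mechanism as described, not the real implementation: tokens in the shared message queue are anonymous, and only the per-org pen is reordered.

```python
from collections import deque

# Conceptual sketch of the token/holding-pen disconnect. The real
# system differs; this only models the behavior described in the post.

message_queue = deque()   # shared MQ: holds anonymous tokens
holding_pen = deque()     # per-org pen: holds the serialized jobs

def submit(job):
    # The job body goes to the holding pen; only a token enters the MQ.
    holding_pen.append(job)
    message_queue.append("token")

def reorder_to_front(job):
    # Tokens never move; only the pen is shuffled.
    holding_pen.remove(job)
    holding_pen.appendleft(job)

def process_next():
    # A token reaching the front takes whatever job is now first in
    # the pen, not necessarily the job it was enqueued alongside.
    message_queue.popleft()
    return holding_pen.popleft()

submit("jobA"); submit("jobB"); submit("jobC")
reorder_to_front("jobC")
print(process_next())  # jobC runs first, though its token went in last
```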

I Would Like Some FlexQueue Please

Sounds good, right? I hope so! So when can you start using the FlexQueue?

For the Summer ’14 release, FlexQueue will be in pilot.  This is a fundamental change to how we process Batch Apex, so we need to make sure that (a) we can scale to the volume of asynchronous work that you will throw at us and (b) we actually process all of your Batch Apex.  We are pretty confident that we can and will, but we prefer to test such a mission-critical system with a pilot group and scale up from there.  (To that end, we have spent quite some time building in an “eject” button, which allows you to go back to the current Batch Apex way, just in case.) If you would like to be involved in the pilot, please contact the person you tend to contact about these sorts of things.

FlexQueue should be generally available in the Winter ’15 release, assuming all goes well in the pilot.

I Would Like MORE FlexQueue Please

This is not the end of our plans. We are starting small, but we will be adding some theoretically awesome stuff to the FlexQueue, including priority levels and adding more than just Batch Apex to the FlexQueue. This is the part where the safe-harbor slide would appear in a pop-up window over your browser screen, but you have pop-up blocker on, since it’s no longer 1999. So I say SAFE HARBOR.

The ability to shuffle is very helpful; it will get you out of a jam. However, if you needed to manually shuffle every time you enqueued the ReportMyBossNeeds batch job, you’d go bananas. If a job is always critical, you just want to mark it as high priority. This prioritization scheme adds more flexibility to the flexible queue.
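Since priority levels are still on the drawing board, here is only a hypothetical sketch of how they might layer onto the queue model: a lower number means higher priority, and an insertion counter preserves first-in-first-out order among jobs at the same priority. All names here are invented for illustration.

```python
import heapq
import itertools

# Hypothetical sketch of priority levels on a flexible queue.
# Lower priority number runs sooner; the counter breaks ties FIFO.

counter = itertools.count()
pending = []  # heap of (priority, insertion_order, job_name)

def enqueue(job_name, priority=5):
    heapq.heappush(pending, (priority, next(counter), job_name))

def run_next():
    priority, _, job_name = heapq.heappop(pending)
    return job_name

enqueue("weekly_archive")                    # default priority
enqueue("report_my_boss_needs", priority=1)  # always critical
enqueue("cleanup")                           # default priority
print(run_next())  # report_my_boss_needs
```

With a scheme like this, the always-critical job never needs a manual shuffle: it jumps ahead of default-priority work on its own.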

In addition, we will allow you to have the same visibility, shufflability, and flexibility for @future jobs. We are starting with Batch Apex because of the frustrating concurrency limit; @future has no such limit, so that pain is less acute. That said, the @future queue can back up with loads of jobs (which you can’t observe), and at hectic times the queue can self-shuffle, reordering your jobs (which you might not appreciate). We intend to apply all of the benefits listed above to @future. We are also creating a new pattern that will live somewhere between @future and batch; this, too, will be made flexible.

  • Derek Anderson

    Great post, Josh, as always. Looking forward to all the cool things.

  • Andrew Fawcett

    Awesome improvement, this really helps enterprise application deployments, thank you!

  • Very exciting stuff Josh, both the short-term and longer term plans look great! Especially intrigued by “a new pattern that will live somewhere between @future and batch”

  • Exciting post, @Josh! One doubt: what is the concurrent limit with FlexQueue? Does it stay at 5, with batch jobs beyond 5 queued up?

    • Josh Kaplan

      The number of jobs that will run concurrently will be determined by the resource availability and consumption by your org at that time. When we can run 10 concurrently, we will. When we can only run one-at-a-time, we will. Inter-org fairness is handled by the queue. This pilot allows you to control intra-org fairness.

      • Thanks @disqus_SR30GUFfxN:disqus, I understand that concurrency is based on resource consumption. I am a little confused by this line:
        “When we can run 10 concurrently, we will.”

        I think you meant that, within an org, active concurrent jobs can scale up to 5, and queued ones can now go beyond 5 and up to 10. Correct?

  • ram_sj

    Much needed feature, giving developers/admins more control over how we run batch jobs based on internal priorities.

  • Chirag Mehta

    Exciting post and a great improvement. This will help with large-scale processing of MULTIPLE data sets, with almost unlimited batch programs running. The best part is that the queue/system will automatically handle most of it while still allowing you to reorder/prioritize your batch jobs. (Inter-org prioritization, on the other hand, would have made the Salesforce platform a bullfighting arena… ha ha.)

  • Bipin Nepani

    Great post, Josh. I see that the key upcoming changes are @future support and the ability to shuffle the order of your jobs. Let me ask you this: is there anything like a “Suspend” or “Hold” in the works for when we want to prioritize a high-priority job while 5 are already in progress? Nevertheless, a great step in the right direction.

  • Awesome stuff!
    I hope, as you mentioned at the end of the post, that this is just the beginning, and that we will see a lot more benefits out of it.

  • Jeff Douglas

    Thanks for the info Josh!! This sounds like a great beginning for some future improvements. Was wondering what exactly “FlexQueue” stood for. Now I know that it’s short for “Flexible Queue”.

  • Samuel De Rycke

    Blogs like these make me enjoy being a salesforce developer even more. Can’t wait to play with this stuff!

  • Daniel Ballinger

    Damn, I just finished my MacGuyverFakeBatchQueueUsingSchedules implementation. I look forward to using this next time around.

  • Good info

  • fgwarb

    Hey! I’m curious about the status of this feature. My orgs are on Winter ’15 right now and I don’t see it, which leads me to believe there was an issue during the pilot that blocked progress. Is it still on the roadmap?

  • Hammad Abbas

    @disqus_SR30GUFfxN:disqus what could be the reason for a Batch Apex job to be stuck at “Holding” status for 2 weeks while there were no other jobs running or queued?