You will hopefully notice a Force.com platform trend in the coming releases, a trend towards fewer limits and more room in the limits you have. Limits that are less limiting. We will always have to have SOME limitations in place to protect the multi-tenant environment, but the general effort is towards making those as invisible as possible and less inconvenient than they have been in the past.

When it comes to inconvenient, many developers point at the Apex script statement limit as the king inconveniencer. With that in mind, we have eliminated it.

Starting with the Winter ’14 release, you will no longer have to think about the script statement counter. You can write as many wrapper classes as you want, and you can write accessor methods for your accessor methods. The script statement count will no longer hinder you from following whatever programming practices you prefer, as it has in the past.

What took you so long?

The script statement limit had a very specific purpose. Contrary to snarky belief, it was not to make life difficult for developers. Its purpose was to keep a lid on runaway processes. You – yes, you reading this – you would never write code that could somehow find itself in an infinite loop. Of course you wouldn’t! But there are others less talented than you who would accidentally do such a thing. If that happened to be in a frequently encountered trigger, and we didn’t have something like the script statement limit in place, our app servers would get busy doing nothing and everybody else would be locked out.

There are two major drawbacks with using script statement counting to catch runaway processes. The first is that not all statements are created equal. A variable declaration counts as a single script statement; a call to String.getLevenshteinDistance also counts as a single script statement. You might guess that those don’t consume the same amount of resources, and they don’t, yet we count them the same. The second drawback is the overhead of incrementing a counter every time you do something. Incrementing is not the most expensive operation, but, when you do it a few trillion times, it adds up to a measurable amount of processing.
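To make the first drawback concrete, here is a minimal Apex sketch. Each of these lines counted as exactly one script statement under the old limit, even though they do wildly different amounts of real work:

```apex
// One script statement: a trivial variable declaration.
Integer i = 0;

// Also one script statement: an edit-distance computation that walks
// a character matrix. Same count, very different CPU cost.
Integer d = 'kitten'.getLevenshteinDistance('sitting'); // returns 3
```

The CPU-based timeout charges each of these in proportion to what it actually costs, which is the whole point of the change.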

The script statement counts were scientifically chosen by people who had once taken science classes and chose numbers that sounded right. We had a general idea of the average amount of time a process would run with the statement limits, and an idea of what could be accomplished. This was sufficient to accomplish a very large percentage of use cases, and it kept our app servers free of interminable processes. This proxy did its job, but was not perfect.

Still Keeping You Safe

Sometimes dolphins get caught in tuna nets, and that’s bad. Sometimes, legitimate processes get stopped by the statement limit, and that is also bad. The tuna fishing guys had to redesign their nets to keep PETA off their case. To keep PETA off our case, we got rid of the statement limit. But we still need to keep you safe from The Other Guy and keep our servers happy.

We already have a long-running transaction timeout of 10 minutes. That alone would kill off an infinite loop. Eventually – ten minutes is a long time. If there were several of these running at once, though, that would be a slow 10 minutes for everyone. This would be enough to keep you safe, but not enough to keep you happy.

We have implemented an additional timeout for transactions based on CPU usage. If transactions consume too much CPU time, we will shut them down as a long-running transaction. Just as you have probably never hit the 10-minute transaction timeout, you should probably never hit the CPU timeout. This timeout will cut off runaway processes faster than the 10-minute timeout.
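If you want to see where you stand against the new limit, the standard Limits class exposes it. Below is a defensive-batching sketch; `doExpensiveWork` and `deferRemainderAsync` are hypothetical helpers standing in for your own logic:

```apex
for (Account a : accounts) {
    doExpensiveWork(a); // hypothetical: whatever CPU-heavy processing you do per record

    // Leave headroom: if we have burned most of the CPU budget,
    // defer the rest of the work instead of risking the timeout.
    if (Limits.getCpuTime() > Limits.getLimitCpuTime() * 0.9) {
        deferRemainderAsync(a.Id); // hypothetical: queue the remaining records
        break;
    }
}
```

Limits.getCpuTime() returns the CPU time consumed so far in the current transaction, and Limits.getLimitCpuTime() returns the ceiling for the current context, so the same code adapts whether it runs synchronously or asynchronously.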

What does this CPU timeout include? We are only counting things that require application server CPU use. For example, the time spent in the database retrieving records will not count, nor will time spent waiting for a callout to return. There are some things that use the app server CPU that we do not count, because they are beyond your control as a programmer. For example, you don’t control when your code needs compilation, so we don’t count that. We will be counting almost everything else that happens on the app server, including declarative actions. If DML in your code encounters a validation rule with a formula, we will count the time spent evaluating that formula.

Eliminating the statement limit gets you out of creative maneuvers to optimize for statement counts. It does not get you out of optimizing your logic! The thing we are measuring now, though, should be a lot more logical. If your code is crazy, you may just run into the timeout, so keep it neat and tidy. You will be happy when we cut off crazy code that The Other Guy wrote, so it’s a worthwhile tradeoff.

But I Have Fear And Doubt!

But wait, you say. How can I be sure that this won’t cause any problems in my org?

For starters, you can execute more statements before hitting the CPU timeout than you could under the rigid statement counter. That might not be true if you have a process that calls JSON.serialize() in a loop – why are you doing that? – but it generally will be quite true.
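That serialize-in-a-loop jab deserves a picture. A sketch of the anti-pattern and its fix, with `records`, `config`, and `process` as hypothetical stand-ins for your own data and logic:

```apex
// Anti-pattern: re-serializing an unchanging payload on every iteration.
// Cheap under the statement counter; expensive under CPU accounting.
for (SObject rec : records) {
    String payload = JSON.serialize(config); // same result, CPU burned every time
    process(rec, payload);                   // hypothetical helper
}

// Better: hoist the invariant work out of the loop.
String payload = JSON.serialize(config);     // serialized once
for (SObject rec : records) {
    process(rec, payload);
}
```

The old counter charged both versions nearly the same; the CPU timeout rewards the second one, which is the behavior you wanted all along.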

In choosing the value for CPU timeout, we have attempted to be more scientific. We still chose numbers that we thought made sense, because round numbers are better than giving you the exact number of milliseconds that gives us the 99.99% confidence interval. We looked for that confidence interval, though, because we thought it would be polite to not fail transactions that work today.

There will be some existing transactions that exceed the new timeout, but there will not be many. We have a process in place to handle these outliers, so you need not worry – even in the unlikely event that you are one of these few. If you have read my earlier post on the Hammer process, you know we go to great lengths not to regress existing code, so that we can continue to safely release new versions of our software every four months.

We Are All In This Together

If you are an ISV developer, and you have a certified managed package, you used to get your own bucket of script statements when that limit existed. You don’t have the script statement count any longer, so you no longer have a separate bucket. This means that the CPU timeout is transaction-wide, regardless of how many namespaces are involved.

This is the same pattern as exists today for both the long-running transaction timeout and the transaction heap limit. It follows that pattern for the same reason those do: differentiating by namespace is expensive, and in some cases impossible.

Script statements, like SOQL calls and rows returned, are discrete. They’re easy to count, and easy to count in different buckets. CPU usage is not discrete. Where would you stop and start the timer as you switched? If you are using a global class from another package, who should the variable declaration count against? If a workflow rule calls triggers from different packages, which one accepts the cost of passing data back and forth? Or of setting static variables? There are lots of blurry lines that would need unblurring. Identifying all of them and defining rules for each would be time-intensive, and the handoff mechanism would be expensive at processing time. In addition, it would be accident-prone, as handoffs between different kinds of logic might not correctly flip the bit that says which bucket to log against.

In the end, our ISV partners would hate the new separate-bucket rules more than they hated the script statement limit. That didn’t sound fun for anyone. Thus, we are all in this together, like we have always been for heap counting and the long-running transaction timer.

Asynchronous, For The Win!

The timeout for asynchronous methods will be larger than it will be for synchronous transactions. This is an incentive to do your processor-intensive work asynchronously. Doing so will benefit everyone.

Your users will benefit. If a user clicks “Save” and has to wait for a minute before the screen returns to them, they will likely refresh the page. Synchronous transactions assume someone is eagerly awaiting the completion of the process, so these should be as nimble as possible. Do what needs to be done synchronously, and fork off the slow stuff (e.g., statistical analysis) or the non-urgent stuff (e.g., sending an email) into an asynchronous process.
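The save-then-fork split looks like this in Apex. This is a hypothetical service class – `Order__c` and the method names are invented for illustration – but the `@future` annotation is the standard way to push work into the asynchronous queue:

```apex
public with sharing class OrderService {
    public static void save(List<Order__c> orders) {
        insert orders; // urgent: the user is waiting on this

        // Non-urgent: fork the slow part off asynchronously.
        sendConfirmations(new Map<Id, Order__c>(orders).keySet());
    }

    @future
    public static void sendConfirmations(Set<Id> orderIds) {
        // Runs later, against the larger asynchronous CPU budget,
        // where the platform can load-balance it intelligently.
        // ... build and send the confirmation emails here ...
    }
}
```

The user’s click returns as soon as the insert commits; the email work lands in the queue, where it gets the bigger CPU allowance described above.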

Your processor-intensive processes will benefit. In the asynchronous work queue, we can be more effective in our load balancing. Synchronous work always has priority, regardless of what the work is. We don’t take time to introspect; we just fire synchronous requests off to get responses back quickly. Most of them are quick, so that’s all good. If you send us a slow process synchronously, we’re going to send it along without thinking about the best place to run it. Asynchronously, we can take the time to do a more intelligent balancing.

Your org will benefit. Asynchronous processes can be throttled when system resources are under peak load. Synchronous users should get priority at these times, and postponing more resource-intensive work until system resources are less strained helps your org do the work it needs to do when it needs to do it.

It’s Getting Better All The Time

The new pattern, like the old, will not be perfect. There will be nuances to it. I hope you’ll get used to them as you get used to the additional room we have created between you and the boundary. There will still be a boundary, as there will always be in some form, but it shouldn’t be as prominent in your field of view. We are going to keep expanding the boundary wherever we can, so long as you and your fellow multi-tenants are safe.

  • Michael Stewart

    Is it by design, or a bug that Test.startTest() doesn’t reset the governor limits for CPU Time? We have a couple tests sporadically failing in our CI Sandbox that are attempting to set up a large amount of sample data.

    Reproducible failing test: https://gist.github.com/mickle00/6500029

  • sherod

    This is a significant leap forward in the ability to use force.com to handle any programming workload and I appreciate the work being done in this area.

  • ERP Stuff

    1. When a DML is executed on an object that has a trigger, is the time consumed by the platform in building the Apex context for the trigger’s execution included in the CPU time? Assuming there were no validation rules, assignment rules, etc. on the object, this would probably be the time between when the DML is fired (from Apex) to when the first line of Apex code in the trigger is executed.
    2. If I run an Apex code snippet (which didn’t contain any SOQL, DML, callouts, etc., but just plain Apex code), can we expect the CPU time to roughly match the actual transaction time as shown in the debug logs?