Among the many new goodies in the Summer ’09 release is a powerful new feature for batch processing your database records.  Tasks that need to process large data volumes without any active human intervention can take advantage of this feature.  As an example, consider validating the addresses of your contacts when you may have millions of contact records.  A batch job is ideal for this scenario: you start the job and then keep working, or even log off, while it runs.

To use this functionality, you implement the Database.Batchable interface.  You can find a usage example in the Apex Code Developer’s Guide.  The Database.Batchable interface has three methods that you need to implement, as shown below:

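global class MyBatchTest implements Database.Batchable {

    // Returns the set of records to process, as a QueryLocator
    global Database.QueryLocator start() { ... }

    // Called once for each chunk (up to 200 records) of that set
    global void executeBatch(SObject[] batch) { ... }

    // Called once, after every chunk has been processed
    global void finish() { ... }
}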

The start() method determines the set of records that the executeBatch method will process: you construct a SOQL query and return it as a QueryLocator object. For example,

	return Database.getQueryLocator( 'SELECT Name, MailingAddress FROM Contact' );

would return all contact records for processing. You can, of course, make the query as selective as you wish with additional filter criteria. The QueryLocator object can return up to 50 million records.
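As an illustration of a more selective query, here is a minimal sketch that only picks up contacts modified in the last 30 days that have a mailing city. The filter itself is an arbitrary assumption chosen for the example; LAST_N_DAYS:30 is a standard SOQL date literal, and in a real batch class start() would simply return this locator.

```apex
// Illustrative filter only (an assumption, not part of the original example):
// recently modified contacts that have a mailing city.
Database.QueryLocator ql = Database.getQueryLocator(
    'SELECT Name, MailingAddress FROM Contact ' +
    'WHERE LastModifiedDate = LAST_N_DAYS:30 AND MailingCity != null'
);
// Inside start() you would write: return ql;
System.debug(ql.getQuery());
```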
To start a batch job, you create an instance of this class and pass it to Database.executeBatch.

	MyBatchTest b = new MyBatchTest( ... );
	ID myBatchJobID = Database.executeBatch(b);

When you call Database.executeBatch on your instance, the system enqueues the job for processing and returns an ID. When the system is ready to run the job, it calls the start method and then calls the executeBatch method once for each chunk of up to 200 records. So if the QueryLocator returns 1,000 records, the executeBatch method is called five times. The batch job runs with the permissions of the user who enqueued it. The finish method is called after all records have been processed and can be used for post-processing tasks such as sending out e-mails. The ID returned by Database.executeBatch can be used to monitor the status of the job programmatically by querying the AsyncApexJob object. You can also monitor the job under Setup->Monitoring->Apex Jobs.
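A minimal sketch of checking on the job programmatically, assuming myBatchJobID holds the ID returned by Database.executeBatch and that the AsyncApexJob fields Status, JobItemsProcessed, TotalJobItems and NumberOfErrors are available (they are in current API versions; this was not verified against the Summer ’09 preview):

```apex
// Assumes myBatchJobID is the ID returned by Database.executeBatch(b).
AsyncApexJob job = [
    SELECT Id, Status, JobItemsProcessed, TotalJobItems, NumberOfErrors
    FROM AsyncApexJob
    WHERE Id = :myBatchJobID
];

// Status moves through values such as Queued, Processing and Completed.
System.debug('Batch ' + job.Id + ' is ' + job.Status + ': ' +
    job.JobItemsProcessed + ' of ' + job.TotalJobItems +
    ' chunks processed, ' + job.NumberOfErrors + ' errors');
```

The same query works from the finish method if you want to include these counts in a notification e-mail.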
The documentation has additional details on usage, governor limits and a few best practices. A common question is whether jobs can be scheduled to run at a certain time or periodically (for example, every day at midnight). This feature is not (yet) available. Also, the batch Apex feature is still in preview mode and has to be explicitly provisioned for your org. If you need this feature, please contact support with a short description of your use case.
Finally, I would encourage you to sign up for the Summer ’09 preview; it has a lot of other cool new features!

  • Jason

    The lack of scheduling was a bit of a letdown for this feature, as I would assume most use cases involve scrubbing/recalculating data, where scheduling would be ideal. It is still a cool feature, and you may be able to get around this limitation with a trigger cron job.
    Any chance we could get the demo code shown in the Summer ’09 webinar, the one that counted states and filled in the map? I’m curious how this count is maintained since, according to the docs, “You cannot use it to pass information between instances of the class during execution of the batch job”.

  • Nick Simha

    Scheduled Apex is on the roadmap and the PMs are well aware of its importance. One workaround (hack?) is to start your batch job from an InboundEmailHandler that can be triggered by a timed workflow task.
    Each batch is executed independently, so you can’t use instance variables to share state information; you can use a database record to keep track of it instead. The record can be keyed off some initial member variable value. I will find out when the demo code can be posted and update it here.
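A rough sketch of the e-mail workaround, assuming an Apex e-mail service is configured to route messages to this class and that a time-based workflow action sends an e-mail to that service address. The class name StartBatchEmailHandler is hypothetical, and the sketch also assumes MyBatchTest has a no-argument constructor; Messaging.InboundEmailHandler itself is the standard interface for Apex e-mail services.

```apex
// Hypothetical handler: any e-mail delivered to the e-mail service address
// bound to this class starts the batch job.
global class StartBatchEmailHandler implements Messaging.InboundEmailHandler {

    global Messaging.InboundEmailResult handleInboundEmail(
            Messaging.InboundEmail email,
            Messaging.InboundEnvelope envelope) {
        Messaging.InboundEmailResult result = new Messaging.InboundEmailResult();

        // Assumes MyBatchTest has a no-argument constructor.
        Database.executeBatch(new MyBatchTest());

        result.success = true;
        return result;
    }
}
```

You may want to check the sender or subject before starting the job, so that arbitrary e-mails to the service address cannot kick it off.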

  • Jason

    Cool. I figured there was some crafty workaround to perform scheduling.
    Looking forward to the demo code.

  • Jason

    Me again, :-P. I see the entry has been tweaked to say that the query locator can return 50 million records instead of 5 million.
    The Apex reference guide still says 5. Can you confirm this is 50 million records?
    5 million would have been plenty for us, but 50 million, that is pretty sweet.

  • Ken Koellner

    Two comments-
    It would be nice to determine the amount of work via data in a list. Would adding a where clause to the query with an “in :myApexList”, where myApexList is a List variable, work?
    It is going to call your execute method with 200 rows at a time. Would that not still be subject to the standard Apex limits, such as 100 DML operations? If so, you might not be able to get all the work you need done.

  • Nick Simha

    Jason – yeah – 50M is sweet. I got this number from the PM and I have pinged the doc team for confirmation.

  • Jon

    Is it possible to do Apex Callouts with Batch Apex? If not, is it on the roadmap?

  • Nick Simha

    Callouts are not allowed at this time. I will check with the PM on the roadmap, but I would encourage you to create an idea on the IdeaExchange.

  • Girish

    Hello Nick,
    I have a question on the batch implementation. I understand that batch operations can be performed on a set of Salesforce records returned by the start method.
    However, can we do other batch operations, like parsing a CSV file and inserting a bunch of records into Salesforce? Is that possible at all with the new batch enhancements provided by Salesforce? It would be great if you could let me know.

  • Jesse

    I have the same concerns as Girish. I’m looking at the batch implementation so that we can insert our daily leads, which are e-mailed to us in a CSV file.

  • Nick Simha

    Jesse & Giri,
    If I understand your question, you are trying to see if you can use these higher governor limits to do data import. Is that correct? When you insert data from an external source (a CSV file in this case), the API comes into play, and it has a different set of governor limits that are time based. The batch methods act on records already in the Salesforce database.
    Where you may be able to take advantage of it is to do post processing of the records.

  • Nick Simha

    You may already be aware of this, but there are a variety of tools to import data from CSVs:
    1) Import Wizard
    2) Free Data Loader from Salesforce
    3) Commercial tools – CastIron, Informatica etc.

  • Andrew

    In the above example it says that executeBatch() will get called 5 times for 1000 records. Does the runtime guarantee that the first invocation will complete before the second one begins?

  • Nick Simha

    Good question – I checked with the PM and the short answer is no. The batches could (potentially) be processed in parallel. There is a plan to add an additional interface to force serial processing.
    What is your use case? Can you perhaps use a counter field to check for ordering?

  • Etienne Coutant

    Will it be possible to start a batch job from a trigger? If yes, are there any limitations due to governor limits?

  • Sandy

    How does one write unit test code to test batch apex? I get 0% code coverage if I call using:
    MyBatchTest b = new MyBatchTest( … ) ;
    ID myBatchJobID = Database.executeBatch(b) ;

  • Nick Simha

    Wrap your test in Test.startTest() and Test.stopTest().
    See here – though that is about asynchronous Apex, something similar is happening here.
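A minimal sketch of such a test, written against the generally available API and assuming MyBatchTest has a no-argument constructor and queries Contact records (both assumptions; the original constructor arguments are elided). Test.stopTest() makes the queued asynchronous work run before the assertions; the class name MyBatchTestTests is hypothetical.

```apex
@isTest
private class MyBatchTestTests {

    @isTest
    static void batchRunsToCompletion() {
        // Keep the data set small so a single chunk covers it.
        insert new Contact(LastName = 'Batch Test');

        Test.startTest();
        // Assumes MyBatchTest has a no-argument constructor.
        Id jobId = Database.executeBatch(new MyBatchTest());
        // The enqueued batch job runs when stopTest() is called.
        Test.stopTest();

        AsyncApexJob job = [SELECT Status, NumberOfErrors FROM AsyncApexJob WHERE Id = :jobId];
        System.assertEquals('Completed', job.Status);
        System.assertEquals(0, job.NumberOfErrors);
    }
}
```

Asserting on the records your executeBatch logic changes (rather than only on the job status) gives a more meaningful test.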

  • Evan

    We need Batch Apex in our organization as soon as possible. However, contrary to the documentation and this blog entry, customer support tells us that the “Pilot Program” for Batch Apex is closed – they said they will not enable the feature. Why the change of policy?

  • Nick Simha

    When the article was written, the feature was in preview mode. It won’t be GA till Fall. The PM may still be able to turn it on in your sandbox, but they would need additional information. Please contact your SE / account rep or send me a note.

  • Eric

    Two questions for Batch Apex:
    1) The documentation says the start method can return five million records through the QueryLocator object, and the executeBatch method processes 200 records at a time. So if I have 100 thousand records to update, executeBatch will be invoked 500 times, which means I issue SOQL queries 500 times. This will exceed the maximum number of SOQL queries allowed. I want to know whether the governor limits apply to each batch or to the entire Apex job.
    2) The batches are processed in parallel. There is a master-detail relationship in my org: my application updates a lot of records on the child object, and there is a rollup summary field on the parent object. So when two batches are processed in parallel, the rollup summary field on the parent object may be updated twice at the same time, which can cause a deadlock. I ran into this problem when I used a future call to implement this function. I want to know whether I can take advantage of batch Apex to avoid this problem.

  • Nick Simha

    1) Governor limits are applied on each executeBatch invocation. So they are effectively reset with each additional execute.
    2) Support for serial execution is planned for a future release to resolve this issue.

  • Kevin

    Just wanted to note that callouts are now allowed from Batch Apex and Scheduled Apex is also available.
    Girish – Batch Apex isn’t the tool for that. There are other tools available for it, such as the contact importer.
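With Scheduled Apex available, the “run every day at midnight” case from the post can be sketched as below. This assumes MyBatchTest has a no-argument constructor; NightlyBatchScheduler and the job name are hypothetical, while Schedulable and System.schedule are the standard Scheduled Apex entry points. The cron expression fields are seconds, minutes, hours, day of month, month and day of week.

```apex
// Hypothetical wrapper: starts the batch job each time the scheduler fires.
global class NightlyBatchScheduler implements Schedulable {
    global void execute(SchedulableContext sc) {
        // Assumes MyBatchTest has a no-argument constructor.
        Database.executeBatch(new MyBatchTest());
    }
}
```

To register it, run something like the following once (for example, from anonymous Apex): `Id cronId = System.schedule('Nightly MyBatchTest', '0 0 0 * * ?', new NightlyBatchScheduler());`, where '0 0 0 * * ?' means midnight every day.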