When building any type of database-driven software, it’s best to test how your app performs with the data volumes you expect to encounter. Performing this testing helps you proactively identify and fix architecture flaws and slow database operations before you move features into production.

This article demonstrates that with just a few mouse clicks and simple lines of code, you can quickly generate any volume of representative mock test data so that you can properly test the usability and performance of a Salesforce implementation or custom Force.com app.

In this article, you’ll learn:

  • Where to load test data
  • How to generate Force.com test data
  • How to leverage existing technology to generate test data for Force.com
  • How to create custom data factories easily
  • Why considering representative data is important
  • How to load data into a Force.com org using both the SOAP and Bulk APIs
  • Testing tips for orgs with extreme data volumes

Note: Just to be clear, this article is not necessarily referring to test data for running unit tests. It pertains to mock or fake test data, perhaps large volumes of it, that you can use to confirm the usability, performance, and architectural design of a Force.com app or Salesforce implementation as the volume of production data ramps up after deployment.

Using a Sandbox for Testing Force.com Apps

First things first: Always test new apps and new app features by using a sandbox org. If you don’t already have a sandbox org, create one. From Setup, click Data Management | Sandbox.


Forcefactory-01.png


When you create a sandbox for testing, take care to choose the correct type of sandbox: Configuration Only, Developer, or Full. See Sandbox Overview and Creating or Refreshing a Sandbox in the online Help.

  1. A Full sandbox, which creates a copy of all production data, should be your first choice for creating a representative test environment, provided that the amount of data meets your testing requirements, and that you can manage the security of sensitive data. If you need to further supplement the volume of data in a Full sandbox, the techniques in this article might help.
  2. Consider the configuration-only and developer sandboxes’ data volume limits—500 MB and 10 MB, respectively—especially if you plan to create large volumes of data to accurately represent the data volume in your production org.

Note: You can practice using the techniques in this article using a free Developer Edition org, which doesn’t support a sandbox and has a storage limit of 20 MB (about 10,000 records).
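
To get a rough feel for how many records a given storage limit allows, the sketch below uses the ratio implied by the note above (20 MB for about 10,000 records, or roughly 2 KB per record). The per-record size and the method name are assumptions for illustration; actual storage use depends on the objects involved.

```ruby
# Rough capacity estimate. ASSUMPTION: each record consumes about 2 KB,
# the ratio implied by "20 MB (about 10,000 records)".
KB_PER_RECORD = 2

# Returns the approximate number of records that fit in limit_mb megabytes.
def estimated_record_capacity(limit_mb, kb_per_record = KB_PER_RECORD)
  (limit_mb * 1024) / kb_per_record
end

limits = {
  'Developer Edition' => 20,
  'Developer sandbox' => 10,
  'Configuration-only sandbox' => 500
}

limits.each do |org, mb|
  puts "#{org}: about #{estimated_record_capacity(mb)} records"
end
```

If the volume you need exceeds these estimates, that is a sign you need a larger sandbox type.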

Considering Options for Generating Test Data for Force.com Apps

Once you have a sandbox org ready, the next step is to generate some representative test data to support your testing.

As this table shows, you can generate this data with several methods, each of which has its own pros and cons.


Method | Advantages | Disadvantages
Manual entry | No code necessary | Extremely inefficient for more than just a few records
Coding custom Apex factory | Native | Requires custom code; can deplete org resources, such as API calls, when used
Coding external app factory, then loading data files | Lets you reuse existing code and get practice loading data and tuning data loads | Requires knowing another programming language


The choice you make depends on your level of experience with various technologies, as well as your comfort level with learning new things. This article opts for the third method in the table and teaches you how to create factories using popular Ruby gems. If you prefer to create a native Apex factory, look at SmartFactory, a Code Share project available on Developer Force—it might serve as a good starting point for your work.

Creating Force.com Test Data Factories with Ruby on Rails

I’m a big believer in not reinventing the wheel, which is the primary reason I recommend using tried-and-tested “factory” libraries that are available in other programming languages and let you create representative test data very easily. Once you generate test data using an external app and database, you can export the data to CSV files and load it into Force.com.

Additionally, this approach allows you to:

  • Avoid consuming API calls in your org
  • Practice loading data into a Force.com org
  • Practice tuning large data loads before trying to tune similar loads with production orgs

In this section, I’ll show you how to set up your environment and quickly build a couple of Force.com test data factories by creating a simple Ruby on Rails (RoR) app. Even if you are not a Ruby on Rails programmer, the instructions should be enough to help you download, modify, and execute the code necessary to create all of the test data that you need.

The scenario covers how to:

  • Create several (tens of) new parent records for the Account object
  • Create many (thousands of) new child records, which reference the new Account parent records for the Opportunity object

And it walks you through the following steps.

  1. Create the app.
  2. Install and bundle gems.
  3. Create the development database.
  4. Create local accounts and opportunities database tables.
  5. Create a custom rake task: fake_accounts.
  6. Use the Account Factory.
  7. Export the parent account records to a CSV flat file.
  8. Load the parent account records.
  9. Create another custom rake task: fake_opportunities.
  10. Bulk-load the child opportunity records.

System Requirements

To go through the practical examples in this article, your computer must have Ruby on Rails ready to go. This site has some useful links to help you get started, but before diving in, I strongly recommend that you look at the RVM (Ruby Version Manager) site and use rvm to install, manage, and switch among the versions of Ruby and Rails development environments on your system.

Once RoR is ready, you need a database that you can use to store the data that your factories generate, as well as a companion client that you can use to work with your database.

For example:

  • SQLite3 and the Firefox add-on SQLite Manager
  • Postgres 9.0 and the pgAdmin utility
  • MySQL5 and the phpMyAdmin utility

No matter what operating system you use, save yourself a lot of setup headaches and time by using your computer’s native package/software manager to install the above software, whenever possible.

Creating the App

Note: If you encounter problems along the way, or just don’t want to build the app step by step, you can clone the app from forcefactory.

Once your system meets the necessary requirements, switch to a development directory on your system, and then create a simple Rails app from the command line.

cd development/apps
rails new forcefactory -d postgresql

The example rails command above demonstrates the -d parameter that you can use to explicitly configure the app so that it uses your preferred local development database: PostgreSQL (-d postgresql), MySQL (-d mysql), SQLite (-d sqlite3), or others.

Installing and Bundling Gems

You can quickly add a lot of existing functionality to your Ruby on Rails apps by including gems in the app’s Gemfile and then “bundling” the app.

This example project requires a minimum set of gems, including:

  • Rails
  • Bundler
  • A database access gem: pg, sqlite3, mysql2, or similar
  • Faker
  • Populator

The first three gems are relatively standard, so please look them up if you need more background information. The last two gems are the interesting ones in the context of this article.

  • Faker: A gem for generating fake data such as names, addresses, phone numbers, etc.
  • Populator: A gem for mass-populating a database.

For a tutorial that both shows you how to use these two gems together and features the content that inspired this article, see this awesome 10-minute screencast from Ryan Bates: Populating a Database.

When you are ready, create a minimum Gemfile in your app project that has the following lines.

source 'https://rubygems.org'
gem 'rails'
gem 'bundler'
# change the following if you use a different database
gem 'pg' 
gem 'faker'
gem 'populator'

Then bundle your app from the command line.

bundle install

Creating the Development Database

To create the database necessary to support the development environment for your new Rails app, open up your app’s config/database.yml file and specify the username and password. (You don’t have to provide this information with SQLite.)

For example, for a PostgreSQL database:

 development:
   adapter: postgresql
   encoding: unicode
   database: forcefactory_development
   pool: 5
   username: forcefactory
   password: password


Substitute your database username and password above, as necessary, and save the file. Then go to the command line and use the standard rake task db:create to create your development environment database.

 RAILS_ENV=development rake db:create

Note: With certain local database systems, you might need to pre-create a database login for use with your Rails app. For example, with PostgreSQL, you might create a new login role as follows.

 CREATE ROLE forcefactory LOGIN
   ENCRYPTED PASSWORD ...
   NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;

If you need more help than this article provides, consult the documentation for your database system.

Creating Local Accounts and Opportunities Database Tables

The next step is creating local database tables that can hold the test data you are about to generate. These tables should correspond to the target objects in your Force.com org. In this example project, you are going to create local database tables that correspond to the Account and Opportunity standard objects.

With Rails, you make version-controlled database changes using migrations. Execute the following commands to create migrations for minimal Account and Opportunity tables. (Required fields for Force.com are included.) Feel free to supplement the list of fields in these tables to support your requirements.

rails generate model Account sfdc_id:string name:string --timestamps=false

rails generate model Opportunity sfdc_id:string account_id:string name:string amount:decimal stage:string lead_source:string closed_on:date order_number:integer --timestamps=false

Optionally, feel free to open, inspect, and modify each migration file in db/migrate as necessary. For example, you might want to add a precision and scale to the amount field for the Opportunity model (in db/migrate/...create_opportunities.rb).

 class CreateOpportunities < ActiveRecord::Migration
   def change
     create_table :opportunities do |t|
       t.string :sfdc_id
       t.string :account_id
       t.string :name
       t.decimal :amount, :precision => 8, :scale => 2
       t.string :stage
       t.string :lead_source
       t.date :closed_on
       t.integer :order_number
     end
     add_index :opportunities, :account_id
   end
 end
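
To see what a given precision and scale allow, here is a small plain-Ruby sketch (the helper names are hypothetical and not part of Rails). With a precision of 8 and a scale of 2, the largest storable amount is 999,999.99, which covers the Opportunity amounts generated later in this article.

```ruby
# Largest value a decimal column with the given precision and scale can hold.
# precision = total number of digits, scale = digits after the decimal point.
def max_decimal_value(precision, scale)
  Rational(10**precision - 1, 10**scale)
end

# True when amount fits in a decimal(precision, scale) column.
def fits_decimal?(amount, precision:, scale:)
  amount.abs <= max_decimal_value(precision, scale)
end

puts max_decimal_value(8, 2).to_f
puts fits_decimal?(500_000, precision: 8, scale: 2)
puts fits_decimal?(1_000_000, precision: 8, scale: 2)
```

If you plan to generate larger amounts, increase the precision in the migration before you run it.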

Next you can migrate your database to create the tables that correspond to your models.

 rake db:migrate

Creating a Custom Rake Task: fake_accounts

Now create a custom rake task to generate a given number of fake Account records. In your project, create a new file, /lib/tasks/populate.rake; paste in the following code; and save the file.

  require 'faker'
  require 'populator'

  namespace :db  do
    desc "Create some fake accounts to generate real SFDC Ids"
    task :fake_accounts => :environment do
      print "How many fake Accounts do you want? "
      num_accounts = $stdin.gets.to_i
    
      # create the specified accounts
      Account.populate num_accounts do |acct|
        acct.name = Faker::Company.name
      end
    
      print "#{num_accounts} created.\n"
    end

  end

The task code is a factory for Account records. It prompts for the number of accounts to create, then uses the populator gem to insert that many Account records, with the faker gem supplying a representative company name for each one.

Using the Account Factory

Now go to the command line and use the new rake task to populate your local database with 20 Account records.

 rake db:fake_accounts

How many fake Accounts do you want? 20
20 created.

Optionally, use your local database client to confirm that you have twenty new records in the Accounts table. Here’s a screenshot from the pgAdmin utility.


Forcefactory-02.png

Export the Parent Account Records to a CSV Flat File

To prepare for data loading, export the new Account records from your local database to a CSV flat file using your local database’s client utility. The steps vary for accomplishing this task, depending on the database and utility you are using. For example, with the PostgreSQL pgAdmin utility, you can export all Accounts table records as follows.

  1. Create a new query: SELECT name FROM accounts
  2. Click Execute query, write result to file.
  3. Use commas to separate columns, specify a file name, and then click OK.
  4. Optionally, open the generated CSV file, and verify the data and format.


Forcefactory-03.png

Loading the Parent Account Records

Before you can perform similar steps for an Opportunity factory, you must load the parent Account records into your Salesforce or Force.com org so that you can get the assigned object IDs for the Account relationship field (AccountId) in the Opportunity object.

You can use any number of data loading utilities to load the new parent Account records into your Force.com org, including:

  1. The Import Wizard
  2. The Apex Data Loader
  3. A third-party data loading utility available from the AppExchange

For this example project, try the free Jitterbit Data Loader available from the AppExchange. Downloads are available for both Windows and Mac.


Forcefactory-04.png


Once you install the Jitterbit Data Loader, start the local app. The first step is to configure and successfully test a connection to your Salesforce/Force.com org—click File | New | New Salesforce Org, then fill out the form appropriately for your org type.

Whichever type of org you connect to, you must specify a security token. If you don’t already have a security token, you can generate one from Setup in Salesforce and Force.com.

Click My Personal Information | Reset My Security Token | Reset Security Token, and then look for an email with your security token. Use this security token in the connection form.


Forcefactory-05.png


Next, insert the new Account records into your org by creating and executing a new Insert Data Operation within your Jitterbit project. Under the covers, this type of operation uses the Force.com SOAP API to insert records, which is acceptable for loading a small number of records into your org.

To create the new Insert Data Operation in your Jitterbit project:

  1. Click File | New | New Insert.
  2. Select the connection you just created, then click Next.
  3. Click the target Salesforce object, Account, then click Next.
  4. Select Local File, Select a File, then use Browse to locate the CSV file you created earlier, then click Next | Continue.
  5. Select CSV file, with header, click Next.
  6. For Run on schedule, select the default none option, then click Map & Finish.
  7. Drag and drop the name field of the local Accounts file to the Name field of the remote Account object in your org. You should end up with a mapping that resembles the following screen. Notice the handle in the mapping: if you double-click it, you have access to all sorts of data transformation options (very cool feature to consider for real data loads in the future!). Once you are done, click Finish.


Forcefactory-06.png


When you finish creating the Insert operation, you can simply click Run Insert to insert the data from your CSV file into your org’s Account object.


Forcefactory-07.png


After you go into your Salesforce or Force.com org and confirm that the new Account records are there, the next step is to export the IDs of all your Accounts to a CSV file so that you can use these IDs for the Account relationship field when you generate thousands of Opportunity records. To complete this step, you can use Jitterbit again, this time by creating a new Query operation, which uses the SOAP API again.

  1. Click File | New | New Query.
  2. Select the connection to your org, click Next.
  3. Click Account, then Next.
  4. Select Id (so that the query reads “Select Id from Account”), then click Next.
  5. Select Local File, Select a file, specify a unique file name, then click Next.
  6. Use the default file format, then click Finish.

Click Run Query to run the query and create the local CSV file that contains the Salesforce Ids of all your Account records.

Note: If your org already has some Account records, such as with a Developer Edition org, your CSV file will contain more than the 20 IDs you created earlier, but that shouldn’t matter.

Now use your favorite text editor to open your CSV file and transform the 20 or more lines into a space-separated list of object IDs similar to the following.

001ixxxxxxxxxxxxAG 001ixxxxxxxxxxxxAG ... 001ixxxxxxxxxxxxAO

Those aren’t real object IDs; I altered them to obfuscate the originals. Keep the file open and have it ready for when we generate child Opportunity records.
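
If you would rather script this transformation than do it by hand, a short Ruby sketch such as the following could work. It assumes that the exported file has a header row with a single Id column; the method name and the sample IDs are made up for illustration.

```ruby
require 'csv'

# Turns CSV text with an "Id" header column into one space-separated line.
# ASSUMPTION: the export contains a header row named "Id".
def ids_to_space_separated(csv_text)
  CSV.parse(csv_text, headers: true).map { |row| row['Id'] }.compact.join(' ')
end

# Placeholder IDs, not real Salesforce IDs.
sample = "Id\n001xx000003DGb2AAG\n001xx000003DGb3AAG\n"
puts ids_to_space_separated(sample)
```

For a real export, you would pass the contents of your own file, for example with File.read, in place of the sample string.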

Sidebar: The Value of Representative Test Data

After reading the previous section, you might wonder why I spent so much effort capturing the IDs for the new Account records, which would eventually be used when creating child Opportunity records. It would be simple and easy to just create all my Opportunity records in the next section using the parent Account ID of a single existing account. You could do that, but ask yourself this question: “What data set accurately represents the production environment?”

When working with orgs that have the potential to support large data volumes, it’s critical to design your app with Force.com’s record sharing mechanisms in mind, and then test your design with good test data. Why? As several papers in the Security section of the Architect Core Resources page explain, record- and parent-ownership distribution skews can have significant impacts on record sharing recalculations, as well as query and search performance.

Here’s a practical approach to consider once you finish reading this article and the referenced papers above.

  1. Load your sandbox org with a representative data set, and then try out operations you might need to perform in production. For example, change the owner of a parent record that has many child records, change the parent of many child records, and query records as different users who own and don’t own records.
  2. Time how long each operation takes to complete.
  3. Carefully note and use the best practices you learned in those papers to architect solutions to any issues you encounter.

Doing proactive testing with representative data in sandbox just might save you lots of gray hair when you go into production and help you avoid painful redesign efforts.
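
One way to check whether a generated data set is skewed before you load it is to count child records per parent. The following sketch is plain Ruby; the method name, the threshold, and the sample IDs are assumptions for illustration rather than limits taken from this article.

```ruby
# Counts child records per parent and returns the parents whose child
# count exceeds max_children.
# ASSUMPTION: max_children is an illustrative cutoff; choose one that
# matches the distribution you expect in production.
def skewed_parents(parent_ids, max_children)
  parent_ids.tally.select { |_id, count| count > max_children }
end

# Placeholder parent IDs standing in for the account_id column.
ids = ['A'] * 6 + ['B'] * 2 + ['C'] * 2
p ids.tally
p skewed_parents(ids, 5)
```

Whether a skewed distribution is a problem depends on your goal: if production has a few very large parents, your test data should have them too.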

Creating Another Custom Rake Task: fake_opportunities

Now you are ready to create another factory in your Rails app, this one for creating Opportunity records. Back in your app project, edit /lib/tasks/populate.rake, paste in the following new task code after the first task code, and save the file.

  desc "Create some fake opportunities using given Account Ids"
  task :fake_opportunities => :environment do
    # prompt for an array of Account Ids
    print "Enter some validated Account Ids (separated by a space): "
    accounts = $stdin.gets.split
    # array of valid Opportunity stages
    stages = ['Prospecting', 'Qualification', 'Needs Analysis', 'Value Proposition', 'Id. Decision Makers', 'Perception Analysis', 'Proposal/Price Quote', 'Negotiation/Review', 'Closed Won', 'Closed Lost']
    # array of valid lead sources
    sources = ['External Referral', 'Web', 'Phone Inquiry', 'Purchased List']    

    # prompt for the number of records to create
    print "How many fake Opportunities do you want? "
    num_opps = $stdin.gets.to_i

    # create the opportunity records
    Opportunity.populate num_opps do |opp|
      opp.account_id = accounts
      opp.name = Faker::Company.name
      opp.amount = [50000, 100000, 500000]
      opp.stage = stages
      opp.lead_source = sources
      opp.closed_on = 3.years.ago..Time.now
      opp.order_number = 1000..500000
    end
    
    print "#{num_opps} created.\n"
  end

[ For entire file, see populator.rake ]

As in the previous task, this code is a factory, but this time for Opportunity records. Because there are more fields to populate, it looks a bit more complicated, but it follows the same pattern. Again, the code uses the populator gem to randomize realistic test data for your database. In particular, notice how you can assign arrays (accounts, amount, stages, sources) and ranges of values (closed_on, order_number) to fields: for each record, populator picks a random element from the array or a random value within the range. The faker gem is used in this task to create random names. faker would also be useful if you needed to create phone numbers, addresses, etc.
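
To make the array and range behavior concrete, here is a plain-Ruby illustration of the idea: for each record, one value is drawn at random from the array or range assigned to a field. This is only a sketch of the concept, with a hypothetical helper name, and not populator's actual implementation.

```ruby
# Draws one value per call: a random element from an Array, a random
# member of an Integer Range, or the value itself otherwise.
def pick_value(spec, rng = Random.new)
  case spec
  when Array then spec.sample(random: rng)
  when Range then rng.rand(spec)
  else spec
  end
end

amounts = [50_000, 100_000, 500_000]
order_numbers = 1000..500_000

3.times do
  puts "amount=#{pick_value(amounts)} order_number=#{pick_value(order_numbers)}"
end
```

Because each record draws independently, a short array such as the three amounts above yields a roughly even mix across a large number of records.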

When you run the task from the command line, it gives you two prompts. The first prompt is for an array of Account IDs, separated by a space: Simply copy and paste the array you created earlier from your CSV file.

The next prompt asks you for the number of records to create.

 rake db:fake_opportunities

Enter some validated Account Ids (separated by a space): 001ixxxxxxxxxxxxAG 001ixxxxxxxxxxxxAG ... 001ixxxxxxxxxxxxAO
How many fake Opportunities do you want? 5000
5000 created.

The previous output is truncated and modified a bit to make it more readable and hide the real IDs of my Account records.

Bulk-Loading the Child Opportunity Records

To load the Opportunity records, complete steps similar to those you followed when loading Account records. Start by using your local database client to export the record data in the Opportunities table to a CSV file. You need to export only the fields supported by your org. I’ll leave the specifics for this step as a challenge for you.
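
As a possible starting point for that export, the sketch below writes rows to CSV with headers renamed to the standard Opportunity field API names, which can make the mapping step in your data loader simpler. It uses plain hashes in place of database rows so that it runs on its own; the method name and sample values are hypothetical, and the header mapping is one reasonable choice rather than a requirement.

```ruby
require 'csv'

# Local column name => standard Opportunity field API name.
FIELD_MAP = {
  account_id: 'AccountId', name: 'Name', amount: 'Amount',
  stage: 'StageName', lead_source: 'LeadSource', closed_on: 'CloseDate'
}.freeze

# Builds CSV text with a header row from an array of row hashes.
def opportunities_to_csv(rows)
  CSV.generate do |csv|
    csv << FIELD_MAP.values
    rows.each { |row| csv << FIELD_MAP.keys.map { |col| row[col] } }
  end
end

# Placeholder row; the account ID is not a real Salesforce ID.
rows = [{ account_id: '001xx000003DGb2AAG', name: 'Acme Inc', amount: 50_000,
          stage: 'Prospecting', lead_source: 'Web', closed_on: '2012-01-31' }]
puts opportunities_to_csv(rows)
```

In the Rails app, you would build the rows from your Opportunity records instead of the hard-coded sample.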

Next, you can use Jitterbit to load the records into the Opportunity object. But this time around, because you have thousands of records to load, it’s more efficient to bulk load the data by using Jitterbit’s support for the Force.com Bulk API.

  1. Click File | New | New Bulk Process.
  2. Click Insert, then Next.
  3. Choose the connection to your org, then click Next.
  4. Click Opportunity, then click Next.
  5. Select Local File, Select a file, then click Browse and select the CSV file you created by exporting data from your local Opportunities database table, then click Next.
  6. Use the Map Headers page to map fields in the file to fields in the Opportunity object of your org. Ignore the order_number field in the file if your Opportunity object does not have this custom field. When you are finished, click Finish.


Forcefactory-08.png


Before you click Run Bulk Insert, click Advanced Options and notice the options that let you tune how the tool uses the Bulk API. This is an example of a well-designed data loading utility that lets you take advantage of various Bulk API options.

Once you run the operation, Jitterbit loads the records in batches and completes the job very efficiently. When complete, your org has all the test data necessary for you to begin testing query performance, report runs, etc.
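
The batching idea itself is simple to picture. The sketch below splits a list of records into fixed-size batches in plain Ruby; the batch size and method name are illustrative assumptions, and this is not a description of how Jitterbit is implemented.

```ruby
# Splits records into batches of at most batch_size items.
# ASSUMPTION: batch_size is an illustrative, tunable value.
def in_batches(records, batch_size)
  records.each_slice(batch_size).to_a
end

records = (1..5000).to_a
batches = in_batches(records, 2000)
puts "#{batches.size} batches: #{batches.map(&:size).inspect}"
```

Batch size is one of the settings worth experimenting with when you time your practice loads.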

Practicing “Real” Data Loads

Loading significant amounts of data into a Salesforce or Force.com org can pose many challenges that you should clearly understand before moving that data into production. For example, there are many different mechanisms that can significantly slow down data loads, such as active triggers, workflow rules, and validation rules. There are also corresponding solutions, such as cleansing and transforming data before loads and disabling such mechanisms during the load. For more information about such best practices, see the Architect Core Resources page’s content, including the Best Practices for Deployments with Large Data Volumes paper.
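
As one small example of cleansing data before a load, the following sketch checks each row for a plausibly shaped account ID and a known stage value. The method name, the row layout, and the shortened stage list are assumptions for illustration. The pattern tests only the shape of an ID (15 or 18 alphanumeric characters), not whether the record exists.

```ruby
# Shape check only: 15 or 18 alphanumeric characters.
ID_PATTERN = /\A[a-zA-Z0-9]{15}(?:[a-zA-Z0-9]{3})?\z/

# Shortened list for illustration; use your org's actual stage values.
VALID_STAGES = ['Prospecting', 'Qualification', 'Closed Won', 'Closed Lost'].freeze

# Returns a list of problems found in a row hash (empty when the row looks fine).
def row_problems(row)
  problems = []
  problems << 'bad account_id' unless row[:account_id].to_s.match?(ID_PATTERN)
  problems << 'unknown stage' unless VALID_STAGES.include?(row[:stage])
  problems
end

p row_problems({ account_id: '001xx000003DGb2AAG', stage: 'Prospecting' })
p row_problems({ account_id: 'oops', stage: 'Maybe' })
```

Catching rows like the second one locally is much cheaper than discovering them as failed records after a long load.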

The techniques in this article give you the tools, best practices, and tips you need to generate representative data that you can use to practice, test, time, and resolve data-loading challenges in a sandbox without the pressure of managing your production org.

Extra Credit

Certainly, there are many ways that you can enhance and supplement what this article teaches you. For example, you might use the databasedotcom gem to create a live connection to your org and pull Account IDs into your fake_opportunities rake task so that you don’t have to enter them at runtime.

For those of you who want to make improvements, have at it: I’ve published the sample app, forcefactory, as a public Git repository on GitHub. Feel free to send me a pull request whenever you want to improve upon my work. Together, we can make the world a better place.

Summary

Creating representative test data in representative test data volumes is an important step in gauging the usability and performance of any database-driven app. This article showed you how to repurpose existing data factory libraries available for Ruby on Rails to create all the test data you need for Salesforce and Force.com orgs with eight simple Rails commands, 30 lines of basic Ruby code, and approximately 100 mouse clicks. As a bonus, you learned how to load data into your org using a free third-party data-loading tool, Jitterbit Data Loader for Salesforce.

About the Author

Steve Bobrowski is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound Salesforce solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.