Working with data in enterprise environments has many forms. It’s a strong trend to create more and more real-time integrations, so that users have the latest data available at their fingertips. But it is also true that data synchronization between systems, also known as ETL (Extract Transform Load), still plays an important part. This blog post will guide you on how to use the MuleSoft Anypoint Platform for your ETL tasks.
Use case
Let’s sketch a use case and its requirements before we go into how to work with Anypoint Platform:
- You have data in an external system, let’s say a MySQL database.
- You want to sync columns from a specific MySQL table with data of Salesforce Account records.
- You have to transform specific data fields from the MySQL data, so that they fit into how they are used within your Salesforce org.
- You want to run the synchronisation during the night, so that data is updated before your users start to work.
These requirements can be generalized in this ETL flow:
- Connect to a third-party system at a specific time
- Select data from the third-party system
- Modify selected data after extraction from the third-party system
- Write data into target system
MuleSoft offers two tools for setting up ETL flows: you can use the browser-based flow designer, or the desktop IDE Anypoint Studio. In this blog we’re using flow designer, as it offers you an easily accessible interface without the need of local software installation.
Create your first flow designer flow
You access flow designer via the Anypoint web interface in Design Center. For creating a new flow, you create a new Mule Application in Design Center. Think of a Mule Application as a container for one or more flows.
Flow designer offers a card-based interface, which allows you to easily setup connections to systems, modify data, and more. The initial setup part of every flow is defining a trigger that initiates the flow invocation and specifying a target for the flow. For our scenario, the scheduling trigger gets chosen. There are also other systems and options available that can start a flow, but for this blog post we ignore them. Since we want to write data to Salesforce, we’re choosing Salesforce as target. And because we may have thousands (or millions) of data rows to process, we’re choosing Bulk API v2 as operation.
The scheduler trigger is just a cron job, that runs this specific flow based on your configuration. This can be any combination of seconds, minutes, hours, months, days of months, days of weeks, or years. You can get the full documentation here.
Connect to the third-party system
The real cool thing about Anypoint Platform is that you can easily setup connections to any kind of system. Anypoint Exchange, a marketplace with ready-made connectors, allows you to add other or more datasources with just a few clicks. Flows in flow designer are not limited to only one datasource — you can add as many as you need for your ETL needs. Plus, many of those connectors are the same as you can select them as a trigger that starts a flow too.
The next step is to setup the connection to the third-party system. The steps for this vary on the kind of system that you want to connect to. For SQL based databases a Database Connector exists, so we’re going to use that one here.
Once you’ve set up your trigger and your database connection, it’s all about transforming the data into the format that you need. Let’s do that now.
Transforming data
In our scenario, the value of the MySQL table column SomeCustomerFieldToConvert
is stored as an integer. Within Salesforce, we need this value as a 12-character string, prepended with leading zeros. To achieve this, we have to do these things:
- iterate over every selected data row
- convert the integer value into a string value
- prepend leading zeros
For the iteration itself, flow designer provides a For Each connector. The result of the database query is passed to that connector at once, and the connector then iterates over every element.
In the above screenshot, you see the databaseresult variable. This variable represents the database selection from the Database Connector. It passes every data row from the selection to another connector: the Transform connector.
The Transform connector represents a graphical user interface to Anypoint’s DataWeave capabilities. DataWeave is a data modeling language, provided by Anypoint, that enables you to modify any kind of data that passes through a flow.
For this blog post, we’re using the mapping using JSON-based syntax. This way you can easily configure how the output of this flow step will look after the transformation.
You can also use a graphical user interface for simple 1:1 data mappings between input and output. The DataWeave documentation covers the capabilities of the graphical user interface, as well as all the things that you can do with the DataWeave expression language.
Storing data
The final task is now to store data. As you saw above we’re using the Bulk API v2 for that (Because you likely don’t want to spend your daily API limit with single record transaction API calls, right?). The big benefit of using v2 of the Bulk API is that you can upload all data at once, only restricted by a daily record limit, instead of uploading chunks of data as in v1.
Depending on how you process your data and how you want to upload it (like in CSV, or JSON format), you would add another Transform connector to your flow, before updating the data in Salesforce. You can also add additional elements to the flow, like querying for the job status of the bulk upload, and then send a status email. Or you can add more data from another data source to your bulk operation or anything else, based on all the available connectors in Anypoint Exchange.
Summary
Anypoint Platform is a powerful suite of tools that enable you to build powerful data flows connecting to a variety of systems. Using the browser-based flow designer, you can easily build ETL flows that keep data in sync between multiple, siloed systems. Check out these resources to learn more about how you can leverage Anypoint Platform for your ETL needs:
- Anypoint Platform Free Trial
- Trailhead: MuleSoft Basics
- Webinar: Introduction to the MuleSoft Anypoint Platform
- Blog: Slim Down with the New Bulk API v2
- Blog (MuleSoft): Introducing Bulk API v2 Support with the Salesforce Connector
- Webinar (MuleSoft): Intro to Anypoint Design Center – Flow Designer
ETL isn’t the only use case for flow designer. In an upcoming blog post, we’ll look into how you can connect different systems with Anypoint Platform and Salesforce using an event-driven architecture.
About the author
René Winkelmeyer works as Principal Developer Evangelist at Salesforce. He focuses on enterprise integrations, Lightning, and all the other cool stuff that you can do with the Salesforce Platform. You can follow him on Twitter @muenzpraeger.