Network Best Practices for Salesforce Architects

For an architect or a developer implementing applications on the Salesforce1 platform, network-conscious testing is becoming increasingly important when analyzing application performance. This guide covers best practices that will help you identify risks and find solutions to network-related challenges.

Introduction

Profiling application performance and running performance tests are critical ‘laboratory’ activities that validate that your Apex and SOQL code is optimized for scalability and that your Visualforce page designs follow best practices. However, to make sure your application is ready for the real world, you also need to take into account that users will access it from various geographic locations with different levels of network connectivity. As an architect, your mission is to launch an application that performs well despite such network variation. You certainly don’t want to hear end users ask, after the application goes live in production, “Why is my page taking so long to load when my colleague can load it in a second?” Continue reading to learn how to identify these risks and find solutions to network-related challenges.

Assessing network performance for Salesforce users

If someone asks, “Why is my page taking so long to load when my colleague can load it in a second?”, chances are that the two users are set up differently, so the content they render differs in size or in the time it takes to render. To make sure you’re comparing apples to apples and focusing on networking, you need a controlled setup that should ideally:

Have at least two nearly identical terminals (PCs) in two or more different locations: one geographically close to the Salesforce data center (for example, where the colleague is) and the others at remote sites (for example, where your offices are).
Access the target (e.g., a Visualforce page) using the same browser and user (or set of users) to rule out as many variables as possible.
Use the same tools to measure the timing (explained in following sections).
Run tests within similar time frames to assess issues related to network bandwidth, and multiple times to rule out cache effects.

If you don’t have access to remote locations to run tests, there are tools such as Charles or Shunra as well as utilities such as netem and ipfw that allow you to artificially add latency and bandwidth limits to simulate different network environments.
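Whether the timings come from real remote sites or from a simulated link, it helps to summarize them the same way for every location. The following is a minimal sketch, not a prescribed method: the function name summarize_runs and all timing values are hypothetical, and it assumes that the first run at each site is the cold (uncached) load and that the remaining runs are warm.

```python
from statistics import median

def summarize_runs(timings_ms):
    """Split one site's page-load timings into a cold first run and warm repeats."""
    if len(timings_ms) < 2:
        raise ValueError("need at least two runs to separate cold and warm loads")
    cold, warm = timings_ms[0], timings_ms[1:]
    return {"cold_ms": cold, "warm_median_ms": median(warm)}

# Hypothetical measurements in milliseconds: same page, same browser, same user.
near_site = [1850, 920, 900, 940, 910]
remote_site = [4200, 2600, 2550, 2700, 2580]

near = summarize_runs(near_site)
remote = summarize_runs(remote_site)

# The gap between warm medians is the part most likely attributable to the network.
gap_ms = remote["warm_median_ms"] - near["warm_median_ms"]
print(near, remote, gap_ms)
```

Using the median of the warm runs keeps a single slow outlier from dominating the comparison, and reporting the cold run separately keeps cache effects visible instead of averaging them away.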

Once you have a controlled testing setup, you are ready to collect benchmark statistics and use them to iteratively assess performance tuning efforts. Regardless of how you set up your tests or which tool you choose to run them, there are ultimately two things we’re after:

Reducing the payload.
Reducing the network latency.

For the sake of simplicity, we will discuss one at a time in the following sections. However, keep in mind that it is important to look at both, not just one.

Reducing payload

The goal of reducing payload is to reduce network time. Given that you are testing and comparing identical pages with consistent content sizes, the time spent on the server side and on rendering at the client side should be very similar across locations. The larger the difference in time spent downloading resources, and the larger the share of the overall request duration that downloading takes up, the more network performance you are likely to gain by reviewing the page design and reducing the payload.

You don’t have to have fancy application performance monitoring (APM) tools to assess payload. You can use free browser tools to collect key metrics. Chrome, Firefox, and Internet Explorer each have similar tools that give you graphical representations of where the time is spent from the moment the page request is sent to Salesforce to when the user perceives the page being ‘loaded’ all the way to when the entire rendering process is complete. You can also use tools such as Fiddler or Charles to do advanced analysis.

When doing so, don’t get too hooked on the byte sizes the tools show you for each resource being downloaded. Data isn’t exchanged bit-by-bit (or byte-by-byte for that matter); it is sent over the wire in units of packets. For example, if you trim a few bytes here and there from several image files but don’t see much performance improvement, it’s most likely because you haven’t reduced the actual number of packets per resource download. If you have a highly graphical dynamic page with lots of resources such as images, CSS or JS files, combining them (for example, into image sprites or concatenated files) and then minifying them might have a larger effect than reducing each file’s size by a fraction while still requiring tens (and hopefully not hundreds) of resources to be downloaded in parallel. There are other general web application optimization techniques you can apply to minimize download payloads, reduce the number of round trips and handshakes, and so on.
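The packet arithmetic behind this advice can be sketched as follows. This is an illustration under stated assumptions: it assumes roughly 1,460 bytes of TCP payload per packet (a 1,500-byte MTU minus 40 bytes of IPv4 and TCP headers), ignores HTTP headers, TLS overhead, and compression, and uses made-up file sizes.

```python
import math

# Assumed TCP payload per packet: 1500-byte MTU minus 40 bytes of IPv4 + TCP headers.
MSS_BYTES = 1460

def packets_needed(resource_bytes, mss=MSS_BYTES):
    """Number of full or partial packets needed to carry one resource."""
    return math.ceil(resource_bytes / mss)

# Shaving 100 bytes off a 14,000-byte image does not change the packet count.
before = packets_needed(14_000)
after_small_trim = packets_needed(13_900)
after_larger_trim = packets_needed(13_000)
print(before, after_small_trim, after_larger_trim)

# Twenty separate 2,000-byte files versus one combined 40,000-byte file.
separate = 20 * packets_needed(2_000)
combined = packets_needed(20 * 2_000)
print(separate, combined)
```

In this example the small trim still needs the same number of packets as the original, while combining the twenty small files removes the partially filled second packet that each one would otherwise require.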

More importantly, make sure to review Visualforce Performance Best Practices and Building Efficient Visualforce Pages on the Salesforce1 platform. For example, remove unnecessary Visualforce tags that bloat the page’s view state. Limit the amount of data loaded and displayed on your page by carefully choosing the fields required by the users and using techniques such as pagination. If you have a multi-step wizard application that walks the user through a process, consider implementing a solution that makes the transition between the pages stateless. If you have a form that allows users to update records with a large number of fields, send only the delta information instead of the entire dataset.
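The last point, sending only the delta, can be illustrated with a short language-neutral sketch. It is written in Python for brevity rather than as Apex or Visualforce code; the function name changed_fields, the record Id, and the field values are all hypothetical.

```python
def changed_fields(original, edited):
    """Return only the fields whose values differ from the originally loaded record."""
    return {name: value for name, value in edited.items()
            if original.get(name) != value}

# Hypothetical record as it was loaded into the form.
original = {
    "Id": "001xx0000000001",
    "Name": "Acme",
    "Phone": "555-0100",
    "Industry": "Energy",
}

# The user changes a single field.
edited = dict(original, Phone="555-0199")

delta = changed_fields(original, edited)

# Send the record identifier plus the changed fields, not the whole record.
payload = {"Id": original["Id"], **delta}
print(payload)
```

For a record with dozens of fields, the payload stays proportional to what the user actually changed rather than to the size of the form.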

Reducing network latency

While you iterate on reducing the payload of your target page by optimizing your application, you should also look into the network layer before concluding there’s nothing you can do to bring the end user closer to the Salesforce server.
You can use basic utilities such as Traceroute (tracert) or more advanced tools to do deeper analysis. At Salesforce we have several third-party tools that continuously monitor and collect various network-related metrics, and which we can also deploy on demand to troubleshoot issues from remote sites (contact Customer Support for assistance). These tools give us good visibility into round-trip time (RTT), BGP routing, and details such as packet loss rates that help discover problematic areas. The following sections explain how to use these metrics to determine what you could do to reduce network time. You will most likely need to engage your IT, Network Engineering, or ISP teams to obtain statistics and do deep-dive analysis.

Reducing latency

When using Salesforce, the majority of browser page or mobile app requests are bursts of transactions, each requiring multiple round trips to and from the Salesforce servers to establish a connection, send and receive data, and acknowledge each packet exchanged.
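To see why these round trips matter more for remote users, here is a rough back-of-the-envelope sketch. It assumes one round trip for the TCP handshake, two for a full TLS handshake (newer TLS versions need fewer), and one per request/response exchange, all performed one after another. It ignores DNS lookups, server time, and parallel connections, and the RTT values and request counts are hypothetical.

```python
def network_wait_ms(rtt_ms, new_connections, requests, tcp_rtts=1, tls_rtts=2):
    """Rough estimate of time spent waiting on round trips, assuming no overlap."""
    setup = new_connections * (tcp_rtts + tls_rtts) * rtt_ms
    exchange = requests * rtt_ms
    return setup + exchange

# Same page (2 new connections, 10 requests) seen from two hypothetical sites.
near_wait = network_wait_ms(rtt_ms=20, new_connections=2, requests=10)
remote_wait = network_wait_ms(rtt_ms=250, new_connections=2, requests=10)
print(near_wait, remote_wait)
```

The page is identical in both cases; only the round-trip time changes, yet the waiting time grows in direct proportion to it. This is why reducing either the number of round trips or the RTT itself pays off most for distant users.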

From a networking perspective, there are two key aspects that you should be aware of:

An instance is not distributed across multiple data centers (other than its disaster recovery clone which is on standby at a geographically remote data center). In other words, user transactions are connected to and served from only one of our data centers at any given moment.
Salesforce.com deploys a carrier-neutral architecture by linking to multiple industry-leading network providers directly at the edge of each data center’s boundaries with high bandwidth backbones. This provides redundancy as well as flexibility for delivering optimal network performance to our users connecting over the Internet.

While the latency added due to the geographic distance between your user and Salesforce is fixed, there could be opportunities to reduce the ‘topological’ latency specific to your user’s network. Make sure you cover at least the following:

Optimize BGP. BGP routing plays an important role in determining latency when your data packets are sent across the Internet. In extreme cases, your packets could be sent the long way around the globe to reach Salesforce, or could hop over an excessive number of relay points, each adding latency. While optimizing BGP can sometimes be more of an art than a science, we have seen significant gains after careful investigation and changes to network routing preferences. Network monitoring and analysis tools such as ThousandEyes and AppNeta provide insight that helps uncover issues.
Avoid unstable paths. The shortest path is not necessarily the best path. Consider the implications of network stability issues such as packet loss and jitter. If either end doesn’t receive the expected packets within a given timeframe (i.e., it gives up waiting for an acknowledgement from the other side), it retransmits the last packet and waits again, possibly several times, with each wait adding to the overall latency. The impact gets worse with geographic latency because retransmission timeouts (RTOs) and smoothed round-trip times (SRTTs) increase. As with BGP analysis, monitoring tools can identify paths that have stability issues. Based on the investigation, you can work with your network team and ISPs to optimize routing to fix or avoid paths that are known to be unstable. This will lead to fewer packet retransmissions, which means less time wasted waiting on redundant packet exchanges.
Identify bottlenecks. There could be an intermediate device within your network that’s adding latency. By using tools like Wireshark and working closely with your IT/Networking team and Salesforce Support, you can run packet-level trace analysis that might uncover optimization opportunities related to suboptimal or misconfigured devices within your office or hosting data center. You might also discover that access to resources other than those served by Salesforce is also showing performance issues.
Avoid redirects. Each redirect adds to overall RTT and causes many round trips to and from the redirected servers to complete SSL handshakes. Evaluate and avoid unnecessary redirects. For example, enable My Domain and point your login requests to the My Domain URL instead of the generic login page. If you have implemented SSO, make sure your SAML assertions are sent back to the My Domain endpoint. See this guide for other techniques you can apply to reduce latency due to redirects.
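To make the cost of unstable paths more concrete, the sketch below estimates how much time is lost when the same packet has to be retransmitted several times in a row. It assumes the exponential backoff commonly used by TCP stacks, where the retransmission timeout doubles after each unsuccessful attempt. The initial RTO values are hypothetical, and real stacks derive the RTO from measured round-trip times, apply minimum and maximum bounds, and can often recover sooner through other mechanisms.

```python
def time_lost_to_retransmits(initial_rto_ms, consecutive_losses):
    """Total wait when one packet is lost several times in a row.

    Assumes the retransmission timeout doubles after every unsuccessful attempt.
    """
    return sum(initial_rto_ms * 2 ** attempt for attempt in range(consecutive_losses))

# Hypothetical initial RTOs: a nearby user and a distant user on a lossy path.
near_loss_ms = time_lost_to_retransmits(initial_rto_ms=300, consecutive_losses=3)
remote_loss_ms = time_lost_to_retransmits(initial_rto_ms=1000, consecutive_losses=3)
print(near_loss_ms, remote_loss_ms)
```

Because the timeout scales with the measured round-trip time, the same loss pattern costs a distant user far more than a nearby one, which is why a slightly longer but stable path can outperform a shorter unstable one.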

Leveraging CDN

If you have a public page that is built on Force.com Sites and is not served over SSL, Salesforce provides a caching option that allows you to leverage our partner’s Content Delivery Network (CDN). A CDN improves page load times by serving static resources from cache servers located geographically closer to the user. This approach has a similar effect to reducing network latency.

Maximizing bandwidth utilization

If you know you have a local branch office that has limited throughput with many users sharing the line, that might be something to look into. However, having a 1 Gbps enterprise backbone going out from your office does not mean you don’t have to worry about bandwidth, because the bandwidth a single connection can actually use is limited by the latency and the TCP window size. It may not be easy to control TCP window size configurations for all parties involved, but it is worth looking into as a tuning opportunity for the client PC that is having issues. For details, read this and this. To learn more about bandwidth requirements when using Salesforce, read this help article.
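The relationship between latency, TCP window size, and usable bandwidth can be worked out with simple arithmetic: a sender can have at most one window of data in flight per round trip. The sketch below assumes a single TCP connection and a 65,535-byte window (the largest possible without TCP window scaling); the RTT values are hypothetical.

```python
def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on single-connection throughput: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

def window_needed_bytes(link_mbps, rtt_ms):
    """Window size required to keep a link of the given speed full."""
    return link_mbps * 1_000_000 * (rtt_ms / 1000) / 8

near_mbps = max_throughput_mbps(window_bytes=65_535, rtt_ms=20)
remote_mbps = max_throughput_mbps(window_bytes=65_535, rtt_ms=250)
print(round(near_mbps, 2), round(remote_mbps, 2))

# Window needed to fill a 1 Gbps link at 250 ms RTT.
print(window_needed_bytes(link_mbps=1000, rtt_ms=250))
```

With the same window, the connection at 250 ms RTT tops out at roughly 2 Mbps regardless of how fast the office backbone is, which is why window tuning on the affected client can matter more than raw link capacity.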

Integration design considerations

If you have integrations to services outside of Salesforce as part of your application design, consider the following:

Use mashup (iFrame) techniques where applicable to make calls directly to and from the client, rather than making multiple roundtrips through the Salesforce platform. This approach has a similar effect to reducing network latency.
It is also important to minimize handshakes and data transfers (by batching and compressing requests) to reduce the payload. Timeout settings should be carefully tuned to tolerate latency without holding connections open for too long. For API calls, parallelization (when possible) can help alleviate some of the negative effects of latency.
Idempotence is also a key design consideration, especially when poor network connectivity is expected. You should assume that any transaction could fail before completion. Design all requests to and from the remote server or service so that each one takes effect once and only once, which allows multiple retries without risking data integrity issues.
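One common way to achieve the idempotence described above is to attach a unique key to each logical request and have the receiving side remember which keys it has already applied. The following is a minimal sketch of that pattern, not a Salesforce API: the FlakyService class, its add_credit method, and call_with_retries are hypothetical stand-ins, and the "lost response" is simulated in memory.

```python
import uuid

class FlakyService:
    """Stand-in for a remote service that applies each idempotency key at most once."""

    def __init__(self, fail_first_n_responses):
        self.results = {}   # idempotency key -> result of the applied request
        self.balance = 0
        self._failures_left = fail_first_n_responses

    def add_credit(self, key, amount):
        # Apply the change only the first time this key is seen.
        if key not in self.results:
            self.balance += amount
            self.results[key] = self.balance
        # Simulate the response being lost after the work was already done.
        if self._failures_left > 0:
            self._failures_left -= 1
            raise TimeoutError("response lost in transit")
        return self.results[key]

def call_with_retries(service, amount, max_attempts=5):
    key = str(uuid.uuid4())  # one key per logical request, reused on every retry
    for _ in range(max_attempts):
        try:
            return service.add_credit(key, amount)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after retries")

service = FlakyService(fail_first_n_responses=2)
result = call_with_retries(service, 100)
print(result, service.balance)
```

Even though the client sends the request three times, the credit is applied once, because the retries reuse the same key. Without the key, each retry after a lost response would have applied the credit again.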

Summary

Make sure that your page load time goal is met not only in your development lab setup but also for the remote users with additional latency or sub-optimal network connectivity. Look into both web page optimization techniques as well as removing network bottlenecks to reduce networking time. This will ensure you won’t be hearing from end users “Why is only my page taking so long to load?”

About the author and CCE Technical Enablement

Daisuke Kawamoto is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound salesforce.com solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.