How We Use Open Source at Salesforce.com – Part 1

While many don't realize it, open source is key to salesforce.com. The Salesforce application is written in Java running on Sun’s JDK, and the entire development stack also runs on open source. Ian explains.

While many may not realize the importance of open source at salesforce.com, it plays a critical role here. These days, developers at salesforce.com spend all day working in open-source software, but this wasn’t always the case.

In the early years of the company, the standard-issue workstation ran Windows. But a few developers noticed that building the Salesforce application went much faster on Ubuntu Linux, using the same exact hardware — cutting the build time from hours to minutes. They started passing around a CD, and before anyone knew what had happened, the change went viral and nearly everyone in engineering had switched to Linux. (Thankfully, our awesome corporate IT team soon followed suit and started issuing new machines with a supported version of Linux for new hires.)

On top of the Linux OS, the entire development stack also runs on open source. The Salesforce platform is primarily written in Java and runs on Sun’s JDK (which is open source). Most developers at salesforce.com write, compile, debug, and test the application in Eclipse, an open-source integrated development environment (maintained by the aptly named Eclipse Foundation). Many projects use Git for version control, although the central repository is kept in Perforce because of the repository’s size.

Like most companies, we use many of the “usual suspect” Java open-source libraries during development: Guava, Gin, and Guice (open sourced by Google); Apache Commons (from the Apache Software Foundation); Jackson (killer JSON parsing library written and maintained by Salesforce engineer Tatu Saloranta); among hundreds of others. Our extensive automated test suites make heavy use of the test frameworks JUnit, Mockito, JMockit, and Selenium.

Building the software is a big task for a complex platform like Salesforce. The platform was originally built using the build framework Apache Ant, an open-source tool. But as the complexity of the system has grown, the core build team has been making the transition to Apache Maven (with the help of Jason van Zyl, the author of Maven). Maven’s declarative dependency management gives teams a much faster, more modular build.

Once the code is written and committed, a large-scale system builds and packages the software artifacts for deployment using Jenkins, a simple continuous integration and scheduling framework. Our release package store is written in Java and uses a sequence of Jenkins instances to build, package, test, and promote the resulting compiled artifacts. Internal testing (that is, running hundreds of thousands of tests on every check-in) runs on a fleet of OpenStack instances.

Once the application is built and packaged, it needs to be deployed into production. In addition to three major releases per year, salesforce.com engineering also does hundreds of maintenance releases and patches containing performance improvements, capacity additions, configuration updates, and the like. Historically, this has been done using a homegrown tool written in Perl, which allows deployments under the extremely rigorous security restrictions that protect Salesforce production instances. However, the sheer scale of Salesforce’s deployment infrastructure has prompted recent advances in this area, increasing both the automation and the security of deployments. The new systems use a suite of open-source software tools, including Razor (hardware provisioning), Puppet (server install automation), Salt (orchestration), and Rundeck (operation).

And when the software is running in production, salesforce.com engineers monitor their services using an open-source metrics aggregation platform called Graphite. Data flows into Graphite using Apache Kafka, a high-throughput distributed messaging system written by LinkedIn.
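
For readers who haven't fed data to Graphite before, here is a minimal sketch of what a single metric sample looks like in Graphite's plaintext format: a metric path, a value, and a timestamp in epoch seconds, terminated by a newline. The class name, the metric name app.requests.count, and the sample values are made up for illustration, and the sketch only formats the line. It does not show how salesforce.com actually moves data through Kafka.

```java
// Illustrative sketch only: GraphiteLine and the metric name are hypothetical.
class GraphiteLine {

    // Builds one sample in Graphite's plaintext format:
    // "<metric.path> <value> <epoch-seconds>\n"
    static String format(String path, double value, long epochSeconds) {
        // The format is whitespace-delimited, so the path must not be empty
        // or contain whitespace.
        if (path.isEmpty() || path.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("metric path must be non-empty with no whitespace");
        }
        return path + " " + value + " " + epochSeconds + "\n";
    }

    public static void main(String[] args) {
        // Arbitrary example values; a real sender would write this line to
        // its transport of choice.
        System.out.print(format("app.requests.count", 1.5, 1405641600L));
    }
}
```

Running main prints the line "app.requests.count 1.5 1405641600".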

The Importance of Open Source at Salesforce.com – Controlling Your Own Destiny

An important benefit of open source is that it lets you control your own destiny. Adopting open-source software gives our engineers the ability to solve problems directly.

An example of this is the software that routes web requests to our servers (the “servlet container”). Salesforce started out using a well-established commercial product, which worked well for many years. However, a new project in 2011 ran into a snag when the servlet container lacked a key feature, and the vendor was unwilling to implement it. Instead, the team decided to switch to an open-source alternative called Jetty. This was a big project, of course. It’s hard to switch a core component in a big, complex application, especially one that’s been in active use for 10 years! But the project was a success, and Salesforce now runs Jetty everywhere.

Search Indexing at Salesforce.com

Another great example of open-source software usage at Salesforce is search indexing (the process of taking text, like Account details and Chatter posts, and making it accessible to fast user searches).

The original implementation of search at Salesforce used a popular open-source search indexer called Apache Lucene. The search development team at the time decided to “fork” Lucene (that is, make a local version that diverges from the community-maintained version).

However, as everyone knows, the scale of Salesforce has increased manyfold over the years, to the tune of 1.5 billion transactions every day. Along the way, the search team discovered a scalability challenge in the architecture of their implementation of Lucene. The team needed a way to scale this capability “out” rather than “up,” using a large number of smaller machines to process the same requests.

The solution they found was Apache Solr. Solr is a horizontally scalable system with a loosely coupled REST interface. This architecture allowed the team to move the query and index processes to the same host, and cut out the requirement to use a SAN. This move also netted the team a spate of new features. (Solr still uses the latest version of Lucene for the core library.) And the team has contributed to and sponsored fixes to Solr, including a somewhat uncommon one: the ability to support indexers with over 10,000 cores!
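
To make the "loosely coupled REST interface" a little more concrete, here is a small sketch of how a client might build a Solr search request as a plain HTTP URL. It uses Solr's standard select handler with the q, rows, and wt parameters. The base URL, the collection name accounts, and the field name are assumptions made up for this example, not details of the Salesforce deployment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: SolrQueryUrl and all sample values are hypothetical.
class SolrQueryUrl {

    // Builds a URL for Solr's /select handler. Because the interface is just
    // HTTP, any host serving the collection can answer the same request.
    static String buildSelectUrl(String baseUrl, String collection, String query, int rows) {
        // URL-encode the query so characters such as ':' and spaces are safe.
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + collection + "/select"
                + "?q=" + encodedQuery
                + "&rows=" + rows
                + "&wt=json";
    }

    public static void main(String[] args) {
        // Assumed local Solr base URL and a made-up "accounts" collection.
        System.out.println(buildSelectUrl(
                "http://localhost:8983/solr", "accounts", "name:acme corp", 10));
    }
}
```

With these sample values the program prints http://localhost:8983/solr/accounts/select?q=name%3Aacme+corp&rows=10&wt=json, which could then be fetched with any HTTP client.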

As you can see, we use a LOT of open source. It gives us the flexibility needed to solve for customer and platform demands such as performance and scale. In Part 2, we’ll detail more of the ways that salesforce.com contributes to existing open-source projects.

Published
July 18, 2014
