Migrating from MySQL to Postgres in 20 Minutes (give or take)
This blog tells the story of how we changed our developer.salesforce.com website, including Trailhead, from a MySQL cluster running on Amazon, to a Postgres database on Heroku. I’ll share some of the technical details, but the important part is that we improved performance, got an easier environment to work in, and gained the ability to easily integrate with Salesforce via Heroku Connect.
To explain why we made the switch, let’s turn the clocks back to the dark ages (ahem, 2006). Back then we exclusively used a MySQL database, mostly because we were using WordPress for our blog, and that’s what WordPress plays well with. But we also had a MediaWiki site, and chose MySQL again so we could have a consistent database platform. As time moved on and our website evolved, we switched to using a Ruby on Rails-based system, and with that came a new database. But by that time, we were pretty entrenched in MySQL.
An important aside to this backstory, is that Salesforce acquired Heroku in 2010, and so it was quite natural for us to choose one of the Mysql Add-Ons from Heroku. However, this didn’t work too well, because the MySQL Heroku add-on is limited to 40 connections, which wasn’t nearly enough. To get more connections, we leased a MySQL Cluster from ClearDB, and that worked, but at some expense. It should be noted that the Postgres databases supplied by Heroku do not have such strict connection limitations.
The Heroku Advantage
We didn’t have to think about it too hard, switching to Heroku had some clear advantages.
- Dataclips – We use Google Analytics and other tools to analyze traffic and other external details on our website, but we also need to run queries and reports on internal data. For example, people on our team need to know how many Trailhead badges were completed this week. Or how many people pre-registered for a particular event. By using a Postgres database, we could leverage Heroku’s Dataclips, which is an easy way for us to export queries as CSV files, and give non-technical staff the ability to easily run the queries for data analysis.
- Follower databases – Heroku Follower databases are essentially read-only copies of our database that we can run queries on without affecting the performance of master database. This hot copy of the database continuously stays in sync with the master database and allows us to make real-time queries whenever we want.
- Heroku Connect – Since we integrate with two Salesforce orgs, we wanted to use Heroku Connect from the get-go. Postgres is required for Heroku Connect, and this made the integrations a lot easier for us.
- Database management – Heroku gives us a nicer architecture, it’s cleaner, and easier to manage accounts.
Converting from MySQL to Postgres (in 20 Minutes)
We had two volunteers from Heroku help us, Lyric Hartley and Phil Ripperger. They regularly do this same kind of work with external customers, and it would have been a very different (and maybe unsuccessful) process without them. First they installed a program called ‘pgloader’ on an EC2 instance, which would do the actual conversion. This was as easy as telling pgloader to point directly at the source MySQL instance and the destination Postgres instance. Because both the MySQL and Postgres databases were on the same Amazon network, the conversion took only 20 minutes. Seriously. Our production database amounted to about 1 GB of uncompressed text, and if we’d done the conversion a different way across networks, we’d still be doing this a week later.
Code Issues Between MySQL and Postgres
Everything didn’t go exactly to plan, we had some platform compatibility issues to iron out.
- Postgres uses true and false for boolean values – Since we were used to working with MySQL, we had hardcoded some of our Rails database migration code with zeros and ones instead of the database agnostic boolean values (note that Postgres uses t/f). Once the hardcoding was fixed in Rails, the boolean values were interpreted correctly.
- Postgres is case sensitive – During QA we discovered a bug where some data was not showing up. To debug this we loaded the same page in two servers (one using Postgres and one using MySQL) and scanned the SQL statements that were executed by each (this is in Rails, so the logs show the SQL). The MySQL query and the Postgres query were the same, but the number of rows returned were different. We eventually traced this to a case-sensitivity issue: the code incorrectly used the lowercase string when it should have used a capitalized string (to match the existing entries in the database). On MySQL this didn’t matter as the database was setup as case insensitive, but the Heroku Postgres is a case-sensitive database.
- Postgres doesn’t like backticks – One final issue was code we wrote that was custom SQL for an order-by clause. We used backticks in the SQL query, which is valid in MySQL, but not in Postgres. Since this was part of an order-by clause, we changed the query to use pure Rails code instead.
That was pretty much it, and we’d do it again in a heartbeat. Our reason for choosing Postgres is perhaps slightly influenced by Salesforce’s purchase of Heroku, but the unbiased benefits really speak for themselves.
About the Author
Mike Papper has been working on the Salesforce Developer’s website for the past four years, specializing in back-end coding and system architecture. He was the original architect for Trailhead, and when not programming, Mike likes skiing, biking and windsurfing the waves in Santa Cruz.