In this episode, we have a conversation with a lead software engineer here at Salesforce, Jonathan Gillespie. Jonathan develops in Apex and Lightning Components and works in the internal org, Gus. He has been on the podcast before, but he’s back to discuss the follow-up to his Nebula Logger project – the Nebula Cache Manager.

We talk about how Platform Cache works, how it works with Apex, and how it can make your life as a developer better. Jonathan also shares the three specific limits that led him to start thinking about better cache strategies.

Listen in to learn how the new Nebula Cache Manager can be a benefit to you.

Show Highlights:

  • What Nebula Logger is.
  • What happens when we have a static variable in Apex.
  • The difference between organizational and session platform caches.
  • The expiration of an organization cache.
  • How Jonathan handles an org with no Platform Cache.
  • The things Jonathan wanted to give back to a developer from a utility point of view.

Links:

Episode Transcript

Jonathan Gillespie:
I work on the platform, so I develop in Apex and Lightning Components, working in one of our internal orgs called Gus, so building Salesforce for Salesforce.

Josh Birk:
That is Jonathan Gillespie, a lead software engineer here at Salesforce. I’m Josh Birk, your host of the Salesforce Developer podcast. Here on the podcast, you’ll hear stories and insights from developers, for developers. Today, we bring Jonathan back to the mic to talk about a follow-up to his Nebula Logger project, the Nebula Cache Manager. We’re going to talk about how cache works, how it works with Apex, and how it can make your life better. But first, we’re going to ask Jonathan what his earliest memory with a computer was.

Jonathan Gillespie:
Earliest, earliest would probably have to be, I think, years before we got a computer at my house, visiting… I want to say probably my cousin in Virginia. His parents had… I think it was an old Apple of some sort, but it was in color. It had a Teenage Mutant Ninja Turtles thing that you could color in with paint, and color it in however you want to, digitally. Really cool stuff.

Josh Birk:
Love it. Love it. It’s advanced. Was computing something that you always wanted to get into?

Jonathan Gillespie:
As long as I can remember. I mean, since at least middle school or so. I think I was always interested in tinkering with the technology.

Josh Birk:
Gotcha. Now this Apex cache project we’re going to talk about comes in part from your previous experience from your previous episode on Nebula Logger. Just to level set for everybody, what exactly is Nebula Logger?

Jonathan Gillespie:
Yeah. So Nebula Logger, as far as I’m aware, is now the most popular logging tool available for Salesforce. It’s a project that I’ve been working on since 2016, give or take. I don’t know where the time goes, but about six or seven years now, with a lot of help from a friend and colleague, James Simone, along the way. It’s just something I’ve been kind of going at. I talked in the last podcast about the history of it and all that, but it’s a project that I’ve been pretty heavily working on over the last two or three years in particular, which is when it’s really gotten popular. It’s just intended to be something that works in pretty much any Salesforce org and gives you the ability to log extra data for Apex and Flow, as well as Lightning Components.

Josh Birk:
Nice. Nice. Now in the accompanying blog post on the Joys of Apex where you’re talking about this new project, you identify three specific limits which led you to start thinking about better cache strategies. Can you walk me through those limits?

Jonathan Gillespie:
Sure. Yeah. If I’m remembering right, the three are around some query limits. Anytime you try and access the database in Salesforce via Apex, various limits come into play, one of which is the SOQL query limit. Depending on the type of transaction that you’re running in, if it’s asynchronous or synchronous, the query limit can vary a little bit, but you basically have around a hundred or so queries that you can use. In some really complex orgs, those queries can go pretty fast. Managed packages have their own limits, but other packages that you may have installed in your org could count towards the same limit, depending on how those packages have been designed. So query limits are a big one that can go through pretty quickly, on top of just queries being kind of slow anyway.
I believe one of the other limits was CPU time. Some data doesn’t necessarily live in the database; it’s generated through code, kind of on the fly, and that work takes time as far as CPU time goes for transactional limits. So anything you can do to reduce how much code has to execute can help speed things up. And then I think the third one that I mentioned in the article, and something that I come across a lot with integrations, is callout limits. Just like query limits, there’s a limit on the number of callouts you can make per transaction. There’s also an additional amount of complexity of, oh, I always get this backwards. You can’t do DML before a callout, I believe.
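The three limits Jonathan mentions are all inspectable at runtime through the Apex `Limits` class, which is a handy way to see how close a transaction is to each ceiling. A quick anonymous Apex sketch:

```apex
// Consumed vs. available SOQL queries, CPU time, and callouts
// for the current transaction.
System.debug(Limits.getQueries() + ' of ' + Limits.getLimitQueries() + ' SOQL queries used');
System.debug(Limits.getCpuTime() + ' of ' + Limits.getLimitCpuTime() + ' ms of CPU time used');
System.debug(Limits.getCallouts() + ' of ' + Limits.getLimitCallouts() + ' callouts used');
```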

Josh Birk:
I believe that’s correct.

Jonathan Gillespie:
Okay, good, good, good.

Josh Birk:
Yeah. So yeah, because it makes sense, right? You want to make sure the data you’re about to impact hasn’t already been impacted before you did the callout. Am I walking through that correctly? Yeah. Okay.

Jonathan Gillespie:
It sounded good.

Josh Birk:
Yeah. And these are limits that you’ve just sort of generally run into? Because does Nebula Logger have any particular callout limits that it runs into, or are these just limits on Apex that you run into a lot?

Jonathan Gillespie:
Those three are just more broadly speaking ones that I’ve run into a lot over the years, not necessarily something specific to Nebula Logger, but some of these did come about because of running into it in particular. I didn’t typically run into callout limits with Nebula Logger; it makes a single callout at most per transaction, I believe. So that one’s not going to do much harm. But just the general performance of Nebula Logger, I like to think that the code base is pretty well optimized and clean overall, but it’s grown a lot. There’s a lot of features and things that have been added over the last year or two. And there are some orgs using Nebula Logger that are trying to log data in an org that has millions of rows of records, sometimes multiple objects each having millions of rows of data. So every query becomes that much more precious and can have a negative impact if you’re running too many queries at all. And if you log a lot of data at once, again, I try and keep the code optimized overall, I think, but stuff happens, and with large data volumes, CPU time can also be a concern.

Josh Birk:
I mean, I guess that makes sense, because Nebula Logger is trying to help monitor throughout your whole org. So the larger, the more complicated, and especially if you start getting to large data volumes, then what’s the most obvious impact that people would see? Is it just slower, or is it every now and then just kind of like a wrench gets thrown at it because it just went over a SOQL query limit?

Jonathan Gillespie:
So I don’t actually hear many people reporting about it going over the query limit. Most of the architecture of Nebula Logger, a lot of it runs asynchronously. So as far as risk of running over the query limits with the number of queries it does throughout the code base, not a major concern. But there was one very large org using it. And just for context, some of the queries it does: if you log, say, any kind of SObject record, it’ll query for additional information about that record. It also logs information about the current user, the current organization that the code is running in, the Apex classes deployed to your org, the Flows deployed to your org. Things like that, just metadata or meta-metadata. And some of those queries are great as far as having extra context when looking at logging information, but they come at the expense of all these extra queries that run. So even though they’re running asynchronously, I thought originally that was a good enough solution, because they were running async, so they didn’t block the original user’s transaction or slow things down. But what turned out to happen in some of these large orgs, this one in particular, this customer was logging, I want to say, 2 or 3 million rows of logging data per day. Per day. Pretty nuts.

Josh Birk:
Gee.

Jonathan Gillespie:
And so that’s millions of records being generated, each one then running multiple queries. So they were actually getting throttled at the database level on their Salesforce instance. Not something that I think typically happens, but for this org in particular, it really had a big impact, essentially slowing down lots of their org because they had integrated logging everywhere.

Josh Birk:
Gotcha. Now you kind of bring up static variables, so let’s walk through that a little bit, because we have kind of this very unique architecture and all of that kind of stuff. When we say static variable in Apex, what are we actually saying, and what’s the scope of that?

Jonathan Gillespie:
So in Apex, any static variable you have is going to basically last for the duration of a transaction, and that transaction is defined by Salesforce and the platform overall. If you’re a Salesforce admin or Salesforce developer and you’ve ever tried to learn about the order of execution, that is your Salesforce transaction. A user clicks save, that fires triggers, that fires flows, that fires any other automations you’ve got, and then you’re done. That is basically the duration that any static variable will live. I think that’s a little bit different from a lot of other platforms or languages. It’s been years since I’ve done a lot of C# or Java, but I think in those languages, once your app starts up and the static variables are initialized, they stay populated in memory for the duration of your app, versus Salesforce, where it’s a much smaller focus and it’s just a single transaction.

Josh Birk:
And that transaction would often have a query associated with it. It’s a static variable that’s trying to create an SObject or a series of SObjects based on some kind of, now I’ve got that tongue tied, some kind of SOQL filter.

Jonathan Gillespie:
Right. Exactly. So every transaction has this upfront overhead: it has to run these queries in order to populate any static variables, or if it’s doing callouts, or CPU usage to generate data, whatever the case may be. Every transaction has this overhead.
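The pattern being described here is the classic lazy-loaded static variable; a minimal sketch, with illustrative class and field names:

```apex
public class ProfileCache {
    // Lives only for the duration of one Apex transaction;
    // the next transaction starts with this back at null.
    private static Map<String, Id> profileIdsByName;

    public static Id getProfileId(String profileName) {
        if (profileIdsByName == null) {
            // The query runs at most once per transaction...
            profileIdsByName = new Map<String, Id>();
            for (Profile p : [SELECT Id, Name FROM Profile]) {
                profileIdsByName.put(p.Name, p.Id);
            }
        }
        // ...but every new transaction pays this overhead again.
        return profileIdsByName.get(profileName);
    }
}
```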

Josh Birk:
So every transaction has to redo its query. I can only access the value of that static variable during the transaction. How does Platform Cache help with that?

Jonathan Gillespie:
So Platform Cache is an extra layer that Salesforce has. Every org does have some included space, I believe, and for some larger orgs you may have to purchase additional usage. But the idea behind it is it’s a way to store data that doesn’t need to live in a custom object or somewhere else in your Salesforce data model, and that’s not going to change frequently, so you want it to be reused when possible for some duration of time. Could be a few seconds, could be for the duration of a single day, but Platform Cache basically provides that extra layer of having a place to store data that spans multiple transactions.

Josh Birk:
And what’s the difference between organizational and session Platform Caches?

Jonathan Gillespie:
Organization cache is really designed for data that is not user specific and that anybody in the org should be able to access, even if it’s indirectly via code. So in the article I wrote for Joys of Apex, I like using the example of queue data. Queues are a Salesforce feature.

Josh Birk:
Got it.

Jonathan Gillespie:
That basically you can assign records to either a user or to a queue, and there’s just no way to get information about the queues in your org without doing a query. So queues are a great example of this: they have to be queried. For most orgs, queues don’t change very frequently. There are some where you’re constantly creating or changing your queue information, but for most of the projects, at least anecdotally, ones that I’ve seen over the last several years, we might create a new queue once every couple of sprints or as part of some new feature we’re rolling out. We’re not sitting there constantly creating queues every day or anything. So they’re a pretty good option of something to store in something like Platform Cache.
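Queues live in the `Group` object, so caching them in org cache might look like this sketch; the partition and key names are made up for illustration:

```apex
// Assumes a partition named "QueueData" has been created in the
// "local" namespace under Setup > Platform Cache.
Cache.OrgPartition partition = Cache.Org.getPartition('local.QueueData');

Map<String, Id> queueIdsByName = (Map<String, Id>) partition.get('queueIdsByName');
if (queueIdsByName == null) {
    // Cache miss: run the query once, then share the result
    // across transactions and users until the entry expires.
    queueIdsByName = new Map<String, Id>();
    for (Group q : [SELECT Id, DeveloperName FROM Group WHERE Type = 'Queue']) {
        queueIdsByName.put(q.DeveloperName, q.Id);
    }
    partition.put('queueIdsByName', queueIdsByName);
}
```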

Josh Birk:
Got it. And then a session cache is user specific?

Jonathan Gillespie:
Correct. Yep. So the other side of it is session cache, which not only is user specific, it also means that the user has to have an active session. So there are some nuances there. Things like asynchronous jobs running, those don’t always have a user session in context, so that gets even trickier using platform session cache.

Josh Birk:
Got it. So it kind of sounds like an organization cache is almost more like that application layer we’re talking about, and then a session cache is ephemeral, lasting as long as I have an active session. Is that a good paraphrase?

Jonathan Gillespie:
Correct. Although the organization cache does automatically expire too, they both have automatic expirations. Yeah.

Josh Birk:
Oh, okay. What’s the expiration on organization cache?

Jonathan Gillespie:
I believe organization cache can have a max of 24 hours, and session cache can have a max of either eight hours or the end of the user’s session, whichever comes first.
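Those expirations correspond to the optional time-to-live argument on `put`, specified in seconds. A sketch, assuming the org has a default cache partition designated:

```apex
String orgSettingsJson = '{"featureEnabled": true}';

// Org cache: TTL is capped at 86,400 seconds (24 hours).
Cache.Org.put('orgSettings', orgSettingsJson, 86400);

// Session cache: TTL is capped at 28,800 seconds (8 hours), and
// the entry goes away earlier if the user's session ends first.
Cache.Session.put('userPrefs', orgSettingsJson, 28800);
```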

Josh Birk:
Got it. Okay. Now, what are some of the limitations of cache other than, so we talked about data size, and that’s depending on your edition. If you’ve got an Enterprise Edition org, you’re not going to have the same as a Performance Edition org. So we’ve got size, we have duration. Any other limitations you can think of?

Jonathan Gillespie:
Definitely a couple of others. There are just some little things to be aware of. Things like, if you are trying to store a null value, that’s not supported. I think I kind of understand the intent behind it, there’s no need for a caching system to hold a null value, but it makes it more challenging from a code perspective, because that means that you may have done the work, you may have made some callouts to some other system, but that other system’s response is a null value. And so now you don’t have a direct way using Platform Cache to actually cache a null value. I ended up addressing that with a little goofy workaround. I basically have a substitute value: if there’s a null value of any sort that comes across, I just use this ridiculous-looking string value as a substitute. So it works, and it’s a way to basically still have that cache serve as a place to know whether or not you’ve already tried to do the caching, even though null values aren’t quite supported.
There’s also just some other goofiness around the keys that you can use. The idea is, every time you want to populate something into Platform Cache, you provide a string key to identify it. Those keys have to be alphanumeric. You can’t have any kind of underscores or delimiters or special characters or anything like that. Not a major deal in my opinion, but just something to be aware of when working with it.
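A sketch of the sentinel workaround Jonathan describes; the sentinel string and method names here are illustrative stand-ins, not Nebula Cache Manager’s actual internals:

```apex
public class NullTolerantCache {
    // Stand-in stored whenever the real value is null, since
    // Platform Cache rejects null values outright.
    private static final String NULL_SENTINEL = 'PLACEHOLDERFORNULL';

    public static void putValue(Cache.OrgPartition partition, String key, Object value) {
        // Keys must be alphanumeric: no underscores, delimiters,
        // or other special characters.
        partition.put(key, value == null ? NULL_SENTINEL : value);
    }

    public static Object getValue(Cache.OrgPartition partition, String key) {
        Object cached = partition.get(key);
        // The sentinel lets callers distinguish "cached null"
        // from "never cached at all" (which also returns null).
        if (cached instanceof String && (String) cached == NULL_SENTINEL) {
            return null;
        }
        return cached;
    }
}
```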

Josh Birk:
And I love little details like that on the platform because you just know it’s probably based on some other person’s implementation that we just haven’t figured out how to throw another shim in there to make it work exactly the way we want to because as you said, it’s not that big of a deal. But I love the null thing because I feel like it’s one of those philosophical programmatic questions. Do you need null if it doesn’t exist? But as you’re saying, you kind of do sort of thing. What is the value of null, and it’s in knowing that the response was actually empty kind of thing. So I love that.
When did you start looking at some of these things? Because this sounds like one of those organic developer issues where you were working with Nebula Logger or working with other stuff in Apex. When did you first start thinking, okay, the platform has these capabilities, but what I really want to do is build an abstraction layer on top of this?

Jonathan Gillespie:
Yeah, I mean it definitely started organically, probably a year or two ago. A couple people on GitHub started asking if I’d considered using Platform Cache to help speed things up overall and to reduce some of these queries. It was something I’d been avoiding, honestly, for about the first year or two of releasing the unlocked package for Nebula Logger, because I knew that not all orgs have Platform Cache, so I didn’t want to become highly dependent on it as a feature. And I didn’t want to put people into a spot where things wouldn’t perform as well if they didn’t have Platform Cache. So I really initially just spent probably a couple months trying to do every optimization I could, just getting the existing code to work efficiently when possible. But then, especially once I heard from some internal Salesforce employees about this very large org doing millions of rows of logging data, it seemed like time to finally investigate things a little bit.

Josh Birk:
How are you currently handling that? When you have an org that has no Platform Cache, are you detecting that and then giving it some kind of a stub, or what’s your solution there?

Jonathan Gillespie:
A couple ways I handle it. So Platform Cache itself, as far as what the platform provides, like you said, it’s organization cache and session cache. In this project, Nebula Cache Manager, I’ve introduced a third type that I’ve called the transaction cache, and that’s basically a way of storing stuff for the duration of a single transaction using static variables. So if you do have your own data that you want to cache for the duration of a transaction, the Cache Manager does have a way for you to use that as well. It’s publicly available in the class, but internally I basically use it to also supplement the Platform Cache. So I try and use the organization cache and session cache when possible, but a couple things can go wrong with that. Either the org has used up all of the allocated space, or it could be that they don’t have any allocation, or for whatever reason they deleted the partition itself altogether. So there are a couple things like that that can go wrong with trying to use it. So basically, internally, anytime you try and use the organization or session cache, it’s also falling back on using the transaction cache as a way to supplement and overcome some of the limitations of Platform Cache.
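Conceptually, that fallback looks something like the following simplified sketch; the real Nebula Cache Manager adds configuration, session cache support, and more:

```apex
public class FallbackCacheSketch {
    // Transaction cache: a plain static map, available in every
    // org regardless of Platform Cache allocation.
    private static final Map<String, Object> TRANSACTION_CACHE = new Map<String, Object>();

    public static void put(String key, Object value) {
        // Always safe: lives for the rest of this transaction.
        TRANSACTION_CACHE.put(key, value);
        try {
            // Best effort: can throw if no partition exists,
            // the allocation is zero, or space is exhausted.
            Cache.Org.put(key, value);
        } catch (Exception e) {
            // Swallow; the transaction cache above still works.
        }
    }

    public static Object get(String key) {
        if (TRANSACTION_CACHE.containsKey(key)) {
            return TRANSACTION_CACHE.get(key);
        }
        try {
            return Cache.Org.get(key);
        } catch (Exception e) {
            return null;
        }
    }
}
```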

Josh Birk:
Interesting. So some of this, it sounds like, the developer doesn’t really need to know. You’re kind of treating the situation where you’ve run out of session cache, or you simply never had session cache in the first place, and your code will fall back to transaction caching. Or if I’m working on an org that I know doesn’t have cache, I can talk to the Cache Manager and just talk to the transaction cache directly and kind of avoid the problem altogether.

Jonathan Gillespie:
Yeah, exactly. Because I’m building this as an open source project, I wanted it to be something that could be installed into any org, and even if you don’t have Platform Cache, you do at least get the benefits of the transaction-level caching. So there are still some benefits to it, and it should work in any org.

Josh Birk:
Talk to me a little bit more about the other design considerations that you had. So this is a big one, being able to give people a proxy layer even if they don’t have cache. What were some other things that you wanted to give back to a developer from a utility point of view with this?

Jonathan Gillespie:
One of the big, well, some of the big things is just the out-of-the-box Platform Cache functionality. In Apex, it’s all in the Cache namespace. You can do a lot of really cool stuff, but there are a couple things that I didn’t like when I initially worked with it, or that felt error-prone. Things like, there’s not a strongly typed reference to a cache partition. So if you want to reference which cache partition you want to put data into, you have to use a hard-coded string somewhere. Somebody could mistype the hard-coded string, something could go wrong, or the name could change, and that means you’d have to redeploy the code. So with this, I ended up adding a custom metadata object to basically let you configure not only the names of the cache partitions, but also other things like how long the cache is valid for. You can disable it or enable it on the fly, you can clear it, you can do a couple nice things, all through configuration by an admin or developer.
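Custom-metadata-driven configuration might be consumed along these lines; the `CacheConfig__mdt` object and its fields below are hypothetical stand-ins for illustration, not the actual Nebula Cache Manager schema:

```apex
// Hypothetical custom metadata type with PartitionName__c,
// IsEnabled__c, and TtlSeconds__c fields.
CacheConfig__mdt config = CacheConfig__mdt.getInstance('Organization');

if (config != null && config.IsEnabled__c) {
    // No hard-coded partition name in code: an admin can edit the
    // metadata record, no redeployment required.
    Cache.OrgPartition partition = Cache.Org.getPartition(config.PartitionName__c);
    partition.put('someKey', 'someValue', Integer.valueOf(config.TtlSeconds__c));
}
```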

Josh Birk:
Nice. So instead of relying on it in code. I mean, that’s what customization is for, right?

Jonathan Gillespie:
It’s nice. It’s one of my favorite things from the last couple years on the platform. I don’t remember how long ago they released it, but it’s super handy having custom metadata types.

Josh Birk:
Nice. I will also say that every now and then, when I’m looking at code from projects like yours, the way you build out your interface is a very good example of why and how to build out an interface in Apex. It was almost like one of those aha moments when I first started doing data classes within Apex just to create a structure kind of thing. And to kind of walk down that a little bit, with that interface, you’ve kind of given people one point in the code where they can just go to and catch up to the work that you’ve been doing to abstract this layer. Does that sound right?

Jonathan Gillespie:
It does. And that was another thing that wasn’t something I initially set out to do, but as this evolved, when I was originally doing it for Nebula Logger over the course of a couple months, doing the initial implementation of Platform Cache and then converting that into a kind of standalone project, initially it started as three separate classes: one for transaction cache, one for session cache, and a third for organization cache. Organization and session cache got combined pretty quickly, because they’re both ultimately just doing Platform Cache stuff, but then I kept on working on it. And I still had some inconsistencies between the classes, and everything was public, and it felt like a lot of the implementation was being exposed. So James Simone was working with me on this, doing some code reviews, and he pointed out there were some opportunities to streamline some of this.
So yeah, we started moving towards doing an interface instead. I think it’s much nicer doing it that way. And I tried to walk through that in the article I wrote on Joys of Apex, because especially when I first started doing Apex years ago, I didn’t know what interfaces were or why I would ever use them. But you get into some situations like this where you start building it out and you start iterating a little bit more and more, and then finally it just makes sense: oh, an interface would solve a lot of problems here and make things a little more consistent across the board.

Josh Birk:
Yeah. I love that moment you get into where you just feel like your code’s sort of clashing with itself a little bit. It’s almost like you’re trying to solve the same problems you were trying to solve with the original Platform Cache implementation. You’ve abstracted it out to a layer, but now you have that overwhelming feeling of clunkiness that you want to solve.

Jonathan Gillespie:
That’s right.

Josh Birk:
Any other features of the Logger you want to call out?

Jonathan Gillespie:
Of the Logger or the Cacher?

Josh Birk:
Oh, sorry, the Cacher. We have a whole other episode of the Logger. It’s true.

Jonathan Gillespie:
I mean, I think for me the big takeaways, or why an Apex developer might want to use this, is it just solves a lot of the headaches for you. You could for sure write your own cache management class, but there are just some nuances of dealing with Platform Cache that I’ve found either a little goofy or confusing to deal with initially. So it really helps simplify things: if you just want to be able to use caching, it really takes away some of the headaches with that. And then to me, one of the best things you can do when doing these kinds of developments is making things configurable. So again, the custom metadata types, I think, are huge wins for both admins and developers. If there’s any kind of issue that goes wrong in prod, you can disable caching easily. You can change how long the caching lasts.
And there’s actually a second custom metadata type as well where, through configuration, you can define data that will be automatically stored into the cache. So if you want something automatically populated into either org or session cache, you can create a custom metadata record and have some static version of it, could be JSON or a string or whatever data type you need to use, and that will automatically get loaded the next time the Cache Manager is used. So there are a couple edge cases and things that maybe people don’t need to use all the time, or very infrequently. But as soon as something goes wrong with code that’s using the cache management and you need to temporarily disable it, or you need to override some value that’s being used in the cache, being able to do all that through configuration and not having to redeploy your code is a huge time saver, I think.

Josh Birk:
Yeah. Well, it just feels like back in the day when everybody was writing their own REST API manager, I think there were 1,000 different variants of that out there at some point. So before y’all consider writing your own wrapper around the Platform Cache stuff, check this out. Where can people learn more about this?

Jonathan Gillespie:
Best place right now is either joysofapex.com, where the most recent article, at the moment at least, is the one I wrote about it, or you can go to GitHub and look at the info that way. I can definitely provide the links, but the link for the GitHub is github.com/jongpie/NebulaCacheManager.

Josh Birk:
And that’s our show. Now before we go, since Jonathan is another repeat guest, instead of asking after his favorite non-technical hobby, I asked what hobby he’d really like to try?

Jonathan Gillespie:
Oh, woodworking for sure.

Josh Birk:
Nice.

Jonathan Gillespie:
My wife and I bought a house last summer. We’ve got a lot of projects that need to get done, a lot of new furniture we need to have, because we have more space now than we did in our apartment. So functionally, and just for fun, woodworking sounds like a blast.

Josh Birk:
I want to thank Jonathan for the great conversation and information. And as always, I want to thank you for listening. Now, if you want to learn more about this show, head on over to developer.salesforce.com/podcast, where you can hear old episodes, see the show notes, and find links to your favorite podcast service. Thanks again everybody, and I’ll talk to you next week.

Get notified of new episodes with the new Salesforce Developers Slack app.