In a previous blog post, we covered how the Einstein Trust Layer provides a trusted and secure infrastructure for your generative AI applications. As you roll out generative AI across your enterprise, you’ll need to continuously monitor and optimize how your users interact with this technology.

Einstein audit and feedback data is stored in Data Cloud, and this blog post walks you through the core concepts and the basics of accessing the data with Data Cloud-native tools in your Salesforce org.

What kind of data is stored within Data Cloud?

For an end-to-end understanding, we’ll use a sales email generator application as our example. The primary user of such an app would be a sales agent who uses the app to generate emails for their customers. This could be, for example, a prompt like “Generate a sales pitch email for John Doe from Jane at Fit Shoes Inc. pitching the new sports shoe line launched this year,” resulting in email text generated by a large language model (LLM).

Upon receiving such a request, consisting of an input prompt, the chosen LLM, and other request parameters, the Einstein Generative AI Platform strips any personally identifiable information (PII) from the input prompt (with a certain confidence score and probability of error) and sends the masked prompt to the chosen LLM. Upon receiving the response from the LLM, the platform calculates a safety score for the response across various safety categories, adds the stripped PII back into the response, and returns the response to the user (e.g., the sales agent).

Upon receiving the response, the sales agent reviews it and either accepts it as-is, edits it, or regenerates it. These user actions are collected as “implicit feedback” by the system. Once an acceptable response has been received, the sales agent may also choose to provide explicit feedback to the system, along with any additional notes or details about the specific generation.

With this use case in mind, now let’s walk through the data model in detail. We’ll divide the tables into the following categories:

  • Request and Response
  • Content Quality
  • Feedback

Each of these categories contains one or more types of data, which are then reflected as dedicated data model objects (DMOs) within Data Cloud.

Note: The DMO names are always appended with “__dlm” and the field names are always appended with “__c” when using these entities in queries.
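For example, a minimal Query Editor statement against the request DMO would look like the following sketch (Id__c is an illustrative field name; see the DMO reference docs for the full field list):

```sql
-- Naming convention in action: the DMO name is suffixed with __dlm,
-- and field names are suffixed with __c.
SELECT Id__c
FROM GenAIGatewayRequest__dlm
LIMIT 10
```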

Einstein Audit and Feedback Data Model.

Request and response DMOs

The request and response DMOs capture the interactions between the user and the Einstein Trust Layer. Simply put, they capture what is sent to the Salesforce generative AI layer and the large language model’s (LLM) response. These DMOs include:

  • GenAIGatewayRequest (see docs): Captures incoming requests
  • GenAIGatewayRequestTag (see docs): Captures any request tags sent to the Salesforce generative AI layer
  • GenAIGatewayResponse (see docs): Stores responses generated by the system
  • GenAIGeneration (see docs): Details the specific generative outputs

Let’s put this in context of our ongoing sales email generator example, where a sales agent requests the system to generate an email pitch. The request details, such as the input prompt and parameters, are stored in the GenAIGatewayRequest DMO. The generated email response is then stored in the GenAIGatewayResponse DMO, and the specific text generated is detailed in the GenAIGeneration DMO.

More on the GenAIGatewayRequestTag

Every generative AI platform request can contain a free-form (key-value) dictionary of tags. The dictionary can contain any number of keys and their corresponding values, which can be simple strings, numbers, dates, or even dictionaries (which are serialized as JSON strings). Each tag key-value pair is written as a row to the GenAIGatewayRequestTag table, with the corresponding generative AI platform request’s primary key as the parent field, as illustrated below.
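For instance, a request sent with the hypothetical tags {"feature": "salesEmail", "locale": "en_US"} would land as two rows in the tag table. A sketch of a query to read them back, assuming illustrative field names Tag__c, Value__c, and GatewayRequestId__c (check the DMO reference docs for the exact names):

```sql
-- Each key-value pair becomes one row, e.g. (illustrative):
--   Tag__c = 'feature' , Value__c = 'salesEmail'
--   Tag__c = 'locale'  , Value__c = 'en_US'
SELECT Tag__c, Value__c
FROM GenAIGatewayRequestTag__dlm
WHERE GatewayRequestId__c LIKE '<request-id>'
```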

Each generative feature can have its own set of tags and tagValues. For example, for the plannerservice (Einstein Copilot) feature, a tag can be prompt_template_dev_name, and the corresponding tagValue could be AiCopilot__IntentClassifier. For the Service Replies feature, common tags include org_has_ai_trust_pii_masking_enabled and org_has_ai_trust_perms, each with a value of true or false.

Please note that as generative features evolve, the set of tags and the corresponding values may change.

Content quality DMOs

All responses that are generated by a large language model (LLM) are checked for quality and safety by the Einstein Trust Layer. The results of those checks are stored as follows:

  • GenAIContentQuality (see docs): Evaluates the quality of the generated content
  • GenAIContentCategory (see docs): Categorizes the content based on various safety and quality metrics

Coming back to the previous scenario: in the sales email generator application, after generating an email, the system evaluates the content for quality and safety. The GenAIContentQuality DMO stores a summary of the content quality, for example, whether a given generation is toxic or not. The GenAIContentCategory DMO stores the scores assigned to the content for the various safety categories. Content scores range from 0.0 to 1.0, where a higher score means the content is more likely to fall into that category.

The eight safety categories that are provided are:

  • Toxicity
  • Hate
  • Identity
  • Violence
  • Physical
  • Sexual
  • Profanity
  • Biased

Feedback DMOs

The feedback DMOs capture user feedback on the generated content. Feedback can either be given explicitly by the user, like clicking a UI control (e.g., thumbs up/thumbs down), or implicitly, for example, when a response was generated but not used, or was edited.

The feedback DMOs include:

  • GenAIFeedback (see docs): Stores explicit and implicit feedback from users
  • GenAIFeedbackDetail (see docs): Provides detailed feedback on actions taken by users
  • GenAIAppGeneration (see docs): Stores any GenAI App-specific updates on the generated response

Feedback in our example could look like this: After the sales agent receives the generated email, they can provide feedback by accepting, editing, or rejecting the response. This feedback is captured in the GenAIFeedback DMO. If the agent edits the email, the specific changes and actions are detailed in the GenAIFeedbackDetail DMO, helping the system learn and improve future generations.

More on GenAIAppGeneration

GenAI apps can transform generated responses to fit certain requirements and present those transformed responses to the user. The user, upon reviewing such a response, then provides feedback on the transformed response, unaware that the actual generation was something else.

Transforms can take many forms. For example, a Salesforce GenAI app may translate the generated response into a different language, or it may break the generated response down into multiple pieces and present them to the user as “n” different generations.

When the app “modifies” a generation, we refer to it as an AppGeneration (alluding to the fact that this “state” of the generation/response was actually produced by the Salesforce GenAI app), and it has a corresponding GenerationUpdateId and a GenerationUpdate (the transformed text).

Feedback DMO associations and cardinality

Each request can result in 1 or n generations, and each generation can have 0 or n feedback points associated with it; i.e., a user can provide feedback for the same generation multiple times or provide no feedback at all. Hence, if we query the data model for “all requests with feedback,” we will see repeated request_ids if there are multiple generations on that request, and/or if each generation has multiple feedback points associated with it. The query sketch below makes this cardinality visible.
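A minimal sketch that counts generations and feedback points per request (the join keys and field names, such as GatewayRequestId__c and GenerationId__c, are assumptions; check the DMO reference docs for the exact names):

```sql
-- Requests with more than one generation, or generations with multiple
-- feedback points, show counts greater than 1 here.
SELECT
    req.Id__c                 AS request_id,
    COUNT(DISTINCT gen.Id__c) AS generation_count,
    COUNT(fb.Id__c)           AS feedback_count
FROM GenAIGatewayRequest__dlm req
JOIN GenAIGeneration__dlm gen
    ON gen.GatewayRequestId__c = req.Id__c
LEFT JOIN GenAIFeedback__dlm fb
    ON fb.GenerationId__c = gen.Id__c
GROUP BY req.Id__c
```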

How to access the data

We’re providing some pre-built reports and dashboards that you can readily use to report on the data. This Salesforce Admin blog post walks you through some of the possibilities that are available to you with those pre-built options.

Now, as mentioned in the introduction, it is essential to build up a continuous optimization loop to ensure that your users have a great experience when interacting with generative AI. A key strength of Data Cloud is being able to act on data, so this is what we’ll focus on next.

We will focus on accessing the data via a Data Cloud-native tool called Query Editor, and then in upcoming blog posts, we’ll explore other methods to access and consume the data.

Data Cloud Query Editor is a tool built into your Salesforce org’s Data Cloud Home page, as shown below.

Data Cloud Query Editor

Query Editor is primarily used to develop ANSI SQL queries to access data from Data Cloud DMOs, DLOs, etc. It highlights the true power of Data Cloud by allowing ANSI SQL queries on DMOs, which are also accessible via Salesforce SOQL queries (more on this in upcoming blog posts).

Now, let’s focus on writing queries for accessing our Salesforce Generative AI Audit and Feedback data via Query Editor. Let’s write a query to access “all input prompts across all applications, the corresponding response text, and any available feedback.” The ANSI SQL query would look like the following.
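Here’s a sketch of such a query. The field and join-key names (RequestText__c, GenerationText__c, Feedback__c, and so on) are illustrative assumptions; consult the DMO reference docs linked above for the exact names in your org.

```sql
-- All input prompts, the corresponding generated text, and any feedback.
-- LEFT JOIN keeps generations that received no feedback.
SELECT
    req.Id__c             AS request_id,
    req.RequestText__c    AS input_prompt,
    gen.GenerationText__c AS response_text,
    fb.Feedback__c        AS feedback,
    fb.FeedbackText__c    AS feedback_text
FROM GenAIGatewayRequest__dlm req
JOIN GenAIGeneration__dlm gen
    ON gen.GatewayRequestId__c = req.Id__c
LEFT JOIN GenAIFeedback__dlm fb
    ON fb.GenerationId__c = gen.Id__c
```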

The following is what the output looks like.

Query Editor output

The following are a few more query examples (not an exhaustive list).

All toxic LLM responses using boolean identification and not based on individual safety category scores:
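A sketch, assuming a boolean toxicity flag (here called IsToxicityDetected__c) on GenAIContentQuality__dlm; verify the exact field name in the DMO reference:

```sql
-- Filters on the summary boolean rather than per-category scores.
SELECT
    gen.Id__c             AS generation_id,
    gen.GenerationText__c AS response_text
FROM GenAIGeneration__dlm gen
JOIN GenAIContentQuality__dlm cq
    ON cq.GenerationId__c = gen.Id__c
WHERE cq.IsToxicityDetected__c = true
```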
Responses with a Violence score above a certain threshold:
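A sketch, assuming each category row carries a name and a score (CategoryName__c and Value__c are illustrative field names):

```sql
-- Scores range from 0.0 to 1.0; 0.5 is an arbitrary example threshold.
SELECT
    gen.Id__c             AS generation_id,
    gen.GenerationText__c AS response_text,
    cat.Value__c          AS violence_score
FROM GenAIGeneration__dlm gen
JOIN GenAIContentCategory__dlm cat
    ON cat.GenerationId__c = gen.Id__c
WHERE cat.CategoryName__c LIKE 'Violence'
  AND cat.Value__c > 0.5
```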
Query all prompt templates with a specific GenAIApp request tag:
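A sketch using the prompt_template_dev_name tag described earlier (Tag__c, Value__c, and the join key are illustrative field names):

```sql
-- Requests whose prompt_template_dev_name tag matches a given GenAI app.
SELECT
    req.Id__c    AS request_id,
    tag.Value__c AS prompt_template_dev_name
FROM GenAIGatewayRequest__dlm req
JOIN GenAIGatewayRequestTag__dlm tag
    ON tag.GatewayRequestId__c = req.Id__c
WHERE tag.Tag__c LIKE 'prompt_template_dev_name'
  AND tag.Value__c LIKE 'AiCopilot%'
```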

NOTE: For textual comparisons in SQL queries, we prefer using the LIKE operator over equality operators such as =.

Prompt templates with feedback:
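A sketch combining the tag, generation, and feedback DMOs (field names are again illustrative assumptions):

```sql
-- Prompt templates whose generations received feedback.
SELECT
    tag.Value__c       AS prompt_template_dev_name,
    gen.Id__c          AS generation_id,
    fb.Feedback__c     AS feedback,
    fb.FeedbackText__c AS feedback_text
FROM GenAIGatewayRequestTag__dlm tag
JOIN GenAIGeneration__dlm gen
    ON gen.GatewayRequestId__c = tag.GatewayRequestId__c
JOIN GenAIFeedback__dlm fb
    ON fb.GenerationId__c = gen.Id__c
WHERE tag.Tag__c LIKE 'prompt_template_dev_name'
```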

Conclusion

In summary, the Einstein Audit and Feedback Data Model in Data Cloud equips you with powerful tools to monitor, analyze, and optimize your generative AI interactions. You learned at a high level about the different categories of data that are available to you, how the data is generated based on interactions with the Einstein Trust Layer, as well as about some of the options to act on the data. In an upcoming blog post, we’ll dive deeper into some of the most common scenarios and best practices around acting on the different DMOs. Stay tuned!

About the Author

Makarand Bhonsle is a Principal Engineer with the Einstein Generative AI Platform team, and is the architect of the Einstein Audit and Feedback Framework.
