Supported Models for Models API
The Models API supports large language models (LLMs) from multiple providers, including Amazon Bedrock, Azure OpenAI, OpenAI, and Google Vertex AI. The API also supports several types of models: some can be configured from within Einstein Studio, and some you can manage in your own environment. See supported model types for more information.
Announcements of new models and model deprecations are published monthly in the Einstein Platform release notes.
Model deprecation is the process where a model provider gradually phases out a model (usually in favor of a new and improved model). The process starts with an announcement outlining when the model will no longer be accessible or supported. The deprecation announcement usually contains a specific shutdown date. Deprecated models are still available to use until the shutdown date.
After the shutdown date, you won’t be able to use that model in your application and requests to that model will be rerouted to a replacement model. We recommend that you start migrating your application away from a model as soon as its deprecation is announced. During migration, update and test each part of your application with the replacement model that we recommend.
These models are deprecated and will be shut down in the future.
Deprecated Model | Recommended Replacement | Deprecated Date | Shutdown Date |
---|---|---|---|
Azure OpenAI GPT 3.5 Turbo 16k | Azure OpenAI GPT 3.5 Turbo | 2023-11-06 | 2024-11-01 |
OpenAI GPT 3.5 Turbo 16k | OpenAI GPT 3.5 Turbo | 2023-11-06 | 2024-09-13 |
OpenAI GPT 4 32k | To be determined | 2024-06-06 | 2025-06-06 |
The following models are available to use with the Models API.
The context window determines how many input and output tokens the model can process in a single request. The context window includes system messages, prompts, and responses.
The latest versions of GPT 3.5 Turbo and GPT 4 Turbo have a hard limit of 4,096 tokens on output, despite their extended context window for input. GPT-4o mini has a hard limit of 16,384 output tokens.
All models are currently limited to a context size of 32,768 tokens when data masking is turned on in the Einstein Trust Layer. To turn off data masking and use the full context window, see Set Up Einstein Trust Layer in Salesforce Help.
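The token budget described above can be sketched as a simple check. The helper and token counts below are illustrative assumptions, not part of the Models API, and a real tokenizer is needed for accurate counts:

```python
# Illustrative sketch (not part of the Models API): checking that a request
# fits a model's context window. The window covers system messages, prompts,
# and the response, so reserve room for the output tokens.
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    return prompt_tokens + max_output_tokens <= context_window

# With Trust Layer data masking on, every model is capped at 32,768 tokens.
MASKED_WINDOW = 32_768

assert fits_context(30_000, 2_000, MASKED_WINDOW)      # 32,000 tokens fit
assert not fits_context(30_000, 4_096, MASKED_WINDOW)  # 34,096 tokens don't
```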
Providers | Model | Good For | Context Window |
---|---|---|---|
Amazon Bedrock | Anthropic Claude 3 Haiku | Most tasks (balanced) | 200,000 tokens |
Azure OpenAI, OpenAI | Ada 002 | Retrieval-augmented generation | 8,191 tokens |
Azure OpenAI, OpenAI | GPT 3.5 Turbo | Most tasks (balanced) | 16,385 tokens |
Azure OpenAI, OpenAI | GPT 3.5 Turbo 16k | Deprecated | 16,385 tokens |
Azure OpenAI, OpenAI | GPT 3.5 Turbo Instruct | Specialized prompting | 16,385 tokens |
Azure OpenAI, OpenAI | GPT 4 | Deprecated | 8,192 tokens |
Azure OpenAI, OpenAI | GPT 4 32k | Deprecated | 32,768 tokens |
Azure OpenAI, OpenAI | GPT 4 Omni (GPT-4o) | Advanced tasks (latest model) | 128,000 tokens |
OpenAI | GPT 4 Omni Mini (GPT-4o mini) | Low latency tasks (latest model) | 128,000 tokens |
Azure OpenAI, OpenAI | GPT 4 Turbo | Advanced tasks (older model) | 32,768 tokens |
Vertex AI (Google) via BYOLLM | Gemini 1.5 Pro | Advanced tasks | 128,000 tokens |
For information about the different ways to use these models, see supported model types.
To access an LLM with the Models API, you must know its API name.
Most endpoints for the Models API require the model’s API name in the URL path. For example, the `/generations` endpoint takes the API name as a path parameter.
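The path can be sketched like this. The host and `/einstein/platform/v1` prefix are assumptions, so confirm the exact base URL in the Models REST API Reference:

```python
# Hedged sketch: the model's API name is a path parameter of each endpoint.
# The base URL below is an assumption; confirm it in the Models REST API
# Reference before use.
BASE_URL = "https://api.salesforce.com/einstein/platform/v1"

def generations_url(model_api_name: str) -> str:
    return f"{BASE_URL}/models/{model_api_name}/generations"

url = generations_url("sfdc_ai__DefaultGPT35Turbo")
assert url.endswith("/models/sfdc_ai__DefaultGPT35Turbo/generations")
```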
The API name is also required in the `modelName` property when you make a Models API request using Apex.
The API name is a string made up of these substrings:

- Namespace: `sfdc_ai`
- Separator: `__`
- Configuration name: `Default`
- Provider name: `OpenAI` (not used with most geo-aware models)
- Model name: `GPT35Turbo` (not used with BYOLLM and custom-configured models)
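Concatenating those substrings yields the names in the tables that follow. A small sketch of the composition:

```python
# Sketch of how a Models API name is composed from its substrings.
NAMESPACE = "sfdc_ai"
SEPARATOR = "__"

def api_name(config: str, model: str = "", provider: str = "") -> str:
    # The provider substring is omitted for most geo-aware models, and the
    # model substring is omitted for BYOLLM and custom-configured models.
    return f"{NAMESPACE}{SEPARATOR}{config}{provider}{model}"

# Default configuration of OpenAI's GPT 3.5 Turbo:
assert api_name("Default", "GPT35Turbo", "OpenAI") == "sfdc_ai__DefaultOpenAIGPT35Turbo"
# The geo-aware variant drops the provider substring:
assert api_name("Default", "GPT35Turbo") == "sfdc_ai__DefaultGPT35Turbo"
```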
To look up the API name of any custom or standard model configuration in Einstein Studio:
- Go to the Models page.
- Click the Generative tab.
- Click the name of a configured model.
- The API name is shown in the configured model details.
The Models API supports these types of models:
- Geo-aware Models (recommended) to access models that automatically route to a nearby data center.
- Salesforce-managed Models to access models that are entirely within the Salesforce Trust Boundary.
- Default Configured Models to access models hosted by Salesforce using default configurations. You can also configure these models in Einstein Studio and then use those custom-configured models.
- Bring Your Own LLM to access one of various Salesforce-supported models that you host yourself.
- LLM Open Connector Models to access any model on any platform.
A geo-aware model automatically routes your LLM request to a nearby data center based on where Data Cloud is provisioned for your org. Geo-aware routing offers greater control over data residency, and using nearby data centers minimizes latency. We recommend using geo-aware models whenever possible.
Proximity to the nearest LLM server is determined by the region in which your Einstein generative AI platform instance is located. If you enabled the Einstein generative AI platform on or after June 13, 2024, then your Einstein generative AI platform region is the same as your Data Cloud region (Data Cloud: Data Center Locations). Otherwise, contact your Salesforce account executive to learn where it’s provisioned.
To learn more about geo-aware routing, see Geo-Aware LLM Request Routing in Salesforce Help.
Use the following API names for each model type.
Model Name | API Name | Notes |
---|---|---|
Azure OpenAI Ada 002 | sfdc_ai__DefaultTextEmbeddingAda_002 | Embeddings only |
Azure OpenAI GPT-3.5 Turbo | sfdc_ai__DefaultGPT35Turbo | |
Azure OpenAI GPT-3.5 Turbo 16K | sfdc_ai__DefaultGPT35Turbo_16k | Deprecated |
OpenAI GPT 3.5 Turbo Instruct | sfdc_ai__DefaultGPT35TurboInstruct | |
Azure OpenAI GPT-4 | sfdc_ai__DefaultGPT4 | Older GPT-4 model |
Azure OpenAI GPT-4o | sfdc_ai__DefaultGPT4Omni | Latest GPT-4 model |
Azure OpenAI GPT-4 Turbo | sfdc_ai__DefaultGPT4Turbo | Older GPT-4 model |
Requests are routed to a nearby data center provided by Azure OpenAI and hosted in one of its Azure availability zones.
If there’s a problem with the nearby data center, requests are routed to a data center provided by OpenAI in the United States. This fallback routing to the United States can’t be disabled.
For Brazil, Canada, the United States, and all other countries where geo-aware routing isn’t yet supported, the request is routed directly to OpenAI in the United States.
The Trust Layer also has separate data residency regions for:
- Data masking and toxicity detection models
- Audit Trail data stored in Data Cloud
This table describes the countries and data center regions where data resides or passes through for geo-aware models from OpenAI, such as GPT 3.5 Turbo.
Data Cloud Country | Trust Layer Country | Data Center Region | Fallback Region |
---|---|---|---|
Australia | Australia | Australia East | United States |
Brazil | United States and Brazil* | US East 2 / US West | Not applicable |
Canada | United States | US East 2 / US West | Not applicable |
France | Germany | France Central | United States |
India | India | India South | United States |
Italy | Germany | France Central | United States |
Japan | Japan | Japan East | United States |
Germany | Germany | France Central | United States |
Spain | Germany | France Central | United States |
Sweden | Germany | France Central | United States |
Switzerland | Germany | France Central | United States |
United Kingdom | Germany | UK South | United States |
United States | United States | US East 2 / US West | Not applicable |
All others | United States | US East 2 / US West | Not applicable |
*For Brazil, data masking models and toxicity detection models are hosted in the United States and Audit Trail data is hosted in Brazil.
The following geo-aware models are supported for each data center region.
Data Center Region | Ada 002 | GPT-3.5 Turbo | GPT-3.5 Turbo 16k | GPT-3.5 Turbo Instruct | GPT-4 | GPT-4o |
---|---|---|---|---|---|---|
Australia East | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | ||
France Central | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | ||
India South | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | |||
Japan East | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | |||
UK South | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | |||
US East 2 | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI | ✅ OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI |
US West | ✅ Azure OpenAI | ✅ OpenAI | ✅ Azure OpenAI | ✅ Azure OpenAI |
Salesforce-managed models are operated on Amazon Bedrock infrastructure entirely within the Salesforce Trust Boundary. In contrast, other models are operated by Salesforce partners, either inside a shared trust zone or through the LLM provider directly using Einstein Studio’s bring your own LLM (BYOLLM) feature.
The first available Salesforce-managed model is Claude 3 Haiku from Anthropic. To learn more about Anthropic and the Claude 3 family of models, jump to Anthropic in the About the Providers section.
Salesforce-managed Model | API Name |
---|---|
Anthropic Claude 3 Haiku on Amazon | sfdc_ai__DefaultBedrockAnthropicClaude3Haiku |
This table describes the countries and Amazon Bedrock data center regions where data resides or passes through for geo-aware models from Anthropic, such as Claude 3 Haiku.
Data Cloud Country | Trust Layer Country | Amazon Bedrock Data Center |
---|---|---|
Australia | Australia | Asia Pacific (Sydney) |
Brazil | United States and Brazil* | South America (São Paulo) |
Germany | Germany | EU (Frankfurt) |
India | India | Asia Pacific (Mumbai) |
Japan | Japan | US West (Oregon) |
United States (East) | United States | US West (Oregon) |
United States (West) | United States | US West (Oregon) |
All others | United States | US East (N. Virginia) |
This table lists the API names for all the standard configuration models in Einstein Studio. These models don’t support geo-aware routing. In addition to these models, you can use the API name from any custom model configuration in Einstein Studio.
Model | API Name | Notes |
---|---|---|
Anthropic Claude 3 Haiku on Amazon | sfdc_ai__DefaultBedrockAnthropicClaude3Haiku | Salesforce-managed |
Azure OpenAI Ada 002 | sfdc_ai__DefaultAzureOpenAITextEmbeddingAda_002 | Embeddings only |
Azure OpenAI GPT 3.5 Turbo | sfdc_ai__DefaultAzureOpenAIGPT35Turbo | |
Azure OpenAI GPT 3.5 Turbo 16k | sfdc_ai__DefaultAzureOpenAIGPT35Turbo_16k | Deprecated |
Azure OpenAI GPT 4 Turbo | sfdc_ai__DefaultAzureOpenAIGPT4Turbo | Not supported by Models API. Use BYOLLM instead. |
OpenAI Ada 002 | sfdc_ai__DefaultOpenAITextEmbeddingAda_002 | Embeddings only |
OpenAI GPT 3.5 Turbo | sfdc_ai__DefaultOpenAIGPT35Turbo | |
OpenAI GPT 3.5 Turbo 16k | sfdc_ai__DefaultOpenAIGPT35Turbo_16k | Deprecated |
OpenAI GPT 4 | sfdc_ai__DefaultOpenAIGPT4 | Older GPT-4 model |
OpenAI GPT 4 32k | sfdc_ai__DefaultOpenAIGPT4_32k | Deprecated |
OpenAI GPT 4 Omni (GPT-4o) | sfdc_ai__DefaultGPT4Omni | Latest GPT-4 model. Geo-aware. |
OpenAI GPT 4 Omni Mini (GPT-4o mini) | sfdc_ai__DefaultOpenAIGPT4OmniMini | Low latency version of GPT-4o. |
OpenAI GPT 4 Turbo | sfdc_ai__DefaultOpenAIGPT4Turbo | Older GPT-4 model |
The Models API doesn’t support OpenAI’s snapshot model names, such as `gpt-3.5-turbo-0613`. Always test your prompts to make sure that they perform as expected with new model versions.
When you bring your own LLM, you consume 30% fewer Einstein Requests compared to other models. For details, see Einstein Usage.
The Models API supports Einstein Studio’s bring your own LLM (BYOLLM) feature, which currently supports Amazon Bedrock, Azure OpenAI, OpenAI, and Vertex AI from Google as foundation model providers. With BYOLLM, you can add a foundation model from a supported provider, configure your own instance of the model, and connect to the model using your own credentials. Although inference is handled by the LLM provider, the request is still routed through the Models API and Trust Layer features are fully supported.
To connect any language model (including custom-built models) to Einstein Studio's BYOLLM feature, you can use the LLM Open Connector. See the Einstein AI Platform GitHub repository for API specifications and example code for the LLM Open Connector.
Using a BYOLLM model with the Models API is the same as using any other model. Look up the API name of the configured model in Einstein Studio and use it as the `{modelName}` in the REST endpoint path or as the `modelName` property of the Apex request object.
This table lists all the foundation models that you can add in Einstein Studio with BYOLLM.
Provider(s) | Model | Notes |
---|---|---|
Amazon Bedrock | Claude 3 Haiku | |
Amazon Bedrock | Claude 3 Sonnet | |
Amazon Bedrock | Claude 3 Opus | |
Amazon Bedrock | Claude 3.5 Sonnet | |
Azure OpenAI, OpenAI | GPT 3.5 Turbo | |
Azure OpenAI, OpenAI | GPT 3.5 Turbo 16k | Deprecated |
Azure OpenAI, OpenAI | GPT 4 Omni (GPT-4o) | Latest GPT-4 model |
Azure OpenAI, OpenAI | GPT 4 Turbo | Older GPT-4 model |
OpenAI | GPT 4 | Older GPT-4 model |
OpenAI | GPT 4 32k | Deprecated |
Vertex AI (Google) | Gemini 1.5 Pro | |
To learn more about BYOLLM, see Bring Your Own Large Language Model in Einstein 1 Studio on the Salesforce Developers Blog.
The Bring Your Own Large Language Model (BYOLLM) Open Connector is designed to provide powerful AI solutions to customers, independent software vendors (ISVs), and internal Salesforce teams. With this connector, you can connect the Einstein AI Platform to any language model, including custom-built models.
The BYOLLM Open Connector represents a commitment to community-driven growth and innovation. By letting users integrate any LLM, from models hosted on major cloud platforms to models developed in-house, it opens up a world of possibilities for enhanced, bespoke AI applications. This capability caters to large enterprises looking to leverage specific models like IBM Granite or Databricks DBRX, and it also supports smaller teams eager to experiment with open-source models. With features designed for ease of use, such as a streamlined UX in Einstein Studio and API specifications closely based on the OpenAI API, the connector lets you enhance your AI-driven applications while maintaining high standards of security and compatibility.
See the Einstein AI Platform GitHub repository for API specifications and example code for the LLM Open Connector.
For most tasks, choose a model that balances these criteria, such as Claude 3 Haiku or GPT 3.5 Turbo.
To choose the right model for your application, consider these criteria.
- Capabilities: What can the model do? Advanced models can perform a wider variety of tasks, usually at the expense of higher costs, slower speeds, or both. The ability to follow complex instructions is a key indicator of model capabilities.
- Cost: How much does the model cost to use? For details on usage and billing, see Einstein Usage.
- Quality: How well does the model respond? The quality of model responses can be hard to measure quantitatively, but a good place to start is the LMSYS Chatbot Arena.
- Speed: How long does it take the model to complete a task? Includes measures of latency and throughput.
For benchmarks and evaluations of LLMs and embedding models, see these resources.
- Artificial Analysis: Aggregated data on LLM performance.
- LLM Benchmark for CRM: Evaluation of LLMs for Sales and Service use cases. Provided by Salesforce AI Research.
- LMSYS Chatbot Arena: Human scoring of LLMs. Anyone can participate!
- MTEB Leaderboard: Benchmarks for embedding models, hosted by Hugging Face.
- SEAL Leaderboard: Evaluations of LLMs using private datasets from Scale AI.
Salesforce has partnered with several LLM providers to offer you a wide range of models to choose from. Learn more about each provider and what they have to offer.
Amazon Bedrock is a managed service by Amazon Web Services (AWS) for hosting LLMs from leading AI companies. Salesforce has partnered with Amazon to provide a Salesforce-managed version of Anthropic’s Claude 3 Haiku model. You can also use Claude 3 Haiku, Claude 3 Sonnet, Claude 3.5 Sonnet, and Claude 3 Opus via Amazon Bedrock with Einstein Studio’s BYOLLM feature.
Anthropic’s LLMs are designed to be interpretable, steerable, and aligned with human values. They use a novel approach to model alignment that they call “constitutional AI,” and their models are known for their long context windows. The version of Anthropic’s Claude 3 Haiku model that is supported by the Models API is managed by Salesforce entirely within the Trust Boundary.
To learn more about the Claude family of models, see Anthropic’s Claude page.
The Azure OpenAI service, offered by Microsoft, enables Salesforce to provide models developed by OpenAI with additional enterprise features that aren’t yet offered by OpenAI themselves. Features include:
- Regional availability outside of the United States
- Certified compliance with HIPAA, ISO27001, SOC 1, SOC 2 (type 1 and 2), and SOC 3
- More formal processes and procedures for access control, data management, and security testing
To learn more about a particular model, see Azure OpenAI’s models overview.
OpenAI is one of the best-known AI labs due to the popularity of their ChatGPT product. Their GPT 4 series of models is focused on advanced capabilities, while the GPT 3.5 series is optimized for speed.
To learn more about a particular model, see OpenAI’s models overview.
Google’s Gemini 1.5 Pro model is available through the Vertex AI service and can be added as a foundation model using Einstein Studio’s BYOLLM feature. Google’s models are the product of longstanding investment in AI research and vast amounts of data and compute. The Gemini models are known for their advanced capabilities, long context windows, and excellent performance on recall tasks.
To learn more about the Gemini family of models, see Google’s Gemini models overview.
- Models API Developer Guide: Access Models API with REST
- Models API Developer Guide: Access Models API with Apex
- Models API Developer Guide: Rate Limits for Models API
- Models REST API Reference