Large language models (LLMs) such as GPT-4, Llama 3, and Gemini have taken the world by storm with their remarkable ability to understand and generate human-like text, code, and even images. However, these powerful models are typically trained on broad datasets, making them generalists. For many real-world applications, we need models that can handle specialized tasks and domains.

Imagine that you’ve just hired a talented assistant — bright and eager to learn — but one who needs some specific training to excel in your unique work environment. That’s the essence of how fine-tuning LLMs can help you in your enterprise-specific use cases.

In this guide, we will explain LLM fine-tuning and walk through the different types of fine-tuning methodologies.

What is fine-tuning?

Fine-tuning is the process of taking a pre-trained LLM and training it further on a smaller, task- or domain-specific dataset. This allows the model to specialize and perform better at that particular task. While it may sound complicated, the core idea is quite simple: we are essentially teaching an already knowledgeable model new, specialized skills.

For example, if we want an LLM to assist with a legal contract review, we would fine-tune it on a dataset of legal contracts. This allows the model to learn and understand the nuances of legal language and the specifics of contract terminology, structure, and reasoning.

The key advantage is that we don’t have to train a massive model from scratch, which requires vast computational resources. Instead, we can efficiently specialize a foundation model to our needs, making it feasible to create custom LLM solutions.

Unpacking the methods of fine-tuning

There are several types of fine-tuning methodologies. Let’s take a closer look at each.

1) Supervised fine-tuning

Supervised fine-tuning lets you leverage the power of a pre-trained LLM and tailor it to your specific needs, making it a practical tool for a wide range of tasks.

In supervised fine-tuning, you train the LLM on a dataset that contains labels. You would prepare a dataset specific to the target task or domain, where each example consists of an input text and a corresponding output label or target text. This dataset should be representative of the task you want the model to perform.

Supervised fine-tuning is widely used in various natural language processing tasks, such as text classification, named entity recognition, question answering, summarization, machine translation, and more. By fine-tuning on task-specific labeled data, the pre-trained model can learn to specialize in the desired task while retaining its general language understanding capabilities.
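
As a concrete illustration, here is a minimal sketch of supervised fine-tuning with the Hugging Face transformers and datasets libraries. The model name, dataset, and hyperparameters are illustrative assumptions, not a prescription; in practice, you would substitute your own labeled data and tune these values.

```python
# A minimal supervised fine-tuning sketch: adapt a small pre-trained model
# to a labeled text-classification dataset. All choices here are illustrative.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")  # labeled (text, label) pairs as a stand-in
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="sft-out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()  # updates the model's weights on the labeled examples
```

For a real use case, you would hold out a proper evaluation split and measure task metrics before and after fine-tuning to confirm the specialization is worth it.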

2) Parameter-efficient fine-tuning (PEFT)

An interesting area of research in LLM fine-tuning centers on reducing the costs of updating the parameters of the models. This is the goal of parameter-efficient fine-tuning (PEFT), a set of techniques that try to reduce the number of parameters that need to be updated.

There are various PEFT techniques. One of them is low-rank adaptation (LoRA), a technique that has become especially popular among open-source language models. The idea behind LoRA is that fine-tuning a foundation model on a downstream task does not require updating all of its parameters: the weight changes needed for the downstream task can be captured by low-rank matrices that are far smaller than the original weight matrices, with very little loss in accuracy.

Fine-tuning with LoRA trains these low-rank matrices instead of updating the parameters of the main LLM. The LoRA weights are then merged into the main LLM or applied alongside it during inference. LoRA can cut the cost of fine-tuning by up to 98%. It also makes it practical to store multiple small fine-tuned adapters that can be plugged into the main LLM at runtime.
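
To make this concrete, here is a minimal sketch of attaching LoRA adapters to a base model with the Hugging Face peft library. The base model, rank, and target modules below are illustrative assumptions; you would pick values appropriate to your model and task.

```python
# A minimal LoRA sketch with the Hugging Face peft library.
# Base model and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor for the LoRA updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection layers in GPT-2
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Only the small LoRA matrices are trainable; the base weights stay frozen,
# which is where the large cost savings come from.

# After training, the adapter can be merged into the base model for inference:
# merged_model = model.merge_and_unload()
```

Because each adapter is only a few megabytes, you can keep several task-specific adapters on disk and swap them onto the same base model at runtime.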

3) Reinforcement learning from human feedback (RLHF)

Reinforcement learning from human feedback (RLHF) fine-tunes large language models by combining human feedback with reinforcement learning principles. Unlike supervised fine-tuning, which relies on a labeled dataset, RLHF uses human preferences and rewards to guide the model’s training process.

Important steps in RLHF

  • Human feedback collection: Collect human feedback on the model’s outputs by presenting the model with prompts or tasks and gathering ratings, rankings, or other forms of feedback from human annotators or raters.
  • Reward modeling: Train a separate reward model, often using supervised learning techniques, to predict the human-assigned rewards or preferences based on the model’s outputs and the prompts or tasks (a minimal reward-model sketch follows this list).
  • Reinforcement learning: Use reinforcement learning algorithms, such as Proximal Policy Optimization (PPO) or other policy gradient methods, to fine-tune the pre-trained model. During this process, the model generates outputs, and the reward model evaluates these outputs, providing rewards or preferences that guide the reinforcement learning process.
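
To illustrate the reward-modeling step, here is a minimal sketch of a pairwise-preference (Bradley-Terry style) reward head in PyTorch. The embedding size and random tensors are placeholders standing in for embeddings produced by the base LLM, not a real pipeline.

```python
# A minimal reward-modeling sketch, assuming pairwise human preferences
# (a "chosen" vs. "rejected" response per prompt). All shapes are placeholders.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response embedding with a single scalar reward."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score_head = nn.Linear(hidden_size, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score_head(response_embedding).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # The chosen response should receive a higher reward than the rejected one.
    return -torch.log(torch.sigmoid(reward_chosen - reward_rejected)).mean()

reward_model = RewardModel()
chosen = torch.randn(4, 768)    # placeholder embeddings of preferred responses
rejected = torch.randn(4, 768)  # placeholder embeddings of rejected responses

loss = preference_loss(reward_model(chosen), reward_model(rejected))
loss.backward()  # gradients train the reward head on human preference data
```

In a full RLHF loop, the trained reward model then scores the LLM’s generations, and a policy-gradient algorithm such as PPO uses those scores to update the LLM.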

It’s important to note that RLHF fine-tuning can be computationally expensive and requires a significant amount of human feedback and iterative training. Additionally, the quality and consistency of the human feedback can greatly impact the effectiveness of the fine-tuning process.

Common fine-tuning use cases for CRM

There are several common fine-tuning use cases for large language models (LLMs) that can leverage the powerful capabilities of these models while tailoring them to specific business needs. Here are some examples:

  • A Service Cloud customer wants to fine-tune LLMs to automatically generate concise summaries of customer service cases or support tickets based on the detailed case descriptions.
  • A Marketing Cloud customer wants to fine-tune LLMs on their labeled data from Data Cloud to deliver better results in alignment with their brand and voice.
  • An industry vertical cloud customer wants to fine-tune their banking, legal, or healthcare LLMs with enterprise-specific data to deliver high-precision results.
  • A Service/Marketing Cloud customer wants to improve LLM output by fine-tuning with user sentiment feedback data collected over the last month to increase their customer satisfaction score (CSAT).
  • A CRM customer wants to fine-tune a sales/service chatbot on their historical data and knowledge available in Data Cloud.

When not to use LLM fine-tuning

Here are some situations where fine-tuning is not possible or not useful:

  1. Limited fine-tuning service: Some models are only available through application programming interfaces (APIs) that offer no, or limited, fine-tuning services.
  2. Limited data: You lack sufficient data specific to your task. Fine-tuning needs labeled examples to guide the model, and without enough, it can lead to poor performance.
  3. Frequent changes: Your target domain or task updates frequently. Fine-tuning captures a snapshot of knowledge, and a constantly evolving domain might render it outdated quickly.
  4. Highly dynamic content: You’re dealing with highly dynamic or user-specific content, and fine-tuning might struggle to adapt to the ever-changing nature of such data.
  5. Simpler tasks: The task is well-suited for simpler techniques or a few-shot learning approach (see the prompt sketch after this list). Fine-tuning is a powerful tool, but for basic tasks, it might be overkill.
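
For comparison, here is a minimal sketch of few-shot prompting, the kind of simpler technique mentioned above. The prompt format and examples are purely illustrative; the point is that in-context examples steer the model without updating any weights.

```python
# A minimal few-shot prompting sketch as a lighter-weight alternative to
# fine-tuning. The task, format, and examples are illustrative assumptions.
few_shot_prompt = """Classify the sentiment of each customer message.

Message: "My order arrived two weeks late." -> Sentiment: negative
Message: "Support resolved my issue in minutes!" -> Sentiment: positive
Message: "The new dashboard is confusing to navigate." -> Sentiment:"""

# This string is sent as-is to any instruction-following LLM endpoint;
# the examples in the prompt guide the model's answer for the final message.
print(few_shot_prompt)
```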

Before you start your fine-tuning journey

In general, fine-tuning is most beneficial when you have high-quality, task-specific data, sufficient computational resources, and a target task or domain that is relatively stable. If these conditions are not met, alternative approaches like prompt engineering or retrieval-augmented generation (RAG) may be more appropriate.

Understanding your use case and the expected ROI is critical before you invest in fine-tuning your LLM.

About the authors

Manjeet Singh is a Senior Director of Product Management for Salesforce AI Cloud. He has 23 years of unique experience and learning from building and leading products at early-stage startups, large enterprises, and hyper-growth companies in Silicon Valley. He is passionate about transforming business and human lives through innovation and technology. Follow him on LinkedIn or Twitter.

Daryl Martis is the Director of Product Management for Einstein at Salesforce. He has over 10 years of experience in planning, building, launching, and managing world-class solutions for enterprise customers, including AI/ML and cloud solutions. Follow him on LinkedIn or Twitter.
