Dataset and Model Best Practices

  • A dataset can have up to 500 labels, but we recommend a maximum of 100 labels for better model accuracy.

  • If a dataset contains many labels (classes), increase the number of examples per label.

  • We recommend that an Einstein Intent or Einstein Sentiment dataset contain a maximum of 100 labels. If you need more than 100 labels, consider hierarchical classification.

  • We recommend that an intent or sentiment string contain fewer than 150 words. This guideline applies both to an example in a language dataset and to a string sent to a model for prediction.

  • During training, special text formatting, such as emojis, words in all uppercase, and punctuation, is ignored. For example, if you add a text example containing a smiley emoji to a dataset, the emoji isn’t considered during training. Only the text is used.

  • When you send in text for prediction, the model doesn’t consider special text formatting and punctuation. For example, when you send the string “We had a great time! :)” to the model, the model returns a prediction for the string “We had a great time”.

  • Batch predictions aren’t supported. Each prediction requires its own API call to the /intent endpoint or the /sentiment endpoint, so to get predictions for several strings, make one call per string.
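
The guidelines above can be checked before you upload a dataset or request predictions. The following is a minimal sketch in Python, not part of any Einstein SDK: the names check_dataset, predict_each, and predict_one are hypothetical, and the word count uses simple whitespace splitting, which may differ from how the service counts words. The predict_one argument stands in for whatever function you already use to call the /intent or /sentiment endpoint for a single string.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

MAX_LABELS = 500          # upper limit on labels per dataset (from the list above)
RECOMMENDED_LABELS = 100  # recommended maximum for better accuracy
RECOMMENDED_WORDS = 150   # strings should contain fewer words than this


def check_dataset(examples: Iterable[Tuple[str, str]]) -> List[str]:
    """Return warnings for a collection of (text, label) pairs."""
    examples = list(examples)
    warnings = []

    # Count distinct labels and compare against both thresholds.
    label_counts = Counter(label for _, label in examples)
    n_labels = len(label_counts)
    if n_labels > MAX_LABELS:
        warnings.append(f"{n_labels} labels exceeds the limit of {MAX_LABELS}")
    elif n_labels > RECOMMENDED_LABELS:
        warnings.append(
            f"{n_labels} labels exceeds the recommended {RECOMMENDED_LABELS}"
        )

    # Flag examples at or above the recommended word count.
    # Assumption: words are separated by whitespace.
    for i, (text, _) in enumerate(examples):
        n_words = len(text.split())
        if n_words >= RECOMMENDED_WORDS:
            warnings.append(
                f"example {i}: {n_words} words, recommended fewer than "
                f"{RECOMMENDED_WORDS}"
            )
    return warnings


def predict_each(texts: Iterable[str], predict_one: Callable[[str], object]) -> list:
    """Call predict_one once per string, because batch predictions aren't supported.

    predict_one is a hypothetical callable that sends one string to the
    /intent or /sentiment endpoint and returns the response.
    """
    return [predict_one(text) for text in texts]
```

Passing the prediction function in as an argument keeps the sketch independent of any particular HTTP client or authentication setup, and makes the one-call-per-string behavior easy to test with a stub.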