Dataset and Model Best Practices
-
A dataset can have up to 500 labels, but we recommend a maximum of 100 labels for better model accuracy.
-
If your dataset contains many labels, increase the number of examples per label.
-
We recommend that an Einstein Intent or Einstein Sentiment dataset contain a maximum of 100 labels. If you need more than 100 labels, consider hierarchical classification.
-
We recommend that each intent or sentiment string contain fewer than 150 words. This guideline applies both to the examples in a language dataset and to the strings you send to a model for prediction.
-
During the training process, special text formatting, such as emojis, words in all uppercase, and punctuation, isn’t included. For example, if you add a text example containing a smiley emoji to a dataset, the emoji isn’t considered during training. Only the text is used.
-
When you send in text for prediction, the model doesn’t consider special text formatting and punctuation. For example, when you send the string “We had a great time! :)” to the model, the model returns a prediction for the string “We had a great time”.
-
Batch predictions aren’t supported. When you send text in for a prediction, you make a single API call to the /intent endpoint or the /sentiment endpoint.
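Because batch predictions aren’t supported, scoring several strings means one call per string. The following Python sketch illustrates that pattern together with the 150-word guideline. It rests on assumptions that aren’t part of this guide: `predict_each`, `word_count`, and the `predict_one` callback are hypothetical names, `predict_one` stands in for whatever code makes a single request to the /intent or /sentiment endpoint, and splitting on whitespace only approximates how the service counts words.

```python
from typing import Any, Callable, Dict, Iterable, List

# Recommended limit from the guideline above: fewer than 150 words per string.
MAX_WORDS = 150


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an approximation of a word count)."""
    return len(text.split())


def predict_each(
    texts: Iterable[str],
    predict_one: Callable[[str], Any],
    max_words: int = MAX_WORDS,
) -> List[Dict[str, Any]]:
    """Request one prediction per string, because batches aren't supported.

    predict_one is a hypothetical callback that performs exactly one
    prediction request for a single string and returns the response.
    """
    results: List[Dict[str, Any]] = []
    for text in texts:
        results.append(
            {
                "text": text,
                # The word limit is a recommendation, so flag long strings
                # instead of rejecting them.
                "over_limit": word_count(text) >= max_words,
                "prediction": predict_one(text),
            }
        )
    return results


if __name__ == "__main__":
    # Stub predictor so the sketch runs without network access or credentials.
    def stub_predict(text: str) -> Dict[str, str]:
        return {"label": "stub"}

    for row in predict_each(["We had a great time! :)"], stub_predict):
        print(row)
```

Passing the request logic in as a callback keeps the loop independent of any particular HTTP client or authentication setup, so the same function can be used for either endpoint.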