Extract Data from Invoices (Beta)

The Invoice solution extracts important information, such as the invoice number, invoice date, due date, and total amount, from any invoice image or PDF. In the real world, there are many keyword variations for the same concept. For instance, “invoice #”, “invoice number”, and “INV No.” all refer to the same thing. Our solution leverages machine learning models to understand the document context, consolidates the key variations into a single entity (for example, `invoice_number`), and extracts the correct values accordingly.

The Einstein OCR invoice solution currently supports the entities listed below.

Entity Name	Key Variation Examples
`invoice_number`	invoice number, invoice #, invoice, invoice ID, ...
`invoice_date`	invoice date, date, ...
`due_date`	due date, due on, ...
`purchase_order`	PO number, PO#, purchase order, ...
`total_amount`	total, total amount, ...
`total_tax_amount`	tax, total tax, ...
`amount_due`	due, amount due, ...

When you call the API, send the invoice as an image or PDF, set `task` to `invoice`, and set `modelId` to `tabulatev2`. The JSON response contains entity-value pairs for the fields in the invoice.
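As a minimal sketch of such a call in Python: only the `task` and `modelId` values come from this page. The endpoint URL, the `sampleContent` and `sampleLocation` field names, and the bearer-token header are assumptions to check against the API reference, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Assumed endpoint; confirm against the current API reference.
OCR_ENDPOINT = "https://api.einstein.ai/v2/vision/ocr"


def build_invoice_fields(sample_location: Optional[str] = None) -> Dict[str, str]:
    """Return the non-file form fields for an invoice extraction request."""
    # task and modelId are the two settings described above.
    fields = {"task": "invoice", "modelId": "tabulatev2"}
    if sample_location is not None:
        # Assumed parameter name for a publicly reachable file URL.
        fields["sampleLocation"] = sample_location
    return fields


def extract_invoice(token: str, file_path: str) -> dict:
    """Upload a local invoice image or PDF and return the parsed JSON response."""
    # Third-party dependency, imported here so build_invoice_fields stays stdlib-only.
    import requests

    with open(file_path, "rb") as fh:
        response = requests.post(
            OCR_ENDPOINT,
            headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
            data=build_invoice_fields(),
            files={"sampleContent": fh},  # assumed multipart field name
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Calling `extract_invoice(token, "invoice.pdf")` would send the file as multipart form data alongside the two settings. Keeping the field construction in its own function makes it easy to inspect the request without touching the network.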

[Image: ExtractDataFromInvoicesExample]

In the example above, the extracted entity-value pairs are:

Entity	Value
`invoice_number`	940226
`invoice_date`	2/18/1994
`due_date`	3/20/1994
`total_amount`	852.8
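Once the response is parsed, the pairs can be collected into a dictionary keyed by entity name. The sketch below assumes a hypothetical response shape, a top-level `entities` list whose items carry `entity` and `value` keys. The actual JSON layout is not shown on this page, so adapt the key names to the real response.

```python
from typing import Any, Dict


def entity_values(response: Dict[str, Any]) -> Dict[str, str]:
    """Collect entity-value pairs from a parsed response (hypothetical shape)."""
    pairs: Dict[str, str] = {}
    for item in response.get("entities", []):
        name = item.get("entity")
        value = item.get("value")
        if name is None or value is None:
            continue  # skip incomplete items
        # Keep the first value seen for each entity.
        pairs.setdefault(name, str(value))
    return pairs


# Sample data mirroring the table above, in the assumed shape.
sample_response = {
    "entities": [
        {"entity": "invoice_number", "value": "940226"},
        {"entity": "invoice_date", "value": "2/18/1994"},
        {"entity": "due_date", "value": "3/20/1994"},
        {"entity": "total_amount", "value": "852.8"},
    ]
}

extracted = entity_values(sample_response)
```

Values are kept as strings here because dates and amounts arrive in the invoice's own formatting. Converting them to dates or decimals is left to the caller.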