Extract Data from Invoices (Beta)

The Invoice solution extracts important information, such as invoice number, invoice date, due date, and total amount, from any invoice image or PDF. In the real world, the same concept appears under many keyword variations. For instance, “invoice #”, “invoice number”, and “INV No.” all refer to the same thing. Our solution uses machine learning models to understand the document context, consolidates the key variations into a single entity (for example, invoice_number), and extracts the correct values accordingly.

The Einstein OCR invoice solution currently supports the following entities.

Entity Name         Key Variation Examples
invoice_number      invoice number, invoice #, invoice, invoice ID, ...
invoice_date        invoice date, date, ...
due_date            due date, due on, ...
purchase_order      PO number, PO#, purchase order, ...
total_amount        total, total amount, ...
total_tax_amount    tax, total tax, ...
amount_due          due, amount due, ...

When you call the API, send the invoice as an image or PDF, set task to invoice, and specify tabulatev2 as the modelId. The JSON response contains entity-value pairs for each field in the invoice.
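
The request described above might be assembled as in the following minimal Python sketch. Only the task and modelId values come from this page. The endpoint URL, the sampleLocation field name, and the bearer-token header are assumptions to verify against the current Einstein OCR API reference, and the helper function names are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumed endpoint; confirm it in the Einstein OCR API reference.
OCR_URL = "https://api.einstein.ai/v2/vision/ocr"


def build_invoice_fields(sample_location: str) -> dict:
    """Return the form fields for an invoice extraction request."""
    return {
        "sampleLocation": sample_location,  # assumed field name for a document URL
        "task": "invoice",                  # from the text above
        "modelId": "tabulatev2",            # from the text above
    }


def encode_multipart(fields: dict) -> tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def extract_invoice(sample_location: str, token: str) -> dict:
    """POST the request and return the parsed JSON response."""
    body, content_type = encode_multipart(build_invoice_fields(sample_location))
    request = urllib.request.Request(
        OCR_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling extract_invoice with a publicly reachable invoice URL and a valid access token would send the request. Building the fields in a separate function keeps the task and modelId settings easy to inspect without touching the network.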

[Image: ExtractDataFromInvoicesExample]

In the example above, the extracted entity-value pairs are:

Entity            Value
invoice_number    940226
invoice_date      2/18/1994
due_date          3/20/1994
total_amount      852.8
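
Once the entity-value pairs are in hand, downstream code usually needs typed values rather than raw text. The sketch below converts the example values into dates and decimals. It assumes the values arrive as strings and that dates follow a month/day/year format, neither of which this page states; the to_typed helper is hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Assumption: dates are written month/day/year, as in the example invoice.
DATE_FORMAT = "%m/%d/%Y"

DATE_ENTITIES = {"invoice_date", "due_date"}
AMOUNT_ENTITIES = {"total_amount", "total_tax_amount", "amount_due"}

# Entity-value pairs from the example, assumed to be plain strings.
extracted = {
    "invoice_number": "940226",
    "invoice_date": "2/18/1994",
    "due_date": "3/20/1994",
    "total_amount": "852.8",
}


def to_typed(pairs: dict) -> dict:
    """Convert date and amount entities to date and Decimal objects."""
    typed = {}
    for entity, value in pairs.items():
        if entity in DATE_ENTITIES:
            typed[entity] = datetime.strptime(value, DATE_FORMAT).date()
        elif entity in AMOUNT_ENTITIES:
            # Decimal avoids binary floating-point rounding on currency values.
            typed[entity] = Decimal(value)
        else:
            # Identifiers such as invoice_number stay as text.
            typed[entity] = value
    return typed


invoice = to_typed(extracted)
payment_window = invoice["due_date"] - invoice["invoice_date"]
print(payment_window.days)  # 30
```

Keeping invoice_number as a string preserves leading zeros and any non-numeric characters an invoice ID might contain.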