Initiate Text Extraction Action

Extract text from an uploaded document through the Amazon Textract API.

You can automate the Intelligent Form Reader’s text detection and extraction step using this invocable action.

Special Access Rules

This action is available in API version 58.0 and later for users with the AWSTextract1000LimitAddOn or IntelligentDocumentReaderAddOn license.

Supported REST HTTP Methods

URI
/services/data/vXX.X/actions/standard/initiateTextExtraction
Formats
JSON, XML
HTTP Methods
POST
Authentication
Authorization: Bearer token
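
The endpoint details above can be turned into a request object as in the following minimal Python sketch. It only builds the request and does not send it. The function name `build_extraction_request`, the `instance_url` parameter, and the example host are illustrative assumptions, not part of the documented API. The `Content-Type` header is also an assumption, based on JSON being one of the supported formats.

```python
import json
import urllib.request


def build_extraction_request(instance_url, api_version, access_token, inputs):
    """Build (but do not send) a POST request for the initiateTextExtraction action.

    instance_url, api_version, and access_token are caller-supplied values;
    how they are obtained is outside the scope of this sketch.
    """
    url = (
        f"{instance_url.rstrip('/')}/services/data/v{api_version}"
        "/actions/standard/initiateTextExtraction"
    )
    # The action expects a JSON body with a top-level "inputs" array.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",  # assumed for JSON payloads
        },
    )


# Hypothetical host and token, for illustration only.
request = build_extraction_request(
    "https://example.my.salesforce.com",
    "58.0",
    "ACCESS_TOKEN",
    [{"contentDocumentId": "069T10000004FnoIAE"}],
)
```

Passing the resulting object to `urllib.request.urlopen` would send the call. Any HTTP client can be substituted.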

Inputs

Input Details
contentDocumentId
Type
string
Description
Required. The unique content document ID of the uploaded document from which to extract text.

Note: You can specify up to 20 content document IDs.

endPageIndex
Type
integer
Description
Optional. The last page number from which to extract text. The default value is the last page number in the specified document.
ocrService
Type
picklist
Description
Optional. The name of the OCR service that extracts text from the document. Valid values are:
  • AMAZON_TEXTRACT - Indicates AWS Document service.
  • AMAZON_TEXTRACT_ANALYZE_ID - Indicates AWS Analyze ID service.
startPageIndex
Type
integer
Description
Optional. The page number at which text extraction starts. The default value is 1.

Note: You can extract text from up to 20 pages in a specified document.
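
The input rules above can be checked on the client before the action is called. The following Python sketch is one possible way to do so. The helper `build_extraction_input` and its constants are hypothetical names. The sketch assumes that page numbers are 1-based and inclusive and that the 20-page limit applies to the range between `startPageIndex` and `endPageIndex`. When `endPageIndex` is omitted, the range cannot be checked locally because the document length is unknown.

```python
MAX_PAGES_PER_DOCUMENT = 20  # limit stated in the note above
VALID_OCR_SERVICES = {"AMAZON_TEXTRACT", "AMAZON_TEXTRACT_ANALYZE_ID"}


def build_extraction_input(content_document_id, start_page=None,
                           end_page=None, ocr_service=None):
    """Return one entry for the request's "inputs" array, or raise ValueError."""
    if not content_document_id:
        raise ValueError("contentDocumentId is required")
    action_input = {"contentDocumentId": content_document_id}

    if start_page is not None:
        if start_page < 1:
            raise ValueError("startPageIndex must be 1 or greater")
        action_input["startPageIndex"] = start_page

    if end_page is not None:
        # startPageIndex defaults to 1 when it is not supplied.
        first_page = start_page if start_page is not None else 1
        if end_page < first_page:
            raise ValueError("endPageIndex must not precede startPageIndex")
        if end_page - first_page + 1 > MAX_PAGES_PER_DOCUMENT:
            raise ValueError("page range exceeds the 20-page limit")
        action_input["endPageIndex"] = end_page

    if ocr_service is not None:
        if ocr_service not in VALID_OCR_SERVICES:
            raise ValueError(f"unsupported ocrService: {ocr_service}")
        action_input["ocrService"] = ocr_service

    return action_input
```

Optional fields are left out of the result when they are not supplied, so the service applies its own defaults.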

Outputs

Output Details
ocrDocumentScanResultDetails
Type
string
Description
A comma-separated list containing an OcrDocumentScanResult ID and a page number for each extracted page of the specified document.

Example

Sample Request

{
   "inputs":[
      {
         "contentDocumentId":"069T10000004FnoIAE",
         "startPageIndex":1,
         "endPageIndex":20,
         "ocrService":"AMAZON_TEXTRACT"
      }
   ]
}

Sample Response

[
   {
      "actionName":"initiateTextExtraction",
      "errors":null,
      "isSuccess":true,
      "outputValues":{
         "ocrDocumentScanResultDetails":{
            "ocrDocumentScanResults":[
               {
                  "pageNumber":1,
                  "ocrDocumentScanResultId":"0ixT100000000bv"
               }
            ]
         }
      },
      "version":1
   }
]
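
A response shaped like the sample can be read with a short Python sketch such as the one below. The function name `extract_scan_results` is hypothetical. The sketch assumes the nested JSON structure shown in the sample response. If a response returns `ocrDocumentScanResultDetails` as a string, as the output description suggests, it would need to be parsed differently.

```python
import json


def extract_scan_results(response_text):
    """Return (pageNumber, ocrDocumentScanResultId) pairs from a response body."""
    results = []
    for action_result in json.loads(response_text):
        # Each array element reports the outcome of one entry in "inputs".
        if not action_result.get("isSuccess"):
            raise RuntimeError(f"Action failed: {action_result.get('errors')}")
        details = action_result["outputValues"]["ocrDocumentScanResultDetails"]
        for page in details["ocrDocumentScanResults"]:
            results.append((page["pageNumber"], page["ocrDocumentScanResultId"]))
    return results
```

Each returned pair links a page number to the ID of the OcrDocumentScanResult record created for that page.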