This content describes an older version of this product.

Initiate Text Extraction Action

Extract text from an uploaded document by using the Amazon Textract API.

You can automate the Intelligent Document Reader’s text detection and extraction steps using this invocable action.

Special Access Rules

This action is available in API version 58.0 and later for users with the AWSTextract1000LimitAddOn or IntelligentDocumentReaderAddOn license.

Supported REST HTTP Methods

URI
/services/data/vXX.X/actions/standard/initiateTextExtraction
Formats
JSON, XML
HTTP Methods
POST
Authentication
Authorization: Bearer token
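The following Python sketch builds, but doesn't send, a POST request to this endpoint using only the standard library. The instance URL, the access token, and the function name build_initiate_text_extraction_request are hypothetical placeholders. The version check mirrors the API version 58.0 requirement in Special Access Rules.

```python
import json
import urllib.request

# Minimum API version for this action, per the Special Access Rules section.
MIN_API_VERSION = (58, 0)


def build_initiate_text_extraction_request(instance_url, api_version, access_token, inputs):
    """Build (but don't send) a POST request for the initiateTextExtraction action.

    instance_url, api_version (for example "58.0"), and access_token are
    caller-supplied placeholders; inputs is the list placed under "inputs".
    """
    major, minor = (int(part) for part in api_version.split("."))
    if (major, minor) < MIN_API_VERSION:
        raise ValueError("initiateTextExtraction requires API version 58.0 or later")

    url = (
        f"{instance_url.rstrip('/')}/services/data/v{api_version}"
        "/actions/standard/initiateTextExtraction"
    )
    # JSON is one of the two supported formats; XML isn't covered by this sketch.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

To actually call the action, pass the returned object to urllib.request.urlopen. Building the request separately keeps the URL, headers, and body easy to inspect before anything is sent.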

Inputs

Input Details
configurationAPIName
Type
string
Description
For internal use only.
contentDocumentId
Type
string
Description
Required. The ID of the uploaded content document on which to initiate text extraction.

You can specify up to 20 content document IDs.

documentTypeId
Type
string
Description
Optional. The ID of the document type that contains the queries. These queries are used to retrieve the OCR service. Available in API version 60.0 and later.
endPageIndex
Type
integer
Description
Optional. The page number of the last page from which to extract text. The default value is the last page of the specified document.
ocrService
Type
picklist
Description
Optional. The name of the OCR service that extracts text from the document. Valid values are:
  • AMAZON_TEXTRACT—Indicates the AWS Document service.
  • AMAZON_TEXTRACT_ANALYZE_ID—Indicates the AWS Analyze ID service.
  • AMAZON_TEXTRACT_DETECT_TEXT—Indicates the AWS Detect service that displays the text detected in a document.
  • PDF_DOCX_EXTRACT_TEXT—Indicates the AWS Extract service that automatically extracts content from PDF files.

Required if the documentTypeId property isn’t specified. If documentTypeId is specified, the OCR service is retrieved based on that document type.

startPageIndex
Type
integer
Description
Optional. The page number from which to start text extraction. The default value is 1.

You can extract text from up to 20 pages in a specified document.
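A minimal client-side sketch of the input rules above, assuming you want to catch obvious mistakes before calling the action. The helper name make_text_extraction_input is hypothetical, and the server remains the authority on validation. The checks that startPageIndex is at least 1 and that endPageIndex isn't less than startPageIndex are assumptions that this reference doesn't state.

```python
# Valid ocrService picklist values, copied from the ocrService description.
VALID_OCR_SERVICES = {
    "AMAZON_TEXTRACT",
    "AMAZON_TEXTRACT_ANALYZE_ID",
    "AMAZON_TEXTRACT_DETECT_TEXT",
    "PDF_DOCX_EXTRACT_TEXT",
}

# "You can extract text from up to 20 pages in a specified document."
MAX_PAGES = 20


def make_text_extraction_input(content_document_id, start_page_index=None,
                               end_page_index=None, ocr_service=None,
                               document_type_id=None):
    """Hypothetical helper that assembles one entry of the "inputs" array."""
    if not content_document_id:
        raise ValueError("contentDocumentId is required")
    if document_type_id is None and ocr_service is None:
        raise ValueError("ocrService is required when documentTypeId isn't specified")
    if ocr_service is not None and ocr_service not in VALID_OCR_SERVICES:
        raise ValueError(f"unknown ocrService: {ocr_service}")

    # startPageIndex defaults to 1. endPageIndex defaults to the last page,
    # which isn't known client-side, so the range is checked only when given.
    start = 1 if start_page_index is None else start_page_index
    if start < 1:  # assumption: pages are numbered from 1
        raise ValueError("startPageIndex must be 1 or greater")
    if end_page_index is not None:
        if end_page_index < start:  # assumption: an empty range is an error
            raise ValueError("endPageIndex must not be less than startPageIndex")
        if end_page_index - start + 1 > MAX_PAGES:
            raise ValueError(f"at most {MAX_PAGES} pages can be extracted")

    entry = {"contentDocumentId": content_document_id}
    optional = {
        "startPageIndex": start_page_index,
        "endPageIndex": end_page_index,
        "ocrService": ocr_service,
        "documentTypeId": document_type_id,
    }
    # Omit unset properties so the server applies its own defaults.
    entry.update({key: value for key, value in optional.items() if value is not None})
    return entry
```

Leaving unset properties out of the entry, rather than sending nulls, lets the documented defaults (start at page 1, end at the last page) apply on the server side.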

Outputs

Output Details
ocrDocumentScanResultDetails
Type
string
Description
A comma-separated list containing an OcrDocumentScanResult ID and a page number for each extracted page of the specified document.

Example

Sample Request

{
  "inputs": [
    {
      "contentDocumentId": "069T10000004FnoIAE",
      "startPageIndex": 1,
      "endPageIndex": 20,
      "ocrService": "AMAZON_TEXTRACT",
      "documentTypeId": "0deT10000004CCbIAM"
    }
  ]
}

Sample Response

[
   {
      "actionName":"initiateTextExtraction",
      "errors":null,
      "isSuccess":true,
      "outputValues":{
         "ocrDocumentScanResultDetails":{
            "ocrDocumentScanResults":[
               {
                  "pageNumber":1,
                  "ocrDocumentScanResultId":"0ixT100000000bv"
               }
            ]
         }
      },
      "version":1
   }
]
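The sketch below reads a response like the sample and collects, for each action result, a map from page number to OcrDocumentScanResult ID. It assumes the response shape shown in the sample; the function name scan_results_by_page is hypothetical. Accepting ocrDocumentScanResultDetails as either a JSON string or an object is also an assumption, based on the Outputs section listing its type as string while the sample shows an object.

```python
import json


def scan_results_by_page(response_text):
    """Return one {pageNumber: ocrDocumentScanResultId} dict per action result.

    Assumes the shape shown in the sample response: a JSON array of action
    results, each with outputValues.ocrDocumentScanResultDetails.
    """
    per_result = []
    for action_result in json.loads(response_text):
        if not action_result.get("isSuccess"):
            raise RuntimeError(f"action failed: {action_result.get('errors')}")
        details = action_result["outputValues"]["ocrDocumentScanResultDetails"]
        # Assumption: the value may arrive as a JSON string (the documented
        # type) or as an already-parsed object (as in the sample).
        if isinstance(details, str):
            details = json.loads(details)
        pages = {}
        for scan in details["ocrDocumentScanResults"]:
            pages[scan["pageNumber"]] = scan["ocrDocumentScanResultId"]
        per_result.append(pages)
    return per_result
```

Keeping one dict per action result avoids mixing up page numbers when a single request covers several content documents.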