Initiate Text Extraction Action | Public Sector Solutions Developer Guide

You can automate the Intelligent Document Reader’s text detection and extraction steps using this invocable action.

Special Access Rules

This action is available in API version 58.0 and later for users with the AWSTextract1000LimitAddOn or IntelligentDocumentReaderAddOn license.

Supported REST HTTP Methods

URI: /services/data/vXX.X/actions/standard/initiateTextExtraction
Formats: JSON, XML
HTTP Methods: POST
Authentication: Authorization: Bearer token
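
As an illustration of the endpoint details above, the following Python sketch builds (but doesn't send) the POST request using only the standard library. The function name `build_extraction_request`, the `instance_url` and `access_token` parameters, and the default API version `60.0` are assumptions made for this example, not part of the action's definition.

```python
import json
import urllib.request


def build_extraction_request(instance_url, access_token, inputs, api_version="60.0"):
    """Build a POST request for the initiateTextExtraction action without sending it.

    instance_url, access_token, and api_version are illustrative parameters;
    supply the values for your own org.
    """
    url = (
        f"{instance_url.rstrip('/')}/services/data/v{api_version}"
        "/actions/standard/initiateTextExtraction"
    )
    # The action expects a JSON body with a top-level "inputs" array.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` in a context where a valid access token is available.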

Inputs

Input	Type	Description
configurationAPIName	string	For internal use only.
contentDocumentId	string	Required. The unique content document ID of the uploaded document for which to initiate text extraction. You can specify up to 20 content document IDs.
documentTypeId	string	Optional. The ID of the document type that contains the queries. These queries are used to retrieve the OCR service. Available in API version 60.0 and later.
endPageIndex	integer	Optional. The last page number from which text is extracted. The default value is the last page number in the specified document.
ocrService	picklist	Required if the `documentTypeId` property isn’t specified; otherwise optional, because the OCR service is retrieved based on the `documentTypeId` property. The name of the OCR service that extracts text from the document. Valid values are: `AMAZON_TEXTRACT` (the AWS Document service), `AMAZON_TEXTRACT_ANALYZE_ID` (the AWS Analyze ID service), `AMAZON_TEXTRACT_DETECT_TEXT` (the AWS Detect service that displays the text detected in a document), and `PDF_DOCX_EXTRACT_TEXT` (the AWS Extract service that automatically extracts content from PDF files).
startPageIndex	integer	Optional. The page number at which text extraction starts. The default value is 1. You can extract text from up to 20 pages in a specified document.
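
The input rules above can be checked on the client before calling the action. The following Python sketch is one possible way to do that; the `validate_input` helper and its messages are hypothetical, and it assumes that the 20-page limit applies to the inclusive range from `startPageIndex` to `endPageIndex`.

```python
VALID_OCR_SERVICES = {
    "AMAZON_TEXTRACT",
    "AMAZON_TEXTRACT_ANALYZE_ID",
    "AMAZON_TEXTRACT_DETECT_TEXT",
    "PDF_DOCX_EXTRACT_TEXT",
}
MAX_PAGES = 20  # assumed to mean an inclusive start-to-end range of 20 pages


def validate_input(item):
    """Return a list of problems found in one input; an empty list means none."""
    problems = []
    if not item.get("contentDocumentId"):
        problems.append("contentDocumentId is required")

    ocr_service = item.get("ocrService")
    if ocr_service is None and not item.get("documentTypeId"):
        problems.append("ocrService is required when documentTypeId isn't specified")
    if ocr_service is not None and ocr_service not in VALID_OCR_SERVICES:
        problems.append(f"unknown ocrService: {ocr_service}")

    start = item.get("startPageIndex", 1)  # documented default is 1
    end = item.get("endPageIndex")  # None means "last page of the document"
    if start < 1:
        problems.append("startPageIndex must be at least 1")
    if end is not None:
        if end < start:
            problems.append("endPageIndex must not be less than startPageIndex")
        elif end - start + 1 > MAX_PAGES:
            problems.append(f"page range exceeds {MAX_PAGES} pages")
    return problems
```

When `endPageIndex` is omitted, the sketch can't check the page limit because the document's page count isn't known on the client.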

Outputs

Output	Details
ocrDocumentScanResultDetails	Type string Description A comma-separated list containing an OcrDocumentScanResult ID and a page number for each extracted page of the specified document.

Example

Sample Request

{
  "inputs": [
    {
      "contentDocumentId": "069T10000004FnoIAE",
      "startPageIndex": 1,
      "endPageIndex": 20,
      "ocrService": "AMAZON_TEXTRACT",
      "documentTypeId": "0deT10000004CCbIAM"
    }
  ]
}

Sample Response

[
   {
      "actionName":"initiateTextExtraction",
      "errors":null,
      "isSuccess":true,
      "outputValues":{
         "ocrDocumentScanResultDetails":{
            "ocrDocumentScanResults":[
               {
                  "pageNumber":1,
                  "ocrDocumentScanResultId":"0ixT100000000bv"
               }
            ]
         }
      },
      "version":1
   }
]
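
A response like the sample above can be reduced to a list of page numbers and scan result IDs. The following Python sketch assumes that the response body has the same shape as the sample response; the `scan_results` helper is hypothetical and skips any result whose `isSuccess` value isn't true.

```python
import json


def scan_results(response_text):
    """Return (pageNumber, ocrDocumentScanResultId) pairs from successful results.

    Assumes the response body is shaped like the sample response.
    """
    pairs = []
    for result in json.loads(response_text):
        if not result.get("isSuccess"):
            continue  # skip failed action results
        output_values = result.get("outputValues") or {}
        details = output_values.get("ocrDocumentScanResultDetails") or {}
        for scan in details.get("ocrDocumentScanResults", []):
            pairs.append((scan["pageNumber"], scan["ocrDocumentScanResultId"]))
    return pairs
```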