DocumentScanner Data Types
DocumentScanner defines and uses several data types.
An object representing a scanned document. Returned as the result of a successful scan operation.
The following image illustrates how DocumentScanner might scan and structure a document (in this case, some sample text). There are three TextBlock elements, each outlined in purple, and each TextBlock is further broken down by line (TextLine) and by word or character (TextElement).
Property Name | Type | Description | Example |
---|---|---|---|
imageBytes | String | A string containing the base64 image data of the scanned document. Only provided when returnImageBytes is set to true in your DocumentScannerOptions configuration object. | "2432BcYPJhSkzZjHS-Uiz1g8iZdQnRHtPnFKbMltJEc" |
text | String | A string value providing the recognized text from the scanned document. | "This is a demonstration of how the text in a document is detected and broken down to TextBlock objects." |
blocks | TextBlock[] | An array of TextBlock objects that represent a structured text result that is visually aligned with the corresponding image. See TextBlock for details of this structured text data. | blocks[0] = [ [ ["This"], ["is"], ["a"], ["demonstration"], ["of"], ["how"], ["the"], ["text"], ["in"], ["a"], ["document"], ], [ ["is"], ["detected"], ["and"], ["broken"], ["down"], ["to"], ["TextBlock"], ["objects."], ], ] |
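
As a rough illustration of how this nested structure can be traversed, the following sketch walks a scan result from blocks to lines to elements and collects the text of each word. The helper name `collectWords` and the sample object are hypothetical. Only the property names (`blocks`, `lines`, `elements`, `text`) come from the tables in this reference.

```javascript
// Hypothetical helper: flatten a scan result into a list of word strings.
// Assumes the shape described in this reference:
// result.blocks[] -> block.lines[] -> line.elements[] -> element.text
function collectWords(scanResult) {
  const words = [];
  for (const block of scanResult.blocks ?? []) {
    for (const line of block.lines ?? []) {
      for (const element of line.elements ?? []) {
        words.push(element.text);
      }
    }
  }
  return words;
}

// Hand-written sample data for illustration only, not real scanner output.
const sampleScan = {
  text: "This is a demonstration",
  blocks: [
    {
      text: "This is a demonstration",
      lines: [
        { text: "This is", elements: [{ text: "This" }, { text: "is" }] },
        { text: "a demonstration", elements: [{ text: "a" }, { text: "demonstration" }] },
      ],
    },
  ],
};

console.log(collectWords(sampleScan).join(" ")); // prints "This is a demonstration"
```

The `?? []` fallbacks are a defensive choice so that a missing array yields an empty result instead of an exception.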
An object representing a contiguous section of the scanned text. Text that is visually close together is grouped into a block of text. A document is made up of one or more blocks, and each block can be further broken down into smaller text elements: TextLine (a single line of text in a visually aligned run of text) and TextElement (an individual word or glyph).
Property Name | Type | Description | Example |
---|---|---|---|
text | String | A string containing the text content of the block. | "This is a demonstration of how the text in a document is detected and broken down to TextBlock objects." |
lines | TextLine[] | An array of TextLine objects, each of which represents a visually aligned line of text within the TextBlock. | lines = blocks[0].lines; lines[0] = [ ["This"], ["is"], ["a"], ["demonstration"], ["of"], ["how"], ["the"], ["text"], ["in"], ["a"], ["document"], ] |
recognizedLangCodes | String[] | The BCP-47 language code values for the languages detected in the recognized text. | ["en", "ja"] |
frame | Frame | An object containing the coordinates — position and size — that represent the bounding rectangle in the scanned image that contains the TextBlock. | { x: 100, y: 100, width: 650, height: 200 } |
cornerPoints | Point[] | An array of Point objects that define a closed shape within the scanned image that contains the TextBlock. | [ { x: 100, y: 100 }, { x: 750, y: 100 }, { x: 750, y: 300 }, { x: 100, y: 300 } ] |
frame and cornerPoints both represent shapes that enclose the TextBlock in the scanned document image. You can use these regions to build user interfaces for interacting with the scan results and processing them further. For example, you can assign different TextBlocks or TextLines to different form fields, such as when scanning a business card into name, company, and contact fields.
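One way to support that kind of interaction is a hit test that finds which block a user tapped. The sketch below is a minimal illustration, assuming the tap position has already been converted into the coordinate system of the scanned image. The helper names `frameContains` and `findBlockAt` and the sample data are hypothetical.

```javascript
// Returns true when the point lies inside (or on the edge of) the frame.
function frameContains(frame, point) {
  return (
    point.x >= frame.x &&
    point.x <= frame.x + frame.width &&
    point.y >= frame.y &&
    point.y <= frame.y + frame.height
  );
}

// Hypothetical helper: return the first block whose frame contains the point,
// or null when the point falls outside every block.
function findBlockAt(blocks, point) {
  return blocks.find((block) => frameContains(block.frame, point)) ?? null;
}

// Hand-written sample blocks for illustration only.
const cardBlocks = [
  { text: "Example Person", frame: { x: 100, y: 100, width: 650, height: 200 } },
  { text: "Example Company", frame: { x: 100, y: 350, width: 650, height: 100 } },
];

console.log(findBlockAt(cardBlocks, { x: 120, y: 380 }).text); // prints "Example Company"
```

This uses frame because a rectangle test is simple. A tighter test against cornerPoints would need a point-in-polygon check instead.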
The difference between the two is that frame is the smallest rectangle that fully encloses the scanned text within the image, while cornerPoints defines a shape that isn't necessarily rectangular and that encloses the scanned text more tightly. Use frame for approximate shapes, and cornerPoints when you want to be as close as possible to the scanned text.
The preceding example values of frame and cornerPoints represent the same region of a scanned document image. That’s unlikely to happen in a real-world scan, where it’s hard to hold a camera perfectly aligned with a document. The nice, round numbers for the size of the region are similarly unlikely.
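To make the relationship between the two shapes concrete, the following sketch converts between them. It assumes corner points are listed in the order used by the example values (top-left, top-right, bottom-right, bottom-left). That order is an assumption drawn from the examples, not a documented guarantee. The helper names are hypothetical.

```javascript
// Hypothetical helper: the four corners of a frame, clockwise from top-left.
function frameToCornerPoints(frame) {
  const right = frame.x + frame.width;
  const bottom = frame.y + frame.height;
  return [
    { x: frame.x, y: frame.y },
    { x: right, y: frame.y },
    { x: right, y: bottom },
    { x: frame.x, y: bottom },
  ];
}

// Hypothetical helper: the smallest axis-aligned rectangle enclosing the points.
function cornerPointsToFrame(cornerPoints) {
  const xs = cornerPoints.map((p) => p.x);
  const ys = cornerPoints.map((p) => p.y);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    x: left,
    y: top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

// The example frame from the TextBlock table maps to the example cornerPoints.
console.log(frameToCornerPoints({ x: 100, y: 100, width: 650, height: 200 }));

// A slightly skewed shape (made-up values) yields a larger enclosing rectangle.
const skewedCorners = [
  { x: 102, y: 98 },
  { x: 751, y: 104 },
  { x: 749, y: 303 },
  { x: 100, y: 297 },
];
console.log(cornerPointsToFrame(skewedCorners)); // { x: 100, y: 98, width: 651, height: 205 }
```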
An object representing a single line of scanned text.
Property Name | Type | Description | Example |
---|---|---|---|
text | String | A string containing the text content of the line. | "This is a demonstration of how the text in a document" |
elements | TextElement[] | An array of TextElement objects, each of which represents a word or glyph within the TextLine. | [ ["This"], ["is"], ["a"], ["demonstration"], ["of"], ["how"], ["the"], ["text"], ["in"], ["a"], ["document"], ] |
recognizedLangCodes | String[] | The BCP-47 language code values for the languages detected in the recognized text. | ["en"] |
frame | Frame | An object containing the coordinates — position and size — that represent the bounding rectangle in the scanned image that contains the TextLine. | { x: 100, y: 100, width: 650, height: 100 } |
cornerPoints | Point[] | An array of Point objects that define a closed shape within the scanned image that contains the TextLine. | [ { x: 100, y: 100 }, { x: 750, y: 100 }, { x: 750, y: 200 }, { x: 100, y: 200 } ] |
An object representing a single word, individual character, or glyph.
Property Name | Type | Description | Example |
---|---|---|---|
text | String | A string containing the text content of the word or character. | "This" |
recognizedLangCodes | String[] | The BCP-47 language code values for the languages detected in the recognized text. | ["en"] |
frame | Frame | An object containing the coordinates — position and size — that represent the bounding rectangle in the scanned image that contains the TextElement. | { x: 100, y: 100, width: 45, height: 100 } |
cornerPoints | Point[] | An array of Point objects that define a closed shape within the scanned image that contains the TextElement. | [ { x: 100, y: 100 }, { x: 145, y: 100 }, { x: 145, y: 200 }, { x: 100, y: 200 } ] |
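
Because each TextElement carries its own recognizedLangCodes, mixed-language text can be separated per word. The sketch below filters the elements of a line by language code. The helper name `elementsInLanguage` and the sample line are hypothetical, and the filter assumes an exact match on the code string.

```javascript
// Hypothetical helper: the elements of a line whose recognizedLangCodes
// include the given BCP-47 code (exact string match only).
function elementsInLanguage(line, langCode) {
  return (line.elements ?? []).filter((element) =>
    (element.recognizedLangCodes ?? []).includes(langCode)
  );
}

// Hand-written sample line for illustration only.
const mixedLine = {
  text: "Hello こんにちは",
  recognizedLangCodes: ["en", "ja"],
  elements: [
    { text: "Hello", recognizedLangCodes: ["en"] },
    { text: "こんにちは", recognizedLangCodes: ["ja"] },
  ],
};

console.log(elementsInLanguage(mixedLine, "ja").map((e) => e.text)); // [ 'こんにちは' ]
```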
An object representing a bounding rectangle. When used in DocumentScanner, the Frame is the smallest rectangle that fully encloses a region of scanned text for a TextBlock, TextLine, or TextElement.
Property Name | Type | Description | Example |
---|---|---|---|
x | Number | The X coordinate of the top-left of the rectangle, in pixels, within the coordinate system of the scanned image. | 100 |
y | Number | The Y coordinate of the top-left of the rectangle, in pixels, within the coordinate system of the scanned image. | 100 |
width | Number | The width of the rectangle, in pixels. | 650 |
height | Number | The height of the rectangle, in pixels. | 200 |
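
Frame values are expressed in pixels of the scanned image, so drawing an overlay on a resized copy of the image requires scaling them. The following sketch shows one way to do that. The helper name `scaleFrame` and the size objects are hypothetical and not part of DocumentScanner.

```javascript
// Hypothetical helper: map a frame from scanned-image pixels to the
// coordinates of a displayed copy of the image with a different size.
function scaleFrame(frame, imageSize, displaySize) {
  const scaleX = displaySize.width / imageSize.width;
  const scaleY = displaySize.height / imageSize.height;
  return {
    x: frame.x * scaleX,
    y: frame.y * scaleY,
    width: frame.width * scaleX,
    height: frame.height * scaleY,
  };
}

// A 1000x500 image shown at half size (made-up dimensions).
const displayFrame = scaleFrame(
  { x: 100, y: 100, width: 650, height: 200 },
  { width: 1000, height: 500 },
  { width: 500, height: 250 }
);
console.log(displayFrame); // { x: 50, y: 50, width: 325, height: 100 }
```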
An object representing a point in a coordinate system.
Property Name | Type | Description | Example |
---|---|---|---|
x | Number | The X coordinate of the point. | 100 |
y | Number | The Y coordinate of the point. | 100 |
An object containing configuration details for a document scanning session.
Property Name | Type | Description | Example |
---|---|---|---|
permissionRationaleText | String | Optional, and only for Android implementations. The text shown in the UI when the device prompts the user to grant permission for your app to use the camera. | "Grant permission for [app name here] to use the camera to scan documents" |
imageSource | DocumentScannerSource | Optional. Specifies the source of the document to be scanned. Defaults to "DEVICE_CAMERA". | "PHOTO_LIBRARY" |
scriptHint | Script | Optional. Specifies the language writing system of the text to be scanned. Defaults to "LATIN". | "DEVANAGARI" |
returnImageBytes | Boolean | Optional. Specifies whether the image data should (true) or should not (false) be returned by the plugin. Defaults to false. This setting is overridden and set to false when imageSource is set to "INPUT_IMAGE". | true |
inputImageBytes | String[] | Optional. A stringified array of base64 image data to be scanned. Used when imageSource is set to "INPUT_IMAGE". | "2432BcYPJhSkzZjHS-Uiz1g8iZdQnRHtPnFKbMltJEc" |
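
The defaults and the override rule in this table can be summarized in code. The sketch below computes the effective configuration that the table describes for a given options object. It only mirrors the documented behavior for illustration. The helper name `effectiveScanOptions` is hypothetical, and the plugin applies these rules itself.

```javascript
// Hypothetical helper: the effective options implied by the table above.
// This does not call DocumentScanner; it only restates the documented rules.
function effectiveScanOptions(options = {}) {
  const effective = {
    imageSource: options.imageSource ?? "DEVICE_CAMERA", // documented default
    scriptHint: options.scriptHint ?? "LATIN", // documented default
    returnImageBytes: options.returnImageBytes ?? false, // documented default
  };
  if (options.permissionRationaleText !== undefined) {
    // Only used by Android implementations.
    effective.permissionRationaleText = options.permissionRationaleText;
  }
  if (effective.imageSource === "INPUT_IMAGE") {
    // Documented override: image bytes are not returned for input images.
    effective.returnImageBytes = false;
    if (options.inputImageBytes !== undefined) {
      effective.inputImageBytes = options.inputImageBytes;
    }
  }
  return effective;
}

console.log(effectiveScanOptions({ imageSource: "PHOTO_LIBRARY", returnImageBytes: true }));
// { imageSource: 'PHOTO_LIBRARY', scriptHint: 'LATIN', returnImageBytes: true }
```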
An object representing an error that occurred when accessing DocumentScanner features.
Property Name | Type | Description |
---|---|---|
code | DocumentScannerFailureCode | A value representing the reason for a DocumentScanner error. See DocumentScannerFailureCode for the list of possible values. |
message | String | A string value describing the reason for the failure. This value is suitable for use in user interface messages. The message is provided in English and isn’t localized. |
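
Because message is provided only in English, an app that needs localized text might look up its own string by code and fall back to message otherwise. The following is a minimal sketch of that idea. The helper name `userMessageFor`, the lookup table, and the code value `EXAMPLE_CODE` are all hypothetical placeholders. See DocumentScannerFailureCode for the real values.

```javascript
// Hypothetical helper: prefer an app-supplied localized string for the
// failure code, and fall back to the English message from the failure object.
function userMessageFor(failure, localizedMessages = {}) {
  const hasLocalized = Object.prototype.hasOwnProperty.call(
    localizedMessages,
    failure.code
  );
  return hasLocalized ? localizedMessages[failure.code] : failure.message;
}

// "EXAMPLE_CODE" is a placeholder, not a real DocumentScannerFailureCode value.
const sampleFailure = { code: "EXAMPLE_CODE", message: "Something went wrong." };
const germanMessages = { EXAMPLE_CODE: "Etwas ist schiefgelaufen." };

console.log(userMessageFor(sampleFailure, germanMessages)); // prints "Etwas ist schiefgelaufen."
console.log(userMessageFor(sampleFailure)); // prints "Something went wrong."
```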
See Also