Set Up an IBM watsonx.data Connection
Configure a connection between Data Cloud and IBM watsonx.data, and federate data into Data Cloud.
User Permissions Needed | |
---|---|
To create a connection: | System Administrator |
Prerequisites:
- Firewall: If the watsonx.data instance is behind a network firewall, add these Data Cloud IP addresses to your access control list before configuring a connection. Both the watsonx.data catalog and the storage bucket must be publicly accessible. Connecting over AWS PrivateLink or Azure Private Link isn't supported unless the data is stored in an AWS S3 bucket that is in the same AWS region as the Data Cloud tenant.
If separate firewalls protect the watsonx.data metadata catalog server and the storage bucket, update both.
- Catalog: The watsonx.data instance must be managed by a metadata catalog that implements the Apache Iceberg REST OpenAPI specification. See Unity Catalog REST API and Iceberg Catalog REST API.
- Storage: Regardless of where the watsonx.data catalog server is hosted (for example, IBM Cloud), the data must be formatted as Apache Parquet files and organized as Apache Iceberg tables. These tables must be stored in one of the following: an AWS S3 bucket, an Azure Data Lake Storage Gen2 container, or an Azure Blob Storage container. Federating data from the IBM Cloud Object Storage isn't supported.
Set Up Connection
-
In Data Cloud, click Setup, and select Data Cloud Setup.
-
Under External Integrations, select Other Connectors.
-
On the Source tab, select IBM watsonx.data and click Next.
-
Enter a connection name and a connection API name.
-
In the Authentication Details section, select CATALOG_PROVIDED if your REST catalog supports storage credential vending. Otherwise, select S3 if the data is stored in AWS S3, or AZURE if the data is stored in Azure Blob Storage or ADLS Gen2.
-
watsonx.data Catalog: In IBM Cloud IAM, generate an API key, and use the API key to generate an OAuth 2.0 access token. The maximum TTL of this token is 60 minutes. See Managing user API keys.
-
Storage Bucket: If you did not select CATALOG_PROVIDED, Data Cloud requires additional information about your storage bucket or container. Currently, watsonx.data’s metadata catalog doesn't support storage credential vending for AWS S3 and Azure Storage.
Storage Type | Authentication Details |
---|---|
Azure Blob Storage or Azure Data Lake Storage Gen2 | Storage Account Name - Provide the name of the storage account. SAS Token - Provide the shared access signature token that Data Cloud will use to access the relevant storage container within the storage account. |
AWS S3 | Bucket Name - Provide the name of the storage bucket. Access Key ID - Provide the access key ID for the IAM user that Data Cloud will use to access the storage bucket. Secret Access Key - Provide the secret access key for the IAM user that Data Cloud will use to access the storage bucket. AWS Region - Provide the name of the AWS region the storage bucket is hosted in. See Regions, Availability Zones, and Local Zones - Amazon Relational Database Service for the list of AWS regions. |
-
In the Connection Details section, enter the publicly accessible HTTPS URL of the watsonx.data catalog. To find and build the URL:
- In IBM Cloud, from the left navigation menu, select watsonx.data.
- Launch watsonx.data.
- Select Infrastructure Manager in the left navigation menu.
- Select the appropriate metadata catalog and record both the Metastore REST endpoint and the name of the catalog.
- Prepend https:// to the endpoint.
- Append /mds/iceberg to the endpoint.
-
In the warehouse parameter, enter the name of the catalog.
-
If the test succeeds, click Save.
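The two values that the steps above ask you to prepare by hand, the catalog URL and the OAuth 2.0 access token, can be sketched in Python. This is a minimal illustration, not an official client. The IBM Cloud IAM token endpoint and the `urn:ibm:params:oauth:grant-type:apikey` grant type come from IBM Cloud IAM documentation rather than from this page, and the function names and the sample endpoint `metastore.example.com:443` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public IBM Cloud IAM token endpoint (not stated on this page).
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_catalog_url(metastore_endpoint: str) -> str:
    """Prepend https:// and append /mds/iceberg to the Metastore REST endpoint."""
    endpoint = metastore_endpoint.strip().rstrip("/")
    # Drop any scheme that was copied along with the endpoint.
    host = endpoint.split("://", 1)[-1]
    url = "https://" + host
    if not url.endswith("/mds/iceberg"):
        url += "/mds/iceberg"
    return url


def build_token_request(api_key: str) -> Request:
    """Build the form-encoded POST that exchanges an API key for a token."""
    body = urlencode(
        {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        }
    ).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    return Request(IAM_TOKEN_URL, data=body, headers=headers, method="POST")


def fetch_access_token(api_key: str) -> str:
    """Send the request and return the access token (performs a network call)."""
    with urlopen(build_token_request(api_key), timeout=30) as response:
        return json.load(response)["access_token"]


# Hypothetical endpoint value, used only to show the URL shape.
print(build_catalog_url("metastore.example.com:443"))
# prints "https://metastore.example.com:443/mds/iceberg"
```

Because the token expires after at most 60 minutes, generate it shortly before you enter it in the connection form.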
Considerations
- Row-Level Updates: Querying Iceberg tables that are configured to use Iceberg V2 merge-on-read (MoR) position deletes or equality deletes, or Iceberg V3 deletion vectors, isn't supported.
- Views: Querying Iceberg views isn't supported.
- Namespaces: Only single-level (catalog -> database -> table) and two-level namespaces (catalog -> database -> schema -> table) are supported.
- Temporal Data Types: The time and timestamp_ntz data types aren't supported.
- Cross-Region S3 Storage Bucket: If your Data Cloud org isn't in the same region as your S3 storage bucket and your catalog doesn't support storage credential vending, make sure that the server hosting the REST catalog includes the client.region property in the LoadTableResult object. See Iceberg OpenAPI.
- Change Data: Certain features in Data Cloud (for example, data actions) require the ability to detect when a data lake object changes. For Data Cloud to construct a change data feed, that is, a log of incremental changes to an external data lake object, a primary key must be specified. To use such features, use Iceberg’s identifier-field-ids construct to define which columns make up a table’s primary key, and ensure that your query engines (writers) respect your configuration.
- Governance: See Access management and governance in watsonx.data.
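As a rough aid for the Cross-Region and Change Data considerations above, the sketch below inspects a dictionary shaped like an Iceberg REST LoadTableResult response. It assumes that client.region is delivered in the response's `config` map and that identifier-field-ids appears on the current schema inside `metadata`. The function name and the sample payload are hypothetical and serve only as illustration.

```python
from typing import Any


def check_load_table_result(result: dict[str, Any]) -> list[str]:
    """Return warnings for a LoadTableResult-shaped dictionary."""
    warnings: list[str] = []

    # Assumption: client.region is carried in the "config" map of the response.
    config = result.get("config") or {}
    if "client.region" not in config:
        warnings.append("config has no client.region")

    # Locate the current schema by matching current-schema-id.
    metadata = result.get("metadata") or {}
    current_id = metadata.get("current-schema-id")
    current = None
    if current_id is not None:
        for schema in metadata.get("schemas") or []:
            if schema.get("schema-id") == current_id:
                current = schema
                break

    if current is None:
        warnings.append("current schema not found")
    elif not current.get("identifier-field-ids"):
        # Without identifier fields, no primary key is defined for the table.
        warnings.append("current schema has no identifier-field-ids")
    return warnings


# Hypothetical, trimmed-down payload for illustration.
example = {
    "config": {"client.region": "us-east-1"},
    "metadata": {
        "current-schema-id": 0,
        "schemas": [
            {
                "schema-id": 0,
                "type": "struct",
                "identifier-field-ids": [1],
                "fields": [
                    {"id": 1, "name": "id", "required": True, "type": "long"}
                ],
            }
        ],
    },
}
print(check_load_table_result(example))  # prints []
```

An empty list means that both properties were found. The check doesn't verify that writers actually honor the identifier fields.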