Set Up an Apache Iceberg File Federation Connection
Set up the Apache Iceberg File Federation connection to start the flow of data into Data Cloud.
| User Permissions Needed | |
|---|---|
| To create a connection: | System Admin profile or Data Cloud Architect permission set |
Before you begin:
- If the Iceberg data lakehouse you want to connect to is behind a network firewall, add these Data Cloud IP addresses to the network firewall’s allowlist before creating a connection.
- Make sure the Iceberg metadata catalog (hereinafter, the REST catalog) implements the Apache Iceberg REST OpenAPI specification and is publicly accessible. This table describes the REST catalog endpoints that Data Cloud can make requests to.
| REST Catalog Endpoint | Description |
|---|---|
| /v1/config (Required) | Get all configuration metadata. When you save a connection, Data Cloud attempts a request to /v1/config, and a 200 response must be received for the connection to be saved. See REST Config Spec. |
| /v1/{prefix}/namespaces (Required) | List all namespaces in the catalog. See Namespaces Spec. Even if you don’t support nested namespaces, ensure that a request to /v1/{prefix}/namespaces?parent={some_namespace} returns an empty list per the SupportsNamespaces API. |
| /v1/{prefix}/namespaces/{namespace} (Required) | Get metadata about the specified namespace. See Namespace Metadata Spec. |
| /v1/{prefix}/namespaces/{namespace}/tables (Required) | List all tables in the namespace. See Tables List Spec. |
| /v1/{prefix}/namespaces/{namespace}/tables/{table} (Required) | Get metadata about the specified table. See Table Metadata Spec. |
| /v1/oauth/tokens (Optional) | When the access token you provide to Data Cloud expires, Data Cloud exchanges the expired access token for a new one by making a token-exchange request to /v1/oauth/tokens. The request must contain the required fields defined in RFC 8693. This endpoint is deprecated but still supported by some catalog implementations. See Token Exchange Spec. If unsupported, you must manually or programmatically update the connection to use a new access token. |
- Make sure that the data is stored as Apache Parquet files in a publicly accessible AWS S3 bucket, ADLS Gen2 container, or Azure Blob Storage container.
- Data Cloud requires the OAuth 2.0 bearer token that the server hosting the REST catalog expects Data Cloud to present. This access token has a TTL. If the token-exchange flow is supported by the /v1/oauth/tokens endpoint, the TTL should be the value of the token-expires-at-ms property included in the response to /v1/config. If not, you must periodically rotate the token by using the Connect REST API.
- If your Data Cloud org isn’t in the same region as your storage bucket (or container), make sure that the server hosting the REST catalog includes the client.region property in the LoadTableResult object. Cross-region deployments where the catalog doesn’t support storage credential vending aren’t supported.
- Understand that querying Iceberg tables that are configured to use Iceberg V2 MoR Position / Equality Deletes or Iceberg V3 Deletion Vectors is not supported.
- Understand that querying Iceberg views is not supported.
- Understand that only single-level (catalog -> database -> table) and two-level namespaces (catalog -> database -> schema -> table) are supported.
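Before you create the connection, you can probe the required endpoints yourself. This minimal Python sketch mirrors Data Cloud’s save-time check of /v1/config and shows how the optional {prefix} segment from the config response shapes the other endpoint URLs. The base URL, token, and `some_namespace` values are placeholders for illustration.

```python
import json
import urllib.request

def endpoint(base_url, path, prefix=""):
    """Compose a REST catalog endpoint URL, honoring the optional
    {prefix} segment that /v1/config can supply."""
    if prefix:
        path = path.replace("{prefix}", prefix)
    else:
        path = path.replace("/{prefix}", "")
    return base_url.rstrip("/") + path

def preflight(base_url, token):
    """Mimic Data Cloud's save-time check: /v1/config must return 200."""
    req = urllib.request.Request(
        endpoint(base_url, "/v1/config"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:  # raises on non-2xx
        return resp.status == 200, json.load(resp)

# URL composition, with and without a catalog-supplied prefix:
base = "https://myicebergdatalakehouse:443/iceberg"
print(endpoint(base, "/v1/{prefix}/namespaces", prefix="my_warehouse"))
# The leaf-namespace listing that must return an empty list:
print(endpoint(base, "/v1/{prefix}/namespaces") + "?parent=some_namespace")
```

Running `preflight` against your catalog before saving the connection surfaces authentication or reachability problems early, with the same pass/fail condition Data Cloud applies.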
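If your catalog supports /v1/oauth/tokens, the token-exchange request that refreshes an expired access token carries the fields RFC 8693 requires. This sketch only builds the form-encoded body; your catalog may accept or require additional parameters, so treat it as a shape reference rather than a complete client.

```python
import urllib.parse

def token_exchange_body(expired_token):
    """Form-encoded body for a token-exchange request to /v1/oauth/tokens,
    containing the fields required by RFC 8693."""
    return urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": expired_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    })

# Placeholder token for illustration; POST this body to /v1/oauth/tokens
# with Content-Type: application/x-www-form-urlencoded.
print(token_exchange_body("<expired-access-token>"))
```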
1. In Data Cloud, click Setup, and select Data Cloud Setup.
2. Under External Integrations, select Other Connectors.
3. Click New.
4. On the Source tab, select Apache Iceberg File Federation and click Next.
5. Enter a connection name and connection API name.
6. Enter the authentication details.
   If your data store’s REST catalog supports storage credential vending, select CATALOG_PROVIDED. If not, select S3 if the data is stored in AWS S3, or AZURE if the data is stored in either Azure Blob Storage or ADLS Gen2.
   - For REST Catalog, in the Client Access Token field, enter the OAuth 2.0 bearer token.
   - For Storage Bucket, if you did not select CATALOG_PROVIDED, provide additional information about your storage bucket or container.

     | Storage Type | Authentication Details |
     |---|---|
     | AWS S3 | Bucket Name: The name of the storage bucket. See General purpose bucket naming rules - Amazon Simple Storage Service. Access Key ID: The access key ID for the IAM user that Data Cloud uses to access the storage bucket. See Manage access keys for IAM users. Secret Access Key: The secret access key for that IAM user. See Manage access keys for IAM users. |
     | Azure Blob Storage or Azure Data Lake Gen2 | Storage Account Name: The name of the storage account. See Storage account overview – Azure Storage \| Microsoft Learn. SAS Token: The shared access signature token that Data Cloud uses to access the relevant storage container within the storage account. See Create shared access signature (SAS) tokens – Azure AI services \| Microsoft Learn. |
7. In the Connection Details section, enter the publicly accessible HTTPS URL of the Apache Iceberg REST catalog, for example, https://myicebergdatalakehouse:443/iceberg. To make a request to an endpoint, Data Cloud appends the endpoint to this URL.
8. If your REST catalog’s implementation of the /v1/config endpoint accepts a value for the optional warehouse parameter and you want Data Cloud to read warehouse-specific configuration metadata, specify a value for the warehouse field.
9. To review your configuration, click Test Connection.
10. Click Save.
After the connector details are accepted, the connection is created and listed under Connectors. You can now create data streams.
Considerations
- Cross-Region S3 Storage Bucket: If your Data Cloud org isn’t in the same region as your S3 storage bucket and your catalog doesn’t support storage credential vending, make sure that the server hosting the REST catalog includes the client.region property in the LoadTableResult object. See the Iceberg OpenAPI specification.
- S3 Bucket Naming: If you use AWS S3, the S3 bucket’s name must not contain a period. Such buckets can only be accessed via path-style addressing, which AWS has marked for deprecation. Only buckets that can be accessed via virtual-hosted-style addressing with SSL are supported. See Virtual hosting of general purpose buckets - Amazon Simple Storage Service.
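As a rough preflight for the S3 naming consideration above, this sketch checks whether a bucket name is compatible with virtual-hosted-style addressing over SSL: no periods, 3–63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit. The function name and example bucket names are illustrative.

```python
import re

def virtual_host_eligible(bucket):
    """True if the S3 bucket name works with virtual-hosted-style
    addressing over SSL: 3-63 chars, lowercase letters, digits, and
    hyphens, starting and ending with a letter or digit, no periods."""
    return bool(re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", bucket))

print(virtual_host_eligible("my-analytics-bucket"))  # True
print(virtual_host_eligible("my.analytics.bucket"))  # False: periods force path-style addressing
```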