Set Up a Snowflake File Federation Connection

Configure a connection between Data 360 and an AWS- or Azure-hosted Snowflake instance and federate data into Data 360.

User Permissions Needed
To create a Snowflake Data Federation connection in Data Cloud: System Admin profile or Data Cloud Architect permission set

Requirements

Before you configure the connection, review these network, catalog, and storage requirements.

  • Firewall: If the Snowflake instance is behind a network firewall, add the Data 360 IP addresses to your access control list. If different network firewalls protect the server hosting the catalog and the storage bucket, update both access control lists. Make sure that both the Open Catalog and the storage bucket are publicly accessible.

    Data 360 doesn't support connecting over AWS PrivateLink or Azure Private Link. However, if the AWS S3 bucket is in the same AWS region as the Data 360 tenant, and a VPC gateway endpoint is provisioned for the bucket, then Data 360 can transparently use AWS PrivateLink to connect to the bucket.

  • Catalog: Manage every table that you need to access in Data 360 with Open Catalog. For Data 360 to connect to Open Catalog, you must configure a custom client application in Snowflake. Record the client ID and client secret that Snowflake generates when you create the client application; the sketch after this list shows how those credentials are used.

  • Storage: Where you store data depends on where the Snowflake instance is hosted. The files, which Open Catalog manages, must be organized as Apache Iceberg V1 tables.

    • If the Snowflake instance is hosted on AWS, store data as Apache Parquet files in an AWS S3 storage bucket.
    • If the Snowflake instance is hosted on Azure, store data as Apache Parquet files in an Azure Data Lake Storage Gen2 storage container. Enable storage credential vending in Open Catalog. This security best practice is required because Data 360 doesn't support long-lived storage credentials.
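
To sanity-check the catalog prerequisites before you configure the connection, you can request a short-lived token from Open Catalog yourself. This minimal Python sketch uses the client credentials flow defined by the Apache Iceberg REST catalog spec; the token endpoint path, the placeholder account identifier, and the scope value are assumptions, not values taken from this guide.

```python
# Minimal sketch (assumptions noted inline): exchange the custom client
# application's credentials for a short-lived Open Catalog token.
import requests

# Placeholder; use your Open Catalog account identifier.
CATALOG_BASE = "https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog"

resp = requests.post(
    f"{CATALOG_BASE}/v1/oauth/tokens",  # token endpoint per the Iceberg REST spec
    data={
        "grant_type": "client_credentials",
        "client_id": "<client_id>",          # recorded when you created the client
        "client_secret": "<client_secret>",  # recorded when you created the client
        "scope": "PRINCIPAL_ROLE:ALL",       # scope value is an assumption
    },
)
resp.raise_for_status()
token = resp.json()["access_token"]
print("Obtained a short-lived catalog token")
```

If the request fails with an authentication error, re-check the client ID and secret. If it can't connect at all, revisit the firewall requirement above.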

Set Up the Connection

  1. In Data 360, click Setup and select Data Cloud Setup.

  2. Under External Integrations, select Other Connectors.

  3. On the Source tab, select Snowflake File Federation and click Next.

  4. Enter the connection name and connection API name.

  5. Enter the client ID and client secret that Snowflake generated when you registered Data 360 as a client application. Data 360 uses this information to connect to Open Catalog, which provides short-lived storage credentials at runtime.

  6. In the Connection Details section, enter the publicly accessible HTTPS URL of the Open Catalog. The format for the URL is https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog.

    Make sure that the URL doesn't have any underscores ("_"). If the account identifier contains underscores, use the hyphenated version of the identifier instead.

  7. To review your configuration, click Test Connection. The sketch after these steps shows roughly what a successful test requires.

  8. Click Save.
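
If Test Connection fails and you want to isolate the cause, you can probe the catalog endpoint directly. This sketch calls the Iceberg REST catalog spec's configuration endpoint with the token from the earlier sketch; the catalog (warehouse) name is a placeholder, and the exact checks that Test Connection performs aren't documented here.

```python
# Minimal sketch: confirm that the Open Catalog URL and token are usable
# by fetching the catalog configuration (GET /v1/config in the Iceberg
# REST spec). `token` comes from the earlier token sketch.
import requests

CATALOG_BASE = "https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog"

resp = requests.get(
    f"{CATALOG_BASE}/v1/config",
    params={"warehouse": "<catalog_name>"},  # placeholder catalog name
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()  # a 200 response means the URL and token work
print(resp.json())       # catalog defaults and overrides
```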

Keep these behaviors in mind when you set up a Snowflake File Federation connection.

Storage Considerations

  • Cross-Region S3 Storage Bucket: If your Data 360 org and S3 storage bucket are in different regions, and your catalog uses storage credential vending (that is, you don't provide storage-related information during setup), the REST catalog server must include the client.region property in the LoadTableResult object. See the sketch after this list.
  • S3 Storage Bucket Name: Don't include periods (".") in the AWS S3 bucket name. Buckets with periods in their names can be accessed only through path-style addressing, which AWS has marked for deprecation. Only buckets accessible through virtual-hosted-style addressing with SSL are supported.
  • Azure Storage: If you store data in Azure, make sure that all file paths included in the LoadTableResult object (JSON metadata, Avro manifest list, Avro manifest, and Parquet data files) use the abfs or abfss protocol. Avoid the wasb and wasbs protocols, which Microsoft has deprecated.
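
To make these requirements concrete, here are two trimmed, illustrative fragments of a LoadTableResult, written as Python dicts. The field names follow the Iceberg REST catalog spec; the bucket, container, region, and paths are placeholders, not values from this guide.

```python
# Illustrative only: the LoadTableResult fields that these storage
# considerations refer to, per the Iceberg REST catalog spec.

# Cross-region S3 with credential vending: the config block must carry
# the bucket's region in the client.region property.
s3_result_fragment = {
    "metadata-location": "s3://<bucket>/db/tbl/metadata/v3.metadata.json",
    "config": {"client.region": "us-west-2"},  # placeholder region
}

# Azure: every path (JSON metadata, Avro manifest list, Avro manifests,
# and Parquet data files) must use abfs:// or abfss://, never the
# deprecated wasb:// or wasbs:// protocols.
azure_result_fragment = {
    "metadata-location": "abfss://<container>@<account>.dfs.core.windows.net/db/tbl/metadata/v3.metadata.json",
}
```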

Other Considerations

  • Views: Querying Iceberg views isn't supported.
  • Row-Level Updates: Querying Iceberg tables that are configured to use Iceberg V2 merge-on-read (MoR) position or equality deletes, or Iceberg V3 deletion vectors, isn't supported.
  • Namespaces: Only single-level namespaces (catalog -> database -> table) and two-level namespaces (catalog -> database -> schema -> table) are supported. When you configure a data stream, the Database dropdown displays the names of all top-level namespaces, and the Schema dropdown displays the names of all secondary namespaces registered under the selected top-level namespace. If no namespaces are registered under the selected top-level namespace, the Schema dropdown is empty.
  • Temporal Data Types: The time and timestamp_ntz data types aren't supported.
  • Change Data: Features in Data 360 (such as Data Actions) that require tracking changes to a data lake object aren't currently supported. To enable incremental change logging, Open Catalog would need to:
    • Use the identifier-field-ids construct to define a primary key for the source Iceberg table. See the sketch after this list.
    • Provide a way for Data 360 to access the second-to-last (parent) snapshot of the source Iceberg table to compare changes. Snowflake doesn't support this functionality.
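
For reference, identifier-field-ids is part of the Apache Iceberg table spec: it lists the schema field IDs that together act as a row identifier. This hypothetical sketch uses the pyiceberg library (an assumption; this guide doesn't prescribe a client) to declare such a schema.

```python
# Hypothetical sketch using pyiceberg (not prescribed by this guide).
# identifier_field_ids marks field 1 ("id") as the row identifier and
# serializes to "identifier-field-ids" in the Iceberg table metadata.
from pyiceberg.schema import Schema
from pyiceberg.types import LongType, NestedField, StringType

schema = Schema(
    NestedField(field_id=1, name="id", field_type=LongType(), required=True),
    NestedField(field_id=2, name="name", field_type=StringType(), required=False),
    identifier_field_ids=[1],  # the primary key that change tracking needs
)
```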