Create a Google Cloud Storage Data Stream

Create a data stream to import objects from a Google Cloud Storage (GCS) bucket. You might experience some latency while data is transferred to the staging environment.

Before you begin:

  • Make sure the GCS Connection is set up.
  1. In Data Cloud, on the Data Streams tab, click New.

    You can also use the App Launcher to find and select Data Streams.

  2. Select Google Cloud Storage.

  3. From the Connection dropdown, select the GCS connector that you want to use.

  4. Enter the file and source information, and click Next.

    Field Label: Description
    File Type: Select CSV or Parquet. For more information, see Supported File Formats and Delimiters.
    Import From Directory: The remaining folder path under the parent directory that points to a file’s specific location. Place your source files directly in this directory, because the data stream can’t recognize files stored in nested subdirectories.
    File Name: Name of the file to retrieve from the specified directory. The field is prepopulated with '*'. If no file is specified, the system chooses the first file found. After you create a data stream, it retrieves all files found in the directory. Wildcards are also supported. For example, use *abc*.csv to retrieve all files that contain “abc” in their name. Each time the stream runs, all files that match the wildcard are imported.
    Source: A label that designates the external system that the data comes from. Multiple data streams can use the same Source label.
  5. Enter the object details. You can create a data lake object (DLO) or use an existing DLO.

    If you create a DLO, see Naming standards for data lake objects. If you use an existing DLO, see Using existing data lake object to create a data stream, and familiarize yourself with the guardrails that apply when you use an existing DLO.

  6. Select a data stream category and primary key. Add new formula fields if needed.

  7. Click Next.

  8. From the Data Space dropdown, select the applicable data space or the default data space.

  9. Enter the deployment details, such as refresh mode and frequency.

  10. Click Deploy.

Your data stream is created. Map your data stream to data model objects to start using your data for your GCS use cases.
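
To check which files a given Import From Directory and File Name combination is likely to pick up, the selection rules from step 4 can be approximated locally. The following Python sketch is illustrative only: the function name select_files and the sample object names are hypothetical, and it assumes the wildcard behaves like a shell-style, case-sensitive pattern as implemented by Python's fnmatch module. Data Cloud's exact matching rules aren't documented here and might differ.

```python
from fnmatch import fnmatchcase
from typing import List


def select_files(object_names: List[str], directory: str, pattern: str = "*") -> List[str]:
    """Return the object names that sit directly in `directory` and match `pattern`.

    Hypothetical helper: it approximates the documented behavior (no nested
    subdirectories, wildcard on the file name) and is not a Data Cloud API.
    """
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""
    selected = []
    for name in object_names:
        if not name.startswith(prefix):
            continue  # object is outside the import directory
        file_name = name[len(prefix):]
        if not file_name or "/" in file_name:
            continue  # folder placeholder or file in a nested subdirectory
        if fnmatchcase(file_name, pattern):  # assumed shell-style matching
            selected.append(name)
    return selected


# Sample object names, invented for illustration.
objects = [
    "exports/daily/abc_orders.csv",
    "exports/daily/orders_abc_2024.csv",
    "exports/daily/customers.csv",
    "exports/daily/archive/abc_old.csv",
    "exports/other/abc.csv",
]

matched = select_files(objects, "exports/daily", "*abc*.csv")
print(matched)
```

With the sample names, only the two files that sit directly in exports/daily and contain "abc" are selected. The file under archive/ is skipped because it is in a nested subdirectory, which mirrors the Import From Directory restriction.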

All privacy or compliance requests processed in Data Cloud operate only on the data within your Data Cloud org. Records containing personal data in external products like GCS aren’t affected. For example, suppose that a consumer asserts their right to be forgotten and requests that you delete any copies of their personal data. This consumer has records in GCS and a Data Cloud profile. If you’ve enabled the GCS connector, submit deletion jobs to both GCS and Data Cloud.