Google Cloud Storage (GCS)

You can load data from files in your GCS bucket into a Destination database or data warehouse using Hevo Pipelines.

Hevo automatically unzips any Gzipped files on ingestion. Additionally, if a file is updated, it is re-ingested in full, as Hevo cannot identify the individual changes within it.
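The automatic unzipping described above can be sketched as follows. This is a simplified illustration, not Hevo's internal logic; it detects a Gzip payload by its two-byte magic number and decompresses it before decoding.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any Gzip stream

def read_file(raw: bytes) -> str:
    """Return file contents as text, transparently decompressing Gzip.

    A sketch of the behavior described above; Hevo's actual detection
    logic is internal and may differ.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")

# A gzipped CSV payload round-trips to its original text:
payload = gzip.compress(b"CLAY COUNTY,32003,11973623\n")
print(read_file(payload))  # CLAY COUNTY,32003,11973623
```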

As of Release 1.66, __hevo_source_modified_at is uploaded to the Destination as a metadata field. For existing Pipelines that have this field:

  • If this field is displayed in the Schema Mapper, you must ignore it and not map it to a Destination table column; otherwise, the Pipeline displays an error.

  • Hevo automatically loads this information in the __hevo_source_modified_at column, which is already present in the Destination table.

You can, however, continue to use __hevo_source_modified_at to create transformations using the function event.getSourceModifiedAt(). Read Metadata Column __hevo_source_modified_at.
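A Transformation using this function might look like the sketch below. The Event class here is a stub standing in for Hevo's transformation event object, modeling only the two accessors used (an assumption for illustration); in an actual Transformation, Hevo supplies the event.

```python
class Event:
    """Stub for Hevo's transformation event object (assumption: only
    getProperties and getSourceModifiedAt are modeled here)."""

    def __init__(self, properties, source_modified_at):
        self._properties = properties
        self._source_modified_at = source_modified_at

    def getProperties(self):
        return self._properties

    def getSourceModifiedAt(self):
        return self._source_modified_at


def transform(event):
    # Copy the Source file's modification time into a regular column
    # so it can be queried alongside the data.
    props = event.getProperties()
    props["file_modified_at"] = event.getSourceModifiedAt()
    return event


event = Event({"county": "CLAY COUNTY"}, 1624867200000)  # epoch millis
transformed = transform(event)
print(transformed.getProperties()["file_modified_at"])  # 1624867200000
```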

Existing Pipelines that do not have this field are not impacted.


Configuring Google Cloud Storage as a Source

To configure Google Cloud Storage as a Source in Hevo:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipeline List View.

  3. In the Select Source Type page, select GCS.

  4. In the Configure your GCS Account page, click + ADD GCS ACCOUNT.

  5. Select the Google account that has access to the GCS bucket you want to connect, and click Allow to authorize Hevo to read your data in Google Cloud Storage.

  6. In the Configure your GCS Source page, specify the following:

    GCS settings

    • Pipeline Name: A unique name for this Pipeline.

    • Bucket: Name of the bucket from which you want to ingest data.

    • Path Prefix: Path prefix for the data directory. By default, the files are listed from the root of the directory.

    • File Format: Choose a file format. Hevo currently supports AVRO, CSV, JSON, and XML formats. Contact Hevo Support if your Source data is in a different format.

      Based on the format you select, you must specify some additional settings:

      • CSV:

        1. Specify the Field Delimiter. This is the character that separates the fields in each line, for example, `\t` or `,`.

        2. Disable the Treat First Row As Column Headers option if the Source data file does not contain column headers. Hevo then creates these automatically during ingestion. Default setting: Enabled. See Example below.

        3. Enable the Create Event Types from folders option if the path prefix has subdirectories containing files in different formats. Hevo reads each of the subdirectories as a separate Event Type.

        Note: Files lying at the prefix path (and not in a subdirectory) are ignored.

      • JSON: Enable the Create Event Types from folders option if the path prefix has subdirectories containing files in different formats. Hevo reads each of the subdirectories as a separate Event Type.

        Note: Files lying at the prefix path (and not in a subdirectory) are ignored.

      • XML: Enable the Create Events from child nodes option to load each node under the root node in the XML file as a separate Event.

  7. Click TEST & CONTINUE.

  8. Proceed to configuring the data ingestion and setting up the Destination.
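For the XML setting in Step 6, the Create Events from child nodes behavior can be illustrated with a short sketch. This is a simplified stand-in using Python's standard library, not Hevo's parser; the sample document and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DOC = """
<records>
  <record><county>CLAY COUNTY</county><zip>32003</zip></record>
  <record><county>NASSAU COUNTY</county><zip>32025</zip></record>
</records>
"""

def events_from_child_nodes(xml_text):
    """Yield one dict per child of the root node, mirroring the
    'Create Events from child nodes' behavior (a simplified sketch)."""
    root = ET.fromstring(xml_text)
    for child in root:
        yield {field.tag: field.text for field in child}

for event in events_from_child_nodes(XML_DOC):
    print(event)
# {'county': 'CLAY COUNTY', 'zip': '32003'}
# {'county': 'NASSAU COUNTY', 'zip': '32025'}
```

With the option disabled, the whole file would instead load as a single Event.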


Example: Automatic Column Header Creation for CSV Tables

Consider the following data in CSV format, which has no column headers.

 CLAY COUNTY,32003,11973623
 CLAY COUNTY,32003,46448094
 CLAY COUNTY,32003,55206893
 CLAY COUNTY,32003,15333743
 SUWANNEE COUNTY,32060,85751490
 SUWANNEE COUNTY,32062,50972562
 ST JOHNS COUNTY,32033,846636
 NASSAU COUNTY,32025,88310177
 NASSAU COUNTY,32041,34865452

If you disable the Treat First Row As Column Headers option, Hevo auto-generates the column headers, as seen in the schema map here:

Column headers generated by Hevo for CSV data

The record in the Destination appears as follows:

Destination record with auto-generated column headers
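The header generation shown above can be sketched as follows. The generated names (column_1, column_2, ...) are illustrative assumptions; the exact names Hevo produces may differ.

```python
import csv
import io

CSV_DATA = "CLAY COUNTY,32003,11973623\nSUWANNEE COUNTY,32060,85751490\n"

def ingest_headerless_csv(text, delimiter=","):
    """Attach auto-generated headers to a headerless CSV (a sketch;
    the column_N naming scheme here is an assumption)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    headers = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return [dict(zip(headers, row)) for row in rows]

for record in ingest_headerless_csv(CSV_DATA):
    print(record)
# {'column_1': 'CLAY COUNTY', 'column_2': '32003', 'column_3': '11973623'}
# {'column_1': 'SUWANNEE COUNTY', 'column_2': '32060', 'column_3': '85751490'}
```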


Limitations

  • Hevo does not support UTF-16 encoding format for CSV files. As a workaround, you can convert the files to UTF-8 encoding format before these are ingested by the Pipeline.
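The conversion workaround above can be done with a few lines of Python before uploading the file to your bucket. The file names are placeholders for illustration.

```python
def convert_utf16_to_utf8(src_path, dest_path):
    """Re-encode a UTF-16 CSV file as UTF-8 before the Pipeline ingests it.

    The 'utf-16' codec honors the byte-order mark (BOM) to pick the
    correct endianness automatically.
    """
    with open(src_path, encoding="utf-16") as src, \
         open(dest_path, "w", encoding="utf-8") as dest:
        for line in src:
            dest.write(line)

# Example (hypothetical file names):
# convert_utf16_to_utf8("counties_utf16.csv", "counties_utf8.csv")
```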

Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Jun-28-2021 1.66 Updated the page overview with information about __hevo_source_modified_at being uploaded as a metadata field from Release 1.66 onwards.
Feb-22-2021 NA Added the limitation about Hevo not supporting UTF-16 encoding format for CSV data. Read Limitations.
Last updated on 26 Jul 2021