Google Drive

You can load data from files in your Google Drive into a Destination database or data warehouse using Hevo Pipelines.

For Pipelines created using this Source, Hevo provides a fully managed BigQuery data warehouse as a possible Destination. This option remains available until you set up your first BigQuery Destination, irrespective of any other Destinations you may have. With the managed warehouse, you are charged only the cost that Hevo incurs for your project in Google BigQuery. The invoice is generated at the end of each month, and payment is collected through the payment instrument you have set up. You can create your Pipeline and directly start analyzing your Source data. Read Hevo Managed Google BigQuery.

As of Release 1.66, __hevo_source_modified_at is uploaded to the Destination as a metadata field. For existing Pipelines that have this field:

  • If this field is displayed in the Schema Mapper, ignore it and do not map it to a Destination table column; otherwise, the Pipeline displays an error.

  • Hevo automatically loads this information in the __hevo_source_modified_at column, which is already present in the Destination table.

You can, however, continue to use __hevo_source_modified_at to create transformations using the function event.getSourceModifiedAt(). Read Metadata Column __hevo_source_modified_at.
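As a hedged illustration, the snippet below sketches how event.getSourceModifiedAt() might be used inside a Python Transformation. The Event stand-in class is purely hypothetical, included only so the sketch is self-contained; it mimics just the two methods named in this document and is not Hevo's actual event object.

```python
# Hypothetical stand-in for Hevo's transformation event object; only the
# getProperties/getSourceModifiedAt methods referenced in this document.
class Event:
    def __init__(self, properties, source_modified_at):
        self._properties = properties
        self._source_modified_at = source_modified_at

    def getProperties(self):
        return self._properties

    def getSourceModifiedAt(self):
        return self._source_modified_at


def transform(event):
    # Copy the Source-modification timestamp into a regular field so it
    # remains queryable downstream, e.g. for data-freshness checks.
    props = event.getProperties()
    props["source_modified_at"] = event.getSourceModifiedAt()
    return event
```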

Existing Pipelines that do not have this field are not impacted.


User Authentication

You can connect to the Google Drive Source via User accounts (OAuth) or Service accounts. One service account can be mapped to your entire team. Read Google Account Authentication Methods to know how to set up a service account if you do not already have one.


Prerequisites

  • An active Google Drive account from which data is to be ingested.

  • The Google Drive API is enabled for the service account, if you are connecting through one.


Configuring Google Drive as a Source

Perform the following steps to configure Google Drive as a Source in your Pipeline:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipelines List View.

  3. In the Select Source Type page, select Google Drive.

  4. In the Configure your Drive account page, select the authentication method for connecting to Google Drive.

    Select authentication account

  5. Do one of the following:

    • To connect using a User Account:

      1. Click + ADD DRIVE ACCOUNT.

      2. Select the Google account associated with your Google Drive data and click ALLOW to authorize Hevo to access the data.

    • To connect using a Service Account:

      1. Attach the Service Account Key file.

        Note: Hevo supports only JSON format for the key file.

      2. Click CONFIGURE DRIVE ACCOUNT.

  6. Select the folders which contain the files whose data is to be replicated. Hevo ingests data from all supported files present in the chosen folders.

  7. In the Configure your Drive Source page, specify the following:

    Google Drive Settings

    • Pipeline Name: A unique name for your Pipeline, not exceeding 255 characters.

    • Folders: Select the check box next to each folder whose files you want to replicate to the Destination system. Hevo ingests data from all supported files present in the selected folders.

      Note: If a folder contains a child folder, you must explicitly select the child folder to replicate its files to the Destination system.

  8. Click CONTINUE.

  9. Proceed to configuring the data ingestion and setting up the Destination.
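The Service Account Key file attached in step 5 is a standard Google Cloud service account key in JSON format. A representative, redacted shape is shown below; every value is a placeholder, and the project and account names are purely illustrative:

```json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "0123abcd...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "hevo-drive@my-project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```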


Treatment of Duplicate Column Headers

To uniquely identify each column in the Source during ingestion, Hevo renames any duplicate column headers it finds, as follows:

  • Columns with the same header: The column header is changed to lowercase, and the column letter is suffixed to it. For example, if there are three columns named test in columns G, H, and K, these are changed to test_g, test_h, and test_k. The same applies to headers containing spaces. For example, if there are three columns named Software Used in columns H, I, and J, these are changed to software_used_h, software_used_i, and software_used_j.

  • Columns with the same header but different case: All such column headers are changed to lowercase and assigned a sequential numeric suffix. For example, if there are three columns, test, Test, and tEst, these are changed to test, test_1, and test_2 during the auto-mapping phase. The same applies to headers containing spaces. For example, if there are two columns, Software Used and Software used, these are changed to software_used and software_used_1.

  • Columns with __Hevo_id as the header: The column header is changed to __hevo_id, and the __hevo_id column that Hevo adds to keep track of offsets during ingestion is named as __hevo_id_1.

  • Columns with __hevo_id as the header: Hevo suffixes the column letter to the column header. For example, if the __hevo_id column lies in column B, it is renamed to __hevo_id_b. If a __hevo_id_b column is also present in the Source column list, the original __hevo_id column is renamed to __hevo_id_b_b. This is done because Hevo adds the column __hevo_id as a primary key to keep track of offsets during ingestion; any existing column with that name is therefore renamed.
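The first two renaming rules above can be sketched in Python. This is an illustrative approximation, not Hevo's actual implementation: it assumes the headers occupy contiguous spreadsheet columns starting at A (the examples above use non-contiguous columns such as G, H, and K), and for simplicity it normalizes every header to lowercase with underscores.

```python
from string import ascii_lowercase

def dedupe_headers(headers):
    """Approximate the duplicate-header renaming rules described above.

    Assumes headers sit in contiguous columns A, B, C, ... (illustrative
    simplification; supports up to 26 columns).
    """
    lowered = [h.strip().lower().replace(" ", "_") for h in headers]
    out = []
    for i, low in enumerate(lowered):
        dup_idx = [j for j, l in enumerate(lowered) if l == low]
        if len(dup_idx) == 1:
            out.append(low)  # unique header: normalized form only
        elif all(headers[j] == headers[dup_idx[0]] for j in dup_idx):
            # Identical headers: suffix the spreadsheet column letter.
            out.append(f"{low}_{ascii_lowercase[i]}")
        else:
            # Same header, different case: sequential numeric suffix,
            # with the first occurrence left unsuffixed.
            k = dup_idx.index(i)
            out.append(low if k == 0 else f"{low}_{k}")
    return out
```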


Source Considerations

If your Google Drive account does not contain any new Events to be ingested, Hevo defers the data ingestion for a pre-determined time. Hevo re-attempts to fetch the data only after the deferment period elapses. Read Deferred Data Ingestion.
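The deferment behavior can be pictured with a small sketch. The function name and delay values below are hypothetical, chosen purely to illustrate the idea of backing off when no new Events are found:

```python
def next_poll_delay_s(found_new_events, normal_cadence_s=300, deferment_s=3600):
    # Hypothetical values: poll at the normal cadence while new Events keep
    # arriving; otherwise defer the next fetch until the deferment period
    # elapses, as described in Deferred Data Ingestion.
    return normal_cadence_s if found_new_events else deferment_s
```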


Schema and Data Model

The following are some important considerations regarding the schema and the data model used by Hevo to ingest and upload the Google Drive data to the Destination:

  • Hevo treats each Google Drive folder as a Pipeline object.

  • Google Sheets: Hevo uses the naming convention foldername_worksheetname to name the Event Types, where foldername is the name of the Google Drive folder containing the Google Sheet or Excel spreadsheet, and worksheetname is the name of the worksheet tab within the spreadsheet. For example, if your folder name is Westeros and the worksheet name is usa, the name of the Event Type becomes Westeros_usa.

    For Google Sheet or Excel files that contain multiple worksheet tabs:

    • If the worksheet names differ across Google Sheets, Hevo ingests each worksheet as an individual Event Type and replicates its data to a corresponding table in the Destination. For example, if there are two worksheets named Westeros Sheet 1 and Westeros Sheet 2 in two different Google Sheets, Hevo ingests them as individual Event Types named westeros_sheet_1 and westeros_sheet_2.

      Note: Each Event Type consists of all the fields in the corresponding worksheet.

    • If the worksheet names are the same across multiple Google Sheets, Hevo ingests all such worksheets as a single Event Type and replicates them to one table in the Destination. For example, if there are two worksheets named Westeros Sheet 1 in two different Google Sheets, Hevo ingests them as one Event Type named westeros_sheet_1, and the data in both sheets is loaded into one Destination table.

    • If the worksheets contain duplicate column headers, Hevo updates the column headers as described in the section Treatment of Duplicate Column Headers, and then ingests and loads them to your Destination.

  • CSV and TSV Files: Hevo ingests data from all the files of a Google Drive folder as one Event Type. The name of the Event Type is the name of the Drive folder. For example, if the name of your Google Drive folder is Westeros, all CSV files are ingested into the Event Type Westeros.

  • __hevo_id, __hevo_ingested_at, and __hevo_loaded_at are additional columns that Hevo creates in the Destination system while replicating the Google Drive data. Read Hevo-generated metadata.
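The naming conventions above can be summarized in a short sketch. This is an illustrative approximation, not Hevo's actual code; the examples in this section vary in casing (Westeros_usa vs. westeros_sheet_1), so this sketch simply lowercases throughout:

```python
def event_type_name(folder_name, worksheet_name=None):
    # Sheets/Excel worksheets map to foldername_worksheetname; CSV/TSV files
    # map to the folder name alone. Lowercasing and space-to-underscore
    # normalization follow the westeros_sheet_1 example in this section.
    base = folder_name if worksheet_name is None else f"{folder_name}_{worksheet_name}"
    return base.strip().lower().replace(" ", "_")
```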


Limitations

  • Hevo supports ingesting only Excel, Google Sheets, CSV, and TSV files from Drive.

  • If the name of any Drive folder is modified after the Pipeline is created, new Events are ingested under a different Event name derived from the new folder name.



See Also


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Sep-21-2022 NA Updated section, Schema and Data Model to add examples for the scenarios.
Apr-11-2022 NA Updated screenshot in the Configuring Google Drive as a Source section to reflect the latest UI.
Apr-11-2022 1.86 Updated sections, Schema and Data Model and Limitations to reflect support for TSV file format.
Apr-11-2022 1.86 Added section, Treatment of Duplicate Column Headers.
Mar-21-2022 1.85 Updated section, Limitations to remove the point about Hevo not supporting UTF-16 encoding format for CSV files.
Mar-07-2022 1.83 Added content in the section, Schema and Data Model, regarding worksheets with duplicate column headers.
Dec-06-2021 1.77 Removed the limitation of not ingesting data from shared Drives, as Hevo supports it now.
Sep-09-2021 NA Added the limitation about Hevo not supporting ingestion of shared Drives. Read Limitations.
Aug-08-2021 NA Added a note in the Source Considerations section about Hevo deferring data ingestion in Pipelines created with this Source.
Jul-26-2021 NA Added a note in the Overview section about Hevo providing a fully-managed Google BigQuery Destination for Pipelines created with this Source.
Jun-28-2021 1.66 Updated the page overview with information about __hevo_source_modified_at being uploaded as a metadata field from Release 1.66 onwards.
May-05-2021 1.62 - Included steps to connect to Drive using a service account.
- Updated the document as per the latest UI and functionality.
Feb-22-2021 NA Added the limitation about Hevo not supporting UTF-16 encoding format for CSV data. Read Limitations.
Last updated on 27 Sep 2022
