Google Drive

You can load data from files in your Google Drive into a Destination database or data warehouse using Hevo Pipelines.

For creating Pipelines using this Source, Hevo provides you with a fully managed BigQuery data warehouse as a possible Destination. This option remains available until you set up your first BigQuery Destination, irrespective of any other Destinations that you may have. With the managed warehouse, you are charged only the cost that Hevo incurs for your project in Google BigQuery. The invoice is generated at the end of each month, and the payment is recovered through the payment instrument you have set up. You can now create your Pipeline and directly start analyzing your Source data. Read Hevo Managed Google BigQuery.

As of Release 1.66, __hevo_source_modified_at is uploaded to the Destination as a metadata field. For existing Pipelines that have this field:

  • If this field is displayed in the Schema Mapper, you must ignore it and not map it to a Destination table column; otherwise, the Pipeline displays an error.

  • Hevo automatically loads this information in the __hevo_source_modified_at column, which is already present in the Destination table.

You can, however, continue to use __hevo_source_modified_at to create transformations using the function event.getSourceModifiedAt(). Read Metadata Column __hevo_source_modified_at.
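
For example, a Transformation along the following lines could copy the source modification timestamp into a custom field. This is a minimal sketch that assumes the transform(event) entry point and event.getProperties() method of Hevo's Python-based Transformations; the field name last_modified_in_source is only illustrative.

```python
def transform(event):
    properties = event.getProperties()

    # Copy the source file's modification timestamp into an illustrative
    # custom field so it can be queried alongside the other columns.
    properties['last_modified_in_source'] = event.getSourceModifiedAt()

    return event
```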

Existing Pipelines that do not have this field are not impacted.

User Authentication

You can connect to the Google Drive Source via User accounts (OAuth) or Service accounts. One service account can be mapped to your entire team. Read Google Account Authentication Methods to know how to set up a service account if you do not already have one.
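
Outside of Hevo, you can verify that a service account has access to your Drive data by using Google's client libraries with the same key file. The following is a minimal sketch that assumes the google-auth and google-api-python-client Python packages and a key file named service-account-key.json (both are assumptions, not part of Hevo's setup flow); it only lists a few files visible to the service account.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

# Build read-only Drive API credentials from the service account key file.
credentials = service_account.Credentials.from_service_account_file(
    'service-account-key.json', scopes=SCOPES)

drive = build('drive', 'v3', credentials=credentials)

# List a few files the service account can see. Folders typically need to be
# shared with the service account's email address to appear in this list.
response = drive.files().list(pageSize=10, fields='files(id, name)').execute()
for f in response.get('files', []):
    print(f['id'], f['name'])
```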


Prerequisites

  • An active Google Drive account from which data is to be ingested.

  • The Google Drive API is enabled for the service account, if you are connecting through one.


Configuring Google Drive as a Source

Perform the following steps to configure Google Drive as a Source in your Pipeline:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipelines List View.

  3. In the Select Source Type page, select Google Drive.

  4. In the Configure your Drive account page, select the authentication method for connecting to Google Drive.

    Select authentication account

  5. Do one of the following:

    • To connect using a User Account:

      1. Click + ADD DRIVE ACCOUNT.

      2. Select the Google account associated with your Google Drive data and click ALLOW to authorize Hevo to access the data.

    • To connect using a Service Account:

      1. Attach the Service Account Key file.

        Note: Hevo supports only JSON format for the key file.

      2. Click CONFIGURE DRIVE ACCOUNT.

  6. Select the folders that contain the files whose data is to be replicated. Hevo ingests data from all supported files present in the chosen folders.

  7. In the Configure your Drive Source page, specify the following:

    Google Drive Settings

    • Pipeline Name: A unique name for your Pipeline, not exceeding 255 characters.

    • Folders: Select the check box next to each folder whose files you want to replicate to the Destination system. Hevo ingests data from all supported files present in the selected folders.

      Note: If a folder contains a child folder, you must explicitly select the child folder to replicate its files to the Destination system.

  8. Click CONTINUE.

  9. Proceed to configuring the data ingestion and setting up the Destination.


Source Considerations

  • Files must be less than 10 MB in size for Google Drive to process them. Read Google Drive Files Export for more information.

If your Google Drive account does not contain any new Events to be ingested, Hevo defers the data ingestion for a pre-determined time. Hevo re-attempts to fetch the data only after the deferment period elapses. Read Deferred Data Ingestion.


Schema and Data Model

The following are some important considerations regarding the schema and the data model used by Hevo to ingest and load the Google Drive data to the Destination:

  • Hevo treats each Google Drive folder as a Pipeline object.

  • Google Sheets and Excel files: Hevo uses the naming convention foldername_worksheetname to name the Event Types, where foldername is the name of the Google Drive folder containing the Google Sheet or Excel spreadsheet, and worksheetname is the name of the worksheet tab within the spreadsheet. For example, if your folder name is Westeros and the worksheet name is usa, the name of the Event Type becomes Westeros_usa. (A sketch after this list illustrates the naming convention.)

    For Google Sheet or Excel files that contain multiple worksheet tabs:

    • If worksheet names are different across different Google Sheets, Hevo ingests each worksheet as an individual Event Type and replicates its data to a corresponding table in the Destination.

      Note: Each Event Type consists of all the fields in the corresponding worksheet.

    • If worksheet names are the same across multiple Google Sheets, Hevo ingests all the worksheets as a single Event Type and replicates them to one table in the Destination.

  • CSV Files: Hevo ingests data from all the CSV files of a Google Drive folder as one Event Type. The name of the Event Type is the name of the Drive folder. For example, if the name of your Google Drive folder is Westeros, all CSV files are ingested into the Event Type Westeros.

  • __hevo_id, __hevo_ingested_at, and __hevo_loaded_at are additional columns that Hevo creates in the Destination system while replicating the Google Drive data. Read Hevo-generated metadata.
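
As an illustration of the naming scheme described above, the following sketch derives an Event Type name from a folder name and an optional worksheet name. It is only an illustration of the convention, not Hevo's implementation.

```python
def event_type_name(folder_name, worksheet_name=None):
    # Google Sheets / Excel worksheets: foldername_worksheetname, e.g. Westeros_usa.
    if worksheet_name:
        return f"{folder_name}_{worksheet_name}"
    # CSV files: all files in the folder share one Event Type named after
    # the folder, e.g. Westeros.
    return folder_name

print(event_type_name("Westeros", "usa"))  # Westeros_usa
print(event_type_name("Westeros"))         # Westeros
```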


Limitations

  • Hevo supports ingesting only Excel, Google Sheets, and CSV files from Google Drive.

  • Hevo does not support the UTF-16 encoding format for CSV files. As a workaround, you can convert the files to the UTF-8 encoding format before they are ingested by the Pipeline (see the sketch after this list).

  • The names of the Drive folders should not be modified once the Pipeline is created. Otherwise, new Events are ingested under a different Event Type name that is derived from the new folder name.

  • Hevo does not support ingesting data from shared Drives.
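
The following is a minimal sketch of the UTF-16 workaround mentioned in the Limitations above, re-encoding a CSV file to UTF-8 before it is uploaded to Google Drive; the file names are illustrative.

```python
# Read the original file as UTF-16 and rewrite it as UTF-8.
with open('report_utf16.csv', 'r', encoding='utf-16') as source:
    data = source.read()

with open('report_utf8.csv', 'w', encoding='utf-8') as target:
    target.write(data)
```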



Revision History

Refer to the following table for the list of key updates made to this page:

| Date | Release | Description of Change |
| --- | --- | --- |
| Sep-09-2021 | NA | Added the limitation about Hevo not supporting ingestion from shared Drives. Read Limitations. |
| Aug-08-2021 | NA | Added a note in the Source Considerations section about Hevo deferring data ingestion in Pipelines created with this Source. |
| Jul-26-2021 | NA | Added a note in the Overview section about Hevo providing a fully managed Google BigQuery Destination for Pipelines created with this Source. |
| Jun-28-2021 | 1.66 | Updated the page overview with information about __hevo_source_modified_at being uploaded as a metadata field from Release 1.66 onwards. |
| May-05-2021 | 1.62 | Included steps to connect to Drive using a service account. Updated the document as per the latest UI and functionality. |
| Feb-22-2021 | NA | Added the limitation about Hevo not supporting the UTF-16 encoding format for CSV data. Read Limitations. |
Last updated on 13 Sep 2021