Google Sheets

Hevo can replicate your Google Sheets data to your Destination using Google’s Sheets API.

For creating Pipelines using this Source, Hevo provides you a fully managed BigQuery data warehouse as a possible Destination. This option remains available till the time you set up your first BigQuery Destination irrespective of any other Destinations that you may have. With the managed warehouse, you are only charged the cost that Hevo incurs for your project in Google BigQuery. The invoice is generated at the end of each month and payment is recovered as per the payment instrument you have set up. You can now create your Pipeline and directly start analyzing your Source data. Read Hevo Managed Google BigQuery.

User Authentication

You can connect to the Google Sheets Source via User accounts (OAuth) or Service accounts. One service account can be mapped to your entire team. Read Google Account Authentication Methods to know how to set up a service account if you do not already have one.


Prerequisites


Configuring Google Sheets as a Source

Perform the following steps to configure Google Sheets as a Source in your Pipeline:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipelines List View.

  3. In the Select Source Type page, select Google Sheets.

  4. In the Configure your Google Sheets account page, select the authentication method for connecting to Google Sheets.

    Sheets Authentication Method

  5. Do one of the following:

    • To connect using a User Account:

      1. Click + ADD GOOGLE SHEETS ACCOUNT.

      2. Select the Google account associated with your Google Sheets data and click ALLOW to authorize Hevo to access the data.

    • To connect using a Service Account:

      1. Attach the Service Account Key file.

        Note: Hevo supports only JSON format for the key file.

      2. Click CONFIGURE GOOGLE SHEETS ACCOUNT.

  6. In the Configure your Google Sheets Source page, specify the following:

    Google Sheets Configuration

    • Pipeline Name: A unique name for the Pipeline.

    • Sheets: Select the check box next to the spreadsheets whose data you wish to replicate. By default, all the tabs of a sheet are selected. However, you can click on the Expand icon for a sheet to display the list of tabs it contains and select the ones you need.

  7. Click TEST & CONTINUE.

  8. Proceed to configuring the data ingestion and setting up the Destination.


Pipeline Behaviour

  • Values in the first row of the sheet are treated as field headers. These headers need to be unique.

    If a header is missing, then the column name is used as the header.

  • Renaming your sheet does not cause your Pipeline to break.

  • Deleted rows are represented by a __hevo_marked_deleted field in your Destination table.

  • Row number is used as a primary key which is represented by a field called __hevo_id in your Destination table.

  • Cells containing formulas or scientific numbers are loaded as their calculated value.

    Example: 6.16E+12 is loaded as 6160000000000.

  • Search option while selecting your sheets only works on prefixes.

  • The Pipeline supports only Google Sheets. For MS Excel files, please check out the Drive Source.


Data Replication

Default Pipeline Frequency Minimum Pipeline Frequency Maximum Pipeline Frequency Custom Frequency Range (Hrs)
5 Mins 5 Mins 48 Hrs 1-48

Note: The custom frequency must be set in hours, as an integer value. For example, 1, 2, 3 but not 1.5 or 1.75.

  • Historical Data: By default, Hevo starts fetching data from the time your Google Sheets account was created. The Pipeline takes the offset for ingesting data from the last_modified_at timestamp of the sheet.

  • Incremental Data: Google Sheets does not provide any mechanism to detect changes made to specific rows in a spreadsheet. The only available information is the last_modified_at timestamp of the spreadsheet, resulting in a timestamp-based offset. If any changes are detected via the modification time, then all the data present in that spreadsheet is ingested again. Hevo also fetches the Events object from Google Sheets, which tracks the updates and deletes made in the last 30 days, in order to sync the related Events with your Destination.

  • Data Refresh: Not applicable.


Billing Considerations

The first load of the historical data is free for Google Sheets. Subsequently, if any data changes in a sheet at Source, the entire sheet is re-ingested as there is no way to identify the change. All such subsequent loads count towards your Events quota consumption and are billed accordingly.


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Oct-25-2021 NA Added the Pipeline frequency information in the Data Replication section.
Sep-20-2021 1.72 Added the section, Data Replication.
May-05-2021 1.62 Added steps to connect to Google Sheets Source using a service account.
Apr-13-2021 NA Added the section, Billing Considerations.
Last updated on 22 Oct 2021