Google Sheets

Hevo can replicate your Google Sheets data to your Destination using Google’s Sheets API.

For creating Pipelines using this Source, Hevo provides you a fully managed BigQuery data warehouse as a possible Destination. This option remains available till the time you set up your first BigQuery Destination irrespective of any other Destinations that you may have. With the managed warehouse, you are only charged the cost that Hevo incurs for your project in Google BigQuery. The invoice is generated at the end of each month and payment is recovered as per the payment instrument you have set up. You can now create your Pipeline and directly start analyzing your Source data. Read Hevo Managed Google BigQuery.

User Authentication

You can connect to the Google Sheets Source via User accounts (OAuth) or Service accounts. One service account can be mapped to your entire team. Read Google Account Authentication Methods to know how to set up a service account if you do not already have one.


Prerequisites


Configuring Google Sheets as a Source

Perform the following steps to configure Google Sheets as a Source in your Pipeline:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipelines List View.

  3. In the Select Source Type page, select Google Sheets.

  4. In the Configure your Google Sheets account page, select the authentication method for connecting to Google Sheets.

    Sheets Authentication Method

  5. Do one of the following:

    • To connect using a User Account:

      1. Click + ADD GOOGLE SHEETS ACCOUNT.

      2. Select the Google account associated with your Google Sheets data and click ALLOW to authorize Hevo to access the data.

    • To connect using a Service Account:

      1. Attach the Service Account Key file.

        Note: Hevo supports only JSON format for the key file.

      2. Click CONFIGURE GOOGLE SHEETS ACCOUNT.

  6. In the Configure your Google Sheets Source page, specify the following:

    Google Sheets Configuration

    • Pipeline Name: A unique name for the Pipeline.

    • Sheets: Select the check box next to the Sheets whose data you wish to replicate. By default, all the tabs of a Sheet are selected. However, you can click on the Expand icon for a Sheet to display the list of tabs it contains and select the ones you need.

    • Refresh Daily (Optional): If enabled, Hevo re-ingests the selected Sheets, in case no change to the data is detected for 24 hours since the last ingestion. If disabled, Hevo only re-ingests the Sheet if it detects any new changes to the data.

  7. Click TEST & CONTINUE.

  8. Proceed to configuring the data ingestion and setting up the Destination.


Pipeline Behavior

  • Values in the first row of the Sheet are treated as field headers. These headers need to be unique. Read Treatment of Duplicate Column Headers.

    If a header is missing, then the column name is used as the header.

  • Renaming your Sheet does not cause your Pipeline to break.

  • Deleted rows are represented by a __hevo__marked_deleted field in your Destination table.

  • Row number is used as a primary key which is represented by a field called __hevo_id in your Destination table. Read Treatment of Duplicate Column Headers.

  • Cells containing formulas or scientific numbers are loaded as their calculated value.

    Example: 6.16E+12 is loaded as 6160000000000.

  • Search option while selecting your Sheets only works on prefixes.

  • The Pipeline supports only Google Sheets. For MS Excel files, please check out the Drive Source.


Treatment of Duplicate Column Headers

To identify each column in the Source uniquely during ingestion, Hevo renames any duplicate column headers that are found, as:

  • Columns with the same header: The column header is changed to lowercase and the column number is suffixed to it. For example, if there are three columns, test, test, and test in columns G, H, and K, respectively, these are changed to test_g, test_h, and test_k. The same happens for headers with space between them. For example, if there are three columns, Software Used, Software Used, and Software Used in columns H, I, and J, respectively, these are changed to software_used_h, software_used_i, and software_used_j.

  • Columns with same header but different case: All column headers are changed to lowercase and assigned a sequential, numeric suffix. For example, if there are three columns, test, Test, and tEst, these are changed to test, test_1, and test_2 during the Auto Mapping phase. The same happens for headers with space between them. For example, if there are two columns, Software Used and Software used, these are changed to software_used, and software_used_1.

  • Columns with __Hevo_id as the header: The column header is changed to __hevo_id, and the __hevo_id column that Hevo adds to keep track of offsets during ingestion is named as __hevo_id_1.

  • Columns with __hevo_id as the header: Hevo suffixes the column number to the column header. For example, if the __hevo_id_ column lies in column number B, it is renamed to __hevo_id_b. If the __hevo_id_b column is also present in the Source column list, the original __hevo_id column is named to __hevoid_b_b. This is done because Hevo adds the column, __hevo_id_ as a primary key to keep track of offsets during the ingestion process. Towards this purpose, any existing column with that name is renamed.


Data Replication

Default Pipeline Frequency Minimum Pipeline Frequency Maximum Pipeline Frequency Custom Frequency Range (Hrs)
5 Mins 5 Mins 48 Hrs 1-48

Note: The custom frequency must be set in hours, as an integer value. For example, 1, 2, 3 but not 1.5 or 1.75.

  • Historical Data: By default, Hevo starts fetching data from the time your Google Sheets account was created. The Pipeline takes the offset for ingesting data from the last_modified_at timestamp of the Sheet.

  • Incremental Data: Google Sheets does not provide any mechanism to detect changes made to specific rows in a Sheet. The only available information is the last_modified_at timestamp of the Sheet, resulting in a timestamp-based offset. If any changes are detected via the modification time, then all the data present in that Sheet is ingested again. Hevo also fetches the Events object from Google Sheets, which tracks the updates and deletes made in the last 30 days, in order to sync the related Events with your Destination.

  • Data Refresh: If the Refresh Daily option is enabled during Source configuration, Hevo re-ingests all the data from the selected Google Sheets in case no changes are detected via the modification time for 24 hours. So, every time there is no data ingestion for 24 hours, a data refresh happens. This is done in order to ensure no Events are missed for ingestion.


Source Considerations

We recommend enabling the Refresh Daily option in the Source configuration if you are importing data from other worksheets into the Google Sheets selected for replication. At times, there can be a delay between the last_modified_at timestamp change and the data getting successfully imported. As a result, no new data may get ingested if Hevo polls for a last_modified_at time between this delay, as the import operation would still be incomplete but the timestamp-based offset would have been incremented. A daily refresh ensures all such data gets synchronized to the Destination successfully by re-ingesting all the data in the Google Sheets if no changes are detected for 24 hours. All Events ingested during data refresh count towards your Events quota consumption.


Billing Considerations

The first load of the historical data is free for Google Sheets. Subsequently, if any data changes in a Sheet at Source, the entire Sheet is re-ingested as there is no way to identify the change. All such subsequent loads count towards your Events quota consumption and are billed accordingly.


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Apr-11-2022 NA Updated screenshot in the Configuring Google Sheets as a Source section to reflect the latest UI.
Feb-21-2022 1.82 Added section, Treatment of Duplicate Column Headers.
Dec-06-2021 1.77 - Added section, Source Considerations.
- Updated section, Configuring Google Sheets as a Source to add information about data refresh.
Oct-25-2021 NA Added the Pipeline frequency information in the Data Replication section.
Sep-20-2021 1.72 Added the section, Data Replication.
May-05-2021 1.62 Added steps to connect to Google Sheets Source using a service account.
Apr-13-2021 NA Added the section, Billing Considerations.
Last updated on 28 Apr 2022