Pipeline FAQs

Will my Pipeline get paused if I exceed the monthly Events quota?

Your Pipeline is not paused if you exceed your Events quota. However, the excess Events are sidelined by the Pipeline and stored in a data warehouse until you purchase additional Events, at which point they are replayed.

Read Billing Alerts for more information.


Is my data stored after I delete a Pipeline?

All the data that has been ingested but not yet loaded into the Destination is dropped automatically once you delete the Pipeline. Any Transformations and Schemas in the Pipeline are also deleted. Only the Usage data summary of the deleted Pipeline remains with Hevo.

The data already loaded to the Destination (before Pipeline deletion) remains unaffected.


How can I change the data type of a Destination table column?

You can change the data type of a Destination table column while ingesting data from the Source either through Transformations or the Schema Mapper. You can only change the data type to one with a broader definition. For example, you can change int to float or varchar, but you cannot change varchar to any other type.

Changing the data type of a field that is a primary key can lead to a data mismatch in the Destination table. Therefore, all Events containing that field are sidelined. To successfully reload those Events, you must undo the change and then replay them.

Through Transformations:

In Hevo, you can design and apply the Format Number to String and Format Date to String Transformations to Events and Event Types using either the Drag-and-Drop or the Python script option.

Examples:

  • Let us assume that the data type int from the Source is mapped to the Destination data type int, and you want to change the Destination data type to string. You can do this using the Format Number to String transformation.

    setting-up-filters

    number-to-string

  • The following Python script converts the data type of nested_field to string:

    from io.hevo.api import Event

    def transform(event):

        properties = event.getProperties()

        # Convert the value of nested_field to a string before it is loaded
        properties['nested_field'] = str(properties['nested_field'])

        return event
    

Through the Schema Mapper:

  1. Disable Auto Mapping and select Drop Table from the Kebab menu next to the Destination table name.

    drop-table

  2. Click Create Table and Map.

    create-table-&-map

  3. Modify the data types for the fields as required.

    manual-mapping

Read Handling Different Data Types in Source Data for more information.


How can I load only filtered Events from the Source to the Destination?

You can use the Drop Events Transformation to filter Events and load only those to the Destination.
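
If you prefer the Python script option over the drag-and-drop block, the following is a minimal sketch of an equivalent filter. The Event Type name (orders) and the field (amount) are hypothetical, and the sketch assumes the documented behavior that an Event for which the transform method returns None is skipped and not loaded to the Destination.

    from io.hevo.api import Event

    def transform(event):

        properties = event.getProperties()

        # Hypothetical filter: keep only Events of the 'orders' Event Type
        # whose 'amount' field is greater than zero.
        if event.getEventName() == 'orders' and properties.get('amount', 0) > 0:
            return event

        # Returning None drops the Event, so it is not loaded to the Destination.
        return None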

Read Drop Events for more information.


My Historical Load Progress is still at 0%. What does it mean?

The progress represents the percentage of objects ingested out of all the active objects. A value of 0% means that ingestion is in progress and none of the objects have been completely ingested. For example, if your Source has only one object, the Historical Load Progress changes from 0% to 100% upon successful ingestion of that object.

Read Viewing Pipeline Progress for more information.


How can I change sort keys and their order in a Redshift Destination table?

You cannot change the sort keys and their order in an existing Redshift Destination table. To change the sort keys, you must drop the Destination table and create a new one with the desired sort keys. However, by doing this, you lose your existing data in that Destination table. You can re-ingest the historical data into the new table by using the Restart Historical Load option for the Pipeline Object from the Pipeline Detailed View. The re-ingested historical data does not count towards your quota consumption and is not billed.

Perform the following steps to change the sort keys and their order:

  1. Disable Auto Mapping for the specific table from the Schema Mapper.

    Disable the Auto Mapping option

  2. Click the Kebab menu icon next to the Destination Table name and click Drop Table.

    Click on Drop Table

  3. Click CREATE TABLE & MAP to create a new Destination table.

    Create a new table

  4. Select the Sort Key check box for all the fields you want to specify as sort keys.

  5. Use the Up/Down arrow keys next to the field name to change its order of placement. For better query performance, place the sort key fields together at the top of the Destination table.

    Select the Sort Keys and change the order using arrow keys

  6. Specify the Destination Table Name and click CREATE TABLE & MAP.

A new Destination table is created with the specified sort keys and order. The data is replicated into this table as per the Pipeline schedule.


How can I make sure that each record is loaded only once?

To ensure that each record is loaded only once, you can take some actions prior to and post-Pipeline creation:

  • Pre-Pipeline creation:
    • Create the Pipeline with Auto Mapping disabled. Manually map the Event Type (Source object) and define one of the fields as the primary key. This field must have unique and non-null values. Finally, enable Auto Mapping for the Event Type.

    • Disable the Append Rows on Update option for the table in the Destination Overview page.

      Disable Append Rows on Update

      Read How do I enable or disable the deduplication of records in my Destination tables?

      Note: This feature is available only for Amazon Redshift, Google BigQuery, and Snowflake data warehouse Destinations.

  • Post-Pipeline creation:

    1. Disable Auto Mapping for the Event Type from the Schema Mapper.

      Disable Auto Mapping

    2. Click the Kebab menu icon next to the Destination Table name and click Drop Table.

      Drop present Destination table

    3. Click CREATE TABLE & MAP to create a new Destination table.

      Create a new Destination table

    4. Set a field as the primary key by selecting the Primary Key check box for it.

      Select Primary key fields

    5. Specify the Destination Table Name and click CREATE TABLE & MAP.

    The data is replicated to the new Destination table as per the Pipeline schedule, using the defined primary keys to ensure no duplicate Events are created.


How does changing the query mode of a Pipeline in table mode affect data ingestion?

If you change the query mode for an Event Type where the Pipeline mode is Table, the data is ingested again from the beginning, which overwrites the previously ingested data. This re-ingestion is considered similar to the historical load. It does not count towards your Events quota consumption and is not billed.


How does Transformation affect the structure of the Destination table?

You can use Transformations for preparing the data before replicating it to the Destination, such as cleansing it, adding or dropping fields, changing the data type of the fields, or updating the field values based on specific conditions. Some of these Transformations may result in a change in the structure of the Destination table. Read Effect of Transformations on the Destination Table Structure for more information.
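
For example, a Transformation that adds a field not present in the Source data typically results in a new column in the Destination table once Auto Mapping picks it up. The following is a minimal sketch; the first_name, last_name, and full_name fields are hypothetical:

    from io.hevo.api import Event

    def transform(event):

        properties = event.getProperties()

        # Adding a field that does not exist in the Source. With Auto Mapping
        # enabled, a corresponding column (full_name) is typically created in
        # the Destination table the next time this Event Type is loaded.
        properties['full_name'] = '{} {}'.format(
            properties.get('first_name', ''), properties.get('last_name', ''))

        return event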


How can I set a field as a primary key to avoid duplication?

You can set a field as a primary key in the Destination table through either Transformations or Schema Mapper. The Transformation method works only for warehouse Destinations and not for database and Firebolt Destinations. In the Schema Mapper, you need to drop the existing Destination table and create a new table with the desired primary keys.

To set a field as a primary key through Transformations:

  1. Click Transformations to go to the Python-based Transformation interface.

    Python-based Transformation interface

  2. Write the following code in the CODE panel:

     from io.hevo.api import Event
    
     def transform(event):
    
       # Set a field (or a list of fields) as the primary key
       event.setPrimaryKeyFields(['journey_id', 'key'])
    
       properties = event.getProperties()
       properties['new_pk_fields'] = event.getPrimaryKeyFields()
    
       return event
    
    

    Here, the fields journey_id and key are set as the primary key. Replace these with the fields you need.

    NOTE: Nullable columns cannot be marked as primary keys in Redshift Destinations.

  3. Click DEPLOY to apply the Transformation.

    Deploy the Transformation

  4. Click Reset Schema for the Event Type from the Schema Mapper page. This reloads the schema in the Destination table as per the newly defined primary key fields.

    Click Reset Schema option

The data is ingested as per the Pipeline schedule. To apply the Transformation on existing data, click Restart Object from the Pipeline Overview page. The re-ingested data counts towards your Events quota consumption and is billed.

Restart Object to apply Transformation on existing data

NOTE: You may get duplicate records in the Destination table; you need to remove them manually.

To set a field as a primary key through the Schema Mapper:

  1. Disable Auto Mapping for the Event Type.

    Disable Auto Mapping

  2. Click the Kebab menu icon next to the Destination Table name and click Drop Table.

    Drop present Destination Table

  3. Click CREATE TABLE & MAP to create a new Destination table.

    Create a new Destination Table

  4. To set the field as a primary key, select the check box under the Primary Key column.

    Select Primary Key fields

    NOTE: Nullable columns cannot be marked as primary keys in Redshift Destinations.

  5. Specify the Destination Table Name and click CREATE TABLE & MAP.

A new Destination table is created with the specified primary keys. The data is replicated into this table as per the Pipeline schedule.


Does triggering ingestion using Run Now affect the scheduled ingestion frequency of the Pipeline?

Manually triggering the ingestion using the Run Now option does not affect the scheduled ingestion for the Object, which continues to occur as per the frequency set for the Pipeline.

Suppose you create a Pipeline at 3:00 p.m. (UTC) and set 1 Hour as the ingestion frequency. If you trigger ingestion using Run Now at 3:15 p.m., the Events are ingested once, and the next ingestion happens at 4:00 p.m. as per the defined schedule; it is not moved to 4:15 p.m. Thus, the Events are ingested at 3:15 p.m., 4:00 p.m., 5:00 p.m., and so on.

Read Scheduling a Pipeline for more information.


Does creating a Pipeline incur any cost?

Creating a Pipeline does not incur any cost. Costs are incurred for the Events that are loaded to a Destination, based on the quota available in your pricing plan. This applies even to users on Free plans.

Hevo Trial Accounts are free. Therefore, during the trial period, there are no restrictions and you can load unlimited Events to the Destination. Once you switch over to a plan, the cost is decided based on your plan’s quota.

A paused or deleted Pipeline does not incur cost.


Why is there a delay in my Pipeline?

Hevo is built on a real-time data ingestion architecture. That means Hevo ingests data in real time and writes it to the Destination as soon as possible. Hevo’s architecture allows it to scale horizontally whenever it detects a higher volume of Events being ingested through the Pipelines. Still, there are situations where you might see a delay in your Pipelines, such as:

  • Replay of a large number of failed Events in one or more of your Pipelines. Replayed Events are fed back to the Pipelines and share the same resources that are used by the Pipeline itself. In this scenario, you can stop replaying the Events in the Pipeline from the Pipeline Overview page.
  • Slow Destination. In cases of Destinations where data upload is not done through files, for example, MySQL and Postgres, the Pipeline may experience delays. This happens because Hevo cannot write data into the Destination as fast as it is ingesting it from the Source. If you think the slowness in the Destination is temporary, you may wait until it is resolved; otherwise, you should upgrade the hardware configuration of the Destination to allow it to accept a higher rate of writes.

Things to Note

  • Pipelines across different accounts in Hevo do not affect each other.

When should I pause vs delete my Pipeline?

You can opt to pause an active Pipeline when you want to hold off the ingestion for some time and resume it later.

However, if you do not need a Pipeline anymore, you can delete it. Deleting a Pipeline may also help free up your Source quota. A deleted Pipeline is no longer accessible and cannot be restored.

Read more about Pausing and Deleting a Pipeline.


Can I import standard Python libraries in transformation?

Hevo supports the Jython environment. Jython does not support third-party Python libraries that use extensions written in C. This means that popular Python libraries like numpy, scipy and scikit-learn will not work in Jython.
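
Pure-Python standard library modules, such as json, re, and datetime, generally work in the Jython environment. The following is a minimal sketch; the payload field and the derived payload_parsed and processed_at fields are hypothetical:

    from io.hevo.api import Event
    # Standard library modules that do not rely on C extensions generally work under Jython.
    import json
    import datetime

    def transform(event):

        properties = event.getProperties()

        # Hypothetical example: parse a JSON string field and record a processing timestamp.
        if properties.get('payload'):
            properties['payload_parsed'] = json.loads(properties['payload'])
        properties['processed_at'] = datetime.datetime.utcnow().isoformat()

        return event

    # import numpy  # Would fail: numpy depends on C extensions, which Jython cannot load.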

Read Python, Jython and Java to know more.


Why am I getting warnings while adding Pipelines?

Check your usage in the Plan Details page under Settings, Billing. Read Viewing Current Usage.

If you are on a Business plan, you can increase your Source quota by contacting the Hevo Sales team. Read Purchasing On-Demand Events.


How can I change the ingestion frequency for individual tables?

The ingestion frequency applies to the entire Pipeline. You cannot modify this for specific Objects. However, you can trigger the ingestion for individual Objects. To do this, click the Run Now option in the Action menu of the Object in the Pipeline Detailed View. The ingestion is started for the object immediately.

Run ingestion for an object

For more information, read more on Scheduling a Pipeline.

Note: Triggering ingestion using Run Now does not affect the scheduled ingestion for the Object. That occurs as per the frequency set for the Pipeline.


Can I connect tables from multiple Pipelines to a common Destination table?

Yes, you can select the same Destination table in multiple Pipelines. The Destination table contains all the columns of the Source tables. If any Source table column does not already exist in the Destination table, it is created.

To connect the Source table to the desired Destination table, in the Schema Mapper:

  1. Disable Auto Mapping for the Object from the Mapping Summary Page.

  2. Click Change Destination Table and select the Destination table from the drop-down menu.

  3. Click APPLY CHANGES.

  4. Repeat the above steps for all the tables in the other Pipelines.

Once the tables are connected, the data from them is loaded into the selected Destination table as per the primary key defined in the table and the schedule of the respective Pipelines.

Read more about Changing the Destination table.


How can I change or delete the Destination table prefix after creating a Pipeline?

You cannot change the Destination table prefix after creating the Pipeline. You can create another Pipeline with a new prefix and delete the previous Pipeline.

Alternatively, you can re-assign specific Objects to new Destination tables and give them the desired names, with or without a prefix, using the Schema Mapper. To do this:

  1. Disable the Auto Mapping for the specific Object from the Mapping Summary page.

  2. Use the Create Table & Map option to create the new Destination table.

  3. Specify the table name with the desired prefix in the Table Name field, and click CREATE TABLE & MAP.

The Object gets mapped to the new Destination table, and the data gets replicated into this table as per the Pipeline schedule.

Specify Destination Table name

For more information, read Mapping an Event Type to a New Destination Table.

Note: You can ingest the historical data into the new Destination table by using the Restart Historical Load option for the Object from the Pipeline Detailed View.


How can I transfer Excel files using Hevo?

You can load your Excel files in Google Drive and use Hevo Pipelines to transfer these to a Destination of your choice. To do this:

  1. Upload the Excel files into your Google Drive.

  2. Create a Pipeline with Google Drive as the Source. For more information, read Configuring Google Drive as a Source.

The Pipeline is created, and the data starts getting replicated to the Destination.

Note:

  • Hevo treats each Google Drive folder as a Pipeline object.

  • For Excel files having multiple worksheet tabs, Hevo ingests each tab as a separate Event Type and replicates the data to the corresponding Destination tables.


How can I add new sheets into an existing Pipeline created with Google Sheets as the Source?

You can add new sheets into an existing Pipeline by modifying the Source configuration settings of the Pipeline. To do this, in the Pipeline Detailed View:

  1. Click the Settings icon next to the Source name to edit the Source configuration.

    Edit Source Configuration

  2. Click the Edit icon in the hover window.

  3. In the Configure your Google Sheets Source page, select the check box next to the sheets whose data you want to replicate. By default, all the tabs of a sheet are selected. However, you can click on the Expand icon for a sheet to display the list of tabs it contains and select the ones you need.

  4. Click TEST & CONTINUE.

Add Google Sheets

The newly selected Google sheets are added to the Pipeline Object section and replicated to the Destination as per the Pipeline schedule.

Note: If the Pipeline uses OAuth for user authorization, then only the creator of the Pipeline can edit its configurations. Other users must reauthorize the account to make any changes.


How can I change the query mode for Pipelines created with Table Mode?

For an existing Pipeline created with the Table Pipeline mode, you can only change the query mode of individual objects and not of the entire Pipeline.

To change the query mode for an object:

  1. Select the Edit Config option from the Action menu of the object in the Pipeline Detailed View page.

    Select the Edit option

  2. Select the required Query Mode from the drop-down.

    Select the Query Mode

    Depending on the query mode, specify the following parameters:

    • Full Load: None
    • Delta-Timestamp: Timestamp Column Name and Timestamp Column Delay
    • Change Data Capture: Incrementing Column Names, Timestamp Column Names, and Timestamp Column Delay
    • Unique Incrementing Append Only: Unique Column Names

    For example, for the Delta-Timestamp query mode, you must specify the parameters as shown below:

    Delta-Timestamp Query Mode Example

  3. Click SAVE CHANGES.

The ingestion for the object starts once you change the query mode. Please note that the entire data is ingested again from the beginning. The re-ingested historical data is not billed.

Read Query Modes for Ingesting Data from Relational Databases for more information.

Note: This method works only for the Pipelines having a relational database (such as Amazon Aurora MySQL, Amazon Redshift, SQL Server, MySQL, or PostgreSQL) as the Source.


How can I restart the historical load for all the objects at once?

To restart the historical load for all the objects at once, in the Pipeline Detailed View:

  1. Select the Objects check box to select all the objects in the Pipeline. You can also select specific objects by selecting the check box next to their names.

    Restart Historical Load for all objects

  2. Select the Restart option from the MORE menu to start the historical data ingestion.

Note: The re-ingested historical data is not billed.


How does the timing of a schedule change work?

As soon as you change the schedule of a Pipeline, the ingestion of Events starts as per the new frequency.

Suppose you change the schedule at 11:00 a.m. (UTC) and set a 12-hour ingestion frequency. Then, the Events are ingested at a 12-hour interval starting immediately, which means at 11:00 a.m. (UTC) and 11:00 p.m. (UTC) daily.

Read Scheduling a Pipeline to know the steps to change the schedule.


Can I alphabetically sort Event Types listed in the Schema Mapper?

It is not possible to sort Event Types listed in the Schema Mapper alphabetically. However, you can view the Event Types based on their mapping status.

To view the Event Types by their status:

  1. In the Pipelines List View, click on the Pipeline for which you want to view the Event Types.

  2. From the Tools Bar, click the Schema Mapper icon to access the Schema Mapper Overview page.

    Schema Mapper Overview

  3. Click the Filter icon, and from the drop-down list, select the status for which you want to view the associated Event Types.

    Filter Event Statuses

Read Mapping Statuses.


Will pausing the ingestion of some objects increase the overall ingestion speed of the Pipeline?

Hevo is horizontally scalable, which means that it automatically increases its resources with the increase in the amount of data being ingested. Thus, pausing the ingestion of some objects does not have an impact on the ingestion speed of the Pipeline.

You can, however, choose to ingest the Events more frequently by changing the Pipeline schedule. Read Scheduling a Pipeline for more information.


Can I see the percentage completion of the historical load?

You can see the percentage completion of the historical load only for log-based Pipelines. It is visible under Historical Load Progress in the Pipelines Detailed View. This information is not available for SaaS and REST API Sources.

Historical Load Progress

The percentage completion value represents the percentage of objects ingested out of all the active objects. For example, if you have enabled the historical load for three objects, and two of them have been ingested, Hevo shows the percentage completion as 67%.


What happens if I delete a Pipeline and re-create a similar one again?

When you delete a Pipeline, the following elements are not deleted:

  • The data present in the Source

  • The data loaded to the Destination

  • Schema of the Destination tables

If, after deleting a Pipeline, you re-create the same Pipeline with the same Destination table, then the entire data is re-loaded into the tables created by the previous Pipeline. This increases the number of ingestions if you do not specify the position offset in the Pipeline correctly. Read more about position offset in SaaS and RDBMS Sources. You may also get duplicate records if primary keys are not defined for the Destination table. These duplicate records count towards quota consumption and are also billed.

Read Loading Data to a Data Warehouse and Loading Data in a Database Destination to know how Hevo loads your data based on whether the primary keys are available or not.


How can I load an XML file from an S3 folder using Hevo?

To load your XML files from your S3 folder, specify the path to the folder in the Path Prefix field and select XML in the File Format drop-down in the Configure your S3 Source page during Pipeline creation.

File Format in S3 Source Configuration Page

Once the Pipeline is created, all your XML files present in the folder get replicated into the Destination table. You can verify this in the Schema Mapper as shown in the following image:

XML files loaded at the Destination

NOTE: You can only load files of a single format in each Pipeline. To load files of different formats, you need to create separate Pipelines for each format. Read Creating a Pipeline.


How can I load different folders of an Amazon S3 bucket as separate Event Types?

While configuring Amazon S3 as a Source for the Pipeline, the Create Event Types from folders setting controls whether folders from an S3 bucket are loaded as different Event Types or merged into one.

For the folders to be loaded as separate Event Types, the files they contain must be in either CSV or JSON format.

You can enable and disable the Create Event Types from folders option in the Configure your S3 Source page as shown in the image below:

CreateEventTypesFromFolders Option

Let us suppose that you have an S3 bucket, hevo-demo-3, containing the folders Json and parent_folder:

S3 Source Folder

  • If the Create Event Types from folders option is enabled, then the two folders are loaded as separate Event Types:

    Folders loaded as Separate EventTypes

  • If the Create Event Types from folders option is disabled, then the two folders are loaded as a single Event Type. The name of the Event Type is the same as the S3 folder name:

    Folders loaded as single EventType



Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Nov-22-2021 NA Added the following FAQs:
- Will my Pipeline get paused if I exceed the monthly Events quota?
- Is my data stored after I delete a Pipeline?
- How can I change the data type of a Destination table column?
Nov-09-2021 NA Added the following FAQs:
- How can I load only filtered Events from the Source to the Destination?
- My Historical Load Progress is still at 0%. What does it mean?
- How can I change sort keys and their order in a Redshift Destination table?
- How can I make sure that each record is loaded only once?
- How does changing the query mode of a Pipeline in table mode affect data ingestion?
- How does Transformation affect the structure of the Destination table?
- How can I set a field as a primary key to avoid duplication?
- Does triggering ingestion using Run Now affect the scheduled ingestion frequency of the Pipeline?
Oct-25-2021 NA Added the following FAQs:
- What happens if I delete a Pipeline and re-create a similar one again?
- How can I load an XML file from an S3 folder using Hevo?
- How can I load different folders of an Amazon S3 bucket as separate Event Types
Oct-19-2021 NA Added the following FAQs:
- What happens if I delete a Pipeline and re-create a similar one again?
- How can I load an XML file from an S3 folder using Hevo?
- How can I load different folders of an Amazon S3 bucket as separate Event Types?
Oct-04-2021 NA Added the following FAQs:
- Can I see the percentage completion of the historical load?
- Will pausing the ingestion of some objects increase the overall ingestion speed of the Pipeline?
- How does the timing of a schedule change work?
- Can I alphabetically sort Event Types listed in the Schema Mapper?
Sep-20-2021 NA Added the following FAQs:
- How can I restart the historical load for all the objects at once?
Sep-09-2021 NA Added the FAQ, How can I change the query mode for Pipelines created with Table Mode?
Aug-25-2021 NA Added the following FAQs:
- How can I transfer Excel files using Hevo?
- How can I add more sheets in Google Sheets after creating a Pipeline?
Aug-17-2021 NA Added the following FAQs:
- Can I connect tables from multiple Pipelines to a common Destination table?
- How can I change or delete the Destination table prefix after creating a Pipeline?
Aug-09-2021 NA Added the FAQ, How can I change the ingestion frequency for individual tables?
Mar-09-2021 NA - New document.
- Merged the Why is there a delay in my Pipeline? FAQ into this document.
Last updated on 24 Nov 2021