You can load data from files in an FTP location into your Destination database or data warehouse using Hevo Pipelines.

For Pipelines created using this Source, Hevo provides a fully managed Google BigQuery data warehouse as a possible Destination. This option remains available until you set up your first BigQuery Destination, irrespective of any other Destinations you may have. With the managed warehouse, you are charged only the cost that Hevo incurs for your project in Google BigQuery. The invoice is generated at the end of each month, and payment is recovered through the payment instrument you have set up. You can create your Pipeline and directly start analyzing your Source data. Read Hevo Managed Google BigQuery.

Hevo automatically unzips any gzipped files on ingestion. Further, if a file is updated, it is re-ingested in its entirety, as it is not possible to identify the individual changes within it.
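Conceptually, detecting and unzipping gzipped files works as sketched below: every gzip stream starts with a two-byte magic number, so a loader can decompress transparently. This is an illustrative sketch, not Hevo's implementation:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream

def read_maybe_gzipped(raw: bytes) -> bytes:
    """Return the decompressed payload if `raw` is gzipped, else `raw` unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw

payload = gzip.compress(b"county,zip,id\n")
assert read_maybe_gzipped(payload) == b"county,zip,id\n"
assert read_maybe_gzipped(b"plain text") == b"plain text"
```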

As of Release 1.66, __hevo_source_modified_at is uploaded to the Destination as a metadata field. For existing Pipelines that have this field:

  • If this field is displayed in the Schema Mapper, ignore it and do not map it to a Destination table column; otherwise, the Pipeline displays an error.

  • Hevo automatically loads this information in the __hevo_source_modified_at column, which is already present in the Destination table.

You can, however, continue to use __hevo_source_modified_at to create transformations using the function event.getSourceModifiedAt(). Read Metadata Column __hevo_source_modified_at.

Existing Pipelines that do not have this field are not impacted.
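As an illustration, a Transformation that copies this timestamp into a regular Event property might look roughly as follows. The stub Event class is only a local stand-in for experimentation; in an actual Hevo Transformation, the platform supplies the event object, and only event.getSourceModifiedAt() is confirmed by this page:

```python
class Event:
    """Stub standing in for Hevo's event object, for local experimentation only.
    In a real Hevo Transformation, the event is supplied by the platform."""

    def __init__(self, properties, source_modified_at):
        self._props = properties
        self._ts = source_modified_at

    def getProperties(self):
        return self._props

    def getSourceModifiedAt(self):
        return self._ts

def transform(event):
    # Copy the file-modification timestamp into a regular property so it
    # can be used downstream. The transform(event) shape is assumed from
    # Hevo's Python Transformations interface.
    props = event.getProperties()
    props["file_modified_at"] = event.getSourceModifiedAt()
    return event

e = Event({"county": "CLAY COUNTY"}, 1624866000000)
assert transform(e).getProperties()["file_modified_at"] == 1624866000000
```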

Configuring FTP/SFTP as a Source

To configure FTP/SFTP as a Source in Hevo:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipeline List View.

  3. In the Select Source Type page, select FTP/SFTP.

  4. In the Configure your FTP/SFTP Source page, specify the following:

    FTP/SFTP settings

    • Pipeline Name: A unique name for the Pipeline.

    • Type: Select FTP or SFTP.

    • Host: The IP address or the DNS for your FTP location.

    • Port: The port at which Hevo can connect with your FTP/SFTP Server. The default port is 21.

    • User: The user ID for logging in to the FTP/SFTP server.

    • Password: The password of the user logging in to the FTP/SFTP server. The password is optional for SFTP connections. However, in that case, you must add Hevo's public key, displayed in the UI, to the .ssh/authorized_keys file on your SFTP server.

    • Path Prefix: The path prefix of the data directory. By default, files are listed from the root of the directory.

    • File Format: The format of the data file in the Source. Hevo supports the CSV, JSON, TSV, and XML file formats to ingest data.

      Note: You can select only one file format at a time. If your Source data is in a different format, you can export it to one of the supported formats and then ingest the files.

      Based on the format you select, you must specify some additional settings:

      • CSV:

        • Specify the Field Delimiter. This is the character that separates the fields in each line. For example, `\t` or `,`.

        • Disable the Treat First Row As Column Headers option if the Source data file does not contain column headers. Hevo automatically creates the headers during ingestion. Default setting: Enabled. See Example below.

      • TSV:

        • Disable the Treat First Row As Column Headers option if the Source data file does not contain column headers. Hevo automatically creates the headers during ingestion. Default setting: Enabled.

      • XML: Enable the Create Events from child nodes option to load each node under the root node in the XML file as a separate Event.

    • Create Event Types from folders: Enable this option if the prefix path has subdirectories containing files in different formats. Hevo reads each subdirectory as a separate Event Type.

      Note: Files located directly at the prefix path (and not in a subdirectory) are ignored.

    • Connect through SSH: Enable this option to have Hevo connect to your FTP host through an SSH tunnel instead of connecting to it directly. Read Connecting Through SSH.

      If this option is disabled, you must whitelist Hevo’s IP addresses to allow Hevo to connect to your FTP host.

  5. Click TEST & CONTINUE.

  6. Proceed to configuring the data ingestion and setting up the Destination.
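The settings above can be sketched as a simple pre-flight validation, similar in spirit to what TEST & CONTINUE checks. The field names and the SFTP default port of 22 are assumptions for illustration (this page states only 21 as the FTP default); this is not Hevo's actual validation logic:

```python
# Illustrative defaults: 21 for FTP per the settings above; 22 is the
# conventional SSH/SFTP port and is an assumption here.
DEFAULT_PORTS = {"FTP": 21, "SFTP": 22}

def validate_source(settings: dict) -> dict:
    """Check a hypothetical FTP/SFTP settings dict and fill in defaults."""
    conn_type = settings.get("type")
    if conn_type not in DEFAULT_PORTS:
        raise ValueError("type must be 'FTP' or 'SFTP'")
    if not settings.get("host"):
        raise ValueError("host is required")
    # A password is required for FTP; SFTP may use a public key instead.
    if conn_type == "FTP" and not settings.get("password"):
        raise ValueError("password is required for FTP connections")
    settings.setdefault("port", DEFAULT_PORTS[conn_type])
    settings.setdefault("path_prefix", "/")  # files are listed from the root by default
    return settings

cfg = validate_source({"type": "SFTP", "host": "ftp.example.com", "user": "hevo"})
assert cfg["port"] == 22 and cfg["path_prefix"] == "/"
```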

Example: Automatic Column Header Creation for CSV Tables

Consider the following data in CSV format, which has no column headers.

  CLAY COUNTY,32003,11973623
  CLAY COUNTY,32003,46448094
  CLAY COUNTY,32003,55206893
  CLAY COUNTY,32003,15333743
  SUWANNEE COUNTY,32060,85751490
  SUWANNEE COUNTY,32062,50972562
  ST JOHNS COUNTY,846636,32033,
  NASSAU COUNTY,32025,88310177
  NASSAU COUNTY,32041,34865452

If you disable the Treat First Row As Column Headers option, Hevo auto-generates the column headers, as seen in the schema map here:

Column headers generated by Hevo for CSV data

The record in the Destination appears as follows:

Destination record with auto-generated column headers
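The header auto-generation shown above can be approximated in a few lines of Python. The column_1, column_2, ... naming here is illustrative; Hevo's actual generated names may differ:

```python
import csv
import io

def load_headerless_csv(text: str, delimiter: str = ","):
    """Parse headerless CSV text and attach generated column names,
    similar in spirit to Hevo's behavior when 'Treat First Row As
    Column Headers' is disabled."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    width = max(len(row) for row in rows)
    headers = [f"column_{i + 1}" for i in range(width)]
    # zip pairs each value with a generated header; shorter rows simply
    # omit the trailing columns.
    return [dict(zip(headers, row)) for row in rows]

records = load_headerless_csv(
    "CLAY COUNTY,32003,11973623\nNASSAU COUNTY,32025,88310177\n"
)
assert records[0] == {
    "column_1": "CLAY COUNTY", "column_2": "32003", "column_3": "11973623"
}
```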

Data Replication

  • Default Pipeline Frequency: 5 Mins
  • Minimum Pipeline Frequency: 5 Mins
  • Maximum Pipeline Frequency: 3 Hrs

Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Sep-21-2022 NA Added a note in section, Configuring FTP/SFTP as a Source.
Apr-11-2022 1.86 Updated section, Configuring FTP/SFTP as a Source to reflect support for TSV file format.
Mar-21-2022 1.85 Removed section, Limitations as Hevo now supports UTF-16 encoding format for CSV files.
Oct-25-2021 NA Added the section, Data Replication.
Jun-28-2021 1.66 Updated the page overview with information about __hevo_source_modified_at being uploaded as a metadata field from Release 1.66 onwards.
Feb-22-2021 NA Added the limitation about Hevo not supporting UTF-16 encoding format for CSV data.
Last updated on 21 Sep 2022