FTP / SFTP

Hevo lets you load data from files in an FTP location into your data warehouse.

Note: For Pipelines created with this Source, Hevo provides a fully managed BigQuery data warehouse Destination if you do not already have one set up. You are charged only the cost that Hevo incurs for your project in Google BigQuery. The invoice is generated at the end of each month, and payment is collected through the payment instrument you have set up. You can create your Pipeline and start analyzing your Source data right away. Read Hevo Managed Google BigQuery.

Connection Settings

Provide the FTP connection details on the FTP Connection Settings page. The connection details block has the following options:

  1. Pipeline Name - A unique name for the Pipeline.
  2. Type - Select FTP or SFTP.
  3. Host - The IP address or DNS name of your FTP/SFTP server.
  4. Port - Port at which Hevo can connect with your FTP/SFTP Server.
  5. User - User to log in to the FTP/SFTP server.
  6. Password - Password to log in to the FTP/SFTP server. The password is optional for SFTP connections. If you do not provide one, you must add Hevo's public key, displayed in the UI, to the .ssh/authorized_keys file on your SFTP server.
  7. Path Prefix - Path Prefix for the data directory. By default, the files are listed from the root of the directory.
  8. File Format - Choose a file format. Hevo currently supports CSV, JSON and XML formats. Contact Hevo Support if your Source data is in a different format. Based on the format you select, you must specify some additional settings:
    • CSV:
      • Specify the Field Delimiter. This is the character that separates the fields in each line. For example, `\t` or `,`.
      • Disable the Treat First Row As Column Headers option if the Source data file does not contain column headers. Hevo then creates the column headers automatically during ingestion. Default setting: Enabled.
        See the Example below.
    • XML:
      • Enable the Create Events from child nodes option to load each node under the root node in the XML file as a separate Event.
  9. Create Event Types from folders - Enable this option if the prefix path has subdirectories containing files in different formats. Hevo reads each subdirectory as a separate Event Type. Note: Files lying at the prefix path (and not in a subdirectory) are ignored.

  10. Connect through SSH - Enable this option to connect to Hevo through an SSH tunnel instead of connecting your FTP/SFTP host to Hevo directly. Read Connecting Through SSH.

If this option is disabled, you must whitelist Hevo’s IP addresses to allow Hevo to connect to your FTP/SFTP host.
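
The behavior of the Create Event Types from folders option described above can be pictured with a small sketch. This is not Hevo's implementation. The function names `event_type_for` and `group_by_event_type` are hypothetical, and the sketch assumes that the first-level subdirectory under the prefix path names the Event Type. It only illustrates that files directly at the prefix path are skipped while files in subdirectories are grouped by folder.

```python
from pathlib import PurePosixPath


def event_type_for(path, prefix):
    """Return the first subdirectory under prefix, or None if the file is ignored.

    Hypothetical helper: assumes the first-level subdirectory names the Event Type.
    """
    try:
        relative = PurePosixPath(path).relative_to(PurePosixPath(prefix))
    except ValueError:
        return None  # the file is not under the prefix path at all
    parts = relative.parts
    if len(parts) < 2:
        return None  # the file sits directly at the prefix path, so it is ignored
    return parts[0]


def group_by_event_type(paths, prefix):
    """Group file paths by the Event Type derived from their subdirectory."""
    groups = {}
    for path in paths:
        event_type = event_type_for(path, prefix)
        if event_type is not None:
            groups.setdefault(event_type, []).append(path)
    return groups


files = [
    "/data/orders/2020-11-01.csv",
    "/data/orders/2020-11-02.csv",
    "/data/customers/all.json",
    "/data/readme.csv",  # at the prefix path itself, so it is skipped
]
groups = group_by_event_type(files, "/data")
```

With the prefix `/data`, the sample list yields two groups, `orders` and `customers`, and `/data/readme.csv` does not appear in either.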

Things to Note

  • Hevo automatically decompresses gzipped files during ingestion.
  • A file is re-ingested whenever it is updated.
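
To check locally what a gzipped file will contain once it is decompressed, a sketch along these lines may help. It is an illustration only and does not reflect how Hevo detects or handles compression. The helper name `read_maybe_gzipped` is hypothetical, and detecting gzip data by its two leading magic bytes is an assumption made for this example.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_maybe_gzipped(raw: bytes) -> bytes:
    """Return decompressed bytes for gzip input, or the input unchanged otherwise."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


plain = b"CLAY COUNTY,32003,11973623\n"
compressed = gzip.compress(plain)
restored = read_maybe_gzipped(compressed)
```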

Example: Automatic Column Header Creation for CSV Tables

Consider the following data in CSV format, which has no column headers.

  CLAY COUNTY,32003,11973623
  CLAY COUNTY,32003,46448094
  CLAY COUNTY,32003,55206893
  CLAY COUNTY,32003,15333743
  SUWANNEE COUNTY,32060,85751490
  SUWANNEE COUNTY,32062,50972562
  ST JOHNS COUNTY,846636,32033,
  NASSAU COUNTY,32025,88310177
  NASSAU COUNTY,32041,34865452

If you disable the Treat First Row As Column Headers option, Hevo auto-generates the column headers, as seen in the schema map here:

Column headers generated by Hevo for CSV data

The record in the Destination appears as follows:

Destination record with auto-generated column headers
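
The effect of the Treat First Row As Column Headers setting can be approximated with the sketch below. It is a minimal illustration, not Hevo's ingestion code. The function `rows_to_records` and the `column_N` naming pattern are hypothetical placeholders, so refer to the schema map for the header names Hevo actually generates.

```python
import csv
import io


def rows_to_records(text, delimiter=",", first_row_is_header=True):
    """Turn delimited text into a list of dicts keyed by column header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return []
    if first_row_is_header:
        headers, data = rows[0], rows[1:]
    else:
        # Hypothetical naming scheme; the real generated names may differ.
        width = max(len(row) for row in rows)
        headers = [f"column_{i}" for i in range(1, width + 1)]
        data = rows
    return [dict(zip(headers, row)) for row in data]


sample = "CLAY COUNTY,32003,11973623\nNASSAU COUNTY,32025,88310177\n"
records = rows_to_records(sample, first_row_is_header=False)
```

With the option disabled, both sample rows are kept as data and each value is keyed by a generated header. With it enabled, the first row would be consumed as the header row, and only the second row would be loaded.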

Last updated on 10 Nov 2020