Hevo lets you load data from files in an S3 bucket into your data warehouse. Let's walk through the steps for adding an S3 Source.
1. Create a new Pipeline
Click PIPELINES in the left navigation bar, then click Create New Pipeline.
2. Select Source Type
Select S3 from the list on the Select Source Type screen.
3. Provide Connection Settings
Provide the S3 connection details on the S3 Connection Settings page. The connection details block has the following fields:
- Source Name - A unique name for this Source.
- Access Key ID - The AWS Access Key ID that has permission to read from the specified bucket. A sketch for verifying these permissions follows this list.
- Secret Access Key - The AWS Secret Access Key for the above Access Key ID.
- Bucket - The name of the bucket from which you want to ingest data.
- Prefix - The path prefix for the data directory. By default, files are listed from the root of the bucket.
- Bucket Region - The AWS region where the bucket is located.
- File Format - The format of the files. Hevo currently supports JSON and CSV formats. Let us know if you need support for a different format.
- Files GZipped - Select this option if the files in your S3 bucket are gzipped. A sketch of such a file follows this list.
- Create Schema from folders - Select this option when your prefix path has subdirectories containing files in different formats. In that case, Hevo reads each subdirectory as a separate Event Type. For example, with the prefix `data/`, the subdirectories `data/orders/` and `data/users/` become the Event Types `orders` and `users`. Note that any files lying at the prefix path itself (and not in one of the subdirectories) are ignored.
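If you want to confirm, before creating the Pipeline, that the key pair you plan to enter can actually read the bucket, here is a minimal sketch using boto3. It is not part of Hevo, and the bucket name, prefix, region, and key values are placeholders. Listing objects requires the s3:ListBucket permission, and reading files requires s3:GetObject.

```python
# Minimal sketch (not part of Hevo): check that an AWS key pair can list
# objects under the bucket and prefix you plan to use as the Source.
# All values below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    region_name="us-east-1",          # Bucket Region
    aws_access_key_id="AKIA...",      # Access Key ID
    aws_secret_access_key="...",      # Secret Access Key
)

bucket = "my-data-bucket"             # Bucket
prefix = "exports/"                   # Prefix (optional)

# list_objects_v2 needs s3:ListBucket; fetching a file needs s3:GetObject.
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```

If this prints the files you expect to ingest, the same credentials, bucket, prefix, and region should work in the connection settings above.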
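For reference, this is a hypothetical sketch of how a gzipped JSON file of the kind described above might be produced, for example to test a new Pipeline. The file name and record fields are invented, and the one-record-per-line layout is an assumption for illustration, not a requirement confirmed here.

```python
# Hypothetical sketch: write a gzipped file of JSON records, one per line,
# matching the File Format = JSON and Files GZipped options above.
# The file name and field names are made up for illustration.
import gzip
import json

records = [
    {"id": 1, "name": "alice", "amount": 42.5},
    {"id": 2, "name": "bob", "amount": 17.0},
]

with gzip.open("orders.json.gz", "wt", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```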
4. Select the Destination
Select the Destination where you want to replicate the data from your S3 bucket, or click NEW DESTINATION to create a new Destination. Check out the How to add a Destination tutorial for a detailed walkthrough of the steps for adding a new Destination.
5. Pipeline Created
Your Pipeline is created when you reach this page, and you will have the option to view Sample Data and map the Schema.
While Hevo tries to load the schemas from your files, you can select CONTINUE IN BACKGROUND if it is taking too long. Click CREATE SCHEMA MAPPING to map the Source and Destination schemas (check out Introduction to Schema Mapper to learn more), or select DO IT LATER to go directly to the Pipeline page. You can map the schemas later on the Schema Mapper page of your Pipeline.
A complete log of all files ingested by Hevo can be found in the File Log.
Note that your data will not start replicating to the Destination tables until you map the Source and Destination schemas.