Release Version 2.28

Last updated on Oct 08, 2024

This release note also includes the fixes provided in all the minor releases since 2.27.

For the complete list of features available for early adoption before they are made generally available to all customers, read our Early Access page.

For the list of features and integrations we are working on next, read our Upcoming Features page.


Hevo Edge Pipelines (Early Access)

We are excited to announce the release of Hevo Edge, which brings significant enhancements across reliability, performance, observability, and cost efficiency. Here is what’s new:

Key Features (Edge)

Pipeline Reliability and Control

  • Enhanced CDC Connector: Experience faster and more reliable data capture that reduces delays in replicating data. This keeps your Pipelines up to date and your Source and Destination in sync.

  • Enhanced Error and Schema Handling: New Pipeline-level controls allow for more flexible and predictable error handling and schema evolution. This is helpful for handling a variety of replication scenarios.

  • Failure Handling: Failure handling options ensure consistency between Source and Destination.

  • Alerts on Job Failures: Alerts are triggered when jobs fail, ensuring immediate awareness and timely action to resolve issues.

Pipeline Observability

  • Detailed Job Execution Insights: Filter and view granular details of each job run, including duration and end-to-end latency for better performance understanding.

  • Access Session Log for Job: Quickly download session logs for individual jobs, which provide detailed insights into job execution and facilitate troubleshooting.

  • Display Objects Offset and Latency: You can now track the offset of your objects to monitor data sync progress and identify latency between Source and Destination.

  • Source Monitoring: Track PostgreSQL Write-Ahead Logging (WAL) disk usage with built-in monitoring and alerts. This feature helps you prevent disk capacity issues before they impact Pipeline performance.
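The WAL usage that Source monitoring tracks can also be estimated by hand. The sketch below is not Hevo's implementation; it only shows the arithmetic behind such a check. It converts two PostgreSQL log sequence numbers (LSNs) from their standard X/Y hexadecimal text form into byte positions and subtracts them, which is the same calculation PostgreSQL's pg_wal_lsn_diff function performs. In practice, the current position would come from pg_current_wal_lsn() and the slot position from the restart_lsn column of pg_replication_slots. The function names and the 10 GiB threshold are illustrative assumptions.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_lsn: str, slot_restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current write position."""
    return lsn_to_int(current_lsn) - lsn_to_int(slot_restart_lsn)


def exceeds_threshold(current_lsn: str, slot_restart_lsn: str,
                      threshold_bytes: int = 10 * 1024**3) -> bool:
    """Return True when the slot retains more WAL than the threshold allows.

    threshold_bytes is a hypothetical default, not Hevo's actual setting.
    """
    return retained_wal_bytes(current_lsn, slot_restart_lsn) > threshold_bytes


# The slot is 256 bytes behind the current write position.
print(retained_wal_bytes("1/00000000", "0/FFFFFF00"))
```

A check like this could run on a schedule and raise an alert well before the disk fills up.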

Pipeline Performance

  • Up to 4x faster performance for loading historical data.

  • Experience 10x speed improvements for incremental data runs.

  • Predictable performance with isolated runtime for each job.

Cost Efficiency

  • Predictable Destination Loads: Deterministic Destination loads lead to predictable warehouse costs.

  • No Metadata Query Costs: Avoid costs from metadata queries.

  • Accurate Data Types: Deterministic data type inference prevents unnecessary data type changes at the Destination, minimizing data processing overhead and ensuring consistency.

  • Cost Savings with Append-Only Mode: The append-only mode provides substantial cost savings compared to the traditional merge method.

With Hevo Edge, we are providing a more robust and efficient platform, designed to elevate the reliability and performance of your data integration processes.

Request Early Access to Edge Pipelines and be among the first to leverage Hevo’s cutting-edge capabilities!

Limitations (Edge)

Pipelines

  • Limited Edit Functionality: Currently, in the Pipeline configuration, you can only edit the object, field, and sync frequency. More advanced edit options are not yet available.

  • WAL Slot Monitoring Threshold: Users cannot modify the Write-Ahead Logging (WAL) slot monitoring threshold. To disable monitoring or change the threshold, you must contact Hevo support.

  • Object Limit: Each Pipeline currently supports up to 25,000 objects. If your Pipeline has more than 25,000 objects, you must contact Hevo support.

  • Historical Job Progress: Currently, historical sync jobs remain in progress without displaying any updates or statistics on the job details page until they are completed.

  • Standard Pipelines Migration Not Supported: Currently, migrating existing Standard Pipelines to Edge Pipelines is not possible.

  • Delayed Incremental Ingestion: Incremental ingestion starts only after the historical load is complete for all tables in the Pipeline. The historical load can take a long time, during which the Write-Ahead Log (WAL) retained by the replication slot can grow significantly. You can enable WAL monitoring to avoid database downtime.

  • Case-Sensitive Identifiers in Snowflake: Currently, Edge Pipelines create case-sensitive tables and columns in Snowflake. To avoid errors, users must use quoted identifiers when querying data in Snowflake.

  • High Latency During Data Spikes: The polling mechanism ingests all available data from the logs at the start of each poll before loading data to the Destination. This can result in high end-to-end (E2E) latencies when the Source database experiences large data spikes.

  • Potential Data Loss on Resyncing the Pipeline: Currently, in the resync Pipeline operation, existing tables are dropped and replaced with new ones. This process could result in data loss, such as permanently losing information about rows deleted from the Source tables. Furthermore, if data is being loaded in the Append mode, the history of all changes made in the Source database will no longer be available.

  • Features Not Supported:

    • Custom mapping of fields.

    • Python-based Transformations.

    • Loading data at a specific time.

    • Existing SQL and dbt™ Models are not compatible.
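To illustrate the Snowflake limitation listed above: Snowflake resolves unquoted identifiers as uppercase, so a table created with a case-sensitive name must be referenced in double quotes with its exact case. The following is a minimal sketch of a helper that quotes identifiers when building a query. The function names and the orders table with its columns are hypothetical and serve only as an example.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, columns: list[str]) -> str:
    """Build a SELECT statement whose table and column names keep their exact case."""
    column_list = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {column_list} FROM {quote_identifier(table)}"


# Hypothetical table and column names, used only for illustration.
print(build_select("orders", ["order_id", "createdAt"]))
# prints: SELECT "order_id", "createdAt" FROM "orders"
```

Without the quotes, Snowflake would look for ORDERS and CREATEDAT and would not find the case-sensitive objects.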

Sources - PostgreSQL

  • Unsupported Data Types: Currently, geometry and geography data types are not supported for Sources.

  • Logical Replication on Read Replicas: PostgreSQL versions 15.8 and lower do not support logical replication on read replicas. This feature is available starting from version 16.

Upcoming Features (Edge)

  • New Sources: Support for MySQL and Amazon RDS Oracle as Sources will be introduced. These integrations will allow you to seamlessly connect to, extract, and replicate data from these databases.

  • New Destination: Integration with Amazon S3 as a Destination. This feature lets you replicate your data directly into Amazon S3, leveraging its scalable storage solutions for further analysis.

  • 5-Minute Schedule: Pipelines will support scheduling syncs as frequently as every 5 minutes.

  • Improved Job Progress Visibility: Historical jobs will show progress updates, offering better observability during job execution.

  • Alerts: Ability to configure alert preferences, subscribe to Pipeline alerts created by another user, and view alerts in the dashboard.

  • Public API Availability: Manage your Pipelines programmatically using the public APIs. This provides you with the create, read, update, and delete operations for your data Pipelines.

  • Set Data Replication Type: Ability to configure the replication process to skip historical data and replicate only incremental data to the Destination, streamlining data sync and reducing processing time.

  • Support for Key Pair Authentication in Snowflake: This feature allows you to connect to your Snowflake data warehouse Destination using a key pair for authentication, making the connection more secure. This method allows you to provide a private key instead of a database password while configuring your Snowflake Destination.


Early Access Features

Destinations

  • Amazon Redshift Serverless as a Destination

    • Integrated Amazon Redshift Serverless as a cloud-based data warehouse Destination for creating Pipelines. Redshift Serverless automatically scales resources based on your needs, simplifying data analysis and management, while its pay-as-you-go pricing offers a cost-effective solution for handling unpredictable workloads. You can contact your Hevo account executive or the Support team to enable Amazon Redshift Serverless for your team.

      Read Amazon Redshift Serverless.

      Refer to the Early Access page for the complete list of features available for early access.

  • Support for Key Pair Authentication in Snowflake

    • Hevo now supports connecting to your Snowflake data warehouse Destination using a key pair for authentication, thus making the connection more secure. This method allows you to provide a private key instead of a database password while configuring your Snowflake Destination. You can contact your Hevo account executive or the Support team to enable the Key Pair Authentication feature for your team.


New and Changed Features

Documentation

  • Hevo Edge Documentation

    • The Hevo documentation site now has a new tab dedicated to Hevo Edge, where you can browse the Hevo Edge documentation. Additionally, the search functionality now includes a separate tab for Hevo Edge content.

      While the full documentation is not yet available, we are continuously adding and updating content.

Sources

  • Support for Exports API in Amazon Ads Source (Added in Release 2.27.3)

    • Effective Release 2.27.3, Hevo uses the Exports API for data ingestion related to core objects of Sponsored Products. Amazon Ads will deprecate the Snapshots API, which Hevo previously used, on October 15, 2024. To ensure continued support for existing functionalities, Hevo will automatically migrate your existing Amazon Ads Pipelines to the new API.

      The API update impacts data ingestion for the following objects:

      • Ad Groups: The Source object schema has changed, and the object is replaced by the new object, Ad_Groups_V2.

      • Campaigns: The Source object schema has changed, and the object is replaced by the new object, Campaigns_V2.

      • Campaigns Negative Keywords: The Source object is deprecated, and its data has been moved to the Product_Targeting_V2 object. You can filter for this data type within the object using the targetType field.

      • Keywords: The Source object is deprecated, and its data has been moved to the Product_Targeting_V2 object. You can filter for this data type within the object using the targetType field.

      • Negative Keywords: The Source object is deprecated, and its data has been moved to the Product_Targeting_V2 object. You can filter for this data type within the object using the targetType field.

      • Negative Product Targeting: The Source object is deprecated, and its data has been moved to the Product_Targeting_V2 object. You can filter for this data type within the object using the targetType field.

        • The child objects negative_product_targeting_expression and negative_product_targeting_resolved_expression are removed.

      • Product Ads: The Source object schema has changed, and the object is replaced by the new object, Product_Ads_V2.

      • Product Targeting: The Source object schema has changed, and the object is replaced by the new object, Product_Targeting_V2.

        • Product_Targeting_V2 contains data for the Negative Product Targeting, Negative Keywords, Keywords, and Campaigns Negative Keywords data types. You can filter the data within the object for these data types using the targetType field.

        • The child objects product_targeting_expression and product_targeting_resolved_expression are removed.

      In your existing Pipelines, the older objects you had selected for ingestion will be marked as completed, and the corresponding new objects will be added.

      This change applies to all new and existing Pipelines created with Amazon Ads as the Source.
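Because Product_Targeting_V2 now holds several data types in one object, downstream logic that previously read separate objects needs to filter on the targetType field. The sketch below shows one way this could look on rows already loaded into memory. The sample rows, the targetId field, and the targetType values KEYWORD and NEGATIVE_KEYWORD are assumptions made for illustration; check the actual values in your Destination before relying on them.

```python
from collections import defaultdict


def filter_by_target_type(rows, target_type):
    """Return only the rows whose targetType field equals target_type."""
    return [row for row in rows if row.get("targetType") == target_type]


def group_by_target_type(rows):
    """Split rows into one list per targetType value."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.get("targetType")].append(row)
    return dict(grouped)


# Hypothetical sample rows; real targetType values and other fields may differ.
rows = [
    {"targetId": 1, "targetType": "KEYWORD"},
    {"targetId": 2, "targetType": "NEGATIVE_KEYWORD"},
    {"targetId": 3, "targetType": "KEYWORD"},
]

keywords = filter_by_target_type(rows, "KEYWORD")
print([row["targetId"] for row in keywords])
print(sorted(group_by_target_type(rows)))
```

Grouping once and reading from the resulting dictionary can stand in for the four separate objects that existed before the migration.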

Fixes and Improvements

Data Loading

  • Handling Incorrect Data Spike Alerts Due to Time Zone Differences (Fixed in Release 2.27.2)

    • Fixed an issue where Data Spike Alerts were generated using UTC time zone, which caused data from two days to overlap for users in different time zones. As a result, users received incorrect alerts with higher than actual Event counts. With this fix, Data Spike Alerts are now aligned with users’ local time zones, ensuring accurate Event counts and preventing incorrect alerts.
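The effect of the time zone on daily Event counts can be reproduced in a few lines. This is an illustrative sketch, not Hevo's alerting logic: it buckets the same three Event timestamps by calendar day, once in UTC and once in a hypothetical user time zone of UTC+05:30, and shows that the per-day counts differ. The function name and the sample timestamps are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_event_counts(event_times_utc, tz):
    """Count Events per calendar day as seen in the given time zone."""
    return Counter(t.astimezone(tz).date() for t in event_times_utc)


# Hypothetical user time zone; a fixed offset keeps the sketch dependency-free.
USER_TZ = timezone(timedelta(hours=5, minutes=30))

events = [
    datetime(2024, 10, 1, 17, 0, tzinfo=timezone.utc),  # 22:30 on Oct 1 for the user
    datetime(2024, 10, 1, 19, 0, tzinfo=timezone.utc),  # 00:30 on Oct 2 for the user
    datetime(2024, 10, 1, 23, 0, tzinfo=timezone.utc),  # 04:30 on Oct 2 for the user
]

utc_counts = daily_event_counts(events, timezone.utc)
local_counts = daily_event_counts(events, USER_TZ)
print(dict(utc_counts))    # all three Events fall on Oct 1
print(dict(local_counts))  # one Event on Oct 1, two on Oct 2
```

A spike threshold evaluated against the UTC buckets would see three Events on a single day, while the user's own day never contains more than two.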

Destinations

  • Handling Data Loading Issues for Databricks Destinations (Fixed in Release 2.27.1)

    • Fixed an issue where Hevo was unable to load data into a Databricks table whose schema had changed. This issue occurred when the schema refresh and the sink file uploader tasks ran simultaneously. As a result, the sink file contained data structured according to the earlier schema, and Hevo was unable to load data to the Destination table. Now, if the sink file uploader task fails due to a schema mismatch, Hevo sidelines all the Events in the sink file and recreates the sink file with the updated schema.

      The fix is currently implemented for teams only in the AU (Australia) region to monitor its impact. Based on the results, it will be deployed to teams across all regions in a phased manner and does not require any action from you.

Pipelines

  • Handling of Bulk Restart of Objects (Fixed in Release 2.27.1)

    • Fixed an issue where bulk restart was not completed for all objects when more than 1,000 objects were selected. The limit has now been removed, ensuring that the bulk restart is completed for all selected objects.

Sources

  • Handling Account Selection Issues in LinkedIn Ads

    • Fixed an issue where the LinkedIn Ad Accounts drop-down was not visible during Pipeline creation. This issue occurred because the API call to fetch LinkedIn Ad Accounts was returning null values due to pagination. As a result, users were unable to select an account and create the Pipeline. With this fix, pagination has been removed, and the API now returns all ad accounts in a single response, ensuring that the accounts are correctly displayed during Pipeline creation.

  • Handling Data Mismatch Issues in Facebook Ads (Fixed in Release 2.27.3)

    • Fixed an issue in Facebook Ads where attribution values were loaded incorrectly to the Destination when the attribution list contained multiple action types. This issue occurred due to incorrect parsing of attribution values: all action types used the same key, so values were overwritten and only the last action type was returned. With this fix, each action type now has unique values based on the selected attribution settings, ensuring correct data in the Destination.

      This fix applies to all new Pipelines created after Release 2.27.3. Contact Hevo Support to enable the fix for your existing Pipelines.
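The overwrite behind the Facebook Ads mismatch is a common parsing pitfall and can be sketched as follows. This is not Hevo's parser; the function names, the shape of the input, and the sample values are assumptions chosen to show the pattern. The first function writes every action type to the same key, so the last one wins. The second builds a distinct key from the action type and the attribution window.

```python
def parse_shared_key(actions, windows):
    """Buggy pattern: all action types share one key per window, so values are overwritten."""
    parsed = {}
    for action in actions:
        for window in windows:
            if window in action:
                parsed[window] = action[window]
    return parsed


def parse_unique_keys(actions, windows):
    """Fixed pattern: one key per (action type, attribution window) pair."""
    parsed = {}
    for action in actions:
        for window in windows:
            if window in action:
                parsed[f'{action["action_type"]}_{window}'] = action[window]
    return parsed


# Hypothetical attribution list with two action types for the same window.
actions = [
    {"action_type": "link_click", "7d_click": "12"},
    {"action_type": "purchase", "7d_click": "3"},
]

print(parse_shared_key(actions, ["7d_click"]))
# prints: {'7d_click': '3'}
print(parse_unique_keys(actions, ["7d_click"]))
# prints: {'link_click_7d_click': '12', 'purchase_7d_click': '3'}
```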

  • Handling of Data Ingestion Issues in Salesforce Bulk API V2 Source (Fixed in Release 2.27.2)

    • Fixed the following issues in the Salesforce Bulk API V2 integration:

      • High data ingestion: Hevo repeatedly re-ingested the same data in consecutive polls after a poll failure, resulting in high data ingestion and Events count. The issue occurred because the locator, which tracks the last ingested page of records, was set to null after a poll failure. This caused Hevo to re-ingest previously fetched records. With this fix, Hevo now retrieves the locator that is stored in offset, ensuring accurate data ingestion.

      • Data ingestion delay: The data ingestion task was incorrectly marked as completed while the polling job was still in progress. This led to a delay in data ingestion as the next ingestion task was scheduled based on the Pipeline frequency. With this fix, if the polling job is still in progress, the ingestion task is deferred and retried every 5 minutes, ensuring timely data ingestion.
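The locator fix for high data ingestion can be pictured with a small simulation. This is a sketch under stated assumptions, not Hevo's code: poll, make_flaky_fetch, the PAGES dictionary, and the offset dictionary are all hypothetical stand-ins. The point it illustrates is that the locator is checkpointed in the offset after every page, and a poll that follows a failure resumes from that checkpoint instead of starting again with a null locator.

```python
# Hypothetical paged source: locator -> (records, next locator).
PAGES = {None: (["a", "b"], "p2"), "p2": (["c"], "p3"), "p3": (["d"], None)}


def make_flaky_fetch(fail_on):
    """Return a fetch function that fails once when asked for the given locator."""
    state = {"failed": False}

    def fetch_page(locator):
        if locator == fail_on and not state["failed"]:
            state["failed"] = True
            raise ConnectionError("simulated poll failure")
        return PAGES[locator]

    return fetch_page


def poll(fetch_page, offset, sink):
    """Ingest pages into sink, checkpointing the locator in offset after each page."""
    locator = offset.get("locator")  # resume from the checkpoint, not from null
    while True:
        records, next_locator = fetch_page(locator)
        sink.extend(records)
        offset["locator"] = next_locator
        if next_locator is None:
            return
        locator = next_locator


offset, sink = {}, []
fetch = make_flaky_fetch("p3")
try:
    poll(fetch, offset, sink)  # fails while fetching page "p3"
except ConnectionError:
    pass
poll(fetch, offset, sink)      # resumes at "p3" and does not re-ingest a, b, c
print(sink)
# prints: ['a', 'b', 'c', 'd']
```

If the failure handler reset the locator to null instead, the second poll would start from the first page and ingest a, b, and c again, which matches the duplicate ingestion described above.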

  • Handling of Deleted Fields in Salesforce Bulk API V2 Source (Fixed in Release 2.27.1)

    • Fixed an issue in the Salesforce Bulk API V2 integration where the deletion of a field from an object during an ongoing polling job caused the job to fail with an invalid field permission error. The invalid job ID was not cleared from the offset, causing the system to repeatedly query the deleted field and resulting in data ingestion failures. With this fix, the invalid job ID is cleared from the offset, and a new job with a new ID is initiated for seamless data ingestion.

  • Handling of Issues During Migration of Google Sheets Pipelines to Service Accounts

    • Missing Shared Folders: Fixed an issue where shared folders were not accessible during migration to a service account. Hevo was missing a key parameter needed to query Google Sheets, which prevented access to folders not owned by the user. With this fix, Hevo will now correctly retrieve all shared folders, allowing users to easily migrate their data.

    • Incorrect Account Display: Fixed an issue wherein, during migration, the first service account was displayed instead of the selected authorized service account. This occurred due to incorrect logic for retrieving the account name. With this fix, the correct authorized service account will now be displayed during the migration process.

  • Improved Offset Management in Salesforce Source (Fixed in Release 2.27.1)

    • Fixed an issue in the Salesforce integration where Hevo was not resetting the job ID for Full Load objects after all records were polled. The job ID, used as the offset for Full Load objects, was not being cleared. This caused Hevo to query the same ID repeatedly until it expired, leading to data ingestion failures. After the fix, the job ID is cleared from the offset once polling is complete, ensuring accurate data ingestion.

Documentation Updates

The following pages have been created, enhanced, or removed in Release 2.28:

Destinations

Getting Started

Sources