Salesforce Bulk API V2

Last updated on Sep 03, 2024

Salesforce is a cloud-based Customer Relationship Management (CRM) platform that helps businesses manage customer interactions, streamline sales and marketing processes, and analyze customer data. It also allows you to build customized applications as per your business requirements to enhance customer relationships.

Hevo uses Salesforce’s Bulk API 2.0 to replicate data from your Salesforce applications to the Destination database or data warehouse. To replicate this data, you need to authorize Hevo to access data from the relevant Salesforce environment.

This Source is available to all new and existing customers. However, if you are an existing customer, you must contact Hevo Support to get it enabled for your team. Currently, the Source is in Early Access, which means that Events loaded to the Destination using this Source are free.


Salesforce Environments

Salesforce allows businesses to create accounts in multiple environments, such as:

  • Production: This environment holds live customer data and is used to actively run your business. A production organization is identified by URLs starting with https://login.salesforce.com.

  • Sandbox: This is a replica of your production organization. You can create multiple sandbox environments for different purposes, such as one for development and another for testing. Working in a sandbox eliminates the risk of compromising your production data and applications. A sandbox is identified by URLs starting with https://test.salesforce.com.
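
For context, the environment also determines which login URL an API client authenticates against. The following is a minimal Python sketch of this choice; the credentials, consumer key, and secret are placeholders, and the username-password OAuth flow shown is only illustrative, not how Hevo itself authorizes access.

    import requests

    # Production orgs authenticate against login.salesforce.com;
    # sandboxes authenticate against test.salesforce.com.
    ENVIRONMENT = "sandbox"  # or "production"
    LOGIN_HOST = (
        "https://test.salesforce.com"
        if ENVIRONMENT == "sandbox"
        else "https://login.salesforce.com"
    )

    # OAuth 2.0 username-password flow (placeholder credentials).
    response = requests.post(
        f"{LOGIN_HOST}/services/oauth2/token",
        data={
            "grant_type": "password",
            "client_id": "<CONSUMER_KEY>",
            "client_secret": "<CONSUMER_SECRET>",
            "username": "<USERNAME>",
            "password": "<PASSWORD>",
        },
    )
    access_token = response.json()["access_token"]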


Source Considerations

  • During incremental loads, derived or calculated fields, which obtain their values from other fields or formulas, are not updated in the Destination. This means these fields remain unchanged in the Destination even if their values change because the formula or the originating field is modified.

    In Salesforce, whenever any change occurs in an object, its SystemModStamp timestamp field is updated. Hevo uses this SystemModStamp field to identify Events for incremental ingestion. In the case of derived fields, a change in the formula or the originating field does not affect the object’s SystemModStamp value. As a result, such objects are not picked up in the incremental load. However, if another field in the object is updated at the same time, the subsequent incremental load also picks up the derived field updates.

    As a workaround, Hevo automatically restarts the historical load for all such objects in your Pipelines, by default every 20 days. You can contact Hevo Support to change this historical load frequency. You can also restart the historical load for the object manually. If the object was created after Pipeline creation, you need to restart the historical load at the Pipeline level. The sketch below illustrates the SystemModStamp-based filtering that causes this behavior.
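
    The following minimal sketch shows SystemModStamp-based incremental filtering. The object, fields, and checkpoint value are illustrative and do not reflect Hevo’s internal implementation.

      from datetime import datetime, timezone

      # Checkpoint from the previous Pipeline run (illustrative value).
      last_run = datetime(2024, 9, 1, tzinfo=timezone.utc)

      # Only records whose SystemModstamp advanced past the checkpoint
      # are selected. A change to a formula, or to the field a derived
      # field reads from, does not bump SystemModstamp, so such records
      # fall outside this filter until some other field update does.
      soql = (
          "SELECT Id, Name, SystemModstamp FROM Account "
          f"WHERE SystemModstamp > {last_run.strftime('%Y-%m-%dT%H:%M:%SZ')}"
      )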

  • Hevo cannot ingest incremental data for back-dated records, as Salesforce does not update the SystemModStamp column for such records. Hevo uses this column to identify Events for incremental ingestion.

    As a workaround, you can restart the historical load for the object.

  • When a record from a replicable object is deleted in Salesforce, its IsDeleted column is set to True. Salesforce moves the deleted records to the Salesforce Recycle Bin, and they are no longer displayed in the Salesforce dashboard. When Hevo replicates data from your Source, using either the Bulk APIs or REST APIs, it also replicates data from the Salesforce Recycle Bin to your Destination. As a result, you might see more Events in your Destination than in the Source.

  • Salesforce retains deleted data in its Recycle Bin for 15 days. Therefore, if your Pipeline is paused for more than 15 days, Hevo cannot replicate the deleted data to your Destination. In addition, Salesforce automatically purges the oldest records in the Recycle Bin every two hours if the number of records in it exceeds the limit for your organization, which is 25 times your organization’s storage capacity. Therefore, to capture information about deleted data, you must run the Pipeline within two hours of deleting the data in Salesforce. The sketch below shows how deleted records surface through the API.
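
    A Bulk API 2.0 query job created with the queryAll operation returns soft-deleted records that are still in the Recycle Bin, along with their IsDeleted flag. In this sketch, the instance URL, API version, and token are placeholders.

      import requests

      INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder
      HEADERS = {
          "Authorization": "Bearer <ACCESS_TOKEN>",
          "Content-Type": "application/json",
      }

      # "queryAll", unlike "query", includes soft-deleted records;
      # their IsDeleted column is set to true.
      job = requests.post(
          f"{INSTANCE_URL}/services/data/v58.0/jobs/query",
          headers=HEADERS,
          json={
              "operation": "queryAll",
              "query": "SELECT Id, IsDeleted FROM Account",
          },
      ).json()
      print(job["id"], job["state"])  # poll this job, then fetch results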

  • The maximum number of Events that can be ingested per day is calculated based on your organization’s quota of batches.

    Suppose your organization is allocated a quota of 15000 batches per 24 hours, and each batch can contain a maximum of 10000 Events.

    Then, the daily Event consumption is calculated as follows:

    • The number of batches created per object (X) = Number of Events for the Object/10000.

      Note: This value, X, is rounded up to the next integer.

    • The total number of batches created across all objects in the Pipeline (Y) = Sum of the number of batches created for each object (ΣX).

      This number, Y, is the number of batches submitted in one run of the Pipeline and may vary from run to run.

    • The per-run batch limit is derived from the daily quota as follows:

      The number of Pipeline runs in a day (Z) = 24/Ingestion frequency (in hours).

      The number of batches that can be submitted in a day = 15000.

      Therefore, the maximum number of batches that can be submitted in one run of the Pipeline = 15000/Z.

    Example:

    Suppose you have two objects containing 55800 and 25000 Events respectively, and the ingestion frequency is 12 hours. Then,

    The number of batches created for object 1 (X1) = 55800/10000 = 5.58.

    Therefore, six batches are created; five with 10000 Events each and the sixth with 5800 Events.

    The number of batches created for object 2 (X2) = 25000/10000 = 2.5.

    Therefore, three batches are created; two with 10000 Events each and the third with 5000 Events.

    The total number of batches created across all objects in the Pipeline (Y) = X1 + X2 = 6 + 3 = 9.

    These nine batches are submitted in one run of the Pipeline.

    Now, as the Ingestion frequency is 12 hours,

    The total number of Pipeline runs in 24 hours (Z) = 24/12 = 2.

    And,

    The maximum number of batches that can be submitted in one run of the Pipeline = 15000/2 = 7500.

    Here, against the available limit of 7500 batches per Pipeline run, only nine batches are submitted.

    Therefore, as long as Z x Y <= 15000, you are within the daily quota. The sketch below reproduces this arithmetic.
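
    The same arithmetic as a short Python sketch, using the illustrative quota and batch size from this example:

      import math

      BATCH_SIZE = 10000          # maximum Events per batch
      DAILY_BATCH_QUOTA = 15000   # batches allowed per 24 hours

      events_per_object = [55800, 25000]   # the two objects above
      ingestion_frequency_hours = 12

      # X per object: the batch count is rounded up to the next integer.
      batches = [math.ceil(n / BATCH_SIZE) for n in events_per_object]
      y = sum(batches)                              # 6 + 3 = 9
      z = 24 // ingestion_frequency_hours           # 2 runs per day
      max_batches_per_run = DAILY_BATCH_QUOTA // z  # 7500

      print(batches)                     # [6, 3]
      print(y, max_batches_per_run)      # 9 7500
      print(z * y <= DAILY_BATCH_QUOTA)  # True -> within the daily quota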

  • Salesforce Bulk API V2 does not support the following objects:

    • Attachment

    • ContentDocumentLink

    • ContentFolderItem

    • ContentFolderMember

    • FieldDefinition

    • FlowVersionView

    • IdeaComment

    • ListViewChartInstances

    • PlatformAction

    • SearchLayout

    • Vote


Limitations

  • Hevo does not fetch any columns of the Compound data type.

  • You cannot exclude deleted data from being loaded. Hevo loads new, updated, and deleted data from your Salesforce account.


Revision History

Refer to the following table for the list of key updates made to this page:

Date         Release   Description of Change
Mar-05-2024  2.21      Updated the ingestion frequency table in the Data Replication section.
Sep-11-2023  2.16.2    New document.
