MySQL is one of the most popular open-source relational database management systems and is used by small and large businesses alike. It is highly customizable and works well even with large data sets.
You can ingest data from your MySQL database using Hevo Pipelines and replicate it to a warehouse of your choice.
Prerequisites
Perform the following steps to configure your Generic MySQL Source:
Set up MySQL Binary Logs for Replication
A binary log is a collection of log files that records information about data modifications and data object modifications made on a MySQL server instance. Binary logs are typically used for data replication and data recovery.
Hevo supports data ingestion for replication from MySQL servers via binary logs (BinLog). For this, binary logging must be enabled on your MySQL server. You can do this via the MySQL server configuration file or via server startup options to mysqld.
Follow these steps to set up BinLog replication:
1. Check if BinLog replication is already enabled
-
Access the MySQL database you want to check for BinLog activity:
mysql -h hostname -u user -p database
-
Open a secure shell:
-
Enter the command:
SELECT @@log_bin;
If this statement returns a value of 1, BinLog is active. If it returns 0, BinLog is disabled. To enable it, follow the steps below.
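The same check can be scripted with the mysql command-line client by querying the log_bin system variable. The following is a minimal sketch, assuming a POSIX shell and the mysql client on the PATH; the function name binlog_enabled and the MYSQL_CMD variable are hypothetical and not part of Hevo or MySQL.

```shell
# Hypothetical helper: returns 0 if binary logging is enabled, 1 if it is disabled,
# and 2 if the query itself fails. MYSQL_CMD is an assumption of this sketch and can
# carry connection options, for example: MYSQL_CMD="mysql -h hostname -u user -p"
binlog_enabled() {
  # -N skips the column header and -B prints plain batch output,
  # so the result is just "1" or "0"
  value=$(${MYSQL_CMD:-mysql} -N -B -e 'SELECT @@log_bin;') || return 2
  [ "$value" = "1" ]
}
```

For example, set MYSQL_CMD="mysql -h hostname -u user -p" and run binlog_enabled && echo "BinLog is active". MYSQL_CMD is split on spaces, so keep the password out of it and let -p prompt for it.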
2. Enable BinLog replication
-
Log in to your MySQL server instance.
-
Check your MySQL Server configuration:
sudo nano /etc/mysql/my.cnf
In some cases, the file is located at /etc/my.cnf instead:
sudo nano /etc/my.cnf
-
In the configuration file, make sure the following settings are specified. Add any that are missing.
[mysqld]
binlog_format=ROW
binlog_row_image=FULL
# Retention period of 3 days. You can set it in seconds instead: binlog_expire_logs_seconds=259200
expire_logs_days=3
# On Ubuntu, use: log_bin=/var/log/mysql/mysql-bin.log
log_bin=mysql-binlog
# Required only on Ubuntu
server-id=1
log_slave_updates=1
# Maximum size, in bytes, of a packet or SQL statement that the server accepts from a client (16 MB)
max_allowed_packet=16777216
gtid_mode=ON
enforce_gtid_consistency=ON
Note:
-
The log_slave_updates setting is required only if you are connecting a read replica. When it is set to 1, the replica writes the updates it receives from the primary database to its own binary log, so those changes are available for reading from the replica.
-
Enabling Global Transaction Identifiers (GTIDs) is recommended because GTIDs uniquely identify each transaction. This simplifies replication: transactions are easier to track, and it is easier to confirm that the primary and replica servers are in sync.
-
Restart the MySQL server. For example, on Ubuntu:
sudo service mysql restart
-
After the restart, log in to the MySQL server and check BinLog again:
SELECT @@log_bin;
The value returned is now 1, indicating that BinLog is active.
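Before restarting, you can sanity-check the option file. The sketch below is a hypothetical helper (check_mysqld_cnf is not a Hevo or MySQL tool) that assumes the settings live in the [mysqld] group of a single file and checks only the four settings above that have fixed values. MySQL can also read other option files and groups, so treat a clean result as a hint rather than proof.

```shell
# Hypothetical helper: check a MySQL option file for the fixed-value [mysqld] settings
# from the configuration above. Prints one line per missing or different setting and
# returns 1 if any were found, 0 otherwise.
check_mysqld_cnf() {
  cnf_file=$1
  status=0
  for pair in binlog_format=ROW binlog_row_image=FULL gtid_mode=ON enforce_gtid_consistency=ON; do
    key=${pair%%=*}
    want=${pair#*=}
    # Last value of $key inside [mysqld]; dashes in option names count as underscores,
    # "#" comments and spaces are ignored, and values are compared in upper case
    have=$(awk -v key="$key" '
      /^[ \t]*\[/ { in_group = ($0 ~ /^[ \t]*\[mysqld\]/); next }
      in_group {
        line = $0
        sub(/#.*/, "", line)
        gsub(/[ \t]/, "", line)
        split(line, kv, "=")
        k = kv[1]
        gsub(/-/, "_", k)
        if (k == key) value = toupper(kv[2])
      }
      END { print value }' "$cnf_file")
    if [ "$have" != "$want" ]; then
      echo "$key: expected $want, found ${have:-<not set>}"
      status=1
    fi
  done
  return $status
}
```

For example, check_mysqld_cnf /etc/mysql/my.cnf prints nothing and returns 0 when all four settings are present with the expected values.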
Note: The retention period should ideally be at least 72 hours (3 days). This helps Hevo ensure that no log file is missed, especially when historical data loading is enabled.
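The two retention settings express the same period in different units, and 72 hours is 3 days. The arithmetic below shows the conversion. retention_ok is a hypothetical helper that checks a binlog_expire_logs_seconds value against the recommended minimum; it treats 0 as acceptable because MySQL does not purge binary logs automatically when that variable is 0.

```shell
# Worked arithmetic: expire_logs_days=3 expressed in seconds,
# the unit used by binlog_expire_logs_seconds
retention_days=3
retention_seconds=$((retention_days * 24 * 60 * 60))
echo "binlog_expire_logs_seconds=$retention_seconds"   # prints binlog_expire_logs_seconds=259200

# Hypothetical helper: succeed if a binlog_expire_logs_seconds value keeps logs for
# at least 72 hours. A value of 0 disables automatic purging, so it also passes.
retention_ok() {
  [ "$1" -eq 0 ] || [ "$1" -ge $((72 * 60 * 60)) ]
}
```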
-
Run the following command to check the current value of the binlog_row_value_options variable:
show global variables where variable_name = 'binlog_row_value_options';
-
If the value is set to PARTIAL_JSON, unset the variable:
SET @@global.binlog_row_value_options = '';
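After the restart, you can confirm all of the BinLog-related settings in one query instead of checking them one by one. The sketch below is a hypothetical helper (check_binlog_variables is an illustrative name), assuming the mysql client's batch output with -N -B, which prints one tab-separated name and value per line. It only compares values and does not change any setting.

```shell
# Hypothetical helper: read "name<TAB>value" lines on stdin and report settings that
# differ from the BinLog setup described above. Returns 1 if anything needs attention.
check_binlog_variables() {
  awk '
    BEGIN {
      FS = "\t"
      want["log_bin"] = "ON"
      want["binlog_format"] = "ROW"
      want["binlog_row_image"] = "FULL"
      want["gtid_mode"] = "ON"
      want["enforce_gtid_consistency"] = "ON"
    }
    { seen[$1] = $2 }
    END {
      bad = 0
      for (name in want)
        if (seen[name] != want[name]) {
          print name ": expected " want[name] ", found " seen[name]
          bad = 1
        }
      # binlog_row_value_options must not contain PARTIAL_JSON
      if (seen["binlog_row_value_options"] ~ /PARTIAL_JSON/) {
        print "binlog_row_value_options: PARTIAL_JSON is set"
        bad = 1
      }
      exit bad
    }'
}

# Example (requires a reachable server):
# mysql -N -B -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format',
#   'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency', 'binlog_row_value_options');" \
#   | check_binlog_variables
```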
The replication reference guide on MySQL’s documentation portal provides a complete reference of the options available for replication and binary logging.
Allowlist Hevo IP addresses for your region
You need to allowlist the Hevo IP address for your region to enable Hevo to connect to your MySQL database. To do this:
-
Edit the MySQL server configuration:
sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
-
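Scroll to [mysqld] and add:
bind-address = 0.0.0.0
Or
bind-address = 10.2.7.152
The bind-address setting controls the network interface on which MySQL listens for connections: 0.0.0.0 accepts connections on all IPv4 interfaces, while a specific address must be one assigned to the MySQL host. Check the Hevo IP address for your region.
-
Save the file and restart the MySQL server for the change to take effect.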
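A common reason Hevo cannot connect is a bind-address left at a loopback address, as some Linux packages ship by default. The sketch below is a hypothetical helper (check_bind_address is an illustrative name), assuming the setting lives in the [mysqld] group of the file you edited; it reads the file only and does not inspect the running server or any firewall rules.

```shell
# Hypothetical helper: print the bind-address from the [mysqld] group of an option
# file and return 1 if it is a loopback address, which accepts only local connections.
check_bind_address() {
  addr=$(awk '
    /^[ \t]*\[/ { in_group = ($0 ~ /^[ \t]*\[mysqld\]/); next }
    in_group {
      line = $0
      sub(/#.*/, "", line)
      gsub(/[ \t]/, "", line)
      split(line, kv, "=")
      k = kv[1]
      gsub(/_/, "-", k)   # bind_address and bind-address name the same option
      if (k == "bind-address") value = kv[2]
    }
    END { print value }' "$1")
  echo "bind-address: ${addr:-<not set>}"
  case $addr in
    127.0.0.1|localhost|::1)
      echo "warning: MySQL listens only on the loopback interface"
      return 1 ;;
  esac
  return 0
}
```

For example, check_bind_address /etc/mysql/mysql.conf.d/mysqld.cnf prints the configured address.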
Create a Database User and Grant Privileges
1. Create a database user (Optional)
Perform the following steps to create a database user in your MySQL database:
-
Connect to your MySQL database as an admin user with an SQL client tool, such as MySQL Workbench.
-
Run the following command to create a user in your database:
CREATE USER <database_username>@'%' IDENTIFIED WITH mysql_native_password BY '<password>';
Note: Replace the placeholder values in the command above with your own. For example, <database_username> with hevo.
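If you script the user setup, the password must be written as a valid MySQL string literal. The sketch below is hypothetical (sql_quote and create_user_sql are illustrative names): it doubles single quotes and escapes backslashes, assuming the server does not use the NO_BACKSLASH_ESCAPES SQL mode, and it only prints the statement so you can review it before running it as an admin user.

```shell
# Hypothetical helper: wrap a value in single quotes as a MySQL string literal,
# escaping backslashes and doubling embedded single quotes
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g")"
}

# Hypothetical helper: print the CREATE USER statement from the step above
create_user_sql() {
  printf 'CREATE USER %s@%s IDENTIFIED WITH mysql_native_password BY %s;\n' \
    "$(sql_quote "$1")" "'%'" "$(sql_quote "$2")"
}
```

For example, create_user_sql hevo 'your-password' | mysql -u root -p runs the printed statement through the mysql client. A password typed on the command line can end up in your shell history.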
2. Grant privileges to the user
The database user for Hevo requires the following privileges:
Privilege | Grants access to
SELECT | Retrieve rows from the database tables.
RELOAD | Clear or reload internal caches, flush tables, or acquire locks.
SHOW DATABASES | View the list of database names on the server.
REPLICATION CLIENT | View replication status and log details.
REPLICATION SLAVE | Access the MySQL server's BinLog for replication.
Connect to your MySQL database as an admin user with an SQL client tool, such as MySQL Workbench, and run the following script. These commands grant only the privileges required by Hevo to ingest data from your MySQL database.
# Grant Privileges to the Database User
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO <database_username>@'%';
# (Optional) View the grants for the user:
show grants for <database_username>@'%';
# Finalize the User’s Permissions
FLUSH PRIVILEGES;
Note:
-
Replace the placeholder values in the commands above with your own. For example, <database_username> with hevo.
-
The SELECT, RELOAD, and SHOW DATABASES privileges are required only for the historical load.
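Because some privileges are needed only for the historical load, the GRANT statement can vary. The sketch below is a hypothetical helper (grant_sql is an illustrative name) that prints either the full statement from the script above or the replication-only variant. It assumes a plain user name such as hevo and does not quote or escape it.

```shell
# Hypothetical helper: print the GRANT statement for the Hevo user. Passing
# "no-historical" as the second argument leaves out SELECT, RELOAD, and SHOW DATABASES,
# which are needed only for the historical load.
grant_sql() {
  privileges="REPLICATION CLIENT, REPLICATION SLAVE"
  if [ "${2:-}" != "no-historical" ]; then
    privileges="SELECT, RELOAD, SHOW DATABASES, $privileges"
  fi
  # %% prints a literal % for the '%' host part
  printf "GRANT %s ON *.* TO '%s'@'%%';\n" "$privileges" "$1"
}
```

For example, grant_sql hevo | mysql -u root -p applies the full set of privileges.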
Perform the following steps to configure your MySQL Source:
-
Click PIPELINES in the Navigation Bar.
-
Click + CREATE PIPELINE in the Pipelines List View.
-
On the Select Source Type page, under All Sources, click Edge, and then select MySQL.

-
In the MySQL screen, specify the following:

-
Source Name: A unique name for your Source, not exceeding 255 characters. For example, MySQL Source.
-
In the Connect to your MySQL section:
-
Database Host: The MySQL host's IP address or DNS name. For example, 10.123.10.001.
Note: For URL-based hostnames, exclude the http:// or https:// part. For example, if the hostname URL is http://mysql-replica.westeros.inc, enter mysql-replica.westeros.inc.
-
Database Port: The port on which your MySQL server listens for connections. Default value: 3306.
-
Database User: The authenticated user who has the permissions to read tables in your database. This can be the user you created in the Create a Database User and Grant Privileges section above. For example, hevouser.
-
Database Password: The password of your database user.
-
Database Names: A comma-separated list of the databases from which you want to replicate data. For example, demo1, demo2.
-
(Optional) In the Additional Settings section:
-
Use SSH: Enable this option to connect to Hevo using an SSH tunnel instead of directly connecting your MySQL database host to Hevo. This provides an additional level of security to your database by not exposing your MySQL setup to the public.
If this option is turned off, you must configure your Source to accept connections from Hevo’s IP address.
-
Use SSL: Enable this option to use an SSL-encrypted connection. Specify the following:
-
CA File: The file containing the SSL server certificate authority (CA).
-
Client Certificate: The client’s public key certificate file.
-
Client Key: The client’s private key file.
-
Click TEST & CONTINUE to test the connection to your MySQL Source. Once the test is successful, you can proceed to set up your Destination.
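The host and database values entered in the MySQL screen follow simple text rules. The sketch below expresses them as two hypothetical shell helpers (normalize_host and split_db_names are illustrative, not part of Hevo), which can be handy when you want to check the same values from a terminal first.

```shell
# Hypothetical helper: remove an http:// or https:// prefix (and anything after the
# first "/"), as the Database Host note requires
normalize_host() {
  host=${1#http://}
  host=${host#https://}
  printf '%s\n' "${host%%/*}"
}

# Hypothetical helper: print one database name per line from a comma-separated list
# such as "demo1, demo2", trimming surrounding spaces and skipping empty entries
split_db_names() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -v '^$'
}
```

For example, mysql -h "$(normalize_host http://mysql-replica.westeros.inc)" -P 3306 -u hevouser -p -e 'SELECT 1;' checks that the host, port, and user accept connections before you click TEST & CONTINUE. For an SSL connection, the mysql client also accepts --ssl-ca, --ssl-cert, and --ssl-key with the same files.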
Data Type Mapping
Hevo internally maps each MySQL Source data type to a unified data type, referred to as the Hevo Data Type, as shown in the table below. The Hevo Data Type represents Source data from all supported data types in a lossless manner.
The following table lists the supported MySQL data types and the corresponding Hevo data type to which they are mapped:
MySQL Data Type | Hevo Data Type
BIT(1), BOOLEAN, TINYINT(1), TINYINT UNSIGNED(1) | BOOLEAN
TINYINT(>1), SMALLINT, TINYINT UNSIGNED(>1) | SHORT
INT, MEDIUMINT, SMALLINT UNSIGNED, MEDIUMINT UNSIGNED, YEAR | INTEGER
BIGINT, INT UNSIGNED, BIGINT UNSIGNED | LONG
FLOAT(0-23) | FLOAT
REAL, DOUBLE, FLOAT(24-53) | DOUBLE
NUMERIC, DECIMAL | DECIMAL
CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, JSON, ENUM, SET | VARCHAR
TIMESTAMP | TIME_TZ
DATE | DATE
TIME | TIME
DATETIME | TIMESTAMP
BIT(>1), BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB | BYTEARRAY
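The mapping table can also be read as a lookup function, which is useful when auditing a schema before creating a Pipeline. The sketch below is a hypothetical helper (hevo_type is an illustrative name) that encodes only the rows of the table above. It expects the table's own upper-case notation, which is not always how MySQL prints column types (MySQL reports, for example, tinyint(1) unsigned), and it prints "unknown" for anything the table does not list.

```shell
# Hypothetical helper: print the Hevo data type for a MySQL type written in the
# notation of the table above. Types not in the table print "unknown" and return 1.
hevo_type() {
  case $1 in
    "BIT(1)"|BOOLEAN|"TINYINT(1)"|"TINYINT UNSIGNED(1)")
      echo BOOLEAN ;;
    "TINYINT("*")"|"TINYINT UNSIGNED("*")"|SMALLINT)
      echo SHORT ;;
    INT|MEDIUMINT|"SMALLINT UNSIGNED"|"MEDIUMINT UNSIGNED"|YEAR)
      echo INTEGER ;;
    BIGINT|"INT UNSIGNED"|"BIGINT UNSIGNED")
      echo LONG ;;
    # FLOAT(p) with p from 0 to 23
    "FLOAT("[0-9]")"|"FLOAT(1"[0-9]")"|"FLOAT(2"[0-3]")")
      echo FLOAT ;;
    # REAL, DOUBLE, and FLOAT(p) with p from 24 to 53
    REAL|DOUBLE|"FLOAT(2"[4-9]")"|"FLOAT("[34][0-9]")"|"FLOAT(5"[0-3]")")
      echo DOUBLE ;;
    NUMERIC*|DECIMAL*)
      echo DECIMAL ;;
    CHAR*|VARCHAR*|TINYTEXT|TEXT|MEDIUMTEXT|LONGTEXT|JSON|ENUM*|SET*)
      echo VARCHAR ;;
    TIMESTAMP*)
      echo TIME_TZ ;;
    DATE)
      echo DATE ;;
    TIME|"TIME("*")")
      echo TIME ;;
    DATETIME*)
      echo TIMESTAMP ;;
    "BIT("*")"|BINARY*|VARBINARY*|TINYBLOB|BLOB|MEDIUMBLOB|LONGBLOB)
      echo BYTEARRAY ;;
    *)
      echo unknown
      return 1 ;;
  esac
}
```

The patterns are ordered so that the exact BOOLEAN cases such as TINYINT(1) and BIT(1) are matched before the wider TINYINT(...) and BIT(...) patterns.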
At this time, the following MySQL data types are not supported by Hevo:
Note: If any of the Source objects contain data types that are not supported by Hevo, they are marked as unsupported during object configuration in the Pipeline.
Source Considerations
- MySQL does not write binary log entries for cascading deletes, so Hevo cannot capture these deletes in log-based Pipelines.
Limitations
-
Hevo only fetches tables from the MySQL database. It does not fetch other entities such as functions, stored procedures, views, and triggers.
-
Hevo does not set the metadata column __hevo__marked_deleted to True for data deleted from the Source table using the TRUNCATE command. This action could result in a data mismatch between the Source and Destination tables.