Setting up MySql binary logs for replication

A binary log is a collection of log files that records information about data modifications and data object modifications made on a MySQL server instance. Typically binary logs are used for data replication and data recovery.

Hevo supports data ingestion for replication from MySql servers via binary logs. The minimal configuration that must be available for enabling ingestion via binary logs is as follows:

  1. Binary logging must be enabled on the server. The options may be provided via the MySql server configuration file or via server startup options to mysqld. Some of the relevant options are:
    • log-bin: Set this to enable binary logging by providing a base name for the log files
    • binlog-format: This must be set as ROW
    • binlog-do-db: Database for which binary logging is to be enabled, one per database (optional)
    • binlog-ignore-db: Database for which binary logging is to be disabled, one per database (optional)
    • binlog_checksum: To enable the master server to write checksums for each log event (optional)
    • server_id: A number greater than 0. If slaves are present, all of the servers should have unique identifiers
  2. The database user provided to Hevo in the connection settings must have the following global privileges:
    • SELECT
  3. The retention period of the binary-logs should not be less than 7 days, this will help us make sure we don’t miss any log-file to read especially under the settings where historical data is enabled to be loaded.

The replication reference guide on MySql’s documentation portal provides a complete reference to the options available for replication and binary logging.

After a historical load of the initial state of the MySQL database, the binlog is continuously streamed. Also note that for the first time when the data is ingested, the table definitions are directly read from the schema and not the binary logs. Post that all of the schema updates are read from the binary logs.

We found this to be the best approach for near real-time replication at scale. It supports both deletions and table alterations leading to exactly one-to-one replication. It doesn’t require locks or affect the performance of the database. It is also very consistent with the stream processing paradigm allowing near real-time performance.