MySQL replication is the foundation of most production database architectures. It lets you maintain one or more read replicas in near-real-time sync with a primary, enabling read scale-out, failover, and backup offloading without touching the primary under load.

This guide walks through setting up async primary-replica replication with GTIDs on MySQL 8.0 — the setup I recommend for any new deployment.

How Replication Works

MySQL replication is event-driven: the primary records every write operation to the binary log (binlog). Each replica runs two threads: an I/O (receiver) thread that connects to the primary and copies new binlog events into a local relay log, and a SQL (applier) thread that replays those events against the replica's own data.

With GTIDs (Global Transaction Identifiers), every transaction gets a unique ID assigned by the primary. This makes failover and replica re-pointing dramatically simpler — no more tracking binlog filenames and byte offsets.
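Concretely, a GTID is the originating server's server_uuid plus a monotonically increasing transaction number. You can inspect the full set a server has applied at any time (the UUID below is illustrative):

```sql
SELECT @@global.gtid_executed;
-- e.g. 3e11fa47-71ca-11e1-9e33-c80aa9429562:1-77
```

When connecting, a replica effectively asks the primary for "everything not already in my gtid_executed", which is what makes re-pointing a replica at a new primary trivial.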

Configuring the Primary

Edit /etc/mysql/mysql.conf.d/mysqld.cnf (or your distro's equivalent) and add the following under [mysqld]:

[mysqld]
server-id       = 1
log_bin         = /var/log/mysql/mysql-bin.log
binlog_format   = ROW
gtid_mode       = ON
enforce_gtid_consistency = ON
bind-address    = 0.0.0.0

Then restart MySQL and create a dedicated replication user:

CREATE USER 'repl'@'%' IDENTIFIED BY 'strong_password_here';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

In production, restrict the replication user to the replica's IP (e.g. 'repl'@'10.0.1.x') instead of '%' for better security.
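For example, assuming the replicas live in a 10.0.1.0/24 network (illustrative addressing), you can scope the account to that subnet and require TLS on top:

```sql
CREATE USER 'repl'@'10.0.1.%' IDENTIFIED BY 'strong_password_here' REQUIRE SSL;
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.1.%';
```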

Provisioning the Replica

The replica needs a consistent copy of the primary's data as a starting point. Use mysqldump for small datasets, or Percona XtraBackup for large ones (it takes a hot physical backup without blocking writes on the primary).

# With mysqldump (--single-transaction gives a consistent InnoDB snapshot, no table locks;
# --routines/--events are included explicitly because mysqldump skips them by default).
# Note: --source-data requires a mysqldump client from 8.0.26+; older clients call it --master-data.
mysqldump -h primary-host -u root -p \
  --all-databases \
  --single-transaction \
  --routines --events \
  --source-data=2 \
  --set-gtid-purged=ON \
  > /tmp/full-dump.sql

# Restore on the replica
mysql -u root -p < /tmp/full-dump.sql
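It's worth confirming the restore actually seeded the replica's GTID history; --set-gtid-purged=ON embeds a SET @@GLOBAL.gtid_purged statement in the dump, so on the replica you should now see the primary's transaction range:

```sql
SELECT @@global.gtid_purged;
```

If the restore instead failed with "GTID_PURGED can only be set when GTID_EXECUTED is empty", the fresh replica already had GTID history of its own; on 8.0, running RESET MASTER on the replica clears it so the import can proceed.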

Configure the replica's my.cnf:

[mysqld]
server-id                = 2
gtid_mode                = ON
enforce_gtid_consistency = ON
read_only                = ON
log_bin                  = /var/log/mysql/mysql-bin.log
relay_log                = /var/log/mysql/mysql-relay-bin.log
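One caveat with read_only: it does not stop accounts holding SUPER (or CONNECTION_ADMIN) from writing. If operators connect to the replica with admin accounts, consider also setting:

```ini
[mysqld]
# super_read_only implies read_only and also rejects writes from admin accounts
super_read_only = ON
```

Keep in mind that failover tooling then has to flip this flag when promoting the replica.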

Starting Replication

With GTIDs, pointing the replica at the primary is straightforward:

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST     = '10.0.1.100',
  SOURCE_USER     = 'repl',
  SOURCE_PASSWORD = 'strong_password_here',
  SOURCE_AUTO_POSITION = 1;

START REPLICA;

SOURCE_AUTO_POSITION = 1 tells MySQL to use GTIDs to automatically determine where to start — no filename or position needed. If START REPLICA then fails with a caching_sha2_password authentication error (the 8.0 default auth plugin requires a secure channel), add SOURCE_SSL = 1 or GET_SOURCE_PUBLIC_KEY = 1 to the CHANGE REPLICATION SOURCE TO statement.

Verifying the Replica

Check the replica's status immediately after starting:

SHOW REPLICA STATUS\G

The two lines you care about most:

Replica_IO_Running: Yes
Replica_SQL_Running: Yes

Both must show Yes. Also check Seconds_Behind_Source — it should drop to 0 within seconds to minutes depending on the initial data volume, and NULL means the replication threads are not running at all. A value that keeps climbing means the replica can't keep up with the primary's write rate.
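This check is easy to automate. A minimal sketch that parses the SHOW REPLICA STATUS\G output — how you capture that output (credentials, host) is up to you, and the mysql invocation in the comment is illustrative:

```shell
# Succeeds only if both replication threads report Yes.
# Feed it the output of: mysql -h replica-host -e 'SHOW REPLICA STATUS\G'
check_replica() {
  local status="$1"
  echo "$status" | grep -q 'Replica_IO_Running: Yes' &&
    echo "$status" | grep -q 'Replica_SQL_Running: Yes'
}
```

A non-zero exit from check_replica is your alert condition for cron or a monitoring agent.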

Common Issues

Error 1062: Duplicate entry

The replica already has a row the primary is trying to insert. This usually means data was written directly to the replica (bypassing read_only) or the initial dump was inconsistent. Fix the data divergence and then skip the bad transaction with:

STOP REPLICA;
SET GTID_NEXT = 'uuid:transaction_id';
BEGIN; COMMIT;  -- commit an empty transaction under the stuck GTID, marking it executed
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;

Error 1032: Row not found

The replica is missing a row the primary wants to update or delete. Same root cause as above — data divergence. Investigate with pt-table-checksum from Percona Toolkit, then fix with pt-table-sync.
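A rough sketch of that workflow (flags from memory; the DSNs and mydb are placeholders — check the Percona Toolkit documentation for your version):

```shell
# Run against the primary: checksums table chunks and compares results on each replica
pt-table-checksum h=10.0.1.100,u=root --databases=mydb

# Preview the statements that would bring diverged rows back in sync
pt-table-sync --replicate percona.checksums h=10.0.1.100 --print
```

Swap --print for --execute only after reviewing what it intends to change.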

High replication lag

Check whether the bottleneck is I/O (replica disk can't keep up) or CPU (the single SQL applier thread is saturated). Enable parallel replication so that transactions which committed together on the primary can be replayed in parallel:

# In replica my.cnf
replica_parallel_workers = 4
replica_parallel_type    = LOGICAL_CLOCK
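Once enabled, you can confirm the workers are actually applying transactions via performance_schema:

```sql
SELECT worker_id, service_state, last_applied_transaction
FROM performance_schema.replication_applier_status_by_worker;
```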

Conclusion

GTID-based replication is the right default for MySQL 8.0. It removes the fragility of binlog file/position tracking, makes failover straightforward, and is required for most HA tooling like Orchestrator or Group Replication. Once you have replication running smoothly, it becomes the backbone for read scaling, zero-downtime schema migrations, and disaster recovery.


Running into a replication issue not covered here? Get in touch — replication debugging is one of my favorite things to untangle.