MySQL replication is the foundation of most production database architectures. It lets you maintain one or more read replicas in near-real-time sync with a primary, enabling read scale-out, failover, and backup offloading without touching the primary under load.
This guide walks through setting up async primary-replica replication with GTIDs on MySQL 8.0 — the setup I recommend for any new deployment.
How Replication Works
MySQL replication is event-driven: the primary records every write operation to the binary log (binlog). Each replica runs two threads:
- I/O thread — connects to the primary and streams binlog events into a local relay log.
- SQL thread — reads the relay log and replays each event against the replica's data files.
With GTIDs (Global Transaction Identifiers), every transaction gets a unique ID assigned by the primary. This makes failover and replica re-pointing dramatically simpler — no more tracking binlog filenames and byte offsets.
Configuring the Primary
Edit /etc/mysql/mysql.conf.d/mysqld.cnf (or your distro's equivalent) and add the following under [mysqld]:
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
bind-address = 0.0.0.0
Then restart MySQL and create a dedicated replication user:
CREATE USER 'repl'@'%' IDENTIFIED BY 'strong_password_here';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
Restrict the replication user to the replica's IP (
'repl'@'10.0.1.x') instead of'%'in production for better security.
Provisioning the Replica
The replica needs a consistent copy of the primary's data as a starting point. Use mysqldump for small datasets or Percona XtraBackup for large ones (no downtime, no lock).
# With mysqldump (consistent snapshot via --single-transaction, no table locks for InnoDB)
mysqldump -h primary-host -u root -p \
--all-databases \
--single-transaction \
--source-data=2 \
--set-gtid-purged=ON \
> /tmp/full-dump.sql
# Restore on the replica
mysql -u root -p < /tmp/full-dump.sql
Configure the replica's my.cnf:
[mysqld]
server-id = 2
gtid_mode = ON
enforce_gtid_consistency = ON
read_only = ON
log_bin = /var/log/mysql/mysql-bin.log
relay_log = /var/log/mysql/mysql-relay-bin.log
Starting Replication
With GTIDs, pointing the replica at the primary is straightforward:
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '10.0.1.100',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'strong_password_here',
SOURCE_AUTO_POSITION = 1;
START REPLICA;
SOURCE_AUTO_POSITION = 1 tells MySQL to use GTIDs to automatically determine where to start — no filename or position needed.
Verifying the Replica
Check the replica's status immediately after starting:
SHOW REPLICA STATUS\G
The two lines you care about most:
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Both must show Yes. Also check Seconds_Behind_Source — it should drop to 0 within seconds to minutes depending on the initial data volume. A value that keeps climbing means the replica can't keep up with the primary's write rate.
Common Issues
Error 1062: Duplicate entry
The replica already has a row the primary is trying to insert. This usually means data was written directly to the replica (bypassing read_only) or the initial dump was inconsistent. Fix the data divergence and then skip the bad transaction with:
SET GTID_NEXT = 'uuid:transaction_id';
BEGIN; COMMIT;
SET GTID_NEXT = AUTOMATIC;
START REPLICA;
Error 1032: Row not found
The replica is missing a row the primary wants to update or delete. Same root cause as above — data divergence. Investigate with pt-table-checksum from Percona Toolkit, then fix with pt-table-sync.
High replication lag
Check whether the bottleneck is I/O (replica disk can't keep up) or CPU (SQL thread is slow). Enable parallel replication to replay transactions in parallel based on the primary's commit order:
# In replica my.cnf
replica_parallel_workers = 4
replica_parallel_type = LOGICAL_CLOCK
Conclusion
GTID-based replication is the right default for MySQL 8.0. It removes the fragility of binlog file/position tracking, makes failover straightforward, and is required for most HA tooling like Orchestrator or Group Replication. Once you have replication running smoothly, it becomes the backbone for read scaling, zero-downtime schema migrations, and disaster recovery.
Running into a replication issue not covered here? Get in touch — replication debugging is one of my favorite things to untangle.