Pages

Friday, January 15, 2021

How to identify MySQL replication slave lag

Let's first understand how the standard asynchronous
MySQL replication works at a high level.





For simplicity, let's assume that there is a Master and one Slave node.
And MySQL version 5.6

Thus, the master maintains a special log called binary log which stores replication events.
This binary log file stores data that the replication slave will be reading later.

MySQL can use 3 formats for writing data to the binary log:

Statement-based - the SQL query itself is written to the binary log.
Row-based - the row changes are logged in the binary log.
Mixed - statement-based logging is used by default, but the logging mode
             switches automatically to row-based in certain cases.

The slave node has 2 processes/threads that are responsible for the replication:

IO Thread  - a process that pulls data from Master's binary log over the network
                    and writes the data/changes to a local file called a relay log

SQL Thread - a process that continuously reads the relay log
                     and applies the changes to the slave database.

So, in asynchronous replication, the slave node requests a packet from the master,
the master sends the packet and doesn't check if the slave received the data packet or not.

Also as you can see there is only one thread that reads the binary log from the master.

The slave node keeps track of the position in the master's binlog of the last event applied on the slave. This allows the slave server to re-connect and resume from where it left off after replication has been temporarily stopped.

When you see a replication lag, the first thing you need to understand is which of the two
replication threads(IO Thread or SQL Thread) is behind

For this, execute the commands below :

On the master: show master status;
On the slave:   show slave status;

To check where IO Thread currently is, compare the values from the following fields:

From the output of the show master status;

      File -  for example: mysql-bin-changelog.549445

From the output of the show slave status;

      Master_Log_File – last file copied from the master
                                  for example: mysql-bin-changelog.549440


Compare the numeric value from the "File" field from master status and
the value of "master_log_file" from slave status.
The difference will be how many binlogs the salve is behind.(IO Thread)

To get the current status of the SQL Thread check the following values:

Read_Master_Log_Pos – holds until what position the binary log from master
                                      is copied over to the relay log on the slave.

Exec_Master_Log_Pos – holds position from master binary log is being
                                     executed by SQL thread.

You can calculate the lag of SQL_Thread like this:
Read_Master_Log_Pos - Exec_Master_Log_Pos

Also, the "Seconds_Behind_Master" field should give you some sort of indication.

Possible reasons for IO_Thread lag:

- Slow network between master/slave  - See if enabling 'slave_compressed_protocol' helps
- Binary logging on the slave - It is IO intensive, you can disable it unless you need
                                               it for a point in time recovery.

Possible reasons for SQL Thread lag:

- Tables without PK/Unique key when you use row based binlog format

No comments:

Post a Comment