1
Confirm replication is caught up
Connect to mysql-prod-primary (currently a replica) and confirm that both replication threads are running and lag is zero.
SQL — run on mysql-prod-primary
SHOW REPLICA STATUS\G
-- Replica_IO_Running: Yes
-- Replica_SQL_Running: Yes
-- Seconds_Behind_Source: 0
If mysql-prod-primary has not yet rejoined as a replica, do that first; see the Emergency Failover runbook, step 4.
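The same three-field check recurs in steps 3 and 7, so it may be worth scripting. What follows is a minimal sketch, not part of the original procedure: check_replica_status is a hypothetical helper name, and it assumes the output of SHOW REPLICA STATUS\G is piped to it on stdin.

```shell
# check_replica_status (hypothetical helper): reads SHOW REPLICA STATUS\G
# output on stdin. Succeeds only if both replication threads report "Yes"
# and Seconds_Behind_Source is exactly 0. A NULL lag counts as not ready.
check_replica_status() {
  awk -F': ' '
    {
      key = $1
      sub(/^[ \t]+/, "", key)   # field names are right-aligned in \G output
    }
    key == "Replica_IO_Running"    { io  = $2 }
    key == "Replica_SQL_Running"   { sql = $2 }
    key == "Seconds_Behind_Source" { lag = $2 }
    END {
      if (io == "Yes" && sql == "Yes" && lag == "0") {
        print "OK"
        exit 0
      }
      printf "NOT READY: io=%s sql=%s lag=%s\n", io, sql, lag
      exit 1
    }'
}
```

One possible invocation, assuming the mysql client can reach the host directly: `mysql -h mysql-prod-primary -u <user> -p -e 'SHOW REPLICA STATUS\G' | check_replica_status`. The non-zero exit status makes it easy to gate later steps on the result.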
2
Freeze writes on the current primary
This starts the write outage, so move quickly through steps 3–5. At this point mysql-prod-replica is the acting primary.
SQL — run on mysql-prod-replica
SET GLOBAL super_read_only = 1;
SET GLOBAL read_only = 1;
3
Confirm lag has fully drained
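This should be near-instant because writes are frozen. Do not promote until Seconds_Behind_Source is 0; a NULL value means replication is not running, not that it has caught up.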
SQL — run on mysql-prod-primary
SHOW REPLICA STATUS\G
-- Seconds_Behind_Source: 0
4
Promote the original primary
SQL — run on mysql-prod-primary
STOP REPLICA;
RESET REPLICA ALL;
SET GLOBAL read_only = 0;
SET GLOBAL super_read_only = 0;
5
Redirect traffic back
Get proxy IP
kubectl get pods -n tailscale \
  -l tailscale.com/parent-resource=mysql-primary \
  -o jsonpath='{.items[0].status.podIP}{"\n"}'
Patch EndpointSlice
kubectl patch endpointslice mysql --type=json \
  -p '[{"op":"replace","path":"/endpoints/0/addresses/0","value":"<PRIMARY_PROXY_IP>"}]'
Write outage ends here.
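Pasting the proxy IP by hand during the write outage invites typos. As a rough sketch under stated assumptions, the two commands in this step can be chained so the IP is captured and sanity-checked before the patch is built. build_endpoint_patch is a hypothetical helper, and the check assumes the proxy pod reports an IPv4 address.

```shell
# build_endpoint_patch (hypothetical helper): prints the JSON patch for the
# given IP. Fails if the argument is empty or contains anything other than
# digits and dots (for example an unfilled <PRIMARY_PROXY_IP> placeholder).
# Assumption: the proxy pod IP is IPv4; loosen the pattern for IPv6.
build_endpoint_patch() {
  ip="$1"
  case "$ip" in
    ''|*[!0-9.]*)
      echo "refusing to patch: invalid proxy IP '$ip'" >&2
      return 1
      ;;
  esac
  printf '[{"op":"replace","path":"/endpoints/0/addresses/0","value":"%s"}]\n' "$ip"
}

# Usage, with the same selectors as the commands in this step:
#   PRIMARY_PROXY_IP=$(kubectl get pods -n tailscale \
#     -l tailscale.com/parent-resource=mysql-primary \
#     -o jsonpath='{.items[0].status.podIP}')
#   patch=$(build_endpoint_patch "$PRIMARY_PROXY_IP") &&
#     kubectl patch endpointslice mysql --type=json -p "$patch"
```

Because the patch runs only when the helper succeeds, an empty lookup (for instance, no pod matching the label) stops the sequence instead of writing a bad address into the EndpointSlice.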
6
Rejoin the replica
SQL — run on mysql-prod-replica
STOP REPLICA;
RESET REPLICA ALL;
SET GLOBAL read_only = 1;
SET GLOBAL super_read_only = 1;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'mysql-prod-primary',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '<REPL_PASSWORD>',
  SOURCE_AUTO_POSITION = 1,
  GET_SOURCE_PUBLIC_KEY = 1;
START REPLICA;
7
Verify
App traffic
kubectl run verify --rm -it --image=mysql:8.0 --restart=Never -- \
  mysql -h mysql.default.svc.cluster.local -u <user> -p<pass> \
  -e "SELECT @@server_id, @@read_only"
# server_id should match mysql-prod-primary, read_only should be 0
Replica health
SHOW REPLICA STATUS\G -- run on mysql-prod-replica
-- Replica_IO_Running: Yes
-- Replica_SQL_Running: Yes
-- Seconds_Behind_Source: 0