Let's learn "How to replace a failed etcd member"

A node in an etcd cluster can end up in a bad state, for example because of network issues or disk corruption. In such cases the failed member has to be replaced with a new one. This tutorial walks through the steps to replace an etcd member and assumes the reader already knows what etcd is.

The terminology used in this tutorial:

  • Peer node: One of the healthy, active nodes in the etcd cluster
  • Corrupted node: The node that is going to be replaced
  • Fresh node: The node that will replace the Corrupted node

Below are the steps to perform the etcd member replacement. The commands use the etcdctl v2 API syntax and output format.
  1. Stop the etcd service on the Fresh node.
  2. Clear the etcd data directory on the Fresh node to make sure no stale data is left behind. The command is "rm -rf <etcd-data-dir>".
  3. Find the member ID of the Corrupted node by running "etcdctl member list | awk '/<Corrupted node IP>/'" on the Peer node. The matching line has the form "<memberID>: name=<> peerURLs=<> clientURLs=<> isLeader=<>".
  4. Remove the Corrupted node from the cluster by running "etcdctl member remove <Corrupted node memberID>" on the Peer node.
  5. Add the Fresh node as a new member by running "etcdctl member add <Fresh node hostname> http://<Fresh node IP>:<etcd peer port>" on the Peer node. The URL here is the member's peer URL (port 2380 by default), not its client URL.
  6. For etcd on the Fresh node to join the existing cluster instead of bootstrapping a new one, temporarily change its startup configuration to use initial-cluster-state=existing: "sed -i -E 's/initial-cluster-state=new/initial-cluster-state=existing/' <etcd-config-file>".
  7. Start the etcd service on the Fresh node.
  8. Verify that the new member has been added by running "etcdctl member list" on the Fresh node.
  9. Revert the configuration change made in Step 6.
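The Fresh node side of steps 1-2 and 7-8 can be wrapped in two small shell functions. This is a minimal sketch, assuming etcd runs as a systemd unit named "etcd"; the function names prepare_fresh_node and start_and_verify and the SYSTEMCTL/ETCDCTL override variables are hypothetical and only exist so the commands can be swapped for stubs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of steps 1-2 and 7-8, meant to run on the Fresh node.
# Assumes etcd runs as a systemd unit named "etcd"; adjust to your setup.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
ETCDCTL="${ETCDCTL:-etcdctl}"

# Steps 1-2: stop etcd and wipe any stale data.
prepare_fresh_node() {
  local data_dir="$1"
  # Refuse an empty path or "/" so a typo cannot wipe the filesystem root.
  if [ -z "$data_dir" ] || [ "$data_dir" = "/" ]; then
    echo "refusing to clear data dir '$data_dir'" >&2
    return 1
  fi
  "$SYSTEMCTL" stop etcd || return 1
  rm -rf -- "$data_dir"
}

# Steps 7-8: start etcd, then look for the fresh node's IP in the member list.
# Succeeds only if some member URL contains "//<fresh-ip>:".
start_and_verify() {
  local fresh_ip="$1"
  "$SYSTEMCTL" start etcd || return 1
  "$ETCDCTL" member list | grep -F -- "//${fresh_ip}:"
}
```

etcd may need a few seconds after starting before "etcdctl member list" answers, so in practice the verification step may need a retry.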
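The Peer node side of the procedure (steps 3 to 5) boils down to looking up an ID and then running two etcdctl commands. Below is a minimal sketch, assuming the etcdctl v2 "member list" output format quoted in step 3; the function names member_id_for_ip and swap_member and the ETCDCTL override variable are hypothetical helpers, not part of etcdctl.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of steps 3-5, meant to run on the Peer node.
# Assumes the etcdctl v2 output format shown in step 3:
#   <memberID>: name=<> peerURLs=<> clientURLs=<> isLeader=<>
# ETCDCTL may point at another etcdctl binary or at a stub for testing.
ETCDCTL="${ETCDCTL:-etcdctl}"

# Print the ID of the member whose URLs contain "//<ip>:".
# Matching "//<ip>:" keeps 10.0.0.1 from also matching 10.0.0.12.
member_id_for_ip() {
  "$ETCDCTL" member list |
    awk -F': ' -v needle="//$1:" 'index($0, needle) { print $1 }'
}

# Usage: swap_member <corrupted-ip> <fresh-name> <fresh-ip> [peer-port]
swap_member() {
  local corrupted_ip="$1" fresh_name="$2" fresh_ip="$3"
  local peer_port="${4:-2380}" id
  id="$(member_id_for_ip "$corrupted_ip")"
  if [ -z "$id" ]; then
    echo "no member found for $corrupted_ip" >&2
    return 1
  fi
  # Step 4: remove the corrupted member.
  "$ETCDCTL" member remove "$id" || return 1
  # Step 5: register the fresh node under its peer URL.
  "$ETCDCTL" member add "$fresh_name" "http://${fresh_ip}:${peer_port}"
}
```

For example, with made-up values, "swap_member 10.0.0.3 infra3 10.0.0.4" on the Peer node removes whichever member advertises 10.0.0.3 and adds infra3 at http://10.0.0.4:2380. Matching on "//<ip>:" rather than the bare IP avoids accidentally selecting a member whose address merely starts with the same digits.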
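Steps 6 and 9 flip a single flag in the Fresh node's startup configuration and then flip it back. A minimal sketch, assuming GNU sed (BSD/macOS sed needs "-i ''") and a config file that contains the literal text "initial-cluster-state=new", as the sed command in step 6 implies; the function names set_state_existing and set_state_new are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 6 and step 9 (GNU sed assumed).
# Assumes the etcd config or unit file contains the literal text
# "initial-cluster-state=new", e.g. as part of --initial-cluster-state=new.

# Step 6: make the Fresh node join the existing cluster.
set_state_existing() {
  sed -i -E 's/initial-cluster-state=new/initial-cluster-state=existing/' "$1"
}

# Step 9: revert the change once the member has joined.
set_state_new() {
  sed -i -E 's/initial-cluster-state=existing/initial-cluster-state=new/' "$1"
}
```

Keeping the forward and reverse edits as a pair makes it harder to forget step 9, since both substitutions are written against the same literal text.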
Hope this helps! Let me know if you have any questions in the comments below. Thanks for your time :)