An incorrect Ping-Node configuration, or Ping-Nodes that do not respond reliably, can cause problems in HA clusters. Understanding how Ping-Nodes work is critical for a proper HA setup.
Why do we need one or more Ping-Nodes?
The ZX NAS series uses a heartbeat through which the Primary and Secondary hosts check each other. At least 2 NICs must be configured for the heartbeat. Additionally, we strongly recommend using a direct crossover, or point-to-point, connection for the heartbeat. With a direct connection, both hosts can still communicate during a switch failure, and you save 2 switch ports.
This creates a potential issue. What happens if the Primary and Secondary ZX hosts are both functioning well and can communicate with each other (e.g. via the direct connection mentioned above), but the storage client has lost its network connection to the Primary ZX host?
For example, the switch port or NIC on the client's path to the Primary host may have failed.
The heartbeat will NOT trigger a failover, because both hosts “think” they are OK, yet the storage client still cannot access the storage. This is where the Ping-Node comes into play and prevents such situations: the cluster manager detects that the Primary host has lost access to the Ping-Node(s) while the Secondary host still has access, and it executes a failover. Because losing access to a single Ping-Node will cause a failover, it is strongly recommended to use at least 2 Ping-Nodes for every network segment that needs a Ping-Node. This minimizes failover events caused by an unreliable Ping-Node.
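To make the scenario above concrete, here is a minimal Python sketch of a Ping-Node based failover decision. It is a hypothetical, simplified model, not the actual ZX cluster manager logic: it assumes the cluster manager compares, per monitored network segment, how many Ping-Nodes each host can reach, and fails over when the Secondary host reaches more of them than the Primary host. The function name, segment name and IP addresses are illustrative only.

```python
def should_fail_over(primary: dict[str, set[str]], secondary: dict[str, set[str]]) -> bool:
    """Hypothetical simplified rule: fail over if, in any monitored segment,
    the Secondary host can reach more Ping-Nodes than the Primary host.

    Each argument maps a segment name to the set of Ping-Node IPs that host
    can currently reach in that segment.
    """
    for segment in primary.keys() | secondary.keys():
        primary_ok = primary.get(segment, set())
        secondary_ok = secondary.get(segment, set())
        if len(secondary_ok) > len(primary_ok):
            return True
    return False


# Both hosts are healthy and see each other via the heartbeat,
# but the Primary host has lost one Ping-Node on the storage segment.
primary = {"storage-lan": {"192.168.10.250"}}
secondary = {"storage-lan": {"192.168.10.250", "192.168.10.251"}}
print(should_fail_over(primary, secondary))  # True
```

In this model, a Ping-Node that goes down for everyone is lost by both hosts equally, so it does not by itself trigger a failover; only a difference in reachability between the Primary and Secondary hosts does.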
Which network segments need Ping-Node monitoring? Not every NIC: only the network paths connected to storage clients need to be monitored with Ping-Node(s). Ping-Node IP addresses must be reachable from the Ring interfaces, so each Ping-Node must be in the same network subnet as a Ring interface.
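A quick way to sanity-check the subnet requirement before entering Ping-Nodes is to test whether each Ping-Node address falls inside the subnet of at least one Ring interface. The Python sketch below uses the standard `ipaddress` module; it assumes Ring interfaces are written in CIDR form (address/prefix), and the function name and all addresses are hypothetical examples, not part of the ZX configuration tools.

```python
import ipaddress


def ping_nodes_outside_ring_subnets(ring_interfaces: list[str], ping_nodes: list[str]) -> list[str]:
    """Return the Ping-Node addresses that are not inside the subnet of any
    Ring interface. Ring interfaces use CIDR form, e.g. "192.168.10.1/24"."""
    ring_networks = [ipaddress.ip_interface(iface).network for iface in ring_interfaces]
    return [
        node
        for node in ping_nodes
        # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
        if not any(ipaddress.ip_address(node) in net for net in ring_networks)
    ]


# Hypothetical Ring interfaces and Ping-Node candidates.
ring = ["192.168.10.1/24", "10.0.0.1/24"]
pings = ["192.168.10.250", "10.0.0.250", "172.16.0.1"]
print(ping_nodes_outside_ring_subnets(ring, pings))  # ['172.16.0.1']
```

Any address the check reports (here `172.16.0.1`) would not be in the same subnet as a Ring interface and should be replaced with a Ping-Node on one of the Ring subnets.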