Ratis Membership Change
Overview
Adjusting the configuration, such as replacing failed servers or changing replication levels, is sometimes necessary in practice. Ratis facilitates these functionalities through joint consensus, allowing for simultaneous addition or replacement of multiple servers in a cluster.
For details on how joint consensus algorithm works, please refer to section 4.3 in Raft Paper
During membership changes, the availability guarantee of Raft is more vulnerable than usual. Agreement (for elections and entry commitment) requires separate majorities from both the old and new configurations. For example, when changing from a cluster of 3 servers to a different cluster of 9 servers, agreement requires both 2 of the 3 servers in the old configuration and 5 of the 9 servers in the new configuration. Be careful to keep both separate majorities online!
Adding a new server
To add a new node (e.g., N3
) to an existing group (e.g., N0
, N1
, N2
), follow these steps:
- Start the new peer
N3
with EMPTY group.
RaftServer N3 = RaftServer.newBuilder()
.setGroup(RaftGroup.emptygroup())
.setProperties(properties)
.setServerId(n3id)
.setStateMachine(userStateMachine)
.build();
N3.start()
- Invoke a
setConfiguration
method in the AdminApi with the new group as the parameter. It will wait for the new peer to catch up before returning the reply.
reply = client.admin().setConfiguration(List.of(N0, N1, N2, N3))
⚠️ Note
Avoid starting
N3
with the group (N0
,N1
,N2
,N3
) as it may lead to shutdown scenarios, particularly ifN3
initiates a leader election before being recognized by original peers.
Removing an existing peer
To remove an existing node (e.g., N3
) from a group (e.g., N0
, N1
, N2
, N3
), follow these steps:
- Issue a
setConfiguration
request to inform about the configuration change.
reply = client.admin().setConfiguration(List.of(N0, N1, N2))
- Check if
reply.isSuccess()
Note that N3
will automatically shut down. The data in N3
can be safely deleted.
Arbitrary configuration change
To perform multiple member changes at one time, like replacing two old nodes with new ones in a five-node cluster (changing from an existing group N4 to a new group N6), follow these steps:
- Start new peers with an EMPTY group.
- Issue a
setConfiguration(List.of(N0, N1, N2, N5, N6))
request to the original group. The removed peersN3
andN4
will automatically shut down. For full code examples, see examples/membership/server.
ADD
Mode and COMPARE_AND_SET
Mode
There are different setConfiguration
modes:SET
(default), ADD
and COMPARE_AND_SET
.
SET
: The leader will consider the given peers in request to be the new group.ADD
: The leader will only add the new peers in request to the existing group and will not remove any existing peers.COMPARE_AND_SET
: The leader will first compare between the current configuration and the provided configuration in request, and it will accept the new configuration only if the current configuration equals to the provided configuration in request. For theADD
mode and theCOMPARE_AND_SET
mode, build aSetConfigurationRequest.Arguments
object and then pass it to setConfiguration(SetConfigurationRequest.Arguments).
Majority-add
Majority-add is defined as below:
-
Adding a majority of members in a single
setConfiguration
request. In other words, addingn
members to a group ofm
members in asetConfiguration
request, wheren >= m
. Note that, when asetConfiguration
request removes and adds members at the same time, the majority is counted after the removal. For examples,setConfiguration
to a 3-member group by adding 2 new members is NOT a majority-add. However,setConfiguration
to a 3-member group by removing 2 of members and adding 2 new members at the same time is a majority-add. Majority-add is unsafe since it can end up in a no-leader state which cannot be recovered automatically. The following property is to enable/disable majority-add. -
raft.server.leaderelection.member.majority-add
(boolean
, default=false
)
In a monitored environment, majority-add can be enabled in order to reduce the number of setConfiguration
calls. As an example, change from non-HA single server to a HA 3-server group can be done in a single setConfiguration
call, instead of 2 setConfiguration
calls with majority-add disabled. In case the no-leader state problem happens, the monitor (a human or a program) can fix it by restarting the servers.