ZooKeeper in Simple Terms
I had heard about ZooKeeper multiple times, but didn't really understand it. Recently, I set out to discover what it is, which led to a lot of questions. After delving into it, I now have a rough idea and would like to share my view of why we need it, what it is, and how it works.
Why we need ZooKeeper
In the distributed world, it is challenging to keep information in sync between machines. Let's imagine there are two machines in a cluster. The straightforward approach would be for one machine to send each change to the other so that both have their data in sync. That can be problematic, though.
- When a network failure occurs, one machine may not receive the change request. The data on the two machines now differs, and it is hard to tell which one holds the latest version.
- Even without any failure, it is still hard to have both machines agree on the data. For example, machine A tries to update x to 100 while machine B tries to update x to 200, almost at the same time. What should the value be after both updates succeed?
What is ZooKeeper
Here comes ZooKeeper, a distributed coordination service, to help alleviate the aforementioned problems.
ZooKeeper follows a simple client-server model. Servers provide the ZooKeeper service, such as storing data. Clients, meanwhile, make use of the service, such as reading and writing data.
There can be multiple servers and clients in the system. Each client connects to one server, while one server can accept connections from multiple clients.
ZooKeeper Guarantees
Before we, as a client or a ZooKeeper user, decide to use the service, we need to know what advantages it offers. As stated in https://zookeeper.apache.org/doc/current/zookeeperOver.html, these are the guarantees we get.
- Sequential Consistency
- Atomicity
- Reliability
- Single System Image
- Timeliness
In the following sections, we will see how ZooKeeper works and, at the same time, what each guarantee means in more detail.
How Writes Are Processed
Before we continue to see how a write is processed, it is necessary to know that there are two types of servers.
- Leader: accepts incoming state changes from clients and replicates them to all servers. There is exactly one leader in the system.
- Follower: all the remaining servers are followers.
When a client wants to update data, the following steps are taken.
- The client issues a write request to the server it is connected to.
- That server forwards the request to the leader. The leader assigns the update a globally unique identifier called a zxid. Zxids increase monotonically, so an update that comes later always has a higher zxid than all previous updates.
- The leader asks all the followers whether they are ready to persist the change.
- The followers notify the leader that they are ready.
- When a majority of the servers (more than half of the ensemble, with the leader counting toward the quorum) are ready, the leader commands the followers to commit the change. If not enough servers are available, the change is aborted.
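The steps above can be sketched as a toy in-memory simulation. The `Leader` and `Follower` classes here are hypothetical stand-ins for illustration, not ZooKeeper's actual implementation, but they show the zxid assignment and the majority-quorum commit.

```python
# Toy sketch of ZooKeeper's write path: the leader assigns a monotonically
# increasing zxid to each update, proposes it to the followers, and commits
# only when a majority of the whole ensemble acknowledges.

class Follower:
    def __init__(self):
        self.committed = []          # committed updates, in zxid order
        self.available = True        # simulates reachability

    def ack(self, zxid, update):
        # Phase 1: follower promises it can persist the change.
        return self.available

    def commit(self, zxid, update):
        # Phase 2: follower applies the change.
        self.committed.append((zxid, update))

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.zxid = 0
        self.committed = []

    def propose(self, update):
        self.zxid += 1               # later updates get higher zxids
        acks = [f for f in self.followers if f.ack(self.zxid, update)]
        # Quorum = majority of the whole ensemble; the leader counts as one ack.
        if 1 + len(acks) > (1 + len(self.followers)) // 2:
            self.committed.append((self.zxid, update))
            for f in acks:
                f.commit(self.zxid, update)
            return True
        return False                 # not enough servers: abort

followers = [Follower() for _ in range(4)]   # a 5-server ensemble
leader = Leader(followers)
assert leader.propose(("x", 100))            # all reachable: commits
followers[0].available = False
followers[1].available = False
assert leader.propose(("x", 200))            # 3 of 5 is still a majority
followers[2].available = False
assert not leader.propose(("x", 300))        # 2 of 5: aborted
```

Note how the abort in the last proposal leaves every follower's log ending at zxid 2, so no server ever applies a half-committed change.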
Implications
- All updates are processed sequentially, giving sequential consistency: updates from a client are applied in the order they were sent. This works because the leader is the single entry point for update requests, and the zxid it assigns tells the followers the order of the requests, so they can commit transactions in the correct order.
- All updates are atomic. When an update fails, no client will ever see a partial result. The variation of the two-phase commit protocol presented in steps 3–5 brings about this atomicity.
- ZooKeeper is reliable: updates survive server failures. With at least three machines, it can tolerate some of them failing, since it requires acknowledgements from only a majority rather than from all. That is why we normally run an odd number of servers. For example, with six machines, two can go down without affecting the system, because the remaining four are still a majority. However, this gives no benefit over five machines, which can tolerate the same number of failures.
- Write performance deteriorates as the number of servers increases, because each write must be coordinated among more servers.
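The majority rule behind the reliability argument can be checked with a few lines of arithmetic. The `tolerated_failures` helper below is hypothetical, not part of any ZooKeeper API; it simply shows why an even ensemble size buys no extra fault tolerance.

```python
# A quorum system with n servers needs a majority (n // 2 + 1) to commit,
# so it tolerates f = n - majority = (n - 1) // 2 failures.

def tolerated_failures(n):
    majority = n // 2 + 1
    return n - majority              # equals (n - 1) // 2

for n in range(3, 8):
    print(f"{n} servers tolerate {tolerated_failures(n)} failure(s)")

assert tolerated_failures(5) == 2
assert tolerated_failures(6) == 2    # six servers tolerate no more than five
assert tolerated_failures(7) == 3
```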
How Reads Are Processed
Reads are simple. The client sends a read request to the server it is connected to. The server reads the state of its local database and generates a response.
Implications
- Reads are fast and scalable, because they involve only one server and the data is stored locally.
- It is possible for clients to see stale state. The server they read from might be among the minority that is slow to commit a change and hence hold an outdated state. To prevent this, ZooKeeper offers a sync operation that forces the server to catch up with the leader before the read.
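As a rough sketch of the stale-read problem and the sync remedy: the dictionaries below are illustrative stand-ins for server state, not ZooKeeper's data structures, though the real client API does expose a similar `sync` call.

```python
# A slow follower serves a stale local read; a sync-style call makes it
# catch up with the leader first. Purely illustrative.

leader_state = {"x": 200, "last_zxid": 2}
follower_state = {"x": 100, "last_zxid": 1}   # one commit behind

def read(server, key):
    # Reads come from the local database: fast, but possibly stale.
    return server[key]

def sync(server, leader):
    # Force the server to catch up with the leader before reading.
    server.update(leader)

assert read(follower_state, "x") == 100       # stale value
sync(follower_state, leader_state)
assert read(follower_state, "x") == 200       # up to date after sync
```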
How Clients Detect Server Failures
On startup, a client initiates a connection to a server, and the server creates a session for the client. Throughout the session, the client periodically sends ping requests, also known as heartbeats, to detect server failure. When it does not receive a response from the server within the session timeout, it assumes the server is down and switches to another server.
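A minimal sketch of heartbeat-based failure detection, assuming a hypothetical `ping` callback and an illustrative timeout (real ZooKeeper clients negotiate the session timeout with the server):

```python
# The client pings its server; if no response arrives within the timeout,
# it assumes the server is down and will switch to another one.

import time

def detect_failure(ping, timeout=0.1):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ping():
            return False          # got a response: server is alive
        time.sleep(0.01)          # wait before the next heartbeat
    return True                   # no response within timeout: assume down

assert detect_failure(lambda: True) is False   # responsive server
assert detect_failure(lambda: False) is True   # silent server
```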
When a client reconnects to another server, ZooKeeper ensures that the new server's state is not older than the one the client has already seen, guaranteeing a single system image: the client will never see an older view of the system than it has previously seen.
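This reconnect rule can be sketched as a filter over candidate servers by their last committed zxid. `pick_server` is a hypothetical helper for illustration, not ZooKeeper's actual selection logic.

```python
# Single-system-image rule: a client may only attach to a server whose
# state is at least as new as the last zxid the client has seen.

servers = {"A": 5, "B": 3, "C": 7}   # server name -> last committed zxid

def pick_server(servers, client_last_zxid):
    # Only servers that are not behind the client are eligible.
    return [s for s, zxid in servers.items() if zxid >= client_last_zxid]

assert set(pick_server(servers, 4)) == {"A", "C"}   # B (zxid 3) is too old
assert set(pick_server(servers, 6)) == {"C"}
```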
How ZooKeeper Prevents Clients From Lagged View
Timeliness is one of the guarantees ZooKeeper provides. It keeps clients from seeing very outdated data: a server that lags too far behind shuts itself down, forcing its clients to reconnect to a more up-to-date server.
Conclusion
Now that we understand how ZooKeeper works and have seen its guarantees — sequential consistency, atomicity, reliability, single system image, and timeliness — it is clear how it helps in developing distributed applications and why it serves as a building block for so many use cases.