A distributed system consists of N computers connected to each other through a communication medium (LAN/MAN/WAN), which communicate by passing messages over that medium.
In a distributed computing system (DCS), the computers share their data with one another. For each computer to see up-to-date data, timestamps must be comparable across machines. So we have to make sure that the clocks of all computers in the DCS are synchronized.
Let's take an example to make this clearer. Suppose an application's code is spread across different machines, and the application uses a remote build tool to deploy new changes to a server.
The remote build tool keeps track of the time it last ran. On its next run, it picks up only those files whose modification time is later than the last run time. The tool works this way to save time.
Now assume that there are two computers in the DCS, and a few files that need to be deployed have been changed on both of them. The two clocks are not synchronized: let's say Computer A is running 1 second ahead of Computer B. The remote build tool (which runs on Computer A) runs for the first time and deploys files to the server at time t1. At that moment the time on Computer B is t1-1. When the tool runs a second time 10 seconds later (at t1+10), it picks up the files on Computer A and Computer B whose modification time is later than t1. It therefore misses any files on Computer B that were changed while Computer B's clock read between t1-1 and t1, even though those changes were made after the first run.
Ultimately the deploy fails even though all the files are correct on both machines. The problem occurs because the clocks of the machines in the network were not synchronized. For other applications this problem could be more severe.
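The scenario above can be reproduced with a small simulation. This is only a sketch: the file names, the concrete times, and the names CLOCK_OFFSET, FileChange and select_for_deploy are made up for illustration and do not belong to any real build tool.

```python
from dataclasses import dataclass

# Hypothetical clock offsets (in seconds) relative to a perfect reference clock.
# Computer A runs 1 second ahead of Computer B, as in the example.
CLOCK_OFFSET = {"A": 1.0, "B": 0.0}


@dataclass
class FileChange:
    machine: str      # "A" or "B"
    name: str         # illustrative file name
    true_time: float  # moment of the change on the reference clock

    @property
    def mtime(self) -> float:
        # Modification time as stamped by the machine's own (skewed) clock.
        return self.true_time + CLOCK_OFFSET[self.machine]


def select_for_deploy(changes, last_run):
    # The tool's rule: take only files modified after the last run time.
    return [c.name for c in changes if c.mtime > last_run]


# First run happens at t1 = 100.0 on A's clock (99.0 on B's clock).
T1 = 100.0

# All three changes happen AFTER the first run in real time.
changes = [
    FileChange("A", "a.txt", 99.5),        # stamped 100.5 by A
    FileChange("B", "b_early.txt", 99.5),  # stamped 99.5 by B (between t1-1 and t1)
    FileChange("B", "b_late.txt", 105.0),  # stamped 105.0 by B
]

picked = select_for_deploy(changes, T1)
missed = [c.name for c in changes if c.name not in picked]
print("picked:", picked)
print("missed:", missed)
```

With these assumed values the second run picks up a.txt and b_late.txt but skips b_early.txt, which is exactly the file changed on Computer B while its clock read between t1-1 and t1.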
To avoid such problems, we have to make sure that all computers in the network agree on the time. In other words, the clock of each computer must be synchronized.
How do we make sure that the clocks are synchronized?
One can use a time service backed by a network time server; such services are available for Windows, Linux, and Solaris.
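To give a rough idea of what such a time service does, the sketch below shows the usual way a client estimates how far its clock is from a server's clock, using four timestamps from one request/response exchange (this is the calculation NTP-style protocols use). The function name estimate_offset and all the numbers are assumptions for illustration; it is not a replacement for a real time service.

```python
def estimate_offset(client_send, server_recv, server_send, client_recv):
    """Estimate clock offset and round-trip delay from one exchange.

    client_send / client_recv are read from the client's clock,
    server_recv / server_send are read from the server's clock.
    A positive offset means the server's clock is ahead of the client's.
    """
    offset = ((server_recv - client_send) + (server_send - client_recv)) / 2
    delay = (client_recv - client_send) - (server_send - server_recv)
    return offset, delay


# Illustrative numbers: the server runs 50 seconds ahead of the client,
# each network hop takes 0.02 s, and the server needs 0.01 s to reply.
offset, delay = estimate_offset(
    client_send=1000.00,
    server_recv=1050.02,
    server_send=1050.03,
    client_recv=1000.05,
)
print(f"offset: {offset:.2f} s, round-trip delay: {delay:.2f} s")
```

The estimate is exact only when the network delay is the same in both directions, which is why real time services repeat the exchange and adjust the clock gradually.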
Case Study:
- In Alfresco clustering, I faced a similar issue. When I added, updated, or deleted content on Node1, the change was not reflected on Node2.
- When I dug into the details, I found a 50-second difference between the two nodes: Node1 was running 50 seconds ahead of Node2.
- When I set the same time on both nodes and restarted Node2 (as it was not synchronized with Node1), both nodes were synchronized.
- I then created content on Node1 again, and it was available on Node2. I performed the different tests mentioned in the Alfresco wiki to make sure that clustering was working fine.
Hetal Patel
Consultant, CIGNEX