Thursday, September 12, 2013

[-18000] Have not heard from this server (OSI PI)

You're reading this because:
  • You are an automation engineer with (OSI) PI administration duties.
  • You support a PI High-Availability (PI HA) system.
  • Your PI collective keeps going out of sync.
This is what you see when you launch the PI Collective Manager: the secondary reports the error "[-18000] Have not heard from this server".

This is a sporadic issue with many possible causes, but one of them is this tuning parameter:

Replication_SyncTimeoutPeriod

If the time it takes to read configuration changes exceeds the Replication_SyncTimeoutPeriod, the operation is aborted and the Secondary may fall out of sync with the Primary. In this case, no amount of retrying by the Secondary will succeed, because the configuration changes are dequeued after the initial request. Replication to that Secondary halts until it is re-initialized.
The default value for Replication_SyncTimeoutPeriod is 300 seconds (5 minutes). So if, for some reason, your secondary server disconnects from the primary for more than 5 minutes, you need to re-initialize it.

Re-initializing the secondary server is essentially a "copy-paste" of the primary PI server onto the secondary, and it can take several hours because it starts with a full backup of the primary PI server. If you want to avoid babysitting a secondary re-initialization, consider raising this tuning parameter: Replication_SyncTimeoutPeriod. A server reboot can take up to 20 minutes, so choose the new value wisely.
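
To make "choose wisely" a bit more concrete, here is a minimal Python sketch of the arithmetic. It only uses the numbers from this post (the 300-second default and the 20-minute reboot); the function names and the 1.5x safety margin are my own assumptions for illustration, not anything from OSIsoft, and the script does not touch the PI server.

```python
import math

# Default stated in this post: 300 seconds (5 minutes).
DEFAULT_SYNC_TIMEOUT_S = 300


def exceeds_timeout(outage_s, timeout_s=DEFAULT_SYNC_TIMEOUT_S):
    """Return True if an outage of outage_s seconds outlasts the timeout."""
    return outage_s > timeout_s


def suggest_timeout(worst_outage_min, margin=1.5):
    """Suggest a timeout in seconds: worst expected outage times a margin.

    The margin is an illustrative assumption, not a vendor recommendation.
    """
    if worst_outage_min <= 0 or margin < 1:
        raise ValueError("need a positive outage and a margin of at least 1")
    return math.ceil(worst_outage_min * 60 * margin)


if __name__ == "__main__":
    reboot_s = 20 * 60  # a 20-minute reboot, as mentioned above
    print(exceeds_timeout(reboot_s))  # True: 1200 s is longer than 300 s
    print(suggest_timeout(20))        # 1800 s, i.e. 30 minutes
```

With the default, a 20-minute reboot blows well past the timeout; with a value around 1800 seconds it would not. Pick the margin to match the longest outage you realistically expect.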

P.S. Thanks to Joy Wang for the support.
