Redundancy Option Kit for Groov EPIC?

varland · February 2, 2022, 12:07pm

Anything planned for Groov EPIC that would add the functionality that the SNAP-PAC-ROK offers for SNAP PAC S-series controllers? I know the Groov controllers can do a lot more than the S-series controllers could, but I would love to have something like this for cases where an EPIC is being used as “just” an Opto controller.

Beno · February 2, 2022, 3:14pm

Not that I am aware of.

Barrett · February 14, 2022, 8:42pm

You are better off creating your own redundancy. The PAC version does work, but it is complicated to work with and a little buggy.
Also, by building your own, you can choose what triggers a switchover. I think Opto has figured out that it’s not worth building this when rolling your own can be pretty simple and less complicated.

DB_Digital · February 15, 2022, 12:44am

Hey Barrett

Have you done this? I am interested in what you have done. I am running a SNAP system but would like to determine the switchover myself.
Let me know please
dave

varland · February 15, 2022, 2:30pm

@Barrett: maybe I’ll look into this, but there are quite a few things I don’t really know how I would handle. Maybe the PAC redundancy kit doesn’t handle these things, but off the top of my head:

What’s the best way to sync variable values between multiple controllers? Each controller would need the ability to act as the primary or secondary, and either push values or pull values.
How would I tell a PAC Display project to switch over to a backup controller (that has a different network address)?

Barrett · February 15, 2022, 4:04pm

I haven’t actually implemented a redundancy of my own, but I have implemented the Opto22 version.
The system I replaced had a crude version of redundancy using two LCM4 controllers, both on Ethernet. The implementation was based on a heartbeat sent through a relay output module on the primary unit and received by the secondary (the wrong choice: it would fail every five years for obvious reasons…). The secondary would take over if the heartbeat timed out, that is, stayed in one state. This is a very simple but effective implementation.
This implementation can be used in a variety of ways, and it all comes back to what you want to use to decide you have a problem and then make the secondary take over.
There are myriad ways to do this. For instance, the heartbeat can be a scratchpad communication. Which chart you put the heartbeat in matters: it should be the chart most central to the operation. Obviously, a comm failure of the Ethernet port at the controller or at the switch would cause a switchover with a scratchpad-based heartbeat. You can also use the Calculate Strategy CRC command to check the integrity of the strategy and switch over if that number doesn’t match; see the help file for the Calculate Strategy CRC command. You need to take the primary offline when a switchover happens, and you do this with a relay that supplies power to the primary controller.
To summarize, you can make the switchover based on anything you can dream up and you can also go nuts and use the redundant ports on the brains to parallel the network as well.
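
As a rough illustration of the heartbeat timeout idea described above, here is a minimal sketch of the decision the secondary could make. It is written in Python rather than OptoScript purely to show the logic; the `HeartbeatWatchdog` name, the 3-second timeout, and the idea of polling a toggling value are all assumptions, not anything taken from the ROK or from an Opto 22 API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatWatchdog:
    """Flags the primary as failed when its heartbeat stops changing.

    Hypothetical helper for illustration only.
    """
    timeout_s: float                    # assumed value; tune for your process
    last_value: Optional[int] = None    # last heartbeat reading seen
    last_change_s: float = 0.0          # time at which the reading last changed

    def update(self, value: int, now_s: float) -> bool:
        """Feed the latest reading; return True if the primary looks dead."""
        if value != self.last_value:
            # Heartbeat moved, so the primary is still alive.
            self.last_value = value
            self.last_change_s = now_s
            return False
        # Heartbeat is stuck in one state; fail once it has been stuck too long.
        return (now_s - self.last_change_s) >= self.timeout_s


wd = HeartbeatWatchdog(timeout_s=3.0)
readings = [(0, 0.0), (1, 1.0), (1, 2.0), (1, 4.0)]
results = [wd.update(value, t) for value, t in readings]
```

The secondary would call `update()` on every scan with whatever it reads as the heartbeat (a scratchpad value, for example) and start its takeover logic the first time the call returns `True`.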

Barrett · February 15, 2022, 4:29pm

In the ROK, they use serial ports and a redundancy manager with power relays to attempt to do all of what you are asking. Unless the spec calls for this, don’t go there. In my experience, the controllers will run for 20+ years without an issue, period. Adding complexity will actually reduce the reliability. Your biggest failure point will be the network hardware. Making sure electrical surges are prevented on the controller power, the switch power, and the Ethernet ports will do way more for the reliability of the system than anything else. Of course you are dealing with engineers who probably do not have that experience, so they will insist on mucking up the works.
To sync vars in both directions, keep in mind that if there is a problem with the primary, the secondary has to operate a power relay and reboot it. So all that logic has to be in place: make the primary reboot, run a timer that waits for the primary to finish rebooting, and make sure the primary doesn’t boot up online… It gets very complicated, because all this stuff doesn’t necessarily go the way you would like it to.
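
To give a feel for how much sequencing is involved, the reboot steps above can be sketched as a small state machine running on the secondary. This is only a sketch in Python, not OptoScript and not how the ROK does it; the state names, the 5-second power-off time, and the 60-second boot wait are made-up values for illustration.

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"      # primary healthy, secondary idle
    POWER_OFF = "power_off"  # secondary in control, primary power relay open
    WAIT_BOOT = "wait_boot"  # secondary in control, primary powering back up
    ACTIVE = "active"        # secondary in control, primary back as standby


POWER_OFF_S = 5.0   # assumed: how long to hold the primary powered down
BOOT_WAIT_S = 60.0  # assumed: how long to give the primary to boot


def next_state(state: State, primary_alive: bool, time_in_state_s: float) -> State:
    """Return the state the secondary should move to on this scan."""
    if state is State.STANDBY:
        return State.STANDBY if primary_alive else State.POWER_OFF
    if state is State.POWER_OFF:
        return State.WAIT_BOOT if time_in_state_s >= POWER_OFF_S else State.POWER_OFF
    if state is State.WAIT_BOOT:
        return State.ACTIVE if time_in_state_s >= BOOT_WAIT_S else State.WAIT_BOOT
    # ACTIVE: stay in control; the rebooted primary must not come up online.
    return State.ACTIVE


def primary_power_relay_closed(state: State) -> bool:
    """The relay feeding the primary is open only while power-cycling it."""
    return state is not State.POWER_OFF


def secondary_in_control(state: State) -> bool:
    return state is not State.STANDBY
```

Even this toy version leaves out the hard parts, such as what happens if the primary never comes back or if both controllers think they are in charge, which is where the real complexity shows up.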
PAC Display Pro fortunately has a very nice feature that automatically detects the failure of the primary and picks up the secondary (with a separate IP). Unfortunately, no other Opto22 software does this, so if you are going to do this, I recommend using PAC Display; otherwise, you might be able to pull this off in Ignition. When I did the ROK, I had no idea what using Groov View was like or what its limitations were, so it is a sad implementation: I have two Groov servers and a graphic that tells you when you need to open the other server…
Keep in mind that the Opto ROK does work, maybe a little bit too well, because pretty much anything can trigger a switchover, which doesn’t necessarily solve anything. For instance, if any one of the brains on the network is powered down, even by a power blip separate from the controller, the system switches over and reboots the primary. In other words, anything that interrupts comms will initiate a switchover. I have had several cases where I could not figure out what caused the problem, because the system reboots the primary and then you have no debug info because that gets wiped.

Barrett · February 15, 2022, 8:11pm

Oh, BTW, I forgot about a big consideration that the ROK does handle. When rolling your own, you have to consider what effect the current data values will have. The ROK automatically keeps the secondary controller up to date on all of these variables over a direct Ethernet connection between the second Ethernet ports of the two controllers. Of course, you can do the same by connecting the second ports of two EPICs. You would have to create a scratch table of the vars you need to keep current and then transmit it constantly through that port connection. The ROK does this by giving you an instruction block to place at the appropriate points in the strategy; it updates the secondary at those points. I think basically it is one block per chart to synchronize the data.
The issue here is how critical the current data are: some may be super critical, others not critical at all. One way of simplifying this is to read all the data from the racks before starting the strategy control logic, and have an initialization block to handle any more crucial data.
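
The scratch table idea can be sketched roughly as follows. This is Python, not OptoScript, and it is not the ROK’s sync mechanism; the variable names, the JSON encoding, and the sequence number are all assumptions chosen to keep the example short. The point is only that you pick the critical vars, ship them as one snapshot, and have the secondary ignore anything stale.

```python
import json

# Hypothetical list of the variables that must stay current on the secondary.
SYNC_VARS = ("setpoint", "batch_count", "valve_open")


def build_snapshot(variables: dict, seq: int) -> bytes:
    """Primary side: pack only the critical variables plus a sequence number."""
    payload = {"seq": seq, "vars": {name: variables[name] for name in SYNC_VARS}}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def apply_snapshot(raw: bytes, variables: dict, last_seq: int) -> int:
    """Secondary side: apply a snapshot unless it is older than what we have.

    Returns the sequence number the secondary is now up to date with.
    """
    msg = json.loads(raw.decode("utf-8"))
    if msg["seq"] <= last_seq:
        return last_seq  # stale or duplicate snapshot, ignore it
    for name in SYNC_VARS:
        if name in msg["vars"]:
            variables[name] = msg["vars"][name]
    return msg["seq"]


primary = {"setpoint": 72.5, "batch_count": 14, "valve_open": True, "scratch_timer": 9}
secondary = {"setpoint": 0.0, "batch_count": 0, "valve_open": False}

last_seq = apply_snapshot(build_snapshot(primary, seq=1), secondary, last_seq=0)
```

Non-critical values such as `scratch_timer` never cross the link, which matches the idea that only the vars that would cause trouble after a switchover need to be kept current.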

DB_Digital · May 2, 2022, 8:12pm

Thanks Barrett
Right now I have the ROK and two S2 controllers. The second one keeps erroring out with I/O Impaired, and I noticed that when it happens, Ethernet 2 stops blinking. I replaced the controller to check, but it does the same thing. Also, how many sync blocks do you use? I had one per chart, then I removed them and now have just one per strategy, running every 1000 ms to sync.
One bad thing I did was try to read both of the statuses from the memory map with
//statusTrashCan = ReadNumFromIoUnitMemMap(PacS1_2 ,0xF8001000 , status2);
//statusTrashCan = ReadNumFromIoUnitMemMap(arb ,0xF8001000 , status3);
but this caused way more errors.

Where do you have sync blocks? How many do you have?
Do you get the I/O Impaired error?
I actually need to restart controller 2.
Thanks for any input.

Barrett · June 10, 2022, 8:49pm

Sorry, did not see this until now.
Remember, the sync block simply updates the secondary controller on all the variables you chose to update in that chart. Not every var needs to be updated, only the ones that will cause a problem if the ROK switches over to the secondary and the current value is not there.
I think generally one sync per chart, but that depends on your strategy. You do not have to have any, but then the effect of a switchover with non-current values will be the issue.
Where? My assumption was at the return of the loop in a chart, which seems like the best place. It probably needs to be in the primary flow, i.e., not in a branch that may or may not execute, but at a place in the chart that is reached on every loop iteration.
I/O Impaired? Not sure, it’s been too long, but it sounds like the ROK is sensing that one or more I/O points are not being read. If anything on the network is interfering with the Opto system’s comms, it will try to switch over.

DB_Digital · July 7, 2022, 2:32am

I found the issue: a loose wire.

Barrett · July 7, 2022, 2:27pm

Yeah, when it comes to hardware, it’s pretty rare for an Opto controller or even I/O to actually be bad…