Groov EPIC PR-1 Factory Reset Required

I had a situation last night that I’m not quite sure how to resolve…

I had a Groov EPIC PR-1 controller restart unexpectedly over the weekend after a UPS died in my shop. The power issue knocked out both the EPIC controller and the network switch the controller was connected to. I got everything back up and running and everything seemed fine.

Last night around 10pm, I got a call from my operator stating that Groov View didn’t seem to be working. They were essentially pressing a start button and nothing was happening.

I tried to get into debug mode in OptoControl to see what was happening, and that’s when my issues started. Every time I tried to get into debug, I got an error message: “Socket not connected. Last command sent: Rev.” There may have been more.

I could log in to Groov Manage no problem. I tried restarting the controller. I tried turning the power completely off and back on. I tried disabling and re-enabling the PAC Control runtime in the controller. I tried backing up and restoring the controller. Nothing worked – I couldn’t get into debug mode.

I finally did a factory reset around midnight, and I didn’t have any issues downloading my strategy to the controller and running it after that.

Has anybody else ever experienced anything like this?

Wow. Yeah, that’s a pretty unusual one right there… I just wonder if you pulled the full set of logs before you did the factory reset?

Would the logs be included in the backup file?

No, it’s not a check option in the list and we don’t hide them.

When you did the dis/re-enable of the PAC Control engine in groov Manage, were there running charts?

Also, do you know if your Node-RED stuff was talking to the PAC Control tags OK? I.e., was it just groov View that was not talking to the tags?

Were there any groov View tags connected directly to I/O, i.e., not PAC Control? I’m wondering if you feel it was a PAC Control issue (debug aside) or a groov View issue (thinking of the button that could not be pressed).

Did you bypass the UPS or fix it?
I don’t trust any controller not on a UPS. Power quality issues will drive you nuts with phantom problems.
Also, anytime a controller gets hit like that, I recommend a redownload of the strategy or a reset like you did. A power surge, whether it’s through the Ethernet port or the power, can cause all sorts of strange problems, such as bit errors.

@Beno:

There may have been running charts. When I went into that section, it showed that the runtime was enabled, but it wouldn’t load any strategy details in the lower section.

From what I could tell, Node-RED wasn’t talking to PAC Control tags either.

I only have one tag in Groov View, and it’s toggling a boolean variable. No tags connected directly to IO.

@Barrett:

I replaced the UPS. I agree – controllers always on a UPS. Having to re-download wouldn’t be a problem, but having to do a factory reset seems like a bit much.

Well you have to consider that a power event can set bit errors in the OS, which is likely what happened in your case. Downloading the strategy will not fix that.
It would not matter what controller you use, this would be a problem for any of them.
The key is making sure that you cannot get a power event at the controller, period. Many people assume that the ethernet is not a problem, and they would be absolutely wrong. Make sure the switch is on the same UPS and make sure that all devices talking to the controller are on the same UPS or another UPS of same quality. Make sure that all the devices talking to a remote switch are on the same UPS the switch is on. If that is not possible, then use fiber.
I have also seen cheap UPSs not prevent the surge, usually on nasty power events. I have had at least one event so bad it knocked out a new APC SmartUPS 1500, and in that event it did not touch either the switch or the controller. The switch was on a fiber network and the local connection was copper to the controller.
I recently installed a fiber network with extended-run APC UPSs, and for all the remote connections coming into each switch that were copper, I installed Ethernet surge arrestors on each one… This network stayed up and running while Cargill’s in-house network went down.

Just had this come up again this week. Was in debug testing a new chart in PAC Control and I got kicked off and couldn’t do anything with PAC Control again until I reset the controller (did a firmware update). I did dump all the log files this time if that’s helpful.

I have seen something like this in the past… Are there any (and I mean ANY) charts that have a communication handle?
What can happen is that the comm handle can get out of sorts and open and close a bunch of times (i.e., you the programmer don’t handle them gracefully), and you use up all the TCP sockets on the device, and thus can’t talk to groov View, PAC Control, PAC Display, etc.
groov Manage will work OK as that is not a port 2001 or 22001 (i.e., not strategy related).
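
If it helps with narrowing this down next time, here is a rough Python sketch (not an Opto 22 tool) for checking from a PC whether the controller still accepts new TCP connections on the two strategy-related ports mentioned above. The function names and the example hostname are made up for illustration. A refused or timed-out connection here does not prove socket exhaustion on its own; it only shows that the port is not taking new connections at that moment.

```python
import socket

# Ports taken from the post above (strategy-related traffic).
# Confirm them against your own controller's configuration.
STRATEGY_PORTS = (2001, 22001)


def port_accepts_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # The with-block closes the socket on exit, so this probe
        # does not leave connections open on the controller.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe(host: str, ports=STRATEGY_PORTS) -> dict:
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: port_accepts_tcp(host, port) for port in ports}


# Example call (hypothetical hostname, replace with your controller's address):
# print(probe("epic-controller.local"))
```

The same idea applies inside the strategy: whatever opens a connection should also close it on every path, including error paths, so handles are not left dangling.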

Almost the same thing happened to me about 2 years ago. I was cleaning up some old code during a plant stop, tried to debug the program, and got all the bells and whistles in PAC Control: abort, retry, etc. PAC Control closed itself, and when I reopened it, it would not find the controller. I powered the EPIC off and back on and everything went back to normal. I was one firmware version behind and updated immediately afterward; no more issues so far. It’s important to note that when this happened I was running entirely on UPS power (a good APC with pure sine output and lots of protections), with no incoming power from the electric company at all; the mains were off.

@Beno: there were definitely communication handles involved. Is there some way to reset the TCP stack or something so that I wouldn’t have to completely reset the controller? As you suspected, Groov Manage was still accessible.

What firmware are you on?
Make sure it is the latest 3.4.4 release, as your symptoms are similar to issues that 3.4.4 resolves.