Corrupt Numeric Table

manuel.avitia.v · September 10, 2021, 12:42pm

Hello,

We have been experiencing a random failure in which part of the process depends on a numeric value stored in a table, for example: process[0] = 1

The variable is shared between groov View and PAC Control. The user can modify its value from groov View, and PAC Control checks it in an IF statement to decide whether to run a script.

Nowhere in the PAC Control code is there an instruction that modifies this variable; the code only ever compares it.

The failure is that the variable changes randomly (away from 1), and the script that depends on the condition stops running.

I am looking for some wisdom in this forum to understand how this could be possible, and if the value could somehow be changing due to hardware/memory constraints.

Would it be safer to use a conventional variable instead of a value within a table?

Regards

Beno · September 10, 2021, 1:30pm

Can you please flesh out some hardware/network details so we can start to mull this over?

What hardware is groov View running on?
What hardware is PAC Control running on?
If they are not the same device, what sort of network are they on? Are there other PAC Controllers also talking to that groov View project?
PAC Control Pro or Basic?
How does the problem come to your attention?
Have you been able to look in the groov View logs when it happens?

manuel.avitia.v · September 10, 2021, 3:05pm

Beno,

Thanks for the quick reply. All of this is running on a GRV-EPIC processor, Serial 803393, using firmware grv-epic-pr1-ignition7-3.2.2-b.168.field.bin
They are on the same device, configured as local, and this is PAC Control Basic.

The problem comes up randomly: sometimes it takes up to 72 hours, sometimes only a few hours pass between failures. The device is installed at a remote location, and I have a technician heading that way, since the user has little knowledge of the system.

Additional data: we figured out it's not only the variable. The whole strategy is restarting, which causes the variables to revert to their initial values. I can confirm it is not a power loss: only the strategy restarts, and the system is up and running again within a few seconds without booting up again. We have several Modbus TCP comm handles that are unavailable in that particular operating mode, so we think those unavailable modules are triggering errors that eventually cause the strategy to restart. The action items at this point are to remove all unnecessary charts and modules from the strategy, look for persistent errors, and monitor closely for some hours to ensure we can continue operating reliably. I would appreciate any additional feedback on what kinds of errors would cause the strategy to restart.

Regards

manuel.avitia.v · September 10, 2021, 6:01pm

Beno,

We found that the script that runs under that condition executes a Modbus TCP request using a parameter table of size 5, where it should be a minimum of 6.

This was flooding the errors table, which maxes out at 999 errors.

60 -12 Error Invalid table index. TCP_M4 MB_Write_M4.22 1 M4_Parameters Table Numeric Integer 32 08:35:56 09/10/21

We already changed that table size and are running the code again, with no errors accumulating this time. The question now is: at what point would the strategy restart itself under this condition? I am thinking that if errors only accumulate while that code is running, then, depending on the process, the time it takes for the memory to fill up will vary. I would like to get some feedback from you on this.

Regards
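For readers following along, the table-size mismatch described above can be sketched as a simple pre-check. This is only an illustration in Python, not OptoScript or anything from the actual strategy: the function name `missing_indexes` is hypothetical, and the sizes 5 and 6 are the ones quoted in this post.

```python
# Illustrative Python stand-in, not OptoScript. The function name is
# hypothetical; the sizes come from the post (table of 5, minimum of 6).

def missing_indexes(table_length, required_length):
    """Return the zero-based indexes a command would touch that fall
    outside a table of the given length."""
    return [i for i in range(required_length) if i >= table_length]


if __name__ == "__main__":
    # A 5-element table has indexes 0..4, so a command that needs
    # 6 elements reaches for index 5, which does not exist.
    print(missing_indexes(5, 6))
    # After resizing the table to 6 elements nothing is out of range.
    print(missing_indexes(6, 6))
```

A check like this, run once when the table is set up, would flag the mismatch a single time instead of logging an invalid-index error on every pass through the script.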

varland · September 10, 2021, 6:13pm

@manuel.avitia.v: while I can’t answer questions about why your strategy may be restarting, there are a few things you may want to think about to mitigate the effect you’re seeing:

Change the variable/table to initialize only on strategy download, not on strategy run. Doing this will prevent the value from changing (presumably back to 0) when the controller restarts.
Consider making the table a persistent table. Again, doing so should maintain the values in the table across a restart.

Even if you can figure out the unexpected restarts now, you may want to do these things in case of future issues like power outages, etc.

manuel.avitia.v · September 10, 2021, 6:17pm

Varland,

Thank you for your input, I am taking action on those topics.

Regards

philip · September 10, 2021, 6:51pm

I’ve unintentionally written to out-of-range indexes many times and never had the strategy restart because of that, nor because of the event log filling up.

I recommend looking at the logs in groov Manage to see if they point to an issue.

Jakes · September 10, 2021, 7:06pm

As a safety measure, which I have always needed, I have a chart that saves all important variables to persistents, and another that writes them back to the variables on powerup, should something weird happen, which it always does…

You can also time the save or restore, with a little bit of effort from the operator if need be. Get your strategy to save variables to persistents at set times. Then, if something resets, you can have an operator-driven (or remote intervention) button to restore the values.
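The save-and-restore pattern described here can be sketched roughly as follows. This is a Python illustration only, with a JSON file standing in for persistent variables; in a real strategy this would be done with charts and persistent variables in PAC Control. All names (`save_snapshot`, `restore_snapshot`, `process_0`) are hypothetical.

```python
import json
import os
import tempfile

# Illustrative sketch: a JSON file plays the role of the "persistents".
# Function and key names are hypothetical, not taken from any strategy.

def save_snapshot(values, path):
    """Save important values. Write to a temporary file first and then
    swap it into place, so a restart in the middle of a save cannot
    leave a half-written snapshot behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(values, tmp)
    os.replace(tmp_path, path)


def restore_snapshot(path, defaults):
    """Return the saved values layered over the defaults. If there is
    no usable snapshot, fall back to the defaults alone."""
    try:
        with open(path) as f:
            saved = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(defaults)
    restored = dict(defaults)
    restored.update(saved)
    return restored
```

The save would be called at set times (or when the operator changes a value), and the restore once at startup or from an operator button, with the defaults being whatever the variables would otherwise initialize to.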

manuel.avitia.v · September 10, 2021, 7:49pm

Philip, thank you so much for the help, it's my first time looking at these logs.

I know the exact time of the last failure, which was today at 8:32:50. The software does say Signal 11. Can you point me to the documentation that covers these error messages?

regards

[2021-09-10 08:32:50] Error: Signal 11
/usr/bin/SoftPAC(_Z12FaultHandleri+0x18)[0x55de8]
/lib/libc.so.6(__default_sa_restorer+0x0)[0x76962ab0]
/usr/bin/SoftPAC(_ZN9O22Stream9GetStatusEv+0x54)[0xbedf0]
/usr/bin/SoftPAC(_ZN9O22Stream6IsOpenEv+0x20)[0xbee6c]
/usr/bin/SoftPAC(_Z14PRTStringSubCRv+0x74)[0xd1fc0]
/usr/bin/SoftPAC(_Z11f_PrtStringv+0x7c)[0xd206c]
/usr/bin/SoftPAC(_Z8ToDoCoreP7O22TaskP4CELLb+0x74)[0xe69ec]
/usr/bin/SoftPAC(_ZN13O22Subroutine4CallEv+0x270)[0xbc2a4]
/usr/bin/SoftPAC(_Z9f_CallSubv+0x18)[0x75378]
/usr/bin/SoftPAC(_Z8ToDoCoreP7O22TaskP4CELLb+0x74)[0xe69ec]
/usr/bin/SoftPAC(_Z7RunTaskP7O22Task+0x410)[0xe923c]
/usr/bin/SoftPAC(_Z9SetupTaskPv+0x7c)[0x5c1e8]
[2021-09-10 08:32:51] Error: Signal 11
/usr/bin/SoftPAC(_Z12FaultHandleri+0x18)[0x55de8]
/lib/libc.so.6(__default_sa_restorer+0x0)[0x76962ab0]
[2021-09-10 08:32:56] ================== START ===================
Set HOME to /home/pac22/
Opto 22 Control Engine
Copyright (c) 2001-2021 Opto 22
Version: R10.4e
Build Date: 06/07/2021
Build Time: 09:26:11
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (https://www.openssl.org/)
[Creating User Threads]:
…
[2021-09-10 08:32:56] [Running]: Press to quit
[2021-09-10 08:32:56] [H0]+1:Total=1
[2021-09-10 08:32:58] [H0]+1:Total=2

manuel.avitia.v · September 10, 2021, 8:04pm

Searching further, I found this other part of the log, in the PM2 logs:

2021-09-09 19:45:10: App name:soft-pac id:3 online
2021-09-10 08:32:51: App [soft-pac] with id [3] and pid [8676], exited with code [1] via signal [SIGINT]
2021-09-10 08:32:56: Starting execution sequence in -fork mode- for app name:soft-pac id:3
2021-09-10 08:32:56: App name:soft-pac id:3 online
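For anyone who wants to check how often this happens, here is a minimal Python sketch that pulls the exit events out of PM2 log lines like the ones above. The regular expression is an assumption based only on the four lines quoted in this post, so it may need adjusting for other PM2 versions; the names `EXIT_RE` and `find_exits` are hypothetical.

```python
import re

# Pattern inferred from the log lines quoted in this thread only;
# treat it as an assumption, not a documented PM2 format.
EXIT_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"App \[(?P<app>[^\]]+)\] with id \[(?P<id>\d+)\] "
    r"and pid \[(?P<pid>\d+)\], "
    r"exited with code \[(?P<code>\d+)\] via signal \[(?P<signal>\w+)\]"
)


def find_exits(lines):
    """Return one dict per 'exited with code' line, skipping all others."""
    exits = []
    for line in lines:
        match = EXIT_RE.match(line.strip())
        if match:
            exits.append(match.groupdict())
    return exits


# The four lines from the post, used as sample input.
PM2_LINES = [
    "2021-09-09 19:45:10: App name:soft-pac id:3 online",
    "2021-09-10 08:32:51: App [soft-pac] with id [3] and pid [8676], "
    "exited with code [1] via signal [SIGINT]",
    "2021-09-10 08:32:56: Starting execution sequence in -fork mode- "
    "for app name:soft-pac id:3",
    "2021-09-10 08:32:56: App name:soft-pac id:3 online",
]

if __name__ == "__main__":
    for event in find_exits(PM2_LINES):
        print(event["time"], event["app"], event["signal"])
```

Running the same function over the full PM2 log would give a list of restart times to line up against the moments the operator saw the variable reset.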

philip · September 10, 2021, 8:09pm

Signal 11 is a segmentation fault, which means that SoftPAC tried to access system memory that didn’t belong to it. This is likely a bug somewhere in SoftPAC, since there is nothing that we should be able to do in PAC Control to cause that to happen.

The best step is to contact Opto PSG and send your logs (and probably your strategy) over to them to see if they can track it down.
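As a quick way to look up signal numbers like this one, the sketch below scans log lines for the "Error: Signal N" pattern seen in the SoftPAC log earlier in the thread and names each signal using Python's standard `signal` module. The line format is assumed from the two occurrences quoted above, the function name is hypothetical, and the number-to-name mapping is the one of the machine running the script (on Linux, 11 is SIGSEGV).

```python
import re
import signal

# Format assumed from the "[timestamp] Error: Signal 11" lines quoted
# earlier in this thread; the function name is hypothetical.
SIGNAL_RE = re.compile(r"^\[(?P<time>[^\]]+)\] Error: Signal (?P<num>\d+)$")


def describe_signal_errors(lines):
    """Return (timestamp, signal number, signal name) for each matching
    line. Stack-trace lines and anything else are ignored."""
    found = []
    for line in lines:
        match = SIGNAL_RE.match(line.strip())
        if not match:
            continue
        number = int(match.group("num"))
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = "unknown"
        found.append((match.group("time"), number, name))
    return found


if __name__ == "__main__":
    sample = [
        "[2021-09-10 08:32:50] Error: Signal 11",
        "/usr/bin/SoftPAC(_Z12FaultHandleri+0x18)[0x55de8]",
    ]
    for time_stamp, number, name in describe_signal_errors(sample):
        print(time_stamp, number, name)
```

A list like this is handy to attach alongside the raw logs when sending them to support, since it shows at a glance when each fault occurred.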

manuel.avitia.v · September 10, 2021, 8:22pm

Thanks for the help, already on it

regards