RIO Ignition mmpServer Modbus won't connect after a while

I’m having some reliability issues with reading and writing the memory map with Modbus on the RIO. It works fine for a while, but eventually stops communicating and stops taking new connections. I’m actively working on a project with this RIO using Ignition / node-red / OptoMMP streaming / node-red MQTT / modbus so I may have some self-inflicted issues, but I seem to be hitting some kind of limit in the mmpServer.

I’m using Ignition to talk to the RIO using Ignition’s Modbus driver. This is a connection that occasionally will have packet loss (think vpn over cellular).

I have noticed that the Ignition Modbus driver is very aggressive about reconnecting after a communications failure, and that has caused some devices to hang / refuse to accept new connections, so I am wondering if the RIO has the same issue. Restarting the RIO (or disabling the device in Ignition for a long while and re-enabling it) is the only way I've been able to get it going again.

Here is a netstat output when the failure happens:

netstat | grep 8502

tcp        0      0      ESTABLISHED
tcp        0      1 localhost.localdo:60296 localhost.localdom:8502 SYN_SENT
tcp        0      0      ESTABLISHED
tcp        0      0      ESTABLISHED
tcp        0      0      ESTABLISHED
tcp        0      0      ESTABLISHED
tcp        0      0      ESTABLISHED
tcp        0      0      ESTABLISHED
tcp        0      0      ESTABLISHED

You can see that there are several connections - these are broken/orphaned connections from an Ignition gateway. The localhost connection is me attempting to communicate from a Modbus node in node-red. It fails to connect as well.

I get this in the Operating System log on the RIO occasionally, which may be related:

2021-04-17T20:10:38.653019+00:00 opto-04-b4-3b kernel: [9138597.259972] TCP: request_sock_TCP: Possible SYN flooding on port 8502. Dropping request. Check SNMP counters.

The OptoMMPServer log occasionally has MB_Wrk messages; I'm not sure what they are.

[2021-04-18 02:31:36] GRV-R7-MM1001-10
[2021-04-18 02:31:36] Copyright (c) 2017-2020 Opto 22
[2021-04-18 02:31:36] Version: R3.0a
[2021-04-18 02:31:36] Build Date: 11/25/2020
[2021-04-18 02:31:36] Build Time: 21:37:08
[2021-04-18 02:31:36] Default scheduling policy is normal (non real-time).
[2021-04-18 02:31:36] [Creating Threads ...]
[2021-04-18 02:31:36] Using BootId=47
[2021-04-18 02:31:37] Reopen USB connection.
[2021-04-18 02:31:37] Failed to open communication with /dev/usb_rio.
[2021-04-18 02:31:38] Reopen USB connection.
[2021-04-18 02:31:38] Delete module at slot 0 of type 0.
[2021-04-18 02:31:38] New module at slot 0 of type f0000022 with 10 channels.
[2021-04-18 02:31:38] Modbus/TCP Server threads down-grade to normal scheduling policy.
[2021-04-18 02:31:38] MMP Server threads down-grade to normal scheduling policy.
[2021-04-18 02:31:38] O22Timer threads down-grade scheduling priority to 1.
[2021-04-18 02:31:38] [Running ...]
[2021-04-18 02:32:28] Power Up Clear Received.
[2021-04-18 02:32:28] Power Up Clear Received.
[2021-04-18 02:34:53] Manual Store to Flash.
[2021-04-18 02:36:38] MB_Wrk00 Status: 0
[2021-04-18 02:36:38] MB_Wrk00 Rx error: 0
[2021-04-18 02:36:38] MB_Wrk00 Header packet len: 1536
[2021-04-18 02:37:22] Manual Store to Flash.
[2021-04-18 02:37:57] MB_Wrk01 Status: 0
[2021-04-18 02:37:57] MB_Wrk01 Rx error: 0
[2021-04-18 02:37:57] MB_Wrk01 Header packet len: 2816
[2021-04-18 02:37:57] MB_Wrk00 Status: 0
[2021-04-18 02:37:57] MB_Wrk00 Rx error: 0
[2021-04-18 02:37:57] MB_Wrk00 Header packet len: 1536
[2021-04-18 02:37:58] Manual Store to Flash.
[2021-04-18 02:38:09] MB_Wrk01 Status: 0
[2021-04-18 02:38:09] MB_Wrk01 Rx error: 0
[2021-04-18 02:38:09] MB_Wrk01 Header packet len: 2816

Just FYI.
Been working on this one for the last three days… Thought I would have something to say by now, but it's a verrrrrrrrry long email thread with the engineers…

Understood, and thanks for taking this on. This can’t be an easy one to figure out.

I’ve created a workaround by proxying the connection through Node-RED - so Node-RED is listening on the port that Ignition connects to, and then opens a new connection to the mmpServer Modbus port. This insulates the mmpServer from the connection burden that Ignition puts on it after a connection failure. So far Node-RED has been handling it just fine. My original thinking was that I could monitor the traffic for troubleshooting with debug nodes, and that it is far faster to restart a Node-RED flow if there is an issue than to reboot the entire RIO, but it also seems to be more reliable this way - who knew.
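For anyone wanting to try the same trick without Node-RED, the proxy idea boils down to a plain TCP relay. Here is a minimal Python sketch of it (the port numbers are assumptions for illustration - 8502 is the RIO's Modbus port from the netstat output above, and the listen port is arbitrary; this is not Opto 22 code, just the shape of the workaround):

```python
import socket
import threading

LISTEN_PORT = 15321           # arbitrary port for Ignition to connect to (assumption)
TARGET_HOST = "127.0.0.1"     # assumed address of the RIO's mmpServer
TARGET_PORT = 8502            # Modbus port from the netstat output above

def pump(src, dst):
    """Copy bytes in one direction until either side closes."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        # Tear down both ends together, so an aggressively reconnecting
        # client never leaves the upstream connection half-open.
        for s in (src, dst):
            try:
                s.close()
            except OSError:
                pass

def handle(client):
    """Open a fresh upstream connection for each accepted client."""
    upstream = socket.create_connection((TARGET_HOST, TARGET_PORT))
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pump, args=(upstream, client), daemon=True).start()

def serve():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", LISTEN_PORT))
    srv.listen(8)  # mirror the mmpServer's 8-connection limit
    while True:
        client, _ = srv.accept()
        handle(client)
```

The key point is in `pump`: when the client side drops, the upstream side is closed too, so the proxy absorbs the churn instead of the mmpServer accumulating orphaned connections.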

Ultimately, I would prefer to go direct to the mmpServer, as what I am doing is a bit “how ya doin”.

I see a number of Modbus and TCP fixes in the 3.2.1 firmware, are any of them intended to address this issue?

All the fixes have KBs, so you can probably look quicker than I can, since you know what you're looking for?

Edit: Sorry, that was a rare lazy answer. Let me dig into it and get back to you.

Yeah, I read them, but they are sparse on the details. I’m not going to run out and update the firmware right away on anything so take your time. :wink:


Short answer: Yes, there is a very good chance.

Long answer: Hard to say exactly which one might address this exact issue, since they all might be part of both the issue and the fix.

If I can oversimplify your original post:

KB89636 Addresses TCP packet length handling on ModbusTCP connections. Communication issues may have caused the ModbusTCP server on the EPIC/RIO to think it had enough data when it did not.

KB89670 Addresses the way stale ModbusTCP connections were closed. The ModbusTCP server allows 8 connections. That’s either 1 device on a crappy connection, or 8 devices on a solid connection.
The point is, we were a little generous about leaving connections hanging before closing them and making them available again.

KB89676 Addresses the fact that the ModbusTCP server may have returned an exception code that did not make sense (match up) with the request it got. In other words, Modbus/TCP function codes now check for a proper header data length field for each function code. If the header data length field is not valid, or a data field that also gives a length (like function code 16) does not correlate to the header data length field, then exception code 3 is returned. This implies the request data field structure is incorrect.
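To make that third one concrete: a Modbus/TCP request starts with a 7-byte MBAP header whose length field counts the unit ID plus the PDU that follows, and for function code 16 (Write Multiple Registers) the PDU carries its own byte count that must also line up. A rough sketch of that consistency check, per the Modbus spec - this is an illustration of the described fix, not Opto 22's actual code:

```python
import struct

EXC_ILLEGAL_DATA_VALUE = 3  # exception code returned when the lengths disagree

def validate_request(frame: bytes):
    """Return None if the frame is self-consistent, else the exception code.

    MBAP header: transaction id (2), protocol id (2), length (2), unit id (1).
    The length field covers the unit id plus the PDU that follows it.
    """
    if len(frame) < 8:
        return EXC_ILLEGAL_DATA_VALUE
    _tid, _pid, length, _uid = struct.unpack(">HHHB", frame[:7])
    pdu = frame[7:]
    # Header length field must match the bytes actually present.
    if length != 1 + len(pdu):
        return EXC_ILLEGAL_DATA_VALUE
    func = pdu[0]
    if func == 16:  # Write Multiple Registers repeats a length in the PDU
        if len(pdu) < 6:
            return EXC_ILLEGAL_DATA_VALUE
        qty = struct.unpack(">H", pdu[3:5])[0]
        byte_count = pdu[5]
        # Byte count must be 2 bytes per register and match the data present.
        if byte_count != 2 * qty or len(pdu) != 6 + byte_count:
            return EXC_ILLEGAL_DATA_VALUE
    return None
```

A truncated frame like the one the partial packets on a lossy link would produce fails the header-length check and gets exception code 3 back, rather than the server waiting on data that will never arrive.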

RIO firmware 3.2.1 with these fixes was released on the 17th of June (about a week after the EPIC update came out).


Thanks Ben, those are much better descriptions than what ends up in the knowledge base articles, which typically amount to “something about something has been fixed”.