Controller stopped connecting to the FTP server

aus.eng · May 6, 2022, 1:07pm

I am currently using 3 different communication handles to connect to the same FTP server at different points in my strategy. However, I suddenly started getting error -9 on one communication handle (“Timeout, No response from device. Check hardware connection, address, power, and jumpers.”), and on another I started receiving this when I am in Debug mode in PAC Control and try to open a connection:

My questions are:

1. If all communication handles are closed, is it possible that one of them is keeping the connection open without reporting status?
2. Is it good practice to have different communication handles connect to the same FTP server at different points (no two try to connect at the same time)?
3. Is there any way to interpret the message “Last command sent: *Logging_handle COMVAR@P”?
4. The CPU usage jumps from 58% to 87% when it is trying to connect to the FTP server. Could that hinder proper operation of the controller?

Beno · May 6, 2022, 1:26pm

1. Yes.
Things are slightly more complicated than just the hard number of comm handles. TCP has a TTL (Time To Live) aspect to the packets/socket, so there is some ebb and flow, some give and take, in the time between closing the handle (the connection to the FTP server) and the time the controller and server are really done talking with each other. The depth of the TCP stack also comes into play. So stack depth and TTL make it hard to know for sure when things really are done and dusted.
2. No.
You really should have only one comm handle and one chart using that comm handle. The other two charts should pass what they need to that chart rather than doing their own thing, for many reasons; also see 1.
Tip: that one chart should not be opening and closing the connection to the FTP server, for many reasons; also see 1.
3. Sort of.
It's a Forth word thing. I can try and dig into it, but it would not be worth the effort. The point is the timeout, not the command that was not sent.
4. Mostly no.
I assume you are on EPIC. The real-time extensions in the Linux kernel try to keep things like this CPU spike under control. All bets go out the window if the CPU hits and stays at 100% (something has to give at times like that). The other fun thing is there is a lag between what the CPU is really doing and what you see it doing.
The real question is why there is a CPU spike when simply opening a TCP connection to a remote server. I bet it's not the comm handle doing that, but the way you are dumping the log data. Or, back to 1 and 2, there really is more going on when opening many comm handles to the one FTP server.
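
For readers who want to see the "one chart owns the comm handle" idea in code, here is a minimal sketch in Python rather than PAC Control. Everything in it is an assumption made for illustration: `FakeFtp`, `ftp_worker`, and `run_demo` are hypothetical names, and the fake connection just records uploads instead of talking to a real server. The point is only the shape: a single worker touches the connection, and everyone else hands it work through a queue.

```python
import queue
import threading


class FakeFtp:
    """Hypothetical stand-in for a real FTP connection (records uploads only)."""

    def __init__(self):
        self.uploads = []

    def upload(self, name, data):
        self.uploads.append((name, data))


def ftp_worker(requests, conn):
    """The only code that touches the connection, like the single owning chart."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more work, shut down
            break
        name, data = item
        conn.upload(name, data)


def run_demo():
    requests = queue.Queue()
    conn = FakeFtp()
    worker = threading.Thread(target=ftp_worker, args=(requests, conn))
    worker.start()

    # Two other "charts" pass what they need to the owner instead of connecting.
    producers = [
        threading.Thread(target=requests.put, args=((f"chart{i}.csv", b"data"),))
        for i in (1, 2)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()

    requests.put(None)  # tell the worker to stop once the queue is drained
    worker.join()
    return sorted(name for name, _ in conn.uploads)
```

Because only the worker ever uses the connection, there is never more than one session open to the server, no matter how many producers there are.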

aus.eng · May 6, 2022, 3:17pm

Thank you. Would you mind elaborating more on that tip? Am I correct to understand that the connection should be opened, stay open, and then be closed when done? Does the same apply when communicating with the internal storage (sending and receiving tables)?

Beno · May 6, 2022, 3:34pm

As per the manual.

So yeah, you really should have only the one chart open it, keep it open, handle the other charts' data requests to the FTP server, keep the connection alive (i.e., ‘ping’), and test the connection now and then to ensure it's open and all good. It should only be closed if you know you are not going to write anything for another x minutes. x depends on a lot of things, but I’d say over 15 minutes.
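
As a rough illustration of "keep it open, test it now and then", here is a small Python sketch. It assumes the ‘ping’ can be done with the standard FTP `NOOP` command, sent here through `ftplib`'s `voidcmd`; the class name `KeptAliveFtp` and the `connect` callable are hypothetical, not anything from PAC Control.

```python
import ftplib


class KeptAliveFtp:
    """Hold one FTP session open and only reconnect when a NOOP check fails."""

    def __init__(self, connect):
        # `connect` is any callable returning a logged-in, ftplib.FTP-like object,
        # e.g. lambda: ftplib.FTP(host="ftp.example.com", user="u", passwd="p", timeout=10)
        # (host and credentials above are placeholders).
        self._connect = connect
        self._ftp = None
        self.reconnects = 0

    def ensure_open(self):
        """Return a live session, testing the existing one with NOOP first."""
        if self._ftp is not None:
            try:
                self._ftp.voidcmd("NOOP")  # cheap "are you still there?" check
                return self._ftp
            except ftplib.all_errors:
                self._ftp = None  # session is dead; fall through and reconnect
        self._ftp = self._connect()
        self.reconnects += 1
        return self._ftp
```

Calling `ensure_open()` before each transfer, and on a timer while idle, keeps a single session in use and makes reconnects the exception rather than the rule.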

@philip has some real world experience here. Most of my FTP server work was in the same room (different PC, but on the same switch).

Local will depend on how you're writing the data. RESTfully? Node-RED vs PAC Control, etc.

philip · May 6, 2022, 4:28pm

Leaving the connection open is only for TCP comm handles, for the reasons Ben has mentioned. Usually, if you are closing the connection properly, you will be able to reconnect again shortly. But if the connection is getting dropped (for example, by stopping the strategy or chart without the connection being closed), then the other side of the TCP connection (the FTP server) will not know it. It can take tens of minutes for the TCP stack to figure this out.
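
On a general-purpose OS, one common way to shorten that detection time is TCP keepalive, which has to be enabled on whichever end needs to notice the dead peer. The Python sketch below is only an illustration under assumptions: `enable_keepalive` is a hypothetical helper, the timing values are arbitrary, and the three `TCP_KEEP*` options are platform-specific, so they are set only where the `socket` module exposes them. Whether a given controller or FTP server lets you tune this is a separate question.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes so a dead peer is noticed sooner."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-tuning options are not available on every platform, so guard each one.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the socket is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With the example values, a silently dropped connection would be detected after roughly `idle + interval * count` seconds on platforms that honor all three options.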

Also, many (if not all) FTP servers limit how many connections can be made from a single client/IP, so they are more strict than the TCP stack on the device they are on.

I ran into this very issue with the Modbus driver in Ignition. It is very aggressive about reconnecting when a connection gets dropped: it attempts to reconnect every second. Some of the Modbus devices connected to it are on links that are not always reliable, with a lot of dropped packets and lost connections. Many Modbus TCP devices don’t have a lot of resources available and can only have a few simultaneous connections. In this case the Modbus devices were unaware that some of the connections were no longer alive (google TCP dead peer) and would refuse new connections.
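
A gentler alternative to retrying every second is to back off between attempts. The following Python sketch shows one way that could look; `backoff_delays` and `connect_with_backoff` are hypothetical names, and the base, factor, cap, and attempt count are arbitrary example values, not recommendations for any particular device.

```python
def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6):
    """Yield wait times that grow geometrically and never exceed `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def connect_with_backoff(connect, sleep, **kwargs):
    """Call `connect()` until it succeeds, sleeping longer after each failure.

    `connect` is any callable that raises OSError on failure; `sleep` is passed
    in (e.g. time.sleep) so the waiting behavior is easy to swap or test.
    """
    last_error = None
    for delay in backoff_delays(**kwargs):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            sleep(delay)
    raise ConnectionError("gave up reconnecting") from last_error
```

Spacing the retries out gives a resource-limited device time to notice and release dead connections before the next attempt arrives.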

philip · May 6, 2022, 4:39pm

This makes me think you are opening and closing the connections quite rapidly to the point where your computer can no longer connect to the controller because the controller is out of resources.

aus.eng · May 6, 2022, 5:54pm

Thank you for looking into it. I will go ahead and modify it with a delay to avoid this rapid opening and closing of the connection.
I tested again after we restarted the FTP server, and it was finally able to connect and transmit a table.

Thanks.