TCP Communication Problems on Upgrade

Communication Problems on Firmware / Software Upgrade.
Looking to see if anyone else has run into this problem.

We recently upgraded a PAC-S2 from 9.1f to 10.4 (both Firmware and PACProject Pro). It uses the DNP Integration Kit over TCP connections as a DNP Outstation. The problem is with TCP connectivity (listening for and receiving requests on TCP ports) rather than with the bulk of the DNP command / response configuration.

When operating on 9.1f we had no issues receiving requests and passing data to the DNP Master.

When we upgraded to 10.4 we no longer had communications, and both “Listen Ports” (com handles tcp:20000 and tcp:21000) showed an error of -442 (Could not accept on socket - no devices are currently attempting to connect to this port).
PACControl logic and PACManager configuration remained the same and logic was loaded with PACControl 10.4 PRO (last version).

There are no changes to anything else in the system, except that the DNP Master could NOT connect to the updated PAC-S2. Network personnel did confirm that they could see the PAC-S2 Ethernet Port (using Ethernet 2 with a Gateway as part of the PACManager configuration).

Yes, we did restart the communication link / chart to try to reconnect to the Master with no success.

The error appears to come from the ACCEPT INCOMING COMMUNICATIONS command. It’s a legit error (no incoming connections) and we can recreate the same error code on a similar setup we have in house (without the DNP Master communications). It does appear that “something” is blocking connectivity when 10.4 is loaded. Note that testing did allow connectivity (via PuTTY or FileZilla) on similar setups, but these approaches do not recreate the DNP Master requests.

I had the site reinstall 9.1f firmware and load the (backed up) PACControl configuration via PACControl 9.2 PRO.

Once up and running, communication was re-established (the error code was gone and the configuration was receiving requests / sending responses). Again, nothing was changed other than the Firmware and the PACControl version used to load the configuration.

Tech Support was a great help, but in the end we still do not have a definitive answer to WHY we have communications (a TCP port set up to listen for commands) with 9.1f (before and after our upgrade) but cannot connect when loaded with 10.4.

Any ideas, similar experiences, solutions, etc. would be greatly appreciated. I’ll be happy to find out we did something stupid but at this point it seems like the firmware upgrade caused this issue (and older firmware fixed this issue).

An FYI on TCP communication issues during a firmware upgrade.
It was discovered that there is an issue with accepting communication on an open com handle.
This issue is due to a difference between older and newer firmware.

The solution was straightforward (after the problem was identified):
Check that communication is open (IsCommunicationOpen)

  • If not, then issue the AcceptIncomingCommunication command
  • If Open, then check for characters at port (GetNumCharsWaiting)

On the newer firmware, the Accept Incoming Communication command does not report that the port is already open (-47) but instead indicates that it cannot accept on the socket (-442). Previous firmware would return -47 when you issued Accept Incoming Communication on an open handle, noting that it was open and no issues existed.

My thanks to tech support for helping to identify the problem and for providing a workaround.

Happy you got it working. I’m curious about the explanation though. I think it may not be an issue with the commands, but rather how the DNP3 charts are using them.

Can you post what they had you change to get it working?

-442 is the normal response for Accept Incoming Communication. I have some TCP server logic that allows multiple clients to connect and I use that status in my chart. It is used in 9.x versions, and I tested it on 10.x and it still works the same. If you were getting -442, that simply means no clients were trying to connect (at least as far as the strategy is concerned).

-47 is telling you the comm handle is already open, which basically means Accept Incoming Communication was called on a comm handle that is already in use. If the comm handles were being managed well by the logic, you shouldn’t get that. In the DNP3 chart, it is a side effect of how the chart is written. I can assure you that the chart is returning both -47 and -442 depending on whether there is a client connected to the comm handle. These are the likely responses:

-47: someone is already using the comm handle, move along
-442: this comm handle has waited for 10 seconds for a client to connect and nobody did, just letting you know, call me again to keep waiting
0: We got a new client connected, let’s listen to them
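
As a rough illustration of the three outcomes listed above, here is a minimal Python sketch (not OptoScript) that maps an Accept Incoming Communication status to the next step. The function name and the action labels are hypothetical; only the status values -47, -442 and 0 come from this thread.

```python
# Status values quoted in this thread for Accept Incoming Communication.
ALREADY_OPEN = -47     # comm handle is already in use
NO_CLIENT_YET = -442   # accept wait expired, nobody connected
CONNECTED = 0          # a new client connected


def next_action(status: int) -> str:
    """Return a hypothetical action label for a given accept status."""
    if status == CONNECTED:
        return "serve_client"    # start listening to the new client
    if status == ALREADY_OPEN:
        return "skip_handle"     # someone is already using this handle
    if status == NO_CLIENT_YET:
        return "accept_again"    # keep waiting for a client
    return "handle_error"        # any other code is outside this sketch


for code in (0, -47, -442):
    print(code, "->", next_action(code))
```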

I looked at the DNP3 chart and it handles Accept Incoming Communication in a way that is strange to me (looking in the Outstation protocol chart). It is a complicated chart, so I could be totally wrong on this, but it looks like this protocol is set up so a client connects, gets what it needs and disconnects. If the clients keep the connection open, then I suspect they would get frequent timeouts while the chart is sitting waiting for Accept Incoming Communication. Maybe that works for this specific protocol; however, I use the pattern below for a server in a PAC Control strategy.

PAC Strategy TCP server pattern
For those who are interested in this. Let me know if you see any issues.

Accept Incoming Communication is a blocking command: it waits 10 seconds for a client to connect and, if none does, returns -442. The response to a -442 status is to call Accept Incoming Communication and wait again. This makes it unsuitable to be called in the same chart as your client processing logic. Therefore, this command should be in a server listener chart and the client processing should be handled in a separate chart. If Accept Incoming Communication returns 0, we have a client; if it returns anything else, there is an issue and we go back and call Listen for Incoming Communication again on that comm handle. Your comm handle pool will need as many comm handles as you expect clients, plus logic to handle pool exhaustion (delay until a comm handle becomes available).

When a client connects, the server listener chart will hand the client off to another chart to handle the client(s), then rotate to the next available comm handle in your pool and call Accept Incoming Communication and wait for the next client. The client chart would return the comm handle to the available pool when the client disconnects.

This allows the strategy to serve multiple clients simultaneously.
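
The listener / pool hand-off described above can be sketched as a small Python simulation (not OptoScript, and not the actual strategy). The handle names, function names and the scripted status list are all hypothetical. Treating -442 as "accept again" and any other non-zero status as "listen again" is one reading of the description above.

```python
from collections import deque

NO_CLIENT_YET = -442


def run_listener(pool, statuses, max_cycles):
    """Simulate a listener chart working through a pool of free comm handles.

    pool       -- deque of free handle names (hypothetical)
    statuses   -- scripted accept results, one per accept call
    max_cycles -- number of listener passes to simulate
    """
    statuses = iter(statuses)
    events, in_service = [], []
    for _ in range(max_cycles):
        if not pool:
            events.append("pool exhausted: delay")
            continue
        handle = pool[0]
        status = next(statuses, NO_CLIENT_YET)
        if status == 0:
            # Client connected: hand the handle to a client chart and
            # rotate to the next free handle.
            in_service.append(pool.popleft())
            events.append(f"{handle}: client handed off")
        elif status == NO_CLIENT_YET:
            events.append(f"{handle}: no client, accept again")
        else:
            events.append(f"{handle}: status {status}, listen again")
    return events, in_service


def release(pool, in_service, handle):
    """Client chart returns a handle to the pool after the client disconnects."""
    in_service.remove(handle)
    pool.append(handle)


pool = deque(["h1", "h2"])
events, in_service = run_listener(pool, [-442, 0, 0], max_cycles=4)
for line in events:
    print(line)
release(pool, in_service, "h1")
```

With two handles and two clients, the fourth pass finds the pool empty and delays until a client chart calls release.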

See HTTP Interfacing and ports - #9 by philip for which order the commands should be called in for a TCP server.


The problem we had with the DNP chart is a combination of an issue with the old firmware and a glitch in the current firmware (which could affect older setups):

Current Firmware Issue: Attempting to do an ACCEPT on an open COM handle should return a -47 (Already Open) error, but a bug exists that returns a -442 and closes the COM handle.

I agree the DNP Outstation chart is a complicated chart which is set up to interface with multiple masters / clients. The approach taken on this system is different from other interfaces we’ve set up (which were much more straightforward by comparison), but given that it juggles multiple ports and multiple masters, I can see why it was set up the way it was.

Since the protocol is looking at multiple Ports and Masters a couple of items had to be modified when going from old to newer firmware:

  • The current firmware issue, as noted above.
  • An older firmware issue: it did not wait for the default timeout (10 sec for TCP) when accepting incoming communications. It would check whether someone was connected and continue without waiting. The newer firmware waits the full 10 sec unless you specify a different time to wait for incoming communication.

ORIGINAL (block 543 in DNP3_Protocol chart)

  if (ntNL_Comm_Handle_Accept_Status[nNL_Master_Index_1] == 0 or ntNL_Comm_Handle_Accept_Status[nNL_Master_Index_1] == -47) then//Check Session Status
        ntNL_Comm_Handle_Accept_Status[nNL_Master_Index_1] = GetNumCharsWaiting(*pochDNP_Master_Listen_Port);//Check For CHR AT Port
  else      
        ntNL_Comm_Handle_Accept_Status[nNL_Master_Index_1] = AcceptIncomingCommunication(*pochDNP_Master_Listen_Port);//Accept Incoming
  endif//Check Session Status

MODIFIED AS FOLLOWS:

  SendCommunicationHandleCommand(*pochDNP_Master_Listen_Port,"set.to.open.i:.01");//Shorten Accept Wait
  if (IsCommunicationOpen(*pochDNP_Master_Listen_Port)) then//Check Session Status
        ntNL_Comm_Handle_Accept_Status[nNL_Master_Index_1] = GetNumCharsWaiting(*pochDNP_Master_Listen_Port);//Check For CHR AT Port
  else
        ntNL_Comm_Handle_Accept_Status[nNL_Master_Index_1] = AcceptIncomingCommunication(*pochDNP_Master_Listen_Port);//Accept Incoming
  endif//Check Session Status

The SendCommunicationHandleCommand is used to set the Accept Incoming Communications wait time to 10 msec (otherwise it would default to 10 sec on the current firmware). For most setups the default wait doesn’t hurt, but on a system that is monitoring multiple ports and masters (within the same routine via pointers) a long wait would cause timeouts if more than one client is trying to connect.
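
To put rough numbers on why the wait matters, here is a back-of-the-envelope Python sketch. It assumes the ports are polled one after another in the same routine and that every idle port blocks for the full accept wait; the function name is hypothetical, and the two-port count simply mirrors the two listen ports mentioned earlier in the thread.

```python
def worst_case_scan_seconds(idle_ports: int, accept_wait_seconds: float) -> float:
    """Time one pass spends blocked if every idle port waits the full accept wait."""
    return idle_ports * accept_wait_seconds


default_wait = worst_case_scan_seconds(2, 10.0)  # two listen ports, 10 sec default
short_wait = worst_case_scan_seconds(2, 0.01)    # two listen ports, 10 msec setting

print(f"default: {default_wait:.2f} s per pass, shortened: {short_wait:.2f} s per pass")
```

Under those assumptions a connected master could wait up to 20 seconds between checks of its own port with the default, which is in line with the timeouts described above.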

The original setup was looking for -47 (Com Handle Open), but the current firmware issue (which didn’t exist on the older firmware) sends back a -442 and closes the com handle when the AcceptIncomingCommunication command is issued on an open handle. On the next cycle of the logic we would see -442, not -47, and repeat the cycle, so the logic never got back to the GetNumCharsWaiting check.

The fix regarding the return status error code is checking if the port is open prior to issuing the AcceptIncomingCommunication. This check prevents the current problem (return of -442) and subsequent closing of the port.
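
The difference between the two versions of the block can be illustrated with a toy Python simulation (not OptoScript). Everything here is a simplified model built only from the behaviour described in this thread: the class, the "old" / "new" labels, the single connection attempt by the master and the 12 waiting characters are assumptions for illustration, not firmware internals.

```python
class SimHandle:
    """Toy model of a listen comm handle under the described firmware behaviour."""

    def __init__(self, firmware):
        self.firmware = firmware      # "old" or "new" (labels for this sketch)
        self.is_open = False
        self.client_pending = True    # assume the master tries to connect once
        self.chars_waiting = 12       # arbitrary number of waiting characters

    def accept(self):
        if self.is_open:
            if self.firmware == "old":
                return -47            # already open, session left alone
            self.is_open = False      # reported behaviour: handle gets closed
            return -442
        if self.client_pending:
            self.client_pending = False
            self.is_open = True
            return 0
        return -442                   # nobody is trying to connect

    def num_chars_waiting(self):
        return self.chars_waiting


def original_logic(handle, status):
    # Decide from the previous status value, as in the ORIGINAL block.
    if status in (0, -47):
        return handle.num_chars_waiting()
    return handle.accept()


def modified_logic(handle, status):
    # Decide from the handle state, as in the MODIFIED block.
    if handle.is_open:
        return handle.num_chars_waiting()
    return handle.accept()


def run(logic, firmware, cycles=6):
    handle, status, trace = SimHandle(firmware), -442, []
    for _ in range(cycles):
        status = logic(handle, status)
        trace.append(status)
    return trace, handle.is_open


for name, logic, fw in (("original", original_logic, "old"),
                        ("original", original_logic, "new"),
                        ("modified", modified_logic, "new")):
    print(name, fw, run(logic, fw))
```

In this model the original logic keeps the session alive on the "old" behaviour (alternating between -47 and the character count), gets stuck at -442 with the handle closed on the "new" behaviour, and the modified logic never calls accept on an open handle, so the session stays open.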

The old logic wasn’t “wrong”, it just worked (older firmware) and didn’t work (newer firmware) due to firmware issues.

Having said that I think a good practice is to always check if Communication is Open (as you had done in your example) prior to reissuing the command and/or checking for characters waiting.

Hope this helps.


Thanks for the explanation - I’m sure it will help future users of the DNP3 chart.

Ahh - that is an interesting change - I can see how that would cause issues in the DNP3 chart and it makes more sense how the chart is written now.

One would not be affected by this change if they are not calling AcceptIncomingCommunication on an open comm handle - which I think is a questionable practice - much better to check if it is open first, or keep track - looks like a good improvement to the logic with that.