.NET SDK Timeout

Hello, all. This is a really technical question but I’m hoping someone in the know can give me some insight.

I’m using the OptoMMP3 dll, version 3.0.0.0, to query an R2 about once a second. When it does this, it retrieves a block of data from the scratchpad. Any communications errors are logged to a text file.

This works most of the time, but on some days communication fails. Looking back at the log, I see two types of errors.

The first is a simple timeout, which makes sense because the R2 is turned off during this time. The error message returned by the API is “BlockReadRequest; Timeout, no data was received. Either the remote host didn’t receive the request or a packet was dropped.”

The second error occurs when the R2 is on and should be communicating. The error message returned by the API is “An communication error occurred (BlockReadRequest). Check port status, may have automatically closed.::An existing connection was forcibly closed by the remote host”.

The R2 is not connected directly to the computer. It’s going back to a Cisco switch. This is at a customer’s location so I can’t say how the switch is configured. Recently, we connected the computer directly to the R2 and the errors seem to have been resolved.

To me, it seems the Cisco switch is interfering with comm, but I don’t know how. Can someone explain what might be happening and what that second error actually means?

Thanks.

Are you sure there was only a switch between the computer and the R2? It sounds a bit like a router’s NAT state table was getting cleared. Either way, your program should be able to recover from this gracefully: have the computer close and reopen the connection whenever it sees a closed-connection failure.

There is more. I’ll explain.

The R2 is mounted in a cabinet. It goes to an unmanaged switch and from there to a router. The router does have a NAT table. The WAN side of the router is connected to the facility network. From there it goes to the Cisco switch. I can’t say if there’s anything else between the Cisco and the PC or if there’s anything before the Cisco. Since being made aware of the issue, the customer has connected the WAN on the router to another unmanaged switch then connected his computer and the Cisco to this switch. Now he has access to the network but also has a direct path to the cabinet. So far, the errors seem to have stopped. At least I haven’t seen any forcible disconnects since he’s done this. That’s why I’m assuming it’s the Cisco or something else on their network but maybe it’s a combination of things.

Using the OptoMMP class in the API, I connect using UDP then use the method ScratchpadFloatRead to read the scratchpad. It’s not the only thing I read from the PLC but it is the first thing I read and this is where it’s throwing the exception.

Some days it will communicate just fine for a few hours, then error out as I’ve described for a few hours then come back. It’s very strange. At least to me.

Another odd thing is that they have two cabinets, but this seems to be happening on only one of them. I put routers in the cabinets because I wanted the R2s to have the same address. Also in each cabinet is an EB2 and a PC which acts as the HMI, but there shouldn’t be any communication between these and the outside.

In hindsight, I think a better way to do this would be to have a server listening for data rather than actively retrieving it. The R2 can easily open an Ethernet connection and send a string and I wouldn’t need the NAT table. If I can’t get this resolved, I will probably rewrite it to do this.

There is no guarantee that your request or the response will make it to its destination, and UDP will not retransmit a lost packet. If a network component is too busy, it is allowed to just drop packets.

What does your program do when it encounters the error? Does it retry? Is this just a log spam issue? (Maybe you could log this only if it happens repeatedly.)
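Since UDP will not retransmit for you, the retry has to live in the application. Here is a minimal sketch of that idea using plain Python sockets rather than the OptoMMP .NET API; the function name `udp_request`, the timeout, and the attempt count are all assumptions for illustration, not anything the SDK provides.

```python
import socket

def udp_request(addr, payload, timeout=1.0, attempts=3):
    """Send payload to addr over UDP and return the reply, retrying on failure."""
    last_error = None
    for _ in range(attempts):
        # A fresh socket per attempt gets a new local port, so a late reply
        # to an earlier attempt cannot be mistaken for the current one.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(payload, addr)
                reply, _peer = sock.recvfrom(4096)
                return reply
            except (socket.timeout, ConnectionError) as exc:
                # Lost request, lost reply, or an ICMP error: try again.
                last_error = exc
    raise TimeoutError(f"no reply after {attempts} attempts") from last_error
```

With something like this in place, a single dropped packet costs one extra timeout instead of a lost sample, and only a run of consecutive failures reaches the log.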

Making the R2 the client won’t necessarily solve lost packet issues, but it may handle them differently (on UDP they would probably just be ignored). One upside of doing this is that it will relieve you of the NAT port forwarding hassle on your router, but I wouldn’t want to have to rewrite a bunch of mostly working code because of an entry in a log.

It’s not a logging issue as the data it retrieves from the R2 is supposed to be written to a database. If it can’t receive the data, then it will be lost.

I don’t really handle the error. I log it and move on until the next time it queries the R2. I could try disconnecting from the IO unit to see how that would work. I didn’t think it would matter since it was using UDP.

If I were to make the R2 a client, I would switch to TCP instead. It would maintain a connection to the server at all times, sending it data as it is created.
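A rough sketch of what the listening side could look like, written in Python for brevity rather than .NET. The newline-terminated record format and the names `receive_records` and `handle` are assumptions made up for this example; the R2 strategy would have to send data in whatever format the server expects.

```python
import socket

def receive_records(listener, handle):
    """Accept one client on a listening TCP socket and pass each line to handle()."""
    conn, _peer = listener.accept()
    # makefile() gives a buffered text stream, so iteration yields whole lines
    # even when TCP delivers them split or merged across segments.
    with conn, conn.makefile("r", encoding="ascii", newline="\n") as stream:
        for line in stream:
            handle(line.rstrip("\n"))
    # The loop ends when the client closes the connection.
```

To use it, create a socket, `bind()` it to the server port, call `listen()`, then call `receive_records` in a loop so the R2 can reconnect after a drop. Because the R2 opens the connection outbound, the router needs no port forwarding entry for it.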

It is a silly error message for UDP, but I think what is going on is that a previous transmit failed and the only way for the UDP socket to let you know is at the next transmit or receive.

I’m not sure if you are on Windows or something else, but the MSDN documentation states this:

WSAECONNRESET
The virtual circuit was reset by the remote side executing a hard or abortive close. The application should close the socket; it is no longer usable. On a UDP-datagram socket this error indicates a previous send operation resulted in an ICMP Port Unreachable message.
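The behaviour described there can be reproduced on a single machine. This is only a sketch with plain Python sockets, and `probe_closed_udp_port` is a made-up name. It sends a datagram to a local UDP port that nothing is listening on, then reports what the next receive raises. On Windows this typically shows up as `ConnectionResetError` (the WSAECONNRESET case above) and on Linux as `ConnectionRefusedError`, but the result depends on the OS and on whether the ICMP message is actually delivered, so a plain timeout is also possible.

```python
import socket

def probe_closed_udp_port(timeout=1.0):
    """Return the name of the error raised after sending to a closed UDP port,
    or None if nothing was reported before the timeout."""
    # Find a port that is very likely unused: bind, note the address, close.
    tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tmp.bind(("127.0.0.1", 0))
    addr = tmp.getsockname()
    tmp.close()

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)       # for UDP this only fixes the peer address
        try:
            sock.send(b"ping")   # usually succeeds: UDP has no handshake
            sock.recv(4096)      # a pending ICMP error surfaces here
        except socket.timeout:
            return None
        except ConnectionError as exc:
            return type(exc).__name__
    return None
```

The point is that the send itself appears to succeed, and the failure is only reported on a later call, which is why the message can look unrelated to the request that triggered it.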

I think it would be safe and prudent for your code, on encountering this error, to immediately close the connection, reopen it, and try again once. If that retry also fails, wait an appropriate delay before each further attempt.
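As a sketch of that policy, assuming a connection object with `read()` and `close()` methods (hypothetical stand-ins here, not the real OptoMMP class), it could look something like the following in Python. The class name `RecoveringReader` and the 5-second delay are also just illustrative choices.

```python
import time

class RecoveringReader:
    """Wraps a connection factory; on an I/O error it closes, reopens and
    retries once, and sleeps before returning if the retry fails too."""

    def __init__(self, open_conn, retry_delay=5.0, sleep=time.sleep):
        self._open = open_conn    # hypothetical: returns an object with read() and close()
        self._delay = retry_delay
        self._sleep = sleep
        self._conn = None

    def _reset(self):
        # Drop the current connection, ignoring errors from an already-dead socket.
        if self._conn is not None:
            try:
                self._conn.close()
            except OSError:
                pass
            self._conn = None

    def read(self):
        for _attempt in range(2):  # the original try plus one immediate retry
            try:
                if self._conn is None:
                    self._conn = self._open()
                return self._conn.read()
            except OSError:        # includes ConnectionResetError and socket timeouts
                self._reset()
        self._sleep(self._delay)   # both attempts failed: wait before the next poll
        return None
```

The once-a-second polling loop would call `read()` and skip the database write when it gets `None` back. Injecting `sleep` keeps the delay testable without actually waiting.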