Slows down with uptime

I have found my groov AR1 starts responding slowly the longer it runs without a reboot (around 2-3 months). Eventually tags stop working and I have to reboot it; then all tags work again and performance picks up.

Has anyone else experienced this?

Hmm, it would be interesting to know when you purchased the AR1. I have 3 AR1s for this one particular job. The first AR1 was purchased about 4-5 months before the other 2. That particular AR1 needs to be rebooted every 1-2 months because it stops responding (I even had to wipe it and factory reset it). The other 2 AR1s have not been touched since I installed them in late 2016 to early 2017.

Edit: All three AR1s have the exact same program and settings. All three are within 3 miles of each other.

Yes, but my issue is with groov Server for Windows. I kept going back to Julio, and groov would lock up (sort of) by not being able to poll any data after 1-2 months; a reboot would fix it. This has been going on for a year or more in spite of all the work with tech support and Jonathan. They came out with 4.1a and I installed that version, and then the triangles showed up right away…then they released 4.1b, and now the triangles show up briefly and then go away, and the pages are taking upwards of 15 seconds to load…

Other than that it works perfectly… :confounded:

That’s one I haven’t heard. This is still groov Server for Windows?

Prior to 3.5, groov would pre-load all page definitions when you opened the app, so you’d pay a large up front penalty but switching between pages would be pretty much instantaneous. That puts a heavier burden on mobile clients though, so in 3.5 we switched to loading only the page index + the page you’re actually viewing, so there’s a (supposed to be) short delay while loading a page the first time. (Switching back to a previously loaded page should be fast, unless someone saved it in the meantime.)
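The 3.5 change can be pictured as a simple load-on-first-view cache that is invalidated when a page is saved. This is just an illustrative Python sketch; the class and method names are made up for the example, not groov internals:

```python
class PageCache:
    """Load a page definition on first view, reuse it afterwards,
    and drop it if someone saves the page in the meantime."""

    def __init__(self, loader):
        self._loader = loader   # callable: page_id -> page definition
        self._cache = {}        # page_id -> definition

    def get(self, page_id):
        if page_id not in self._cache:
            # First view: pay the load cost once.
            self._cache[page_id] = self._loader(page_id)
        return self._cache[page_id]

    def invalidate(self, page_id):
        # Called when the page is re-saved in Build mode,
        # forcing a fresh load on the next view.
        self._cache.pop(page_id, None)
```

Switching back to an already-viewed page hits the cache, which is why it should be fast unless the page was saved in between.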

I’m really curious why you’d be seeing 15 seconds to load a page. Unless I’m misreading it and you’re saying 15 seconds before all gadgets get values?

Clearly, it depends…
Here is my groov Demo system uptime:
(screenshot: groov Box uptime)

It's running fine, no issues.

And here is https://demo.groov.com, which is a groov Server for Windows (Opto's main demo):
(screenshot: server uptime)

(Windows 10 no longer forcing updates should result in much higher uptimes for this system.)

Well, of course a page load should be defined by the length of time it takes to load all values; otherwise the measure is worthless. That is the time I'm speaking about. Apparently, some of the time (not all) the triangles show up right before the values finish loading and then disappear when the page is finished.
I wasn't there, but Natasha is pretty reliable in her description. It sounds to me like some pages are taking much longer than they should and others are sort of slow but acceptable.
On version 4.0c the pages loaded OK, although it has always been slow on pages with a fair number of values to display, say 40+ values. I just spoke with her and she said she was getting a very slow response from the same page multiple times in a row, so apparently it is not an initial-load sort of thing. I don't have a problem with initial load times (assuming they aren't ridiculous).
Check with Julio; Natasha just sent him a log file, maybe that will help.
Is it possible that this problem is somehow caused by an older version of Windows 10 Pro? Like many, many industrial systems, no internet access is allowed, so this version is probably 2 years old.

That just means the version you have on this one is still working; what about the versions after that? Also, since it is an AR1, it's not plagued by Windows updates or the lack thereof. This is why we need PAC Project Pro to run on Linux. groov runs on Linux, but only on the box; the problem there is if you need more horsepower than the box has, as (I thought) would be necessary in this case.

Alright, Julio passed along the log. (For future reference: these log files compress down really well. The 28.3 MB log file Natasha sent along compresses down to 519KB in .zip format.)

This immediately jumps out, the second to last line in the log:

2019-04-09T13:13:56-05:00 Tracking 7 subscriptions, scanning 553 tags on 1 device.  INFO  - com.opto22.groov.scanner.core.SubscriptionCoordinator

553 tags is a heavy load on a single controller. I'm not surprised you're having issues, and you've probably been having slow updates for a while; versions prior to 4.1 just weren't as quick to let you know.

You can see earlier in the file that groov had a bit of trouble getting connected to it in the first place:

2019-03-27T11:00:40-05:00 Connecting to device ControlEngine-10.6.68.143:22001  INFO  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:40-05:00 Groov has finished starting up.  INFO  - com.opto22.groov.server.GroovServletContextListener
2019-03-27T11:00:45-05:00 Connection timed out while trying to communicate with device ControlEngine-10.6.68.143:22001.  WARN  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:45-05:00 Waiting 1.0 seconds until trying to reconnect to device ControlEngine-10.6.68.143:22001.  INFO  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:47-05:00 Connecting to device ControlEngine-10.6.68.143:22001  INFO  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:52-05:00 Connection timed out while trying to communicate with device ControlEngine-10.6.68.143:22001.  WARN  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:52-05:00 Waiting 1.0 seconds until trying to reconnect to device ControlEngine-10.6.68.143:22001.  INFO  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:54-05:00 Connecting to device ControlEngine-10.6.68.143:22001  INFO  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:59-05:00 Connection timed out while trying to communicate with device ControlEngine-10.6.68.143:22001.  WARN  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:00:59-05:00 Waiting 2.0 seconds until trying to reconnect to device ControlEngine-10.6.68.143:22001.  INFO  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:01:01-05:00 Connecting to device ControlEngine-10.6.68.143:22001  INFO  - com.opto22.groov.scanner.core.DeviceController
2019-03-27T11:01:04-05:00 Connected to device ControlEngine-10.6.68.143:22001  INFO  - com.opto22.groov.scanner.core.DeviceController

(That was still under 4.1a at that point.)

Next thing to look at is the new Device Health section in 4.1. In Build mode, go into the Configure Devices and Tags section, click on your controller, and click the Device Health button. It’ll look like this:

In particular: what do the response times look like? That records the full amount of time it takes to send a batch of tags to the controller and get the full response back.

OK, looks like we might be getting somewhere. So my read on this response is that you think this is too many tags for a single controller. Also, don't forget, the controller is also a redundant S1.
When you say 553 tags is too much for a single S1, my experience (say, with PAC Display) is that it is not; however, when you factor in the redundancy, it might be. I'm not sure exactly how that impacts the controller, but since it has to update a whole mess of variables on a fairly regular basis, that could be the problem. In designing the program in PAC Control, I specifically tried to limit how often the S1 would attempt to update the sync block, and tried to limit the number of charts with sync blocks. I think the way I have it designed is pretty efficient. I suspect the only way to fix that aspect is to use SoftPAC on the groov servers, but I'm not sure SoftPAC can be made redundant.
Also, since groov only has one refresh time, that could be a factor too; in PAC Display I can use prime-number refresh rates, and many different ones, to make comms much more efficient.
Realistically, 553 tags is not a massive project, maybe a medium-size one. I can't say how many tags some of the other systems I've built have, but it seems to me I've had over 300 tags on one page, all updating at 0.5 seconds, with no issues at all in PAC Display.
Let me check whether I changed the update speed, but I'm pretty sure I left it at 1 second. Maybe backing it off to 1.2 seconds will fix the problem?

What about the subscriptions? How does that factor into all of this?
What 7 subscriptions?

Internal term for a set of tags that’s used as a group. Generally, there’ll be 1 subscription for trends, 1 for events, and then 1 per page that’s being actively viewed. The subscriptions are shared between clients as of R3.5Aa, so if you have 50 browsers parked on the same page, it’s still 1 subscription.

I’ve put a lot of effort into making sure tags aren’t requested redundantly as well, so if the same tag is used in a trend and in a page, we’ll only request it once, etc.

I agree, 553 is a small project. However, it isn't how many tags are in your project, but how many tags groov is requesting (subscriptions) at that time: everything on the current page view, plus trended tags, plus alarm variables - and if you just left a page, groov still asks for that page's tags for a couple of minutes (unless that has been fixed).
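The "it's what groov is requesting right now" point can be made concrete: the scan set is the union of the tags in the active subscriptions, so a tag shared between a trend and a page only counts once. A toy Python sketch (the subscription and tag names here are invented for illustration):

```python
def tags_to_scan(subscriptions: dict) -> set:
    """Union of all tags across active subscriptions: each tag is
    requested once even if several subscriptions share it."""
    scanned = set()
    for tags in subscriptions.values():
        scanned |= set(tags)
    return scanned

# Hypothetical example: 6 tag references collapse to 4 unique scans.
subs = {
    "trends":        {"TankLevel", "PumpSpeed"},
    "events":        {"HighAlarm"},
    "page:Overview": {"TankLevel", "HighAlarm", "FlowRate"},
}
```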

The issue is how the controller responds to requests from groov. If groov asks for 553 tags, the controller responds with 553 SEPARATE packets. At a 1 second poll rate, it's not going to happen.
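Some rough arithmetic shows why one-packet-per-tag breaks down; the 5 ms round trip below is purely an assumed, illustrative figure, not a measured S1 number:

```python
tags = 553
rtt_ms = 5          # assumed round trip per request (illustrative only)
poll_ms = 1000      # 1 second scan rate

# One packet per tag, issued serially: total time for a full scan.
scan_time_ms = tags * rtt_ms
print(scan_time_ms)  # 2765 ms: nearly 3x the 1 s budget, so scans pile up
```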

There are some things you can try to help (besides reducing the number of tags on a page):

  • You can start a separate control engine task through PAC Control and have groov read on that new port. (This helped a little bit on an R1.)
  • You can write the values groov needs to the scratchpad, and then use Modbus. Modbus will return 125 registers at a time in a single request. This is pretty easy for items you just need to view in groov.
  • You may be able to do the scratchpad thing and use an OptoMMP device type in groov too (I haven't tried this).
  • You may be able to use Node-RED to request your data more efficiently and write to groov local storage. (With a bit of communication work in PAC Control, you could even have the controller write to Node-RED instead of having Node-RED poll the controller.)
  • Write an MQTT client in PAC Control and publish your tags with report by exception. Piece of cake :wink:
  • Wait for Opto to come up with a more efficient means of getting data out of a PAC controller.
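The 125-register Modbus bullet above comes down to planning reads in protocol-sized chunks. A library-agnostic Python sketch (you'd hand each resulting (address, count) pair to whatever Modbus client you actually use):

```python
MAX_REGS_PER_READ = 125   # Modbus protocol limit for one holding-register read

def plan_reads(start: int, count: int, chunk: int = MAX_REGS_PER_READ):
    """Split a contiguous register span into (address, count) requests,
    each no larger than the protocol allows."""
    reads = []
    addr, remaining = start, count
    while remaining > 0:
        n = min(chunk, remaining)
        reads.append((addr, n))
        addr += n
        remaining -= n
    return reads

# 553 scratchpad registers -> 5 requests instead of 553 single-register reads.
```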

Some of these were ideas I had to help a customer speed up their project - I only tried out the first couple (low hanging fruit). They helped, but ultimately they chose to use a different HMI. :man_shrugging:
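For what it's worth, the report-by-exception idea from that list is mostly bookkeeping: remember the last published value per tag and only publish on a real change. A hypothetical Python sketch (the `publish` callable stands in for any MQTT client's publish method; nothing here is PAC Control syntax):

```python
class ExceptionReporter:
    """Report by exception: publish a tag only when its value
    changes by more than an optional deadband."""

    def __init__(self, publish, deadband=0.0):
        self._publish = publish     # callable: (tag, value) -> None
        self._deadband = deadband
        self._last = {}             # tag -> last published value

    def update(self, tag, value):
        last = self._last.get(tag)
        if last is None or abs(value - last) > self._deadband:
            self._last[tag] = value
            self._publish(tag, value)
            return True
        return False   # suppressed: change too small to report
```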

Right, and that’s what the logs are showing: at peak times, it’s requesting 553 tags at whatever the scan rate is set to, probably once a second.

Subscriptions are reference counted, and cleaned up once every 30 seconds. So if you’re the only user on a page, and you leave it, groov should keep scanning the tags on that page for up to 30 seconds. If you come right back to it (or someone else views it) before the cleanup, those tags will be immediately available again.

This gets a little messy because I can’t always determine when you’ve left a page cleanly: if you navigate away the client will let the server know it’s done with it, but if you just close the browser window, lock your phone, etc, I have to keep track of how long it’s been since that tag subscription set has been queried for updates. I close out a subscription during the every-30-seconds cleanup phase if it hasn’t been queried within 15 seconds.
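That reference-count-plus-idle-sweep behavior can be sketched in a few lines. The numbers match the description above (30-second cleanup pass, 15-second idle limit), but the class and method names are invented for illustration, not groov's actual code:

```python
import time

class SubscriptionTracker:
    """Track when each subscription was last queried, and drop
    idle ones during a periodic cleanup pass."""

    CLEANUP_PERIOD = 30.0   # run cleanup() this often (seconds)
    IDLE_LIMIT = 15.0       # drop subs not queried within this window

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._subs = {}     # subscription id -> last query time

    def query(self, sub_id):
        # Every client poll refreshes the subscription's last-seen time.
        self._subs[sub_id] = self._clock()

    def cleanup(self):
        # Called every CLEANUP_PERIOD; returns the subs it dropped.
        now = self._clock()
        stale = [s for s, t in self._subs.items()
                 if now - t > self.IDLE_LIMIT]
        for s in stale:
            del self._subs[s]
        return stale
```

This is why a closed browser tab keeps its page's tags scanning for a short while: the server only notices the silence at the next sweep.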

Yeah, that’s on my list of things to address.

I’m not sure where that came from, but nope. We only request the registers we need, and unfortunately don’t batch them up well at the moment.

MMP’s probably more efficient overall, yes.

125 registers is the max for the modbus protocol (that’s where that came from).

So if there are contiguous registers, groov only requests one register at a time? Hmmm.

We hit devices that can't handle many registers at once, so we do requests in much smaller chunks for safety.

Yeah, I deal with a VFD that is like that - it sucks. Too bad all the well-behaved devices get penalized. Maybe a checkbox to enable/disable? The configure-tags interface already asks for the register quantity (array size), so I assumed it would read them all at once.

I had to go check this for myself, and sure enough groov asks for each register one at a time. I'm a bit depressed now…

For larger data types, and for strings, we’ll batch them together better: 32-bit types request two registers at a time, strings request however many are configured for the tag.
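For reference, combining two 16-bit registers into a 32-bit value is just a byte-packing exercise. Word order varies by device; this sketch assumes high word first, which you'd confirm against your device's documentation:

```python
import struct

def regs_to_float(high: int, low: int) -> float:
    """Combine two 16-bit Modbus registers into one 32-bit
    big-endian float, high word first."""
    return struct.unpack(">f", struct.pack(">HH", high, low))[0]
```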

You may also see better request patterns when requesting contiguous elements from an array in 4.1b, across all device types. I’d need to go through and read the Modbus driver to see whether it’s taking advantage of the better array packing code though.

Okay, I look forward to that. It looks like OptoMMP only requests a single tag at a time as well. It would be nice to be able to enter a length for arrays of data for MMP too (particularly for scratchpad stuff) instead of one element at a time.

Sorry to hijack the thread y’all.

Hey Jonathan,

Tasha here. Noted, I'll compress the log files next time; I was primarily focused on getting the logs to you guys ASAP.

Device Health:

I seem to be having the same issue since the last patch: pages taking a long time to load with very few values to display. This didn't happen with version B3.5d-r47553.