Thanks.
So for the last tests, I didn't block the device port this time.
Instead, I created another identical instance and tested by routing the device's traffic towards its IP. With traffic, API calls take 15+ seconds; without traffic, 60 ms.
Traccar server stops processing the incoming packets until the API call is completed, same as before.
In the end, it might be the Cache Manager, but for some reason the issue only appears when there are incoming data packets.
That's how caching works. It caches data only for connected devices.
Well, then I am definitely going to be looking into the Cache Manager.
The current logic obviously works for users with only a small number of devices, but any user with 100+ devices, relatively short reporting intervals, and many geofences/groups/computed attributes will run into problems.
So, I had a deeper look at the cache manager. Fixing this issue won't be an easy task and has to be thoroughly tested with active incoming traffic. This obviously cannot be done on production, so we either have to 1. deploy a mirrored dev environment and forward the data packets to it, or 2. set up a script to simulate incoming data packets (rough sketch below). The latter would have to be built from scratch, unless you already have something ready.
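For option 2, a plain TCP sender that replays device reports could serve as a starting point. This is only a rough sketch, assuming a text-based protocol such as gps103 on its default port 5001; the host, port, payload and interval are placeholders to adjust for the dev environment:

# simulate_traffic.py - rough sketch of a device traffic simulator
# Assumptions: the dev server listens for gps103 on port 5001 and the
# payload below (a sample gps103-style text report) matches the devices
# being simulated; adjust host, port, payload and interval as needed.
import socket
import time

HOST = "192.168.0.111"  # placeholder dev server IP
PORT = 5001             # default gps103 port (change for other protocols)

# sample ASCII report; a real simulator would vary IMEI, time and position
PAYLOAD = b"imei:359587010124900,tracker,1206081637,,F,193729.000,A,1249.9238,S,03816.1788,W,0.01,303.36;"

def send_report():
    # open a connection, send one report, close (crude, but enough for a test)
    with socket.create_connection((HOST, PORT), timeout=10) as sock:
        sock.sendall(PAYLOAD)

if __name__ == "__main__":
    while True:
        send_report()
        time.sleep(10)  # simulated reporting interval in seconds

Running one such loop per simulated device (separate processes or threads) should be enough to approximate the concurrency of real traffic.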
Given the time required for a general fix, I will have to implement a quick workaround first to patch up the production instance.
Can you advise on what would be the best temporary fix?
The current setup is done as per your recommendation from here. This obviously no longer works.
Right now, each user is assigned to an individual group, which is then assigned to one admin user. The admin has 40 computed attributes and assigns them to each user group. This way we have only 40 different attributes, and if we change one of them, we change it for all users.
The alternative I can think of is to "delink" the user groups from the admin and assign the 40 computed attributes to each group directly (sketched below). This is not particularly effective (nor efficient): with 500 users, that gives us 20,000 different attributes, and if we need to change one of the attributes, we have to make 500 changes.
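For completeness, this is roughly what that alternative would look like if scripted against the Traccar REST API. A sketch only: the endpoint paths and payload keys follow the usual API conventions but should be verified against the running version, and the URL, credentials, IDs and attribute definitions are placeholders:

# bulk_attributes.py - rough sketch of the "copies per group" alternative
# Creates a separate copy of each computed attribute for every user group
# and links it to that group. Endpoints and payload keys are assumptions
# based on the standard Traccar API; verify before running anything real.
import requests

BASE_URL = "http://localhost:8082"        # placeholder server URL
AUTH = ("admin@example.com", "password")  # placeholder admin credentials

group_ids = [101, 102]  # placeholder: the ~500 user group ids

# placeholder: the 40 attribute definitions to be copied per group
attribute_templates = [
    {"description": "Ignition", "attribute": "ignition",
     "expression": "io239 == 1", "type": "boolean"},
]

session = requests.Session()
session.auth = AUTH

for group_id in group_ids:
    for template in attribute_templates:
        # create the group's own copy of the computed attribute...
        created = session.post(f"{BASE_URL}/api/attributes/computed",
                               json=template, timeout=30)
        created.raise_for_status()
        attribute_id = created.json()["id"]
        # ...and link it to the group
        linked = session.post(f"{BASE_URL}/api/permissions",
                              json={"groupId": group_id, "attributeId": attribute_id},
                              timeout=30)
        linked.raise_for_status()

The downside is exactly the one described above: changing one logical attribute later means updating its copy in every group.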
Is there a better way to do it given the cache constraints?
I think the easiest option is just to unlink from the admin the groups that you don't need to work with immediately. You can also divide your users into batches. There are many ways to deal with it.
So basically, there is no better option: I will have to create 20K computed attributes. That's what I was afraid of.
Two notes. First, unlinking the users from the admin works, but then we have to add the attributes for each user individually. In terms of scaling this is an awful way to do it, and it might involve other complications that we are currently unaware of. Also, I can't imagine what the performance would be with, say, 100K computed attributes; the more computed attributes there are, the higher the CPU usage.
Second, putting users in batches will probably be more complicated to set up and manage, as you have to keep track of which users belong to which group. Two groups would be fine, but 10 or more?
Out of curiosity, how do you solve cache issues for the one user <-> many devices scenario on the demo servers? It's hard to imagine that there are no users with hundreds of devices/attributes/geofences, so you should be experiencing similar issues too.
There is a large number of false assumptions in your comment, so I don't think this conversation is going in the right direction. For example, the total number of computed attributes is irrelevant; what matters is the number of times attributes are executed. Whether you have a single attribute shared between all devices or a separate attribute for each device, the performance will be pretty much the same.
You are right, I stand corrected. Performance-wise, the CPU usage should increase the more attributes are executed. So for performance, what matters isn't the total count of computed attributes but how many of them are executed. I thought the total count might have some performance implications, but that is only true if they are all assigned to the same user (because of the cache).
Still, how do you handle the same situation on the demo servers?
In any case, I appreciate the productive conversation; it will help to eventually solve the caching issue.
We don't handle this on the demo servers. Demo servers are not intended to be used with hundreds of devices per user. In fact, we banned a couple of users who abused it in the past.
OK, got it. Thanks for the confirmation and also for the free advice!
I will look into urgently fixing this for my use case, and then I will look into the details of the cache manager. If I come up with a solution, I will create a pull request.
Let me know if you have an existing solution for simulating device traffic. Traccar cannot be used for forwarding the original data packets, so I would either have to do it server-side once the packet is received or create a separate script to simulate incoming packets.
"Traccar cannot be used for forwarding the original data packets"
This is again wrong. It is possible to forward the original data.
Really? I previously read in the forum that Traccar can only forward the packets after they are "processed", via the OsmAnd protocol.
The original HEX cannot be forwarded. Or am I wrong?
Can I forward a HEX like this to a dev server:
696d65693a3335393538373031303132343930302c747261636b65722c313230363038313633372c2c462c3139333732392e3030302c412c313234392e393233382c532c30333831362e313738382c572c302e30312c3330332e33363b
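(For reference, that HEX is just an ASCII text report in the gps103/tk103 style; a quick Python 3 check shows the content:)

# decode the raw packet to see what it contains
packet_hex = "696d65693a3335393538373031303132343930302c747261636b65722c313230363038313633372c2c462c3139333732392e3030302c412c313234392e393233382c532c30333831362e313738382c572c302e30312c3330332e33363b"
print(bytes.fromhex(packet_hex).decode("ascii"))
# imei:359587010124900,tracker,1206081637,,F,193729.000,A,1249.9238,S,03816.1788,W,0.01,303.36;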
Thanks.
We added it recently, so what you read might have been outdated information.
Ah, this is amazing. It will help immensely with troubleshooting the cache.
I found the release note for v5.7: "Raw network data forwarding in original format".
Is this the only entry that I need to add (I found it in another topic, as there's nothing about it in the configuration documentation yet):
<entry key='server.forward'>192.168.0.111</entry>
?
The cache is in memory only, so it is cleared on restart.