Large performance issues when opening websocket connection in Traccar v5.12

Oliver 8 months ago

Hello,

We have quite a large Traccar database, and while upgrading from Traccar v4.12 to v5.12 we hit severe performance issues that forced us to revert after deploying to production. After analyzing Traccar instrumented with Datadog, I noticed that the query that fetches the latest positions when a websocket opens is expensive in several ways.

I will share my observations, but feel free to correct me if I am wrong.

  • When the websocket opens, we run a query to fetch all devices, then fetch the latest position for all devices and convert them to POJOs, and only then filter down to the positions relevant to that user (see attached). I added a logger to check the number of positions returned: it queries the same ~48k positions every time a websocket is opened, regardless of the user.

Code snippet

  • After a restart, all of our users reopen their websockets very quickly. During the ramp-up period this is about 2.5 websockets opened per second. In Traccar 4 this is nearly instantaneous, but in Traccar 5 the CPU spikes to its maximum and the service becomes highly unstable. From profiling, I believe this is due to all the POJOs being allocated on the heap and then garbage collected shortly afterwards. See the profiling snippet.

Profiled Traccar 5
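
To put the two figures above together, here is a rough back-of-the-envelope calculation. It only multiplies the numbers quoted in this post (~48k positions per websocket open, about 2.5 opens per second); the class and method names are hypothetical and it says nothing about the size of each object.

```java
// Rough estimate only: both inputs are the approximate figures quoted in this post.
class ReconnectLoad {

    static final long POSITIONS_PER_OPEN = 48_000;   // ~48k positions fetched per websocket open
    static final double OPENS_PER_SECOND = 2.5;      // websocket opens per second during ramp-up

    // Number of position objects materialized per second while users reconnect.
    static double positionsPerSecond(long positionsPerOpen, double opensPerSecond) {
        return positionsPerOpen * opensPerSecond;
    }

    public static void main(String[] args) {
        double rate = positionsPerSecond(POSITIONS_PER_OPEN, OPENS_PER_SECOND);
        System.out.printf("%.0f position objects per second%n", rate);
    }
}
```

That works out to roughly 120,000 short-lived position objects per second during ramp-up, almost all of which are discarded by the per-user filter.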

We will probably modify the code in our fork to return only the relevant positions for the user before constructing the POJOs, but I also wanted to let you know, either to get your opinion or so you can decide whether to revise this in the next version.
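
A minimal sketch of the "filter first, then build POJOs" idea, assuming the raw rows expose a device id before any conversion happens. `RawRow`, `PositionPojo`, and `latestForUser` are hypothetical names used only for illustration; they are not Traccar's classes and this is not the code from our fork.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical names throughout; none of this is Traccar's actual API.
class LatestPositionFilter {

    // Lightweight stand-in for a raw row read from storage.
    static final class RawRow {
        final long deviceId;
        final double latitude;
        final double longitude;

        RawRow(long deviceId, double latitude, double longitude) {
            this.deviceId = deviceId;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Stand-in for the heavier POJO that is sent over the websocket.
    static final class PositionPojo {
        final long deviceId;
        final double latitude;
        final double longitude;

        PositionPojo(long deviceId, double latitude, double longitude) {
            this.deviceId = deviceId;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Check the device id first and call the converter only for rows
    // that belong to one of the user's devices.
    static List<PositionPojo> latestForUser(
            Iterable<RawRow> rows,
            Set<Long> userDeviceIds,
            Function<RawRow, PositionPojo> toPojo) {
        List<PositionPojo> result = new ArrayList<>();
        for (RawRow row : rows) {
            if (userDeviceIds.contains(row.deviceId)) {
                result.add(toPojo.apply(row));
            }
        }
        return result;
    }
}
```

With this shape, the number of POJOs allocated per websocket open scales with the user's device count rather than with the whole table. The same idea could presumably be pushed further down into the query itself, so that the irrelevant rows are never read at all.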

Thanks,
Oliver

Oliver 8 months ago

A correction: it doesn't get all devices, but rather all devices for the user, and uses these to filter after all the POJOs have been created.

Gps man 8 months ago

There are major changes between 4.12 and 5.x. You need to study them before thinking about migrating.

Oliver 8 months ago

Hello Gps man,

Trust me when I say we didn't take the upgrade lightly. We did some load testing but failed to compare the websocket implementation between the versions. The DB migration was no issue, but it seems like a potential issue when opening 10 websockets at the same time overloads a 4-core, 8 GB memory instance.

It would be a waste of resources regardless of horizontal scaling. I can also post the profiling results from Traccar 4 with the same dataset, and you'll see that the period of high CPU usage drops to seconds as opposed to minutes.

Anton Tananaev 8 months ago

If you find a good way to filter the latest positions on request, it would be a good contribution. I agree that what we currently have there is not the most efficient way.

Oliver 8 months ago

Sure, we will take a look and contribute our implementation for review.

Gps man 8 months ago

Oliver, can you please share your contact details? We can work jointly to contribute to the solution.

Oliver 8 months ago

Hello Anton,

I have opened the PR for your review here: https://github.com/traccar/traccar/pull/5296/files

Thanks,
Oliver