Extreme CPU load

Anton Tananaev 2 years ago

Yes, thanks.

memesaregood 2 years ago

Yeah, so I've deployed two separate instances: the first one running the preview build, and the second one running the release build. It appears there is actually some difference: the release build runs out of memory slightly more slowly. Here are some statistics from today, 4PM to 9PM:

Preview Deployment Performance

Release Deployment Performance (please use data since 14:00, as this was the time I set this one up)

I am sorry for the poor release deployment graphs, but it appears that Proxmox does not detect small memory spikes like Zabbix does.
However, during my experiments, I've figured out two things:

  • Traccar will run out of memory and crash, no matter whether it's executing a script or an expression;
  • CPU load is minimal until Traccar runs out of heap.

This has led me to the thought: could it be a memory leak? I've seen RAM usage building up during this Traccar experiment, but I haven't seen anything that would look like Traccar freeing any memory.
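The "usage only ever builds up" observation can be checked more directly than with host-level graphs. Below is a minimal sketch, not anything taken from Traccar itself: the class name `HeapTrend` and both helper methods are hypothetical, and the sampler measures the JVM it runs in, so it would have to run inside the server process (or be adapted to a remote JMX connection) to say anything about Traccar. It samples used heap after requesting a GC and reports whether every sample is higher than the previous one.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper, not part of Traccar.
final class HeapTrend {

    // True when each of the last `window` samples is strictly larger than the one before it.
    static boolean isSteadilyGrowing(long[] samples, int window) {
        if (window < 1 || samples.length < window + 1) {
            return false; // not enough data to judge
        }
        for (int i = samples.length - window; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return true;
    }

    // Used heap in bytes, read right after asking the JVM to collect.
    // The GC request is only a hint, so the value is approximate.
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        int count = 5; // assumed sample count; a real run would sample over minutes, not seconds
        long[] samples = new long[count];
        for (int i = 0; i < count; i++) {
            samples[i] = usedHeapAfterGc();
            System.out.println("used heap after GC: " + samples[i] / (1024 * 1024) + " MiB");
            Thread.sleep(1000);
        }
        System.out.println("steadily growing: " + isSteadilyGrowing(samples, count - 1));
    }
}
```

A post-GC baseline that keeps rising over a long run is consistent with a leak, but it is not proof on its own; a heap dump comparison is still needed to see what is being retained.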

I've sent the dumps to your mail.

Anton Tananaev 2 years ago

We discussed that you will test the specific change. Have you done that?

memesaregood 2 years ago

I did.
The task was to compare the two builds: one uses createExpression, and the other uses createScript. I've done that.
The createScript build takes 35-40 minutes to die.
The createExpression build takes from 40 minutes to an hour to die. This one varies.

I am not sure what the task was in your opinion now. Please tell me if we had a misunderstanding. Not trying to be a dick about it.

Anton Tananaev 2 years ago

Then I'm not sure why there's a comparison between preview and release. I'm confused.

memesaregood 2 years ago

I figured you would like to have some additional information that could be useful. I'm sorry for the confusion. Anyway, what's next?

Anton Tananaev 2 years ago

How many comparisons did you do? And what specifically have you compared? I'm just confused that we discussed comparing with the changes reverted, but instead you posted another comparison between master and last release. And you didn't say anything about comparing with and without the computed attributes change until I pointed that out. So I feel like we're not on the same page here. Basically you posted what I didn't ask and you didn't post what I asked for.

Anton Tananaev 2 years ago

The thing is that there are virtually no other big changes.

How about you take version 5.6 (not master) and apply the PR on top of it and test that with the official release?

memesaregood 2 years ago

That's what I was thinking! I figured there are no other big changes regarding Computed Attributes, so I took the Release installer (changes not yet merged) and the Preview installer (changes implemented), then compared the performance and the time it takes for Traccar to crash. I let both run for some hours, then collected the statistics (graphs and ETA). I was pretty sure that's what was requested.

How about you take version 5.6 (not master) and apply the PR on top of it and test that with the official release?

Sure, I can do that. But please, let's just clarify, what exactly kind of info you need. Otherwise we'll be playing cat and mouse for a ve-e-ery long time. And we're both busy people with plenty to do.

Anton Tananaev 2 years ago

But please, let's just clarify, what exactly kind of info you need.

To start with we need to see if this change is the root cause of the problem. If it is, we would probably need to start with reverting the change first and then testing some different variations to see what actually causes it and maybe there's a fix.

memesaregood 2 years ago

The createScript switch does not seem to be the cause of the issue. The issue persists even with createExpression.

Anton Tananaev 2 years ago

Can you please provide proof that you tested what we discussed? Let's start with links to the source code.

memesaregood 2 years ago

First, I took the 5.6 release branch of the repository and applied my changes on top of it. Then I let it run for some time, which resulted in high RAM usage, followed later by high CPU usage and thread starvation.
Then I reverted my changes and monitored the resources again. The result is the same.
What kind of proof do you need? What data? Logs? Graphs?
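One kind of data that might help is the share of time the JVM spends in garbage collection. A pattern of high RAM usage followed by high CPU usage would be consistent with the collector running almost continuously once the heap is nearly full, though nothing in this thread confirms that this is the cause. The sketch below uses the standard `GarbageCollectorMXBean` API; the class name `GcOverhead` and its methods are hypothetical, and like any in-process probe it would need to run inside the JVM being investigated.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Traccar.
final class GcOverhead {

    // Fraction of a wall-clock interval spent in GC, clamped to [0, 1].
    // Collectors can report overlapping times, hence the clamp.
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            return 0.0;
        }
        double fraction = (double) (gcMillisAfter - gcMillisBefore) / intervalMillis;
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    // Accumulated collection time in milliseconds, summed over all collectors.
    // getCollectionTime() returns -1 when a collector does not report it.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 5000; // assumed sampling interval
        long before = totalGcMillis();
        Thread.sleep(intervalMillis);
        long after = totalGcMillis();
        System.out.println("GC share of last interval: " + gcFraction(before, after, intervalMillis));
    }
}
```

A value that stays close to 1.0 shortly before the hang would point toward heap exhaustion rather than heavy request processing as the source of the CPU load.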

Anton Tananaev 2 years ago

Wait, so you're saying that it happens either way? Previously you said:

The official release does not have such issues.

The official release is built from the same tag that you've used. Something doesn't add up here.

memesaregood 2 years ago

I'm no longer sure that is true. We did not experience this issue on the release before, but somehow I managed to reproduce it on the release build. My release instance is hanging as we speak. Sorry for the false info earlier.