Traccar Data Loss with 1000+ Devices

B.Ihab, 9 months ago

Hi there,
I am writing to request assistance with an issue we are encountering in Traccar, which we have been using for approximately 3 years (starting from version 4.xx).

We currently manage over 1,000 GPS devices, each reporting to the server every 10 s. Unfortunately, we are experiencing intermittent data loss: after a certain period (days, sometimes weeks), data reception stops for all devices. Restarting the database temporarily resolves the issue, but not in all cases.

For your reference, I have attached relevant logs, our configuration server details, and database configuration. Any insight or suggestions you may have regarding this issue would be greatly appreciated.

Thank you for your time.

Log:

VPS Backend configuration:

OS: Ubuntu
CPU: 8 vCPU cores
RAM: 24 GB
Storage: 300 GB NVMe

VPS database configuration:

OS: Ubuntu
CPU: 4 vCPU cores
RAM: 16 GB
Storage: 160 GB NVMe

Database configuration:

[mysqld]
innodb_buffer_pool_size = 15G
innodb_redo_log_capacity = 2G
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 0
slow_query_log = 1
slow_query_log_file = /var/lib/mysql/mysql-slow.log
long_query_time = 7
log_error=/var/lib/mysql/mysql-error.log

Traccar configuration:

- java -Xms20g -Xmx20g -Djava.net.preferIPv4Stack=true
---------
<entry key='server.timeout'>20</entry>
<entry key='status.timeout'>300</entry>

Linux configuration:

ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 120079
max locked memory           (kbytes, -l) 3851552
max memory size             (kbytes, -m) unlimited
open files                          (-n) 100000
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 120079
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

We applied all the recommended optimizations, but only on the Traccar server, not on the database server:
https://www.traccar.org/optimization/

Nothing looks wrong in mysql-error.log, mysql-slow.log, or traccar.log.

We also tried running 2 Traccar instances at the same time and saw the same behavior, which makes us think the problem is on the database side.

Anton Tananaev, 9 months ago
  1. Is there a delay when the issue happens?
  2. Have you tried increasing database connection pool size?
  3. Have you checked jstack when this happens?
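
For point 3, a thread dump is easier to read once it is reduced to a count of threads per state. Below is a minimal Python sketch, assuming a JDK with `jstack` is installed on the Traccar host and that the script runs as the same user as the JVM. The helper names `summarize_thread_states` and `capture_dump` are made up for illustration and are not part of Traccar.

```python
import re
import subprocess
from collections import Counter

# Every thread in a jstack dump has a line such as:
#    java.lang.Thread.State: BLOCKED (on object monitor)
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def summarize_thread_states(dump: str) -> Counter:
    """Count threads per java.lang.Thread.State in a jstack dump."""
    return Counter(STATE_RE.findall(dump))


def capture_dump(pid: int) -> str:
    """Run `jstack <pid>` and return its text output (hypothetical helper)."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example usage, with the PID of the Traccar JVM:
#   print(summarize_thread_states(capture_dump(12345)).most_common())
```

If most threads turn out to be BLOCKED or WAITING while data reception is stalled, the stack frames under those threads in the full dump are the place to look next.
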
B.Ihab, 9 months ago

Hi Anton,
Thank you for sharing your insights. To address your points:

  1. I can't say for sure that there is no delay in database connections, but I can confirm that no data can be inserted when it happens, and the problem can last for hours. I know that I'm missing valuable information, and I'm sorry for that.
  2. While I have not proactively monitored the number of existing connections, I appreciate the suggestion. Moving forward, I will implement periodic monitoring. If the connection count approaches the preset limit (151 in my case), I will promptly adjust the database connection pool size accordingly.
  3. I'm looking for ways to improve my troubleshooting, so I'll use jstack the next time the issue arises.
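
The periodic monitoring from point 2 could look roughly like the Python sketch below. It assumes the `mysql` command-line client is installed and that credentials come from an option file such as `~/.my.cnf`. The 0.8 threshold and the function names are arbitrary choices for illustration, not something Traccar or MySQL prescribes.

```python
import subprocess


def parse_value(cli_output: str) -> int:
    """Parse the integer from one tab-separated `name<TAB>value` row."""
    _name, value = cli_output.strip().split("\t")
    return int(value)


def query(sql: str) -> int:
    """Run one statement through the mysql client and return its value.

    -N drops the header row, -B gives tab-separated output, -e runs the SQL.
    Assumption: credentials are supplied by the client's option file.
    """
    result = subprocess.run(
        ["mysql", "-N", "-B", "-e", sql],
        capture_output=True, text=True, check=True,
    )
    return parse_value(result.stdout)


def connection_usage(connected: int, limit: int) -> float:
    """Fraction of the server-side connection limit currently in use."""
    return connected / limit


def near_limit(threshold: float = 0.8) -> bool:
    """True when open connections reach `threshold` of max_connections."""
    connected = query("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
    limit = query("SHOW GLOBAL VARIABLES LIKE 'max_connections'")
    return connection_usage(connected, limit) >= threshold
```

Running `near_limit()` from cron every minute and logging the result would show whether the connection count climbs toward the limit before a stall.
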

On my end, I will continue to monitor key metrics that might shed light on the root cause of this issue. Please do not hesitate to reach out if you have any further questions or require additional information.

twin, 4 months ago

Hello B.Ihab,
Did you happen to find a solution? We have the same problem, usually during the night: Traccar stops writing to the database (and to the logs) until we restart the database. We can't find any specific error in the logs before it stops working.

We have the following system configuration:

Frontend (where traccar is installed)
OS: CentOS Stream 9
CPU: 4 cores
RAM: 8 GB
Storage: 70 GB

Backend (DB)
OS: CentOS Stream 9
CPU: 6 cores
RAM: 32 GB
Storage: 920 GB

We have followed the optimization guide and also increased the "innodb_buffer_pool_size" to 24GB.

Thank you and have a good day

B.Ihab, 4 months ago

Hi @twin,
In my case I was giving too much RAM to the buffer pool, which led my system to use swap. That is clearly not the issue on your database side, but I would expect more than 8 GB on the frontend. I don't know how many devices you have, but the ratio between 8 GB on the frontend and 32 GB on the DB seems unbalanced. There are two possible issues I can think of:
1- You have more devices than an 8 GB instance can handle (assuming you tuned the heap and stack).
2- You have multiple databases, apps, etc. on your DB server, so that the 24 GB buffer pool + the system's memory usage + the memory used by the other databases/apps exceeds the 32 GB available and the server starts using swap.
You can monitor both systems using atop or htop.
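
To put numbers on the swap explanation, here is a small Python sketch that does the memory budget arithmetic and reads swap usage from `/proc/meminfo` on Linux. The function names and the `other_gb` estimate are assumptions for illustration; how much the OS and MySQL's non-buffer-pool memory really need depends on the workload.

```python
def parse_meminfo(text: str) -> dict:
    """Return /proc/meminfo fields as {name: value}, values mostly in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(info: dict) -> int:
    """Swap currently in use, from the SwapTotal and SwapFree fields."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def headroom_gb(total_ram_gb: float, buffer_pool_gb: float,
                other_gb: float = 0.0) -> float:
    """RAM left after the buffer pool and other (estimated) consumers."""
    return total_ram_gb - buffer_pool_gb - other_gb


# Example usage on the database host:
#   with open("/proc/meminfo") as f:
#       print(swap_used_kb(parse_meminfo(f.read())), "kB of swap in use")
```

With the figures from this thread, `headroom_gb(16, 15)` leaves only 1 GB for the OS and everything else MySQL allocates, which fits the swapping I saw. `headroom_gb(32, 24)` leaves 8 GB, so swap would only be expected there if other services on the same host use more than that.
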