Have you tried using server.instantAcknowledgement?
No, but is there a procedure that may get longer than 30s from receiving HTTP request to sending a response?
Are you suggesting that the huge delay is MySQL not writing in time? Should I troubleshoot MySQL instead of Traccar?
Using server.instantAcknowledgement would give me an ACK without any confidence that the data has been written to the DB. That doesn't look like a solution to me, but like sweeping the issue under the carpet.
Can you point me to the section of the source code where the HTTP request is received and where the HTTP response is generated? This way I can get an idea of where the system is silently dropping the request.
Totally at a tangent, but I had something similar on a 5.10 release shortly after upgrade from 5.6 (I think it was those versions), and tried various settings to get around it.
It finally turned out to be a circular assignment in the groups hierarchy put in by one of my users, which would seem totally unrelated but may be worth checking.
Thanks for the hint, but I have zero groups on my Traccar server.
I also just quickly checked for hanging queries in MySQL via mysqladmin -u mysqlrootuser -p -i 1 processlist, but I found none.
For the sake of clarity, here are the results of my mysqladmin status command:
Uptime: 37407 Threads: 11 Questions: 222163 Slow queries: 0 Opens: 732 Flush tables: 3 Open tables: 652 Queries per second avg: 5.939
I've written a Python parser for tracker-server.log that groups log lines by connection id and by the presence of osmand < and/or osmand > lines, and calculates the missing response ratio as len(in_only_connections) / len(total_connections), len(in_out_connections) / len(total_connections):
tracker-server.log.20240814: (0.4959882787971813, 0.5040117212028187)
tracker-server.log.20240815: (0.048451174573929064, 0.9515488254260709)
tracker-server.log (partial): (0.14011943242206762, 0.8598805675779324)
The numbers say that Traccar receives/logs the osmand < HTTP request but doesn't respond/produce osmand > with a variable probability from 5% to 50%.
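For reference, here's a minimal sketch of that parser; the exact log-line layout is an assumption (I match a "[connection id: osmand <]" / "[connection id: osmand >]" prefix), so the regex may need adjusting for other Traccar versions:

import re
import sys
from collections import defaultdict

# Assumed line shape: "... [a1b2c3d4: osmand < 93.x.x.x] ..." for requests
# and "... [a1b2c3d4: osmand > 93.x.x.x] ..." for responses.
LINE_RE = re.compile(r"\[(?P<conn>\w+): osmand (?P<dir>[<>])")

def response_ratios(path):
    directions = defaultdict(set)  # connection id -> set of seen directions
    with open(path, errors="replace") as log:
        for line in log:
            match = LINE_RE.search(line)
            if match:
                directions[match.group("conn")].add(match.group("dir"))
    total_connections = [c for c, d in directions.items() if "<" in d]
    in_only_connections = [c for c in total_connections if ">" not in directions[c]]
    in_out_connections = [c for c in total_connections if ">" in directions[c]]
    return (len(in_only_connections) / len(total_connections),
            len(in_out_connections) / len(total_connections))

if __name__ == "__main__":
    print(response_ratios(sys.argv[1]))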
By parsing all the logged connections with an incoming request but no outgoing response, and searching for matching points (id, timestamp, location) via Traccar reports, it is evident that some of the points ARE present and some are not. I should query all those points via the API and/or an SQL query to get final stats.
I have to consider the lack of an osmand > line in the log as either a false negative or a true positive. server.instantAcknowledgement might solve the false negative case, but the true positives are problematic.
After sampling 1000 random requests from the group with a request (osmand <) but no response (osmand >), and querying the /api/positions endpoint for the presence of each fix in the DB, here's the result:
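# false_negative_points: sampled fixes that ARE in the database despite the missing osmand > response
# true_negative_points: sampled fixes that are NOT in the database (actually lost)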
assert len(sample_df) == len(false_negative_points) + len(true_negative_points)
false_negative_ratio = len(false_negative_points) / len(sample_df)
true_negative_ratio = len(true_negative_points) / len(sample_df)
false_negative_ratio, true_negative_ratio
(0.167, 0.833)
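For completeness, here's a minimal sketch of the per-request check behind those numbers; it assumes basic auth and that each sampled request can be mapped to its Traccar deviceId and fix timestamp (host and credentials are placeholders):

import requests
from datetime import timedelta, timezone

BASE_URL = "http://localhost:8082"  # placeholder Traccar host
AUTH = ("admin", "admin")           # placeholder credentials

def fix_in_database(device_id, fix_time, tolerance_s=1):
    # Query /api/positions in a small window around the fix timestamp and
    # report whether any stored position matches.
    window = timedelta(seconds=tolerance_s)
    params = {
        "deviceId": device_id,
        "from": (fix_time - window).astimezone(timezone.utc).isoformat(),
        "to": (fix_time + window).astimezone(timezone.utc).isoformat(),
    }
    response = requests.get(f"{BASE_URL}/api/positions", params=params, auth=AUTH)
    response.raise_for_status()
    return len(response.json()) > 0

Each sampled request then goes into false_negative_points if this returns True, or into true_negative_points if it returns False.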
It means that the Traccar server is silently dropping a variable amount, from 4% to 40%, of all the incoming osmand data.
This is an alarming issue.
Using instant acknowledgement is not just a workaround. It should also tell us something about the problem and help find the root cause. But if you don't want to try it, that's your choice.
I never said I don't want to try it; those are your words, not mine. I'm sorry for the misunderstanding.
This is all the documentation says about the instantAcknowledgement setting: "Send device responses immediately before writing it in the database."
So I looked into the java source code https://github.com/traccar/traccar/blob/b1791569e71a39936c5d5d6e96996f55cb671ae4/src/main/java/org/traccar/BasePipelineFactory.java#L101
What I can grasp from that is that enabling the option would return 200 OK to the HTTP client without knowing whether the point has been ingested by Traccar or not, making it actually harder to assess the size of this apparently random problem, so I asked questions like:
As you can see, I'm diving into the stats of the problem, and without further info on how instantAcknowledgement works, all I can see from the given information is that it would make it harder, not easier, to distinguish between the true and false negative cases.
From my perspective I have an HTTP client whose requests time out, a MySQL instance that doesn't seem to be complaining, and a Java application that logs incoming requests but is not responding and drops a big chunk of the received data. The Traccar application should be aware of the pipeline being truncated, but the logs and the missing data in the database prove otherwise.
Considering that I've just calculated the shape of the error distribution without instantAcknowledgement, I can now start debugging with a different strategy. I can blindly enable instantAcknowledgement, but would you please tell me how it would help me understand more about the problem, and how to use it properly? Thanks.
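For reference, enabling it should just be a one-line entry in conf/traccar.xml (a sketch, assuming the default config location; the server needs a restart afterwards):

<entry key='server.instantAcknowledgement'>true</entry>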
My server has been running with instantAcknowledgement enabled for 2 days now.
As expected, I'm not getting timeouts on the osmand HTTP client side anymore, but without answers to my questions I have no proof that this is a solution.
As soon as I have enough log lines produced with instantAcknowledgement enabled, I'll re-sample the request/response ratio (which I expect to be 1 now), but also the DB stored ratio via the API, which would quantify the loss again.
After testing a significant sample of points (>5k) with instantAcknowledgement enabled, I can assert that the lossy ingestion is gone.
I'd call the topic's problem solved, but my questions are not answered:
- why instantAcknowledgement is not enabled by default
- why data is lost when instantAcknowledgement is disabled (why instantAcknowledgement is needed at all)
Thanks for the answers.
I'd also prefer to get an ACK of successful delivery on the client, instead of an instantAcknowledgement that doesn't prove successful message handling.
What puzzles me is the fact that instantAcknowledgement seems to be more than just a workaround, as it actually seems to fix the lossy ingestion of incoming osmand messages. Am I hitting a software limit on the osmand channel? I have a max of 9 points per second, but a mean of 1, which doesn't seem like a large number at all. If that's so, the problem lies somewhere before writing to the database.
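To put numbers on that rate, here's a minimal sketch of how I estimate it from the same log, assuming each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp:

from collections import Counter

def osmand_request_rate(path):
    # Count incoming osmand requests per one-second bucket, keyed by the
    # leading "YYYY-MM-DD HH:MM:SS" timestamp of each line.
    per_second = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            if "osmand <" in line:
                per_second[line[:19]] += 1
    rates = list(per_second.values())
    # Peak and mean points per second (mean over seconds with at least one request).
    return max(rates), sum(rates) / len(rates)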
I've been reading the whole logs, obviously; I've been reporting here the section grepped by connection id because I saw no log lines without such a prefix between the connected and disconnected lines where the problem happens.
What I see between the connected and disconnected lines of a dropped HTTP request is a bunch of other HTTP requests from the very same client, pushing data for different devices. Some of those requests are answered correctly, some are ignored (1 in 10 in this specific case), just like the grepped one. All log lines are at level INFO and no errors are raised.
If I search the entire log file for any line with a level different from INFO, I only find those WARN lines on server boot and a bunch of WARN: Unknown device, which I am aware of. My HTTP client is expecting HTTP status != 200 when this happens; here I have logs proving that Traccar is ignoring the HTTP request.