Experiencing HTTP requests timeouts to osmand port (5055) since upgrade from 5.9 to 6.x

Jack4 months ago

I am forwarding points to Traccar via osmand HTTP requests from a custom software running locally with Traccar server (same docker-compose)

As the title says, I'm experiencing HTTP timeouts from this service to Traccar server.
The very same service with very same configuration has been sending data for the whole 5.x version line without raising this problem. It seems it started with Traccar server version upgrade (via docker pull)

The problem is not continuous. It happens on ~20% of all HTTP requests. It seems to happen in bursts.
I'm using osmand.timeout=600

I'm having some difficulties in debugging this issue as it seems not visible in the Traccar logs (/opt/traccar/logs).
I've been checking sysctl and ulimits, I've also been trying to improve logging troubleshooting on the HTTP request side, but it fails even with a very long client request timeout of 30s.

Any help how to troubleshoot this issue?

Thanks

Jack4 months ago

I forgot an important information: when a request goes timeout, if I re-send the same HTTP GET request unchanged immediately after, it succeeds most of the times

Anton Tananaev4 months ago

Why is it not visible in logs? It should be very visible.

Jack4 months ago

This is the level of detail I get. I've edited the deviceid, location and cellular, but timestamp is real.

Here's the original request

$ cat tracker-server.log | grep "Tf1d58dfe"
2024-08-16 22:53:03  INFO: [Tf1d58dfe] connected
2024-08-16 22:53:03  INFO: [Tf1d58dfe: osmand < 172.18.0.3] GET /?deviceid=999999999&valid=true×tamp=1723848780&location=42.123456%2C11.123456&cellular=111%6A60%1B14180%6A615A2&speed=0.0&heading=0.0&altitude=0.0&accuracy=0.0&hdop=0.0&batt=51.0&sat=11 HTTP/1.1\r\nHost: traccar:5055\r\nConnection: Keep-Alive\r\nuser-agent: Nim httpclient/2.0.2\r\n\r\n
2024-08-16 22:53:33  INFO: [Tf1d58dfe] disconnected

As you can see traccar is not anwering the request for 30 seconds (my client side timeout) and point is not ingested in Traccar.

Here's the same request re-sent manually minutes later via curl

cat tracker-server.log | grep "T3a199c5e"
2024-08-16 22:56:32  INFO: [T3a199c5e] connected
2024-08-16 22:56:32  INFO: [T3a199c5e: osmand < 172.18.0.1] GET /?deviceid=999999999&valid=true×tamp=1723848780&location=42.123456%2C11.123456&cellular=111%6A60%1B14180%6A615A2&speed=0.0&heading=0.0&altitude=0.0&accuracy=0.0&hdop=0.0&batt=51.0&sat=11 HTTP/1.1\r\nHost: localhost:5055\r\nUser-Agent: curl/7.81.0\r\nAccept: */*\r\n\r\n
2024-08-16 22:56:33  INFO: [T3a199c5e] id: 999999999, time: 2024-08-16 22:53:00, lat: 42.123456, lon: 11.123456, course: 0.0
2024-08-16 22:56:33  INFO: [T3a199c5e: osmand > 172.18.0.1] HTTP/1.1 200 OK\r\ncontent-length: 0\r\n\r\n
2024-08-16 22:56:33  INFO: [T3a199c5e] disconnected

Here 200 OK is received and curl and point is present.

I yet have to confirm that all curl request would succeed. I tried same point via curl over and over again via bash script, got 100% success.

EDIT: forum is messing up some characters in pasted code block, how can I paste it correctly?

Anton Tananaev4 months ago

What version are you using?

Jack4 months ago
2024-08-16 20:39:59  INFO: Operating system name: Linux version: 5.15.0-117-generic architecture: amd64
2024-08-16 20:39:59  INFO: Java runtime name: OpenJDK 64-Bit Server VM vendor: Alpine version: 17.0.12+7-alpine-r0
2024-08-16 20:39:59  INFO: Memory limit heap: 1024mb non-heap: 0mb
2024-08-16 20:39:59  INFO: Character encoding: UTF-8
2024-08-16 20:39:59  INFO: Version: 6.3
2024-08-16 20:39:59  INFO: Starting server...

from docker image traccar/traccar:6

Anton Tananaev4 months ago

Make sure you upgrade to the latest version first.

Jack4 months ago

Sure. But is there a commit that would explain the missing HTTP response?

https://github.com/traccar/traccar/compare/v6.3...v6.4

Jack4 months ago

Now running v6.4
Please point me to major suspects if you have any.
If it's something client-side or server-side I can change how data is sent or where container is executed.

Anton Tananaev4 months ago

Is it working now with 6.4?

Jack4 months ago

Can't really say right now with real data as producer is silent at night.

But I can test with fake data from same pipeline. I'm setting up the testbed right now

Jack4 months ago

just received real world data.
Same problem as before: Traccar logs shows received HTTP request, but gives no response for 30 seconds and my HTTP client goes timeout

I keep receiving "too many requests" from the forum. This platform seems unsuited for this kind of report. Shall I move this to github issue?

Anton Tananaev4 months ago

Logs please.

Jack4 months ago

forum keeps stopping me from providing requested details, forcing me to wait the anti-spam cooldown. Is this really necessary? Shall I move this to github issue? I've been waiting enough timeouts for today and this message has been waiting for 17 minute before being posted.

id, location and cellulare are fake, all the rest is original

$ cat tracker-server.log | grep "Td762466d"
2024-08-17 00:23:36  INFO: [Td762466d] connected
2024-08-17 00:23:36  INFO: [Td762466d: osmand < 172.18.0.3] GET /?deviceid=999999999&valid=true×tamp=1723854196&location=42.123456%2C11.123456&cellular=111%6A60%1B14180%6A615A2&speed=0.0&heading=0.0&altitude=182.0&accuracy=0.0&hdop=0.0&batt=0.0&sat=6 HTTP/1.1\r\nHost: traccar:5055\r\nConnection: Keep-Alive\r\nuser-agent: Nim httpclient/2.0.2\r\n\r\n
2024-08-17 00:24:06  INFO: [Td762466d] disconnected

at this point I think it's a server side issue. I'll start changing the HTTP requests removing unnecessary headers and options.

Anton Tananaev4 months ago

You have to look at full logs, not just filtered by connection id. There could be errors that you're missing.