Experiencing HTTP requests timeouts to osmand port (5055) since upgrade from 5.9 to 6.x

Anton Tananaev5 months ago

You're not hitting the limit, but your use case is very unusual because you have different devices on the same channel. Typically there's a separate connection per device. So there could be some bug in the logic.

Jack5 months ago

But the client is opening and closing independent TCP sockets, and the deviceId inside the GET params can have multiple different values. Basically I am proxing some iot devices as osmand devices, but from the point of view of Traccar it should see multiple independent HTTP requests for different deviceIds incoming from a single source IP is the same subnet of the server.

What do you mean by "connection"? In the Traccar logs I see a different [<id>] for each received point, and if that is what I guess is the netty connection id, then I have a different one for each HTTP request.

Anton Tananaev5 months ago

I was looking at this statement:

same client pushing data for different devices

Normally if you push data from the same client, it will reuse the same connection. You're saying it's not the case? Then I'm not sure how the "same client" statement is relevant. For all intents and purposes those are different connection and requests.

Jack5 months ago

I confirm my HTTP client is doing a single GET request for each point, closing the HTTP/TCP connection after each one. No more than 1 lat/long for each TCP connection, so to speak.
As a consequence of this, I see a unique [<id>] for each HTTP request / point sent in Traccar logs.
I know it is not the more efficient way to send data and it generates overhead, but it's a localhost TCP connection and it has been working so far ... until upgrading to 6.x from 5.9.

I can also generate fake data this way. It's just a rapid sequence of HTTP requests each one in it's own TCP SYN/FIN stream.

Anton Tananaev5 months ago

This is what we use as the connection key:

https://github.com/traccar/traccar/blob/master/src/main/java/org/traccar/session/ConnectionKey.java

It would be good to check if the value is different or the same.

Jack5 months ago

What I can assure is that by reading any of my tracker-server.log.* files and by grouping by [T*] connection id (fourth field splitting log line using whitespace as delimiter and trimming first and last char) I don't find any group with more than one osmand < or osmand > string.

I don't know if you speak python, but this is the (not so elegant) algorithm that I run to load the log file and perform the aforementioned checks. In particular inside the in_only_filter and in_out_filter functions I check the uniqueness of the request/response for each connection id.

from pathlib import Path
import pandas as pd

CWD_PATH = Path().absolute()
LOG_PATH = CWD_PATH / 'tracker-server.log.20240819'

assert LOG_PATH.exists(), f'Log file not found: {LOG_PATH}'

with open(LOG_PATH, 'r') as f:
    lines = f.readlines()

rows = []
for line in lines:
    date, time, level, slot, *rest = line.split()
    row = [date, time, level, slot, ' '.join(rest)]
    rows.append(row)

df = pd.DataFrame(rows, columns=['date', 'time', 'level', 'slot', 'rest'])

df['id'] = df['slot'].str[1:-1]

is_id = ((df['id'].str.len() == 9) & (df['id'].str.startswith('T')))
is_id_ratio = is_id.sum() / len(is_id)
print(is_id_ratio)

df['id'] = df['id'].astype('category')
no_id_df = df[~is_id]
df = df[is_id]
assert len(df) + len(no_id_df) == len(rows)

def in_only_filter(group):
    has_inbound_osmand = group['rest'].str.contains('osmand <').sum()
    has_outbound_osmand = group['rest'].str.contains('osmand >').sum()
    assert has_inbound_osmand <= 1
    assert has_outbound_osmand <= 1
    return (has_inbound_osmand > 0) and (has_outbound_osmand == 0)

def in_out_filter(group):
    has_inbound_osmand = group['rest'].str.contains('osmand <').sum()
    has_outbound_osmand = group['rest'].str.contains('osmand >').sum()
    assert has_inbound_osmand <= 1
    assert has_outbound_osmand <= 1
    return (has_inbound_osmand > 0) and (has_outbound_osmand > 0)

grouped = df.groupby('id')

in_only_df = grouped.filter(in_only_filter)
in_out_df = grouped.filter(in_out_filter)

assert len(in_only_df) + len(in_out_df) == len(df)

the script continues extracting the lat+lon+timestamp from the original HTTP GET parameter strings in logfile and convert them be sampled to query /api/positions on server and check point presence by fixTime and deviceId fields.

The script runs without rising any assert, but I'm not 100% sure if this checks "if the value is different or the same". Please tell me if I can tweak the code to improve the test. Thanks

Anton Tananaev5 months ago

Do you have a script that can replicate the problem on a clean server? Then we can investigate locally.

Jack5 months ago

Yes I can generate fake data and send via same function I use with real data.

I can also fire up a docker-compose without instantAcknowledgment enabled and check if I successfully replicate the lossy behaviour.

I'll test asap locally. Do you have any suggestion/request for setting up the backend for this kind of test?

Anton Tananaev5 months ago

I just need a script for sending fake data.

Jack5 months ago

I need to rewrite it then, as the original software is in Nim programming language and is closed.
I can:

  • compile the request simulator and share the binary
  • put the code to generate requests in a separate shareable Nim module
  • rewrite the whole thing in a Python script
  • generate points here and send the requests where you want

The second option requires nim compiler and some instructions.

The third option is not difficult to do, but I can't ensure the same behavior as it would be a separate program.

Anton Tananaev5 months ago

Python is probably a preferred option.