Escalating RX latency

There is no cross-platform way to set ToS/DSCP, so this is not a trivial addition…

As we use Qt for all network comms, it would really need to be implemented there. There is a request to add this in the Qt bug tracker, but as it has been there since 2009, I wouldn’t hold my breath!

Ahh I see, Phil. Available only on the superior platform of Linux. hehe. Well, what can I say?

I wonder why they have not extended this to other platforms? Maybe it is not as exposed?

–E
de W6EL

As far as I know, it’s available on all platforms, but the implementation of each is (slightly) different I think.

This option is not supported on Windows. This maps to the IP_TOS socket option. For possible values, see table below.

So maybe at least macOS and Linux.
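For anyone who wants to see what that IP_TOS option looks like at the socket level, here is a minimal sketch in plain Python rather than Qt. It is not wfview code; the function name is made up for illustration, and choosing DSCP 46 (Expedited Forwarding) is just an assumption about what you might want for audio. Whether any router on the path honours the marking is a separate question.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding; an illustrative choice, not a wfview setting


def make_marked_udp_socket(dscp=DSCP_EF):
    """Create a UDP socket and try to mark it; returns (socket, ok).

    ok is False when the platform refuses or ignores IP_TOS.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tos = dscp << 2  # DSCP occupies the upper six bits of the old ToS byte
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
        # Read the value back so we know the OS actually accepted it.
        ok = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS) == tos
    except (AttributeError, OSError):
        ok = False
    return sock, ok
```

Calling `make_marked_udp_socket()` and checking the second return value is a quick way to find out whether a given machine accepts the option at all, without needing root or tc.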

Now, to use tc or something requires root access. I think we should stay away from this.

Now, as Phil mentioned, UDP is a harsh environment, and most COTS routers don’t prioritize well at all.
I personally don’t have any of the issues using the server side of the rigs at all, even on wifi, over the internet using my phone, or even in a hotel thousands of km away.

I will rebuild and reconfigure the stuff for the 7300 (e.g. make my Linux env the server) and check if that breaks.

I’m using just cat5; good enough for the cabling.

The results from this weekend’s checking, with some notes I made when I passed by… The log from the entire client session: https://termbin.com/kwl0

This session was on LAN at the site, so no Internet in the mix. JTDX doing some FT8, or monitoring FT8, VAC, Windows 10 64bit.

Client settings: Qt, LPCM 1ch

19:40 Initially, no problems, < 1 second lag for PTT. All Fine.
20:20 - tested, now has 2 sec lag
20:41 - same, 1 keyup totally ignored
21:11 - saw a log entry about missing packets, tested again, now is a 3sec lag - Smoking Gun??
21:16 - still 3 sec ptt lag
22:13 - 4 sec lag
00:00 - 7 sec lag

…changed to RT, Opus 1ch, audio breaks up on tx. Not workable.

00:03 - changed back to Qt in client, but Opus 1ch, good audio, <1sec PTT lag
00:18 - 1 sec lag?
Noticed jailbars on the waterfall in JTDX, which get more pronounced if I disconnect from the radio - VAC freaking out?
00:25 - restarted VAC, toggled radio connection a few times, waterfall clean again, lag <1 sec again, all is good. [op sleep]
09:49 - initially some erratic behavior, first ptt took 13 secs, then 4, then 2 with jumping levels and sticking meter. VAC shows underflows of 100k+, overflows of 700k+ - settles on either no response whatsoever, or 2 sec lag. Meters show random levels. But, after toggle of connection, lag restored to < 1 sec, correct levels etc. etc.
11:07 - 2 sec lag again, or no response at all.
11:08 - reconnect, lag gone, everything fine, log terminated, client shutdown.

So, some moving of the goalposts makes no difference. It’s initially great, then after 20-40 minutes a 2 second PTT lag for a long period, getting as bad as 7-14 seconds over 8 hours.
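To put a rough number on that first evening, here is a small sketch that takes the readings from the notes above (19:40 to 00:00, before I touched the codec) and works out the average growth. One assumption: I have entered "< 1 second" as 1.0, so the starting point is an upper bound and the real rate could be a little higher.

```python
# (clock time, observed PTT lag in seconds), copied from the notes above.
# Assumption: "< 1 second" is entered as 1.0, so the first point is an upper bound.
NOTES = [("19:40", 1.0), ("20:20", 2.0), ("21:11", 3.0), ("22:13", 4.0), ("00:00", 7.0)]


def elapsed_minutes(stamps):
    """Convert HH:MM strings to minutes since the first one, handling midnight rollover."""
    out, day_offset, previous = [], 0, None
    for stamp in stamps:
        hours, minutes = map(int, stamp.split(":"))
        total = hours * 60 + minutes + day_offset
        if previous is not None and total < previous:
            # Clock went backwards, so we crossed midnight.
            day_offset += 24 * 60
            total += 24 * 60
        out.append(total)
        previous = total
    return [t - out[0] for t in out]


def lag_growth_per_hour(notes):
    """Average growth in seconds of lag per hour between the first and last reading."""
    minutes = elapsed_minutes([stamp for stamp, _ in notes])
    lags = [lag for _, lag in notes]
    return (lags[-1] - lags[0]) / (minutes[-1] / 60.0)


print(f"{lag_growth_per_hour(NOTES):.2f} s of extra lag per hour")
```

That works out to about 1.4 seconds of extra lag per hour over the 260 minutes. It only averages my hand-timed readings, so it says nothing about the cause.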

I got the impression (~21:11) Lag was being increased by packet loss.

Also seems VAC does not appreciate changing audio settings in WFview, as it can go nuts (~00:25) - this is a sneaky one, as you will still see signals on your waterfall, but with Jailbars, and far fewer decodes in JTDX. You might think it’s local noise or something. But, If You then disconnect WFview, those Jailbars become very clear on a dark background in JTDX. Restarting the Audio Service in VAC restores good audio.

It struck me (just now) that all 3 things, WFview, VAC and JTDX may/should have been restarted after every change to the audio system. But, last night i really only fiddled once, and had to restart the lot anyway then.

You still haven’t said what the specifications of your server/client machines or network are. That log indicates that one of them isn’t up to the job, as I never see any packet loss.

OK, Clients have been:

i5 Lenovo X201 with 8gb ram, 64bit W10 2xxx
i7 Alienware (dell) R14, with 8gb ram, 64bit W10 2xxx
Asus Prime 520 Series/Ryzen 5 5600G, with 16gb ram, 64bit W10 2xxx

The Lenovo has a lame GPU, the other 2 do not.

Last rev Rpi4 running standalone as server.

Resource Monitor on the Windows machines show no spikes, apart from the usual bump on DECODE in JTDX.

The Pi shows all cores in the green, core temps normal, plenty of spare ram, no spikes. Additional Temp sensor on a GPIO shows the surface of the billet-block cooler is good.

The network is CAT6 between Netgear GS109 switches.

If there was a shortfall in resources, would it not be a chronic issue?

If You are not seeing any packet loss, could You share the details of Your system?

The problem is almost certainly at the RPi end, there are a number of UDP related issues that have been reported with the Pi4, like this one: https://forums.raspberrypi.com/viewtopic.php?t=264106

Also, try running

ps -T -p <pid>

on the server (with the wfview pid), as that will show each thread’s CPU usage. You may find that the udpaudio or audioconverter threads are actually maxing-out one of the CPU cores, which could explain why you see audio break-up with Opus, as it is a lot more CPU intensive.
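If you would rather see a percentage per thread over a short interval than cumulative CPU time, something like the sketch below does a similar job by reading /proc directly. It is Linux only, the function names are made up for illustration, and it assumes the thread names you see in /proc are informative enough to pick out the audio threads, which I have not checked for every build.

```python
import os
import time


def thread_cpu_seconds(pid):
    """Map thread id -> (name, cumulative CPU seconds) from /proc (Linux only)."""
    ticks = os.sysconf("SC_CLK_TCK")
    usage = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # thread exited between listing and reading
        # The name sits in parentheses and may itself contain spaces.
        name = stat[stat.index("(") + 1:stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2:].split()
        utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
        usage[int(tid)] = (name, (utime + stime) / ticks)
    return usage


def thread_cpu_percent(pid, interval=1.0):
    """Sample twice and return [(name, percent of one core)], busiest first."""
    before = thread_cpu_seconds(pid)
    time.sleep(interval)
    after = thread_cpu_seconds(pid)
    rows = [
        (name, 100.0 * (secs - before[tid][1]) / interval)
        for tid, (name, secs) in after.items()
        if tid in before
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Call `thread_cpu_percent(pid)` with the wfview pid and print the rows. A thread sitting near 100 is pinning one core, even if the overall CPU graph looks mostly idle.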

Phil

There is also the standalone wfserver, which uses far fewer resources (and doesn’t require a GUI), but it must be manually configured at the moment; see Headless Server | wfview

As the author of the wfview networking code, I have literally every supported platform (and radio), but I only use managed switches with them.

None of my hardware is particularly special, although I do remember spending quite a bit of time resolving issues with my Pi4 (I can’t remember what I did though, just various tuning found from web searches, and disabling wifi, which I do remember caused quite a problem).

I am running the headless server, which requires some acrobatics to get it to run on boot, covered here: Headless Server 100% 1 CPU on Rpi4 (fixed) - #4 by ei4gnb

I have been keeping an eye on CPU usage and resources on the server end, and it does appear that there are no issues there. Even under heavy load, the Pi4 claims it’s coping, but if that (4 year old) thread’s issues have been blind-eye’d, as is likely, due to a crap chipset they are stuck with, then We have an answer: “…the Pi sucks a bit at networking, what do You expect for 30 quid?”

I have no attachment to the Pi4, i’ll swap it out for a Ryzen based box with a grown-up chipset, see how that goes. Would have been nice to have the Pi working though, given its super-low entry point position.

It certainly should work, although I haven’t tried the latest Pi OS, so that might be causing issues. I will boot up a Pi4 in the next couple of days and upgrade it…

Phil

Well, it does work, just seems to have a leak somewhere. I’m still highly suspicious of VAC in all of this. Many audio codecs (hardware and software) do some pretty funky tricks to cover up dropouts and rate-shifts these days. What i saw on the waterfall when VAC took a fit on sunday was a lightbulb moment TBH, i had seen that issue before and not paid it attention. Upset VAC and it throws a wobbler.

Anyway, moving on…

My dad regularly runs wfview as a server (using the GUI version) on his $35 Inovato Quadra SBC. It’s like a Pi but a little slower. No heat sink either.

He has no issues using this through his off-the-shelf Netgear router at home, connecting from hotel rooms and hot spots with his Mac laptop.

It’s been working for years without any trouble. The radio being served is the 7300, and he actually runs it remotely with his Alpha 87A.

I don’t deny that wfview could handle whatever is going on better, but there is something unique in your setup that is getting it there.

Wish you’d share a log file with us from during the incident. That would help a lot.

https://wfview.org/wfview-user-manual/how-to-send-a-logfile/

—E
de W6EL

OK, well that’s encouraging, if it’s solid for someone else on an (allegedly) lesser board, and via hotel connections etc. as well.

Logfile is linked above, along with some notes…

I’m not unique in this ‘escalating lag’ thing, but i only picked up on it properly this week - i’m not the op of this thread, or the one this is linked from/in, just put 2 & 2 together after reading both is all.

My case may be unique in that I am focused on running datamodes, with VAC in the workflow - I did not see anyone else specify this, or if they did, i did not take it in. Running Data modes is the goal for me.