Unstable initial wifi connection - one client only

Discussion in 'Tomato Firmware' started by tievolu, Aug 29, 2018.

  1. tievolu

    tievolu Network Guru Member

    I'm not sure when this problem started, but over the last few weeks I've noticed that one client on my network (a Lenovo laptop) has trouble when connecting to a Tomato router wirelessly, but only for the first couple of minutes. None of the other clients exhibit this problem.

    Pings from the device for the last minute or so of the unstable period look like this:

    Code:
    Request timed out.
    Reply from 192.168.111.1: bytes=32 time=30ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=23ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1001ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=10ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=994ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=992ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=5ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=987ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=3ms TTL=64
    Request timed out.
    Reply from 192.168.111.1: bytes=32 time=27ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=13ms TTL=64
    Request timed out.
    Request timed out.
    Reply from 192.168.111.1: bytes=32 time=22ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=14ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1002ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=4ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=994ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=991ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=975ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=104ms TTL=64
    Request timed out.
    Reply from 192.168.111.1: bytes=32 time=18ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=17ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1001ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=4ms TTL=64
    Request timed out.
    Reply from 192.168.111.1: bytes=32 time=17ms TTL=64
    Request timed out.
    Request timed out.
    Reply from 192.168.111.1: bytes=32 time=5ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=3ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=2ms TTL=64
    Request timed out.
    Request timed out.
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=2ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=2ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    Reply from 192.168.111.1: bytes=32 time=1ms TTL=64
    You can see how the ping times stabilise at the end of the trace when the connection sorts itself out.
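    One rough way to put numbers on the "unstable" and "settled" phases of a trace like this is to parse the ping output and compute loss and latency for each window of replies. This is a minimal sketch, assuming the English-language Windows `ping` output format shown above. The `summarise` helper and the 10-reply window are illustrative choices and were not used in the thread.

```python
import re
import statistics

# Matches the English Windows ping reply format shown above, e.g.
# "Reply from 192.168.111.1: bytes=32 time=30ms TTL=64".
# Windows prints sub-millisecond replies as "time<1ms"; those are recorded as 1 ms here.
REPLY = re.compile(r"Reply from \S+: bytes=\d+ time[=<](\d+)ms")


def summarise(lines, window=10):
    """Group ping results into fixed-size windows and report loss and latency for each."""
    samples = []  # one entry per echo request: round-trip time in ms, or None if lost
    for line in lines:
        line = line.strip()
        if line == "Request timed out.":
            samples.append(None)
            continue
        match = REPLY.search(line)
        if match:
            samples.append(int(match.group(1)))

    results = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        times = [t for t in chunk if t is not None]
        results.append({
            "sent": len(chunk),
            "lost": chunk.count(None),
            "median_ms": statistics.median(times) if times else None,
            "max_ms": max(times) if times else None,
        })
    return results


if __name__ == "__main__":
    trace = [
        "Request timed out.",
        "Reply from 192.168.111.1: bytes=32 time=30ms TTL=64",
        "Reply from 192.168.111.1: bytes=32 time=1001ms TTL=64",
        "Reply from 192.168.111.1: bytes=32 time=1ms TTL=64",
    ]
    for summary in summarise(trace, window=2):
        print(summary)
```

    Run over the full 60-line trace above with the default window of 10, the first window mixes a timeout with several replies of roughly 1000 ms. The last two windows contain no timeouts and no reply above 2 ms.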

    During the same period the syslog shows repeated DHCPINFORMs and DHCPACKs:

    Code:
    Aug 29 09:34:55 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPDISCOVER(br0) XX:XX:XX:XX:XX:XX 
    Aug 29 09:34:55 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPOFFER(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX 
    Aug 29 09:34:59 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPREQUEST(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX 
    Aug 29 09:34:59 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPACK(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX thinkpad-wireless
    Aug 29 09:35:07 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPINFORM(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX 
    Aug 29 09:35:07 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPACK(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX thinkpad-wireless
    Aug 29 09:35:10 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPINFORM(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX 
    Aug 29 09:35:10 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPACK(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX thinkpad-wireless
    Aug 29 09:36:30 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPINFORM(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX 
    Aug 29 09:36:30 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPACK(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX thinkpad-wireless
    Aug 29 09:36:33 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPINFORM(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX 
    Aug 29 09:36:33 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPACK(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX thinkpad-wireless
    Aug 29 09:37:45 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPINFORM(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX 
    Aug 29 09:37:45 XXXXXXXX daemon.info dnsmasq-dhcp[6805]: DHCPACK(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX thinkpad-wireless
    
    The WiFi driver on the laptop is up to date, but I suppose this could still be a driver problem. It's not a huge problem because it always sorts itself out after 2-3 minutes, but has anyone seen anything like this before?
     
  2. Sean B.

    Sean B. Network Guru Member

    I'd start by checking the advanced property options for the wireless card in the laptop. If it's running Windows 10: right-click the WiFi icon in the system tray, click "Open Network & Internet settings", click "Change adapter options", right-click your WiFi adapter and click "Properties", click "Configure", then select the "Advanced" tab. Turn off any power-saving and roaming options, or set them to their lowest setting. Then select the "Power Management" tab and uncheck "Allow the computer to turn off this device to save power" if it's enabled. On the router side, in the GUI under Advanced -> Wireless, set "APSD mode" to Disabled. See whether that makes any difference.
     
  3. koitsu

    koitsu Network Guru Member

    Brain dump from me, in random order, no organisation:

    The DHCP requests you see look normal. There's literally nothing abnormal about the flow. Please read how the DHCP protocol works to understand its operation, what the messages mean and encapsulate. The repeated and aggressive INFORM (from the client) and ACK (response from the server) could be one of many things, including WPAD or who knows what. You would need to do packet captures on the router (this requires Entware and tcpdump), then look at them in Wireshark to have them decoded, to understand what's being requested. But in general, this is normal. You're worried because you're not familiar with the protocol or said log messages, and this is probably the first time you've looked at them -- while understandably hypersensitive because of the WiFi packet loss and latency issue. The important thing is that you aren't seeing a "full DHCP re-negotiation" every time (which can be indicative of connectivity loss/deeper issues at the OS level on the client).
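    The distinction above, between INFORM/ACK exchanges on an existing lease and a full DISCOVER/OFFER/REQUEST/ACK re-negotiation, can be checked mechanically by tallying dnsmasq's DHCP messages per client MAC address. This is a minimal sketch, assuming the `dnsmasq-dhcp` syslog line format shown in the first post. `tally_dhcp` and `classify` are hypothetical helper names. The "more than one DISCOVER" threshold is an illustrative heuristic and does not come from the DHCP specification.

```python
import re
from collections import Counter, defaultdict

# e.g. "... dnsmasq-dhcp[6805]: DHCPINFORM(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX"
LINE = re.compile(r"dnsmasq-dhcp\[\d+\]: (DHCP[A-Z]+)\(\w+\) (.*)")
# A colon-separated MAC; "X" is accepted because the log in this thread is masked.
MAC = re.compile(r"(?:[0-9A-Fa-fX]{2}:){5}[0-9A-Fa-fX]{2}")


def tally_dhcp(lines):
    """Count DHCP message types (DHCPDISCOVER, DHCPINFORM, ...) per client MAC."""
    per_mac = defaultdict(Counter)
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue
        msg_type, details = match.group(1), match.group(2)
        mac = MAC.search(details)
        if mac:
            per_mac[mac.group(0)][msg_type] += 1
    return per_mac


def classify(counts):
    """Heuristic label for one client's message counts (threshold is an assumption)."""
    if counts["DHCPDISCOVER"] > 1:
        return "repeated full re-negotiation"
    if counts["DHCPINFORM"] > 0:
        return "one lease, followed by INFORM/ACK exchanges"
    return "one lease"


if __name__ == "__main__":
    log = [
        "Aug 29 09:34:55 host daemon.info dnsmasq-dhcp[6805]: DHCPDISCOVER(br0) XX:XX:XX:XX:XX:XX ",
        "Aug 29 09:34:59 host daemon.info dnsmasq-dhcp[6805]: DHCPACK(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX thinkpad-wireless",
        "Aug 29 09:35:07 host daemon.info dnsmasq-dhcp[6805]: DHCPINFORM(br0) 192.168.111.251 XX:XX:XX:XX:XX:XX ",
    ]
    for mac, counts in tally_dhcp(log).items():
        print(mac, dict(counts), classify(counts))
```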

    You didn't state what router model you're using, what firmware you're running (filename please), or what WiFi frequency you're using (2.4GHz or 5GHz) for that client. It all matters -- really!

    I have seen "fluctuating pings" and "common packet loss" before from WiFi-attached clients on Tomato routers, from all sorts of devices/vendors. There's nothing stable about WiFi in general.

    WiFi problems are notoriously hard to troubleshoot. Random ones that come to mind that are the most common:

    - It could be the router's WiFi (easy to rule out if the problem is specific to one device)
    - It could be the WiFi network in general (ex. 2.4GHz is saturated/trashed); this isn't just "how many other APs are around", tons of junk uses 2.4GHz that WiFi APs can't see -- microwaves, baby monitors, cordless phones, Bluetooth, USB 3.0, and pretty much every "random wireless gadget" you can think of. This may affect certain devices more than others (see below). Switching to 5GHz is usually the best choice I've seen and experienced, but this requires devices that support it
    - It could be the drivers on the client/device (you covered this)
    - It could be the antenna on the client/device (talk to the manufacturer/vendor, do an RMA, etc.)

    You may be able to work around a couple of these issues through use of very specific settings under Advanced -> Wireless in Tomato's GUI. Sean covered some, but others are Interference Mitigation, Bluetooth Mitigation, and WMM. You will need to experiment.

    I would also suggest, before fiddling with anything under Advanced -> Wireless, to try power-cycling the router. Yes really. A hard power-cycle can physically reset the underlying SoC (which contains the WiFi chip/radio). WiFi chips and their underlying drivers are notorious for getting "stuck/wedged" in situations of interference (here's an example I reported on FreeBSD, when trying to use a wireless NIC on FreeBSD as an AP (not client!); the driver maintainer, who contracted for Atheros at the time (maybe still does), actually explained what is happening). See if that relieves the problem temporarily. If so, it might be something on the Tomato side (the wireless driver on Tomato), which probably cannot be solved.

    Remember, increasing transmit power (on the router) does not guarantee a better connection -- it all depends on the radio/antenna/chip on the device too, which is transmitting as well (i.e. Tomato router transmits and receives, as does the device). Settings on the router only affect one of those two flows/directions.

    I've talked about several of the above things in the past, including what relieved some problems for me in my area/environment:

    http://www.linksysinfo.org/index.php?threads/r6250-shibby-wifi-sucks.71629/#post-269985
    http://www.linksysinfo.org/index.ph...-expected-shibby-132-build.73188/#post-284110
    http://www.linksysinfo.org/index.php?threads/tomato-toastmans-releases.36106/page-35#post-274506

    In general I've found Tomato's wireless to be flaky in heavy interference environments. Wireless APs (which do JUST the wireless part, i.e. you can buy one of these, hook it up, and disable the wireless radios in Tomato) like the Ubiquiti UAP-AC-LITE perform and behave a lot better. It's another device that takes up another LAN port on your router and another electrical outlet on your wall, but the wireless performance and stability are outstanding in comparison.
     
    Last edited: Aug 29, 2018
  4. tievolu

    tievolu Network Guru Member

    Thanks for taking the time to write all that!

    It's certainly not the first time I've ever looked at the syslog :) I've been messing about with router firmware (in a rather amateurish way) for over fifteen years. I certainly wouldn't claim to be an expert on DHCP though. The only reason I mentioned the repeated DHCP messages in the log is that the pattern is very different from all other clients, and it correlates exactly with the period of instability in the wireless connection (i.e. as soon as the connection is stable the repeated DHCPINFORMs and DHCPACKs stop). I'm thinking it's a symptom of an underlying problem that's also causing the instability in the connection for those couple of minutes, but perhaps it really is just a coincidence.

    I already have a dnsmasq config workaround in place to prevent DHCP spam from Windows machines. (I'm not sure whether this involves WPAD, but it seems similar at least.)

    Router: Asus RT-AC66U
    Firmware: Tomato Firmware 1.28.0000 MIPSR2-140 K26AC USB AIO-64K
    WiFi frequency: 5GHz and 2.4GHz. This client should connect at 5GHz (considering hardware capability and proximity to the router).

    I actually have two AC66Us serving the same SSID on both 5GHz and 2.4GHz, one acting as an access point and the other as the gateway. The routers are connected via a wired network (actually a trunk cable because I also serve up a virtual guest wireless network on a separate VLAN).

    Everything works very well, with clients roaming seamlessly between the two routers and WiFi bands (at least to the extent that nobody in the house ever notices it). The routers have uptimes of over four months, and the only client with any problem at all is this one Lenovo laptop. As I say, it isn't a serious issue; it just bugs me because everything else is working so well :)

    I don't think it's any of these issues, because the problem is specific to one device, and it only ever occurs for a very limited amount of time while initially connecting. The client works 100% perfectly for the rest of the day.

    Interference levels are generally very low; I'm lucky in that respect.

    As for the client drivers, I still think this is the most likely explanation. Lenovo's drivers for other hardware have caused me plenty of problems in the past.

    As for the client's antenna, I don't think so, because it only happens for a very limited amount of time when initially connecting to the network. The same hardware will work perfectly for hours once it has got past that initial couple of minutes marked by the repeated DHCP messages and erratic pings.

    I have played with those options in the past (for other issues) and they seemed to cause significantly more problems than they solved, especially the interference mitigation options.

    I don't think interference is a problem, but turning the router off and on again is always worth a go :)

    Yep, fully aware of this. I don't modify the transmit power.
     
  5. tievolu

    tievolu Network Guru Member

    I've been living with this issue for months now, but I just got a new Lenovo laptop and the problem became even worse - this one couldn't connect at all...

    After trying out lots of random things, I found that reducing the bandwidth of the 5GHz channel on the router appears to have solved the problem. Interestingly though, I have two identical AC66U routers in my house, both running the same version of Tomato, and the laptop can connect to one of them just fine with the maximum 80MHz bandwidth, but it refuses to connect to the other one when configured in exactly the same way (exhibiting the same symptoms as above, but without eventually resolving after a few minutes). So I've reduced the bandwidth to 40MHz on that router, and everything's OK.

    It's weird that one router has the problem and the other one doesn't, though. I wonder if the "bad" router's hardware is somehow different or slightly faulty. As before, all other wireless clients were/are fine with the 80MHz setting on that router. All very odd.
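    For context on what the bandwidth setting changes: at 80MHz the AP occupies four adjacent 20MHz channels, and at 40MHz it occupies two. The sketch below lists the 20MHz channels covered by a given primary channel at each width, using the rule that 5GHz channel c is centred at 5000 + 5c MHz. It assumes the standard aligned 5GHz blocks (36-48, 52-64, 100-112, 116-128, 132-144, 149-161) and does not model regional restrictions such as DFS or the EU channel list. `bonded_channels` and `centre_frequency_mhz` are hypothetical helper names.

```python
# Lowest 20 MHz channel of each aligned 80 MHz block in the 5 GHz band (assumed standard plan).
BLOCK_STARTS_80 = (36, 52, 100, 116, 132, 149)


def bonded_channels(primary, width_mhz):
    """Return the 20 MHz channels occupied when `primary` is used at 20, 40 or 80 MHz."""
    if width_mhz not in (20, 40, 80):
        raise ValueError("width must be 20, 40 or 80 MHz")
    for start in BLOCK_STARTS_80:
        block = [start + 4 * i for i in range(4)]  # 20 MHz channel numbers are 4 apart
        if primary in block:
            break
    else:
        raise ValueError(f"channel {primary} is not in a known 80 MHz block")
    n = width_mhz // 20                      # number of 20 MHz sub-channels used
    first = (block.index(primary) // n) * n  # align to the 40/80 MHz boundary
    return block[first:first + n]


def centre_frequency_mhz(primary, width_mhz):
    """Centre frequency of the occupied block; channel c sits at 5000 + 5*c MHz."""
    channels = bonded_channels(primary, width_mhz)
    centre_channel = (channels[0] + channels[-1]) // 2
    return 5000 + 5 * centre_channel


if __name__ == "__main__":
    for width in (80, 40, 20):
        print(width, bonded_channels(44, width), centre_frequency_mhz(44, width))
```

    With primary channel 44, for example, 80MHz spans channels 36-48 (centred on channel 42, 5210 MHz), while 40MHz spans only 44-48 (centred on channel 46). Whether this has anything to do with the symptom above is unknown.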
     
  6. rs232

    rs232 Network Guru Member

    In my experience WiFi instability is almost always caused by interference. On 2.4GHz, wireless (landline) telephones, Bluetooth devices, wireless TV headphones and other WiFi routers using the same channel all have an effect on WiFi, so repositioning devices often helps. Having said that, what channels are you using on each device? I would suggest using a WiFi analyser (there are plenty of free ones for Android) to check whether any neighbouring networks overlap too much with your channels (I assume you already use two non-overlapping channels for your own devices?).
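    The usual overlap rule of thumb can be written down in a few lines. In this simplified sketch, 2.4GHz channel n (1-13) is centred at 2407 + 5n MHz, and two channels count as conflicting when their centres are closer than one channel width. The 22 MHz default is the 802.11b (DSSS) width that the familiar 1/6/11 advice is based on. With 20 MHz OFDM channels, the same test also treats 1/5/9/13 as non-overlapping. Real transmit masks have side lobes, so this is only an approximation. The helper names are hypothetical.

```python
def centre_mhz(channel):
    """Centre frequency of a 2.4 GHz WiFi channel."""
    if channel == 14:  # Japan-only channel, off the usual 5 MHz grid
        return 2484
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


def overlaps(a, b, width_mhz=22):
    """Approximate test: channels conflict if their centres are closer than one channel width."""
    return abs(centre_mhz(a) - centre_mhz(b)) < width_mhz


def conflicting_channels(mine, neighbours, width_mhz=22):
    """Channels seen in an analyser scan that conflict with `mine` (same channel included)."""
    return sorted({ch for ch in neighbours if overlaps(mine, ch, width_mhz)})


if __name__ == "__main__":
    print(overlaps(1, 6))                               # False: centres are 25 MHz apart
    print(overlaps(1, 5))                               # True at a 22 MHz width
    print(conflicting_channels(6, [1, 4, 6, 9, 11]))    # [4, 6, 9]
```

    Neighbours on the same channel are reported too, since rs232's point about "other WiFi routers using the same channel" applies to them as well.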

    As a final check, go through the wireless settings on the end device; occasionally you find parameters in there that you don't really agree with, e.g. preferring 2.4GHz over 5GHz.
     
    Last edited: Apr 7, 2019
  7. tievolu

    tievolu Network Guru Member

    Turns out I spoke too soon. The problem remains.

    I've "solved" it for now by adding the laptop's MAC to the wireless filter on the "bad" router, so that it only ever connects to the "good" one. The "good" router always works with no problems, even when it is twice as far away through a couple more walls compared to the "bad" one. I have also tried what feels like every combination of channels on the two routers, moving them around etc. Nothing makes any difference - the "good" one always works, the "bad" one doesn't. But only with this one client! All other clients can connect to both routers with no issues.

    There's no sign of any interference or competing APs nearby. I have also been through every setting on the client and disabled/enabled them individually before retesting each time. The only one that seemed to help was forcibly reducing the 5GHz bandwidth, which is what led me to change that setting on the router. As I said, that seemed to help at first, but then the problem returned when I tried to connect a few hours later.

    Perhaps the wireless hardware in the "bad" router is bad in some way. I'll find out soon because I've bought an R7000 (for other reasons) to replace the main router (the "good" AC66U), so I can then swap out the "bad" AC66U. If the hardware is the problem the issue will disappear. If it doesn't go away then it must be something to do with my config.
     
  8. rs232

    rs232 Network Guru Member

    Out of curiosity do you have the same Country set on both devices under Advanced/Wireless?

    As a very last resort I would clear the NVRAM or even re-flash and start from scratch on the troublesome device.
     
  9. tievolu

    tievolu Network Guru Member

    Yep, both set to EU. All the advanced wireless settings are exactly the same on both AC66Us.

    The only major difference between them is that the bad one is acting as an AP and the good one as the gateway. Wireless channels are also different, obviously, but all other wireless settings are exactly the same on both devices. All clients happily roam seamlessly between the two except for this one laptop which won't connect to the "bad" one.
     
  10. rs232

    rs232 Network Guru Member

    The last sentence is the most significant to me. Are you in the position where you can try replacing the WiFi card on the laptop? It might just have issues with roaming in general.
     
  11. tievolu

    tievolu Network Guru Member

    No, it's a work laptop so I can't mess with its insides.

    My previous work laptop (Lenovo P50) had the original problem described in this thread, where it would connect to the "bad" AP eventually after two or three minutes of instability. However my new work laptop (Lenovo P52) won't connect at all. Both use similar, but different, Intel wifi chipsets.
     
  12. tievolu

    tievolu Network Guru Member

    I have now completed the swap. The main "good" AC66U has been replaced with an R7000 (very nice it is too), and the "bad" AC66U has been replaced with the "good" AC66U. Both the functioning devices have been set up from scratch with the latest FreshTomato release, and now everything is working fine. The laptop connects immediately with no instability, on either access point.

    So I guess the problem was either bad hardware or some hokey config on the "bad" AC66U. We will never know...
     
  13. rs232

    rs232 Network Guru Member

    Good approach! Yes, a hardware issue is occasionally the actual root cause.
     