Packet loss to router or internet, but not other machines

Discussion in 'Cisco/Linksys Wireless Routers' started by AdamWill, Feb 21, 2006.

  1. AdamWill

    AdamWill Network Guru Member

    Hi, everyone, great forum. I have a problem with my WRT54G (v3) that I've been trying to track down for months now without success.

    Well, the problem is probably not even the router, but I'm posting here since there's obviously a lot of smart cookies around :). Here's my situation.

    My router has two PCs connected to the wired LAN ports and two connected via wireless. One of the wired PCs runs Windows XP, the other PCs all run Mandriva Linux. One of the Mandriva machines runs a web and mail server, the rest are just used for regular internet-surfing duty. I also have a D-Link DVG-1120 VOIP router that came with my Primus TalkBroadband service, which is sometimes connected via a wired LAN port: in case it was causing the problem, I disconnected it entirely last night to test, but the problem persisted.

    When the D-Link is connected, I use the DHCP server on the router; when it's not connected, I turn it off (all the other machines are configured with static IPs). The router is, the PCs are,, and The router is in wireless-G only mode and uses WPA-PSK encryption. I have a few ports forwarded, but am using no other advanced functionality.

    I had this problem with the previous router (an SMC), the WRT54G running the latest stock firmware, and also running dd-wrt latest stable version (v23), which I installed last night to see if it would help.

    The actual _problem_ is thus. Some time after turning on - could be hours, could be as long as a couple of days - we notice internet browsing become extremely sluggish. If, at this point, one attempts to ping any internet site or the router itself, one will experience packet loss around 50% or more. _However_, pinging any other machine on the network results in 0% packet loss, and indeed doing any actual network operation to other machines on the network (ssh, scp, whatever) is perfectly fast - it's only talking to the router itself or the internet that is affected. The symptoms were more or less identical with both routers. _However_, the problem did _not_ always happen with the previous router - the network was fine for several months then suddenly the problem started. I've tried to think what may have changed around the same time as the problem started, but I just can't come up with anything; my last guess was the VOIP box, but obviously, that wasn't it.

    Does anyone have any guesses at all as to what the problem might be? It's driving me insane having to power cycle the router then restart the network on both wireless machines twice a day. Thanks!
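    To put numbers on the symptom described above, here is a rough sketch for the Linux machines that runs ping against a target and prints only the loss percentage. The function names are made up for illustration, and the addresses in the usage note are placeholders, since the real ones are not given in the post.

```shell
# Extract the loss percentage from ping's summary line on stdin.
# Works with summary lines of the form "... N% packet loss ...".
parse_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Ping TARGET with COUNT packets (default 20) and print the loss percentage.
loss_percent() {
    ping -c "${2:-20}" "$1" 2>/dev/null | parse_loss
}
```

    Running `loss_percent ROUTER_ADDRESS` and `loss_percent OTHER_LAN_MACHINE` back to back (substituting your own addresses) should show the contrast: high loss to the router, none to the LAN peer.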
  2. t4thfavor

    t4thfavor Network Guru Member

    Try setting the max sockets and max connections to the maximum setting. Also look into the process for using BitTorrent; it will help with the packet loss.

    There are also some commands you can run in the ssh terminal to automatically drop old connections that are no longer in use, but I do not know what they are.

    This is most likely what is causing your problems.
  3. vincentfox

    vincentfox Network Guru Member

    BEFORE just randomly trying some things, let's be semi-scientific. Try to identify what the actual problem is.

    The most popular problem of the week is connection table issues. So ssh into your router and watch the size of the table with:

    cat /proc/net/ip_conntrack | wc -l

    If the size of this table goes up correspondingly with the slowdown, you have found the likely culprit.
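    To see whether the table grows along with the slowdown, the count can be sampled over time rather than checked once. A minimal sketch, assuming a POSIX shell with `wc`, `tr`, `date` and `sleep` on the router; the function names and the `CONNTRACK_FILE` variable are invented for this example.

```shell
# Path to the connection tracking table. Firmware built on newer kernels
# may expose it elsewhere (assumption: adjust for your build).
CONNTRACK_FILE="${CONNTRACK_FILE:-/proc/net/ip_conntrack}"

# Print the number of tracked connections (one line per connection).
conntrack_count() {
    wc -l < "${1:-$CONNTRACK_FILE}" | tr -d ' '
}

# Print a timestamped count every INTERVAL seconds, ROUNDS times.
conntrack_watch() {
    interval="${1:-60}"
    rounds="${2:-10}"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        echo "$(date '+%H:%M:%S') $(conntrack_count)"
        i=$((i + 1))
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
    done
}
```

    For example, `conntrack_watch 300 12` would print one reading every five minutes for an hour, which is enough to compare against when the browsing starts to crawl.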

    HOWEVER, I think it unwise to just increase it without knowing what is causing it to be large. You may increase it, only to find that it's filling at a later time.

    Now, turn off all PCs. Turn them back on one at a time until you have identified which PC is causing the problem.
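    If shutting every machine down is impractical, the table itself can hint at the culprit. The sketch below counts entries per originating address, assuming each line of the table carries the original-direction `src=` field first (the usual layout of `/proc/net/ip_conntrack`) and that `awk` and `sort` are available on the router. The function name is made up for this example.

```shell
# Count tracked connections per originating address, busiest first.
# Only the first src= field on each line is used, which is the address
# that opened the connection.
conntrack_by_source() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) { count[substr($i, 5)]++; break }
        }
    }
    END { for (ip in count) print count[ip], ip }' \
        "${1:-/proc/net/ip_conntrack}" | sort -rn
}
```

    Each output line is a count followed by an address; a single LAN address holding most of the entries is a strong suspect.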

    You may also find it useful to turn on syslog and watch it with
    logread -f
  4. AdamWill

    AdamWill Network Guru Member

    Thanks for the input, guys. Vincent, I'll try your suggestion next time the problem occurs. I also intend to try the isolate-the-problematic-PC trick, but it's a bit difficult to do practically speaking - the machines are all in fairly constant use and it's hard to down them all without causing a bit of inconvenience. I'll try and work it out, though.

    One interesting tidbit: the problem seemed to be happening _very_ quickly last night (much less time than usual from router reset -> problem occurring). Thinking about it, I currently have a 5GB bittorrent download going, in a slightly unusual setup: one of the wirelessly-connected machines is my HTPC, where all my videos etc are stored. Its /data directory is NFS-mounted on my desktop machine, and when I want to download some video or something, I run Azureus on my desktop and download the file to the NFS-mounted /data, the upshot being the data is effectively piped from the router to my desktop then to my HTPC. Now, every time I reset the router and didn't manage to reset the network on the HTPC in time, the torrent download would die, and I'd have to restart it - and every time I did this, azureus would insist on verifying the entire download (which takes a bloody long time, for 5GB). So there was probably a massive load of traffic going from the HTPC to the desktop at this point (as azureus was accessing the file remotely to integrity-check it). Maybe this could indicate that the action that triggers the problem is data transfer between the wired-connected desktop and the wirelessly-connected HTPC machine? This morning, I haven't had to restart the torrent at all, and indeed the problem hasn't manifested itself yet...
  5. vincentfox

    vincentfox Network Guru Member

    Azureus is prone to filling up connection tables.

    In another context you would call this a Denial of Service attack.

    Personally I encourage people to set limits in the BT client so it's a better network neighbor. My own desktop is set to:
    global connections maximum : 130
    download: 70% of tested down
    upload: 50% of tested up

    The router is set for 1200 sec on the timeout values. I did increase the size of the connection tables.

    This seems to alleviate problems for me very well. It also guarantees that some of the pipe is left over for other users. Do I care if my download finishes in 2 hours or 1:45? Not really.
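    The percentage caps above are simple arithmetic on whatever a speed test reports. A small sketch, where the function name and the KB/s unit are assumptions made for illustration:

```shell
# Print BitTorrent rate caps from measured line speeds (whole KB/s):
# 70% of tested download and 50% of tested upload, as described above.
bt_limits() {
    down="$1"   # measured download speed in KB/s
    up="$2"     # measured upload speed in KB/s
    echo "download cap: $((down * 70 / 100)) KB/s"
    echo "upload cap: $((up * 50 / 100)) KB/s"
}
```

    With hypothetical measurements of 1000 KB/s down and 100 KB/s up, `bt_limits 1000 100` gives caps of 700 KB/s and 50 KB/s.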

    I throw out my solution in case it turns out to be of use. I still recommend first finding out what the actual problem is. Fortunately the non-Linksys firmware gives you many tools for diagnostics.

    As for a wired-to-WiFi problem under stress, that is possible. Let us know how it tests out.
  6. AdamWill

    AdamWill Network Guru Member

    Thanks again Vincent. I'm considering re-arranging my torrent setup in light of this experience and your comments (there's a Linux client which can act as a kind of bittorrent daemon - it runs constantly and automatically starts downloading any torrent file you put in a directory it monitors - so I might try using that on the HTPC, and just dumping the torrent files to the HTPC from the desktop machine). I looked at the number you suggested:

    ~ # cat /proc/net/ip_conntrack | wc -l

    is that a high number, or normal?

    edit: forgot to mention, I already throttle my upload (even if I didn't, Shaw would do it for me, sigh) to 10kb/sec. I don't throttle my download, but it rarely hits max anyway.
  7. vincentfox

    vincentfox Network Guru Member

    Well, if your current table limit is 512, then you are already quite close to it.

    If you start to exceed it, you will see entries in the syslog about connection attempts being denied. Useful to know!
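    A quick way to see how close the table is to its limit is to compare the entry count with the kernel's maximum. This is a sketch under assumptions: it takes the limit from `/proc/sys/net/ipv4/ip_conntrack_max`, which is where 2.4-era kernels keep it, and other firmware builds may place it elsewhere. The function name is invented for this example.

```shell
# Report how full the connection tracking table is.
# Both paths can be overridden if your firmware keeps them elsewhere.
conntrack_usage() {
    table="${1:-/proc/net/ip_conntrack}"
    limit_file="${2:-/proc/sys/net/ipv4/ip_conntrack_max}"
    count=$(wc -l < "$table" | tr -d ' ')
    limit=$(cat "$limit_file")
    echo "$count of $limit entries ($((count * 100 / limit))% full)"
}

# To look for refused connections in the log, something like
#   logread | grep -i 'table full'
# may help, assuming the kernel's usual "table full" wording.
```

    With the figures discussed in this thread (about 470 entries against a limit of 512) it would print `470 of 512 entries (91% full)`.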

    If you get it worked out, please post back. We have many threads where they never finish saying what happened.
  8. AdamWill

    AdamWill Network Guru Member

    Is the limit the setting referred to in the web admin interface as "Maximum Ports" (under IP filter settings, on the Administration page) or is it something else? If something else, how do I configure / check it? Thanks :)
  9. vincentfox

    vincentfox Network Guru Member

    Yes, that is the setting. It is probably currently 512. Try increasing it to say 2048 and decreasing timeouts to perhaps 1800 or 1200 and see if that helps you. Then report back.
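    The web interface is the safer place to change this, since it saves the value. For a quick experiment from the ssh session, the limit can also be written directly, as in the hedged sketch below. It assumes the limit lives in `/proc/sys/net/ipv4/ip_conntrack_max` on your firmware, the function name is made up, and a value set this way is lost on reboot. Timeouts are left to the web interface because their `/proc` names vary between firmware builds.

```shell
# Set the connection table limit and show the old and new values.
# FILE defaults to the 2.4-kernel location; override it if yours differs.
set_conntrack_max() {
    new_max="$1"
    limit_file="${2:-/proc/sys/net/ipv4/ip_conntrack_max}"
    case "$new_max" in
        ''|*[!0-9]*)
            echo "usage: set_conntrack_max NUMBER [FILE]" >&2
            return 1
            ;;
    esac
    echo "old limit: $(cat "$limit_file")"
    echo "$new_max" > "$limit_file"
    echo "new limit: $(cat "$limit_file")"
}
```

    For example, `set_conntrack_max 2048` applies the value suggested above until the next restart.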
  10. AdamWill

    AdamWill Network Guru Member

    Thanks a lot, Vincent. Since the count seems to be hovering around 470 I wouldn't be at all surprised if that were the problem (and it would also actually make sense of the 'happened with two routers' thing, since it wouldn't be surprising if the SMC router had the same limit). I will increase the limit and decrease the timeout, then see if the problem reoccurs - if it doesn't come up for a while I'll post back here and confirm that it was the fix.
  11. AdamWill

    AdamWill Network Guru Member

    I'd like to thank Vincent once more and say I'm 99% sure he hit the problem right on the head: I haven't had the problem since making the changes he recommended, but I had a couple of torrents going overnight, and when I checked my tables this morning, there were over 512 (about 600). I think between the lower timeout, the azureus settings and the higher limit I shouldn't have a problem in future, but if I do, at least I know what I need to fix! Thanks again for the perfect advice, Vincent, and anyone else who's having trouble with mysterious slow performance and packet loss - check this issue.
  12. vincentfox

    vincentfox Network Guru Member

    Sure thing dude!

    Mounting soapbox for general readers.....

    1) Please pressure the P2P-software community about making their P2P apps be "good neighbors" in a network sense.

    2) Please pressure the router makers to sell routers with more resources. I bought a WRT54GS v1.0 with 32 megs of RAM a couple of years ago. The latest v5 version has what, 8 megs? WTF? It's totally bizarre to me that my keychain drive has gotten 4 times bigger in the last few years for the same price, while routers have gotten 4 times smaller. But people will buy blue boxes without asking what's inside, so these low-end router makers will sell the absolute minimum hardware they think they can get away with. I would love to see boxes clearly say "32 megs RAM!" or "P2P-friendly edition!" or something like that. The latest generation of home routers seems suitable only for light usage.

    It sure would cut down on a lot of "fixes" at this level.
