Traceroute problems

Discussion in 'Tomato Firmware' started by Kevin Darbyshire-Bryant, Feb 21, 2013.

  1. Kevin Darbyshire-Bryant

    Kevin Darbyshire-Bryant Networkin' Nut Member

    Hi All,

    Here's an 'interesting' one for you. Traceroutes to any destination with non-responding hops in the path don't complete: the command just hangs waiting for a response from the 'dead' hop.

    Code:
    root@Router:/tmp/home/root# traceroute  212.58.241.131
    traceroute to 212.58.241.131 (212.58.241.131), 30 hops max, 38 byte packets
    1  5e0574be.bb.sky.com (94.5.116.190)  23.315 ms  23.048 ms  24.040 ms
    2  10.245.139.241 (10.245.139.241)  22.127 ms  22.110 ms  22.569 ms
    3  te0-1-0-7.er11.thlon.ov.easynet.net (89.200.131.3)  29.166 ms  30.370 ms  31.482 ms
    4  ntl-ge2-9.prt0.rbsov.bbc.co.uk (212.58.238.189)  27.345 ms  26.346 ms  26.258 ms
    5
    
    Now, a traceroute -v showed that it was receiving ICMP echo packets from 'itself to itself', which is a bit odd. Then I remembered that I have an external device which pings my WAN address from outside every second. I turned this off and now traceroute completes: the non-responding hops (5, 6 & 7) now get '*' marks and time out. Re-enabling the external ping box stops Tomato's traceroute from working.

    Behaviour found on Shibby Tomato 105 AIO, don't know if it's always been there.

    Any ideas?
     
  2. koitsu

    koitsu Network Guru Member

    The output you show looks to be DNS-related. However, the description you give in your paragraph after the traceroute has to do with ICMP limiting rules in the Linux kernel. Add this to Scripts -> Init to relieve yourself of that issue:

    Code:
    #
    # Set icmp_ratelimit to 0 (no rate limiting) to ensure that both egress
    # and ingress mtr/traceroutes aren't affected.
    #
    # NOTE: IPv4 and IPv6 tunable pathnames are different!  This is not a typo!
    #
    echo 0 > /proc/sys/net/ipv4/icmp_ratelimit
    echo 0 > /proc/sys/net/ipv6/icmp/ratelimit
    
     
  3. Kevin Darbyshire-Bryant

    Kevin Darbyshire-Bryant Networkin' Nut Member

    Tried that, no change.

    I don't think it's a case of rate limiting or packets being dropped. It's as if the incoming pings from the monitoring box (1 per second) are being heard by the traceroute command, even though traceroute didn't initiate them, and hence traceroute never hits the 'no reply' timeout.
     
  4. koitsu

    koitsu Network Guru Member

    Did you simply paste what I told you into Scripts -> Init, click Save, and expect the commands to be executed (without a reboot)? If so, that doesn't happen. You'll need to go into the router via CLI and issue the echo commands by hand for them to take effect immediately (alternately you can reboot).
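A minimal sketch of applying the two tunables by hand and reading them back, so a silent failure is visible. The helper name set_tunable is hypothetical (it is not part of Tomato or Busybox); the paths are the ones from the Init script above.

```shell
# set_tunable is a hypothetical helper name, used here for illustration only.
# It writes a value to a /proc/sys tunable and confirms the write took effect.
set_tunable() {
    # $1 = path to the tunable, $2 = value to write
    if [ ! -w "$1" ]; then
        echo "skip: $1 is missing or not writable" >&2
        return 1
    fi
    echo "$2" > "$1"
    # Read the value back; succeed only if it matches what was written.
    [ "$(cat "$1")" = "$2" ]
}

# On the router (as root) the calls would be:
#   set_tunable /proc/sys/net/ipv4/icmp_ratelimit 0
#   set_tunable /proc/sys/net/ipv6/icmp/ratelimit 0
```

The same two echo lines still belong in Scripts -> Init so the setting survives a reboot.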

    Do not bother with -v for Busybox traceroute -- it's broken and reports stupid nonsensical shit like:

    Code:
    44 bytes from 206192.168.1.1 to 192.168.1.1: icmp type 8 (Echo) code 0
    
    Just utter nonsense, certainly a bug. Ignore it; Busybox is such a pile of junk, sigh.
     
  5. Kevin Darbyshire-Bryant

    Kevin Darbyshire-Bryant Networkin' Nut Member

    No, I'm not quite that stupid :)

    As for traceroute:
    Code:
    root@Router:/tmp/home/root# traceroute -v -n bbc.co.uk
    traceroute to bbc.co.uk (212.58.241.131), 30 hops max, 38 byte packets
    1  94.5.116.190 46 bytes to (null)  25.515 ms  23.290 ms  27.400 ms
    2  10.245.139.241 36 bytes to (null)  22.617 ms  22.480 ms  22.597 ms
    3  89.200.131.3 76 bytes to (null)  28.780 ms  30.427 ms  31.192 ms
    4  212.58.238.189 36 bytes to (null)  26.438 ms  26.747 ms  27.037 ms
    5
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
    
    There are 1 second gaps in time between each '8 bytes from 192....' These appear to be something to do with the pings coming from the external monitoring ping box.

    I then re-ran it & disabled the external ping box during the command:

    Code:
    root@Router:/tmp/home/root# traceroute -v -n bbc.co.uk
    traceroute to bbc.co.uk (212.58.241.131), 30 hops max, 38 byte packets
    1  94.5.116.190 46 bytes to (null)  41.307 ms  41.776 ms  23.621 ms
    2  10.245.139.241 36 bytes to (null)  38.561 ms  36.571 ms  37.601 ms
    3  89.200.131.3 76 bytes to (null)  47.922 ms  47.863 ms  47.457 ms
    4  212.58.238.189 36 bytes to (null)  44.951 ms  47.899 ms
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
      44.984 ms
    5
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
     
    8 bytes from 192.168.235.1 to 192.168.235.1: icmp type 8 (Echo) code 0
    4: x26000045
      *  *  *
    6  132.185.254.46 36 bytes to (null)  38.189 ms  37.775 ms  40.012 ms
    7  132.185.255.60 36 bytes to (null)  41.073 ms  50.758 ms  47.199 ms
    8  212.58.241.131 36 bytes to (null)  42.826 ms  43.379 ms  44.037 ms
    root@Router:/tmp/home/root#
    
    After disabling, hop 5 times out and the rest of the traceroute runs to completion.
     
  6. koitsu

    koitsu Network Guru Member

    I still can't reproduce this behaviour, but I wouldn't be surprised if this turned out to be a bug in Busybox traceroute (really -- ICMP hand-off between the kernel and userland is a little tricky, it's not quite the same as TCP and UDP).

    Code:
    root@gw:/tmp/home/root# traceroute -w 1 8.8.8.1
    traceroute to 8.8.8.1 (8.8.8.1), 30 hops max, 38 byte packets
    1  c-67-180-84-1.hsd1.ca.comcast.net (67.180.84.1)  12.089 ms  18.094 ms  20.026 ms
    2  te-0-0-0-12-ur05.santaclara.ca.sfba.comcast.net (68.85.191.249)  10.645 ms  9.911 ms  10.823 ms
    3  te-1-1-0-5-ar01.sfsutro.ca.sfba.comcast.net (68.86.143.94)  22.968 ms  te-1-1-0-4-ar01.sfsutro.ca.sfba.comcast.net (68.85.155.66)  22.303 ms  te-1-1-0-3-ar01.sfsutro.ca.sfba.comcast.net (68.85.155.62)  20.316 ms
    4  he-1-5-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.90.93)  20.655 ms  22.301 ms  24.145 ms
    5  pos-0-2-0-0-pe01.529bryant.ca.ibone.comcast.net (68.86.87.6)  14.717 ms  14.970 ms  12.898 ms
    6  *  *  *
    7  *  *  *
    8  *
    
    I used -w 1 to decrease the time traceroute itself waits for a reply to each probe from the default of 5 seconds to 1 second.

    The timeouts at 6/7/8 (and all subsequent hops) are normal.

    At the same time this is running, I have a hosted VPS box that uses ping to hit my WAN IP to check for connectivity issues and log the results (so what you're going to see in a packet capture is predominantly ICMP ECHO/ECHO REPLY). That VPS box is 206.125.172.42.

    Here's a packet capture which was running at the same time (started before the traceroute). I can't show it in a code block due to "number of characters exceeded" from the forum, so it's an attachment (DOS CR format).

    You can see quite clearly in the capture that the ICMP ECHO and ICMP ECHO-REPLY packets are going back and forth between my router and 206.125.172.42 at the same time the traceroute is running. You can see the traceroute running because of all the ICMP TIME EXCEEDED messages. The source IPs vary because that's how traceroute works (read up on how traceroute works, re: incrementing TTL).

    And again: please stop using traceroute -v on Busybox, it's broken. :)
     

  7. Kevin Darbyshire-Bryant

    Kevin Darbyshire-Bryant Networkin' Nut Member

    Thanks for trying to replicate this, and for the capture, which makes perfect sense. I'll try some different firmware versions tomorrow to see if it's 'always' been like this, or maybe something to do with the WNR3500lv2.
     
  8. koitsu

    koitsu Network Guru Member

    I just found the following commit in Busybox:

    http://git.busybox.net/busybox/commit/?id=a348b4557d3d0af411135c23448a2c5a7cd82982

    This looks to be what you're experiencing, and using -w 1 may actually work around the problem. This commit was made in May 2011.

    This commit is applied to only the 1.19.x and newer Busybox branches:

    http://git.busybox.net/busybox/log/?h=1_19_stable&qt=grep&q=traceroute

    TomatoUSB uses Busybox 1.18.x (run the busybox command and look at the first line); the committer never backported the commit to older branches:

    http://git.busybox.net/busybox/log/?h=1_18_stable&qt=grep&q=traceroute

    TomatoUSB therefore does not have this fix. If Busybox were upgraded to 1.19.x or newer, it would.
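A rough sketch of checking this on a given firmware: pull the version out of the first line of the busybox banner and compare it against 1.19, the cut-off stated in this post. The function names are hypothetical, and the banner in the comment is a sample with a made-up build date.

```shell
# Extract "MAJOR.MINOR.PATCH" from a Busybox banner line such as
# "BusyBox v1.18.5 (2013-02-10 12:00:00 CET) multi-call binary."
# (sample banner; the build date is invented for illustration).
busybox_version() {
    echo "$1" | sed -n 's/^BusyBox v\([0-9][0-9.]*\).*/\1/p'
}

# Succeed if the version is 1.19 or newer, which per this thread is where
# the traceroute fix landed. Only major and minor numbers are compared.
has_traceroute_fix() {
    bb_major=${1%%.*}
    bb_rest=${1#*.}
    bb_minor=${bb_rest%%.*}
    [ "$bb_major" -gt 1 ] || { [ "$bb_major" -eq 1 ] && [ "$bb_minor" -ge 19 ]; }
}

# On the router this could be combined as:
#   ver=$(busybox_version "$(busybox 2>&1 | head -n 1)")
#   has_traceroute_fix "$ver" && echo "fix present" || echo "fix missing"
```

This only tells you what the upstream version implies; a firmware maintainer could have backported the patch to an older Busybox, which a version check cannot detect.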

    Like I said, Busybox is such a piece of junk. So many bugs that are utterly catastrophic on so many levels. It should amaze people that there are commercial embedded devices (like cable modems) that use Busybox.

    You can contact the committer yourself and ask him why he didn't backport that fix. I'm sure his response will be "you shouldn't be running that old of a Busybox anyway".
     
  9. koitsu

    koitsu Network Guru Member

    I should also note that the diff/fix could be backported by shibby20 or Toastman, but I dunno what the TomatoUSB policy is on manual patches/backports for Busybox. It may, overall, just be better to upgrade to 1.19.x, but given how important/key Busybox is to the firmware, that may be a bigger undertaking than simply backporting the patch.
     
  10. Kevin Darbyshire-Bryant

    Kevin Darbyshire-Bryant Networkin' Nut Member

    Well, Busybox is updated in Shibby's 106 release to 1.2? (I suspect code shared with the RT-Merlin project). I hope Shibby updates the git repo soon, as I custom build with some dnsmasq fixes (latest version as of 2 days ago is 2.66test16, but it's ongoing). As it currently stands, I've not tested the 106 release. I've handed Shibby a copy of dnsmasq2.66test10(ish), which he's said will probably make it into 107. Meanwhile I'm keeping an eye on dnsmasq and seem to be maintaining the Tomato additions; when they get to a 2.66 release I'll make sure Shibby gets that and pause a bit.

    I personally would like to see radvd gone from Tomato and replaced by the IPv6 RA & DHCPv6 with DNS integration of dnsmasq: it'll do all of what radvd does and more. I (now) have a system at home which is happily handing out addresses to iPhones/Androids and Windows boxes, both IPv4 and IPv6, and it's even maintaining the DNS lookups forward and reverse... try doing that with radvd :)

    Although I'm really impressed & respectful of Jonathan Z's work (and those of other maintainers) there's a decided lack of documentation as to how any of this stuff really works, and a lack of flags saying 'this release of xyz has some custom tomato code buried in it' which makes maintaining harder than it perhaps should be. Mind you, what do I really know... I can't even program in C.
     
  11. koitsu

    koitsu Network Guru Member

    I just examined the tomato-shibby-RT-N branch via git (repo.or.cz). The included busybox is 1.18.5. So unless Shibby has pending git updates he hasn't pushed out yet, I wouldn't expect any change.

    If what you say is true, and you can get your hands on the /bin/busybox binary from 106, you can test it in advance: stick it in /tmp or something, rename it to traceroute and run it. It's dynamically linked, so you'd better hope libcrypt, libm, libgcc_s, libc, and ld-uClibc haven't changed between 105 and 106!
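The renaming trick works because a Busybox multi-call binary picks its applet from the name it is invoked as. A small sketch of staging a copy under the applet name; stage_applet and the paths in the usage comment are hypothetical.

```shell
# stage_applet is a hypothetical helper: copy a multi-call binary into a
# scratch directory under the name of the applet you want it to act as.
stage_applet() {
    # $1 = source binary, $2 = scratch directory, $3 = applet name
    mkdir -p "$2" || return 1
    cp "$1" "$2/$3" || return 1
    chmod +x "$2/$3"
}

# Example on the router (source path is a placeholder for wherever the
# binary extracted from the 106 image ends up):
#   stage_applet /tmp/busybox-106 /tmp/bbtest traceroute
#   /tmp/bbtest/traceroute -n 212.58.241.131
```

Calling the copy by its full path avoids picking up the firmware's own traceroute from PATH.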

    From my review of the (buggy) Busybox traceroute code, you should be able to use -w 1 as a workaround. Their commit message describing the problem is accurate but not verbose enough. The problem is that while the wait interval code is running (controlled indirectly via -w), a large amount of ICMP traffic (of any type) seen during that interval can cause traceroute to lock up (I believe in an infinite loop). Setting -w 1 causes the internal variable called waittime to be set to 1, which is later multiplied by 1000 (milliseconds) and used in the safe_poll() call; the default value of waittime is 5. So by decreasing the wait, you effectively decrease the window of opportunity in which "too many" ICMP packets can arrive and lock up the program.

    Like I said: ICMP is handled very, very differently between kernel and userland than things that use classic layer 4 sockets (UDP, TCP, etc.) because of where ICMP sits in the OSI model (layer 3). Effectively, userland programs can "see" all ICMP traffic received by the kernel (see first line, 2nd paragraph).
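To make the "window of opportunity" idea concrete, here is a toy model. It is not taken from the Busybox source. It assumes (one possible reading of this thread, not a confirmed diagnosis) that every unrelated ICMP packet wakes the wait early and the buggy loop then starts a fresh full-length wait, whereas a fixed loop keeps a single deadline. simulate_wait and its arguments are hypothetical.

```shell
# Toy model only: NOT the Busybox implementation.
# Prints the millisecond at which the probe times out, or "hang" if the
# timeout has not fired by the cap.
simulate_wait() {
    # $1 = mode: "restart" (modelled bug) or "deadline" (modelled fix)
    # $2 = waittime in seconds (the -w value)
    # $3 = interval between unrelated ICMP packets, in ms
    # $4 = cap on simulated time, in ms
    sw_wait_ms=$(( $2 * 1000 ))
    sw_now=0
    sw_deadline=$sw_wait_ms
    while [ "$sw_now" -le "$4" ]; do
        # The next unrelated packet arrives at the next multiple of the interval.
        sw_next=$(( (sw_now / $3 + 1) * $3 ))
        if [ "$sw_next" -ge "$sw_deadline" ]; then
            echo "$sw_deadline"   # nothing interrupted the wait: timeout fires
            return 0
        fi
        sw_now=$sw_next
        if [ "$1" = restart ]; then
            # Modelled bug: the full wait starts over after each packet.
            sw_deadline=$(( sw_now + sw_wait_ms ))
        fi
    done
    echo hang
}

simulate_wait restart 5 1000 60000    # pings every 1 s, default -w 5
simulate_wait deadline 5 1000 60000   # same traffic, single fixed deadline
simulate_wait restart 1 1500 60000    # -w 1, pings every 1.5 s
```

Under these assumptions the three calls print hang, 5000 and 1000: a packet arriving more often than the wait interval keeps the timeout from ever firing, while a wait shorter than the gap between packets expires normally.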
     
  12. koitsu

    koitsu Network Guru Member
