1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

How to debug spontaneous reboot?

Discussion in 'Tomato Firmware' started by chrcoluk, Sep 1, 2013.

  1. chrcoluk

    chrcoluk Reformed Router Member

    Router rebooted approx 15 hours ago, I noticed when I seen my custom ppp settings were lost (have to be manually applied as ginterface doesnt supposrt 1500byte pppoe mtu).

    Of course it seems logs write to ram or something as they only go back to the start of the boot process.

    So I need help.

    I am using shibby 112 AIO on the RT-N16.

    No torrents or anything else that uses a lot of connections, connection is mostly idle or web browsing with occasional downloading.
    No QoS
    Only a handful of port forwards.
    Time-waits I reduced to 30 sec timeout, I think rest of connection settings left same. # of connections seems always under 100.
    cifs in use to store traffic stats.
    no vpn in use.
    no usb in use.

    Basically I want to be able to store the log so it survives reboot, maybe using cifs path? if not whats best way?
    Ideas of likely cause of reboot?

    Also ideas of how to run commands before wan is brought up automatically (setting mtu size for eth0 and vlan2). Has to be before wan is up, if run when ppp0 is up then ppp0 restarts.

    I was awake when it rebooted, very likely was using the net and obviously didnt notice it rebooted at the time. What I cant be sure of is if I had a wireless device using the net at the time, I downloaded a few apps to my phone but cant remember the time I did it. But connecitvity wasnt interrupted whilst using the phone.

    I am using CTF, although I suspect its been auto disabled as I have working traffic graphs.
     
    Last edited by a moderator: Sep 1, 2013
  2. koitsu

    koitsu Network Guru Member

    The most likely causes of reboots of this nature are either a) hardware malfunction or b) kernel panics.

    There is no logging capability to "debug" either of these without serial console.

    You're welcome to implement logging via syslog to a remote system, store logs on a CIFS share or a USB-backed filesystem, but it won't help in either of the above cases. All you'll see in the logs is indication that yes, the system did restart somehow -- you won't know the root cause because with (a) the hardware is flaking out, and with (b) the kernel crashes thus logging code (to a file, etc.) cannot be executed safely by the kernel.

    And even if you did have serial console, the existing versions of Tomato do not include all the necessary debugging symbols to provide a good stack trace ("call trace") to determine what/where the offending code is, as proven here:

    http://www.linksysinfo.org/index.ph...g-ipv6-shibbby-110-vpn-en-asus-rt-n66u.68750/

    So in short, there is no easy, or even "moderate" way to debug this kind of problem. You can rule out hardware issues, however, by doing an RMA with Asus and get a replacement device. If the replacement has the same issue, then you know it's likely software/a kernel panic. Your only choice then is to start "messing around" with features/etc. to see if you can figure out which ones relieve the issue, or try different firmware builds (from other authors) or older versions.

    I cannot help past this point.
     
  3. RMerlin

    RMerlin Network Guru Member

    The only way to truly debug such issues is by connecting a serial cable to your router, so you can see the error panic message that caused the reboot.
     
  4. chrcoluk

    chrcoluk Reformed Router Member

    ok thanks guys, for now I have done a couple of things.

    reduced the other time out values.
    reduced the max connection limit from 8192 (which toastman I seen has said is too high)
    reduced the hashtable to 1/8th of connection limit.

    Plus I noticed when I browsed using firefox there was a surge of udp unreplied connections eg. loading a youtube video made approx 80 udp unreplied connections, I traced this down to firefox's dns prefetch feature and the amount of connections seems to match how many links are on the page, I think firefox is buggy as not only are all the links the same domain name (youtube.com) but that domain should already be cached in the browser yet it would appear 80 seperate dns lookups were been sent externally to the router for the same domain name. I have disabled dns prefetch now in firefox.

    I am considering changing to a different built of tomato, I want one that has a vpn client (although I dont currently use it) but I am happy to ditch the bittorent client and a few other features). But still want one that has the ppp client. It seems with shibby if you want one you have to have both.

    in terms of kernel debugging, on linux/bsd and even windows usually its possible to have a kernel dump its data on a panic, is it possible (at least in a future version of tomato) that it could be made so a usb stick mounts a swap partition and if a kernel panic occurs the contents is dumped to the swap partition. Also in terms of logging the reason I wanted it (whilst I agree in a kernel panic situation logging becomes unreliable or non existant) is that the logging prior to a panic might give clues as to trigger of a panic.

    Hardware failure seems a bit weak, it was ok prior to switching back to shibby, I would like to try and prove otherwise first its not software related. The router is so new its still in warranty, and there is months left, so I will try a few things and if still getting reboots I will exchange it, but since the issue hasnt occured on stock/merlin firmwares, its possible my warranty swap may get rejected.

    Also I have moved logging to jffs now (didnt use cifs as pc might reboot ocassionally etc.).
     
    Last edited: Sep 1, 2013
  5. Planiwa

    Planiwa LI Guru Member

    Why do routers keep crashing? Could it be because of bad programming?

    If the programmers were to regard every unintentional reboot of a router as a significant failure on the part of the programmer, this might influence their programming style.

    Recently I installed a new "stable" FW. Unfortunately it keeps crashing!
    It seems to be related to "Bandwidth Limiter".
    Even "saving" the Bandwidth Limit settings caused a router crash.

    It may also be related to the NGINX web server:

    Code:
    Sep  3 05:49:22 ROUTER daemon.warn miniupnpd[5170]: HTTP Connection from 192.168.1.98 closed unexpectedly
    That's what happens when an OSX host on the LAN goes to sleep. Then router stops responding when it comes back. Then it reboots. But not always.

    Some things to do:

    1. Let the computer monitor the router and sound an alert when router becomes unresponsive.
    2. Find events that cause router crash -- starting a video stream or closing Mail, or disconnecting WAN cable, etc.
    3. Make Administration>Scripts to record significant events.
    4. Save important log records on a USB data stick.

    When a router keeps rebooting, as a start it may be useful to know when it does that. Then we can see what happened just before.

    Of course it would be amazing if there was an option to make a syslog entry every time you "Save" settings on a Tomato GUI page. But clearly no one has considered it important so far.

    Here is part of the persistent log:

    Code:
    Tue Sep  3 05:11:45 EDT 2013
    Sep  3 00:32:14 ROUTER daemon.notice pppd[1265]: local  IP address 70.24.108.106
    Tue Sep  3 00:32:18 EDT 2013  00:32:18 up 9 min,  load average: 0.03, 0.02, 0.00 WAN-Up 596.70 582.77
    Sep  3 00:42:35 ROUTER daemon.notice pppd[2014]: local  IP address 70.29.28.230
    Tue Sep  3 00:42:43 EDT 2013  00:42:43 up 20 min,  load average: 0.21, 0.05, 0.01 WAN-Up 1221.62 1202.73
     
    Wed Dec 31 19:00:53 EST 1969  19:00:53 up 0 min,  load average: 0.14, 0.03, 0.01 REBOOT 54.87 49.65
    Dec 31 19:00:54 ROUTER daemon.notice pppd[567]: local  IP address 70.24.109.240
    Tue Sep  3 01:59:31 EDT 2013  01:59:31 up 1 min,  load average: 0.15, 0.04, 0.01 WAN-Up 77.40 67.21
     
    Wed Dec 31 19:00:53 EST 1969  19:00:53 up 0 min,  load average: 0.22, 0.04, 0.01 REBOOT 54.85 49.65
    Dec 31 19:00:54 ROUTER daemon.notice pppd[567]: local  IP address 67.71.0.95
    Tue Sep  3 02:01:03 EDT 2013  02:01:03 up 1 min,  load average: 0.40, 0.10, 0.03 WAN-Up 75.49 65.33
     
    Wed Dec 31 19:00:53 EST 1969  19:00:53 up 0 min,  load average: 0.22, 0.04, 0.01 REBOOT 54.85 49.63
    Dec 31 19:00:54 ROUTER daemon.notice pppd[567]: local  IP address 69.158.25.104
    Tue Sep  3 03:03:46 EDT 2013  03:03:46 up 1 min,  load average: 0.41, 0.10, 0.03 WAN-Up 74.59 64.73
     
    Wed Dec 31 19:00:53 EST 1969  19:00:53 up 0 min,  load average: 0.22, 0.04, 0.01 REBOOT 54.83 49.63
    Dec 31 19:00:54 ROUTER daemon.notice pppd[569]: local  IP address 64.231.178.213
    Tue Sep  3 03:05:37 EDT 2013  03:05:37 up 1 min,  load average: 0.48, 0.12, 0.04 WAN-Up 75.24 65.31
    Sep  3 04:45:35 ROUTER daemon.notice pppd[4832]: local  IP address 69.158.86.206
    Tue Sep  3 04:45:42 EDT 2013  04:45:42 up  1:41,  load average: 0.04, 0.01, 0.00 WAN-Up 6080.70 6033.82
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.83 45.94
    Dec 31 19:00:54 ROUTER daemon.notice pppd[569]: local IP address 67.71.2.156
    Tue Sep 3 05:50:47 EDT 2013 05:50:47 up 1 min, load average: 0.28, 0.07, 0.02 WAN-Up 74.67 63.03
    
    The scripts are:

    Code:
    logger "INIT started $(uptime) $(cat /proc/uptime)"
    set -- $(cat /proc/uptime)
    case $1 in
    ??????*);;
    *) sleep 5; echo " " >> /mnt/KINGSTON/log ; echo "$(date) $(uptime) REBOOT $*" >> /mnt/KINGSTON/log ;;
    esac
    
    Code:
    logger "WANUP started $(uptime) $(cat /proc/uptime)"
    
    tail -55 /var/log/messages |grep ": local  IP address" >> /mnt/KINGSTON/log
    echo "$(date) $(uptime) WAN-Up $(cat /proc/uptime)" >> /mnt/KINGSTON/log
    
    Note that not all INIT's are reboots. Note also that the reboots tended to come in doubles: At 01:59 and 02:01, and again at 03:03 and 03:05.
     
  6. chrcoluk

    chrcoluk Reformed Router Member

    so far its been up for 2 days, I have observed if I am downloading at full speed (just single threaded ftp, not torrents) the router gui is very slow to navigate each page takes about 5 seconds to load. full speed on my connection is about 70mbit/sec.

    I do now have persistnet logging configured to jffs.
     
  7. koitsu

    koitsu Network Guru Member

    FYI -- miniupnpd has nothing to do with nginx. nginx is a webserver, miniupnpd is a UPnP processing daemon. I just thought I'd make that clear. I understand that the miniupnpd log line might be the last thing you see before your issue happens, but that log line has nothing to do with nginx. miniupnpd has its own webserver (you need to look into how the SSDP and UPnP protocols actually work -- they use HTTP -- so what you're seeing could just be the OSX machine closing down the TCP socket it has for UPnP capabilities prior to going to sleep. Whoop de doo -- means nothing :) ).
     
  8. Planiwa

    Planiwa LI Guru Member

    How to Debug Spontaneous (Router) Reboot?

    Are we getting anywhere? Let's look at the situation from a LAN host.
    I should mention that the LAN host is connected to the router by ethernet, and the host's WIFI is off.

    We have found that when the LAN host wakes from sleep, the router almost certainly reboots.
    Also, shutting down Mail on the LAN host may cause the router to reboot.

    Immediately upon quitting Mail, all the router's lights come on momentarily . . .



    Recent log from Router (last entry was caused by quitting Mail on host):

    Code:
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.80 45.89
    Tue Sep 3 21:36:05 EDT 2013 21:36:05 up 1 min, load average: 0.34, 0.09, 0.02 WAN-Up 74.52 64.22
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.80 45.89
    Tue Sep 3 21:37:35 EDT 2013 21:37:35 up 1 min, load average: 0.61, 0.15, 0.05 WAN-Up 76.57 66.21
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.78 45.88
    Tue Sep 3 22:31:37 EDT 2013 22:31:37 up 1 min, load average: 0.48, 0.12, 0.04 WAN-Up 75.12 64.81
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.81 45.90
    Tue Sep 3 22:53:33 EDT 2013 22:53:33 up 1 min, load average: 0.40, 0.10, 0.03 WAN-Up 76.25 65.93
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.83 45.89
    Tue Sep 3 23:08:12 EDT 2013 23:08:12 up 1 min, load average: 0.40, 0.10, 0.03 WAN-Up 76.41 65.72
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.14, 0.03, 0.01 REBOOT 49.87 45.92
    Wed Sep 4 01:59:35 EDT 2013 01:59:35 up 1 min, load average: 0.23, 0.06, 0.01 WAN-Up 75.51 65.78
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.78 45.88
    Wed Sep 4 07:20:14 EDT 2013 07:20:14 up 1 min, load average: 0.73, 0.17, 0.05 WAN-Up 76.82 65.72
    
    Wed Dec 31 19:00:53 EST 1969 19:00:53 up 0 min, load average: 0.22, 0.04, 0.01 REBOOT 49.81 45.90
    Wed Sep 4 08:29:39 EDT 2013 08:29:39 up 1 min, load average: 0.48, 0.12, 0.04 WAN-Up 75.19 65.14
    
    Recent log from OSX LAN host, not including the recent Mail shut-down):
    Code:
    Sep  3 21:21:54 planiwa kernel[0]: sleep
    Sep  3 21:21:54 planiwa.local configd[18]: network changed: v4(en1-:192.168.1.98) DNS- Proxy- SMB
    Sep  3 21:31:40 planiwa kernel[0]: Wake reason: OHC1
    Sep  3 21:34:41 planiwa.local configd[18]: network changed: v4(en1+:192.168.1.98) DNS+ Proxy+ SMB
    Sep  3 21:34:41 planiwa.local mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    Sep  3 21:34:41 planiwa.local configd[18]: network changed: v4(en1!:192.168.1.98) DNS Proxy SMB
    Sep  3 21:35:00 planiwa.local configd[18]: network changed: v4(en1-:192.168.1.98) DNS- Proxy- SMB
    Sep  3 21:36:09 planiwa.local configd[18]: network changed: v4(en1+:192.168.1.98) DNS+ Proxy+ SMB
    Sep  3 21:36:09 planiwa.local configd[18]: network changed: v4(en1!:192.168.1.98) DNS Proxy SMB
    Sep  3 21:36:09 planiwa.local mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    Sep  3 21:36:29 planiwa.local configd[18]: network changed: v4(en1-:192.168.1.98) DNS- Proxy- SMB
    Sep  3 21:38:07 planiwa.local configd[18]: network changed: v4(en1+:192.168.1.98) DNS+ Proxy+ SMB
    Sep  3 21:38:07 planiwa.local configd[18]: network changed: v4(en1!:192.168.1.98) DNS Proxy SMB
    Sep  3 21:38:07 planiwa mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    Sep  3 22:29:10 planiwa.local configd[18]: network changed: v4(en1-:192.168.1.98) DNS- Proxy- SMB
    Sep  3 22:30:13 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 22:30:18 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  3 22:30:20 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 22:30:20 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  3 22:30:28 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  3 22:31:24 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 22:31:24 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  3 22:31:39 planiwa mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    Sep  3 22:36:22 planiwa com.apple.usbmuxd[26]: _heartbeat_failed heartbeat detected detach for device 0x3c-192.168.1.90:0!
    Sep  3 22:51:08 planiwa kernel[0]: sleep
    Sep  3 22:51:08 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  3 22:51:59 planiwa kernel[0]: Wake reason: OHC1
    Sep  3 22:52:02 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 22:52:12 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  3 22:52:12 planiwa.local mDNSResponder[39]: DeregisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  3 22:52:14 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 22:52:14 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  3 22:52:22 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  3 22:53:18 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 22:53:18 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  3 22:53:34 planiwa mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    Sep  3 23:06:52 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  3 23:06:54 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 23:07:03 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  3 23:07:58 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  3 23:07:58 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  3 23:08:13 planiwa mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    Sep  3 23:09:42 planiwa kernel[0]: sleep
    Sep  4 07:18:43 planiwa kernel[0]: Wake reason: OHC1
    Sep  4 07:18:43 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  4 07:18:46 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  4 07:18:54 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  4 07:18:54 planiwa.local mDNSResponder[39]: DeregisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  4 07:18:56 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  4 07:18:56 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  4 07:19:05 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  4 07:20:00 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  4 07:20:00 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  4 07:20:16 planiwa mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    
    LAN OSX host log, relating to Mail shut-down and subsequent router reboot (and including the ethernet link messages):

    Code:
    Sep  4 08:28:19 planiwa kernel[0]: 00000000  01947949  NVEthernet::mediaChanged - Link is down
    Sep  4 08:28:19 planiwa kernel[0]: 00000000  00000020  NVEthernet::setLinkStatus - not Active
    Sep  4 08:28:20 planiwa.local configd[18]: setting hostname to "planiwa.local"
    Sep  4 08:28:20 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  4 08:28:21 planiwa kernel[0]: Ethernet [nvenet]: Link up on en0, 1-Gigabit, Full-duplex, Symmetric flow-control, Debug [796d,0000,0de1,000d,cde1,3c00]
    Sep  4 08:28:21 planiwa kernel[0]: 3b9aca00  00500030  NVEthernet::setLinkStatus - Active
    Sep  4 08:28:21 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  4 08:28:30 planiwa.local configd[18]: network changed: v4(en0-:192.168.1.92) DNS- Proxy- SMB
    Sep  4 08:29:10 planiwa kernel[0]: 00000000  01947949  NVEthernet::mediaChanged - Link is down
    Sep  4 08:29:10 planiwa kernel[0]: 00000000  00000020  NVEthernet::setLinkStatus - not Active
    Sep  4 08:29:12 planiwa kernel[0]: Ethernet [nvenet]: Link up on en0, 1-Gigabit, Full-duplex, Symmetric flow-control, Debug [796d,0000,0de1,000d,cde1,3c00]
    Sep  4 08:29:12 planiwa kernel[0]: 3b9aca00  00500030  NVEthernet::setLinkStatus - Active
    Sep  4 08:29:25 planiwa.local configd[18]: network changed: v4(en0+:192.168.1.92) DNS+ Proxy+ SMB
    Sep  4 08:29:26 planiwa.local mDNSResponder[39]: mDNS_RegisterInterface: Frequent transitions for interface en0 (192.168.1.92)
    Sep  4 08:29:26 planiwa configd[18]: setting hostname to "planiwa"
    Sep  4 08:29:41 planiwa mDNSResponder[39]: NAT gateway 192.168.1.1 rebooted
    
    Restarting Mail and shutting it down a few minutes later did not result in router reboot . . .

    [Stay tuned if interested ... feel free to participate in this inquiry, if so inclined.]
     

Share This Page