1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Soft restarts with shibby 110

Discussion in 'Tomato Firmware' started by RobbieW, Jul 30, 2013.

  1. RobbieW

    RobbieW Reformed Router Member

    Router: Asus RT-N16
    Firmware: shibby 1.28.0000 MIPSR2-110 K26 USB VPN

    I have a number of instances of unexplained soft restarts, see log extract below (actual MAC address removed). The WAN connection is provided by a Huawei E367 on a Wind.IT service. Sometimes the router seems to reconnect on its own, others require a reboot although this may only be when we are using the service and impatient. With the example given, the restart has happened around 01.45 local when there were no users.

    Any thoughts as to cause and whether there is anything I can tweak to improve the service?

    Code:
    Jul 30 01:44:24 Springy-ASUS user.warn kernel: vlan2: Setting MAC address to  [I]'removed'[/I]
    Jul 30 01:44:24 Springy-ASUS user.debug kernel: vlan2: add 01:00:5e:00:00:01 mcast address to master interface
    Jul 30 01:44:24 Springy-ASUS user.debug kernel: vlan2: add 33:33:00:00:00:01 mcast address to master interface
    Jul 30 01:44:25 Springy-ASUS daemon.notice pppd[6165]: pppd 2.4.5 started by root, uid 0
    Jul 30 01:44:25 Springy-ASUS user.info redial[6167]: Started. Time: 10
    Jul 30 01:44:25 Springy-ASUS user.notice root: 3G MODEM already found - using usbserial module
    Jul 30 01:44:26 Springy-ASUS local2.err chat[6203]: Can't get terminal parameters: Input/output error
    Jul 30 01:44:26 Springy-ASUS daemon.err pppd[6165]: Connect script failed
    Jul 30 01:44:41 Springy-ASUS daemon.info pppd[6165]: Serial connection established.
    Jul 30 01:44:41 Springy-ASUS daemon.info pppd[6165]: Using interface ppp0
    Jul 30 01:44:41 Springy-ASUS daemon.notice pppd[6165]: Connect: ppp0 <--> /dev/ttyUSB0
    Jul 30 01:44:44 Springy-ASUS daemon.warn pppd[6165]: Could not determine remote IP address: defaulting to 10.64.64.64
    Jul 30 01:44:44 Springy-ASUS daemon.notice pppd[6165]: local  IP address 10.34.194.4
    Jul 30 01:44:44 Springy-ASUS daemon.notice pppd[6165]: remote IP address 10.64.64.64
    Jul 30 01:44:44 Springy-ASUS daemon.notice pppd[6165]: primary  DNS address 212.52.97.25
    Jul 30 01:44:44 Springy-ASUS daemon.notice pppd[6165]: secondary DNS address 193.70.152.25
    Jul 30 01:44:44 Springy-ASUS user.debug ip-up[6209]: 182: pptp peerdns disabled
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6106]: reading /etc/resolv.dnsmasq
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6106]: using nameserver 8.8.8.8#53
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6106]: using nameserver 8.8.4.4#53
    Jul 30 01:44:44 Springy-ASUS user.debug init[1]: 182: pptp peerdns disabled
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6106]: exiting on receipt of SIGTERM
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: started, version 2.67cs7 cachesize 1500
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset Tomato-helper auth
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: asynchronous logging enabled, queue limit is 5 messages
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq-dhcp[6212]: DHCP, IP range 192.168.20.100 -- 192.168.20.149, lease time 1d
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: reading /etc/resolv.dnsmasq
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: using nameserver 8.8.8.8#53
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: using nameserver 8.8.4.4#53
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: read /etc/hosts - 2 addresses
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq[6212]: read /etc/dnsmasq/hosts/hosts - 3 addresses
    Jul 30 01:44:44 Springy-ASUS daemon.info dnsmasq-dhcp[6212]: read /etc/dnsmasq/dhcp/dhcp-hosts
    Jan  1 01:00:51 Springy-ASUS user.info hotplug[503]: USB vfat fs at /dev/sda1 mounted on /tmp/mnt/E367
    Jan  1 01:00:52 Springy-ASUS daemon.err pppd[528]: In file /tmp/ppp/wanoptions: unrecognized option '/dev/ttyUSB0'
    Jan  1 01:00:52 Springy-ASUS user.info redial[531]: Started. Time: 10
    Jan  1 01:00:52 Springy-ASUS authpriv.info dropbear[539]: Running in background
    Jan  1 01:00:52 Springy-ASUS daemon.info dnsmasq[573]: started, version 2.67cs7 cachesize 1500
    Jan  1 01:00:52 Springy-ASUS daemon.info dnsmasq[573]: compile time options: IPv6 GNU-getopt no-RTC no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset Tomato-helper auth
    
     
  2. Planiwa

    Planiwa LI Guru Member

    Were there messages before these ones?
     
  3. RobbieW

    RobbieW Reformed Router Member

    Sure, what I tried to do was pick a representative sample. The uploaded file takes you from a previous restart, some 40 minutes earlier, through to the one seen in the sample above.
     

    Attached Files:

  4. Planiwa

    Planiwa LI Guru Member

    Code:
          egrep 'terminated|Connect time|Sent' sof*
    Jan  1 01:01:34 Springy-ASUS daemon.warn pppd[840]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 880) terminated with signal 15
    Jan  1 01:01:56 Springy-ASUS daemon.warn pppd[944]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 983) terminated with signal 15
    Jan  1 01:02:18 Springy-ASUS daemon.warn pppd[1048]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1090) terminated with signal 15
    Jan  1 01:02:40 Springy-ASUS daemon.warn pppd[1158]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1197) terminated with signal 15
    Jan  1 01:03:02 Springy-ASUS daemon.warn pppd[1256]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1304) terminated with signal 15
    Jan  1 01:03:24 Springy-ASUS daemon.warn pppd[1367]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1407) terminated with signal 15
    Jan  1 01:03:46 Springy-ASUS daemon.warn pppd[1472]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1514) terminated with signal 15
    Jan  1 01:04:08 Springy-ASUS daemon.warn pppd[1574]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1621) terminated with signal 15
    Jan  1 01:04:30 Springy-ASUS daemon.warn pppd[1684]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1728) terminated with signal 15
    Jan  1 01:04:53 Springy-ASUS daemon.warn pppd[1792]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1831) terminated with signal 15
    Jan  1 01:05:16 Springy-ASUS daemon.warn pppd[1899]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 1938) terminated with signal 15
    Jan  1 01:05:38 Springy-ASUS daemon.warn pppd[2001]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2045) terminated with signal 15
    Jan  1 01:06:00 Springy-ASUS daemon.warn pppd[2109]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2152) terminated with signal 15
    Jan  1 01:06:22 Springy-ASUS daemon.warn pppd[2216]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2255) terminated with signal 15
    Jan  1 01:06:44 Springy-ASUS daemon.warn pppd[2323]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2362) terminated with signal 15
    Jan  1 01:07:06 Springy-ASUS daemon.warn pppd[2426]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2469) terminated with signal 15
    Jan  1 01:07:28 Springy-ASUS daemon.warn pppd[2528]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2576) terminated with signal 15
    Jan  1 01:07:50 Springy-ASUS daemon.warn pppd[2646]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2679) terminated with signal 15
    Jan  1 01:08:13 Springy-ASUS daemon.warn pppd[2743]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2786) terminated with signal 15
    Jan  1 01:08:35 Springy-ASUS daemon.warn pppd[2855]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2893) terminated with signal 15
    Jan  1 01:08:57 Springy-ASUS daemon.warn pppd[2952]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 2996) terminated with signal 15
    Jan  1 01:09:19 Springy-ASUS daemon.warn pppd[3061]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3103) terminated with signal 15
    Jan  1 01:09:41 Springy-ASUS daemon.warn pppd[3171]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3210) terminated with signal 15
    Jan  1 01:10:03 Springy-ASUS daemon.warn pppd[3269]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3317) terminated with signal 15
    Jan  1 01:10:25 Springy-ASUS daemon.warn pppd[3380]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3420) terminated with signal 15
    Jan  1 01:10:47 Springy-ASUS daemon.warn pppd[3485]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3527) terminated with signal 15
    Jan  1 01:11:09 Springy-ASUS daemon.warn pppd[3591]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3634) terminated with signal 15
    Jan  1 01:11:32 Springy-ASUS daemon.warn pppd[3695]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3741) terminated with signal 15
    Jan  1 01:11:54 Springy-ASUS daemon.warn pppd[3805]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3844) terminated with signal 15
    Jan  1 01:12:16 Springy-ASUS daemon.warn pppd[3909]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 3951) terminated with signal 15
    Jan  1 01:12:38 Springy-ASUS daemon.warn pppd[4011]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4058) terminated with signal 15
    Jan  1 01:13:00 Springy-ASUS daemon.warn pppd[4122]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4165) terminated with signal 15
    Jan  1 01:13:22 Springy-ASUS daemon.warn pppd[4229]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4268) terminated with signal 15
    Jan  1 01:13:44 Springy-ASUS daemon.warn pppd[4333]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4375) terminated with signal 15
    Jan  1 01:14:06 Springy-ASUS daemon.warn pppd[4439]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4482) terminated with signal 15
    Jan  1 01:14:28 Springy-ASUS daemon.warn pppd[4541]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4589) terminated with signal 15
    Jan  1 01:14:50 Springy-ASUS daemon.warn pppd[4652]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4692) terminated with signal 15
    Jan  1 01:15:13 Springy-ASUS daemon.warn pppd[4760]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4799) terminated with signal 15
    Jan  1 01:15:35 Springy-ASUS daemon.warn pppd[4859]: Child process /usr/sbin/chat -V -t 60 -f /tmp/ppp/connect.chat (pid 4906) terminated with signal 15
    Jan  1 01:15:57 Springy-ASUS daemon.notice pppd[4970]: Connection terminated.
    Jan  1 01:16:19 Springy-ASUS daemon.notice pppd[5080]: Connection terminated.
    Jul 30 01:27:21 Springy-ASUS daemon.info pppd[5192]: LCP terminated by peer
    Jul 30 01:27:21 Springy-ASUS daemon.info pppd[5192]: Connect time 16.4 minutes.
    Jul 30 01:27:21 Springy-ASUS daemon.info pppd[5192]: Sent 8616 bytes, received 334443 bytes.
    Jul 30 01:27:24 Springy-ASUS daemon.notice pppd[5192]: Connection terminated.
    Jul 30 01:44:16 Springy-ASUS daemon.info pppd[5742]: LCP terminated by peer
    Jul 30 01:44:16 Springy-ASUS daemon.info pppd[5742]: Connect time 16.3 minutes.
    Jul 30 01:44:16 Springy-ASUS daemon.info pppd[5742]: Sent 503 bytes, received 338 bytes.
    Jul 30 01:44:19 Springy-ASUS daemon.notice pppd[5742]: Connection terminated.
    
     
  5. koitsu

    koitsu Network Guru Member

    The "child process" entries are pppd itself sending a SIGTERM to the forked child process of /usr/bin/chat. I have no idea why this happens regularly like that, but whatever.

    The "connection terminated" messages followed by "LCP terminated by peer" messages, and their subsequent lines (connect time, sent X bytes), are indications that your ISP issued a terminate-request LCP code to you, which the LCP layer of pppd honoured (sending back a terminate-ack presumably).

    You would need to ask your ISP why they're doing this, and will need to work with senior network technicians at your ISP to figure out what's going on. You yourself will need to get well-versed with the PPP protocol and its underlying LCP layer.

    Very likely this is caused by PPP implementation incompatibilities between your ISP and the PPP layer implemented via PPPoE/PPPoA on TomatoUSB. As I've advocated in the past, at all costs you should avoid using PPPoE/PPPoA on the router itself and instead use whatever modem/device your ISP gives you (put it into bridge mode so that you can get a Internet-facing WAN IP from the device/your ISP without having to do things like "double NAT"). This situation is nothing new -- it's been like this for the past 15+ years, and is one of the reasons PPPoE/PPPoA is problematic. You really have to use whatever device they give you to ensure full compatibility.
     
  6. RobbieW

    RobbieW Reformed Router Member

    Thanks for those insights, is the 'restart' likely to be a result of a memory shortage because those forked processes dont get cleaned up completely?

    koitsu,
    I get the gist of what your saying even though the technical detail is beyond me. I guess its obvious that I'm using a mobile broadband connection so my ISP is a mobile provider. The only 'equipment' supplied is a SIM card. That SIM is used in a generic user supplied device, I'm using a Huawei E367. My observation is that, with the advent of MyFi devices and routers which advertise dongle support, support of the intricacies of a protocol would seem to be moving away from a specific ISP implementation toward something that should be generic.

    Tomato offers that dongle support via a USB port in the shibby (and other) builds, I appreciate that this may not be mainstream function for the OS though. Is there any way that I can configure Tomato to interface with the dongle without using PPPoE/PPPoA as you strongly suggest?

    Thanks...
     
  7. koitsu

    koitsu Network Guru Member

    No, the forked process is being cleaned up properly/completely, that's what you're seeing there. pppd forks/spawns the chat binary, and pppd is also the one responsible for sending SIGTERM (e.g. kill {pid}) to the underlying child. Without that, the chat binary would be running/lingering potentially forever.

    Okay, let's be very clear here.

    The Huawei E367 -- I just looked it up -- is absolutely nothing more than a UMTS/GSM/GPRS/EDGE wireless device that provides host connectivity via USB. What this mean: the Huawei E367 is a layer 1 device (possibly partial layer 2) that has absolutely no involvement in the PPPoE/PPPoA portion of the network connection setup. It simply handles the wireless portion. So the Huawei device is not responsible for anything relating to the PPP encapsulation going on.

    In turn, what this means is that your ISP (Wind.IT?) has a PPPoE/PPPoA concentrator/endpoint (think "server" -- it's identical in concept to a DSLAM for DSL) that's responsible for negotiating the PPP portions of the "next layer" of traffic. PPPoE et al is at layer 2, quite literally "encapsulating Ethernet frames into PPP". So what you need on your side is equipment that is able to speak PPPoE/PPPoA -- because the Huawei doesn't do this for you.

    Since your ISP does not give you any kind of device that does the PPP encapsulation for you, they are expecting (read: assuming) that PPPoE/PPPoA "magically works with everything on the planet". This is completely idiotic and absolutely false. There are so many options negotiated during the LCP phase as well as the PPP phase that can "go wrong" or "act oddly" that cause a connection to not work reliably, or intermittently work then fail.

    So because the ISP does not give you a device that does the PPP encapsulation that they know works correctly with their own PPPoE concentrator, they need to have senior network engineers work with you to find out where the LCP and/or PPP option/handshaking is going wrong and causing you problems. It means their own concentrator may have design/methodology quirks that require certain specific options be submit by the PPPoE client (your router), but you wouldn't know what those are unless they told you (no ISP does, in my experience).

    Your choices as I see them:

    1. Communicate with your ISP, explaining that there is a PPPoE and/or LCP negotiation problem between your PPPoE device (a router) and their PPPoE concentrator, and that you need senior network technicians familiar with these protocols to work directly with you to find out what is going wrong. (It will require lots of debugging/logging/etc. on both of your parts -- expect this to take weeks, if not months, to figure out)

    2. Replace the Huawei E367 with a device that does both UMTS/GSM/GPRS/EDGE wireless and PPPoE encapsulation that your own ISP can recommend works with their PPPoE concentrator/service. Also important: you want a device that is a bridge, or can be put into bridging mode, so that you can use your own router (Tomato-based), while leaving the PPP encapsulation and authentication up to a separate device.

    This is a fairly standard configuration/setup for providers like AT&T (both out west and in the south) with DSL, where you're given a SpeedStream "DSL modem" that handles the ADSL portion (layer 1), and also the PPP encapsulation (layer 2) as well as the authentication (part of PPPoE), and can also act as a router (a crappy one at that) -- but can be turned into a bridge (via a simple radio button via the GUI for the DSL modem) that allows it to do PPPoE itself but then pass all other traffic (ex. DHCP, etc.) from your ISP on to whatever device you have plugged into the "Ethernet" port on the modem (i.e. a router). I used this setup myself for many years when I had AT&T, as well as when I had Covad. (When I had Northpoint, I actually used their Netopia-branded router because it did things decently at that time (1999-2000)).

    3. Change ISPs.
     
  8. RobbieW

    RobbieW Reformed Router Member

    Thanks koitsu,
    You outline the choices clearly, given that I'm currently in Italy and Wind.IT are an Italian company 1. is unlikely as I dont speak Italian. 3. is also problematic as the ISP choice was made on the basis of the monthly cost which leaves 2. Thats pretty much where where my thoughts had gone during the day, to move the function off the Tomato based system. Thats a pity as I'd hoped that the Tomato USB dongle support would handle connections through those devices.

    Regards ...
     
  9. koitsu

    koitsu Network Guru Member

    I understand; I'm sorry to be the one to tell you that not all PPPoE/PPPoA implementations and configurations are "universally compatible". It's very likely that Wind.IT, who I assume told you the Huawei E367 was compatible with their stuff, probably only tested it when used on a Windows machine. Windows' PPPoE implementation/configuration/setting differs from that of pppd on Linux -- it can be made to operate the same, but it involves knowing what PPP configuration settings to use/set, and that depends entirely on what the ISP has on their side. If they could disclose this, it would greatly help.
     
  10. RobbieW

    RobbieW Reformed Router Member

    Sorry to push on this but, having just reread the above, there is no theory on why Tomato restarted.

    By comparison, having moved the mobile internet access off Tomato, I've just had a period when the internet connection died temporarily whilst running an OpenVPN client. There are many messages that VPN had lost contact etc etc but the OS recovered when the internet connection came back and VPN carried on - no OS restart.
     
  11. Planiwa

    Planiwa LI Guru Member

    Why do you say "Tomato restarted", and "OS restart"? Why "Soft restarts" in the Subject?
     
  12. RobbieW

    RobbieW Reformed Router Member

    Tomato is an operating system (OS), so OS and Tomato are interchangeable. Soft restart, to me anyway, is one that the OS initiates as a recovery mechanism from some error. A hard restart would be one I had to do because the OS couldnt recover.

    I suppose strictly Linux is the OS that Tomato runs on but as Tomato is essentially a bundled, monolithic program the whole thing is really the OS that runs the router.
     
  13. Monk E. Boy

    Monk E. Boy Network Guru Member

    Yeah, Linux is the OS, Tomato is just a bundle of executables that run on the OS. It's like saying Ubuntu is the OS, it's not, it's just the Linux distribution.
     
  14. mvsgeek

    mvsgeek Addicted to LI Member


    Once upon a time there was an IBM mainframe OS called "OS". It was IBM's successor to their "DOS". Over the years, "OS" became "OS/VS1", and then "MVS", which spawned a whole generation of geeks. OK I'll shut up now and return to studying my IBM System 360 Reference Card.
     
  15. myersw

    myersw Network Guru Member

    Ah yes. Still have my 360 card as well. Also IBM 1401 service manuals. Was also a MVS Program Support Rep back when things were written in assembler language. Giving away my age now. :)
    --bill
     
  16. mvsgeek

    mvsgeek Addicted to LI Member

    Ha! My first language was 1410 (not 1401) Autocoder. There were only 2 1410's in the UK back then. And I still think Assembler is the most elegant programming language of them all. The COBOL drones used to call us Assembler Snobs. I considered it a badge of honour:D
     
  17. RobbieW

    RobbieW Reformed Router Member

    So, the tilley lamps are all lit and the sandbags arranged in a nice circle :rolleyes: Now, anyone like to offer a theory, or suggest some diagnostic, on why my router restarts itself? I cant see that its running short of memory, are there other logs I could capture?
     
  18. Monk E. Boy

    Monk E. Boy Network Guru Member

    A theory has already been offered, and I'd say it's probably the most likely scenario. Read Koitsu's second reply to this thread.

    The problem is the ISPs PPP implementation is not "out of the box" compatible with the PPP implementation in Tomato. Settings on one side or the other need to be changed so they match. Alternately you can get a new device (e.g. something besides a Huawei E367) which handles PPP for you, which the ISP also certifies as compatible, thereby removing Tomato from the PPP equation.

    Various parts of the PPP system are restarting, as they are wont to do when one side or the other requests that the PPP session be built/rebuilt/shutdown. I see no mention of the router itself restarting, except when you do it by hand.
     
  19. RobbieW

    RobbieW Reformed Router Member

    Thanks for that & I understand that the PPP implementation from the ISP isnt matching that of Tomato. In posts 1 & 3 I've included log entries that, once you hit the 1 Jan sequence, are identical to a manual reboot of my system. As I'm not rebooting it, I've concluded that the sequence shows the router rebooting itself. How might I figure out why it does that? I may be being obtuse but I dont see that in koitsu's answers.

    In essence, the scenario koitsu describes could repeat forever if the router processes always recovered and tried to restart just the PPP part. That is not what seems to be happening though, the router does seem to reboot and I'm not seeing the difference between that and a PPP retry.
     
  20. koitsu

    koitsu Network Guru Member

    The "January 1st" timestamps are indicators that the router has probably been rebooted and has yet to sync its internal clock to an NTP source; most consumer routers do not have battery-backed RTCs (think CMOS clock for a PC) thus they can keep track of time but don't retain/store what time it is for restoration on power-up/reset.

    Solely with regards to timestamps:

    I'm having a very hard time understanding the logs you've provided. Let me explain why.

    First, it looks like to get these logging lines, you have a secondary syslog server (maybe on a Windows machine or another *IX machine?) or you're writing the logging lines to a file (maybe on a USB stick, etc.). Is that correct? Can you shed light on that? It matters, believe it or not.

    Secondly, here you provide a log that shows lines from July 30th, then "magically" log lines from January 1st. I say "magically" because there is probably a lot more going on between those logging lines that has been omitted or that isn't being logged -- it's impossible for me to say because the logging on Tomato is not all that great to begin with (search the forums for people complaining about "too much", and others complaining about "not enough" -- impossible to make everyone happy).

    Thirdly, here is a log from Planiwa (presumably some kind of "parsed log" from what you provided here? Not sure -- not enough info) that shows the opposite situation -- logging lines "start" with January 1st, then switch over to July 30th, presumably (roughly? I have to make assumptions!) 11 minutes later. This would indicate that you got a working network layer for roughly 11 minutes, and your router was able to sync time via NTP during that time. NTP sync logging is disabled by default on Tomato; you can turn it on via the GUI if you want (I do).

    Finally, if the router is actually rebooting, it could be caused by all sorts of things: from faulty hardware (incl. overheating) to a kernel panic. If it's a kernel panic, it's important to get that information if at all possible. And there's where you're kinda SOL: the only way to get that info on an embedded router is via a serial port/serial console. Here's an example thread showing that exact situation. The kernel panics aren't very helpful right now either because no symbol data is generated, which makes getting functions/lines of code/etc. relevant to the crash very difficult (I'd border on saying impossible, or at least infeasible). RMerlin in that thread demonstrates how he tracked down problems with some iptables/netfilter modules -- it's very painful and requires familiarity. For developers, the easiest way is to be told exactly how to reproduce the problem, and in this case you can't send people your Huawei device nor ask people to relocate to Italy, so the effort sadly falls onto you (or finding other solutions, e.g. other firmwares, etc.). I know that sucks, but that's one of the major caveats/problems of embedded devices -- it's been like this for 30 years.

    It would help possibly if you could provide a full description of what all settings you change on the router -- every single one! -- from stock defaults. Many people have found problems/complexities with IPv6 support or QoS where their router reboots (presumably a kernel panic, but because none of the users have serial ports on their routers...... starting to get the picture?)

    Does this explain why, with regards to the logs, I don't know what to think/trust/etc.? :) It's not that I don't believe you per se, it's that it's a lot easier to troubleshoot/etc. if there's further evidence and a good description of what's going on.

    For example, for all we know, your "reboot issue" could be a secondary problem that isn't related to the PPP stuff at all! We just don't know. Welcome to troubleshooting 101. :)
     
  21. RobbieW

    RobbieW Reformed Router Member

    Logs are written to files onto a USB stick, initially I did that to make bandwidth logging persistent then extended that thought to system log files so there are 5x250kb log files.

    In my first post I pasted an extract from my log that showed a more or less repeating sequence of events leading to a time stamp reset to Jan 1. That I took to be my router rebooting. Planiwa asked if there was more so in post 3 I added a file showing a sequece that bridged two Jan 1 events about an hour apart. What Planiwa did was extract from that file the line he considered to be giving me trouble, post 4. NTP sync logging is enabled so at the point a network connection is re-established, the router can do a time sync and reset its clock to some variant of UTC.

    OK, read the link on getting at debug stuff. Understand its not easy and that the payback may not be worth it given I have alternatives (the SIM is now in a purpose designed MiFi type box plugged into the WAN port on the router).I'd also tried several different firmwares before ending up with shibby as the closest match so far to what I'd like to be able to do with the router. I'd explore OpenWRT but thier support for the RT-N16 is still developmental and gave me problems when I did try.

    In my mind at least there is a causal link between these apparent reboots and the use of the PPP based E367 for connection. My router doesnt exhibit the apparent rebooting behaviour when not using that connection. Part of my history is as an IBM hardware tech and before that as a line avionics tech at BA but we are talking 70s/80s, also have a grounding in *IX having worked with AIX3 supporting software that ran on it (& a couple of other *IXs from the same era). I guess I just like to get to the bottom of things :) or at least understand whats going on.
     
  22. Planiwa

    Planiwa LI Guru Member

    Why do you say "my router restarts itself"? Are you referring to 01:44:44 ?

    You wrote: "Planiwa asked if there was more so in post 3 I added a file showing a sequece that bridged two Jan 1 events about an hour apart."

    The file you posted does not show "a sequence that bridged two Jan 1 events about an hour apart." You posted two separate files. The second file shows no "restart". It ends before the "restart". It is unclear where the last 6 lines of the first file come from, and why they are missing from the 2nd file. Perhaps these log files have been manually edited?

    It would have been less confusing had you posted a single file containing the (single) discontinuity.

    You have shown no restarts (plural), but only a single implied restart between two separate logs.

    Your log shows multiple ppp connection terminations. That is clearly your primary problem.

    If/when you fix that problem, if/when you then have multiple, documented "restarts", then we can look at that documentation.

    Meanwhile, one suggestion is to place a line in each of the "scripts" (two lines if there is significant processing within a script, e.g.:
    Code:
    logger -p 6 -t "" "INIT SCRIPT Started"
    
    From your Subject Header, problem description, and the original evidence provided, it would appear that your problem is "router restarts".

    It was not until message #4 that it became clear that your problem is not "router restarts", but ppp connection failures.

    If you follow the above suggestion you will see very few "INIT SCRIPT Restart" messages and very many "WAN SCRIPT Restart" messages.
     
  23. RobbieW

    RobbieW Reformed Router Member

    Planiwa, please specify exactly which scripts, name & location, you suggest I insert that line & I'll do that. I will then revert to using the E367 and see whether the problem recurs and post the full log. You're right, what you saw had been edited to show what seemed to matter, I was wrong.
     
  24. Planiwa

    Planiwa LI Guru Member

    FWIW the scripts are located in /admin-scripts.asp
    aka Administration > Scripts

    Making each Script log each entry (and, if nontrivial, exit) will not prevent any problems; it will simply help diagnosing eventual problems.

    As for your actual problem -- it would be useful if people shared/posted the G3 modems that do and do not work with their Tomato configuration and ISP. In your case, something like:

    Code:
    Huawei-E367  RT-N16  shibby-110  Wind.IT  :  UNSTABLE  --RobbieW
    (I've inherited a (Wind-branded) Huawei E1691. Perhaps when I have nothing else to do I might buy some Wind time and check it out. It could be useful in an emergency. But it would be so much more convenient to know if others have been (un)successful with that particular model.)

    Cf: http://www.linksysinfo.org/index.php?threads/current-usb-3g-modem-support.35574/
    (and, amazingly,):
    http://www.linksysinfo.org/index.php?threads/multiple-serial-devices.68803/#post-231240
     

Share This Page