
Script to log WAN Link Outage

Discussion in 'Tomato Firmware' started by bdf0506, Jun 7, 2013.

  1. bdf0506

    bdf0506 Serious Server Member

    Hello all,
    I am looking to write some sort of script to run on my Tomato router that will log (a) whenever my WAN link goes down and (b) how long it stays down. I started to script this by running ICMP checks against a public DNS server; whenever 4 consecutive ping tests failed, it would log a message stating that the link went out. However, if my connection goes out for 5 minutes, I simply want the message to say 'WAN link outage occurred. Duration: 5 minutes' or something like that, instead of seeing the same message many times. I just don't know exactly how to script that up, and wanted to see if others could help.

    Here is the script I currently have:

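    TARGET=""   # IP to ping; the address is missing from the post, set it before running
    while true; do
      if ping -c 4 -W 1 "$TARGET" | grep -q "100% packet loss"; then
        logger -p 9 "Ping to $TARGET FAILED"
      fi
      sleep 10
    done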
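
    A minimal sketch of the "one message per outage, with duration" idea asked about above. It is not the poster's script: TARGET, INTERVAL, the wanmon log tag and every function name are hypothetical, and it assumes the router's ping exits non-zero when none of the 4 echoes is answered. The duration is measured from the first failed check to the first successful one, so it is only accurate to within one check interval.

```shell
#!/bin/sh
# Sketch only: TARGET, INTERVAL and all function names are hypothetical.
TARGET="${TARGET:-}"        # IP to ping; must be set by the user
INTERVAL="${INTERVAL:-10}"  # seconds to sleep between checks
DOWN_SINCE=""               # epoch time the current outage began; empty while up
OUTAGE_MSG=""               # set by note_probe when an outage has just ended

# fmt_duration SECONDS: print e.g. "5 minutes" or "1 minute 30 seconds"
fmt_duration() {
    m=$(( $1 / 60 ))
    s=$(( $1 % 60 ))
    out=""
    if [ "$m" -eq 1 ]; then out="1 minute"; fi
    if [ "$m" -gt 1 ]; then out="$m minutes"; fi
    if [ "$s" -gt 0 ] || [ "$m" -eq 0 ]; then
        if [ -n "$out" ]; then out="$out "; fi
        if [ "$s" -eq 1 ]; then out="${out}1 second"; else out="${out}$s seconds"; fi
    fi
    echo "$out"
}

# note_probe RESULT NOW: RESULT is "up" or "down", NOW is epoch seconds.
# Remembers when an outage starts; on the first "up" afterwards it puts a
# single summary line in OUTAGE_MSG (OUTAGE_MSG is empty in every other case).
note_probe() {
    OUTAGE_MSG=""
    if [ "$1" = "down" ]; then
        if [ -z "$DOWN_SINCE" ]; then DOWN_SINCE=$2; fi
    elif [ -n "$DOWN_SINCE" ]; then
        OUTAGE_MSG="WAN link outage occurred. Duration: $(fmt_duration $(( $2 - DOWN_SINCE )))"
        DOWN_SINCE=""
    fi
}

# monitor: endless loop that probes TARGET and logs once per finished outage
monitor() {
    while true; do
        if ping -c 4 -W 1 "$TARGET" >/dev/null 2>&1; then r=up; else r=down; fi
        note_probe "$r" "$(date +%s)"
        if [ -n "$OUTAGE_MSG" ]; then logger -t wanmon "$OUTAGE_MSG"; fi
        sleep "$INTERVAL"
    done
}
```

    To use it, set TARGET and add a call to monitor as the last line. note_probe reports through a variable rather than printing because calling it inside a pipeline or $(...) would run it in a subshell and lose DOWN_SINCE.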
  2. bdf0506

    bdf0506 Serious Server Member

    Well, the Tomato community failed me on this one! I've been working on a script that does this, so I wanted to share it with the rest of the world.

    The script was taken and modified from http://www.dslreports.com/forum/r24738600-Cable-vs-DSL-reliability-with-logs-using-DDWRT. It doesn't do exactly what I wanted as shown above, but it will do a ping request every 10 seconds. If successful, the log file will have an "X" and if it fails the log file will have a "." instead. You can change the variables for the IP to use to ping, how often to ping the IP, how long to wait for non-response to consider it failed, log file location, and how many outputs you want per line. It's been extremely helpful in troubleshooting my intermittent internet connection, especially in the wee hours of the morning when nobody on my home network is accessing the internet or reporting issues.

    I have a USB drive attached to my E3000 router, so I set up a cron job at 11:59pm to save the day's logs to its own file on the USB drive. The running log file itself is capped at 300 lines.

    Example of the script output:

    Here is the script:

    #########BEGIN OF SCRIPT FOR ICMP CHECKS#####################
    # Script taken and modified from http://www.dslreports.com/forum/r24738600-Cable-vs-DSL-reliability-with-logs-using-DDWRT
    # Adds cron job to create daily backup of the day's ping history at 11:59pm
    cru a ispmonitor "59 23 * * * /tmp/home/root/ispmonitoring_bkp"
    # Interval of how often to do an ICMP Check (seconds)
    INT=10
    # Number of results to display per line (name and value missing from the post; placeholder)
    NUM=60
    # Number of seconds to wait on non-responsive ping before considering it a failure (value missing from the post; placeholder)
    FAIL=1
    # Log file location
    LOGFILE=/tmp/isp.log
    # IP to use to ping (address missing from the post; set it before running)
    TGT=
    while true
    do
      echo -n "`date +%b%d.%H:%M:%S` " >> $LOGFILE
      I=$NUM
      while [ $I -gt 0 ]
      do
        RET=`ping -c 1 -W $FAIL $TGT 2> /dev/null | awk '/received/ {print $4}'`
        # String comparison, so an empty RET (ping printed no summary) also counts as a failure
        if [ "$RET" != "1" ]; then
          echo -n . >> $LOGFILE
        else
          echo -n X >> $LOGFILE
        fi
        I=$((I - 1))
        sleep $INT
      done
      echo >> $LOGFILE
      # Limits log file to 300 lines to maintain size
      tail -n 300 $LOGFILE > /tmp/isp_tmp.log
      mv /tmp/isp_tmp.log $LOGFILE
    done
    #########END OF SCRIPT FOR ICMP CHECKS#####################
    Here is the script that runs the cron job:

    # Creates daily backup of the day's ping history
    cat /tmp/isp.log | egrep `date +%b%d` >> /mnt/sda1/Router/ISP_Monitoring/isplog-`date +%Y%m%d`.log
    Since that cron script is so short, you could probably integrate it with the first script if you wanted to. Either way, hope others find this useful.
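
    For reading the resulting log, something like the following might help. It is only a sketch and assumes each line has the shape described above: a timestamp, a space, then a run of "X" (reply) and "." (no reply) marks. The function name summarize_isp_log is made up for illustration.

```shell
#!/bin/sh
# summarize_isp_log FILE: print one summary per log line in the form
#   <timestamp> probes=<marks on the line> failed=<number of "." marks>
# Lines that have no marks yet (timestamp only) are skipped.
summarize_isp_log() {
    awk 'NF >= 2 {
        marks = $2
        total = length(marks)
        failed = gsub(/\./, "", marks)   # gsub returns how many "." it removed
        printf "%s probes=%d failed=%d\n", $1, total, failed
    }' "$1"
}
```

    For example, summarize_isp_log /tmp/isp.log | grep -v 'failed=0$' would list only the lines that contain at least one failed ping.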
  3. Planiwa

    Planiwa LI Guru Member

    A small note: It is not so easy to measure "WAN link goes down".

    For example, pinging 3 hosts (torix.ca, google.com, and a third) every 5 minutes:

    Jun 23 23:30:37 ROUTER user.notice root: google.com is unreachable!
    Jun 23 23:35:36 ROUTER user.notice root: google.com is unreachable!
    Jun 23 23:40:36 ROUTER user.notice root: google.com is unreachable!
    Jun 23 23:45:35 ROUTER user.notice root: google.com is unreachable!
    Jun 23 23:50:35 ROUTER user.notice root: google.com is unreachable!
    Jun 23 23:55:36 ROUTER user.notice root: google.com is unreachable!
    Jun 24 00:00:36 ROUTER user.notice root: google.com is unreachable!
    Jun 24 00:05:35 ROUTER user.notice root: google.com is unreachable!
    Jun 24 00:10:35 ROUTER user.notice root: google.com is unreachable!
    Jun 24 00:15:35 ROUTER user.notice root: google.com is unreachable!
    Jun 24 00:20:35 ROUTER user.notice root: google.com is unreachable!

    The other 2 hosts, torix.ca and the third one, continued to be ping-able.
    But the network was unusable for practical purposes.

    This problem also happened the two nights before. Each time it lasted between 30 minutes and 2 hours, and restarting WAN did not fix it. It fixed "itself", probably a network problem beyond the router.
  4. koitsu

    koitsu Network Guru Member

    This script is probably pissing off the Verizon/GTEI administrators, by the way. Please do not ping their resolvers like this, at least not in a constant while loop. Once a day, or once a reboot, etc. is perfectly reasonable, but constantly is not nice. It is this kind of behaviour that will get those resolvers shut off to the public. If you think I'm kidding, I recommend you look at this exact case (where consumer router vendors were crapping all over some public NTP servers):


    Instead, pick something like (if your ISP will let you) `nvram get wan_gateway` as your ping destination instead. I would STRONGLY suggest this, because it's a device very close to you, rather than across the Internet -- more on that in a moment -- or better yet, use one of your ISP's DNS servers as a destination.

    And do not use FQDNs (ex. "google.com") as a destination. This assumes DNS resolution is available, which may not be the case.

    P.S. -- What Planiwa said is accurate. Your "WAN test" script assumes the Internet is always reliable/up/working when in fact it isn't. The Internet is broken 24x7x365, and I am quite serious when I say that. That applies to all kinds of traffic, including DNS, ICMP ping, etc.
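
    A small sketch of the two suggestions above: take the ping destination from nvram get wan_gateway, and refuse anything that is not a literal IP so the test never depends on DNS. The function names are hypothetical, the fallback address is whatever the caller passes in, and treating 0.0.0.0 as "no gateway known" is an assumption about what the router stores while the WAN is down.

```shell
#!/bin/sh
# is_ipv4 STRING: succeed if STRING looks like a dotted quad (digits and
# exactly three dots; octet ranges are not checked).
is_ipv4() {
    case "$1" in
        ""|*[!0-9.]*)   return 1 ;;
        .*|*.|*..*)     return 1 ;;
        *.*.*.*.*)      return 1 ;;
        *.*.*.*)        return 0 ;;
        *)              return 1 ;;
    esac
}

# pick_target FALLBACK: print the WAN gateway reported by nvram if it looks
# usable, otherwise print FALLBACK.
pick_target() {
    gw=$(nvram get wan_gateway 2>/dev/null)
    if is_ipv4 "$gw" && [ "$gw" != "0.0.0.0" ]; then
        echo "$gw"
    else
        echo "$1"
    fi
}
```

    A monitoring script could then start with TGT=$(pick_target "$SOME_ISP_DNS_IP"), where SOME_ISP_DNS_IP is a placeholder for an address of your own choosing.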
  5. mstombs

    mstombs Network Guru Member

    I have a wan check script posted over there,


    and I seem to violently agree with koitsu: occasional failures to the ISP gateway and DNS are common (let alone external Google DNS by IP).

    I still haven't fixed the occasional dhcp renew failure, so I still use this script, and would like to incorporate the above logfile, sometime...
  6. Planiwa

    Planiwa LI Guru Member

    FWIW, I was talking about false negatives, not false positives. IOW, the fact that you can ping all sorts of hosts, from coast to coast, does not necessarily mean that all's well.

    There is a terrible bug involving Bell, "Fibe", and Sagemcom VDSL2 modems, where it is quite possible to ping google, but it's impossible to browse google, or anywhere else.

    I now have a good test for this condition, and am working on rebooting the modem automagically, in response to this condition, via the web interface, using curl.

    I don't suppose there are any Curl Adepts reading this?
  7. koitsu

    koitsu Network Guru Member

    I'm quite familiar with curl (both the C API and the userland utility). I really wouldn't trust using that for determination of an outage condition either, though -- webservers can return status codes of all sorts that don't necessarily indicate an "unreachable" state, and if anything there's more overall resource waste/overhead compared to a simple ICMP echo/echo reply test.
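
    One way to picture the difference between "can't ping" and "can ping but can't browse" is to run both checks and classify the pair of results. The sketch below is an illustration, not either poster's script; the function names and timeouts are hypothetical. It looks only at curl's exit code, so any HTTP response at all, whatever its status code, counts as browsable, which sidesteps the status-code ambiguity mentioned above but says nothing about whether the page content is sane.

```shell
#!/bin/sh
# classify PING_RC CURL_RC: each argument is an exit code (0 = success).
# Prints UNREACH, NOBROWSE or OK.
classify() {
    if [ "$1" -ne 0 ]; then
        echo UNREACH
    elif [ "$2" -ne 0 ]; then
        echo NOBROWSE
    else
        echo OK
    fi
}

# probe IP URL: ping IP once, fetch URL once, and print the classification.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
    p=$?
    curl -s -o /dev/null --connect-timeout 5 -m 10 "$2"
    c=$?
    classify "$p" "$c"
}
```

    A caller would pass a literal IP to ping and a URL to fetch, then act only on repeated NOBROWSE or UNREACH results rather than on a single one.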
  8. Planiwa

    Planiwa LI Guru Member

    Well, then, are you also familiar with the Sagemcom modem?

    Are you capable of crafting a Curl command line to push the "reboot" button, and then to push the "yes" button in the confirmation dialog?

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
    <title>Connection Hub</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
          <meta name="author" content="Nafaa ZAYEN & Wael MAAOUI"></meta>
          <meta name="Copyright" content="Copyright (c) 2010 SAGEMCOM. All Rights Reserved. Residential Gateway Software Division www.sagem.com  This file is part of the SAGEM Communications Software and may not be distributed, sold, reproduced or copied in any way without explicit approval of SAGEM Communications. This copyright notice should not be removed in ANY WAY."></meta>
        <link rel="stylesheet" type="text/css" href="/css/fonts.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/page.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/menu.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/header.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/contener.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/subcontener.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/array.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/hardware.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/button.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/styles.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/about.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/lbpopup.css"></link>
          <link rel="stylesheet" type="text/css" href="/css/examples.css"></link>
          <link media="all" rel="stylesheet" type="text/css" href="/css/style.css"></link>
        <!--[if IE]>
              <style type="text/css" >BODY {
                      BEHAVIOR: url("css/csshover.htc")
            }        </style>
          <script type="text/javascript" src="/js/jquery-1.4.3.min.js?gteRfvPJZHWmH5rlMtx71GWneps8dPP"></script>
          <script type="text/javascript" src="/js/script.js?gteRfvPJZHWmH5rlMtx71GWneps8dPP"></script>
          <script type="text/javascript" src="/js/control.js?gteRfvPJZHWmH5rlMtx71GWneps8dPP"></script>
          <script type="text/javascript" src="/js/md5.js?gteRfvPJZHWmH5rlMtx71GWneps8dPP"></script>
          <script type="text/javascript" src="/js/jquery-impromptu.3.1.min.js?gteRfvPJZHWmH5rlMtx71GWneps8dPP"></script>
          <script type="text/javascript" src="/js/Common.js"></script>
          <div class="main">
    js_ConfirmDialoglang = "0";
    function FormSubmit(action) {
      document.formu.factory_reset.value = action;
      if (action == 'reset') AffichagePopupReset();
      if (action == 'reboot') AffichagePopupReboot();
                <form name="formu" method="POST" action="index.cgi">
                <input type="hidden" name="page" value="resets">
                <input type="hidden" name="update" value="0">
            <input type="hidden" name="action" value="submit">
                <input type="hidden" name="sessionid" value="gteRfvPJZHWmH5rlMtx71GWneps8dPP">
        <div id="Contentheader" class="headermenu">
            <img src="/images/header/logo.gif" align="left" alt="logo"></img>
            <img src="/images/header/right.gif" width="5" alt="box" style="position:absolute; left:1010px;"></img>
            <div id="GatewayTitle" align="center">Connection Hub</div>
                  <a href="JavaScript:changeLang('en')" style="position:absolute;height:16px;top:60px; left:850px">English</a>
                  <span class="separator" style="position:absolute; top:60px; left:895px"> | </span>
                  <a href="JavaScript:changeLang('fr')" style="position:absolute;height:16px;top:60px; left:905px">Français</a>
    <!-- Main content page -->
    <div class="contentpage">
    <table class="contener">
      <td class="menu">
    <table class="tab_menu">
    <td class="top_menu">
    </td><td class="top_left_menu"></td>
    <td class="main_menu" colspan="2">
      <li class="title none"> <a href="JavaScript:GoPageConfirm1('index.cgi?page=home&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Home</a></li>
      <li class="item none"> <a href="JavaScript:GoPageConfirm1('index.cgi?page=language&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Language</a></li>
      <li class="item none"> <a href="JavaScript:GoPageConfirm1('index.cgi?page=admin&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Account settings</a></li>
      <li class="item selected"> <a href="JavaScript:GoPageConfirm1('index.cgi?page=resets&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Resets</a></li>
      <li class="separator"><a></a></li>
      <li class="title">Settings</li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=settings&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Internet</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=ssid&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Wireless</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=dhcp&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Network</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=game&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Games and Applications</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=advanced_settings&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Advanced Settings</a></li>
      <li class="separator"><a></a></li>
      <li class="title">Device Status Management</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=table_device&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Network Device(s)</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=storage_devices_connected&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">USB Connected Device(s)</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=ethernetStats&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Statistics</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=log&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">System Log</a></li>
      <li class="item none"><a href="JavaScript:GoPageConfirm2('index.cgi?page=netper&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">Utilities</a></li>
      <li class="separator"><a></a></li>
        <li class="about none"><a href="JavaScript:GoPageConfirm1('index.cgi?page=about&sessionid=gteRfvPJZHWmH5rlMtx71GWneps8dPP')">About Connection Hub</a></li>
    </td><td class="bottom_menu"></td><td class="bottom_left_menu"></td></tr></table>
      <td class="space"></td>
      <td class="content">
      <!-- Start content page -->
      <table class="content">
      <tr><td class="top_content"></td><td class="top_left_content"></td></tr>
      <td class="main_content">
    <div id="contener">
      <div class="titlepage">Resets</div>
    <div class="info">
    <tr><td><h2>Factory Reset</h2></td></tr>
    <tr><td><input type="hidden" name="factory_reset"></td></tr>
    <tr><td>Resetting your Connection Hub will revert all settings back to factory defaults.</td><td><div class="inputBtnRed"><a title="Click to Factory Reset" href="JavaScript:FormSubmit('reset');">Reset</a></div></td></tr>
    <div class="info">
    <tr><td><input type="hidden" name="factory_reboot"></td></tr>
    <tr><td>This button will reboot your Connection Hub.</td><td><div class="inputBtnRed"><a title="Click to reboot" href="JavaScript:FormSubmit('reboot');">Reboot</a></div></td></tr>
      <td class="right_content"> </td>
      <tr><td class="bottom_content"></td><td class="bottom_left_content"></td></tr>
      <!-- End content page -->
    <!-- End Main content page -->
    <!-- footer -->
    <div id="contentfooter">
    <!-- fin footer -->
  9. koitsu

    koitsu Network Guru Member

    Doable: yes. Entirely in a single curl command: no.

    I can clearly see some dynamic hidden HTTP POST variables used there (like "sessionid"), which would need to be passed properly every time. And god only KNOWS what the JavaScript is doing, because there's boatloads. :(

    The "Yes" button is purely JavaScript based, from what I can tell -- there might be a form variable that gets passed via the HTTP POST, but I can't tell. You'd need to sniff an HTTP session visiting that page + clicking the Reboot button + Yes button, note all the HTTP parameters that get submitted, then use all those to create your query (POST) string. But like I said, some variables are dynamic, and those would need to be properly passed every time.

    This gets quite messy in shell very quickly, and is error-prone. This is a better job for perl + LWP.

    What does this have to do with Tomato? (Answer: nothing :) ). But it's something you could pay some random dude online to write for you. Would be a couple hours of work.
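
    To make the two-step idea concrete, here is a rough and untested sketch in shell with curl. It assumes a lot: that the resets page can be fetched without logging in, that the only dynamic field is the hidden sessionid shown in the pasted HTML, and that the confirmation dialog adds nothing beyond setting factory_reset to "reboot" (which is what FormSubmit appears to do). None of this has been confirmed by sniffing a real session, and the function names are made up.

```shell
#!/bin/sh
# extract_sessionid: read the modem's HTML on stdin and print the value of
# the first hidden "sessionid" input, or nothing if there is none.
extract_sessionid() {
    sed -n 's/.*name="sessionid" value="\([^"]*\)".*/\1/p' | head -n 1
}

# reboot_modem BASE_URL: fetch the resets page, pull out the session id and
# POST the form fields seen in the page source. Fails if no id was found.
reboot_modem() {
    sid=$(curl -s -m 10 "$1/index.cgi?page=resets" | extract_sessionid)
    if [ -z "$sid" ]; then
        return 1
    fi
    curl -s -m 10 -o /dev/null \
        -d page=resets -d update=0 -d action=submit \
        -d "sessionid=$sid" -d factory_reset=reboot \
        "$1/index.cgi"
}
```

    Whether the modem accepts this is exactly what a captured browser session would have to confirm first.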
  10. bdf0506

    bdf0506 Serious Server Member

    Thanks for all the replies, guys. I wasn't so much trying to make sure that server is always up; I am just tracking it since I've been having issues with my internet connection going in and out. I was a bit hesitant about pinging it so much, but didn't really have a better choice for these tests. Comcast doesn't let me ping their DNS server that often (I haven't figured out what frequency is allowed), but I think I could probably switch to pinging the WAN gateway.

    The line will go out a couple of times a day, and I was trying to narrow it down to see when it would go down, how long, etc. Unfortunately, my modem (Cisco DPC3008) doesn't have a log in it that is accessible to me, so this is really my only option.

    Once I get this squared away, I'll be able to lessen the ping frequency from every 10 seconds to probably once a minute.
  11. Planiwa

    Planiwa LI Guru Member

    The basic question remains:

    1. How to ascertain, effectively, automatically, whether a remotely managed network is (still) properly connected to the Internet.

    Once a practical and reliable method has been found to ascertain this, the next question is:

    2. How to ascertain that the problem is not transient, but does require action.

    Once it has been determined that action is required, the next question is:

    3. What should be done?

    And then:

    4. How to do it?

    And, once we have all these components:

    5. How to connect all the pieces?

    And we must not forget:

    6. How to prevent our Auto-Magic from getting out of control!

    And ultimately:

    7. How to recover when the auto-magic refuses to accept our idea of what is impossible, and does get out of control. :)

    . . .

    Some folks find it quite adequate to reboot everything once an hour, whether it needs it or not.

    Others are interested in understanding, and in best practices. This forum has the potential for collaboration towards sharing best practices and best tools, and helping each other towards insight and understanding.
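
    For question 2 (transient or not), one common and simple approach is to act only after several failures in a row. The sketch below assumes that approach; THRESHOLD, FAILS and note_result are hypothetical names, and the value 3 is only a placeholder.

```shell
#!/bin/sh
THRESHOLD="${THRESHOLD:-3}"   # consecutive failures required before acting (placeholder)
FAILS=0                       # current run of consecutive failures

# note_result RC: RC is the exit code of a connectivity check (0 = fine).
# Succeeds only once RC has been non-zero THRESHOLD times in a row;
# any success resets the count.
note_result() {
    if [ "$1" -eq 0 ]; then
        FAILS=0
        return 1
    fi
    FAILS=$((FAILS + 1))
    [ "$FAILS" -ge "$THRESHOLD" ]
}
```

    In a loop this reads as: run the check, then "if note_result $?; then take action; fi". Call it directly rather than inside $(...) or a pipeline, so the counter survives between calls.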
  12. Planiwa

    Planiwa LI Guru Member

    Well, it took an entire week for this terrible Bell Canada bug to show up again. (Remember, usually when the network is moving a lot of data, typically torrenting, it enters a state of "ping-no-browse". In this state LAN users can typically ping various hosts, such as www.google.com, but cannot browse. It's actually a little more complex than that. For example, my remote ssh session persisted, but the remote web sessions were unresponsive.)

    Tonight it finally happened. A Curl command successfully determined that Google was not browsable (though pingable), and after ensuring the situation called for it, another Curl command established an HTTP session to the modem and rebooted it.

    Jul  8 20:31:28 ROUTER user.notice root: [vit]: ALERT: www.google.com: NOBROWSE 4 -- curl: (6) name lookup timed out
    Jul  8 20:31:28 ROUTER user.notice root: [vit]: ALERT: We have a Connectivity Problem 4 XXX
    Jul  8 20:36:13 ROUTER user.notice root: [vit]: ALERT: www.google.com: NOBROWSE 3 -- curl: (28) Failed to connect to 2607:f8b0:400b:80a::1014: Network is unreachable
    Jul  8 20:36:13 ROUTER user.notice root: [vit]: ALERT: We have a Connectivity Problem 3 XXX
    Jul  8 20:41:13 ROUTER user.notice root: [vit]: ALERT: www.google.com: NOBROWSE 3 -- curl: (28) Failed to connect to 2607:f8b0:400b:807::1010: Network is unreachable
    Jul  8 20:41:13 ROUTER user.notice root: [vit]: ALERT: We have a Connectivity Problem 3 XXX
    Jul  8 20:46:13 ROUTER user.notice root: [vit]: ALERT: www.google.com: NOBROWSE 3 -- curl: (28) Failed to connect to 2607:f8b0:400b:80b::1011: Network is unreachable
    Jul  8 20:46:13 ROUTER user.notice root: [vit]: ALERT: We have a Connectivity Problem 3 XXX
    Jul  8 20:46:13 ROUTER user.notice root: [vit]: ############# REBOOTING MODEM TO FORCE RESYNC #############
    Jul  8 20:47:41 ROUTER daemon.notice pppd[31925]: Serial link appears to be disconnected.
    Jul  8 20:47:47 ROUTER daemon.notice pppd[31925]: Connection terminated.
    Jul  8 20:47:47 ROUTER daemon.notice pppd[31925]: Modem hangup
    Jul  8 20:50:10 ROUTER user.notice root: [vit]: ALERT: -- UNREACH 1 -- PING ( 56 data bytes ping: sendto: Network is unreachable
    Jul  8 20:50:11 ROUTER user.notice root: [vit]: ALERT: www.google.com -- UNREACH 2 -- PING www.google.com (2607:f8b0:400b:80b::1011): 56 data bytes ping: sendto: Network is unreachable
    Jul  8 20:50:11 ROUTER user.notice root: [vit]: ALERT: www.google.com: NOBROWSE 5 -- curl: (7) Failed to connect to 2607:f8b0:400b:80b::1011: Network is unreachable
    Jul  8 20:50:11 ROUTER user.notice root: [vit]: ALERT: We have a Connectivity Problem 5 XXX
    Jul  8 20:50:11 ROUTER user.notice root: [vit]: Modem reboot deferred -- only 238 seconds since last reboot
    Jul  8 20:50:38 ROUTER user.info redial[31926]: WAN down. Reconnecting...
    Jul  8 20:50:39 ROUTER daemon.warn pppd[20177]: Connected to 00:90:1a:a2:fb:42 via interface vlan2 
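
    The "Modem reboot deferred" line in the log above suggests a minimum gap between reboots. The actual script is not shown, so the following is only a guess at how such a guard could look; MIN_GAP, the STAMP file path and the function names are all hypothetical.

```shell
#!/bin/sh
MIN_GAP="${MIN_GAP:-600}"                  # minimum seconds between reboots (placeholder)
STAMP="${STAMP:-/tmp/last_modem_reboot}"   # file holding the epoch time of the last reboot

# reboot_allowed NOW: succeed if no reboot is on record, or if the recorded
# one happened at least MIN_GAP seconds before NOW (epoch seconds).
reboot_allowed() {
    if [ ! -s "$STAMP" ]; then
        return 0
    fi
    last=$(cat "$STAMP")
    [ $(( $1 - last )) -ge "$MIN_GAP" ]
}

# record_reboot NOW: remember that a reboot was triggered at NOW.
record_reboot() {
    echo "$1" > "$STAMP"
}
```

    Keeping the timestamp in a file rather than a variable means the guard still works when the check is started fresh from cron each time.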
  13. haarp

    haarp LI Guru Member

    Maybe you could check the router logs for disconnects.

    Tomato should really, really add a "WAN down" section to the scripts.
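
    As a rough illustration of checking the router log, the function below pulls out lines that look like the redial and pppd messages quoted earlier in this thread. The patterns are taken from those quoted lines only, so other WAN types may log different text, and the function name is made up.

```shell
#!/bin/sh
# wan_events FILE: print syslog lines that look like a WAN drop, based on the
# "redial[...]: WAN down" and "pppd[...]: Connection terminated / Modem hangup"
# messages quoted above.
wan_events() {
    grep -E 'redial\[[0-9]+\]: WAN down|pppd\[[0-9]+\]: (Connection terminated|Modem hangup)' "$1"
}
```

    On the router this would be pointed at the syslog file, for example wan_events /var/log/messages, assuming logging to that file is enabled.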
  14. Elfew

    Elfew Addicted to LI Member

    +1 for that :D
