Shibby 108: openvpn problem?

Discussion in 'Tomato Firmware' started by rs232, Apr 7, 2013.

  1. rs232

    rs232 Network Guru Member

    Hi all, I've upgraded my routers to Shibby 108 and I'm now experiencing a VPN problem.

    Basically, I have two NASes running an rsync script overnight, and they reach each other over an OpenVPN tunnel set up by Tomato. I've used this setup for years now.
    For some strange reason, rsync now interrupts the transfer at random.

    I have to be honest: I upgraded the routers and switched the NASes (from NAS4Free to OpenMediaVault) at pretty much the same time. This obviously doesn't help the troubleshooting, but the rsync scripts are the same old ones.

    For example, I just ran the rsync script, and after 22 minutes I got:
    rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
    rsync: connection unexpectedly closed (216789 bytes received so far) [sender]
    rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(601) [sender=3.0.7]
    Another attempt stopped after about five and a half hours:

    rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(543) [sender=3.0.7]
    real    328m27.819s
    user    0m36.058s
    sys    0m13.853s
    A third attempt:
    rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
    rsync: connection unexpectedly closed (92190 bytes received so far) [sender]
    rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(601) [sender=3.0.7]
    real    23m25.217s
    user    0m10.297s
    sys    0m3.432s
    Is anybody experiencing similar VPN problems?

  2. koitsu

    koitsu Network Guru Member

    What makes you think this is a VPN problem? You've only provided logs from rsync, which clearly show that it's receiving SIGPIPE (Broken pipe). That happens when the fd it uses for network I/O gets closed, or times out and is then closed. You have not provided any VPN logs.

    I'm also going to assume the "rsync error: received SIGINT/SIGTERM/SIGHUP" errors come from you killing the process rather than from something else (a socket timeout/failure/etc. will not cause this message).

    Check your VPN logs (client and server) and correlate timestamps with when your rsync starts/ends (fails).

    If you find something in those logs that indicates the VPN is going down, then the reason for that is probably something network-related. If this VPN tunnel goes across the Internet, keep in mind that the Internet is broken, and does break, 24x7x365. Set up some monitoring, specifically periodic mtr runs (not traceroutes) both from the VPN client IP to the VPN server IP and from the VPN server IP to the VPN client IP. You need both directions due to the asymmetrical nature of routing on the 'net these days.
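    A minimal sketch of that periodic mtr monitoring, assuming a POSIX shell and mtr installed on the machine running it. The function name mtr_snapshot, the 5-minute interval and the 10 report cycles are illustrative choices, not anything Tomato or OpenVPN provides. Run one copy on the client side pointed at the VPN server's public IP and another on the server side pointed at the client's public IP, so both routing directions are covered; each report is stamped with the local time so it can be lined up against the moment rsync failed.

```shell
#!/bin/sh
# Sketch: append a timestamped mtr report for one target to a log file.

mtr_snapshot() {
    host=$1
    log=$2
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') mtr to $host ==="
        # --report: run a fixed number of cycles and print a summary (no live UI)
        # --report-cycles 10: send 10 probes per hop
        # --no-dns: skip reverse lookups so a DNS hiccup doesn't stall the report
        mtr --report --report-cycles 10 --no-dns "$host" 2>&1
    } >> "$log"
}

# When invoked as "./mtr-monitor.sh HOST LOGFILE", take a snapshot every 5 minutes.
if [ "$#" -eq 2 ]; then
    while :; do
        mtr_snapshot "$1" "$2"
        sleep 300
    done
fi
```

    For example: ./mtr-monitor.sh 198.51.100.7 /var/log/mtr-to-vpn-server.log & (the script name, address and log path are placeholders).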
  3. rs232

    rs232 Network Guru Member

    Right, a small step forward.

    I've narrowed the problem with rsync over OpenVPN down to a two-hour cycle: it breaks at 2:30, 4:30, 6:30 and so on.


    In the OpenVPN client config, the "poll" field is currently set to 120 (minutes).

    My points are:
    1) I've had 120 there since my first Tomato OpenVPN config (a few years ago) and never experienced any problem.
    2) I don't see any reference to this "poll" setting in the config file (/etc/openvpn/config.ovpn) anyway.

    P.S. Thanks, koitsu, for the input; you gave me some good ideas on how to troubleshoot. The Tomato OpenVPN log looks totally normal on both sides. BTW, the rsync messages I posted above appeared on their own; I didn't kill anything manually.
  4. koitsu

    koitsu Network Guru Member

    If you remove OpenVPN from the picture, does the problem go away?

    If so, see if there is a way to increase the logging verbosity of OpenVPN.

    Otherwise, what you could be experiencing is some ISP-related issue that happens at certain hours, or possibly something relating to your line.
  5. Monk E. Boy

    Monk E. Boy Network Guru Member

    A simple test would be to ping the remote system continuously and run an rsync in parallel. If the pings never fail, the VPN tunnel is (probably) staying up, and whatever the problem is, it's unrelated to the tunnel.
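    One way to script this test, sketched under assumptions: a POSIX shell, a ping that accepts -c (count) and -W (timeout in seconds) as Linux iputils and Busybox ping do, and made-up helper names (ts, ping_loop, run_with_ping). The wrapper pings the remote end once a second in the background while the given command runs, and writes the command's start, end and exit status into the same log, so any ping failures can be lined up against the moment rsync died.

```shell
#!/bin/sh
# Sketch: log one ping per second while a command runs, in a single timestamped log.

ts() { date '+%Y-%m-%d %H:%M:%S'; }

ping_loop() {
    host=$1
    while :; do
        # -c 1: one echo request; -W 2: give up after 2 seconds
        # (seconds on iputils/Busybox ping; other ping implementations may differ)
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$(ts) ping $host ok"
        else
            echo "$(ts) ping $host FAIL"
        fi
        sleep 1
    done
}

# Usage: run_with_ping HOST LOGFILE COMMAND [ARGS...]
# Returns the exit status of COMMAND.
run_with_ping() {
    host=$1
    log=$2
    shift 2
    ping_loop "$host" >> "$log" &
    pinger=$!
    echo "$(ts) start: $*" >> "$log"
    if "$@"; then status=0; else status=$?; fi
    echo "$(ts) end: exit status $status" >> "$log"
    # Stop the background pinger and reap it.
    kill "$pinger" 2>/dev/null || true
    wait "$pinger" 2>/dev/null || true
    return "$status"
}
```

    For example: run_with_ping 10.8.0.1 /tmp/vpn-ping.log rsync -av /mnt/data/ nas-b::backup (the tunnel address, log path and rsync arguments are placeholders).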
  6. koitsu

    koitsu Network Guru Member

    You would need to run pings to multiple endpoints simultaneously: one to the VPN endpoint, another to the router (on the LAN itself, to rule that out), and a third to an Internet host/destination (not a router -- you can use my box, if you need a destination that does not filter/rate-limit ICMP). You could do all of this with mtr, except you'd need two instances running simultaneously (one to the VPN endpoint, and another to an Internet destination).

    If PPPoE is being used on the router, that complicates the matter even further.
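    A sketch of the multi-endpoint check, assuming a POSIX shell and a ping that accepts -c and -W (Linux iputils and Busybox ping do). The name probe_round is made up for this example. Each round probes the targets one after another and prints a single timestamped line, so a failure pattern is visible at a glance: if the LAN router fails too, look at the LAN; if the router and the Internet host stay up while the VPN endpoint fails, the tunnel or the far end is the suspect.

```shell
#!/bin/sh
# Sketch: one timestamped line per round, e.g.
#   2013-04-08 02:30:01 192.168.1.1=ok 10.8.0.1=FAIL 203.0.113.5=ok

probe_round() {
    line=$(date '+%Y-%m-%d %H:%M:%S')
    for host in "$@"; do
        # One echo request per target, 2-second timeout (iputils/Busybox semantics).
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            line="$line $host=ok"
        else
            line="$line $host=FAIL"
        fi
    done
    echo "$line"
}

# When targets are given on the command line, probe them every 5 seconds forever.
if [ "$#" -gt 0 ]; then
    while :; do
        probe_round "$@"
        sleep 5
    done
fi
```

    For example: ./probe.sh 192.168.1.1 10.8.0.1 203.0.113.5 >> /tmp/probe.log & (router, VPN endpoint and Internet host addresses, script name and log path are all placeholders).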
  7. rs232

    rs232 Network Guru Member

    OK, I ran all the tests and finally got to the bottom of this. The problem IS NOT Tomato-related.
    For reference: when you run rsync from NAS-A to NAS-B, an "rsync --server" process runs on the target (NAS-B). rsync was being killed periodically by the mirror script on NAS-B, which pushes data to NAS-A and starts with a "killall rsync" command; that also killed the "rsync --server" process receiving NAS-A's transfer.

    I've replaced the killall with something like this:
    ps | grep rsync | grep -v server | grep -v grep| cut -d " " -f1 | while read f; do kill -9 $f; done
    and now everything works fine.

    I still have a bad taste in my mouth, though, as the old script starting with killall used to work perfectly on the FreeBSD-based NASes; this problem only appeared after switching to a Linux-based NAS.

    Anyway, thanks a lot for the input; it really helped me!
  8. koitsu

    koitsu Network Guru Member

    Multiple things:

    1. There is no point to the ps | grep | grep -v | grep -v | ... madness. I've seen this so many times over the years that it's depressing. :-( ps | egrep '[ ]rsync' is a common way to keep ps|grep from matching grep itself: the bracket expression [ ] matches a single space, so the pattern matches " rsync" in the ps output but not grep's own command line, which contains the literal text "[ ]rsync" instead. But keep reading for a better idea. Also, on Linux you can just use grep instead of egrep, since Linux grep has some basic (not extended) regex support; however, on other OSes (like Solaris) grep is very plain. On FreeBSD we use GNU grep, however that will be changing very, very soon (it is being replaced with bsdgrep).

    2. You can use awk on Linux to do all of what you need: ps | awk '/[ ]rsync/ { print $1 }' | while ..., since Linux awk (which is actually gawk), even Busybox awk, supports some basic regex. This is different from other OSes (ex. FreeBSD awk is not gawk). However, keep reading.

    3. You can use pidof to get the PIDs of multiple processes, at least on Busybox. For example, pidof rsync would return all processes that contain rsync in their process argument list (including process name). It anchors against the end of the process name, e.g. pidof http would match nothing, while pidof httpd would. The PIDs returned are space-delimited, so you can literally do this: kill `pidof rsync`, since kill supports killing more than one PID if provided. Thus no while loop is needed, which is good (no SIGPIPE to deal with, and no other anomalies when the greps don't work or don't match anything).

    4. Do not get in the habit of using cut, particularly when you cannot guarantee the string will not contain more than one of the delimiter in a row. Use awk for those situations (in fact, it's almost always what you want), not cut. Only use cut when you know exactly how many delimiter characters there will be in a row. A great example is ps | grep httpd | cut -d' ' -f1, which for 4-digit PID numbers (ex. " 1234" -- note the space!) would return an empty string (visually nothing), while for 5-digit PIDs it would return the PID; you'd have to work idiotic magic into things to figure out whether to use -f1 (for 5-digit PIDs), -f2 (for 4-digit PIDs), -f3 (for 3-digit PIDs), and so on. Instead, awk '{ print $1 }' returns the PID no matter what its length, and only the PID (no preceding spaces).

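    5. Do you really need to use kill -9 (i.e. SIGKILL) rather than just kill (i.e. SIGTERM)? Using -9 is an extremely bad habit to get into, and in some cases it can be dire: SIGKILL cannot be caught, so the process gets no chance to clean up (flush buffers, remove partial or temporary files, close its connections cleanly). Please check whether the processes can be stopped without using -9.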

    6. I'm glad to see you learned your lesson with killall. I've warned people in the past on this forum about using killall. Like the above items, it's a very bad habit to get into. Do not use killall. And if you ever run Solaris, you will learn very, very quickly another reason not to use it.
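    Putting items 2, 4 and 5 together, a replacement for the "killall rsync" line in the mirror script could look roughly like the sketch below. It assumes a POSIX shell and a ps that prints the PID in the first column with the full command line on the same line (as Busybox ps does); the function names rsync_client_pids and stop_pids, the GRACE_SECONDS variable and its 10-second default are made up for this example. It filters on the command line with awk rather than using pidof, because the "rsync --server" process is still named rsync, so a plain pidof rsync would most likely list it too, and that is exactly the process that has to survive.

```shell
#!/bin/sh
# Sketch: replacement for "killall rsync" in a hypothetical mirror script.

# Read ps output on stdin; print the PIDs of rsync *client* processes.
# - "[ ]rsync" matches "rsync" preceded by a space; awk's own command line
#   contains the literal text "[ ]rsync", so awk never matches itself.
# - Requiring a space or end of line after "rsync" skips names like rsync-mirror.sh.
# - Lines containing --server are the receiving side of a transfer started by
#   the other NAS, so they are left alone.
# - awk splits on runs of blanks, so $1 is the PID whatever its width.
rsync_client_pids() {
    awk '(/[ ]rsync / || /[ ]rsync$/) && !/--server/ { print $1 }'
}

# SIGTERM the given PIDs, wait up to GRACE_SECONDS (default 10) for them to
# exit, then SIGKILL only those still running.
stop_pids() {
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    grace=${GRACE_SECONDS:-10}
    alive="$*"
    kill -TERM "$@" 2>/dev/null || true
    while [ "$grace" -gt 0 ]; do
        alive=
        for pid in "$@"; do
            if kill -0 "$pid" 2>/dev/null; then
                alive="$alive $pid"
            fi
        done
        if [ -z "$alive" ]; then
            return 0
        fi
        sleep 1
        grace=$((grace - 1))
    done
    # Last resort for anything that ignored SIGTERM. $alive is left unquoted
    # on purpose so that each PID becomes a separate argument to kill.
    kill -KILL $alive 2>/dev/null || true
}
```

    In the mirror script, the killall line would then become stop_pids $(ps | rsync_client_pids), with the command substitution deliberately unquoted so each PID is passed as its own argument. If nothing matches, stop_pids simply returns.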