OpenVPN Client: The case of the disappearing routing table

Discussion in 'Tomato Firmware' started by eibgrad, May 15, 2019.

  1. eibgrad

    eibgrad Network Guru Member

    Another oldie but goodie.

    IMO, the OpenVPN client (and the server, for that matter) has a serious design flaw. And I believe it's because the original developer(s) didn't fully appreciate how important the OpenVPN scripting engine is to properly configuring and managing the service.

    A proper implementation of OpenVPN necessarily requires the OpenVPN scripting engine. Without it, the router remains oblivious to certain important events during the connection lifecycle between the OpenVPN client and server. Case in point: soft restarts.

    As I discussed in my previous posting, Routing Policy creates an alternate routing table whose default gateway is the VPN, so that specific devices can be routed through it while others remain routed through the main routing table and over the WAN/ISP.

    The problem w/ NOT using the OpenVPN scripting engine is that OpenVPN will sometimes trigger what's called a soft restart. This occurs for various reasons (e.g., one side issues a reset). The net effect is that the connection between the OpenVPN client and server is briefly lost, then re-established. Most users wouldn't even be aware this is happening (although it does show up in the syslog). It happens quietly, quickly, and randomly. And when it happens, *all* the routing information relative to the VPN connection is lost and needs to be re-established. The routing system automatically flushes any and all routes referencing the OpenVPN network interface, and only OpenVPN's own routes come back once the connection is re-established. But since the firmware isn't listening for OpenVPN's up/down events, it's oblivious to what has just happened, and therefore fails to rebuild the alternate routing table!
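    For contrast, here's roughly what listening for those events could look like. This is a minimal sketch, not FreshTomato's code: it assumes the script is registered for OpenVPN's route-up and route-pre-down events and relies on the $script_type, $dev and $route_vpn_gateway environment variables OpenVPN passes to its scripts. The tun11/311, tun12/312, tun13/313 pairing follows the table IDs mentioned in this thread; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical route-up / route-pre-down handler; NOT FreshTomato's code.
# OpenVPN exports $script_type, $dev and (when known) $route_vpn_gateway
# to the scripts it runs. Assumes tun11->311, tun12->312, tun13->313.

RUN=${RUN:-}   # set RUN=echo to print the ip commands instead of running them

# Print the routing table ID for a client tunnel, or fail for anything else.
table_for_dev() {
    case "$1" in
        tun1[1-3]) echo "31${1#tun1}" ;;
        *) return 1 ;;
    esac
}

# Rebuild (or tear down) the alternate table for the tunnel named in $dev.
vpn_table_event() {
    table=$(table_for_dev "$dev") || return 0
    case "$script_type" in
        route-up)
            if [ -n "$route_vpn_gateway" ]; then
                # Use the gateway OpenVPN itself reports.
                $RUN ip route replace default via "$route_vpn_gateway" dev "$dev" table "$table"
            else
                # No gateway reported: route straight out the tun device.
                $RUN ip route replace default dev "$dev" table "$table"
            fi
            ;;
        route-pre-down)
            $RUN ip route flush table "$table"
            ;;
    esac
}

# OpenVPN runs this file directly, so handle the current event on every run.
vpn_table_event
```

    Since route-up runs each time OpenVPN (re)installs its routes, the table should come back after a soft restart without any polling. Registering it would mean adding `script-security 2`, `route-up <path>` and `route-pre-down <path>` to the client's config (the path being whatever you choose).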

    Suddenly and without warning, your OpenVPN client is routing all your previous VPN traffic back over the WAN/ISP. If you happen to have a kill switch, at least that traffic gets blocked, but it will confuse the average user, who will have no idea why the OpenVPN connection has stopped working. Without a kill switch, you get the worst case: as I said, the traffic goes right back out the WAN/ISP.

    If you want to see this for yourself, it's pretty simple to reproduce. Configure and connect the OpenVPN client (let's assume #1 for this example) to the OpenVPN server, and then open three (3) separate shells (telnet/ssh) to the router. In the first shell, issue the following command.

    Code:
    watch -n5 "ip route show table main; echo; ip route show table 311"
    This will monitor the relevant routing tables. Let it run for a minute or two; you want to see the default route to the VPN appear in table 311 before proceeding.

    In the second shell, issue the following command.

    Code:
    watch "tail -n15 /var/log/messages"
    This monitors the syslog.

    Finally, in the third shell, issue the following command.

    Code:
    killall -s SIGUSR1 openvpn
    This forces a soft restart. Watch what happens to the alternate routing table, 311. Eventually it just disappears, never to return! And if you look at the syslog, you'll see the connection has been lost and re-established.
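    If you'd rather not eyeball the watch output, something like the following reports whether each of FreshTomato's three client tables (311-313) still holds a default route. It's only a sketch, and the function names are invented. Run check_vpn_tables once before and once after the killall.

```shell
#!/bin/sh
# Succeeds if the "ip route show" listing on stdin contains a default route.
has_default_route() {
    grep -q '^default '
}

# One-shot report on the three client tables (311-313 on FreshTomato).
check_vpn_tables() {
    for table in 311 312 313; do
        if ip route show table "$table" 2>/dev/null | has_default_route; then
            echo "table $table: default route present"
        else
            echo "table $table: no default route"
        fi
    done
}
```

    On an affected router, the line for the active client's table should flip from "present" to "no default route" after the soft restart.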

    But it gets even worse. Not only is the alternate routing table managed outside OpenVPN's scripting engine, so is the application of firewall rules. What most ppl don't realize is that whenever *any* OpenVPN client is initialized, the router (NOT OpenVPN) reinitializes *all* the firewall rules for *all* the other active OpenVPN clients too! I ran into this problem while developing my own PBR (policy based routing) scripts. I would have one OpenVPN client configured and supplemented w/ my own firewall rules, then start the second OpenVPN client, and it would wipe out all my firewall rules from the first OpenVPN client!

    Admittedly, these are very tough problems to fix, because the real mistake was NOT using the OpenVPN scripting engine to manage the service, and correcting that mistake now would require a major overhaul. I assume that's why Merlin uses the scripting engine w/ his own firmware, despite that firmware being a tomato variant. He recognized the problem and decided to do things properly.

    I don't see an easy solution here. Like most of the bugs I'm now reporting, this one has been around ever since Routing Policy was introduced by Shibby. And in the early days, I wrote a script to address it.

    https://pastebin.com/sgNGsjaa

    Yes, it dates back to 2015! And it's pretty crude; I think I wrote it in like 20 mins. And here's where this bug was first identified, way back when.

    https://www.linksysinfo.org/index.p...uting-policy-empty-routing-table-“111”.71746/

    But the fix, as currently written, will only work w/ Shibby tomato and earlier tomato variants. FreshTomato now uses a different set of table IDs for its OpenVPN clients (311, 312, and 313), and it has one more OpenVPN client. So the script needs to be updated (which I'm currently doing).

    Although the concept behind the fix/hack works (most of the time, anyway), it's one crappy solution. It parses the output from ifconfig to find the tunnel's network interface, picks off the IP assigned to the OpenVPN client, and uses that as the VPN's default gateway. But that's not guaranteed to work 100% of the time. The VPN provider *might* require a special gateway for its OpenVPN users. And of course, that gateway is only made available to OpenVPN clients that use the scripting engine!
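    To make the idea concrete, here's a rough, hypothetical watchdog along the same lines. It is NOT the pastebin script: instead of parsing ifconfig for the client's IP, it points the default route straight at the tun device, which is only valid for tun (point-to-point) tunnels, and it still can't know about a provider-specific gateway. The tun11-13/311-313 pairing comes from the table IDs above; the function names are made up.

```shell
#!/bin/sh
# Hypothetical watchdog in the spirit of the 2015 workaround; NOT the pastebin
# script. Routes straight out the tun device instead of via a parsed IP, so it
# only makes sense for tun (point-to-point) tunnels, not tap.
RUN=${RUN:-}   # set RUN=echo to print the ip commands instead of running them

# One "device table" pair per FreshTomato client.
client_map() {
    for n in 1 2 3; do
        echo "tun1$n 31$n"
    done
}

# Re-add the default route to any client table that lost it, but only while
# that client's tunnel interface actually exists.
repair_tables() {
    client_map | while read -r dev table; do
        [ -d "/sys/class/net/$dev" ] || continue
        if ! ip route show table "$table" 2>/dev/null | grep -q '^default '; then
            logger -t vpn-watchdog "restoring default route: $dev table $table"
            $RUN ip route add default dev "$dev" table "$table"
        fi
    done
}

# Poll forever, every $1 seconds (default 30).
watchdog() {
    while :; do
        repair_tables
        sleep "${1:-30}"
    done
}
```

    You'd start it in the background (e.g., `watchdog 30 &` from a startup script). Polling is the crude part: for up to one interval after a soft restart, traffic still falls through to the main table, so a kill switch remains important. A client without Routing Policy enabled would also get a route in its table, which should be harmless as long as no ip rule points at that table.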

    Anyway, I don't really expect this bug/problem to be fixed, at least not anytime soon. The "problem" is bigger than this one bug. It's more a case of putting ppl on notice regarding the reliability of Routing Policy wrt either Shibby or FreshTomato, and how to deal with it.
     
    Last edited: May 15, 2019
  2. jerrm

    jerrm Network Guru Member

    Agree moving to openvpn scripting would be the ideal choice. Just make sure there are some well defined pre and post hooks into the script.

    I've never had need for the Tomato routing policy, but even for my more modest site-to-site needs I've always elected "custom" rules and used OpenVPN scripting.
     
  3. eibgrad

    eibgrad Network Guru Member

    FWIW, this is exactly what Merlin has done w/ his AsusWRT firmware. He's added an event model of his own that includes a script called openvpn-event. If this file exists in the /jffs/scripts directory, then whenever OpenVPN triggers the up/down or route-up/route-pre-down events, he uses those events for his own purposes and then passes the same events to the openvpn-event script, so you have the opportunity to make your own changes. That's the correct way to do it.
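    In rough terms, the pattern looks like this. It's a hypothetical sketch, not Merlin's actual implementation: one script is registered for all four events, does the firmware's own work, then hands the same event and arguments to /jffs/scripts/openvpn-event if it exists. It relies on OpenVPN setting $script_type and $dev; firmware_handler is a placeholder.

```shell
#!/bin/sh
# Sketch of a Merlin-style dispatcher (hypothetical, not Merlin's code).
# Register this one file for up, down, route-up and route-pre-down; OpenVPN
# distinguishes the events via $script_type and passes the usual arguments.
USER_HOOK=${USER_HOOK:-/jffs/scripts/openvpn-event}

# Placeholder for the firmware's own work (routing tables, firewall rules).
firmware_handler() {
    echo "firmware: $script_type on $dev"
}

dispatch_event() {
    firmware_handler "$@"
    # Pass the very same event, with the same arguments, to the user's hook.
    if [ -x "$USER_HOOK" ]; then
        "$USER_HOOK" "$@"
    fi
}

# Only dispatch when actually invoked by OpenVPN.
if [ -n "$script_type" ]; then
    dispatch_event "$@"
fi
```

    The key design point is ordering: the firmware always does its part first, so a user hook can build on (or adjust) what the firmware set up for that same event.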

    Even though dd-wrt uses the OpenVPN scripting engine, it provides no similar hooks. In fact, you can't even override the router's use of these directives/events in Additional Config, because they've been added to the command line (!), which always overrides anything in the config file. Ugh.

    Very few have gotten this completely right. So far, Merlin seems to have come the closest.
     
  4. eibgrad

    eibgrad Network Guru Member

    P.S. What I'm considering is doing the same thing as Merlin. If tomato is not going to use the scripting engine, then perhaps I'll use it, and create a similar event model w/ hooks. My scripts will fix these various problems, then call your scripts for the same events.
     
  5. jerrm

    jerrm Network Guru Member

    If not adopting all of @RMerlin's event scheme, I would "tomato-fy" it and look for "*.openvpn" scripts following the established /etc/config, /jffs/etc/config, /opt/etc/config, /mmc/etc/config, /tmp/config convention.
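    As a sketch of that convention (the function name is made up, and this is not an existing FreshTomato feature), the generated script could end by running every executable *.openvpn file found in those directories, passing along OpenVPN's arguments:

```shell
#!/bin/sh
# Hypothetical "*.openvpn" hook runner following rc's */config convention.
# Directory list taken from the post above; nothing here exists in FreshTomato.
HOOK_DIRS=${HOOK_DIRS:-"/etc/config /jffs/etc/config /opt/etc/config /mmc/etc/config /tmp/config"}

run_openvpn_hooks() {
    for dir in $HOOK_DIRS; do          # unquoted on purpose: split on spaces
        for script in "$dir"/*.openvpn; do
            # An unmatched glob stays literal; this test also skips
            # files without the execute bit.
            [ -x "$script" ] || continue
            "$script" "$@" || logger -t openvpn-hooks "$script failed ($?)"
        done
    done
}
```

    Requiring the exec bit is just this sketch's choice; how rc itself treats the exec bit for its own hook scripts isn't established here.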

    What happens with Tomato's PBR info if the firewall rules are set to "custom"? Is everything ignored, or does Tomato still try to set up routing? Just wondering whether selecting custom and then using whatever you come up with would be a way to get an OpenVPN scripting solution into the wild and tested, in a way that could ultimately be a drop-in replacement for the current rc-generated code.
     
  6. pedro311

    pedro311 Networkin' Nut Member

    I'm working on it right now.

    BTW: I ran the test you described on FreshTomato 2019.2 and, apart from the weird way the routing is made, table 311 is still there after a soft restart. So I don't know which version you ran your test on.
    //EDIT: I implemented a kill switch, so I would probably notice a lack of connectivity on the clients using OpenVPN.
     
    Last edited: May 16, 2019
  7. pedro311

    pedro311 Networkin' Nut Member

    OK, so @eibgrad should be happy (at least with this issue):

    Code:
    May 17 17:11:26 router daemon.notice openvpn[3885]: OpenVPN 2.4.7 arm-unknown-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on May 17 2019
    May 17 17:11:26 router daemon.notice openvpn[3885]: library versions: OpenSSL 1.0.2r  26 Feb 2019, LZO 2.10
    May 17 17:11:26 router daemon.warn openvpn[3886]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    May 17 17:11:26 router daemon.notice openvpn[3886]: UDPv4 link local: (not bound)
    May 17 17:11:26 router daemon.notice openvpn[3886]: UDPv4 link remote: [AF_INET]xxx.xxx.xxx.xxx:1234
    May 17 17:11:30 router daemon.notice openvpn[3886]: TUN/TAP device tun11 opened
    May 17 17:11:30 router daemon.notice openvpn[3886]: /sbin/ifconfig tun11 10.8.0.6 pointopoint 10.8.0.5 mtu 1500
    May 17 17:11:30 router daemon.notice openvpn[3886]: updown.sh tun11 1500 1552 10.8.0.6 10.8.0.5 init
    May 17 17:11:31 router user.notice openvpn-updown.sh[4259][tun11]: Creating firewall rules for client1
    May 17 17:11:31 router user.notice openvpn-updown.sh[4259][tun11]: Running firewall rules for client1
    May 17 17:11:31 router user.notice openvpn-updown.sh[4259][tun11]: Clean-up routing
    May 17 17:11:31 router user.notice openvpn-updown.sh[4259][tun11]: Starting routing policy for VPN client1 - Interface tun11 - Table 311 - GW 10.8.0.5
    May 17 17:11:31 router user.notice openvpn-updown.sh[4259][tun11]: Type: 1 - add xxx.xxx.xxx.xxx
    /.../
    May 17 17:11:31 router user.notice openvpn-updown.sh[4259][tun11]: Type: 3 - add google.com
    /.../
    May 17 17:11:46 router user.notice openvpn-updown.sh[4259][tun11]: Running firewall routing rules for client1
    May 17 17:11:46 router user.notice openvpn-updown.sh[4259][tun11]: Done running firewall routing rules for client1
    May 17 17:11:46 router user.notice openvpn-updown.sh[4259][tun11]: Completed routing policy configuration for client1
    May 17 17:11:47 router daemon.notice openvpn[3886]: Initialization Sequence Completed
     
  8. eibgrad

    eibgrad Network Guru Member

    That doesn't look like FreshTomato 2019.2. I'm using the same firmware on an ASUS RT-AC68U and can reproduce the problem at will. I have to, in order to complete the fix I'm working on (almost done).

    That looks more like YOU created your own scripts and added them to the Custom Config field, which is irrelevant to this current problem w/ the Routing Policy tab.

    Code:
    up <path>/openvpn-updown.sh
    down <path>/openvpn-updown.sh
    When Routing Policy is active, you'll see lines like the following:

    Code:
    May 15 09:43:06 tomato-lab2 daemon.notice openvpn[22168]: OPTIONS IMPORT: --ifconfig/up options modified
    May 15 09:43:06 tomato-lab2 daemon.notice openvpn[22168]: OPTIONS IMPORT: peer-id set
    May 15 09:43:06 tomato-lab2 daemon.notice openvpn[22168]: OPTIONS IMPORT: adjusting link_mtu to 1625
    May 15 09:43:06 tomato-lab2 daemon.notice openvpn[22168]: OPTIONS IMPORT: data channel crypto options modified
    May 15 09:43:06 tomato-lab2 daemon.notice openvpn[22168]: Data Channel: using negotiated cipher 'AES-256-GCM'
    May 15 09:43:06 tomato-lab2 daemon.notice openvpn[22168]: Initialization Sequence Completed
    May 15 09:43:08 tomato-lab2 user.notice vpnrouting[22173][tun12]: Got gateway for tun12 - IP 10.10.0.42 - ID 312
    May 15 09:43:08 tomato-lab2 user.notice vpnrouting[22173][tun12]: Type: 1 - add 192.168.1.14
    May 15 09:43:08 tomato-lab2 user.info preinit[1]: igmpproxy is stopped
    May 15 09:43:08 tomato-lab2 user.notice vpnrouting[22173][tun12]: Completed routing policy configuration for client2
    Notice the use of "vpnrouting" and the fact it occurs *after* OpenVPN has completed initialization.

    All my reported bugs are solely about how the Routing Policy tab works (or doesn't) in the GUI, NOT about any other form of PBR (policy based routing), like your own PBR scripts. Obviously, you can always correct these problems yourself using your own PBR scripts. But I'd like to see it made possible to use Routing Policy reliably, at least for those ppl who would prefer to use the GUI.
     
  9. pedro311

    pedro311 Networkin' Nut Member

    Did you see that, right?

    And from that, you could possibly deduce that this is a log taken after the changes I'm making, right?
    So don't be so sarcastic, please, or get yourself involved and introduce some of your code corrections instead of complaining all the time, right?
    No offence, BTW...
     
  10. eibgrad

    eibgrad Network Guru Member

    It finally dawned on me what you're trying to say. I *thought* you were claiming that you had run your own tests and discovered that FreshTomato 2019.2 was producing that output w/ openvpn-updown. I *now* realize you're claiming you've produced your own fix, and those are the results. Since you didn't post that fix, it wasn't clear that's what you meant (at least to me). And so that's why I was making the point that this doesn't address the problem in Routing Policy in the GUI. It merely circumvents it, as any PBR scripts would do.

    I wasn't being sarcastic either (at least it was never my intent). That's one of the problems w/ all this text-based communication. Ppl assume you always understand the point they're trying to make. And sometimes language differences contribute to the problem. But it just wasn't clear (at least to me). And so I responded to what I thought you meant, and well..., it's downhill from there.

    Again, the only point I'm making is that Routing Policy is broken, and I've explained what to look out for. And yes, I will soon have a fix. If anyone wants to contribute their own fix/scripts, by all means, do so.
     
  11. jerrm

    jerrm Network Guru Member

    Looks good. Which OpenVPN script option are you using (up/route-up/etc)?

    Adding any support for "*.openvpn" scripts in */config?
     
  12. pedro311

    pedro311 Networkin' Nut Member

    Okay, so good to know it was just a small misunderstanding ;)
    I'm just testing these changes right now, and will upload them as a commit to my repo in 1-2 days.

    Of course, I have to solve the remaining problems later, but you have to start somewhere...
     
  13. eibgrad

    eibgrad Network Guru Member

  14. pedro311

    pedro311 Networkin' Nut Member

    up/down

    What do you mean?
     
  15. jerrm

    jerrm Network Guru Member

    OpenVPN only allows one up entry. We need a hook for user scripts.

    Traditionally, rc processes scripts in the */config directories for several events. For a firewall restart, it's "*.fire" scripts.

    The generated up script could follow the same convention, looking for something like "*.openvpn" files.

    A few other options:
    • Not sure if the first or last "up" config entry wins. You could make sure the Tomato-generated entry is in the losing position and let the user script be responsible for calling the Tomato script.
    • Generate the script but don't add the config entry if the user selects custom firewall rules, letting the user decide when/if to use the Tomato script.
    • Add another option to the firewall dropdown to allow something similar.
    • User could potentially use one of the other OpenVPN script options, but up and down are the most common.
    The main thing is that there are options for customization without having to completely give up the convenience of the generated script doing most of the heavy lifting.



     
  16. eibgrad

    eibgrad Network Guru Member

    As far as OpenVPN event handling goes, later directives in the config file always override earlier directives of the same name. So even if tomato eventually used the OpenVPN scripting engine, we could rather easily override those events w/ our own (and of course, make sure to call tomato's scripts (if any) for those same events). As long as they didn't do what dd-wrt did and specify them on the command line, making them UN-overridable. However, you still run into the problem of *breaking* existing scripts, ones that have no idea of these changes.

    Once you've established the current situation w/ tomato, where the firmware is working entirely outside that OpenVPN scripting engine, it gets really tough to change course. I suspect that for those who want a more structured environment, Merlin may be the better choice.
     