1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

UDP: short packet

Discussion in 'Tomato Firmware' started by rs232, Aug 29, 2014.

  1. rs232

    rs232 Network Guru Member

    Can anybody shed light on this message I've found in my log?

    Code:
    Aug 29 14:25:55 tomato36k kern.debug kernel: UDP: short packet: From x.x.66.130:1026 252/20 to x.x.66.130:1196
    Source and destination IP appear to be the same and the 1196 destination port is used by openvpn as I can see. It sounds like a runt in a loop
     
  2. koitsu

    koitsu Network Guru Member

    Do the IP addresses involved look correct? If not:

    http://git.kernel.org/cgit/linux/ke.../?id=ccc2d97cb7c798e785c9f198de243e2b59f7073b
    http://git.kernel.org/cgit/linux/ke.../?id=2783ef23128ad0a4b34e4121c1f7ff664785712f

    What I'm having trouble figuring out is when the bad commit (2783ef2) got committed to Linux. The fix got put into 2.6.32 (not 2.6.34), and TomatoUSB uses 2.6.22 with numerous patches. I suppose I could go look at the kernel source in TomatoUSB myself to see if the saddr/daddr assignments were moved or not.

    If the IP addresses involved look correct**, then the above commits aren't relevant (meaning the kernel is so old we're not susceptible to that problem -- sometimes old is good (or lucky)), in which case an explanation of what the message means is mandated:

    The message literally means that the UDP payload length (defined in the UDP header) does not match the actual payload length received (the payload is smaller than the length). This could be caused by a whole ton of things, but the most likely explanation is a router or device somewhere (far away from you) is munging packets in a very bad way. You may want to report this to your VPN provider / endpoint, and they will need to do packet captures (as will you) to see what they're sending vs. what you're receiving. It's also a very likely possibility that a router somewhere on the Internet is munging this data (it happens more often than you might think).

    You could perform a packet capture on your router on the WAN interface, specific to whatever source address is involved, i.e. tcpdump -v -p -i vlan2 -l -n -s 65535 -w /tmp/capture.pcap "udp and port 1196 and host x.x.66.130", wait until you see one of those kernel messages, stop the capture (^C), then load capture.pcap into Wireshark and review the packets. You may need to turn off UDP checksum validation within Wireshark. Wireshark should probably show you if the packets are short, and you can count the payload size yourself.

    OpenVPN uses UDP by default. You could switch to TCP (on client and server) and your problem would likely go away. You also get the added benefit of TCP's built-in retry behaviour and things like RFC1323 / increased TCP window size.

    ** -- I'm questioning this because the source and destination look the same in your example (more specifically, the last two octets are identical -- what's the possibility of that?). So this possibly this is a different Linux kernel bug (solely in the logging).

    And I've mentioned this before: please do not hide or mask IP addresses when asking for networking troubleshooting/assistance. Network administrators will often ignore such requests, and all it does is unnecessarily complicate things.
     
  3. rs232

    rs232 Network Guru Member

    The IP are correct source IP and destination IP are the same (my WAN IP). It is common sense to hide part of the IP when posting on the forum no matter what, I've just replaced the first two octets with X. I agree I should have clearly stated that they are the same in the same post. Any ways...

    Why a packet goes from one port to another on the very same IP I don't understand, perhaps a level of complexity added by openvpn.

    I have been using UDP on openvpn for long time and seen this message only recently, I'm wondering if this has to do with the ARM build. I'll see if I can do some packet sniffing but not really keen to switch to TCP right now for different reasons.

    rs232
     
  4. koitsu

    koitsu Network Guru Member

    It doesn't -- it's almost certainly a bug in the Linux kernel, using the destination address for both source and destination when printing the message. This isn't the first time I've seen this. That's why I said:

    I'll look through the kernel source when I have time. It's probably been fixed in a newer version, but obviously we can't upgrade easily due to reliance on binary blobs. But if it is fixed, I can almost certainly backport whatever the patch is that fixed it in later Linux versions.

    I would be very surprised if it was specific to ARM; you'd be seeing it a lot more if it was (hint: DNS is mostly UDP). I think it's a lot more likely something else is happening upstream from you somewhere and that you're just getting short packets.
     
  5. koitsu

    koitsu Network Guru Member

    I've done some work on this today. Something very uncomfortable is afoot.

    The Linux kernel version (according to uname -a) on TomatoUSB is 2.6.22.19. I treat that to mean 2.6.22 with a bunch of modifications or manual patches/patchlevels.

    However, net/ipv4/udp/udp.c on TomatoUSB is not the same thing as on Linux 2.6.22. In fact, I think what we're using is code from Linux 2.6.19 that may be heavily modified. I haven't figured it out yet.

    Here's all the versions. I'm keying off of Toastman-RT-N branch because that's what I run:

    TomatoUSB: http://repo.or.cz/w/tomato.git/blob/refs/heads/Toastman-RT-N:/release/src/linux/linux/net/ipv4/udp.c
    Linux 2.6.22: https://github.com/torvalds/linux/blob/v2.6.22/net/ipv4/udp.c
    Linux 2.6.21: https://github.com/torvalds/linux/blob/v2.6.21/net/ipv4/udp.c
    Linux 2.6.20: https://github.com/torvalds/linux/blob/v2.6.20/net/ipv4/udp.c
    Linux 2.6.19: https://github.com/torvalds/linux/blob/v2.6.19/net/ipv4/udp.c

    If you look at all of those and compare them, you'll see that TomatoUSB's is much like 2.6.19. I was mainly focused on the function udp_rcv() and this piece of code:

    Code:
    1084 short_packet:
    1085         NETDEBUG(if (net_ratelimit())
    1086                  printk(KERN_DEBUG "UDP: short packet: %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
    1087                         NIPQUAD(saddr),
    1088                         ntohs(uh->source),
    1089                         ulen,
    1090                         len,
    1091                         NIPQUAD(daddr),
    1092                         ntohs(uh->dest)));
    
    That's from Toastman-RT-N. That code looks correct -- that is to say, it should be printing the source address, as well as the destination address. So that brings into question if saddr and daddr are actually being populated correctly. NIPQUAD() is a macro that simply splits up each octet of the address (hence why you see four %u expandos for printf() but only one NIPQUAD()).

    The initialisation is done here:

    Code:
    1028 int udp_rcv(struct sk_buff *skb)
    1029 {
    ....
    1034         u32 saddr = skb->nh.iph->saddr;
    1035         u32 daddr = skb->nh.iph->daddr;
    ....
    
    That also looks correct, code-wise, so possibly whatever is building/creating skb is what's wrong. This is why I was going through the Linux source code, commit by commit, to see if someone fixed this somewhere.

    But in 2.6.20 they appeared to overhaul or revamp udp.c a lot, so this is going to take me a lot of time to figure out. My gut feeling is that the bug is actually somewhere else in the kernel code (possibly populating the saddr part of the struct with daddr, incorrectly), and this is a huge rat's nest to follow. But I'm still going through it.
     
  6. koitsu

    koitsu Network Guru Member

    I haven't found a way to simulate this problem yet. I've been really hammering on hping3 to try and simulate this, but the tool is pretty damn adamant about sending the proper payload size along with whatever you specify to -d. I really need to be able to reproduce this reliably somehow.

    Otherwise my only choice is to try Emailing David S. Miller, or possibly LKML entirely, and asking if anyone can recollect a bug in the Linux source dating back to the 2.6.xx days (but not as recent as 2.6.20, which is where the UDP code was rehauled into something called "UDP-Lite", and a TON of bugs were introduced then -- many not getting fixed until 2.6.3x) where source/destination addresses being printed in some parts of the kernel network layer weren't correct (i.e. source address shown was the same as destination address). But I don't want to ask anyone until I have a way to reproduce this.

    Edit: One thing I did learn in the process is this: the 252/20 part of the line refers to two types of packet lengths used in the code. And boy I sure hope I got this right, because I don't understand the Linux sk_buff stuff much at all. But my educated guess is this: the 1st number (ulen) is the UDP length as reported per the UDP header. The 2nd number (len) is the actual length of the payload received.

    The code used by TomatoUSB for this printk() condition is as follows:

    Code:
    1050         if (ulen > len || ulen < sizeof(*uh))
    1051                 goto short_packet;
    1052
    1053         if (pskb_trim(skb, ulen))
    1054                 goto short_packet;
    
    I don't know which of those 2 conditions are matched, and I haven't taken the time to figure out what pskb_trim() does, but the first if() indicates if ulen is greater-than len then print the message, or, if ulen is less-than the size of the entire UDP header itself (uh = skb->h.uh, and uh is declared as a struct udphdr *).

    This is why me having a way to reproduce this would be quite useful.
     
    Last edited: Sep 1, 2014
  7. rs232

    rs232 Network Guru Member

    I should have added more details actually.
    The problem is experienced when connecting openvpn 2.3.4 from my windows laptop to tomato shibby 122 (I think 121 as well) on a R7000.
    Now I'd like to specify that the Internet connection at the laptop side is horrible to give a compliment. Not only the ADSL itself but also the wifi with at least 10 neighbours transmitting on overlapping channels.

    The client config looks like this:

    Code:
    dev tun
    proto udp
    remote xxxxx.xxxxx.xxx 1194
    ifconfig 10.10.3.2 10.10.3.1
    resolv-retry 30
    nobind
    persist-key
    persist-tun
    cipher AES-128-CBC
    verb 3
    secret static.key
    script-security 2
    float
    route-method exe
    route-delay 2
    keepalive 30 180
    tun-mtu 1500
    tun-mtu-extra 32
    mssfix 1450
    fragment 1450
    
    where the server like this :

    Code:
    daemon
    ifconfig 10.10.3.1 10.10.3.2
    proto udp
    port 1194
    dev tun21
    cipher AES-128-CBC
    keepalive 15 60
    verb 3
    secret static.key
    status-version 2
    status status
    
    # Custom Configuration
    script-security 2
    #keepalive 21600 86400
    verb 0
    mute 2
    tun-mtu 1500
    tun-mtu-extra 32
    mssfix 1450
    
    rs232
     
  8. koitsu

    koitsu Network Guru Member

    Well I can tell you up front I'm not going to get up a router with OpenVPN + etc. and test that. I just do not have the resources for that -- it'd require someone else as a guinea pig.

    Is your OpenVPN client connecting to the router across the WAN (i.e. Internet)? And the WAN link happens to be ADSL?

    If so, chances are it's PPPoE or PPPoA-based, which means the overall MTU payload that can go across the LAN is smaller than what you've configured (routers will try to deal with this but the client may end up sending packets with wrong things in them). OpenVPN may be also doing the wrong thing when it comes to fragmenting packets itself and trying to send them, meaning it actually could be responsible for this problem based on incorrect settings all over the place.

    If not (e.g. your OpenVPN client is on the same network as the router, either via wireless or LAN), then that might actually explain why the source/destination addresses in the kernel log are the same (and if so, that means you're connecting to the WAN IP across the LAN, which uses NAT loopback, which you shouldn't be doing (instead connect to the LAN IP)). But I doubt this is the case because I can't see what you'd gain from that. :)

    All that said: your OpenVPN config looks awful, like just random configuration variables thrown in randomly. I don't even know where to begin.

    1. You're using UDP, not TCP, so keep this in mind.
    2. You're using tun, not tap, so keep this in mind.
    3. You shouldn't bother setting tun-mtu or tun-mtu-extra. You should maybe consider using fragment or mssfix only (more on that in a moment -- just keep reading). tun-mtu-extra shouldn't be touched when using tun devices (the default value is 0); for tap it's a different situation (the default value is 32).
    4. You may want to experiment with mtu-disc maybe or mtu-disc yes which utilises ICMP Path MTU discovery. Again: you are using UDP, so that's why I mention this. If you were using TCP, tinkering with this may be a bad idea, as Tomato's firewall rules for forwarded packets, specifically for TCP, has this: tcp flags:0x06/0x02 TCPMSS clamp to PMTU (I'd need to look into what that actually does -- edit: ah here we go. And the --tcp-flags SYN,RST SYN reads in English "on a TCP packet, examine the SYN and RST bits, and this rule only applies if the SYN bit is set)). If using the OpenVPN directives, I recommend you remove fragment and mssfix from both configs altogether and just let the Linux IP stack do everything itself. That leads me to...
    5. You may want to use mtu-test once (as in set the parameter one time, run it, then after reviewing the results remove it). OpenVPN will then try different payload sizes to the destination, in attempt to figure out what the largest is. It takes ~3 minutes according to the documentation.
    6. If you're successful with #4 (utilising Path MTU discovery) then you probably do not need fragment or mssfix.
    7. If #4 isn't successful, then I suggest setting the maximum payload using fragment 1472 and mssfix (but do not specify a value to the mssfix param!). This 1472 value is 20 bytes smaller than my "guesstimated" 1492 MTU since ADSL is involved; the UDP header is 8 bytes in size, so I'm giving a lot more room than probably needed, but mtu-test may answer this definitively. This may need to be a little bit lower, like 1460. I'd really have to do packet captures on both ends and on multiple interfaces to actually see what's going on.
    8. I noticed in the server config verb is specified twice, once with 3, once with 2. I dunno if that's an effect of stock GUI settings vs. custom ones you've added or what, but increasing the verbosity of OpenVPN (on both client and server) might actually help you -- it might allow you to correlate a timestamp with something OpenVPN does/says with the timestamp of the UDP short packet message.

    This is all I can provide at the moment. I really do not like dealing with VPNs at all (in fact to be honest whenever I see a thread subject discuss them I make a deliberate effort to avoid the thread entirely at that point). I have used OpenVPN myself to bridge two networks (specifically two private (ex. 10/8) networks between two co-location facilities across the Internet; I didn't have to mess with MTU because everything was Ethernet) and it took me a lot of effort, but that was years upon years ago. Client->server setups are different (most people don't use bridging) and the goal there is a lot different.

    I'm sorry I can't help much otherwise. What bothers me the most about this thread isn't even the short UDP packet report -- it's that the kernel is reporting the same IP address for both source and destination.
     
    Last edited: Sep 1, 2014

Share This Page