
QOS malfunction in Toastman/Shibby/Advancedtomato [fixed, not an issue]

Discussion in 'Tomato Firmware' started by comego, Jan 8, 2017.

  1. comego

    comego Network Newbie Member

    Hi, I apologize for opening a new thread for this (simple) issue.

    I am having an issue with the default QOS settings in the Toastman (and also shibby and advancedtomato) firmware, and I'm wondering whether I'm doing something wrong.

    When I enable QOS and set a bandwidth limit of about 50% of my available bandwidth, speed tests (dslreports and ookla) can still saturate the link to 100% on both upload and download. The same happens if I set the limit to 5% of the available bandwidth.

    This makes me believe that either I'm doing something wrong or that there's an issue in QOS.

    I'm using the ARM build for a Netgear R6400 router.

    I'm no expert in linux tc but it looks like the limits are being applied to an intermediate dev (ifb0) and not directly on the wan device (vlan2).

    When the bandwidth limiter is enabled, a new qoslimit shell file is created in /tmp/etc/ which limits the bandwidth directly on the physical wan (vlan2). However, I can still max out my link (and have huge bufferbloat in the process).

    I'm short of ideas on what to try next (apart from studying the linux tc howto).

    Can you reproduce it ? Is it router specific ?

    Any hints?
     
    visceralpsyche likes this.
  2. comego

    comego Network Newbie Member

    I went ahead and there is definitely an issue with QOS. Basically, the default class is not respected and traffic escapes the classifier despite the "default" qdisc rule.

    I also found a typo that might prevent the wan_qos from being applied (hence the suggestion to reboot the routers after a qos change).
    I have no account in git so I hope a dev will pick up my suggested patches and merge them.

    First, the typo. In the QOS script written to /tmp/etc/ (wan_qos on shibby/advancedtomato, qos in toastman), the following line, at around line 76, needs to be changed like this:
    Code:
    -        tc qdisc del dev $I ingress 2>/dev/null
    +        tc qdisc del dev $WAN_DEV ingress 2>/dev/null
    
    Then, the difficult (for me) part is understanding why the default qdisc on the WAN is not being applied. Let's concentrate for a moment on the egress policy (ingress will be similar).
    Qdisc is created correctly:
    Code:
    # tc qdisc show dev vlan2
    qdisc htb 1: root refcnt 2 r2q 21 default 90 direct_packets_stat 0
    qdisc fq_codel 10: parent 1:10 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 20: parent 1:20 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 30: parent 1:30 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 40: parent 1:40 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 50: parent 1:50 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 60: parent 1:60 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 70: parent 1:70 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 80: parent 1:80 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 90: parent 1:90 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc fq_codel 100: parent 1:100 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms ecn
    qdisc ingress ffff: parent ffff:fff1 ----------------
    
    However, the "default 90" rule never matches, and traffic escapes, maxing out the available bandwidth.
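A quick way to read the relevant values off the root qdisc line is to pull out the `default` minor and the `direct_packets_stat` counter. This is only a sketch: the helper name `htb_field` is made up for illustration, and it assumes the line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Tomato script): print the value that
# follows a keyword on the "qdisc htb" line read from stdin.
htb_field() {
    awk -v key="$1" '
        $1 == "qdisc" && $2 == "htb" {
            for (i = 3; i < NF; i++)
                if ($i == key) { print $(i + 1); exit }
        }'
}

# Sample line copied from the output above; on a router you would pipe
# "tc qdisc show dev vlan2" into the helper instead.
sample='qdisc htb 1: root refcnt 2 r2q 21 default 90 direct_packets_stat 0'

echo "default class minor: $(echo "$sample" | htb_field default)"
echo "direct packets:      $(echo "$sample" | htb_field direct_packets_stat)"
```

As far as I know, `direct_packets_stat` counts packets that left through HTB's direct queue instead of a class, so watching it during a speed test may help show whether traffic is bypassing the class tree.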
    If I mark packets at the beginning of PREROUTING like this:
    Code:
    iptables -I PREROUTING -t mangle -i br0 -j MARK --set-mark 0x9
    and so send them to, say, class 1:90, the limits applied on the wan magically start working.

    EDIT: This does not interfere with later classification of the packets. The fwmark can be changed by later iptables/l7 classifiers and packets are assigned to a correct final class. The only issue is that unmarked packets are not assigned to the default class by HTB. The default class being 1:90.
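To make the mark-to-class relationship concrete: in this thread mark 0x9 corresponds to class 1:90, and the ten leaf classes run from 1:10 to 1:100. The sketch below assumes the pattern is simply "mark N goes to class 1:N0" and prints the kind of `fw` filter that ties a mark to a class. The function names are invented for this example, and the printed command is an illustration, not a line taken from the firmware's script.

```shell
#!/bin/sh
# Assumption: fwmark N (1..10) maps to HTB class 1:N0, as with 0x9 -> 1:90.
class_for_mark() {
    # $(( )) accepts hex input such as 0x9 and yields a decimal number.
    printf '1:%d0\n' "$(( $1 ))"
}

# Print (do not run) an fw filter that sends a given mark to its class.
# The device name is an argument; vlan2 is the WAN device from this thread.
fw_filter_cmd() {
    dev=$1
    mark=$(( $2 ))
    echo "tc filter add dev $dev parent 1: protocol ip handle $mark fw flowid $(class_for_mark "$mark")"
}

class_for_mark 0x9
fw_filter_cmd vlan2 0x9
```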

    Now, I'm at a loss as to why the default qdisc is being ignored. The documentation of HTB clearly states that unclassified packets are assigned to the default class.

    ADDED: Here's the corresponding class tree created by default:
    Code:
    class fq_codel 100:339 parent 100:
    class fq_codel 100:33f parent 100:
    class fq_codel 10:335 parent 10:
    class fq_codel 10:339 parent 10:
    class fq_codel 30:339 parent 30:
    class fq_codel 50:339 parent 50:
    class fq_codel 50:33d parent 50:
    class fq_codel 60:339 parent 60:
    class fq_codel 70:335 parent 70:
    class fq_codel 70:33f parent 70:
    class fq_codel 80:33d parent 80:
    class fq_codel 90:1 parent 90:
    class fq_codel 90:335 parent 90:
    class htb 1:1 root rate 19500Kbit overhead 8 ceil 19500Kbit linklayer atm burst 1599b cburst 1599b
    class htb 1:10 parent 1:1 leaf 10: prio 1 rate 975Kbit overhead 8 ceil 3900Kbit linklayer atm burst 1599b cburst 1599b
    class htb 1:100 parent 1:1 leaf 100: prio 10 rate 195Kbit overhead 8 ceil 195Kbit linklayer atm burst 1599b cburst 1599b
    class htb 1:20 parent 1:1 leaf 20: prio 2 rate 975Kbit overhead 8 ceil 3900Kbit linklayer atm burst 1599b cburst 1599b
    class htb 1:30 parent 1:1 leaf 30: prio 3 rate 975Kbit overhead 8 ceil 4875Kbit linklayer atm burst 1599b cburst 1599b
    class htb 1:40 parent 1:1 leaf 40: prio 4 rate 975Kbit overhead 8 ceil 15600Kbit linklayer atm burst 1599b cburst 1597b
    class htb 1:50 parent 1:1 leaf 50: prio 5 rate 1950Kbit overhead 8 ceil 15600Kbit linklayer atm burst 1599b cburst 1597b
    class htb 1:60 parent 1:1 leaf 60: prio 6 rate 3900Kbit overhead 8 ceil 15600Kbit linklayer atm burst 1599b cburst 1597b
    class htb 1:70 parent 1:1 leaf 70: prio 7 rate 975Kbit overhead 8 ceil 15600Kbit linklayer atm burst 1599b cburst 1597b
    class htb 1:80 parent 1:1 leaf 80: prio 8 rate 975Kbit overhead 8 ceil 15600Kbit linklayer atm burst 1599b cburst 1597b
    class htb 1:90 parent 1:1 leaf 90: prio 9 rate 975Kbit overhead 8 ceil 17550Kbit linklayer atm burst 1599b cburst 1597b
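The rate and ceil figures in this listing are all whole percentages of the 19500Kbit root class; for example, 975Kbit is 5% and 17550Kbit is 90%. A small sketch of that arithmetic follows. The helper name is hypothetical, and the percentages are inferred from the numbers in the listing rather than read from the QoS settings page.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of a rate given in Kbit.
pct_of_kbit() {
    echo $(( $1 * $2 / 100 ))
}

root=19500   # root class rate from the listing, in Kbit

echo "1:10 rate: $(pct_of_kbit "$root" 5)Kbit"    # listing shows 975Kbit
echo "1:10 ceil: $(pct_of_kbit "$root" 20)Kbit"   # listing shows 3900Kbit
echo "1:90 ceil: $(pct_of_kbit "$root" 90)Kbit"   # listing shows 17550Kbit
```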
    
    Help please.
     
    Last edited: Jan 9, 2017
    visceralpsyche likes this.
  3. comego

    comego Network Newbie Member

    So, after spending about a day trying to figure this out, I am going to give up. The qos scripts are "broken" on multiple targets and multiple releases (toastman, shibby, advancedtomato). They all share the basic stuff written around 2011-2012 although they call it differently.

    As far as I can tell, any traffic that escapes classification is not subject to QOS and bypasses the whole QOS tree. I found that out the hard way because I have some vpns running on non-standard ports.

    The fix must be easy, but I am short of time and the couple hours I studied tc and htb were only enough to let me read the scripts.

    As a workaround, I can offer you a patch, which can be implemented on any router, and which consists in marking all packets that enter the router in the default class, which for p2p/bulk is class 0x9 (1:90 in the tree above).

    As any modification in the iptables will wipe the whole ruleset, I resorted to using a cron job to reapply this mark in the prerouting mangle table like this:

    Code:
    iptables -t mangle -L PREROUTING|grep 'MARK set' >/dev/null || iptables -t mangle -I PREROUTING -j MARK --set-mark 0x9
    I run this every 3 minutes. If the rule is there it won't add another rule. After any change, the most time it takes is 3 minutes to reestablish proper qos.
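The check-then-insert logic can be sketched as a pair of small functions. This is an illustration only: `has_mark_rule` and `ensure_default_mark` are names made up here, the grep pattern assumes the listing prints the target as `MARK set 0x9`, and matching on the exact value is a small tightening of the one-liner above, which matches any `MARK set` rule.

```shell
#!/bin/sh
# Succeeds if the rule listing on stdin already has a MARK rule for mark $1.
# The trailing group stops 0x9 from also matching 0x90 or 0x9f.
has_mark_rule() {
    grep -Eq "MARK set $1([^0-9a-fA-F]|\$)"
}

# Insert the catch-all mark only when it is missing. This needs root and
# iptables, so it is defined here but not called.
ensure_default_mark() {
    iptables -t mangle -L PREROUTING -n | has_mark_rule 0x9 ||
        iptables -t mangle -I PREROUTING -j MARK --set-mark 0x9
}

# Dry run against sample listing lines (made up for this example).
echo 'MARK  all  --  0.0.0.0/0  0.0.0.0/0  MARK set 0x9' | has_mark_rule 0x9 && echo "rule present"
echo 'MARK  all  --  0.0.0.0/0  0.0.0.0/0  MARK set 0x90' | has_mark_rule 0x9 || echo "rule missing"
```

On the router, the cron entry would call `ensure_default_mark` in place of the one-liner.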

    I really hope Toastman or the other qos gurus can look at this and fix it, but for the time being, I'm fine with this workaround.

    comego
     
  4. Toastman

    Toastman Super Moderator Staff Member Member

    Hi, I just read your posts and will take a look at it when I have time, hopefully it will be reproducible here. There are a few of us that might be able to fix it.
     
  5. Porter

    Porter LI Guru Member

    Hey,

    just checking: you haven't been using QoS and Bandwidth Limiter at the same time, correct?

    Could you please post your output of /tmp/etc/qos, /tmp/etc/iptables, ifconfig and screenshots of Basic/Network, QoS/Basic Settings (remember to scrub personal data first)?

    Are you using scripts in Administration/Scripts?
     
  6. comego

    comego Network Newbie Member

    Thank you for looking into it. Mine is a fairly stock setup. No fancy routing/scripts/forwarding. It is just plain router with a DHCP Wan interface and simple nat for the lan (br0 between eth and wifi). I'm not using the limiter, but I tried to enable it for a test and it made no difference (qoslimit is created in /tmp/etc but again it is not obeyed).

    I can assist with more detailed logs from my configuration, but it seems that the issue is very generic. I tried different qdisc (sfq/pfifo/fq_codel) and they all behave the same. Perhaps the issue is simply with the first HTB class which fails to use the "default" class.

    You can easily test for this use case by running a dslreports or ookla speed test from the lan of the router. In my case they use ports which are not in the classifier and will max out the bandwidth of the WAN. I set a very low limit like 1Mbit up/down on a fiber connection and I can tell right away if qos is working or not.
     
  7. koitsu

    koitsu Network Guru Member

    dslreports' Speed Test uses HTTPS (TCP port 443), not something odd/weird. I'm happy to provide the proof if you want it, but it's pretty easy to demonstrate: visit the Speed Test page, then before clicking your network type, hit F12 in Chrome and go to the Network tab. Then watch. Once you get a list of (many) URLs it's hitting (multiples which are usually 10-20MBytes), mouseover them (or click them if you want further details) and you'll see they're all things like https://t49.dslreports.com/front/k?c=Jkv4KKvG-1484089521102-127 (HTTP params will vary). Attached is a screenshot proving it.

    I'm not refuting anything else in the thread/what you've provided, only what I've quoted. I'm happy to do packet captures providing further evidence if you don't believe me.
     


  8. Toastman

    Toastman Super Moderator Staff Member Member

    I just connected R7000 to Fiber line, 10/50 Mbps. Set max bandwidth to 6 Mbps both in/out. Class limits to 100%.

    Results as shown. Changing class limits also works ok.

    Default classification seems to be respected. Maybe I am misunderstanding something. ?
     


  9. comego

    comego Network Newbie Member

    Man, I feel ashamed and I owe you all an apology.

    I reset the router completely and now qos is working. I must say I switched between a few versions in an attempt to diagnose the issue without clearing the nvram, so this could have been the problem. I should have done this before going down the rabbit hole above.

    I did save my previous nvram, so I'll punish myself by finding out what caused this issue and by learning tc properly ;)

    Thanks, and sorry again.
    comego

    ps @koitsu: the tests use a mixture of port 80 and 443 but also port 8080 and 8888, depending on upload/download and whether it's ookla or dslreports. The upload test on ookla is done using port 8888 for example.
     
  10. Toastman

    Toastman Super Moderator Staff Member Member

    nvm

    Glad it's ok now!
     
  11. Sortec

    Sortec Network Newbie Member

    I think toastman's version was the only one that was working. Shibby's and advancedtomato are, I think, still broken. I currently run Shibby and do not get very good results with QOS running.
     
  12. kacheng

    kacheng LI Guru Member

    @comego, did you resolve the problem on RT-N16 using AdvancedTomato 138?
    Or some other combination?

    Thanks
     
  13. Jose C

    Jose C Connected Client Member

    I think some users have reported that QoS on 138 is broken, and since shibby is not around the project (at least for now), the solution will be to go back to an earlier version.
     
  14. comego

    comego Network Newbie Member

    Well, I got both Toastman (latest, 1.28.9008.5) and Advancedtomato (latest, 138) to work properly.

    My issue was that I had a default but not clean install. I had this new stock netgear R6400 router and flashed it to an "initial" image, then to advanced tomato, then shibby, then toastman, and back and forth a few times while I was trying out the differences and deciding which one to go with.

    This router had never received a proper nvram erase since the first flash, and as most of the settings were recognized on all firmwares, I was too lazy to redo all the configs every time. However, on the qos part I suspect that things diverge a little between versions and that the nvram variables are either named slightly differently or interpreted differently or both, so that an erase is necessary when switching. I think some of the reports of broken qos are due to not properly clearing nvram (as in my case).

    Also, default QOS is very good but is not a silver bullet. It will almost certainly benefit from your customizations and adaptations to your special case. For example I have vpn(s) with windows shares and voip services inside the tunnel and a few other things that needed custom classification rules. Our RTP streams are in a different range than what is default and all this needs to be set up properly. Just enabling the checkbox might lead to sub-optimal performance, but that does not mean QOS is not working.
     
    Toastman likes this.
  15. comego

    comego Network Newbie Member

    I'm having a hard time understanding the CONNMARK --set-return statement. It seems to have been removed from the latest kernels and perhaps was a patch in the beginning.

    My guess is that it marks the return packets for an established connection. Or does it set a mask and return to the parent chain?

    Also, what does the mask do (the part after the /, which is usually /FF)?
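On the mask part of the question: in the standard iptables `mark` match, `--mark value/mask` ANDs the packet's mark with the mask before comparing it to the value, so a mask of 0xFF means only the low 8 bits are considered. The sketch below shows just that arithmetic. The function names are made up, and it does not attempt to describe the `--set-return` option itself.

```shell
#!/bin/sh
# Print the bits of a mark that survive a mask, in hex.
apply_mask() {
    printf '0x%x\n' "$(( $1 & $2 ))"
}

# Succeeds when (mark & mask) equals value, which is the comparison used by
# "-m mark --mark value/mask". Arguments: mark, value, mask.
mark_matches() {
    [ "$(( $1 & $3 ))" -eq "$(( $2 ))" ]
}

# A mark with extra high bits set still matches 0x9 under a 0xFF mask.
apply_mask 0x100009 0xFF
mark_matches 0x100009 0x9 0xFF && echo "matches 0x9/0xFF"
```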
     
  16. kacheng

    kacheng LI Guru Member

    Thanks @comego.
    I'm using AT138 on a RT-N16. I did clear the nvram and reconfigured the router from scratch, but it seems all the incoming traffic is being classified as default class only.
    There's no differentiation at all. :(
     
  17. eangulus

    eangulus Network Guru Member

    Does this help at all? https://bitbucket.org/pl_shibby/tomato-arm/issues/74/inbound-qos-problem

    The fix offered there works for the RT-AC68U: it gets QoS working properly again on Shibby v138.

    Would love to get the same done for the RT-AC3200 on Shibby 138 also. QoS is the ONLY thing driving me nuts on this router. Still debating whether to go back to 132 or not. On one hand QoS works, on the other I lose WAN (only matters for one of my setups) and I also lose the adblock too.
     
    kacheng likes this.
  18. kacheng

    kacheng LI Guru Member

    Thanks @eangulus!

    The Firewall script you referenced seems to help on my setup and I now see that QOS appears to be working on inbound connections!

    Have you been finding adblock useful enough to keep around? I have mixed feelings about it.
     
    Last edited: Jan 19, 2017 at 8:10 PM
  19. eangulus

    eangulus Network Guru Member

    In regards to Adblock, I love it. Needed to add a few exceptions at first as expected.


    For the script above, do you or anyone else know how to modify it to do the same for an RT-AC3200? The script doesn't seem to work on that router. Odds are it has slightly different variables or something and I'm not good enough in programming to work it out (have tried though).

    Getting QoS to work on RT-AC3200 would at least ease the issue of Shibby being AWOL at the moment.
     
