Discussion in 'Tomato Firmware' started by Sean B., Jul 1, 2018.

  1. Sean B.

    Sean B. LI Guru Member

    Hey there, hope everyone's doing well! I wanted to post some info I came across and get opinions on it regarding Broadcom's infamous/mysterious CTF module. All the posts I've come across and discussions I've been involved in on the topic seem to share a few points:

    A.) The CTF module is closed source, and likely came from mystical unicorns.

    B.) It breaks things. Other functionality provided in Tomato often stops working altogether, or produces erroneous results when CTF is enabled.

    C.) When one has a fast WAN connection, running with CTF enabled can yield significant throughput increases.

    D.) CTF likely stands for "Cut-through Forwarding".

    And that's about it.

    Today I came across this snippet in a technical article on network hardware and was rather surprised, as I was not familiar with the multiple switching modes, but even more so by the naming and description of one mode in particular:

    As you can see, cut-through mode gets the frame in and out as fast as possible, but it ignores most of the information the frame contains, information that modules hooked into other areas of the network stack would expect and need to use further down the line.
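    The article's snippet itself isn't preserved here, but the distinction it describes can be sketched in a few lines. This is a toy model under stated assumptions, not how any switch ASIC (or ctf.ko) is implemented: frames are plain byte strings laid out as destination MAC, source MAC, EtherType, payload and a 4-byte FCS, and the function names and the mac_table dictionary are hypothetical. Store-and-forward buffers the whole frame and checks the FCS before choosing a port; cut-through decides as soon as the 6-byte destination address has arrived, so a corrupted frame still goes out.

```python
import struct
import zlib
from typing import Dict, Optional

# Toy frame layout (assumption): dst MAC (6) | src MAC (6) | EtherType (2) | payload | FCS (4)
MacTable = Dict[bytes, str]


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a CRC-32 FCS, least-significant byte first, as commonly done in software."""
    return frame_without_fcs + struct.pack("<I", zlib.crc32(frame_without_fcs) & 0xFFFFFFFF)


def store_and_forward(frame: bytes, mac_table: MacTable) -> Optional[str]:
    """Buffer the whole frame, verify its FCS, then pick the egress port."""
    body, fcs = frame[:-4], frame[-4:]
    if struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) != fcs:
        return None  # corrupted frame is dropped
    # Unknown destinations are flooded to all ports, as a learning switch would.
    return mac_table.get(frame[:6], "flood")


def cut_through(frame: bytes, mac_table: MacTable) -> str:
    """Pick the egress port from the destination MAC alone (first 6 bytes).

    Nothing after the destination address, including the FCS, is examined
    before forwarding begins, so corrupted frames are forwarded too.
    """
    return mac_table.get(frame[:6], "flood")
```

    Note that cut_through() gives the same answer when handed only the first six bytes of a frame; that early decision is exactly what makes it fast, and exactly why it never sees the rest of the frame.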

    Might this be all the CTF module is? A way to "switch" modes of the "switch"? Nothing magical or based on alien technology, rather just packing two operating models of a switch into one unit? What do you guys think?
    Last edited: Jul 1, 2018
  2. maurer

    maurer Network Guru Member

    I think it's more than cut-through switching. The most notable example is the RT-N16: with Tomato, CTF and a PPPoE WAN it can handle a little over 150 Mbps, while with the latest Asus firmware or Merlin's fork it can go up to 850 Mbps.
  3. Sean B.

    Sean B. LI Guru Member

    Thanks for the input! However, I'd point out that comparing the performance of entirely different firmware (in terms of age, features, OEM proprietary hardware/software access, which Asus and Merlin both have, etc.) isn't terribly valid to begin with, let alone for drawing conclusions about one module within those firmwares. Would you agree?
  4. maurer

    maurer Network Guru Member

    What's the point of exploring CTF if not for better performance?
  5. Sean B.

    Sean B. LI Guru Member

    I wasn't referring to performance as the debated point; I was referring to the fact that you're comparing the performance of completely different firmware. As such, a vast number of things will cause a performance difference between them, which rules out comparing one single module and its specific impact on that performance. This holds even for a basic test such as speed testing with CTF loaded and unloaded. Here's an example (the values used are arbitrary and are not meant to reflect actual differences on any firmware):

    Tomato: CTF unloaded - WAN speed max 75mbps | CTF loaded - WAN speed max 110mbps

    Asus firmware: CTF unloaded - WAN speed max 85mbps | CTF loaded - WAN speed max 145mbps

    Tomato gain via CTF = 35mbps

    Asus firmware gain via CTF = 60mbps
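    To make the arithmetic above explicit, here's a small sketch using the post's deliberately arbitrary numbers (not measurements). It reports both the absolute and the relative gain, since each firmware starts from a different CTF-unloaded baseline; the names RESULTS and ctf_gain are just illustrative.

```python
from typing import Dict, Tuple

# Illustrative values from the post: explicitly arbitrary, not measurements.
RESULTS: Dict[str, Tuple[float, float]] = {
    "Tomato": (75, 110),         # (CTF unloaded, CTF loaded), in Mbps
    "Asus firmware": (85, 145),
}


def ctf_gain(unloaded_mbps: float, loaded_mbps: float) -> Tuple[float, float]:
    """Return (absolute gain in Mbps, gain relative to the CTF-unloaded baseline)."""
    gain = loaded_mbps - unloaded_mbps
    return gain, gain / unloaded_mbps


for firmware, (off, on) in RESULTS.items():
    absolute, relative = ctf_gain(off, on)
    print(f"{firmware}: +{absolute} Mbps ({relative:.1%})")
# prints:
# Tomato: +35 Mbps (46.7%)
# Asus firmware: +60 Mbps (70.6%)
```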

    On the surface this may look like CTF performs much better in Asus firmware than in Tomato, and that the module itself is improved and doing something different. However, even CTF uses other parts of the firmware it's running in, most notably the kernel and its network stack, and certainly other areas as well, though since it's closed source I have no knowledge of the specifics. Any of these can be different versions between the two firmwares, on top of different custom proprietary code and changes. This is especially true of OEM firmware such as Asus's (and therefore Merlin's), as OEMs have access to licensed information and can tailor their code to maximize the benefits of Broadcom's hardware. These differences could increase or decrease the gains provided by the CTF module itself.

    At least, that would be my opinion anyway.
    Last edited: Jul 1, 2018
  6. maurer

    maurer Network Guru Member

  7. koitsu

    koitsu Network Guru Member

    We suspect CTF bypasses portions of the general Linux networking stack (read: not the IP stack, but the literal networking stack). A netfilter packet-flow diagram should come to mind here, but that covers only netfilter -- I think what ctf.ko interfaces with sits somewhere "around" or "before" that. I've suspected that several of the intermediate processing stages are literally bypassed, or that the module acts as an intermediary layer between kernel and userland; in Linux land, the mechanism for the latter is called netlink. It may do a lot more than that as well. I have not looked into it until literally right this moment -- honest.
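    For readers unfamiliar with netlink: it is a socket-based messaging interface between the kernel and userland, and every message starts with a fixed 16-byte header (struct nlmsghdr). The netlink protocol number and message layout ctf.ko uses are unknown; the only hints are the netlink_kernel_create/netlink_unicast imports and the ctf_netlink_sock_cb symbol in the listing later in this post. So the sketch below uses the standard rtnetlink RTM_GETLINK request purely to illustrate how such messages are framed. The constants come from the standard <linux/netlink.h> and <linux/rtnetlink.h> headers; the function names are hypothetical.

```python
import socket
import struct
from typing import Dict

# Values from the standard Linux headers <linux/netlink.h> and <linux/rtnetlink.h>.
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300  # NLM_F_ROOT | NLM_F_MATCH
RTM_GETLINK = 18

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid (16 bytes)
NLMSGHDR = struct.Struct("=IHHII")
# struct ifinfomsg: u8 family, u8 pad, u16 type, s32 index, u32 flags, u32 change (16 bytes)
IFINFOMSG = struct.Struct("=BxHiII")


def build_getlink_dump(seq: int) -> bytes:
    """Build a 'dump all links' request: netlink header followed by an ifinfomsg payload."""
    payload = IFINFOMSG.pack(socket.AF_UNSPEC, 0, 0, 0, 0)
    # nlmsg_len covers header + payload; the sender port id (last field) is left as 0.
    header = NLMSGHDR.pack(NLMSGHDR.size + len(payload), RTM_GETLINK,
                           NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + payload


def parse_header(buf: bytes) -> Dict[str, int]:
    """Decode the fixed netlink header at the start of a message."""
    length, msg_type, flags, seq, pid = NLMSGHDR.unpack_from(buf)
    return {"len": length, "type": msg_type, "flags": flags, "seq": seq, "pid": pid}
```

    On Linux, the bytes from build_getlink_dump() could be sent over socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE); the sketch stops at building and parsing headers so it runs anywhere.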

    One of the complications of understanding what the CTF module is doing under the hood is that it requires reverse-engineering. Just the fact that I've said the term probably flags me as a potential problem source with Broadcom. Broadcom has pretty strict rules about not reverse-engineering their binaries, and you'll find they're (understandably) stripped, which makes RE'ing even more difficult. On the other hand, ctf.ko (at least on ARM) actually isn't stripped, so strings and objdump -t reveal some function names. I don't know exactly how some of the fellows who made things like open-source replacements for wl (which actually calls the Broadcom API functions directly by name, conforming to their requirements) managed to do it without being sued; maybe they live in countries where the DMCA doesn't exist or IP law differs from that of the United States. It is very important to remember that, much like Rambus, Broadcom is a very large and very lawyer-oriented company. They take their IP very seriously and have zero qualms about dragging anyone of any size into court.

    What I can tell you, and this is about all I'm willing to go into (because RE'ing this work actually involves staring at ARM assembly code for hours on end -- you can't "decompile" this, only disassemble it):

    root@gw:/lib/modules# objdump -t /lib/modules/
    /lib/modules/     file format elf32-littlearm
    00000000 l    d  .text  00000000 .text
    00000000 l    d  .text.fastpath 00000000 .text.fastpath
    00000000 l    d  .init.text     00000000 .init.text
    00000000 l    d  .exit.text     00000000 .exit.text
    00000000 l    d  .rodata        00000000 .rodata
    00000000 l    d  .rodata.str1.4 00000000 .rodata.str1.4
    00000000 l    d  __ksymtab_strings      00000000 __ksymtab_strings
    00000000 l    d  .ARM.extab     00000000 .ARM.extab
    00000000 l    d  .bss   00000000 .bss
    00000000 l     F .text.fastpath 0000009c _ctf_brc_lkup_ll
    00000000 l     F .text  00000028 _ctf_ipc_count_get
    0000009c l     F .text.fastpath 000000a4 _ctf_ipc_lkup_l4proto_v6
    00000028 l     F .text  00000054 _ctf_fa_register
    0000007c l     F .text  0000002c _ctf_live
    000000a8 l     F .text  00000088 _ctf_detach
    00000130 l     F .text  00000108 _ctf_dev_vlan_delete
    00000238 l     F .text  00000120 _ctf_dev_vlan_add
    000003c0 l     F .text  000000b0 _ctf_enable
    00000470 l     F .text  00000008 _ctf_ipc_release
    00000478 l     F .text  00000358 _ctf_ipc_delete_multi
    00000000 l       .bss   00000000 .LANCHOR0
    000007d0 l     F .text  00000008 _ctf_brc_release
    000007d8 l     F .text  00000040 _ctf_brc_lkup
    00000818 l     F .text  00000090 _ctf_brc_update
    000008a8 l     F .text  00000064 _ctf_isbridge
    0000090c l     F .text  0000008c _ctf_isenabled
    00000a1c l     F .text  000001f4 _ctf_dev_unregister
    00000c10 l     F .text  00000144 _ctf_brc_delete
    00000140 l     F .text.fastpath 0000011c _ctf_ipc_lkup_ll
    00000dc0 l     F .text  00000268 _ctf_cfg_req_process
    00001028 l     F .text  00000064 _ctf_ipc_lkup_fn
    00001478 l     F .text  00000178 _ctf_ipc_action
    000015f0 l     F .text  00000110 _ctf_ipc_delete_range
    00001700 l     F .text  000001dc _ctf_ipc_delete
    000018dc l     F .text  0000034c _ctf_dump
    00001c28 l     F .text  000000d8 _ctf_dev_register
    00001d00 l     F .text  00000164 _ctf_ipc_add
    00001e64 l     F .text  000000d4 _ctf_brc_add
    00001f38 l     F .text  00000098 _ctf_fa_conntrack
    0000025c l     F .text.fastpath 00000c28 _ctf_forward
    000027c8 l     F .text  00000084 ctf_netlink_sock_cb
    00000000 l    d  .ARM.extab.text.fastpath       00000000 .ARM.extab.text.fastpath
    00000000 l    d  .ARM.exidx.text.fastpath       00000000 .ARM.exidx.text.fastpath
    00000000 l    d  .ARM.extab.init.text   00000000 .ARM.extab.init.text
    00000000 l    d  .ARM.exidx.init.text   00000000 .ARM.exidx.init.text
    00000000 l    d  .ARM.extab.exit.text   00000000 .ARM.extab.exit.text
    00000000 l    d  .ARM.exidx.exit.text   00000000 .ARM.exidx.exit.text
    00000000 l    d  .modinfo       00000000 .modinfo
    00000000 l    d  __ksymtab      00000000 __ksymtab
    00000000 l    d  __kcrctab      00000000 __kcrctab
    00000000 l    d  .data  00000000 .data
    00000000 l    d  .gnu.linkonce.this_module      00000000 .gnu.linkonce.this_module
    00000000 l    d  .ARM.exidx     00000000 .ARM.exidx
    00000000 l    d  __versions     00000000 __versions
    00000000 l    d  .note.GNU-stack        00000000 .note.GNU-stack
    00000000 l    d  .ARM.attributes        00000000 .ARM.attributes
    00001fd0 g     F .text  00000404 _ctf_attach
    00000000         *UND*  00000000 dev_queue_xmit
    00000000         *UND*  00000000 bcm_ether_ntoa
    00000000         *UND*  00000000 csum_partial
    00000000 g     O .gnu.linkonce.this_module      00000150 __this_module
    00000000         *UND*  00000000 osl_malloc
    00000000         *UND*  00000000 __aeabi_unwind_cpp_pr0
    00000000 g     F .exit.text     00000038 cleanup_module
    00000000         *UND*  00000000 ppp_txstats_upd
    00000000         *UND*  00000000 memcpy
    00000000         *UND*  00000000 bcm_bprintf
    000025a4 g     F .text  00000224 ctf_kdetach
    00000000         *UND*  00000000 osl_pkt_frmnative
    00000000 g     F .init.text     000000c0 init_module
    00000000         *UND*  00000000 ppp_rxstats_upd
    00000000         *UND*  00000000 _raw_spin_unlock_bh
    00000000         *UND*  00000000 osl_ctfpool_stats
    00000000         *UND*  00000000 init_net
    35783074 g       *ABS*  00000000 __crc__ctf_attach
    00000000         *UND*  00000000 skb_pull
    ce915761 g       *ABS*  00000000 __crc_ctf_kdetach
    00000000         *UND*  00000000 netlink_unicast
    00000000         *UND*  00000000 ctf_attach_fn
    00000000         *UND*  00000000 skb_push
    00000000         *UND*  00000000 strncpy
    00000000         *UND*  00000000 netlink_kernel_release
    00000000         *UND*  00000000 memcmp
    00000000         *UND*  00000000 printk
    00000000         *UND*  00000000 netlink_kernel_create
    00000000         *UND*  00000000 __memzero
    000023d4 g     F .text  000001d0 ctf_kattach
    cd5ac3e5 g       *ABS*  00000000 __crc_ctf_kattach
    00000000         *UND*  00000000 jiffies
    00000000         *UND*  00000000 sprintf
    00000000         *UND*  00000000 skb_clone
    00000000         *UND*  00000000 _raw_spin_lock_bh
    00000000         *UND*  00000000 osl_pktfree
    00000000         *UND*  00000000 osl_mfree
    00000000         *UND*  00000000 getintvar
    00000000         *UND*  00000000 kcih
    00000000         *UND*  00000000 __aeabi_unwind_cpp_pr1

    Anything starting with an underscore is likely an internal function of the module itself, or alternatively a deeper kernel function (very possible). Anything double-underscored is likely generated by the compiler. Other functions may be common kernel functions (e.g. printk()) or similar kernel API/ABI functions. Remember: this is a kernel module, so the symbols you see above are not libc, despite some of their names being identical (e.g. memcpy, sprintf)!
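    The naming heuristics above can be applied mechanically to objdump -t output. This is a minimal sketch, assuming the column layout shown in the listing (value, flag characters, section, size, name) and classifying purely by name and section, so it inherits the same guesswork: for instance, _ctf_attach carries the g (global) flag despite its leading underscore. The Symbol type and the function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Symbol(NamedTuple):
    value: int
    flags: str    # objdump flag characters, e.g. "l", "g", "F", "O", "d"
    section: str  # e.g. ".text", ".text.fastpath", "*UND*", "*ABS*"
    size: int
    name: str


def parse_line(line: str) -> Optional[Symbol]:
    """Parse one objdump -t symbol line; return None for anything else."""
    tokens = line.split()
    if len(tokens) < 4:
        return None
    try:
        value = int(tokens[0], 16)
        size = int(tokens[-2], 16)
    except ValueError:
        return None  # e.g. the "file format elf32-littlearm" header line
    # Tokens between the value and the section (if any) are flag characters.
    return Symbol(value, "".join(tokens[1:-3]), tokens[-3], size, tokens[-1])


def classify(sym: Symbol) -> str:
    """Apply the post's naming heuristics; these are educated guesses, not facts."""
    if "d" in sym.flags:
        return "section"    # section symbols such as ".text"
    if sym.section == "*UND*":
        return "import"     # resolved at load time against the kernel or other modules
    if sym.name.startswith("__"):
        return "generated"  # e.g. __crc_* entries, __this_module
    if sym.name.startswith("_"):
        return "internal"   # likely module-internal (heuristic)
    return "other"          # e.g. ctf_kattach, init_module
```

    To use it, feed each line of the objdump -t output through parse_line() and tally classify() over the results that aren't None.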

    * I can see several functions here that look interesting, but only because of their names. As I suspected, I do see some netlink calls, which means this driver could be offloading userspace <--> kernel network socket handling for specific kinds of sockets. My feeling is that it's doing a lot more than that, but one of the slowest things in *IX land, and always has been, is userland<-->kernel interfacing -- it's why kernel modules run so much faster (natively) than userland things (a good example: FUSE)
    * skb_*() functions do not surprise me -- these are what the networking stack and kernel use for networking (packet) buffers
    * For the kernel functions, you can find documentation on most, if not all, of these. For all the bcm_*() functions, these are likely proprietary
    * The underscore-prefixed functions named _ctf_*() are internals (don't miss the one called _ctf_fa_conntrack(), which implies it may be implementing its own conntrack methodology outside of netfilter)
    * I even see some bridging and vlan-related bits in there. Don't know what "brc" might refer to in this case.
    * One function that is particularly weird and stands out above all others is the inclusion of ppp_rxstats_upd(). No clue what that's doing in there
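    Grouping the undefined (*UND*) imports from the listing by prefix makes the observations above easier to see at a glance. In this sketch the prefix list is an assumption chosen to match the bullets, and osl_ is presumed (not confirmed) to be another Broadcom-proprietary layer like bcm_; group_by_prefix is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Undefined (*UND*) symbols copied from the objdump -t listing above.
IMPORTS = """dev_queue_xmit bcm_ether_ntoa csum_partial osl_malloc __aeabi_unwind_cpp_pr0
ppp_txstats_upd memcpy bcm_bprintf osl_pkt_frmnative ppp_rxstats_upd _raw_spin_unlock_bh
osl_ctfpool_stats init_net skb_pull netlink_unicast ctf_attach_fn skb_push strncpy
netlink_kernel_release memcmp printk netlink_kernel_create __memzero jiffies sprintf
skb_clone _raw_spin_lock_bh osl_pktfree osl_mfree getintvar kcih __aeabi_unwind_cpp_pr1""".split()

# Prefixes of interest (assumption): socket buffers, netlink, Broadcom helpers, PPP stats.
PREFIXES = ("skb_", "netlink_", "bcm_", "osl_", "ppp_")


def group_by_prefix(names: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket symbol names by the first matching prefix, else 'other'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        bucket = next((p for p in PREFIXES if name.startswith(p)), "other")
        groups[bucket].append(name)
    return dict(groups)


for prefix, names in sorted(group_by_prefix(IMPORTS).items()):
    print(f"{prefix:10} {len(names):2}  {', '.join(names)}")
```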

    I cannot really dive into this deeper for several reasons, and I will list those reasons in no particular order:

    * See above legal/DMCA/IP concerns -- I have no interest in being involved in any legal matters. Been there in my youth, do not want to re-visit such, ever
    * I lack familiarity with low-level ARM CPU architecture (read: instruction set)
    * I lack familiarity (kernel-level) with Linux's networking stack, specifically the very, very deep innards
    * I lack the time (I'm working 2 jobs right now, and crunch time is happening at one; this is not bragging, it's reality)

    If I were to take a step back, I would suggest that a good starting point might be to document, if at all possible (and I don't know how you'd even do this), which sorts of things bypass what. Maybe some magic things bypass the entire netfilter stack (which is entirely CPU-bound, obviously). One interesting thing to ponder: does ctf.ko only work with IPv4, or does it also work with IPv6? :) I do see an internal function called _ctf_ipc_lkup_l4proto_v6(), which may imply it supports IPv6 on some level.

    Visibility into this is substantially limited because many compile-time kernel features are normally omitted on embedded devices due to disk and RAM space concerns, so this is super painful.

    One thing to check for sure would be: how much of netfilter is being bypassed? It's been suspected that ctf.ko has bypassed parts of the firewalling stack, as well as QoS (tc). Do we know for absolute 100% certain which of those is true? (The former seems dangerous! The firewalling stack is netfilter/iptables)
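    One way to reason about "what bypasses what" is a toy flow cache: the first packet of a connection goes through the full slow path (netfilter, QoS), and once the flow is accepted, later packets are forwarded straight from the cache. This is a hypothetical model for discussion, loosely inspired by the _ctf_ipc_add/_ctf_ipc_lkup* names, and not Broadcom's design; the FastPath class, the make_key() helper and the callback standing in for netfilter are all invented for illustration. It shows two things raised above: whether IPv6 is supported comes down to what the lookup key can hold, and a rule added after a flow is cached is never consulted for that flow unless the cache is flushed, which this toy never does.

```python
import ipaddress
from typing import Callable, Dict, NamedTuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class FlowKey(NamedTuple):
    src: IPAddress
    dst: IPAddress
    proto: int  # L4 protocol number: 6 = TCP, 17 = UDP
    sport: int
    dport: int


def make_key(src: str, dst: str, proto: int, sport: int, dport: int) -> FlowKey:
    """Build a flow key; ipaddress.ip_address() accepts both IPv4 and IPv6 text."""
    return FlowKey(ipaddress.ip_address(src), ipaddress.ip_address(dst), proto, sport, dport)


class FastPath:
    """Hypothetical flow cache in front of a slow path (stand-in for netfilter/QoS)."""

    def __init__(self, slow_path: Callable[[FlowKey], bool]) -> None:
        self.slow_path = slow_path  # returns True to accept a new flow
        self.cache: Dict[FlowKey, str] = {}
        self.slow_path_hits = 0

    def forward(self, key: FlowKey) -> str:
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # fast path: no rules are evaluated at all
        self.slow_path_hits += 1
        if not self.slow_path(key):
            return "drop"
        self.cache[key] = "wan"  # hypothetical egress interface
        return "wan"
```

    For example, accept a TCP flow to port 443, then add a rule blocking port 443: the cached flow keeps being forwarded, while a new flow to the same port is dropped by the slow path.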

    As for the links maurer gave, here's my response to those (I'm being very neutral!):

    First link: this is for some Atheros (a.k.a. Qualcomm, who bought them in 2011) chipsets. The ctf.ko module is Broadcom-specific, and Tomato is Broadcom-centric. If you can find the equivalent (and open-source) for Broadcom, that would be different -- and we'd need to know which ICs and chips it supports.

    Second link: this provides only an interface (at the kernel level) for hardware vendors to interface with netfilter's flow table directly in some manner. The code for each of the vendors/devices/chips/etc. as an offload source is lacking; only the interface to it is provided.

    Both links pertain to newer kernels, while TomatoUSB uses a 2.6.x kernel for reasons that relate to its reliance on Broadcom's binaries (particularly the wireless driver; some bridging or VLAN capabilities may be faster on TomatoUSB, I'm not sure what all Broadcom has done with the Linux kernel code). What's always overlooked about LEDE/OpenWRT: they have virtually no wireless driver support for commonplace Broadcom routers. Backporting this work is almost certainly not feasible (esp. considering iptables/netfilter on MIPS is a different version than on ARM).

    Random thoughts over.
  8. Sean B.

    Sean B. LI Guru Member

    That was a particularly interesting writeup, Koitsu, thanks! While I'm not looking to get myself on the radar of any large company's legal team, I can't help but be curious as to what's going on inside that module.
  9. maurer

    maurer Network Guru Member

    This is how DD-WRT implemented it for the Broadcom MIPS 3.10 kernel.

    They actually have Broadcom source code for devices up to BCM4718 (not newer, unfortunately).
    I've used it on an E2000 using 5GHz with very good results (Tomato seemed to crash on that device).
    Last edited: Jul 3, 2018