DNSMasq runs away?

Discussion in 'Tomato Firmware' started by gfunkdave, Nov 10, 2012.

  1. gfunkdave

    gfunkdave LI Guru Member

    Anyone know why I'm suddenly seeing DNSMasq take up 100% of the router's CPU? I have even rebooted the router, and as soon as it comes back up the DNSMasq process runs away:

    Mem: 24020K used, 103012K free, 0K shrd, 2956K buff, 9060K cached
    CPU:  27% usr  71% sys  0% nic  0% idle  0% io  0% irq  0% sirq
    Load average: 2.00 1.05 0.41 4/38 1237
      586    1 nobody  S    1108  1%  100% dnsmasq -c 1500 --log-async
    This is on an RT-N-16 running Toastman 1.28.7500 MIPSR2Toastman-RT K26 USB VPN.

    Thanks for ideas...this seems to be related to enabling CIFS client, IP Traffic monitoring, and bandwidth monitoring. When I disabled all three, CPU use went back down. Re-enabling them all still has the CPU at 1% load, so I will disable IP Traffic monitoring and see if that fixes things.

    Any thoughts?
  2. gfunkdave

    gfunkdave LI Guru Member

    Update: it is definitely the CIFS client causing this. I disabled everything I was using on /cifs1 and rebooted. When the router came back up, DNSMasq was once again at 100%. I disabled the CIFS client in Administration -> CIFS and rebooted. Now, the router is at 1% CPU load.

    What could be causing this? I haven't customized Toastman's Tomato in any way.

    Correction: sorry, miscopied the TOastman version I'm using. It's actually v1.28.7500 MIPSR2Toastman-VLAN-RT K26 USB VPN
  3. gfunkdave

    gfunkdave LI Guru Member

    Actually, I figured out how to replicate this.

    1. Set the router to log to /cifs1/syslog
    2. Reboot router.
    3. DNSMasq will suck up 100% CPU once the router reboots.

    Changing it to log to anything but a file called syslog seems to resolve the issue. I am going to keep monitoring it. Why would this be happening?
  4. koitsu

    koitsu Network Guru Member

    Thank you for tracking down how to replicate the issue. When you say "Set the router to log to /cifs1/syslog", are you referring to Administration -> Logging -> Custom Log Path & Filename?

    I have Entware installed so I can try to reproduce this + use strace to find out what the daemon is continually doing at a syscall level, or if it's actually the daemon doing nothing but the kernel is indefinitely holding a thread (this would show up as 100% CPU usage for the process even though the process isn't where the bug is. :) ).
  5. gfunkdave

    gfunkdave LI Guru Member

    Hi koitsu,

    Thanks for replying. Yes, I'm referring to Administration -> Logging -> Custom Log Path & Filename.

    It's odd...I thought I'd figured out how to replicate the issue. I just tried deleting /cifs1/syslog and rebooting...and the CPU is not at 100% utilization now. Maybe it was something about the syslog file itself that caused the issue? I will continue monitoring.

    Can anyone else replicate this?
  6. koitsu

    koitsu Network Guru Member

    gfunkdave, the next time you find this happening, can you please provide full output from top (or top -b -n 1) and not just dnsmasq? I need to see everything (all processes), not just one, so please do not strip/edit out anything. I have a theory as to what's going on but I don't have enough evidence to back it up at this time. Thanks.
  7. gfunkdave

    gfunkdave LI Guru Member

    Hi Koitsu,

    Thanks - here you go. I'm not sure why two instances of dropbear are running, either, now that I notice it...

    edit: apparently my theory on logging to a file called "syslog" was wrong. I just changed the file name back to "messages" and am experiencing the same thing on reboot. But here's another thing I noticed: it appears that maybe the problem starts when the log file gets to be about 50 kb in size:

    root@router:/tmp/home/root# ls -l /cifs1
    drwxrwxrwx    1 root    root            0 Jun 30 16:07 keys
    -rwxrwSrwx    1 root    root        51256 Nov 11 08:54 messages
    -rwxrwSrwx    1 root    root        51240 Nov 11 05:01 syslog
    -rwxrwSrwx    1 root    root        12074 Nov 11 08:54 tomato_cstats_bcaec5e7b047.gz
    -rwxrwSrwx    1 root    root          104 Nov 11 08:54 tomato_rstats_bcaec5e7b047.gz
    -r-xr-Sr-x    1 root    root          657 Jun 30 19:27 vpn-home-ta.key
    As you can see, both messages and syslog are about that size. And the time they were last modified (for syslog at least) is several hours ago. They should be at least getting the hourly "---MARK---" line.
    And sure enough, creating a new file results in the CPU load going back down to normal. Apparently it's some problem with logging to a file that is about 50 kb in size, which is the size set for log rotation. Perhaps it's having a problem rotating the log? Not sure why that would affect DNSMasq...

    Anyway, here is top's output:
    root@router:/tmp/home/root# top -b -n 1
    Mem: 24344K used, 102688K free, 0K shrd, 2956K buff, 9040K cached
    CPU:  16% usr  83% sys  0% nic  0% idle  0% io  0% irq  0% sirq
    Load average: 2.00 1.99 1.91 3/38 1438
      584    1 nobody  R    1108  1% 100% dnsmasq -c 1500 --log-async
    1188    1 root    S    3452  3%  0% /etc/openvpn/vpnserver1 --cd /etc/open
    1190    1 root    S    3092  2%  0% /etc/openvpn/vpnserver2 --cd /etc/open
    1144    1 root    S    2676  2%  0% httpd
    1182    1 root    S    1724  1%  0% udhcpc -i vlan2 -b -s dhcpc-event -H r
      559    1 root    S    1724  1%  0% crond -l 9
    1384  1381 root    S    1720  1%  0% -sh
      322  321 root    S    1716  1%  0% /bin/sh
    1438  1384 root    R    1712  1%  0% top -b -n 1
      329    1 root    D    1708  1%  0% syslogd -L -s 50 -O /cifs1/syslog -b 1
      331    1 root    S    1708  1%  0% klogd
      533    1 root    S    1708  1%  0% telnetd -p 23
        1    0 root    S    1384  1%  0% /sbin/init noinitrd
      320    1 root    S    1364  1%  0% buttons
      321    1 root    S    1332  1%  0% console
    1381  536 root    S    1260  1%  0% dropbear -p 22 -p 3283 -s -a
      570    1 root    S    1248  1%  0% cstats
      536    1 root    S    1196  1%  0% dropbear -p 22 -p 3283 -s -a
      543    1 root    S    1192  1%  0% nas
    1146    1 root    S    1080  1%  0% miniupnpd -f /etc/upnp/config
      561    1 root    S    1028  1%  0% rstats
      540    1 root    S    1012  1%  0% eapd
      575    1 root    S      756  1%  0% pptpd -c /tmp/pptpd/pptpd.conf -o /tmp
      281    1 root    S      748  1%  0% hotplug2 --persistent --no-coldplug
      555    2 root    SW<      0  0%  0% [cifsd]
      89    2 root    SW<      0  0%  0% [mtdblockd]
        5    2 root    SW<      0  0%  0% [khelper]
        3    2 root    SW<      0  0%  0% [ksoftirqd/0]
        2    0 root    SW<      0  0%  0% [kthreadd]
      327    2 root    SWN      0  0%  0% [jffs2_gcd_mtd3]
      47    2 root    SW<      0  0%  0% [aio/0]
      18    2 root    SW<      0  0%  0% [kblockd/0]
      44    2 root    SW      0  0%  0% [pdflush]
        4    2 root    SW<      0  0%  0% [events/0]
      46    2 root    SW<      0  0%  0% [kswapd0]
      552    2 root    SW<      0  0%  0% [cifsoplockd]
      554    2 root    SW<      0  0%  0% [cifsdnotifyd]
      45    2 root    SW      0  0%  0% [pdflush]
  8. koitsu

    koitsu Network Guru Member

    Yeah, okay, I see what's going on here. First and foremost, let's talk about some of the related technology bits so you can get an understanding of terminology and so on. My apologies in advance if you already know this stuff.

    The "logging" infrastructure piece that's being referred to is actually called syslog. Not the filename per se, just the actual function calls that provide universal logging functionality in UNIX/Linux. You can see there's a daemon called syslogd running -- that's what handles all the logging functionality. It also handles writing the log messages to the disk. So, in the GUI, the Custom Log Path & Filename correlates directly with the -O /cifs1/syslog argument in syslogd. The -O option defines a path/file of where to write syslog logs.

    The way a program (say, dnsmasq) talks to syslogd is via a standardised function (it comes with all libc implementations on UNIX/Linux; in this case, uClibc I believe). The function is called syslog(). So inside of dnsmasq's code, there are many calls to syslog(), which then talks to the syslogd daemon over a UNIX domain socket:

    root@gw:/tmp/home/root# ps auxw | grep syslogd
    root      4509  0.0  0.2   1708   332 ?        Ss   Oct23   0:00 syslogd -L -s 50 -b 1
    root     10434  0.0  0.2   1708   296 pts/0    S+   11:34   0:00 grep syslogd
    root@gw:/tmp/home/root# cd /proc/4509/fd
    root@gw:/proc/4509/fd# ls -l
    lrwx------    1 root     root            64 Nov 11 11:33 0 -> /dev/null
    lrwx------    1 root     root            64 Nov 11 11:33 1 -> /dev/null
    lrwx------    1 root     root            64 Nov 11 11:33 2 -> /dev/null
    lrwx------    1 root     root            64 Nov 11 11:33 3 -> socket:[97551]
    l-wx------    1 root     root            64 Nov 11 11:33 4 -> /tmp/var/log/messages
    See the /proc/4509/fd/3 file? That's the UNIX domain socket. :) File descriptor 4 happens to be where the actual log file is being written (in my case I don't use a custom log path, so it goes to the stock /tmp/var/log/messages). You would see this as going to /cifs1/syslog.

    Okay, so now we have that out of the way and you understand logging a bit better. Oh, BTW: syslog() is a really fantastic way to do "low-frequency" logging (meaning stuff that isn't coming in non-stop constantly), but for massively intensive logging (things like webserver logs, etc.) native filesystem I/O is the way to go.

    The 50KByte size "thing" you mention is important information too -- that's the size of the log when it gets auto-rotated. (See -s 50 in syslogd's arguments above).

    I'm thinking what's happening is that dnsmasq may have a logging bug where it gets stuck in some loop right in the middle of when the 50KByte log gets rotated. I'm not sure -- I need a 100% reliable way to reproduce it and then I can use strace/gdb to figure out what on earth the process is doing.
  9. gfunkdave

    gfunkdave LI Guru Member

    Thanks for the syslog and Unix lesson. :) It's funny - I majored in computer science but we never got into stuff this practical.

    I have the same version of Tomato running on a few routers (two RT-N-16s and an E3000). I just tried to reproduce the issue on the E3000 and the other RT-N-16 and couldn't. THe E3000 seems to be quite content to rotate its logs whether I log to a CIFS destination or a USB drive. Same with the other Asus.

    The only difference I can readily think of between them all is that the router with the problem is running Toastman's VLAN build.

    Let me know what else I can do to help.
  10. koitsu

    koitsu Network Guru Member

    I think the issue is more than likely just intermittent in nature (somehow/for some reason). I can't see how the VLAN build (vs. STD) would cause something like this, since syslog() is implemented in uClibc and isn't a kernel thing. Let me know if you figure out a way to reproduce it consistently and I will happily try to reproduce it here (I do use CIFS/SMB; source machine is a FreeBSD box running Samba 3.6.8) sometime.

    Alternately, if you're willing to install Entware, I can try to step you through what you'd need to run once the dnsmasq process is stuck at 100% CPU.
  11. mstombs

    mstombs Network Guru Member

    syslogd in tomato is a busybox version, so would be good to check busybox versions are the same

    The original error reminds me of a potential bug which used to be able to crash vax/vms systems

    "Error: unable to write to system log file"
    "Error: Error writing to system log"

    Each error spawning 2 further error log messages - and bamm!
    dc361 likes this.
  12. gfunkdave

    gfunkdave LI Guru Member

    There are enough differences with how that router is configured to make it too time-consuming and problematic to figure it out.

    I am happy to install Entware and try to debug that way...as long as Entware is easy to remove. The router is at my dad's dental office and gets fairly heavy use when the office is open. I need to make sure they don't get any service disruptions while the office is open.

    Let me know what to do - thanks.
  13. koitsu

    koitsu Network Guru Member

    Entware is easy to remove:

    cd /opt
    rm -fr bin etc lib libexec sbin share tmp usr var
    Yup, that's it -- assuming you don't have those directories for other things. :) If you're using Optware, you probably shouldn't use Entware (they'll conflict), but instead I'd just upgrade to Entware entirely. Follow the installation instructions on the Entware page for how to install it, and make sure /opt is either bound to cifs or a USB stick or something -- you'll need the space.

    What you want to do is use strace on the dnsmasq process to find out if it's constantly trying to execute a syscall or what it's doing (if anything; there's always the possibility it's a kernel lock which is wedged and thus the process shows 100% CPU even though it's the kernel which is doing it). To install strace:

    opkg update
    opkg install strace
    Then you want to do this, when the dnsmasq process is at 100% CPU:

    strace -c -p {pid_of_dnsmasq}
    Let this run for about 10 seconds or so, then press Ctrl-C. You should get some output that indicates a list of syscalls and how many times they were seen / how much CPU time they took up. That will give me some idea of what the daemon might be spinning on. My gut feeling is that it's going to be a read() or write() call that's spinning, but I'm not sure.

    Once you have that, use strace to get some actual syscall data (vs. just a summary). This is probably going to be a *lot* of output, so you don't need to let it run very long -- maybe a screen-full of data or so, okay? Press Ctrl-C after you get that data.

    strace -v -t -F -f -p {pid_of_dnsmasq}
    Save the output somewhere (to a file, etc.) and then please either upload that file or paste it in here (within a code block of course).

    None of this is going to tell me where in the code (line number, etc.) it's spinning, but it'll at least give me an idea of what syscalls might be being called repeatedly and then I can go look at the dnsmasq code to see if there's a while() loop around something with invalid error handling conditionals or similar. If this bug can be tracked down, it needs to be submitted upstream (to the dnsmasq author).
  14. gfunkdave

    gfunkdave LI Guru Member

    Hi Koitsu,

    When I try to run strace, I just get a Not Found error. But the file is there, and even which knows it is:

    root@router:/# strace -c -p 586
    -sh: strace: not found
    root@router:/# which strace
    root@router:/# ls -l /opt/bin
    -rwxrwSrwx    1 root    root        261884 Nov  9 04:51 find
    -rwxrwSrwx    1 root    root        291960 Nov  9 02:27 ldconfig
    -rwxrwSrwx    1 root    root        467312 Nov 12 21:56 opkg
    -rwxrwSrwx    1 root    root        433276 Nov  9 04:47 strace
    -rwxrwSrwx    1 root    root        51452 Nov  9 04:51 xargs
    What's going on here? I'm sure it's something stupid...but calling opkg, for example, works fine.

    Edit: I did get several errors on installation of Entware saying that it can't create various symlinks. Here's an example: * extract_archive: Cannot create symlink from ./opt/lib/libnsl.so.0 to 'libnsl-0.9.32.so': Operation not supported.
  15. koitsu

    koitsu Network Guru Member

    The binary probably isn't working because of missing libraries or the binary is built for a different CPU architecture (very unlikely though, because it works fine on my RT-N16). It may be that there are conflicting libraries on your system.

    The symlink error seems to indicate that the CIFS share you're using (which I assume is /cifs1/opt, and then /opt is a bind mount to /cifs1/opt? Yes? The command mount when run by itself will show you all your mounts + their details) doesn't allow symlinks, and this is very likely the cause of the problem with strace not running.

    Also, another thing I'm noticing is that all your file permissions are utterly bizarre when compared to mine. Your file permissions show user=rwx, group=rw but with setgid bit set (but WITHOUT execute bit), other=rwx. I wonder if this is causing problems with directories in some way, since directories need the execute bit set. I really can't debug this, all I can tell you is that my file permissions don't look like that, but I'm also using a FreeBSD system with Samba as my CIFS/SMB source, not a Windows machine.

    Your strace binary looks the same as mine (timestamp is off by 2 hours but that's probably because you're in a different timezone than me); file size and general timestamp/date match.

    root@gw:/tmp/home/root# md5sum /opt/bin/strace
    38e1fa1380686b5b17e9a8185f823cce  /opt/bin/strace
    Bottom line: on Linux, if the shell says "Command not found", it doesn't necessarily mean the file isn't in your $PATH, it could indicate there are library-related problems or issues with libc conflicts, or potentially CPU architecture differences.

    For library issues, you can use the command ldd to investigate them -- however, be aware Busybox has an internal command called ldd that is a completely buggered/broken/stupid/horrible equivalent. Instead, just do opkg install ldd then use /opt/bin/ldd (you will need to type the full pathname every time) on the binary, i.e. /opt/bin/ldd /opt/bin/strace to get a list of what libraries are linked. Example:

    root@gw:/tmp/home/root# /opt/bin/ldd /opt/bin/strace
            libgcc_s.so.1 => /opt/lib/libgcc_s.so.1 (0x2aac0000)
            libc.so.0 => /opt/lib/libc.so.0 (0x2aae1000)
            ld-uClibc.so.0 => /opt/lib/ld-uClibc.so.0 (0x2aaa8000)
    Also, if you have a shell environment set up with custom dot files/etc., especially if you override LD_LIBRARY_PATH (shame on you for doing that! Never do that!), this may be the source of some complexity. Really hard to say.

    You can check architecture issues using file, but again, you need to install it: opkg install file.

    root@gw:/tmp/home/root# file /opt/bin/strace
    /opt/bin/strace: ELF 32-bit LSB executable, MIPS, MIPS32 version 1 (SYSV), dynamically linked (uses shared libs), with unknown capability 0xf41 = 0x756e6700, with unknown capability 0x70100 = 0x1040000, stripped
  16. gfunkdave

    gfunkdave LI Guru Member

    Aha! Yes, you're correct that I had /opt bound to /cifs1/opt. That cifs1 share goes to a Windows Server 2008 R2 machine...and apparently there is something lost in translation from Windows file sharing to Samba regarding file permissions. I mapped instead to /cifs2/opt, which is located on a Linux box, and it worked fine.

    root@router:/tmp/home/root# strace -c -p 585
    Process 585 attached - interrupt to quit
    Process 585 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     46.08    0.050595           0    186703           _newselect
     38.08    0.041808           0    186703    186703 write
      9.12    0.010014           0    186704           times
      6.73    0.007385           0    186703           getpid
      0.00    0.000000           0         2           close
      0.00    0.000000           0         2           ioctl
      0.00    0.000000           0         2           bind
      0.00    0.000000           0         2           recvfrom
      0.00    0.000000           0         2           recvmsg
      0.00    0.000000           0         2           sendmsg
      0.00    0.000000           0         4           sendto
      0.00    0.000000           0         2           socket
      0.00    0.000000           0        40           stat64
      0.00    0.000000           0         4           fcntl64
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.109802                746875    186703 total
    For the actual syscall data, it's essentially the same small block of lines repeating hundreds or thousands of times a second - hitting Ctrl-C immediately on getting the "process attached" line still swamped my scrollback buffer in Putty.

    root@router:/tmp/home/root# strace -v -t -F -f -p 585
    Process 585 attached - interrupt to quit
    23:33:13 svr4_exit()                    = 4043
    23:33:14 getpid()                       = 585
    23:33:14 write(12, "<30>Dec 31 18:00:21 dnsmasq[585]"..., 70) = -1 EAGAIN (Resource temporarily unavailable)
    23:33:14 stat64(0x44fa10, 0x7f9a99b8)   = 0
    23:33:14 _newselect(13, [4 5 6 7 8 9 10], [12], [], NULL) = 1 (out [12])
    23:33:14 times({tms_utime=10135, tms_stime=33065, tms_cutime=0, tms_cstime=0}) = 23711
    23:33:14 getpid()                       = 585
    23:33:14 write(12, "<30>Dec 31 18:00:21 dnsmasq[585]"..., 70) = -1 EAGAIN (Resource temporarily unavailable)
    23:33:14 _newselect(13, [4 5 6 7 8 9 10], [12], [], NULL) = 1 (out [12])
    23:33:14 times({tms_utime=10135, tms_stime=33065, tms_cutime=0, tms_cstime=0}) = 23712
    23:33:14 getpid()                       = 585
    23:33:14 write(12, "<30>Dec 31 18:00:21 dnsmasq[585]"..., 70) = -1 EAGAIN (Resource temporarily unavailable)
    23:33:14 _newselect(13, [4 5 6 7 8 9 10], [12], [], NULL) = 1 (out [12])
    23:33:14 times({tms_utime=10135, tms_stime=33065, tms_cutime=0, tms_cstime=0}) = 23712
  17. koitsu

    koitsu Network Guru Member

    This helps tremendously. Yep, it's what I figured -- it's stuck in a retry loop of some kind, where it doesn't sleep or wait -- it just continually retries non-stop. The write() is trying to go to a file descriptor (fd 12 in your 2nd strace output). There are other syscalls involved in this loop:


    Also, I'm a little surprised by the dnsmasq output attempting to be written to the file descriptor -- the timestamp is 21 seconds after the epoch. That's a little odd, but eh, I guess to some degree that's acceptable because dnsmasq starts up before the router has a chance to sync its time (AFAIR).

    Can you show me the output from ls -l /proc/585/fd (assuming the process is still running at 100% CPU) ? I'm curious what file descriptor 12 points to. I'm betting it's trying to point to a UNIX domain socket, which would be it attempting to talk to syslogd (that's my guess anyway).

    Edit: Meant file descriptor 12, not 15. :)
  18. gfunkdave

    gfunkdave LI Guru Member

    I had already gotten the router back to its sane (non system load of 1.0) state, so I had to reboot it and such - the PID is now 584. Other output was the same:

    23:49:40 getpid()                      = 584
    23:49:40 write(12, "<30>Dec 31 18:00:21 dnsmasq[584]"..., 70) = -1 EAGAIN (Resource temporarily unavailable)
    23:49:40 _newselect(13, [4 5 6 7 8 9 10], [12], [], NULL) = 1 (out [12])
    23:49:40 times({tms_utime=2452, tms_stime=7505, tms_cutime=0, tms_cstime=0}) = -15712
    root@router:/tmp/home/root# ls -l /proc/584/fd
    lrwx------    1 root    root            64 Nov 12 23:50 0 -> /dev/null
    lrwx------    1 root    root            64 Nov 12 23:50 1 -> /dev/null
    lr-x------    1 root    root            64 Nov 12 23:50 10 -> pipe:[1547]
    l-wx------    1 root    root            64 Nov 12 23:50 11 -> pipe:[1547]
    lrwx------    1 root    root            64 Nov 12 23:50 12 -> socket:[1549]
    lrwx------    1 root    root            64 Nov 12 23:50 2 -> /dev/null
    lrwx------    1 root    root            64 Nov 12 23:50 3 -> /tmp/var/lib/misc/dnsmasq.leases
    lrwx------    1 root    root            64 Nov 12 23:50 4 -> socket:[1536]
    lrwx------    1 root    root            64 Nov 12 23:50 5 -> socket:[1537]
    lrwx------    1 root    root            64 Nov 12 23:50 6 -> socket:[1543]
    lrwx------    1 root    root            64 Nov 12 23:50 7 -> socket:[1544]
    lrwx------    1 root    root            64 Nov 12 23:50 8 -> socket:[1545]
    lrwx------    1 root    root            64 Nov 12 23:50 9 -> socket:[1546]
    Also, I think you mean file descriptor 12, not 15, right?
    Thanks again for helping me track this down. I am going to sleep now - gotta be up in 7 hours! :)

    edit: I'm assuming that socket it's trying to write to is a TCP/IP socket for the CIFS share?
  19. koitsu

    koitsu Network Guru Member

    Yeah, just noticed my mistake right as you did, haha. :)

    Yep, some kind of UNIX domain socket. netstat -x would show it, but it's not going to show what PID is associated with it. On BSD we have the fstat and sockstat utilities that can do this, but on Linux I think the only thing that can be used to determine this is lsof. I would recommend installing it (opkg install lsof) just because it is a very useful utility.

    However, even lsof on Linux won't show all of what's needed -- apparently this is a well-known "design choice" of Linux, where both ends of the UNIX domain socket aren't actually tracked, thus there's no way to find out what's connected to what. This also confirms it. On my own system, for example:

    root@gw:/tmp/home/root# pgrep dnsmasq
    root@gw:/tmp/home/root# ls -l /proc/4658/fd
    lrwx------    1 root    root            64 Nov 12 21:40 0 -> /dev/null
    lrwx------    1 root    root            64 Nov 12 21:40 1 -> /dev/null
    lr-x------    1 root    root            64 Nov 12 21:40 10 -> pipe:[97930]
    l-wx------    1 root    root            64 Nov 12 21:40 11 -> pipe:[97930]
    lrwx------    1 root    root            64 Nov 12 21:40 12 -> socket:[97934]
    lrwx------    1 root    root            64 Nov 12 21:40 2 -> /dev/null
    lrwx------    1 root    root            64 Nov 12 21:40 3 -> /tmp/var/lib/misc/dnsmasq.leases
    lrwx------    1 root    root            64 Nov 12 21:40 4 -> socket:[97920]
    lrwx------    1 root    root            64 Nov 12 21:40 5 -> socket:[97921]
    lrwx------    1 root    root            64 Nov 12 21:40 6 -> socket:[97926]
    lrwx------    1 root    root            64 Nov 12 21:40 7 -> socket:[97927]
    lrwx------    1 root    root            64 Nov 12 21:40 8 -> socket:[97928]
    lrwx------    1 root    root            64 Nov 12 21:40 9 -> socket:[97929]
    root@gw:/tmp/home/root# cat /proc/net/unix | grep 97934
    87c231e0: 00000002 00000000 00000000 0001 01 97934
    And this is as far as I can get. The address 0x87c231e0 is ultimately what's important (I could probably determine useful things from this), but keep reading for why that's not going to happen.

    lsof isn't of any help either, because it effectively does the same thing I just did, but at least tells me if it's a UNIX domain socket vs. TCP socket vs. UDP socket:

    root@gw:/tmp/home/root# lsof -n -p 4658
    dnsmasq 4658 nobody  cwd    DIR      31,2      208    50 /
    dnsmasq 4658 nobody  rtd    DIR      31,2      208    50 /
    dnsmasq 4658 nobody  txt    REG      31,2  253988  384 /usr/sbin/dnsmasq
    dnsmasq 4658 nobody  mem    REG      31,2    26800  379 /lib/ld-uClibc.so.0
    dnsmasq 4658 nobody  mem    REG      31,2    66152  253 /lib/libgcc_s.so.1
    dnsmasq 4658 nobody  mem    REG      31,2  392044  252 /lib/libc.so.0
    dnsmasq 4658 nobody    0u  CHR        1,3      0t0  199 /dev/null
    dnsmasq 4658 nobody    1u  CHR        1,3      0t0  199 /dev/null
    dnsmasq 4658 nobody    2u  CHR        1,3      0t0  199 /dev/null
    dnsmasq 4658 nobody    3u  REG      0,10      187  1002 /tmp/var/lib/misc/dnsmasq.leases
    dnsmasq 4658 nobody    4u  inet      97920      0t0  UDP *:bootps
    dnsmasq 4658 nobody    5u  sock        0,4      0t0 97921 can't identify protocol
    dnsmasq 4658 nobody    6u  inet      97926      0t0  UDP *:domain
    dnsmasq 4658 nobody    7u  inet      97927      0t0  TCP *:domain (LISTEN)
    dnsmasq 4658 nobody    8u  sock        0,4      0t0 97928 can't identify protocol
    dnsmasq 4658 nobody    9u  sock        0,4      0t0 97929 can't identify protocol
    dnsmasq 4658 nobody  10r  FIFO        0,5      0t0 97930 pipe
    dnsmasq 4658 nobody  11w  FIFO        0,5      0t0 97930 pipe
    dnsmasq 4658 nobody  12u  unix 0x87c231e0      0t0 97934 socket
    Note the very last line there (the one with "12u") -- yup. UNIX domain socket, but doesn't tell me jack squat about what it's actually talking to on the other side.

    The only choice, from that link I provided, is to gdb the kernel and its memory space in real-time. Good grief, what a travesty. And we can't do this on TomatoUSB, because the kernel configuration lacks the necessary options/config shims enabled to allow you to do this (involves using /proc/kcore to read kernel memory space and so on; there is no /proc/kcore).

    I'm going to have to assume that the domain socket is for syslogd, because the problem seems to happen when syslogd is told to log to a file on CIFS, and when the log rotates (at 50KBytes). So I guess I get to go look through the mess that is the dnsmasq code... Or syslogd. Or both. I have to make a lot of assumptions here -- it would be a lot easier if I could debug it.

    Here's a question: how often does this happen to you? 50KBytes of syslog data is a lot, so what exactly do you have that's logging that much data constantly? Just curious.

    I wonder if this problem will happen if you log to /cifs2/syslog (which is the Linux machine) rather than /cifs1/syslog (which is your Windows server). Can you try that temporarily and see if the issue is repeatable there?

    If it doesn't happen when using CIFS/SMB to a Linux box running Samba, then I'm inclined to think something is indeed ""amiss"" (for lack of better phrasing) with the Windows boxes' SMB export or configuration (probably configuration). The issue you saw with your /opt is an additional indicator.

    Another theory is that the CIFS/SMB mount (/cifs1) "goes away" (maybe socket timeout?) and when syslogd goes to rotate the log (once hitting 50KBytes), this causes syslogd itself to do something odd that affects the underlying programs talking to it (issuing syslog() messages). You would need to try to work with your Windows server administrator to figure out what the deal is; it could be some strange interoperability quirk between the kernel smbclient (CIFS/SMB mount support) and actual Windows. Could also be file locking related.
  20. koitsu

    koitsu Network Guru Member

    Do you want to know how I'd solve this, cleanly that is? It's pretty simple:

    Programs using syslog() are going to talk to the local syslogd daemon, right? syslogd supports the -R flag, which lets it forward/siphon logging messages to another syslog daemon running on the network (i.e. syslogd -R would forward all syslog messages to the syslog daemon running on the machine). This avoids use of -O /cifs1/syslog (which relies on CIFS/SMB). could be your Linux machine, which runs syslogd. You may need to adjust its syslogd to accept remote logging connections (probably some flag) and set up its /etc/syslog.conf or /etc/syslog-ng.conf (dunno -- see your Linux distro docs) to make sure logs from the router get logged to a separate file or so on.

    And I can also assure you there are syslog daemons/services that run on Windows, so you could technically have your Windows server administrator install one of those + set it up and log to a file that you have access to (possibly even to the same filesystem which you mount as /cifs1 for example!). They are usually very small, very low-CPU services.

    IMO, the solution I've proposed above is cleaner than using -O to log to an alternate path. Maybe -O has a bug, maybe it doesn't, or maybe it "doesn't work right" with certain CIFS/SMB shares. Not sure.

    The TomatoUSB GUI has this feature built-in: Administration -> Logging -> Log To Remote System. Check that box, then enter the IP of the syslog server you want to forward logging messages to, and save. Make sure to uncheck Custom Log Path & Filename too of course (sure you can do both at the same time, but the -O thing is what we're trying to avoid).

    Also about syslog messages: these use the UDP protocol, so they go to UDP port 514. Not TCP. Something to keep in mind.
  21. koitsu

    koitsu Network Guru Member

    Okay, I've had a chance to look at the dnsmasq source code for it's syslog() logging. The relevant code is in release/src/router/dnsmasq/src/log.c. I'm looking at Toastman-RT-N, although gfunkdave uses Tomato-RT -- I'm not fretting over that right now.

    The error being returned by write() is EAGAIN. We don't know why it's doing that; it may be that the CIFS/SMB share went away (i.e. the problem might go away on its own after ls -l /cifs1 or a remount of /cifs1 + restart of syslogd is done), but we don't know. Fixing that is ultimately the "true" fix, if you ask me, but for now let's continue -- because there seems to be some very bizarre design choices in dnsmasq (if you ask me).

    One thing I want to make clear before I start: the code does in fact use write() natively. I assumed it used syslog(), but it doesn't. Instead in log_reopen() (which is what's called when there's no logfile or when it needs to be reopened, including if no log file is specified via the command line) will call socket() (with a UNIX domain socket) to log to syslog. I don't know why the author is avoiding use of syslog() here. I really don't. Maybe some syslog() implementations block (wait indefinitely) and he doesn't want his program to stall if that happens -- that's my guess anyway. Fine, whatever.

    Some analysis:

    - Every time my_syslog() is called, getpid() is called. This is downright silly and a waste of CPU; the daemon should track its own PID when starting up. AFAIK this program does not fork when handling requests, so it should be able to make this call once somewhere near/around the start of the program.

    - Every time my_syslog() is called, time() is called. This is where the times() call above is seen. There's nothing one can do about this AFAIK; write() just writes a buffer, so he has to make the logging string himself, so this is inevitable. However, time() is expensive (CPU-wise), especially if there is a tremendous amount of calls being made to it in a short period of time. There are ways to solve this (i.e. have the daemon be told when the clock changes every 1 second, rather than poll for the clock ever time), so that's another optimisation that the dnsmasq author should really consider. Anyway, that explains all the calls to times().

    - my_syslog() has its own queueing mechanism for logging messages -- specifically, it's just a linked list. If write() fails for some reason, then the logging message is lost -- and that's what the queueing mechanism is for (to keep the logs from getting lost). My guess is that someone bitched/cried hard at the dnsmasq author and he added this queueing mechanism into place, to allow for retry of logging. And this is where the issue gfunkdave is seeing comes into play.

    - Once my_syslog() finishes its "stuff", it calls log_write() (another internal function). The hope here is that the log_write() call works and thus the subsequent if() doesn't apply and the log_write() function can return. This design model is documented in lines 385 to 399.

    - Here's where we get into the design of dnsmasq's messaging queue retry logic, specifically lines 400 to 421. The for-loop and related bits aren't entirely obvious (see comments at lines 389 to 398) but it looks like it tries to split logging messages up into "chunks" of 8 and then send those off. But here's were we have the awful design choice: look at lines 413 to 416. The daemon is waiting 1*d milliseconds between log_write() calls of pending messages in the queue. d is based on the total number of entries left in the messaging queue to process minus 8. So at most, I think the daemon would be sleeping for 2^8 milliseconds (see his comment at line 394) -- that's 256 milliseconds. Not seconds, milliseconds! Awful.

    - Now look at log_write(), where the write() magic happens. Look specifically at line 199. You'll see that EAGAIN is explicitly handled by... calling return(). The design methodology here (meaning its hope -- not it's guarantee!) is that we can just indefinitely keep trying if the write() failed. That is what EAGAIN indicates -- "try again later" -- but it's fairly obvious the queueing mechanism delay (see above) is so tiny/minute that this results in a spinning loop and causes 100% CPU (possibly more like 99% CPU since nanosleep() does get called). The author "hopes for the best" -- his comment says he hopes things are working by the time the next select() or poll() is entered. There is no select() or poll() call in log.c. So I have to assume he's hoping that his write() induces select() or poll(). I sure hope he's not referring to when dnsmasq handles a request (DNS lookup request, DHCP request, etc.) because that's going to be pretty damn constant on most heavier networks where DNS is used all the time. I'm having to make some assumptions here of course.

    Edit: Okay, looking at lines 23 to 28, and finally (the most important) lines 134 to 135 (which explicitly disable blocking on the file descriptor used for write()), it's obvious the author is hoping the select() or poll() in question is for lookups or "other operations", not the actual logging bits itself. Sigh.

    The design model here is "grrrr" (meaning I do not agree with it but I understand the idea), but the amount of delay is awful. A maximum of 256 milliseconds (assuming the queue is completely full) is not long enough, especially with EAGAIN. This model/design should really be changed to delay for longer periods of time (going into things like 5 or 10 seconds, maybe even longer). I understand *why* he wants a low delay -- because writing to a UNIX domain socket should be quick/reliable, since it's on the same system, but that's a bad assumption. I do know people who do things like stick /var/log on an NFS mount and have syslogd log to /var/log (and what happens when the NFS server goes down? :) ).

    So that's my analysis of the dnsmasq part of it. The problem here is starting to look more like something anomalous with syslogd; the UNIX domain socket that dnsmasq is writing to starts returning EAGAIN (from the kernel) and I'm not sure why. syslogd may be doing something naughty -- and we know that this only happens when syslogd is using -O /cifs1/file to write to CIFS/SMB rather than the local memory filesystem.

    Welcome to *IX and software design! ;-)
  22. koitsu

    koitsu Network Guru Member

    BTW, regarding CIFS/SMB in the Linux kernel -- from what I'm reading on LKML, in 2.6 the TCP timeout is a whopping 5 minutes. It's also not configurable via a mount option; it's hard-coded. In Linux 3.4 it was lowered to 60 seconds apparently. The setting is called SMB_ECHO_INTERVAL and oh look what I just found (for 3.4 probably).

    The reason I mention this is that possibly syslogd is causing mayhem in general because the underlying /cifs1 mount (or network I/O to the CIFS/SMB box) is stalling for some reason. Not really sure. Hmm, although if it only happens when the log rolls at 50KBytes, that seems to imply it's not a CIFS/SMB issue at all (i.e. the mount is usable/working fine) but syslogd itself having some bug where it makes a mess when rotating the log when some UNIX domain sockets are established (or maybe when a message comes in at the exact same time the log is rolled). Again: would be easier if I could reproduce this issue myself.

    If this turns out to be a bug in syslogd I won't be surprised in the least. Know why? syslogd appears to be the Busybox version, and the Busybox authors have a well-established history of file descriptor-related "mental issues" (like descriptor exhaustion due to lack of close() and so on). So the more I think about this, the more I think the bug might be in syslogd itself.
  23. koitsu

    koitsu Network Guru Member

  24. koitsu

    koitsu Network Guru Member

    It looks to me like a good number of fixes have been backported to the Toastman version (comparing these I had to do by hand -- serious PITA), but not all of them. It's obvious because syslogd.c at Busybox's site is 26KBytes, while Toastman's is 19KBytes. The few semi-recent (last 16 or so changes) I reviewed don't seem relevant and most are already in the Toastman version anyway.

    I've put enough time into this for now. Until I can be told a way to reproduce the issue, I'm going to have to advocate that people rely on writing their syslog files a physical disk or the stock (default) memory filesystem, rather than NFS or CIFS/SMB. If I can be told how to reproduce the issue, I can start trying to troubleshoot where/how the problem happens and what needs to be fixed. If this turns out to be a bug in Busybox code, I'm going to be pretty ticked off (I have basically zero respect for that junk pile).
  25. mstombs

    mstombs Network Guru Member

    Great stuff koitsu - I'm sure the dnsmasq author would be interested in your diagnosis, because even if dnsmasq is not the root cause - it is the thing that goes wrong - I suspect it is the dnsmasq log-async function which explains why he doesn't want to rely on plan syslog.

    My test router running easytomato based on toastman reports the following busybox version

     root@easyRTN16:/tmp/home/root# busybox
    BusyBox v1.18.5 (2012-09-25 15:07:05 EDT) multi-call binary.
    Copyright (C) 1998-2009 Erik Andersen, Rob Landley, Denys Vlasenko
    and others. Licensed under GPLv2.
    I do know that Rodney has compiled a much more recent version

    There used to be some patches to the busybox inside tomatousb firmware (udhcpc for example), don't know if they are still needed, or how to easily use a different binary.

    I'm sure busybox devs would want to know if the latest version exhibits the bad behaviour!

    I do know kiwi syslog server runs on windows and can accept remote syslog from routers - but I have only ever used the free version.
  26. gfunkdave

    gfunkdave LI Guru Member

    Wow, thanks koitsu. As you suspected, logging to a CIFS share on a Linux box works fine. So it's apparently an issue with syslog writing to a CIFS share that is hosted on a Windows machine (and maybe even a Windows Server 2008 R2 machine).

    I intentionally told it to log everything in an effort to not wait long before it tried to roll over the log file. Ordinarily it doesn't come up much. I also set it to try to roll over at 25 kB, and experience the same issue.

    I am the administrator for the Windows machine, and I didn't see anything in the logs about this...but of course, I could have overlooked something.
  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice