1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Reconciling Memory Reports

Discussion in 'Tomato Firmware' started by Planiwa, Nov 13, 2009.

  1. Planiwa

    Planiwa LI Guru Member

    Code:
    # cat /proc/meminfo
            total:    used:    free:  shared: buffers:  cached:
    Mem:  14823424 13463552  1359872        0   966656  4423680
    Swap:        0        0        0
    MemTotal:        14476 kB
    MemFree:          1328 kB
    MemShared:           0 kB
    Buffers:           944 kB
    Cached:           4320 kB
    SwapCached:          0 kB
    Active:           4368 kB
    Inactive:         2672 kB
    HighTotal:           0 kB
    HighFree:            0 kB
    LowTotal:        14476 kB
    LowFree:          1328 kB
    SwapTotal:           0 kB
    SwapFree:            0 kB
    # 
    # echo $(( 944 + 4320 + 4368 + 2672 )) 
    12304
    # echo $(( 12304 + 1328 ))            
    13632
    
    The numbers don't exactly add up. In any case, this tells us that total memory is
    14823424 == 14476 KiB (not kB).

    Meanwhile, the GUI Overview page says Total memory is:

    Code:
    Total / Free Memory	14.14 MB / 5,616.00 KB (38.80%)
    This does agree with the meminfo total: 14823424 is indeed 14.14 MiB

    Now, let's look at top:

    Code:
    # TOP
    Mem: 13364K used, 1112K free, 0K shrd, 944K buff, 4320K cached
    CPU:   7% usr  38% sys   0% nic  53% idle   0% io   0% irq   0% sirq
    Load average: 0.00 0.00 0.00 1/26 787
      PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
      786   784 root     R     1948  13%  15% top 
    32437 32436 root     S     1976  14%   0% -sh 
      735   734 root     S     1972  14%   0% -sh 
      106     1 root     S     1960  14%   0% crond -l 9 
      784   735 root     S     1956  13%   0% /bin/sh /home/root/TOP 
      787   784 root     S     1952  13%   0% sed 1s/...// 
       71     1 root     S     1948  13%   0% syslogd -L -s 50 
       74     1 root     S     1932  13%   0% klogd 
        1     0 root     S     1772  12%   0% init noinitrd 
    11555     1 root     S     1720  12%   0% redial 
       26     1 root     S     1720  12%   0% buttons 
       99     1 root     S     1708  12%   0% nas /etc/nas.conf /var/run/nas.pid lan
      113     1 root     S     1684  12%   0% httpd -s 
    32436    72 root     S     1600  11%   0% dropbear -p 22 
      734    72 root     S     1600  11%   0% dropbear -p 22 
       72     1 root     S     1532  11%   0% dropbear -p 22 
      109     1 root     S     1256   9%   0% rstats 
    11563     1 nobody   S      884   6%   0% dnsmasq 
    11556     1 root     S      856   6%   0% pppoecd vlan1 -u XXXXXXXXXXXXXXXXXXX -
        3     1 root     SWN      0   0%   0% [ksoftirqd_CPU0]
    # 
    
    Let's look at the VSZ and %MEM columns of TOP:

    The %MEM values add up to about 200%. How is this possible, given that there is no storage other than RAM? Are the inactive pages compressed and stashed in Cached?

    Let's run TOP again, and kill the two dead processes, one at a time:

    Code:
    # TOP
    Mem: 13392K used, 1084K free, 0K shrd, 944K buff, 4320K cached
    CPU:   0% usr  15% sys   0% nic  84% idle   0% io   0% irq   0% sirq
    Load average: 0.00 0.00 0.00 1/26 850
      PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
      849   847 root     R     1948  13%  15% top 
    32437 32436 root     S     1976  14%   0% -sh 
      735   734 root     S     1972  14%   0% -sh 
      106     1 root     S     1960  14%   0% crond -l 9 
      847   735 root     S     1956  13%   0% /bin/sh /home/root/TOP 
      850   847 root     S     1952  13%   0% sed 1s/...// 
       71     1 root     S     1948  13%   0% syslogd -L -s 50 
       74     1 root     S     1932  13%   0% klogd 
        1     0 root     S     1772  12%   0% init noinitrd 
    11555     1 root     S     1720  12%   0% redial 
       26     1 root     S     1720  12%   0% buttons 
       99     1 root     S     1708  12%   0% nas /etc/nas.conf /var/run/nas.pid lan
      113     1 root     S     1684  12%   0% httpd -s 
    32436    72 root     S     1600  11%   0% dropbear -p 22 
      734    72 root     S     1600  11%   0% dropbear -p 22 
       72     1 root     S     1532  11%   0% dropbear -p 22 
      109     1 root     S     1256   9%   0% rstats 
    11563     1 nobody   S      884   6%   0% dnsmasq 
    11556     1 root     S      856   6%   0% pppoecd vlan1 -u XXXXXXXXXXXXXXXXXXX -
        3     1 root     SWN      0   0%   0% [ksoftirqd_CPU0]
    # kill -9 32437
    # TOP
    Mem: 13268K used, 1208K free, 0K shrd, 944K buff, 4320K cached
    CPU:  15% usr  23% sys   0% nic  61% idle   0% io   0% irq   0% sirq
    Load average: 0.00 0.00 0.00 1/25 855
      PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
      854   852 root     R     1948  13%  15% top 
      735   734 root     S     1972  14%   0% -sh 
      106     1 root     S     1960  14%   0% crond -l 9 
      852   735 root     S     1956  13%   0% /bin/sh /home/root/TOP 
      855   852 root     S     1952  13%   0% sed 1s/...// 
       71     1 root     S     1948  13%   0% syslogd -L -s 50 
       74     1 root     S     1932  13%   0% klogd 
        1     0 root     S     1772  12%   0% init noinitrd 
    11555     1 root     S     1720  12%   0% redial 
       26     1 root     S     1720  12%   0% buttons 
       99     1 root     S     1708  12%   0% nas /etc/nas.conf /var/run/nas.pid lan
      113     1 root     S     1684  12%   0% httpd -s 
    32436    72 root     S     1600  11%   0% dropbear -p 22 
      734    72 root     S     1600  11%   0% dropbear -p 22 
       72     1 root     S     1532  11%   0% dropbear -p 22 
      109     1 root     S     1256   9%   0% rstats 
    11563     1 nobody   S      884   6%   0% dnsmasq 
    11556     1 root     S      856   6%   0% pppoecd vlan1 -u XXXXXXXXXXXXXXXXXXX -
        3     1 root     SWN      0   0%   0% [ksoftirqd_CPU0]
        4     1 root     SW       0   0%   0% [kswapd]
    # kill -9 32436
    # TOP
    Mem: 13000K used, 1476K free, 0K shrd, 944K buff, 4320K cached
    CPU:   0% usr  15% sys   0% nic  84% idle   0% io   0% irq   0% sirq
    Load average: 0.00 0.00 0.00 1/24 860
      PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
      859   857 root     R     1948  13%  15% top 
      735   734 root     S     1972  14%   0% -sh 
      106     1 root     S     1960  14%   0% crond -l 9 
      857   735 root     S     1956  13%   0% /bin/sh /home/root/TOP 
      860   857 root     S     1952  13%   0% sed 1s/...// 
       71     1 root     S     1948  13%   0% syslogd -L -s 50 
       74     1 root     S     1932  13%   0% klogd 
        1     0 root     S     1772  12%   0% init noinitrd 
    11555     1 root     S     1720  12%   0% redial 
       26     1 root     S     1720  12%   0% buttons 
       99     1 root     S     1708  12%   0% nas /etc/nas.conf /var/run/nas.pid lan
      113     1 root     S     1684  12%   0% httpd -s 
      734    72 root     S     1600  11%   0% dropbear -p 22 
       72     1 root     S     1532  11%   0% dropbear -p 22 
      109     1 root     S     1256   9%   0% rstats 
    11563     1 nobody   S      884   6%   0% dnsmasq 
    11556     1 root     S      856   6%   0% pppoecd vlan1 -u XXXXXXXXXXXXXXXXXXX -
        3     1 root     SWN      0   0%   0% [ksoftirqd_CPU0]
        4     1 root     SW       0   0%   0% [kswapd]
        7     1 root     SW       0   0%   0% [mtdblockd]
    # 
    
    So, killing the 2 dead processes with a VSZ of 1976+1600=3576, frees up
    1476-1084=392 K of memory.

    Now, let's try something:

    Code:
    # for i in 1 2 3 4 5 6 7 8 9 10;do sleep 99 &
    > done
    # TOP
    Mem: 13920K used, 556K free, 0K shrd, 944K buff, 4320K cached
    CPU:   7% usr   7% sys   0% nic  84% idle   0% io   0% irq   0% sirq
    Load average: 0.00 0.00 0.00 1/34 955
      PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
      954   952 root     R     1952  13%  15% top 
      735   734 root     S     1972  14%   0% -sh 
      106     1 root     S     1960  14%   0% crond -l 9 
      952   735 root     S     1956  13%   0% /bin/sh /home/root/TOP 
      955   952 root     S     1952  13%   0% sed 1s/...// 
       71     1 root     S     1948  13%   0% syslogd -L -s 50 
       74     1 root     S     1932  13%   0% klogd 
      949   735 root     S     1932  13%   0% sleep 99 
      951   735 root     S     1932  13%   0% sleep 99 
      944   735 root     S     1932  13%   0% sleep 99 
      946   735 root     S     1932  13%   0% sleep 99 
      945   735 root     S     1932  13%   0% sleep 99 
      948   735 root     S     1932  13%   0% sleep 99 
      950   735 root     S     1932  13%   0% sleep 99 
      947   735 root     S     1932  13%   0% sleep 99 
      943   735 root     S     1932  13%   0% sleep 99 
      942   735 root     S     1932  13%   0% sleep 99 
        1     0 root     S     1772  12%   0% init noinitrd 
    11555     1 root     S     1720  12%   0% redial 
       26     1 root     S     1720  12%   0% buttons 
    # 
    
    Free space has dropped from 1476 to 556 (by 920K).
    And after a couple of minutes, back up to 1392 -- not quite the previous 1476:

    Code:
    Mem: 13084K used, 1392K free, 0K shrd, 944K buff, 4320K cached
    CPU:   8% usr  41% sys   0% nic  50% idle   0% io   0% irq   0% sirq
    Load average: 0.00 0.00 0.00 1/24 967
      PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
      964   735 root     S     1956  13%   8% /bin/sh /home/root/TOP 
      966   964 root     R     1948  13%   8% top 
      735   734 root     S     1972  14%   0% -sh 
      106     1 root     S     1960  14%   0% crond -l 9 
      967   964 root     S     1952  13%   0% sed 1s/...// 
       71     1 root     S     1948  13%   0% syslogd -L -s 50 
       74     1 root     S     1932  13%   0% klogd 
        1     0 root     S     1772  12%   0% init noinitrd 
    11555     1 root     S     1720  12%   0% redial 
       26     1 root     S     1720  12%   0% buttons 
       99     1 root     S     1708  12%   0% nas /etc/nas.conf /var/run/nas.pid lan
      113     1 root     S     1684  12%   0% httpd -s 
      734    72 root     S     1600  11%   0% dropbear -p 22 
       72     1 root     S     1532  11%   0% dropbear -p 22 
      109     1 root     S     1256   9%   0% rstats 
    11563     1 nobody   S      884   6%   0% dnsmasq 
    11556     1 root     S      856   6%   0% pppoecd vlan1 -u XXXXXXXXXXXXXXXXXXX -
        3     1 root     SWN      0   0%   0% [ksoftirqd_CPU0]
        4     1 root     SW       0   0%   0% [kswapd]
        7     1 root     SW       0   0%   0% [mtdblockd]
    Files take up memory, of course, and this includes the message log files, at 50k each:

    Code:
    # l /var/log
    -rw-rw-rw-    1 root     root         7004 Nov 13 09:59 messages
    -rw-rw-rw-    1 root     root        51287 Nov 12 23:01 messages.0
    
    If a system with just one log file runs low on memory and tries to log this, this may result in an attempt to start a second log file that keeps needing more memory. ...

    Logging in and starting up processes takes memory, of course, possibly resulting in orphaned processes with stuck memory allocations.

    Just using the web GUI will affect memory:

    Code:
    # cat /proc/meminfo
            total:    used:    free:  shared: buffers:  cached:
    Mem:  14823424 13131776  1691648        0   970752  4464640
    
    and after didplaying the QoS Graphs page:

    Code:
    # cat /proc/meminfo
            total:    used:    free:  shared: buffers:  cached:
    Mem:  14823424 13418496  1404928        0   974848  4743168
    
    An instant loss of 287 kB.

    A huge memory user is the Conntrack table. Each entry takes close to 400 bytes.
    So, if you have a sudden connection surge of 1000 or 3000 connections, as may happen with some games or other recklessly (pre-)configured P2P applications, that may mean a sudden demand of 1MB or more.
     
  2. mstombs

    mstombs Network Guru Member

    It should be clear that the VSZ is virtual size, the %MEM is the same info expressed as a percentage.

    The total adds up to much more than 100% because the virtual size includes shared code/libraries - the Busybox utilities, for example, share a lot of common code which all instances claim to have. No benefit in paging when the only disk is a ram disk.
     
  3. Planiwa

    Planiwa LI Guru Member

    I'd like to contribute some more findings that may help users who are interested in understanding how to measure their free memory, what affects it, and how it affects the health of their router.

    This afternoon I ran vit every minute:

    Code:
    091113_15:35:20     VIT: 2 16 5552-1068 .0 2/25 2d 2d 3d ppp0:2771-1384 eth1:4428-2849
    091113_15:36:22     VIT: 2 16 5552-1068 .0 1/25 2d 2d 3d ppp0:2781-1385 eth1:4437-2857
    091113_15:37:25     VIT: 2 15 5552-1068 .0 1/25 2d 2d 3d ppp0:2795-1386 eth1:4452-2871
    091113_15:38:27     VIT: 2 12 5552-1068 .0 1/25 2d 2d 3d ppp0:2795-1386 eth1:4453-2871
    [B]091113_15:39:30     VIT: 2 216 5672-1252 .0 1/25 2d 2d 3d ppp0:2796-1387 eth1:4453-2871
    091113_15:40:32     VIT: 2 261 5672-1256 .1 2/25 2d 2d 3d ppp0:2797-1388 eth1:4454-2871
    [/B]091113_15:41:34     VIT: 2 78 5672-1256 .0 2/25 2d 2d 3d ppp0:2801-1392 eth1:4458-2871
    091113_15:42:37     VIT: 2 39 5672-1256 .0 2/25 2d 2d 3d ppp0:2803-1394 eth1:4460-2871
    091113_15:43:39     VIT: 3 32 5672-1256 .0 1/25 2d 2d 3d ppp0:2805-1396 eth1:4463-2871
    091113_15:44:42     VIT: 3 39 5672-1256 .1 2/25 2d 2d 3d ppp0:2808-1399 eth1:4466-2871
    091113_15:45:44     VIT: 3 39 5672-1256 .0 2/25 2d 2d 3d ppp0:2810-1401 eth1:4468-2871
    
    That looks mostly unremarkable. It appears that the free RAM remained fairly constant, rising about 100kB (2%) at 15:39. It appears that a sharp rise in Connections was accompanied by a slight rise in free space. Hm.

    Let's look at the syslog:

    Code:
    Nov 13 15:39:03 ROUTER user.info VIT: 2 53 [B]4832-412[/B] .0 2/33 2d 2d 3d ppp0:2796-1387 eth1:4453-2871
    
    We are fortunate to have a measurement about half-way between the two measurements where there appeared to be slight rise in free RAM.

    This shows that there actually was a huge drop in free RAM!

    It appears that there was a connection storm which was completely missed by the once-per-minute vit probe, but the half-hourly probe just happened to catch it earlier.


    Here is something else. Invoking a shell takes up about 100k of real RAM.

    Thus:

    Code:
    # . mem;mem
    6160-1564
    6066-1470
    #
    # cat mem
    set -- $(cat /proc/meminfo); set -- $((${10}+${13})) ${10}; echo ${1%???}-${2%???}
    # 
    

    It appears that free memory starts out quite high at reboot. Then it drops with use.
    I have a feeling it may have to do with conntrack table allocation (i.e. Connections).
    This needs more research.
     
  4. Planiwa

    Planiwa LI Guru Member

    Caution: Scheduler may be hazardous to your Free Space

    So, I've been noticing that vit always reports much less free space and several extra processes, when run from Scheduler. Here is what I found:

    run from Scheduler, using logger:
    Code:
    Nov 14 02:08:03 ROUTER user.info VIT: 3  38 5595-1150 .1 2/29 5h 5h 5h ppp0:1100-82 eth1:1132-92
    Nov 14 02:23:03 ROUTER user.info VIT: 2  33 5652-1204 .1 1/28 5h 5h 5h ppp0:1139-84 eth1:1173-95
    Nov 14 02:38:02 ROUTER user.info VIT: 1   5 5652-1204 .2 1/28 5h 5h 5h ppp0:1146-85 eth1:1180-96
    Nov 14 02:53:04 ROUTER user.info VIT: 1   6 5115-667 .1 3/33 6h 6h 6h ppp0:1151-86 eth1:1185-97
    contrast this with, run from the ssh session:
    Code:
    VIT: 091114_02:23:04 2  33 6119-1671 .1 1/24 5h 5h 5h ppp0:1139-84 eth1:1173-95
    ...
    VIT: 091114_02:52:46 1   5 6127-1683 .1 1/23 6h 6h 6h ppp0:1151-86 eth1:1185-97
    VIT: 091114_02:53:02 1   6 5128-679 .1 2/34 6h 6h 6h ppp0:1151-86 eth1:1185-97
    VIT: 091114_02:53:20 1   9 6193-1744 .1 1/23 6h 6h 6h ppp0:1151-86 eth1:1185-97
    
    The Scheduler consumed 1 Megabyte of RAM! Here is what it did:

    Code:
    Sat Nov 14 02:53:02 EST 2009
      PID USER       VSZ STAT COMMAND
        1 root      1768 S    init noinitrd 
        2 root         0 SW   [keventd]
        3 root         0 SWN  [ksoftirqd_CPU0]
        4 root         0 SW   [kswapd]
        5 root         0 SW   [bdflush]
        6 root         0 SW   [kupdated]
        7 root         0 SW   [mtdblockd]
       26 root      1720 S    buttons 
       28 root         0 SWN  [jffs2_gcd_mtd3]
       66 root      1720 S    redial 
       68 root       856 S    pppoecd vlan1 -u XXXXXXXXXXXXXXXXXXX -p XXXXXXXX -r 1
       73 root      1948 S    syslogd -L -s 50 
       74 root      1532 S    dropbear -p 22 
       76 root      1932 S    klogd 
       96 root      1708 S    nas /etc/nas.conf /var/run/nas.pid lan 
      103 root      1960 S    crond -l 9 
      106 root      1256 S    rstats 
      110 root      1684 S    httpd -s 
      502 root      1600 S    dropbear -p 22 
      503 root      1976 S    -sh 
      881 nobody     876 S    dnsmasq 
     7054 root      1980 R    /bin/sh /home/root/vit -d -s 15 
    [B]15518 root         0 Z    [sh]
    15519 root      1960 S    /bin/sh -c sched sch_c2 
    15522 root      1724 S    sched sch_c2 
    15538 root      1956 S    /bin/sh /tmp/sch_c2_cmd.sh  
    15630 root         0 Z    [[]
    [/B]15632 root      1948 R    ps 
    0.11 0.12 0.09 2/28 15633
    
    Not only does it fire up 5 processes, but 2 of them are Zombies!
    Isn't that interesting?

    So, when the router crashes randomly, perhaps it is triggered by the Scheduler, which suddenly wants a Megabyte of RAM.

    Might be preferable to use cron or cru in the init script, e.g.:

    cru a Xfer "59 * * * * /root/xfer"

    this runs the xfer command at 1 minute before each hour is up.
     
  5. TexasFlood

    TexasFlood Network Guru Member

    Cron is what I've been using, cru to add to crontab.
     

Share This Page