Check if a script is already running? - bash & TomatoRAF

Discussion in 'Tomato Firmware' started by MatteoV, Jan 6, 2014.

  1. MatteoV

    MatteoV Networkin' Nut Member

    First of all, let me ask whether this is the right place to ask for help with bash scripting on Tomato (RAF, by +Victek). Excuse me if it is not...

    I recently wrote a script that emails me warnings when unknown devices appear on my network. And a lot of others... because I sometimes like to set myself a scripting goal and reach it :p
    Now, I have some questions/problems and would like to ask for your help.

    Many of my scripts are just cron jobs that I schedule to run every minute. Some have long waiting times (sleep) because they must, for example, wait for a PC to wake up, or something similar. I am concerned that a new instance of the same script started by cron will interfere with the older instance (because of how my code is written) if it is still running. I can't always avoid waits of more than a minute, so I have always gone this way: first check whether there are multiple instances of "me", then go on only if there are not. While one run of the script is still doing its job, there's no problem in skipping a run.

    But it's not that easy. Scripts run code, of course, and they seem to show up as running multiple times with the classic commands I try to use, like:
    CHECK_RUNNING=`ps -w | grep $SCRIPT_NAME | grep -v grep | wc -l`
    ...even when there is for sure a single instance, I can get 2-3 as a response. I don't really have an idea how to understand more about this and avoid faults in my code... for now all my scripts just treat a response of 4 or more as "multiple instances". But I don't feel that's reliable, is it?

    Can you give me some suggestions on how to do it right?

    Thank you.
  2. koitsu

    koitsu Network Guru Member

    Try using pidof, ex. pidof scriptname. You will need to mess about to figure out what the exact argument should be for scriptname. pidof will return an exit code of 0 if it found a matching process, otherwise it'll return 1. Example:

    pid="`pidof miniupnpd`"
    if [ $? -ne 0 ]; then
      echo "miniupnpd is not running"
    else
      echo "miniupnpd is running, PID $pid"
    fi
    The use of quotes in the assignment for variable $pid is important.

    If your problem is that you only want one instance of a script to be running simultaneously, then that is incredibly easy: implement use of a PID file within your script (and when the script ends remove the PID file), and make the script look for the existence of that PID file. If it exists, it means there's already a script running and you should exit 0 at that point.

    The only situation where this bites you is if you're storing the PID file on something whose contents persist across a reboot (e.g. a JFFS partition, or ext3 partition on a USB stick, etc.). You want to store the PID file somewhere that's in RAM -- the proper place is /var/run -- so that if the router reboots (administratively or otherwise) while the script is running, there will be no stale PID file left over.

    Implementing what I describe is probably 6 or 7 lines of shell. It's not hard and is very common.
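
    As a minimal sketch of the PID-file approach described above (not koitsu's actual code): the file name, the function names and the kill -0 liveness check are assumptions added for illustration. The check-then-write is not atomic, so two instances starting at the same instant could in theory both get through.

```shell
#!/bin/sh
# PID-file guard (sketch). PIDFILE is a hypothetical name; keep it on
# RAM-backed storage such as /var/run so a reboot cannot leave it behind.
PIDFILE="${PIDFILE:-/var/run/myscript.pid}"

acquire_pidfile() {
    # A PID file whose process is still alive means another instance is running.
    # (The kill -0 check is an extra; the post only tests for the file's existence.)
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return 1
    fi
    echo $$ > "$PIDFILE"
}

release_pidfile() {
    rm -f "$PIDFILE"
}

# run_once CMD [ARGS...]: run CMD unless another instance holds the PID file.
run_once() {
    acquire_pidfile || return 0    # already running: skip this run quietly
    "$@"
    status=$?
    release_pidfile                # remove the PID file when the work ends
    return "$status"
}
```

    In a cron script this could be called as run_once do_work, where do_work is a function holding the real job.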

    An alternate solution to use of a pidfile is to use flock with an exclusive lock (the default), in addition to -n (very important), so that the flock -n call fails immediately if there's already a lock in place. This is "more advanced" than what I describe, but the premise is the same. It's too bad the Busybox documentation for flock is awful (for example I have no idea what the -c flag does per usage syntax). You definitely want -n though, because otherwise you'd have a ton of scripts running and blocking (waiting indefinitely) for the already-running script to finish, which would result in the equivalent of a cascading forkbomb in the case the script took longer to finish than the interval of the cronjob.
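
    A minimal sketch of the flock -n idea, under some assumptions: the lock file path, the function name run_locked, descriptor 9 and the exit code 75 are all arbitrary choices made for this example, not anything flock requires.

```shell
#!/bin/sh
# Non-blocking exclusive lock around a command (sketch).
# LOCK_FILE is a hypothetical path; any file on RAM-backed storage would do.
LOCK_FILE="${LOCK_FILE:-/var/run/myscript.lock}"

# run_locked CMD [ARGS...]: run CMD only if the lock is free right now.
run_locked() {
    (
        # -x: exclusive lock (the default). -n: fail at once instead of
        # queueing behind whoever holds the lock.
        flock -n -x 9 || exit 75   # 75 is just this sketch's "busy" code
        "$@"
    ) 9>"$LOCK_FILE"               # lock is dropped when the subshell closes fd 9
}
```

    With -n, a cron run that finds the lock taken returns immediately (here with status 75) rather than piling up behind the running instance.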

    P.S. -- This is UNIX shell scripting 101 stuff and really isn't about Tomato or TomatoUSB firmwares at all. stackoverflow would be a better place for this kind of discussion.

    P.P.S. -- The shell on Busybox is not bash. It is "bash-like", but there are many nuances of its parser/implementation that are different from bash. These are more common than you might think. Just remember that -- /bin/sh is not bash on Tomato/Busybox.
    Last edited: Jan 6, 2014
  3. MatteoV

    MatteoV Networkin' Nut Member

    Koitsu, thank you! Very useful post.
    The PID file seems a perfect solution, or flock, which I will investigate.
    Also, the pidof suggestion sparked many ideas for improving a lot of code that currently uses ps and awk just to find the PID and kill some otherwise unkillable script!

    P.S. I always confuse bash with busybox; it's clear now that they are different... a lot of time actually went into understanding whether and how to adapt one of the many bash solutions available online :)

    Found more info on the flock system and I love it.
    This is the approach that seems to be working perfectly:
    #Script settings
    SCRIPT_BIN="$(basename $0)"
    (
        # Check for the lock on $LOCK_FILE (fd 200) or exit
        flock -xn 200 || {
            logger -t crond.stop "$SCRIPT_NAME Script had FLock. Check running was $CHECK_RUNNING. Aborted this instance."
            exit 1
        }
        #No lock, OK, let's go on
        echo $$ 1>&200
        trap cleanup INT TERM EXIT QUIT KILL STOP # call cleanup() if script exits
        cleanup() {
              flock -u 200
        }
        #Script code here
        exit 0
    ) 200>"$LOCK_FILE"
    I found out that, after implementing the trap rule, I can close/kill the sh script and the lock is released correctly. What about INT TERM EXIT QUIT KILL STOP, are they enough? I added some of these after reading the output of kill -l and deciding on my own; would you add or remove any of them? Any suggestion is really welcome ;)

    Last edited: Jan 6, 2014
  4. koitsu

    koitsu Network Guru Member

    1) Your logger statement refers to a variable called $CHECK_RUNNING, which isn't defined in that script (at least the part you pasted). If it's somewhere further up / which you didn't paste, then that's cool.

    2) You cannot trap SIGKILL. I would recommend trapping SIGINT, SIGTERM, and SIGQUIT. Do not trap SIGSTOP (trust me on this).

    3) Your script is a little weird -- specifically the use of an arbitrary file descriptor number, which I just don't see the point of. I'm still trying to wrap my brain around why you designed it like this, but right now my brain is also engaged in 4 different other things relating to work, so I'm probably not giving it as much attention/clarity as I should. That's my own fault, not yours. :)
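
    One common way to follow the advice in point 2 is sketched below. It is only an illustration: WORK_FILE and the function names are hypothetical, and the cleanup body stands in for whatever the real script needs to undo.

```shell
#!/bin/sh
# Clean up on normal exit and on INT/TERM/QUIT (sketch).
# WORK_FILE is a hypothetical file the script wants gone when it ends.
WORK_FILE="${WORK_FILE:-/tmp/myscript.work}"

cleanup() {
    rm -f "$WORK_FILE"
}

install_traps() {
    # EXIT fires whenever the shell exits, so cleanup lives in one place.
    trap cleanup EXIT
    # Turn the catchable signals into an ordinary exit, which then fires
    # the EXIT trap. KILL and STOP are left out: neither can be caught.
    trap 'exit 1' INT TERM QUIT
}
```

    A script would call install_traps once near the top, before creating anything that needs cleaning up.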
  5. MatteoV

    MatteoV Networkin' Nut Member

    Sorry, my fault. That was my old way of doing the check; it was left there for no real reason :)
    It was just:
    CHECK_RUNNING="$(ps -w | grep "$SCRIPT_BIN" | grep -v grep | wc -l)"
    I trust you for sure about STOP.
    About EXIT, will flock take care of deleting the lockfile by itself and releasing the lock? I guess it does the latter but not the former, if I understood correctly. Unsure.
    About KILL, well, is that (just wondering...) what happens when I kill a script myself with kill PID? Again, I would like the lockfile to be deleted and the lock released on these occasions, so the next run can go on instead of being locked out forever. That's why I thought of trapping it. Not sure now; I'm still wondering: did I understand anything about flock, BTW? LOL

    Hmm, again, I feel particularly stupid about this, and I felt the same when I tried to understand this descriptor number. I just didn't get what it is, honestly. I've read an answer on stackoverflow saying it's just an arbitrary number, high enough (200) not to conflict with other locks. I have different scripts and I put a different number in each one: one has 200, another 300, and so on. If I understood flock correctly (but I guess I did not :) ), that's just a global number identifying that lock. If I don't fix it, the same script will get a different one each time (if it's the script's PID, for example) and the "lock" would be absent every time, even if another instance of the script is already running, because it would have a different descriptor. Is this craziness, or did I get it right?
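
    A quick experiment can help with the descriptor question. In the sketch below (the function names and the choice of descriptors 8 and 9 are assumptions for illustration), the descriptor number is only a per-process handle for the opened lock file. As far as I understand flock, the lock belongs to the file, so two scripts conflict only when they open the same lock file, whatever numbers they use.

```shell
#!/bin/sh
# hold_lock FILE SECS: hold an exclusive lock on FILE through
# descriptor 8 for SECS seconds.
hold_lock() {
    ( flock -xn 8 || exit 1; sleep "$2" ) 8>"$1"
}

# try_lock FILE: succeed only if an exclusive lock on FILE can be taken
# right now, this time through a different descriptor (9).
try_lock() {
    ( flock -xn 9 ) 9>"$1"
}
```

    While hold_lock /tmp/a.lock 5 & is running, try_lock /tmp/a.lock fails even though it uses another descriptor number, and try_lock /tmp/b.lock succeeds because it is a different file (both paths are made up for the example).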

    Also, I admit I don't get if/why I need to put the script's PID into the lockfile. I guess it's not really needed and it's there just for my own sake, or for particular operations like killing the old script precisely, or similar stuff.

    @koitsu I don't want to steal your time, anyway; answer if and when you can, no hurry. The scripts already work wonderfully thanks to your help, and I feel they are nearly perfect now that they even send e-mail to the right server (local or remote), without waking the server PC up every time... thanks to msmtp and it :)
    Of course tomorrow is another day and I will find another million (non?)problems to fix LOL... well, or I'll just think about work, which resumes :p

    Thank you.
    Have a nice time,
  6. jerrm

    jerrm Network Guru Member

    You may not like the self-calling aspect of this, but it is an easy cut and paste at the start of most scripts and doesn't require additional lock files.
    [ "$flockme" != "$0" ] && {
      ( flock -nx 9 || exit 222
        export flockme="$0"
        "$0" "$@"
        # remove the unlock if script launches a background job
        # that needs to complete before running again
        flock -u 9
      ) 9< $0
      flockx=$?
      [ "$flockx" = "222" ] && {
        echo $0 Already Running - Exiting
        exit 1
      }
      exit $flockx
    }
    Last edited: Jan 7, 2014
  7. mstombs

    mstombs Network Guru Member

    You can use square brackets in scripts so that the ps | grep does not find itself, which removes the need for the second grep exclusion

    ps | grep '[p]rogram'
    But not when program is a variable, so another vote for pidof.

    The old "All you need" adblock script used pidof for pixelserv, "lean and mean" currently uses the inefficient double grep, jerrm!
  8. jerrm

    jerrm Network Guru Member

    Actually the grep comes closer to accommodating the possibility of more than one pixelserv running, but the "stop" code still blindly does a killall.

    Currently neither the pixelserv startup nor the stop is how I would probably do it, but I haven't gone there yet.