PSMon is not in the repositories for Fedora 9, Ubuntu Hardy, or openSUSE 11. You can install PSMon using CPAN as described in the PSMon manual.
There is also an install script stored in the utility's support
subdirectory that will take care of installation tasks for you.
PSMon needs a few Perl modules to function. The support/install.sh
script will install those Perl modules for you, or you can get them
from your distribution's package repository first. The advantage of
installing from the package repository is that you can keep the modules
up-to-date through your normal Linux distribution updates. The commands
shown below first install these extra Perl modules, then run the
install script for the PSMon program.
# yum install perl-CPAN perl-YAML
# yum install perl-Config-General perl-Proc-ProcessTable perl-Unix-Syslog
# tar xjf psmon-1.29.tar.bz2
# cd psmon*
# ./support/install.sh
Checking for Config::General ... found
Checking for Proc::ProcessTable ... found
Checking for Unix::Syslog ... found
Checking for Getopt::Long ... found
Installing psmon ... done
Installing psmon-config ... done
Installing etc/psmon.conf ... done
Generating HTML documentation support/psmon.html ... done
Installing manual psmon.1 ... done
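If you installed the Perl modules from your distribution's repository, you may want to confirm that Perl can load them before running the script. The function below is a hypothetical helper (check_perl_modules is not part of PSMon) that approximates the "Checking for ..." step shown above by asking perl to load each module with -M. It is a sketch of the idea, not the contents of support/install.sh.

```shell
#!/bin/sh
# Hypothetical helper, not part of PSMon: report whether each Perl module
# can be loaded. Mimics the "Checking for ..." lines printed by install.sh.
check_perl_modules() {
    _missing=0
    for _mod in "$@"; do
        # perl -MModule -e 1 exits non-zero if the module cannot be loaded
        if perl -M"$_mod" -e 1 >/dev/null 2>&1; then
            echo "Checking for $_mod ... found"
        else
            echo "Checking for $_mod ... missing"
            _missing=1
        fi
    done
    # Non-zero status if any module was missing
    return "$_missing"
}
```

For example, check_perl_modules Config::General Proc::ProcessTable Unix::Syslog Getopt::Long prints one line per module and returns a non-zero status if any of them is missing.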
The configuration file installed by the script holds key-value pairs,
either at the top level of the file or nested inside Process groupings.
The syntax is designed to be similar to that of the Apache
configuration file. There is a special Process * group
that lets you apply settings to all processes. However, this might not
work as you expect -- it could end up killing many processes that you
did not intend to get rid of, so you should avoid using the Process * group.
Near the top of the default /etc/psmon.conf file you will see
Disabled True, which stops PSMon from doing anything until you change this directive.
PSMon supports a small collection of directives that are designed to
be used at the top level, outside of any Process group. Among them,
Frequency sets how often, in seconds (default 60), PSMon scans the
process table. Lowering it to 5 seconds makes PSMon respawn dead
processes and kill badly behaving ones more quickly, at the cost of
more CPU time on the machine. The AdminEmail directive (default
root@localhost) sets the address that PSMon notifies when it spawns or
kills a process, or when either of those operations fails.
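Because the file ships with Disabled True, the first edit is usually to flip that directive, and you may want to adjust AdminEmail at the same time. The function below is a hypothetical helper (set_top_level_directive is not part of PSMon) that rewrites a top-level directive while leaving lines inside <Process ...> groups untouched, so a per-process AdminEmail is not clobbered. It assumes simple whitespace-separated "Key Value" lines, and that False is the value that turns Disabled off; check the comments in your own /etc/psmon.conf before relying on either assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of PSMon: set a top-level directive in a
# psmon.conf-style file, leaving anything inside <Process ...> groups alone.
# Assumes "Key Value" lines separated by whitespace; keys match case-insensitively.
set_top_level_directive() {
    _conf=$1 _key=$2 _value=$3
    _scratch=$(mktemp) || return 1
    awk -v key="$_key" -v value="$_value" '
        /^[ \t]*<Process/    { inside = 1 }
        /^[ \t]*<\/Process>/ { inside = 0; print; next }
        !inside && !done && tolower($1) == tolower(key) { print key, value; done = 1; next }
        { print }
        END { if (!done) print key, value }   # append the key if it was absent
    ' "$_conf" > "$_scratch" &&
    cat "$_scratch" > "$_conf"   # rewrite in place, keeping the file's mode and owner
    _status=$?
    rm -f "$_scratch"
    return "$_status"
}
```

For example, run set_top_level_directive /etc/psmon.conf Disabled False, then set_top_level_directive /etc/psmon.conf AdminEmail ops@example.com. If a key is not present at the top level, the function appends it to the end of the file, outside any group.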
There are also two directives, NeverKillPID and NeverKillProcessName,
that protect processes from ever being killed. They take
space-delimited lists of process IDs (PIDs) and process names,
respectively, and default to PID 1 and a list of kernel threads that you really don't want to kill by mistake.
The example below shows a Process group, which is opened and closed with XML-like tags. After the word Process
in the opening tag, give the name of the process that you are describing.
You cannot include path information in the process name, and you should
omit any command-line options that the command might have been given.
Being able to specify the full path of the process (or a regular
expression to match against it) would be a welcome enhancement. For the
SSH daemon, simply using sshd is not likely to match any other running
processes by mistake. In this example the sshd process group restarts
the SSH daemon should it exit or crash for any reason.
<Process sshd>
SpawnCmd /sbin/service sshd start
</Process>
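Conceptually, each scan handles this group by checking whether a process called sshd exists and, if not, running the SpawnCmd. The function below is a minimal sketch of that single check under stated assumptions: respawn_if_missing is a hypothetical name, and pgrep -x (from procps) stands in for the Proc::ProcessTable lookup that PSMon itself performs.

```shell
#!/bin/sh
# Hypothetical sketch of one respawn check for one Process group:
# if no process named exactly $1 is running, run the spawn command in $2.
# pgrep stands in for PSMon's own process-table lookup.
respawn_if_missing() {
    _name=$1 _spawn_cmd=$2
    if pgrep -x "$_name" >/dev/null 2>&1; then
        echo "$_name is running"
    else
        echo "$_name is not running; running: $_spawn_cmd"
        sh -c "$_spawn_cmd"
    fi
}
```

Running respawn_if_missing sshd '/sbin/service sshd start' from a loop or cron job would only approximate the group above; it has none of PSMon's logging, email notification, or Instances handling.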
Other directives that you can use in a Process group include Instances, which sets the maximum number of copies of the process that can be running, and KillCmd,
which lets you specify a custom way to stop the process if it is
misbehaving. If KillCmd is not specified, PSMon sends the process a
SIGKILL. You might consider using a KillCmd that sends a SIGTERM to the
process, waits a few seconds, and then sends the stronger SIGKILL if
the process is still around. Another good option for KillCmd is to use
the /etc/init.d scripts to stop a service.
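A KillCmd that escalates from SIGTERM to SIGKILL could be built on a small shell function like the one below. graceful_kill is a hypothetical helper and its five-second default grace period is an arbitrary choice; how PSMon passes the target PID to a KillCmd is not covered here, so check the PSMon manual before wiring this in.

```shell
#!/bin/sh
# Hypothetical helper for a KillCmd: ask a process to exit with SIGTERM,
# wait up to a grace period, then fall back to SIGKILL.
graceful_kill() {
    _pid=$1
    _grace=${2:-5}   # seconds to wait after SIGTERM (arbitrary default)
    kill -TERM "$_pid" 2>/dev/null || return 0   # already gone
    _i=0
    while [ "$_i" -lt "$_grace" ]; do
        kill -0 "$_pid" 2>/dev/null || return 0  # exited after SIGTERM
        sleep 1
        _i=$((_i + 1))
    done
    kill -KILL "$_pid" 2>/dev/null || true       # still around: force it
    return 0
}
```

The kill -0 probe only tests whether the PID still exists, so the function returns as soon as the process exits after SIGTERM instead of always sleeping for the full grace period.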
You can set resource limits for a process with the PctCPU, PctMEM, and
TTL directives, which cap its CPU and RAM usage as percentages and
limit how long the process can live in total. The PIDFile directive
gives PSMon the path of a file containing the process ID of a daemon
that PSMon should not kill. PIDFile is only useful in combination with
the PctCPU, PctMEM, or TTL directives. As an example of why you might
use it, consider a daemon that spawns many children to handle network
communications. You might want to make sure that the children do not
consume more than 70% of the system's RAM. With PIDFile you can tell
PSMon not to kill the main control process, but only the child worker
processes if they start to consume too much memory.
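The effect of PIDFile on the list of kill candidates can be sketched in a few lines of shell. killable_pids is a hypothetical helper that lists every process with a given command name except the one whose PID is stored in the pid file. It reads /proc directly, so it is Linux-only, and it illustrates the idea rather than PSMon's own implementation.

```shell
#!/bin/sh
# Hypothetical sketch of the PIDFile idea: list processes named $1 except
# the one whose PID is stored in file $2 (the master that must survive).
killable_pids() {
    _name=$1
    _master=$(cat "$2" 2>/dev/null || true)
    for _dir in /proc/[0-9]*; do
        _pid=${_dir#/proc/}
        # /proc/PID/comm holds the command name (truncated to 15 characters)
        if [ "$(cat "$_dir/comm" 2>/dev/null)" = "$_name" ] && [ "$_pid" != "$_master" ]; then
            echo "$_pid"
        fi
    done
}
```

For a daemon like the one described above, whose workers share its command name, killable_pids with that name and the daemon's pid file would list only the worker children, which are the processes a PctMEM limit should act on.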
The TTL directive is handy to ensure that processes which are meant
to complete within a known amount of time have done so. For example,
you can limit the updatedb command, or uses of unison or find, to one
day (86400 seconds) to stop them from running unchecked from a user's
cron job:
<Process find>
ttl 86400
instances 30
</Process>
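Enforcing a TTL boils down to comparing a process's age with a limit in seconds; 86400 seconds is 24 hours. The sketch below computes the age of a process on Linux from the starttime field of /proc/PID/stat (clock ticks since boot) and the first value in /proc/uptime. process_age and exceeds_ttl are hypothetical names, and treating TTL as measured from process start is an assumption about PSMon's behavior.

```shell
#!/bin/sh
# Hypothetical sketch of a TTL check on Linux: how many whole seconds has
# process $1 been running?
process_age() {
    _stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    _rest=${_stat##*") "}     # drop "PID (comm) "; comm itself may contain spaces
    set -- $_rest             # $1 is now field 3 (state), so starttime (field 22) is ${20}
    _start_ticks=${20}
    _hz=$(getconf CLK_TCK 2>/dev/null || echo 100)   # ticks per second; 100 is a fallback guess
    read -r _up _idle < /proc/uptime
    echo $(( ${_up%.*} - _start_ticks / _hz ))
}

# Succeeds if process $1 has outlived the limit in $2 seconds;
# fails with status 2 if the process no longer exists.
exceeds_ttl() {
    _age=$(process_age "$1") || return 2
    [ "$_age" -gt "$2" ]
}
```

exceeds_ttl returns 0 (true) once the process has run longer than the limit, 1 while it is still within it, and 2 if the PID no longer exists.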
You can control how much email PSMon sends with the NoEmail, NoEmailOnKill, and NoEmailOnSpawn directives. These all default to False; setting them to True suppresses all email, email about killed processes, or email about spawned processes, respectively.
You can also set the LogLevel and AdminEmail
directives per Process group, so you can, for example, send email to an
SMS gateway when a very important process such as Apache has crashed.
Changing the LogLevel also affects how failed respawn attempts are
reported. PSMon reports a failure to stop or start a process using the
LogLevel plus one, so setting the Apache group to have a high LogLevel
will also cause PSMon to report respawn errors to syslog with a high
priority.
Sending the USR1 signal to PSMon when it is running as a daemon will
make it rescan the running processes immediately. You can start PSMon
as a daemon using the --daemon command-line option.
Final words
I am not too sold on the idea of killing processes if they are using
too much of a system's resources, since a process may legitimately be
using 95% of the CPU for a few minutes and you wouldn't want it to be
killed. Enforcing a maximum run time, if you select a time well beyond
what most legitimate uses of the command would require, can help to
protect the system from badly behaving cron jobs when you are not
around to notice them. Being able to respawn processes automatically if
they have exited is certainly useful -- although sshd and Apache do not
tend to crash much, you can bet the one time they do is when you board
an airplane for a nine-hour flight. Its multiple capabilities make
PSMon a worthy utility for your system administration toolkit.