Basic Server Monitoring with Python
Uptime is an important statistic for anyone in the internet business. For a system administrator to keep track of uptime, a reliable method for monitoring servers and services is key. A reliable, offsite shell account like Devio.us lends itself to this task quite naturally. In this post, I will demonstrate:
- How to write a simple script that connects to a server
- How to make scripts report to you via email
- How to set scripts and programs to run at a scheduled time
Writing a script to monitor a service
The first step in building a server monitoring system is defining the requirements for the services you need to monitor. For my personal use, I want the monitoring tool to check arbitrary ports by attempting to connect to them. Additionally, I would like it to pull web pages over HTTP and HTTPS and check them for validity. Python's built-in modules support a wide range of protocols natively, so if I use Python for this project, I will be able to extend it easily in the future. The first requirement is easily met with Python's socket module, while the second will use the urllib2 module. Now for some code:
Breaking this down a little bit:
- Line 1 is required to run a Python script directly (i.e. without invoking python first).
- Lines 3-5 load the modules for HTTP(S) connections, TCP connections, and command line arguments, respectively.
- Lines 7-15 establish a TCP connection to an arbitrary hostname:port, and report if the attempt was successful.
- Lines 17-22 make an HTTP(S) request to a server, and report if the attempt was successful.
- Lines 24-28 parse the arguments passed to the script, and determine which test was requested.
- Lines 30-34 are invoked when the script is initially called on the command line. First the number of arguments is checked, then the actual server test is run. If the test fails, it prints an error message.
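The pieces described in the list above can be sketched in Python 3 roughly as follows. This is an illustration rather than the original listing, so the line numbers quoted above will not match it. It uses urllib.request (the Python 3 successor to urllib2), and the function names, the error message, and the `<tcp|http> <target>` argument order are all assumptions made for the example.

```python
#!/usr/bin/env python3
"""Minimal service monitor sketch (Python 3); names are illustrative."""
import socket
import urllib.error
import urllib.request


def tcp_test(server_info, timeout=10):
    """Return True if a TCP connection to 'host:port' succeeds."""
    host, _, port = server_info.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: refused, timed out, or unresolvable; ValueError: bad port
        return False


def http_test(url, timeout=10):
    """Return True if the URL answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP errors; ValueError covers malformed URLs
        return False


def server_test(test_type, server_info):
    """Dispatch to the requested test ('tcp' or 'http')."""
    kind = test_type.lower()
    if kind == "tcp":
        return tcp_test(server_info)
    if kind == "http":
        return http_test(server_info)
    raise ValueError("unknown test type: %s" % test_type)


def main(argv):
    """Check the argument count, run the test, and report a failure."""
    if len(argv) != 3:
        print("usage: %s <tcp|http> <host:port|url>" % argv[0])
        return 2
    if not server_test(argv[1], argv[2]):
        print("Unable to connect to the service %s %s"
              % (argv[1].upper(), argv[2]))
        return 1
    return 0
```

To run the sketch as a script, finish the file with the usual `if __name__ == "__main__":` guard calling `sys.exit(main(sys.argv))` (after an `import sys`). Like the script described above, it prints nothing when the service is reachable.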
That was pretty painless. To test it, save it as service-monitor.py, and then set the script as executable:
And to test it against Devio.us' SSH service, we invoke it on the command line:
Nothing printed, so it must have worked just fine! Now let's test the Devio.us website:
Again, nothing printed, so the site must be up and running just fine! But we should really test out the error functionality:
Looking good! Unfortunately, we will need it to do a little more than that before it becomes truly useful.
Adding email alerts to the monitor
So the first iteration of this service monitor was good; it accurately assessed whether a service was accessible. However, the monitor would not be very useful if it were not running on the computer in front of me. If I were on a different computer in another part of the world, I would have no way of knowing my server had crashed. To alleviate this problem, I will add email alerts when the service is unreachable.
Above you will see the following modifications:
- Line 3 imports the system() call from the os module. This will allow the monitor to execute other programs.
- Line 7 imports asctime from the time module. This will be put in the email body so that uptime can be accurately gauged.
- Lines 32-35 build the email message, and invoke the mail program.
- Line 38 allows an additional argument: an email address.
- Line 41 changes the print command to an email command. Instead of writing an error message to the screen, the error will now be sent to an email account of someone who cares.
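As a rough sketch of the email step, the following builds a timestamped message with asctime and pipes it to the local mail program. It swaps os.system for subprocess.run with an argument list, which sidesteps shell quoting problems. The function names, the wording of the subject and body, and the assumption that a mailx-style `mail -s SUBJECT ADDRESS` command is installed are all illustrative.

```python
import subprocess
from time import asctime


def build_alert(test_type, server_info, when=None):
    """Return (subject, body) for a failed check; 'when' defaults to now."""
    when = when or asctime()
    target = "%s %s" % (test_type.upper(), server_info)
    subject = "Service monitor: %s is unreachable" % target
    body = "%s: unable to connect to %s\n" % (when, target)
    return subject, body


def send_alert(address, test_type, server_info, mail_cmd="mail"):
    """Pipe the alert to the mail program; return True if it exits with 0."""
    subject, body = build_alert(test_type, server_info)
    try:
        result = subprocess.run(
            [mail_cmd, "-s", subject, address],
            input=body.encode("utf-8"),
            check=False,
        )
    except OSError:
        # The mail program is missing or could not be started
        return False
    return result.returncode == 0
```

In the monitor itself, a call such as `send_alert(address, test_type, server_info)` would take the place of the print statement that reports a failed test.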
With the changes above made, testing is in order. The script is invoked as above, except with an extra argument: an email address. Thus, to test Devio.us' SSH service:
If the above check were to fail, I would receive an email telling me. That's excellent, but I still have to invoke this each time I want to run a check. In order to free up time for better things, I will have to schedule this to execute routinely using cron.
Using cron to put it all together
cron is a handy tool for scheduling programs to execute on Devio.us while you are away. To tell cron to run the monitor every five minutes, invoke crontab's edit mode:
and add a new line containing:
Make sure that there is an extra line at the end of the file! Now the monitor will test my website every five minutes, and email me if there are any problems.
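For reference, a crontab line consists of five time fields (minute, hour, day of month, month, day of week) followed by the command to run. The small sketch below assembles such a line for a fixed interval using the `*/N` step syntax. The install path, the argument order, and the email address are hypothetical placeholders, so substitute your own.

```python
def crontab_entry(minutes, command):
    """Return a crontab line that runs 'command' every 'minutes' minutes."""
    # */N only yields even spacing when N divides the 60-minute hour
    if not 1 <= minutes <= 59 or 60 % minutes != 0:
        raise ValueError("minutes must be a divisor of 60 below 60")
    # Fields: minute hour day-of-month month day-of-week command
    return "*/%d * * * * %s" % (minutes, command)


# Hypothetical path, arguments, and address, for illustration only
entry = crontab_entry(
    5,
    "/home/user/bin/service-monitor.py http http://devio.us/ admin@example.com",
)
print(entry)
```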
But that was too hard!
I realize that not everyone is a programmer, and not everyone has the time to debug and test Python code. For that reason I have included the full source for my version of this system monitor below. I hope you find this post and the code included in it useful. If you find any errors, or have any suggestions, please feel free to comment below.
Cheers,
Motoma