What is this?
~~~~~~~~~~~~~
lvs-rrd (for lack of a better name) is a couple of scripts that collect
connection data from an LVS, store it in RRD files and later graph that
data so that trends can be observed.

Basically, it makes pretty graphs ;)

Why?
~~~~
I wrote these scripts because I realized that the methods I used to
collect data for other monitors (cpu, memory, load, etc) were not
directly applicable to collecting LVS data, because the number of real
servers can fluctuate depending on scheduled downtime, or when servers
are added to handle load. I didn't want to have to rewrite my scripts
each time to add in another server, especially considering that an RRD
file, once created, can't have data sources added. Also, there would be
data loss between the time I added a new server and the time that I got
around to adding the server to the collection scripts.

Ok... so I knew that I was gonna be lazy in the future and wouldn't want
to deal with this, or re-learn all the RRD stuff I know now, when in 6
months someone decides they want to add a server.

Requirements:
~~~~~~~~~~~~~
At least one LVS director (the script goes on each director, for now).

The wonderful rrdtool (http://rrdtool.com)

Bash v2 - as these are bash scripts. (It reportedly doesn't work w/
bash1)

bc - Should be included w/ your distro, but is apparently not installed
by default on RH7.3, at least.

Optional: a web server of some sort, somewhere, to display the data.
Many people put web servers on their directors for a final failover in
case all of the real servers are down, so this shouldn't be a problem.
Alternatively, you could rsync or otherwise transfer the RRDs to another
server to be graphed, if you wish. Or you can NFS mount the directories,
or sync just the /proc/net/ip_vs file over, or make it available via a
web server, or via inetd/xinetd/tcpserver... Whatever. (You'll need to
modify the update script slightly in those cases. Feel free to ask if
you need help.)

Optional: PHP.
This is by no means necessary; I use it simply to run the graphing
script when the page is requested, so that the graphs are only generated
when they are needed.

Setup:
~~~~~~
** Important ** If you are upgrading from earlier than version 0.3
(anyone?) read the upgrade section below first! **

You can get the latest version of the script at:
http://tepedino.org/lvs-rrd/

Extract the file to your web root. It will create a directory called
'lvs-rrd'.

Check the top of both scripts (lvs.rrd.update and graph-lvs.sh) and
change the variables there to match your setup.

Add this line to your crontab, changing WEBROOT to the appropriate path.
(It is no longer necessary to do this as root, as /proc/net/ip_vs is
world readable.)

    * * * * * WEBROOT/lvs-rrd/lvs.rrd.update 2> /dev/null > /dev/null

This update script should collect data from any real servers in the
cluster.

Change the permissions on the 'graphs' directory so that it is writable
by your webserver. The graphs are written there when the graphing script
is called by PHP. If you're using some other method to display the
graphs, this step may or may not be necessary.

That's it! You'll have to wait a few minutes to start seeing data, so be
patient.

** Upgrading **
~~~~~~~~~~~~~~~
From v0.3 through v0.5 to this version:

The protocol (TCP/UDP) is now part of the rrd file names, to prevent
same port, different protocol problems (Thanks Xavier!). Because of
this, the update script checks for old filenames and copies them to the
new filename format. I only copy them as a 'just in case' measure. Once
you're satisfied that it's all working well, you can run:

    rm `ls lvs.????????.????.????????.????.???.rrd|cut -f1,2,3,4,5,7 -d.`

to remove your old files.

You might notice some files still in the old file format. These belong
to machines that were in your cluster at one point, but aren't now
(Temporarily down... whatever).
You can delete them if they have no useful data, or just leave them
there for when the machine comes back up, or keep them 'cause you've
grown fond of them. Doesn't matter. The script will ignore them.

There was no change to the actual data contained in the rrd files, just
to the naming of them.

Earlier than v0.3:

Some things have changed since versions earlier than 0.3, so you'll need
to make some changes so as to not lose your collected data. The names of
the update script and the rrd files have changed to allow for more
finely grained graphing of the data.

First, (after extracting the scripts to your lvs-rrd directory) you'll
need to figure out what the new names of your rrd files will be. The
easiest way to do this is to run the new lvs.rrd.update script (the new
name of the connections.rrd.update script). It will create new files in
the form of:

    lvs.VIP.VIPport.RIP.Rport.rrd

where VIP is (Obviously) the VIP, VIPport is the associated port, RIP is
the real server IP, and Rport, the associated real server port. IPs and
ports are written in hexadecimal. So, for example, a real server
10.1.1.10 on port 80 in a virtual server with a VIP of 192.169.1.10 and
a port of 80 would have the name:

    lvs.C0A9010A.0050.0A01010A.0050.rrd

So, find the corresponding connections rrd file, delete the new
lvs.XXX.rrd file, and hard link the old file to the new name. ie:

    ln connections.1.10.rrd lvs.C0A9010A.0050.0A01010A.0050.rrd

Once that is done, change your crontab to run lvs.rrd.update instead of
connections.rrd.update.

Now go read the Changelog to see what other changes there were.

Usage:
~~~~~~
The update script should just do its job. No need to worry about it.

The graphing script will, by default, graph all of the rrd files' data
in one graph (well, 5 graphs, but all the data is in each graph).

The graphing script breaks down like this:

    graph-lvs.sh [-lHrs] [-I VIP] [-P port] [-i IP] [-p port]

    -l        Lazy (Generate graphs at most once per 5 minutes)
    -H        Output an HTML page.
    -I VIP    Graph only servers in this virtual server
    -P port   Graph only VIPs on this port
    -i IP     Graph only this real server IP
    -p port   Graph only this real server port
    -r        Reverse (flip) active and inactive (positive to negative)
    -s        Separate Active/Inactive graphs (nullifies -r)

-l will greatly reduce the loading time of very complicated graphs that
are viewed by more than one person, or if you hit reload a lot, as the
graphs will only be regenerated at most once every 5 minutes, or
whenever the graphs are viewed, whichever is longer. Normally they're
generated each time the graphing script is run. It's suggested to leave
lazy off while you are testing the script and changing colors or what
have you, and to turn it on when you've got everything settled, to
reduce the load on the graphing server.

-H will output an HTML page specific to the graphs that were generated.
This is useful when you want to run the script from within a simple php
page (as included) and you have several graphs you want to create, as
all you will need to do is change the options to the script.

-I -P -i and -p are for the VIP, its port, the real server, and its
port. They're used to limit the rrd files used to generate the graphs.
So if you want to graph all connections going to a specific VIP, or a
specific real server, or to a particular port, you can. These options
are additive in their restrictions, meaning that if you use them all,
you are essentially specifying one rrd file to use (assuming it exists).
If you supply an invalid IP or port, an error will be output, that
particular option will be ignored, and the script will behave as if the
option had not been given (all servers are included for that criterion).

-r will swap the positive and negative ends of the graph, in case your
setup produces more inactive than active connections, or you just like
it that way.

-s outputs separate graphs for active and inactive connections (the
second is named with a "-I" just before the ".gif").
This is useful for people with lopsided data (ie: 7000 inactive and 5
active), which is common w/ masq setups.

Example:

    graph-lvs.sh -lH -I 192.168.1.10

will graph connections to that VIP on all ports for all real servers
that are in that virtual server, at most once every 5 minutes, and it
will output an HTML page to display those graphs.

The supplied php page has just one line. All you have to do is add
command line options as you like and make copies for each service or
set of servers you want to graph.

Optional changes:
~~~~~~~~~~~~~~~~~
Right now the colors for each server are created by adding the specified
colors to, or subtracting them from, the base color, by an amount which
is determined by how many real servers are in the cluster. The actual
equation is simply 255/(# of RSs). As it is currently set up, the first
server is graphed in the base color, and for each server after that the
selected colors are added to or subtracted from the base.

At the top of the graphing script you'll see 6 variables: ARED, AGREEN,
ABLUE, IRED, IGREEN and IBLUE. You can set each of these variables to a
two digit hex number, or to A or S. "A" will add color with each
successive server, where "S" will subtract color.

Also, there are variables for various other colors on the graphs. Play
around with them to find what you like.

Limitations:
~~~~~~~~~~~~
I've tested the script with up to 28 real servers (well, 2 servers
copied a bunch of times) and it works, and creates a really nice
gradient in the process. I assume, though, that there's some limit to
the number of real servers that it can handle, as 28 servers creates one
hell of a command line to be run. Plus, while the speed with 2 servers
is perfectly acceptable (about 1 second to generate all 5 graphs on a P4
1.8GHz), 28 servers takes about 10 seconds.

Now, I know that it may have been better to use the RRD perl bindings to
write this script, but my perl is a little rusty, and I'm not even sure
it would be all that much faster.
Besides, I already had shell scripts that created these graphs; I just
wanted to rewrite them so that the data collection and graphing were
dynamic and could change with the number of real servers.

I know this works with 2.4.x kernels. I have yet to test it with 2.6.x
kernels. I think the /proc/net/ip_vs file is the same, but I haven't had
a chance to check. Anyone?

Who are you?
~~~~~~~~~~~~
My name is Sal Tepedino and I'm just a SysAdmin that needed a way to
pass the time while I was looking for another job (Need a Sysadmin?)

If you have any questions/comments/suggestions, feel free to write me at
sal at tepedino dot org. I'd also like to know if anyone's actually
using this script, and if you like it, hate it, whatever.

Disclaimer
~~~~~~~~~~
Although I don't think it likely that this script will cause any damage,
if it does, I am not responsible. You should have looked over these
scripts before you ran them, shouldn't you? There is no warranty that
comes with this software, and I will not fix anything that may break as
a result of using this script.

License
~~~~~~~
You may distribute and modify this script freely and without charge as
long as you acknowledge me (Sal Tepedino) in the documentation as the
original author of the script, and you state clearly if you have made
any modifications to the script, or the package containing the script.
You may not charge for this software, although if you want to charge for
setting it up, feel free, but don't charge much as it's real easy to set
up, ok?
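
Appendix: RRD file names
~~~~~~~~~~~~~~~~~~~~~~~~
The upgrade notes describe rrd file names built from the VIP, its port,
the real server IP and its port, with the protocol added as an extra
field in later versions. The following is a minimal sketch of that
naming scheme, not the code from lvs.rrd.update. The function names
(ip_to_hex, port_to_hex, rrd_name) are made up for illustration, and the
assumption that the protocol field is written exactly as passed in (for
example "TCP") is inferred from the three-character field in the cleanup
command above rather than confirmed by the scripts.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the lvs-rrd file naming scheme.
# These are NOT taken from lvs.rrd.update; names are assumptions.

# Convert a dotted-quad IPv4 address to 8 uppercase hex digits.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_hex() (
    IFS=.
    # Word-split the address on dots into $1..$4.
    set -- $1
    printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4"
)

# Convert a decimal port number to 4 uppercase hex digits.
port_to_hex() {
    printf '%04X' "$1"
}

# Build a file name from VIP, VIP port, RIP, real server port and an
# optional protocol. Without a protocol this gives the older
# five-field form; with one, the protocol becomes the sixth field.
rrd_name() {
    name="lvs.$(ip_to_hex "$1").$(port_to_hex "$2")"
    name="$name.$(ip_to_hex "$3").$(port_to_hex "$4")"
    if [ -n "${5:-}" ]; then
        name="$name.$5"
    fi
    printf '%s.rrd\n' "$name"
}

# Example from the upgrade notes: VIP 192.169.1.10:80, RIP 10.1.1.10:80
rrd_name 192.169.1.10 80 10.1.1.10 80
# prints lvs.C0A9010A.0050.0A01010A.0050.rrd
```

Passing a protocol as a fifth argument, as in
"rrd_name 192.169.1.10 80 10.1.1.10 80 TCP", appends it before the
".rrd" suffix, which is how the sketch models the newer names.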