Okie.. so I am totally bored at work. Let's see what fun I can have with IBM Director and some spare servers lying around.
I thought it might be cool to try and make a rule that checked if a server had been turned off, and then brought it back up.
This would be useful for us, as we cannot physically administer all the servers we are responsible for, and there are always overzealous ‘admins’ walking about in remote locations who tend to turn off servers instead of just logging out.
Above you see two IBM xSeries 306m servers on the bench. They both contain the PCI version of the IBM RSA-II remote management card.
This card gives you a nice web-based management interface for your server, even if it's powered down. You can, for example, turn on the server from here, monitor its vital statistics, or take control of the console.
You can also set up the RSA card to report to a central IBM Director management server. Here you see the server listed as a “physical platform”, and as you can see, it has been associated with the IBM Director Agent component running in Windows.
We can do a lot of stuff with the object here: the same as via the web interface, and much more, as you can see from the menu.
As you can see above, one of the things we can do from here is turn the power on and off remotely, as with the web interface.
We now have all the bits we need to build a solution.
We have an alert source (the RSA-II card) and a management server to interpret the alerts (the IBM Director Server).
We have an eventing engine that can bind the alert to some actions, and the IBM Director Server itself is capable of actually performing the actions too.
To start, we create a rule in the IBM Director Event Action Plan Builder tool.
It consists of two components: an event filter, or ‘trigger’ as I like to call it, and then some actions to perform once the trigger is tripped.
In the IBM world, servers and software (such as agents) generate alerts using a bunch of different languages or protocols. You can read about some of them here.
The one we are looking for here is a so-called MPA alert. It's sent out by management hardware like the xSeries BMC (Baseboard Management Controller) or, in our case, the RSA-II card.
We are gonna respond to the MPA.Component.Server.Power.Off event. Here you see it occurring in the IBM Director event log:
In our event action plan, we make a Threshold Filter that looks specifically for this event occurring.
We assign a 10 second timeout to the filter and set the Count field to 1, since the event only needs to occur once for the filter to trigger the actions.
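To make the Count and timeout settings a bit more concrete, here is a toy Python model of how a threshold filter like this might behave. This is only a sketch of my understanding, not IBM Director's actual implementation: the ThresholdFilter class, its fields and the matches() method are all made up for illustration, and I am assuming the filter fires once the event has been seen Count times within the timeout window.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ThresholdFilter:
    """Toy model of a threshold filter (hypothetical, not Director's code)."""
    event_type: str          # event type to watch for
    count: int = 1           # occurrences needed before the filter fires
    interval: float = 10.0   # window in seconds (the "timeout" in the GUI)
    _seen: deque = field(default_factory=deque)

    def matches(self, event_type: str, timestamp: float) -> bool:
        """Return True when this occurrence trips the filter."""
        if event_type != self.event_type:
            return False
        self._seen.append(timestamp)
        # Forget occurrences that fell out of the time window.
        while self._seen and timestamp - self._seen[0] > self.interval:
            self._seen.popleft()
        if len(self._seen) >= self.count:
            self._seen.clear()  # start counting afresh after firing
            return True
        return False


# Same settings as in the post: one occurrence is enough.
power_off = ThresholdFilter("MPA.Component.Server.Power.Off", count=1, interval=10.0)
fired = power_off.matches("MPA.Component.Server.Power.Off", timestamp=0.0)
ignored = power_off.matches("MPA.Component.Server.Power.On", timestamp=1.0)
```

With Count set to 1 the window hardly matters, because the very first power-off event trips the filter. A higher Count would only fire on repeated power-offs inside the window.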
Actions are pretty straightforward. We want the server in question to be turned back on when the event is triggered, and we want to receive some kind of alert that this happened.
As you could see before, the IBM Director Server itself gives various power options for servers.
Luckily for us, when building events, the Event Action Plan Builder provides an interface to many of these controls for the eventing engine to use.
The last bit we need is a neato mail to be sent to us admins when this event is triggered.
So, now we can take our finished Event Action Plan, and apply it to some objects.
For this to work, you need to apply the Action Plan to the physical platform, not the Agent Object of the server.
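If it helps to picture how the pieces fit together, here is a small Python sketch of the idea: a plan is a trigger plus a list of actions, and it only reacts to events from the objects it has been applied to. None of this is the real IBM Director API. The Event and EventActionPlan classes, the power_on and mail_admins functions and the object names bench1-rsa and bench1-agent are all hypothetical, and the actions just return a description of what they would do. I am also assuming the power-off event is reported against the physical platform object, since the RSA-II card is the alert source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class Event:
    event_type: str
    source: str  # managed object that reported the event


Action = Callable[[Event], str]


def power_on(event: Event) -> str:
    # Stand-in for the power-on control offered to the eventing engine.
    return f"power on {event.source}"


def mail_admins(event: Event) -> str:
    # Stand-in for the e-mail notification action.
    return f"mail admins: {event.event_type} on {event.source}"


@dataclass
class EventActionPlan:
    trigger: str                      # event type the filter looks for
    actions: List[Action]             # run in order once the trigger trips
    applied_to: Set[str] = field(default_factory=set)

    def apply_to(self, managed_object: str) -> None:
        self.applied_to.add(managed_object)

    def handle(self, event: Event) -> List[str]:
        # Only react to the trigger event on objects the plan is applied to.
        if event.source not in self.applied_to or event.event_type != self.trigger:
            return []
        return [action(event) for action in self.actions]


plan = EventActionPlan("MPA.Component.Server.Power.Off", [power_on, mail_admins])
plan.apply_to("bench1-rsa")  # the physical platform, not the agent object

from_platform = plan.handle(Event("MPA.Component.Server.Power.Off", "bench1-rsa"))
from_agent = plan.handle(Event("MPA.Component.Server.Power.Off", "bench1-agent"))
```

In this model, from_platform holds the two action results while from_agent stays empty, which is the same reason the plan does nothing if you drop it on the agent object only.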
Ok.. that's about it.
If all goes well, you will find you can no longer turn off the server, whahah. 😀 (Unless you unplug the Ethernet cable of the RSA card, of course.) It will turn itself back on every time, within 10 seconds of being turned off. Talk about uptime!
And to boot, you get a nice mail in your inbox: