Everything's ok alarm
Wim van der Ham last edited by
I'm looking for a way to receive an "everything's ok alarm". In other words, an alarm that is really a signal that monitoring itself is running OK, so that I don't need to worry about missing a real alarm.
I know there's no way to get a user defined message in the database log file, so I need to create an alert from some monitor program.
What's the best way to implement this?
Rob Fitzpatrick last edited by
I can't tell you what's best for you, but in my experience people eventually become deaf and blind to "alarms" that contain information that can essentially be ignored, i.e. alarms that are not actionable. Do you really want 24 alarms per day? Or 12? When you don't receive one on schedule, will you notice immediately?
I think a better approach is to have two processes on separate systems that can monitor each other, each able to alert when the other becomes unresponsive. If they run on separate infrastructure then the chance of both failing simultaneously becomes very small. I believe the paid ProTop functions this way.
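To make the idea of two monitors watching each other concrete, here is a minimal sketch in Python. It is not how ProTop or any other product implements this; the `Monitor` class, `check_pair` function, and the 60-second timeout in the usage note are all hypothetical names and values chosen for illustration. Each monitor records when it last showed a sign of life, and each one raises an alert if its peer has been silent for too long.

```python
class Monitor:
    """Hypothetical monitor that records when it last proved it was alive."""

    def __init__(self, name, timeout_s):
        self.name = name
        self.timeout_s = timeout_s  # how long a peer may stay silent
        self.last_beat = None       # time of this monitor's last sign of life

    def beat(self, now):
        """Record a sign of life at time `now` (seconds)."""
        self.last_beat = now

    def peer_is_silent(self, peer, now):
        """True if `peer` never reported, or reported too long ago."""
        return peer.last_beat is None or now - peer.last_beat > self.timeout_s


def check_pair(a, b, now):
    """Let each monitor check the other; return the alerts that result."""
    alerts = []
    if a.peer_is_silent(b, now):
        alerts.append(f"{a.name}: no sign of life from {b.name}")
    if b.peer_is_silent(a, now):
        alerts.append(f"{b.name}: no sign of life from {a.name}")
    return alerts
```

For example, with two monitors created as `Monitor("A", 60)` and `Monitor("B", 60)`, if B stops calling `beat()` while A keeps going, `check_pair` returns a single alert raised by A. In a real deployment the two would run on separate systems and exchange their sign of life over the network, which this sketch leaves out.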
paul administrators last edited by
Rob is correct, and ProTop already has the equivalent alarm: it's called the heartbeat. If monitoring is NOT running OK for any given monitored resource, the web portal generates an alert saying "HEY! I haven't heard anything from the ProTop agent for siteName.ResourceName for the past x minutes!!". This is much more interesting than getting an hourly or daily "Everything is OK!" email as you may not notice if you miss one or two or five.
The default heartbeat check interval is 1200 seconds (20 minutes). If nothing is heard from the agent within that time, a heartbeat alert is generated. Note however that you must configure in ProTop where your heartbeat alerts are sent. They can stay in the web portal, they can be sent to an email list, or they can be sent to an emergency "pager" list (typically to your mobile phone).
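The heartbeat logic described above can be sketched roughly as follows. This is only an illustration of the general technique, not ProTop's actual code or configuration: the function names, the message wording, and the destination labels are assumptions, and only the 1200-second default comes from the post.

```python
DEFAULT_HEARTBEAT_TIMEOUT_S = 1200  # default from the post above (20 minutes)


def heartbeat_alert(resource, last_seen, now, timeout_s=DEFAULT_HEARTBEAT_TIMEOUT_S):
    """Return an alert message if `resource` has been silent longer than
    `timeout_s` seconds, otherwise None. Times are in seconds."""
    silent_for = now - last_seen
    if silent_for <= timeout_s:
        return None
    minutes = int(silent_for // 60)
    return f"No heartbeat from {resource} for the past {minutes} minutes"


def route(alert, destinations):
    """Pair an alert with each configured destination (hypothetical labels
    such as 'portal', 'email', 'pager'). No alert means nothing to send."""
    if alert is None:
        return []
    return [(dest, alert) for dest in destinations]
```

The point of the design is that silence is the trigger: nothing is sent while heartbeats keep arriving, so every message that does arrive is actionable.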
The various web portals around the world also ping each other via email to make sure that outbound alerting is always working. If one of the portals fails to send emails for more than a few minutes, one of the other portals alerts our DevOps team so that we can investigate.