Monitor Heartbeat Performance
Hi there! I want to build a monitor that checks the "HeartBeat" exposed by a performance counter. The counter alternates between the values 0 and 1 every 5 seconds or so, but the interval may vary, as shown below:
¯|_|¯¯¯|_|¯|__|¯|_|¯¯|_____|¯|_|¯¯¯¯¯¯¯|_|¯|_|¯|_ ...
I would like to receive an alert if the heartbeat stays at the same value for over 15 minutes or so. Right now, I don't know what kind of monitor I should use to do that.
Thanks!
Zenar
November 16th, 2010 9:44am
OK, I have an idea, but I don't know if it's the best approach...
1- Use the "Consecutive Samples over Threshold" monitor.
2- Set the interval to 1 min.
3- Set the "Number of samples" to 15 (for 15 min).
But then again, the monitor could be unlucky: every sample might happen to land on the same value (always 0, or always 1) while the performance counter is still alive...
Statistically, each sample has a 50% chance of reading a given value. Multiplying 50% by itself 15 times gives 0.5^15 = 0.000030517578125 (about 0.003%). So there is a slim chance that I get a false alert over time...
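The arithmetic above can be checked with a short Python sketch. It assumes each 1-minute sample is independently 0 or 1 with a 50% chance (an assumption: the real counter toggles at irregular intervals), and because the 15-sample window is re-evaluated every minute, it also estimates the expected number of false alerts per day. The function and constant names are hypothetical.

```python
SAMPLES_IN_WINDOW = 15     # 15 samples at a 1-minute interval
P_VALUE = 0.5              # assumed chance that any one sample reads a given value
SAMPLES_PER_DAY = 24 * 60  # one sample per minute


def window_false_alert_probability(n=SAMPLES_IN_WINDOW, p=P_VALUE):
    """Chance that n independent samples all read one given value (e.g. all 0)."""
    return p ** n


def expected_false_alerts_per_day(n=SAMPLES_IN_WINDOW, p=P_VALUE,
                                  samples_per_day=SAMPLES_PER_DAY):
    """Rough expected number of new all-equal runs per day, for one value.

    A new run starts where the previous sample differed (probability 1 - p)
    and is followed by n matching samples (probability p ** n).
    Edge effects at the start of the day are ignored.
    """
    return samples_per_day * (1 - p) * p ** n
```

Under these assumptions the per-window chance is 0.5^15 (about 0.003%), but the expected count works out to 1440 × 0.5^16 ≈ 0.022 false alerts per day per monitor, roughly one every 45 days. Running two monitors (one for 0, one for 1) roughly doubles that.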
Also, the monitor only lets me specify whether the value is "Greater than" or "Lower than" a threshold, so I would have to create two monitors just for this, one for 0 and one for 1. I'm thinking that I should write my own script instead.
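If you go the custom-script route, the check itself is simple: remember when the value last changed and alert once it hasn't changed for 15 minutes. Below is a minimal Python sketch of that logic only; the class HeartbeatWatcher and its observe() method are hypothetical, and reading the actual performance counter and raising the alert in SCOM are left out. Because it tracks the time of the last observed change instead of comparing against a threshold, one check covers both the stuck-at-0 and the stuck-at-1 case.

```python
STALE_AFTER_SECONDS = 15 * 60  # alert if the value hasn't changed for 15 minutes


class HeartbeatWatcher:
    """Tracks when a 0/1 heartbeat value was last seen to change."""

    def __init__(self, stale_after=STALE_AFTER_SECONDS):
        self.stale_after = stale_after
        self.last_value = None   # no sample seen yet
        self.last_change = None  # time (seconds) of the last observed change

    def observe(self, value, now):
        """Record one sample taken at time `now` (seconds); return True if stale."""
        if value != self.last_value:
            # First sample, or the heartbeat flipped: restart the stale timer.
            self.last_value = value
            self.last_change = now
            return False
        # Same value as before: stale once it has been unchanged long enough.
        return now - self.last_change >= self.stale_after
```

The caller decides how often to poll. Polling more often than the counter toggles (every few seconds, say) makes it less likely that real changes are missed between samples; with a 1-minute poll, the same sampling-luck caveat as above still applies.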
Since I'm new to SCOM, maybe there is already a built-in way to do this, so I will wait for your comments and suggestions.
Zenar
November 16th, 2010 10:49am
What is the Object\Counter you are sampling?
HTH, Jonathan Almquist - MSFT
November 16th, 2010 10:22pm
Hi,
I would like to share the following with you for your reference:
Performance Monitors
http://technet.microsoft.com/en-us/library/ff629408.aspx
I think you can try a Delta Threshold monitor with the threshold value set to 0.
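How a delta threshold behaves on this counter depends on which samples the delta is taken between. The sketch below does not claim to reproduce SCOM's Delta Threshold monitor (check the linked documentation for its exact semantics); its hypothetical Python functions contrast two possible definitions on the 0/1 heartbeat: deltas between consecutive samples, and a single delta between the first and last sample of a window.

```python
def consecutive_deltas(samples):
    """Absolute change between each pair of consecutive samples."""
    return [abs(b - a) for a, b in zip(samples, samples[1:])]


def stalled_by_consecutive_delta(samples):
    """True if no sample differs from the previous one (every delta is 0)."""
    return all(d == 0 for d in consecutive_deltas(samples))


def stalled_by_endpoint_delta(samples):
    """True if the last sample equals the first (delta over the window is 0)."""
    return samples[-1] - samples[0] == 0
```

With a threshold of 0, the endpoint definition can flag a live heartbeat such as [0, 1, 1, 0], whose first and last samples happen to match. The consecutive-delta definition only flags a window in which nothing changed, which brings back the same sampling-luck caveat as the consecutive-samples approach.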
Hope this helps.
Thanks.
Nicholas Li - MSFT
November 17th, 2010 12:47am
Be careful with a 1-minute sample rate. Although you think you are looking at a 15-minute interval, this monitor actually works like a moving average: once it has 15 samples, it does not drop them all and collect 15 new ones; it just drops the oldest sample each time it takes a new one. So the monitor is essentially evaluated every minute, which can cause state changes within a minute and make the alerts useless because they disappear so fast.
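The behavior described above can be modeled with a fixed-size window that drops its oldest sample whenever a new one arrives. This Python sketch is a simplified model of that description, not SCOM's implementation; the names evaluate and run are hypothetical, and the rule "Critical once the window is full and every sample is identical" stands in for the pair of threshold monitors discussed earlier.

```python
from collections import deque

WINDOW = 15  # 15 samples at a 1-minute interval


def evaluate(window):
    """Return 'Critical' once the window is full and all samples are identical."""
    if len(window) < WINDOW:
        return "Healthy"  # not enough samples yet
    return "Critical" if len(set(window)) == 1 else "Healthy"


def run(stream):
    """Feed samples one at a time and record the state after each sample."""
    window = deque(maxlen=WINDOW)  # appending to a full deque drops the oldest sample
    states = []
    for value in stream:
        window.append(value)
        states.append(evaluate(window))
    return states
```

For example, run([0, 1] * 5 + [1] * 16 + [0]) stays Healthy until the 15th consecutive 1 (index 23), is Critical for three samples, and flips back to Healthy as soon as a single 0 arrives, because the state is re-evaluated on every new sample.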
I'm not sure what you are trying to do, but if you want to reduce your heartbeat alerts, this is not the way to go, IMHO. I wouldn't mess with the heartbeat workflows unless you know exactly what you are doing. Just change the default of 3 missed heartbeats and/or the interval at which heartbeats are sent. This may help you reduce the number of heartbeat failures:
http://jama00.wordpress.com/2010/07/14/what-is-the-optimal-setting-for-my-environment-when-it-comes-to-missed-heartbeats/
Rob Korving
http://jama00.wordpress.com/
November 17th, 2010 1:18am
November 17th, 2010 1:19am
Well... The heartbeat performance counter is there to tell whether the program (a service) is still working. The service can still show as "started" while it has stopped working internally, so that's why we monitor the heartbeat to see if the program is still alive.
About the 1-minute interval: I know that the monitor keeps the last X samples. If the program dies, the heartbeat stops too, so the alert will not clear until we fix the problem. Rob1974, the link you sent me is blocked by the company I'm working with, so I can't see it :-P
For now, I'm using the idea I posted yesterday, but I know it's not the proper way to do heartbeat monitoring. If you have any ideas or solutions, feel free to post them. :-)
Thanks for your replies!
Zenar
November 17th, 2010 8:56am
Do you really mean the heartbeat of OpsMgr itself? Because that's useless to monitor: when the heartbeat isn't sent, the Health Service Watcher will pick up on that and you'll get a critical state by default.
With the monitor you are trying to build, you probably won't get an alert either, assuming you create it on the agent side. If the HealthService stops sending heartbeats, the agent probably has a problem, which in turn makes it impossible to send an alert to a management server. If the data is collected by the agent, it will only be sent once the agent has a connection again, i.e. when there's no longer a problem.
BTW, blocking a common blog site isn't really helpful for an IT department. I'd ask them to unblock it (the whole wordpress.com domain) :)
Rob Korving
http://jama00.wordpress.com/
November 17th, 2010 12:21pm
I'd have to agree with Rob. The heartbeat (HB) is an internal process (a communication packet sent from the agent to the management server, or MS), not a performance counter; the MS uses it to determine whether an agent is communicating. Even if there were an HB counter that you could sample, it wouldn't be any more reliable than the HB monitor implemented in SCOM.
HTH, Jonathan Almquist - MSFT
November 17th, 2010 2:41pm