Management Server grayed out
Hello,
One of our management servers remains grayed out. The RMS is fine.
I restarted the System Center Management service. The management server came back green and healthy, but after 30 minutes it was grayed out again.
I rebooted the management server. It came back green and healthy, but after 30 minutes it was grayed out again.
It is not stable. Any ideas?
It started a little after 7 pm last night, and the Operations Manager log has since filled up and overwritten any entries from that time.
The first error still available is from about 6:35 a.m. this morning:
Log Name: Operations Manager
Source: HealthService
Date: 5/5/2011 6:32:56 AM
Event ID: 4506
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: opmgrms1.ad
Description:
Data was dropped due to too much outstanding data in rule "Microsoft.Linux.SLES.11.LogicalDisk.DiskReadsPerSecond.Collection" running for instance "/" with id:"{12945CB4-9927-6A00-572E-2CC215817557}" in management group "SCOM-MED".
and
Log Name: Operations Manager
Source: HealthService
Date: 5/5/2011 1:08:22 PM
Event ID: 2115
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: opmgrms1.ad.medctr.ucla.edu
Description:
A Bind Data Source in Management Group SCOM-MED has posted items to the workflow, but has not received a response in 9840 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
Instance : opmgrms1.ad.medctr.ucla.edu
Instance Id : {EA878F39-4DF3-0145-F7C2-50BE6A431D96}
There is nothing in the System log or in the Application log.
The agents using this management server look okay.
I looked at
http://blogs.technet.com/b/kevinholman/archive/2008/04/21/event-id-2115-a-bind-data-source-in-management-group.aspx
After applying the new threshold from that article, the server took longer to go back to gray, but the 2115 events are still coming in.
Thanks,
Dom
System Center Operations Manager 2007 / System Center Configuration Manager 2007 R2 / Forefront Client Security / Forefront Identity Manager
May 5th, 2011 3:48pm
Hello Dan,
No new management pack has been added in the last three months.
We have suppressed 95% of the alerts and tuned for the past two years, so the system has been slow but working for the past 8 months.
What is strange is that everything worked better with the old hardware: slow, but working. The RMS server hosted its own database, the data warehouse hosted its own as well, and it worked.
Now the hardware is supposedly faster and better, but we have issues. The OperationsManager database is on a SQL cluster, off the RMS, and we have more issues. It is faster while it works, for the 10-30 minutes after the service restart, but it seems to be creating more traffic than before.
Working on the rules now:
Rule: Data Warehouse performance collection: writer average batch processing time
Target: Data Warehouse Connection Server
Object: OpsMgr DW Writer Module
Counter: Avg. Batch Processing Time, ms
Collect Operations Manager DB Write Action Modules\Avg. Processing Time
Collects Avg. Batch Size performance counter.
Collection Server
OpsMgr DB Write Action Modules
Avg. Processing Time
I have created a dashboard with four views covering the 2 servers (OM and DW) x 2 rules (Avg. Batch Size, Avg. Batch Processing Time).
Thanks,
Dom
May 5th, 2011 5:38pm
Have you added more agents? or ... just a guess ... do you have workflows running on the RMS? HP blade hardware MP? Using the RMS as a watcher node?
Microsoft Corporation
May 5th, 2011 6:49pm
Hi Dan,
1. The number of agents has grown from 550 to 600 within the last 6 months.
2. I will have to check this. There are certainly some workflows running on the RMS, and I will need to identify them. Most of the workflows run on the management servers (the ones that are grayed out; a new one started failing today).
3. Yes. We have had the HP Blade hardware MP in place for about 12 months now.
4. I think you caught the bottleneck. The RMS was used as a watcher node to run the manual ping (really not a good idea), and as the list has been expanded it might be the issue, even though these alerts are still coming in properly and are the only ones doing so. I will move the watcher node somewhere else. But this happened at least three weeks ago, and the issue only appeared last night. Also, the servers that are grayed out are the management servers, not the root management server (the watcher node).
But why is this happening when we have better-performing hardware than before and a better architecture, with the SQL Operations Manager database off the RMS? There is still something strange, as nothing has changed recently except the hardware for the SQL database.
Using the dashboard I am trying to confirm this is the issue, but I now need to expand the dashboard to each instance for each counter:
- discoverywriteitemmodule
- eventwritemodule
- performancesignaturewritemodule
- performancewritemodule
- sqlwritemodule
- statechangewritemodule
I am also checking all the 2115 sources, as it seems that not only the data warehouse but also other workflows are involved:
- Microsoft.SystemCenter.CollectAlerts
- Microsoft.SystemCenter.CollectDiscoveryData
- Microsoft.SystemCenter.CollectEventData
- Microsoft.SystemCenter.CollectPerformanceData
- Microsoft.SystemCenter.CollectPublishedEntityState
- Microsoft.SystemCenter.CollectSignatureData
- Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
- Microsoft.SystemCenter.DataWarehouse.CollectEventData
- Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
33 events of each within 1 hour!
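For anyone who wants to tally 2115 events per workflow instead of counting them by hand, here is a minimal Python sketch. It assumes the event descriptions have been copied or exported to plain text in the same layout as the 2115 event quoted in the first post; the function name `summarize_2115` and the regular expression are my own and not part of any Operations Manager tooling.

```python
import re
from collections import defaultdict

# One match per event: the reported delay, then the workflow id that follows it.
EVENT_RE = re.compile(
    r"has not received a response in (\d+) seconds.*?Workflow Id\s*:\s*(\S+)",
    re.DOTALL,
)

def summarize_2115(text):
    """Map each workflow id to (event count, longest reported delay in seconds)."""
    stats = defaultdict(lambda: [0, 0])
    for delay, workflow in EVENT_RE.findall(text):
        entry = stats[workflow]
        entry[0] += 1
        entry[1] = max(entry[1], int(delay))
    return {workflow: tuple(entry) for workflow, entry in stats.items()}

# Description text copied from the 2115 event quoted in the first post.
sample = """A Bind Data Source in Management Group SCOM-MED has posted items to the workflow, but has not received a response in 9840 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
Instance : opmgrms1.ad.medctr.ucla.edu
"""

for workflow, (count, worst) in sorted(summarize_2115(sample).items()):
    print(f"{workflow}: {count} event(s), longest delay {worst} s")
```

Sorting the result by longest delay rather than by name may make it easier to see whether the data warehouse writers or the operational database writers are falling behind first.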
I am reducing the number of machines pinged to see whether the management servers come back up. I have also placed a request for VMs to be created and dedicated as watcher nodes. With 1,000 ping items (including 600 servers), I will check how many I need. I think at least 3, but if I do cross-pinging I might need 6 or 7; am I right?
I saw an article recommending only 99 items per watcher node. Is that right? The MS engineer had planned for 700-750 for us during his installation engagement.
Thanks,
Dom
May 5th, 2011 7:25pm
On the second event involved in this graying out (Event ID 4506), only one MP seems to be involved:
Cross-Platform MP
Data was dropped due to too much outstanding data in rule "Microsoft.Linux.SLES.11.LogicalDisk.FreeMegabytes.Collection"
running for instance "/opt/IBM" with id:"{925DF086-DC2F-C099-8AB4-0577428FB6AB}" in management group "SCOM-MED".
Data was dropped due to too much outstanding data in rule "Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond.Collection"
running for instance "/" with id:"{72553262-DE70-8DBC-202D-9292ED9679DB}" in management group "SCOM-MED".
All events have the prefix Microsoft.Linux.SLES.11.LogicalDisk...
or Microsoft.Linux.RHEL.5.LogicalDisk...
Still checking ...
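To check that claim about the prefixes against a larger export, something like the following could group the 4506 messages by rule family. This is only a sketch: it assumes the messages are available as plain text in the layout pasted above, and `rules_by_family` is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Captures the rule name and the instance from a 4506 description.
# \s+ tolerates the line break seen before "running" in the messages pasted above.
RULE_RE = re.compile(r'in rule "([^"]+)"\s+running for instance "([^"]+)"')

def rules_by_family(text):
    """Count dropped-data events per rule family (the part before '.LogicalDisk')."""
    families = Counter()
    for rule, _instance in RULE_RE.findall(text):
        families[rule.split(".LogicalDisk")[0]] += 1
    return families

# The two messages quoted in this post.
sample = '''Data was dropped due to too much outstanding data in rule "Microsoft.Linux.SLES.11.LogicalDisk.FreeMegabytes.Collection"
running for instance "/opt/IBM" with id:"{925DF086-DC2F-C099-8AB4-0577428FB6AB}" in management group "SCOM-MED".
Data was dropped due to too much outstanding data in rule "Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond.Collection"
running for instance "/" with id:"{72553262-DE70-8DBC-202D-9292ED9679DB}" in management group "SCOM-MED".
'''

for family, count in rules_by_family(sample).most_common():
    print(family, count)
```

A rule that does not contain ".LogicalDisk" would show up under its full name, which would immediately reveal whether anything other than the logical disk collections is dropping data.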
Thanks,
Dominique
May 6th, 2011 4:13pm
Hi Dom,
Cross-platform monitoring is also a bigger hit on the management server than, for instance, Windows agents. Consider moving it to another management server, not the RMS, if there are more than a few cross-platform agents. How many cross-platform hosts are you monitoring?
Pinging a thousand boxes from the RMS is also one of the possible bottlenecks, as the RMS is doing all the work. To start with, you might consider not pinging the servers you already monitor with SCOM agents. Next, you could consider increasing the sample interval (for example, instead of every 2 minutes, move it up to 3 or 4 minutes between samples). And of course, have another management server do most of this work rather than the RMS.
Bob Cornelissen - BICTT (My BICTT Blog)
May 7th, 2011 5:49am
Hi Bob,
1. Cross-Platform Host through their MP: 100
Cross-Platform Host through Multi-Host Ping MP: 250
2. Yes, correct. I discovered this and am trying to move the watcher nodes to other servers dedicated to this, but so far they do not show up in the Ping Watcher Role view, even though the registry has been updated properly on each watcher node.
I need to ping the servers because our environment does not "believe" in the "heartbeat failure" and "failed to connect to computer" alerts used previously, so I had to fall back to a simple MP built with the Multi-Host Ping MP 3.0 from System Central.
The ping interval is set to 300, which I think means seconds, so it is already 5 minutes.
I also have several management servers, and they seem to be the ones not supporting the load: even though the RMS is a watcher node, it is still up and pinging, while the individual MSs are grayed out and no longer working. I am trying to add several servers as watcher nodes, but I might need to add the management server function to them as well, as it does not seem to work for now. I have another thread open:
http://social.technet.microsoft.com/Forums/en-US/operationsmanagergeneral/thread/4665376b-e98c-47e6-b04b-ea43ec0cbb44
Thanks,
Dom
May 7th, 2011 6:21pm
Hi Dom,
It is unfortunate that they don't believe in the SCOM monitoring for heartbeats and failed-to-connect, which actually goes further than just a ping (ping is the very last thing to stop on a machine and the first to start). But anyway, if you cannot convince them of that, you might have double monitoring (by using the ping MP). I think that even a small change in interval will do a lot. Perhaps go to 360 for the ping (and yes, the value you see is in seconds). For the ping MP, I have seen that you have another thread open, so I am sure you will get through that one there.
I would advise moving the cross-platform monitoring to a dedicated management server, because it has a high number of workflows running. Remember to distribute the Run As accounts to that management server as well; otherwise it will not monitor those boxes.
Bob Cornelissen - BICTT (My BICTT Blog)
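To put rough numbers on the interval discussion, here is a small back-of-the-envelope sketch in Python. It assumes about 1,000 ping targets (the figure mentioned earlier in the thread) and that the probes are spread evenly over the interval, which a real watcher node may not do; `pings_per_second` is just an illustrative helper.

```python
def pings_per_second(targets, interval_seconds):
    """Average probe rate if every target is pinged once per interval."""
    return targets / interval_seconds

# 120 s is the "every 2 minutes" example, 300 s the current setting, 360 s the suggestion.
for interval in (120, 300, 360):
    rate = pings_per_second(1000, interval)
    print(f"{interval:>3} s interval: {rate:.2f} pings/s on average")
```

Under these assumptions, going from 300 to 360 seconds cuts the average probe rate by about one sixth, and the same relative saving applies to whatever workflows each probe triggers.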
May 8th, 2011 5:11am
Hi,
Based on my research, I would like to suggest the following:
1. Clear the HealthService queue on the problematic server:
1) Stop the System Center Management service.
2) Go to C:\Program Files\System Center Operations Manager 2007\, and rename the “Health Service State” folder.
3) Restart the System Center Management service.
2. Check the antivirus exclusion settings:
Antivirus exclusions for Operations Manager 2007
http://blogs.msdn.com/b/nickmac/archive/2008/07/18/antivirus-exclusions-for-operations-manager-2007.aspx
Antivirus Exclusions for MOM and OpsMgr
http://blogs.technet.com/b/kevinholman/archive/2007/12/12/antivirus-exclusions-for-mom-and-opsmgr.aspx
3. Please also try the methods in the following post:
SCOM: How to troubleshoot gray agent states in System Center Operations Manager 2007
http://blogs.technet.com/b/schadinio/archive/2010/07/20/scom-how-to-troubleshoot-gray-agent-states-in-system-center-operations-manager-2007.aspx
Meanwhile, I would like to share the following with you for your reference:
The new and improved guide on HealthService Restarts. Aka – agents bouncing their own HealthService
http://blogs.technet.com/b/kevinholman/archive/2009/12/21/the-new-and-improved-guide-on-healthservice-restarts-aka-agents-bouncing-their-own-healthservice.aspx
Hope this helps.
Thanks.
Nicholas Li - MSFT
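The three sub-steps for clearing the HealthService queue could be scripted roughly as follows. This is a sketch under assumptions, not a supported procedure: it assumes the default install path given above, that the short name of the System Center Management service is HealthService (the event source shown in the first post), and that the script runs elevated on the management server. The helper names are made up for illustration, and it only prints what it would do unless `dry_run=False` is passed.

```python
import ntpath
import os
import subprocess
import time

# Assumptions: default install path from the post above, and "HealthService"
# as the short name of the System Center Management service.
INSTALL_DIR = r"C:\Program Files\System Center Operations Manager 2007"
SERVICE = "HealthService"

def build_plan(install_dir=INSTALL_DIR, service=SERVICE, stamp=None):
    """Return the ordered steps as (action, argument) pairs without running them."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    state = ntpath.join(install_dir, "Health Service State")
    return [
        ("run", ["net", "stop", service]),            # 1) stop the service
        ("rename", (state, f"{state}.old-{stamp}")),  # 2) rename the state folder
        ("run", ["net", "start", service]),           # 3) start the service again
    ]

def execute(plan, dry_run=True):
    """Print the plan, or carry it out when dry_run is False."""
    for action, arg in plan:
        if dry_run:
            print("would", action, arg)
        elif action == "run":
            subprocess.run(arg, check=True)
        else:
            os.rename(*arg)

if __name__ == "__main__":
    execute(build_plan())  # dry run by default
```

Renaming with a timestamp rather than deleting keeps the old folder around in case it needs to be inspected later.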
May 9th, 2011 12:12am
Hi Dom,
It is unfortunate that they don't believe in the SCOM monitoring for heartbeats and failed-to-connect, which actually goes further than just a ping (ping is the very last thing to stop on a machine and the first to start). But anyway, if you cannot convince them of that, you might have double monitoring (by using the ping MP). I think that even a small change in interval will do a lot. Perhaps go to 360 for the ping (and yes, the value you see is in seconds). For the ping MP, I have seen that you have another thread open, so I am sure you will get through that one there.
I would advise moving the cross-platform monitoring to a dedicated management server, because it has a high number of workflows running. Remember to distribute the Run As accounts to that management server as well; otherwise it will not monitor those boxes.
Bob Cornelissen - BICTT (My BICTT Blog)
Your last sentence will help a lot, and I will work on this. Is a VM okay, or should it be a physical box? For now I will evaluate the cross-platform environment at 250 servers.
Thanks,
Dom
May 9th, 2011 1:32pm
Hi Dominique,
I am sure that offloading the cross-platform monitoring to a different box will help a lot, because of the number of workflows it needs to keep in the air. So how big is your environment? You state that your evaluation will be for 250 cross-platform boxes; or will this be the total amount?
All the usual arguments for physical versus virtual apply in this case as well. I have used virtual machines to run a very high load of workflows (for example, monitoring VMware clusters with a lot of objects). My preference for larger deployments is physical, simply because it is dedicated hardware, not shared with other virtual machines, and a dual quad-core with a good amount of RAM is more easily available on physical in most cases. The sizing guide will tell you how much you can load on one machine. I think it was 500 cross-platform agents on one MS box. But don't forget that you might start with basic monitoring and then go beyond that: for instance, also picking up additional hardware data through SNMP (also a lot of workflows there!), monitoring a lot of custom processes or log file entries, or running third-party MPs (Novell, Bridgeways and others). All of this increases the load in the end, and it really comes down to testing: putting load on the machine and seeing how much it can take in your specific case. If you add a lot of additional monitoring, you might want to place fewer than 500 agents on one MS. Just add them in increments of, for instance, 30 at a time and watch the counters on the MS.
Bob Cornelissen - BICTT (My BICTT Blog)
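The "increments of 30" approach can be planned ahead with a few lines of Python. This is only an illustration: the host names are hypothetical placeholders, 250 is the cross-platform count mentioned in this thread, and actually moving agents between management servers is not shown.

```python
def batches(hosts, size=30):
    """Yield successive groups of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]

# Hypothetical host names standing in for the 250 cross-platform servers.
hosts = [f"linux{n:03d}" for n in range(1, 251)]
plan = list(batches(hosts))

print(len(plan), "batches; the last one has", len(plan[-1]), "hosts")
# prints: 9 batches; the last one has 10 hosts
```

Between batches, the idea from the post above is to watch the management server counters and only continue while they stay stable.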
May 9th, 2011 1:54pm
Hi Bob,
- cross plat monitoring : 250 Servers
- Total environment 1,000 Servers
VM: because it is easy for me to get one deployed, but it will be:
- 4 GB RAM
- 2 CPU Dual 3.00 GHz
- Network speed: 1 Gb/s
- Microsoft Windows Server 2008 R2 Enterprise 64-bit
- Drive: 50 GB
So I too would prefer a physical machine, but our process will take longer.
Let me check for the sizing guide as I have
http://aspoc.net/archives/2007/10/16/opsmgr-2007-database-and-data-warehouse-size-calculator/
http://blogs.technet.com/b/momteam/archive/2007/10/15/opsmgr-2007-database-and-data-warehouse-size-calculator.aspx
or the huge one...
http://www.microsoft.com/downloads/en/confirmation.aspx?FamilyID=B0E059E9-9F19-47B9-8B01-E864AEBF210C
I am not sure whether there is a section for the Cross Platform MP alone; I will check the web site for Cross Platform itself.
I think scenario 3 might fit this specific server, but it will be only for Unix or Linux and only a management server (no database):
Role: Management Server
Hardware:
• 2 disk RAID 1
• 4 GB RAM
• Dual Proc
So far I have fewer than 500 Linux/Unix platforms, so hopefully a VM could handle this load. Otherwise I will need more memory, as for the management server it seems to be the only parameter changing between scenarios 3-4-5-6 (from 4 GB RAM to 8 GB RAM). This should let me handle between 500 and 1,000 servers on this management server, load and workflows aside. Or maybe I will refurbish an existing server that has been decommissioned from its original purpose.
Yes, I will definitely need more than the basic monitoring.
SNMP will also be on the list sooner or later, correct.
Performance: processor, memory, etc., nworks...
I will follow the approach of adding only 30 at a time and see how it works.
Thanks,
Dom
May 9th, 2011 5:29pm
Hi Bob,
I have two threads which could definitely be linked:
http://social.technet.microsoft.com/Forums/en-US/operationsmanagergeneral/thread/4665376b-e98c-47e6-b04b-ea43ec0cbb44
Role: Management Server
Hardware:
• 2 disk RAID 1
• 8 GB RAM
• Dual Proc
I might go physical for the cross-platform monitoring and stay on a VM for the Ping MP.
What do you think?
Thanks,
Dom
May 9th, 2011 5:52pm
Hello Nicholas,
1. This is already a practice I have used several times, but it does not seem to work in this case.
2. I will put all these exclusions in place and see how it affects the issue.
3. I read this KB, but it does not really help, as all the steps have been done and seem to work for a while, then fail again.
The last item, from Kevin, is set up now; waiting for the alerts.
My thresholds are already set to the values defined in the articles:
Monitor: Health Service Handle Count Threshold
Class
Management Server 10000
Management Server Agent 10000
Exchange 2007 Computer Group 5000
Group
Management Server Computer Group 30000
So no problem with the overrides.
waiting for alerts on:
Health Service Handle Count
Health Service Private Bytes
Monitoring Host Handle Count
Monitoring Host Private Bytes
After 5 minutes two management servers came back healthy.
Then one of them went gray again after 30 minutes, but I did not get any alert in the console!
The Ping MP has been removed: no alerts, no emails, no members in the PingTarget and PingWatcher roles.
Back to the 2115 and 4506 events.
Thanks,
Dom
May 9th, 2011 6:51pm
Yes, I think you can do that. Also, for the ping MP you could start at a lower number and increase it if the server can handle it.
Bob Cornelissen - BICTT (My BICTT Blog)
May 10th, 2011 1:47am
If you monitor from a VM, I would at least go in steps. If the VM host does not run a lot of other intensive virtual guests, then it is doable. Monitoring VMware also takes a big hit, as it monitors a lot of objects when going to larger numbers of hosts and guests. nworks is the most scalable and stable solution for this and in my opinion well worth it (I have been a big fan for years). It is not just about memory. It will need more and more as you add more monitoring, but in the end it is about the number of workflows and other work it needs to do at any given time. At some point it just cannot handle it, and it is not a given that this will be at 100% CPU or 100% memory usage. Keep an eye on the performance counters already available in Monitoring -> the OpsMgr MP, under the management packs folder.
Bob Cornelissen - BICTT (My BICTT Blog)
May 10th, 2011 1:56am
Hello Bob,
Surprise this morning: the registry entries I deleted yesterday to remove the ping on the RMS are back. I will have to check what happened. I don't think there is any automatic process, is there?
Monitoring > Operations Manager > Management Server Performance > Workflow Count
RMS: 7.239 (Flat for days...)
MS1, MS2, DMS1: 0
Monitoring > Operations Manager > Management Server Performance > Active File Uploads (Average 70) Peak: 939!!!! RMS ONLY
Monitoring > Operations Manager > Management Server Performance > Console and Connection Count: 6 to 12 RMS ONLY
I don't see any counters for Memory and/or CPU in this folder!!!
Thanks,
Dom
May 10th, 2011 12:21pm
Hi Dom,
The memory and CPU counters belong to the Windows Server 200x management pack and can be found there (or at the top-level Computers view, right-click the computer and select Open Performance View).
Next to the workflow count, the module count is also interesting.
What is interesting is that the management servers have a zero workflow count?
I don't know why the registry entries are back. Perhaps you have ghosts.
Bob Cornelissen - BICTT (My BICTT Blog)
May 10th, 2011 12:45pm
Hi Bob,
Memory and CPU are empty in the performance views for all servers, so I checked the Health Explorer. It shows Healthy, but there are no state change events at all on the RMS and MSs, and the other entity health monitors are Healthy as well with no state change events whatsoever. Strange.
The Availability monitor has plenty of state change events, no problem.
The monitors are enabled.
Module count: 29,000+
The MSs are all at 0, but that might be because they are grayed out and do not report. Yet the one that is healthy shows 0 as well!
I don't seem to capture any performance data on the MSs, only on the RMS itself.
Thanks,
Dom
May 10th, 2011 2:15pm
Hi Bob,
I am thinking of removing the Cross-Platform management pack for 1-2 days to see whether the management server is able to come back.
Just delete it from Administration > Management Packs > right-click and delete all MPs containing UNIX or Linux in their description. It is about 24 items. Is there any better way to do it?
Thanks,
Dom
May 11th, 2011 8:03pm
You could put the unix boxes in maintenance mode.
Bob Cornelissen - BICTT (My BICTT Blog)
May 12th, 2011 3:08am
This will stop the workflows for the RHEL and other cross-platform MPs. Will it be sufficient to stop all traffic for the UNIX/Linux machines?
I did it (Monitoring > Unix/Linux Servers).
I will restart the MS that is always grayed out to see how it behaves now.
I am opening a new thread, as the server is grayed out again with all Unix/Linux servers in maintenance mode.
http://social.technet.microsoft.com/Forums/en-US/operationsmanagergeneral/thread/d6e3eb4a-a3a2-444d-8971-811b90e77e5e
Thanks,
Dom
May 12th, 2011 4:30pm