Hi there,
I've been banging my head against this for a while now. We have an "IP less" Exchange 2013 SP1 DAG that's been running well for over a year. I've noticed that all the databases will randomly fail over at various times throughout the day. We've looked into recent changes and nothing stands out.
Here are the results of our CollectOverMetrics.ps1 run. I can see the failover was automatic and caused by a "PeriodicAction". Any idea where I can dig next to find the root cause?
- We've looked into the AV - disabled as a test
- We've looked at the backup agent in the server - disabled as a test
Any ideas or suggestions?
Update#1 - Noticed we had several health check mailboxes that were corrupted, similar to the post below. Went through the steps to recreate them mentioned in the linked post.
Update#2 - I looked into the "IP less DAG" networking and noticed that it auto-configured with replication set to $true on the iSCSI network (105 network). I disabled replication on that network so it simply uses the "MapiDagNetwork" instead. Tested failing a db over and back, and it seems to be ok so far. I somewhat inherited this without much Exchange knowledge and am trying my best to work through it. I will have to see if any more random failovers occur. I will update this thread with more info if the issue is resolved or if another failover occurs.
Before
After
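For anyone following along, the Update#2 change can be sketched roughly like this in the Exchange Management Shell. This is only a sketch under assumptions: `DAG1` and `ReplicationDagNetwork01` are placeholder names, not the real ones from this environment, and switching the DAG to manual network configuration is assumed to be needed so that automatic configuration does not revert the change.

```powershell
# Placeholder names (assumptions, not the actual names in this environment)
$dag        = "DAG1"
$iscsiNet   = "ReplicationDagNetwork01"

# List the DAG networks and check which ones have replication enabled
Get-DatabaseAvailabilityGroupNetwork -Identity $dag |
    Format-List Name, Subnets, ReplicationEnabled, MapiAccessEnabled

# Switch the DAG to manual network configuration so the edit below
# is not overwritten by automatic DAG network configuration
Set-DatabaseAvailabilityGroup -Identity $dag -ManualDagNetworkConfiguration $true

# Disable replication on the iSCSI network so replication uses the MAPI network
Set-DatabaseAvailabilityGroupNetwork -Identity "$dag\$iscsiNet" -ReplicationEnabled:$false

# Re-check the result
Get-DatabaseAvailabilityGroupNetwork -Identity $dag |
    Format-Table Name, ReplicationEnabled, MapiAccessEnabled
```

Running the first `Get-DatabaseAvailabilityGroupNetwork` call before and after the change gives the same kind of before/after comparison as the screenshots.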
Update#3 - Arrived this morning to see that the mailbox databases had failed over to the other Exchange server after hours. Looked through the event logs and this event stood out. Could two consecutive missed heartbeats be causing the databases to fail over?
- Edited by TSGzz Friday, August 28, 2015 2:24 PM