Exchange DAG DR failover and manual failback - DR CASs are not happy.
We had an unexpected failover of our DAG to a secondary datacenter (we forgot to set the activation block on the DR DAG nodes), and once the situation was resolved in the primary datacenter we manually failed the DAG back.
During the failover to our primary datacenter, the Exchange management pack in SCOM started generating all sorts of alerts about the primary site CASs not being able to access ActiveSync and a few other web protocols, which we didn't think anything of at the time since it was an unexpected failover.
However, after the manual failback (which was really just redistributing the databases back to their primary owners), the secondary site CASs started generating similar alerts about not being able to access those web protocols. To my knowledge, prior to this failover and failback, the secondary site CASs had never complained about the web protocols not working.
I chose to focus on one of the web protocols by selecting the ActiveSync errors, and ran the Test-ActiveSyncConnectivity command against one of the secondary site CASs, and this is what I got back:
RunspaceId : 278ef843-8eeb-4654-8dbf-81d1c4a812ec
LocalSite : DRSITE
SecureAccess : True
VirtualDirectoryName :
Url :
UrlType : Unknown
Port : 0
ConnectionType : Plaintext
ClientAccessServerShortName : DRCAS1
LocalSiteShortName : DRSITE
ClientAccessServer : DRCAS1.company.com
Scenario : Reset Credentials
ScenarioDescription : Reset automated credentials for the Client Access Probing Task user on Mailbox server PRODDN1.company.com.
PerformanceCounterName :
Result : Failure
Error : [Microsoft.Exchange.Monitoring.CasHealthStorageErrorException]: An error occurred while trying to access mailbox PRODDN1.company.com,
on behalf of user company.com\extest_e2048c50283a2
Additional information:
[Microsoft.Exchange.Data.Storage.WrongServerException]: The user and the mailbox
are in different Active Directory sites..
UserName : extest_e2048c50283a2
StartTime : 7/20/2012 3:13:00 PM
Latency : 00:00:00.0156001
EventType : Error
LatencyInMillisecondsString :
Identity :
IsValid : True
WARNING: No Client Access servers were tested.
I don't understand why the secondary site CASs are still giving this error when the manual failback was over 12 hours ago. Technically the user has existed in all AD sites for over 1/2 a year now.
I don't see anything unusual in the Application or System logs on the secondary site CAS I ran the command above on. Anyone have any ideas on how to make the secondary site CASs snap out of whatever delusion they are in?
July 20th, 2012 5:33pm
I think what that's telling you is that the secondary site CAS is in a different AD site from the mailbox server on which the mailbox is activated, which is expected and normal. I agree that the message isn't the best.
Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
July 20th, 2012 5:59pm
Thanks for responding Ed.
The concern at the moment is that SCOM never generated errors from these Test-Whatever CAS cmdlets for our secondary site CASs before, so why now, and how do we get it to stop?
Given the way it is acting now, I was under the impression that this was working fine before the failover and failback.
If you or anyone else has any ideas, I would sure appreciate hearing it.
July 20th, 2012 9:58pm
You could configure an override in SCOM to ignore the message.
Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
July 21st, 2012 3:16am
We thought about that, but we would have to target each of the CASs in the secondary datacenter with its own individual SCOM override, since we wouldn't want the monitor stopped on the CASs in the primary datacenter. We would prefer not to do that, as it seems like too much of a one-off when this was never an issue for us in the past.
In other words, we are trying to find out why the secondary datacenter CASs are only now, after a DAG failover and failback, raising these alarms through SCOM.
July 21st, 2012 7:42am