SWITCHOVER DATACENTER (MANUAL FAILOVER) / 02 NODES - 02 SITES

Hello everyone,

I'm running contingency tests and have hit a problem when trying to mount the databases at the contingency site. The current scenario is:

SiteA:

MBX01 : Mailbox Server with 04 active databases (DB01, DB02, DB03, DB04).

CAS01: Client Access Server + FSW

DC01 & DC02

SiteB:

MBX02 : Mailbox Server with 04 passive databases.

CAS02 : Client Access Server + AFSW

DC03

DAG:

Name: DAG01

Configuration: 02 Nodes - 02 Sites - Active/Passive

For the test I performed the following procedure (first, the Exchange servers in SiteA were taken offline):

  1. Step 01: Validate the copy status // Get-MailboxDatabaseCopyStatus -Identity DB0* | select name, status, ContentIndexState | sort status | ft -auto
  2. Step 02: Validate the server status // Get-DatabaseAvailabilityGroup DAG01 | FL Name, StoppedMailboxServers, StartedMailboxServers
  3. Step 03: Mark the primary site DAG members as failed (stopped) // Stop-DatabaseAvailabilityGroup DAG01 -ActiveDirectorySite SiteA -ConfigurationOnly
  4. Step 04: Validate the server status again.
  5. Step 05: Stop the Cluster service on each DAG member server in SiteB // Stop-Service ClusSvc
  6. Step 06: Restore the DAG in SiteB // Restore-DatabaseAvailabilityGroup DAG01 -ActiveDirectorySite SiteB
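
For readers who want the six steps above in one place, here is a minimal sketch of the same sequence. It assumes the names used in this post (DAG01, SiteA, SiteB, MBX02) and that it is run from an elevated Exchange Management Shell on MBX02. Querying copy status with -Server is a substitution made for the sketch, not the exact command from Step 01.

```powershell
# Sketch only: run from an elevated Exchange Management Shell on MBX02 (SiteB).
# DAG01, SiteA, SiteB and MBX02 are the names from this post.

# Steps 01-02: copy status on the surviving server, then the started/stopped lists.
Get-MailboxDatabaseCopyStatus -Server MBX02 |
    Sort-Object Status |
    Format-Table Name, Status, ContentIndexState -AutoSize
Get-DatabaseAvailabilityGroup DAG01 |
    Format-List Name, StoppedMailboxServers, StartedMailboxServers

# Step 03: SiteA members are unreachable, so only update the configuration in Active Directory.
Stop-DatabaseAvailabilityGroup -Identity DAG01 -ActiveDirectorySite SiteA -ConfigurationOnly

# Step 04: the SiteA members should now appear under StoppedMailboxServers.
Get-DatabaseAvailabilityGroup DAG01 |
    Format-List Name, StoppedMailboxServers, StartedMailboxServers

# Step 05: stop the Cluster service on each surviving DAG member (here only MBX02).
Stop-Service ClusSvc

# Step 06: evict the stopped members and re-establish quorum in SiteB.
Restore-DatabaseAvailabilityGroup -Identity DAG01 -ActiveDirectorySite SiteB
```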

The databases should then be mounted on the MBX02 server (because the activation mode is unrestricted), but they are not. I tried to mount a database using the following command, without success: Move-ActiveMailboxDatabase DB03 -ActivateOnServer MBX02 -MountDialOverride:None

The following error is displayed: Active Manager isn't reachable on server MBX01.domain.local. The Microsoft Exchange Replication service might not be running. Error Error 0x6ba <The RPC server is unavailable> from cli_GetPrimaryActiveManager.

I hope you can help me.

Thanks.

June 4th, 2015 4:01am

Hi

We normally use Stop-DatabaseAvailabilityGroup -ConfigurationOnly when the server is not accessible or is offline.

Add this as the last step, as per an article covering a similar issue:

Get-MailboxDatabase | Resume-MailboxDatabaseCopy -Confirm:$False

Where (which site and server) are you running the above commands?

Are you actually taking the servers out, that is, shutting them down or disconnecting them from the network?

I hope you still have connectivity to the FSW during the test, because if quorum is not met the databases will not mount.

For 2 servers + 1 FSW, Windows Server 2012 dynamic quorum doesn't apply.
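
To make the quorum point concrete, here is a small sketch of the vote arithmetic under a plain node-and-file-share majority model. Test-QuorumMajority is a hypothetical helper written for this illustration; it is not an Exchange or FailoverClusters cmdlet, and it ignores any dynamic quorum adjustments.

```powershell
# Hypothetical helper for illustration only (not a built-in cmdlet).
function Test-QuorumMajority {
    param(
        [Parameter(Mandatory = $true)][int]$TotalVotes,
        [Parameter(Mandatory = $true)][int]$VotesOnline
    )
    # A majority is more than half of all configured votes.
    $needed = [math]::Floor($TotalVotes / 2) + 1
    return ($VotesOnline -ge $needed)
}

# Original layout: MBX01 + MBX02 + FSW on CAS01 = 3 votes.
# With MBX01 and CAS01 powered off, only MBX02 is left: 1 of 3 votes.
Test-QuorumMajority -TotalVotes 3 -VotesOnline 1   # False

# After MBX01 is evicted and the alternate witness on CAS02 is in use:
# MBX02 + AFSW = 2 votes, both online.
Test-QuorumMajority -TotalVotes 2 -VotesOnline 2   # True
```

This is why a lone surviving member cannot mount databases until the failed members are evicted and the alternate witness is brought in.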

You might want to have a look at the article below for a better understanding; however, for a multi-site scenario things might change a bit.

How does Dynamic Quorum work for a two Node DAG

Review the configuration of the quorum by typing:

cluster /quorum

You might want to try forcing quorum if it's not there.

net start clussvc /forcequorum

Find the PrimaryActiveManager location:

Get-DatabaseAvailabilityGroup -Status | fl Primary*

If it's still located on the failed server, use the cmdlet below to move it.

Move-ClusterGroup -Name "Cluster Group" -Node MBX-2 

Detailed article for troubleshooting:

Exchange 2010 / 2013 PAM and the Cluster Core Resources

http://blogs.technet.com/b/timmcmic/archive/2014/08/04/exchange-2010-2013-pam-and-the-cluster-core-resources.aspx

NOTE: Make sure you are running EMS with 'Run As Administrator'.

June 4th, 2015 8:45am

Hi Fed,

Are you following the Datacenter Switchovers article for this?

https://technet.microsoft.com/en-us/library/dd351049(v=exchg.150).aspx

June 4th, 2015 8:51am

Hi Satyajit,

Thanks for your answer. Here are my answers to your questions:

To simulate the failure of the Exchange servers, I power off the Exchange virtual servers (MBX01 and CAS01). Since the servers are turned off, the FSW and the PAM are lost, because they are located on CAS01 and MBX01 respectively.

Because the SiteA Exchange servers are off, the PowerShell commands are executed from the MBX02 server located in SiteB.

I'm following this article: https://technet.microsoft.com/en-us/library/dd351049%28v=exchg.150%29.aspx?f=255&MSPPError=-2147217396, which indicates that when the Mailbox server is unavailable but the DC is operational (which is my case), you must use the Stop-DatabaseAvailabilityGroup command with the -ConfigurationOnly parameter... or am I wrong?

When I stop the Cluster service on the MBX02 server and run Restore-DatabaseAvailabilityGroup, I assume that the DAG is restored in SiteB and forced to use the AFSW (located on the CAS02 server), which allows the databases to be mounted...

So should I always also run what you recommend in your answer? Or is there anything else I should do after the above?

NOTE: I will be running the relevant tests on Monday.

Thanks!

Fed


June 5th, 2015 11:49pm

Hi Satyajit,

Yes, I'm following this article.

I'm also relying on the following blogs:

https://smtpport25.wordpress.com/2010/12/10/exchange-2010-dag-local-and-site-drfailover-and-fail-back/

https://rajisubramanian.wordpress.com/2014/03/02/exchange-2013-datacenter-failover-and-disaster-recovery/

Thanks.

Fed.

June 5th, 2015 11:54pm

Hi Fed,

You are correct on all your points.

The Restore-DatabaseAvailabilityGroup cmdlet performs several operations that affect the structure and membership of the DAG's cluster. This task does the following:

  • Forcibly evicts the servers listed on the StoppedMailboxServers list from the DAG's cluster, thereby reestablishing quorum for the cluster enabling the surviving DAG members to start and provide service.

  • Configures the DAG to use the alternate witness server if there is an even number of surviving DAG members, or a single surviving DAG member.

Just to confirm, run this cmdlet to see where we are, that is, whether the switchover and cluster values have been set/restored as expected.

Get-DatabaseAvailabilityGroup -Identity DAG01 -Status | fl name,servers,*witness*,operationalservers,primaryactivemanager

Basically, the fields below should quickly tell you whether the secondary site servers have taken over.

PrimaryActiveManager      : CAS-2
WitnessShareInUse         : Alternate
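
As a rough sketch of that check for this thread's environment (assuming the DAG is named DAG01 and MBX02 is the surviving member, so MBX02 rather than CAS-2 would be the expected PrimaryActiveManager), something like the following could be run from EMS on MBX02. The warning text is only illustrative.

```powershell
# Sketch only: assumes DAG01 and an elevated Exchange Management Shell on MBX02.
$dag = Get-DatabaseAvailabilityGroup -Identity DAG01 -Status

# Show the fields that indicate whether the secondary site has taken over.
$dag | Format-List Name, PrimaryActiveManager, WitnessShareInUse, OperationalServers

# After a successful restore with a single surviving member,
# the DAG is expected to be using the alternate witness.
if ($dag.WitnessShareInUse -ne 'Alternate') {
    Write-Warning 'DAG01 is not using the alternate witness; check quorum before mounting databases.'
}
```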

Use the cmdlets below to find the QuorumType and the SharePath using the FailoverClusters module:

Get-ClusterQuorum -Cluster DAG01.domain.com | fl

Get-ClusterResource "File Share Witness" -Cluster DAG01.domain.com | Get-ClusterParameter

Once that is all set, use the article I mentioned earlier to verify that you are getting values similar to mine.

Then you can use my earlier post to force quorum in case it's not there. We can discuss this part once you have the results of the Get- cmdlets.

Things to keep in mind (you might already know these, but they are still worth mentioning):

Exchange 2010 High Availability and Site Resilience Misconceptions
http://blogs.technet.com/b/exchange/archive/2011/05/31/exchange-2010-high-availability-misconceptions-addressed.aspx

Configure database availability group properties
https://technet.microsoft.com/en-us/library/dd297985.aspx#UseShell

References:

Verifying the file share witness server / directory in use for Exchange 2010
http://blogs.technet.com/b/timmcmic/archive/2012/03/12/verifying-the-file-share-witness-server-directory-in-use-for-exchange-2010.aspx

June 6th, 2015 11:02pm

Hi Satyajit,

Thanks for the summary and validation.

Due to availability constraints, the tests will be performed tonight.

I will report the results.

Regards.

Fed

June 10th, 2015 10:46am

Datacenter Switch Over

  • Verify the started and stopped servers in the DAG

Get-DatabaseAvailabilityGroup <DAGName> -Status | FL Name, *Servers

  • Use Stop-DatabaseAvailabilityGroup to mark the primary site DAG members as failed

Stop-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite PrimarySite

  • Verify the started and stopped servers in the DAG again

Get-DatabaseAvailabilityGroup <DAGName> -Status | FL Name, *Servers

  • Stop the Cluster service on all the passive nodes in the secondary site

Stop-Service ClusSvc

  • Use Restore-DatabaseAvailabilityGroup to remove the stopped Mailbox servers from the DAG and re-establish quorum using the alternate witness server

Restore-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite DR

Verify the database copy status. If the status is failed and disconnected, the database needs to be force-mounted, and in that case data loss may occur.

Get-MailboxDatabaseCopyStatus | select name, status | sort Status | ft -auto

Get-MailboxDatabaseCopyStatus * -Active | Select Name,Status,MailboxServer,ActivationPreference,ContentIndexState

  • Force-mount the database on the DR server

Move-ActiveMailboxDatabase <Database Name> -ActivateOnServer <DR Server Name> -MountDialOverride BestEffort -SkipLagChecks -SkipClientExperienceChecks -SkipActiveCopyChecks

  • If the status is healthy and disconnected, mount with the command below

Move-ActiveMailboxDatabase <Database Name> -ActivateOnServer <DR Server Name> -MountDialOverride:None

FAILBACK

  • When service or power is restored in the primary site, stop the Cluster service on all the nodes in the primary site

Stop-Service ClusSvc

  • Run Start-DatabaseAvailabilityGroup to revert to the primary datacenter

Start-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite ProductionSite

  • Verify the DAG status

Get-DatabaseAvailabilityGroup <DAGName> -Status | FL Name, *Servers

If any node does not come up, stop and disable the Cluster service from services.msc and run Start-DatabaseAvailabilityGroup for the servers one by one

Start-DatabaseAvailabilityGroup -Identity <DAGName> -MailboxServer <ServerName>

  • Check the quorum model

Get-ClusterQuorum | fl

  • If it still shows the older quorum model, run the PowerShell cmdlet below

Set-DatabaseAvailabilityGroup -Identity DAG01

June 11th, 2015 9:59am

Hi Satyajit

Sorry for the delay in my response.
I followed all your recommendations, and all the tests were successful.

Thanks for your time and Support.

Regards,

Fed

June 19th, 2015 7:54pm

Hi Fed,

Glad to hear that. Thanks for coming back and updating us.

June 22nd, 2015 12:22am

This topic is archived. No further replies will be accepted.
