DR scenario suggestions (unique Exchange 2013 setup)

Hi there, I've been asked to create DR documentation for an unusual setup. Here is what I've brainstormed so far...

Environment

  • 1x Primary Exchange 2013 server located at HQ
  • 1x Secondary Exchange 2013 server located at DR site


DAG setup

Set up as an "IP-less DAG": only two servers, with all roles on each server.

Failing over the databases mid-day works fine: no disconnects for the Outlook clients, and mail continues to flow. For a real scenario where the HQ server no longer exists, this is my brainstormed plan so far.

All virtual directories currently use:

  • owa.company.com\_____
  • owa.company.com resolves to the primary Exchange server

Brainstormed scenario -  Exchange specific tasks in the event of a HQ failure

  • HQ server has failed and no longer functions
  • Update DNS to point owa.company.com to the secondary server
  • Databases should already be active on secondary server
  • The company already has firewall rules and MX records ready for the DR site

I assume this is all that would need to be done? I know this is an unusual setup, and it could be done better. Unfortunately, I don't have the pull to change it. Any suggestions on this scenario would be greatly appreciated.

Thank you!






  • Edited by TSGzz Wednesday, January 28, 2015 9:13 PM grammer
January 29th, 2015 12:11am

Hi

A DAG works the same way as in Exchange 2010, except that an IP-less DAG has no CNO (cluster name object) and you don't need to assign an IP address to the cluster. But you do need to set (or let Exchange set for you) a witness server and shared folder.

Taking that into account, you should decide which site is your primary datacenter, meaning the one that will keep the databases mounted automatically because it holds the majority of the votes (the multi-role Exchange server and the File Share Witness), and which one is your secondary/contingency/resilience site, where you need a specific procedure to restore service after a failure (using the Stop-, Restore- and Start-DatabaseAvailabilityGroup cmdlets). Also take into account the use of DAC mode in a multi-site configuration.

The IP-less DAG configuration does not solve the split-brain problem.

Also, if you have Azure, you can plan to put the witness in Azure (yes, it is now supported :) ), which gives you a more efficient way to handle a site failure (more info at https://technet.microsoft.com/en-us/library/dn903504(v=exchg.150).aspx).

January 29th, 2015 11:06pm

Currently our File Share Witness is located on one of the servers in the Hyper-V cluster at our primary site.

I am aware the setup is not ideal. I just wanted to make sure I had a reasonably clear idea of what happens if the primary Exchange server gets blown away and we need to use the secondary.

So far it's...

  • HQ server has failed and no longer functions
  • Update DNS to point owa.company.com to the secondary server
  • Databases should already be active on secondary server
  • The company already has firewall rules and MX records ready for the DR site

Thank you for your suggestions. I will look into putting the FSW in Azure.

January 30th, 2015 1:00am

So, if your witness is in your primary site, the databases will not mount automatically in the secondary site after a primary site failure.

Why? Because the secondary site does not have enough votes. My explanation: with 2 servers and an FSW you have 3 votes. If you lose your primary site, you lose 2 of the 3 votes, and the single remaining vote is not a majority, so the cluster service cannot keep quorum in the secondary site.
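The vote arithmetic can be sketched in a few lines of PowerShell. This is only an illustration: Test-ClusterQuorum is a made-up helper name, not a real cmdlet, and it assumes a simple node-and-file-share-majority model where a strict majority of votes is required.

```powershell
# Hypothetical helper (not an Exchange or cluster cmdlet): returns $true when
# the surviving votes form a strict majority of all configured votes.
function Test-ClusterQuorum {
    param(
        [int]$TotalVotes,      # DAG members + witness
        [int]$SurvivingVotes   # votes still reachable after the failure
    )
    $needed = [math]::Floor($TotalVotes / 2) + 1
    return ($SurvivingVotes -ge $needed)
}

# 2 servers + 1 FSW = 3 votes; the primary site holds 2 of them.
Test-ClusterQuorum -TotalVotes 3 -SurvivingVotes 1   # secondary site alone: False
Test-ClusterQuorum -TotalVotes 3 -SurvivingVotes 2   # primary site alone:   True
```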

With that in mind, you need to plan how to activate the secondary site to get your databases mounted and online. This is pretty easy in Exchange 2013 and Exchange 2010, but you need the following in place:

1. The DAG is configured in DAC mode.

2. An alternate File Share Witness has been created in the secondary datacenter.
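As a rough sketch, both prerequisites can be set from the Exchange Management Shell. The names DAG1, DR-FS01 and C:\DAGFSW are placeholders I made up for illustration; substitute your own DAG name, witness server and directory.

```powershell
# Placeholders (assumptions): DAG1, DR-FS01, C:\DAGFSW

# Prerequisite 1: enable DAC mode on the DAG.
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

# Prerequisite 2: pre-stage an alternate witness in the secondary datacenter.
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -AlternateWitnessServer DR-FS01 `
    -AlternateWitnessDirectory C:\DAGFSW

# Check the result.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, DatacenterActivationMode, WitnessServer,
                AlternateWitnessServer, AlternateWitnessDirectory
```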

Once those prerequisites are met, the process to fail over to a secondary datacenter that does not have enough votes is:

1. Stop-DatabaseAvailabilityGroup DAGNAME -ActiveDirectorySite "LostADSite" -ConfigurationOnly

This needs to be run in the secondary site. If your AD servers and Exchange server are still online in the primary site (for example, when a connectivity issue is what led to the secondary site activation), you need to run this command in the primary site as well.

Important: you can use the -MailboxServer parameter instead if you have not split the AD sites. You can't use the -ActiveDirectorySite parameter if a single AD site spans both physical sites.

2. I always recommend turning off the Exchange servers in the dead site (the primary site in this case), including the server hosting the File Share Witness if possible. But this is a personal recommendation.

3. Be sure that the cluster service on the surviving Exchange server is stopped, or stop it using the command Stop-Service ClusSvc

4. Run Restore-DatabaseAvailabilityGroup DAGNAME -ActiveDirectorySite "SecondaryADSite"

Important: the note under step 1 about site versus server parameters applies here too.

5. At this point your databases will be mounting or already mounted. If not, run the Restore-DatabaseAvailabilityGroup cmdlet again, or try to mount the databases manually using the Mount-Database cmdlet.
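Put together, the steps above look roughly like this when run on the surviving server in the secondary site. Treat it as a sketch for your runbook, not a tested script: DAG1, "HQ-Site", "DR-Site", DR-EX01 and DB01 are placeholder names I made up.

```powershell
# Placeholders (assumptions): DAG1, "HQ-Site", "DR-Site", DR-EX01, DB01

# Step 1: mark the DAG members in the lost site as stopped (AD configuration only).
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite "HQ-Site" -ConfigurationOnly

# Step 3: make sure the cluster service on the surviving server is stopped.
Stop-Service ClusSvc

# Step 4: activate the DAG members in the secondary site.
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite "DR-Site"

# Step 5: check the copies and mount manually only if needed.
Get-MailboxDatabaseCopyStatus -Server DR-EX01 |
    Format-Table Name, Status, CopyQueueLength
Mount-Database -Identity DB01
```

If a single AD site spans both locations, swap -ActiveDirectorySite for -MailboxServer with the relevant server name in steps 1 and 4.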

Hope this helps.

The switchback includes running the Start-DatabaseAvailabilityGroup cmdlet pointing to the primary AD site or to the server in the primary site.

Also, after your secondary site activation, the cluster will be pointing to the alternate witness instead of the primary one (even though the Exchange configuration looks correct). To make sure everything works the next time, once the primary site is recovered you need to run Set-DatabaseAvailabilityGroup DAGNAME once more without any other parameters. This reconfigures the DAG to its initial settings.
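A minimal sketch of that switchback, again with made-up placeholder names (DAG1, "HQ-Site"), and assuming the primary site servers are back online and healthy:

```powershell
# Placeholders (assumptions): DAG1, "HQ-Site"

# Bring the primary site's DAG members back into the DAG.
Start-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite "HQ-Site"

# Re-apply the DAG configuration so the cluster goes back to the primary witness.
Set-DatabaseAvailabilityGroup -Identity DAG1

# Confirm which members are started and which witness is in use.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, StartedMailboxServers, StoppedMailboxServers, WitnessShareInUse
```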

If you want to read more, this is a great blog to understand some concepts about HA

http://blogs.technet.com/b/exchange/archive/2011/05/31/exchange-2010-high-availability-misconceptions-addressed.aspx

And also you can use this PPT prepared by the Exchange team in your switchover or failover procedures

http://blogs.technet.com/b/exchange/archive/2012/10/19/exchange-2010-datacenter-switchover-troubleshooter-now-available.aspx

January 30th, 2015 7:33pm

Thank you for your detailed response. I will create a secondary FSW in our DR site and review the links you've provided.

Would DNS have to be updated, since all virtual directories are using owa.company.com\ _ _ _ _ , which currently resolves to the primary Exchange server?

January 30th, 2015 7:40pm

Yes. A good recommendation is to set the internal and external DNS records for Exchange (OWA, EAS, ECP and OA) to a TTL of 15 minutes (some recommend 5 minutes), so your clients will wait at most 15 minutes before they pick up the new IPs.
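For the internal record, something like the following could work. It assumes your internal zone is company.com, is hosted on a Windows DNS server with the DnsServer module available, and that 192.0.2.10 stands in for the secondary server's address; all of those are my assumptions, so adjust as needed.

```powershell
# Assumptions: zone "company.com" on Windows DNS, record "owa",
# 192.0.2.10 is a placeholder for the DR server's IP.

# Check what clients currently get, including the remaining TTL.
Resolve-DnsName -Name owa.company.com -Type A |
    Select-Object Name, TTL, IPAddress

# On the DNS server: repoint the A record and set a 15-minute TTL.
$old = Get-DnsServerResourceRecord -ZoneName "company.com" -Name "owa" -RRType A
$new = $old.Clone()
$new.RecordData.IPv4Address = [System.Net.IPAddress]::Parse("192.0.2.10")
$new.TimeToLive = [TimeSpan]::FromMinutes(15)
Set-DnsServerResourceRecord -ZoneName "company.com" -OldInputObject $old -NewInputObject $new
```

The TTL has to be lowered before the outage to be of any use, since clients cache the record for whatever TTL it had when they last resolved it.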

If your question is about the ability of the Exchange 2013 CAS to connect to the mailbox directly without proxying, as a way to use DNS round robin instead of an IP address change: that is only useful when both CAS servers are online.

I prefer the load balancer option, but if the $$ is not enough, changing the IP in DNS is the best method right now for your configuration.

Have a nice one!

 

January 30th, 2015 8:16pm

I've gone through all the material you've linked. Looks like everything is in order now.

Again, thank you for your help.



  • Edited by TSGzz Friday, January 30, 2015 6:15 PM grammer
January 30th, 2015 8:52pm

This topic is archived. No further replies will be accepted.
