Li Zhen ...
First, I did not ask about the DAG; if you read the entire thread, you would have seen that I am asking about the CAS and Edge roles ... specifically about the DNS records involved in a two-site high-availability scenario.
Second, this is for you and Andy. I am quite sure you are out of touch with the technology; I am asking about Exchange Server 2013. Just search for "High Availability and Site Resilience" in the Exchange online help, and in the
"Site resilience" section you will find:
==================
Site resilience
Although Exchange 2013 continues to use DAGs and Windows Failover
Clustering for Mailbox server role high availability and site resilience, site
resilience isn't the same in Exchange 2013. Site resilience is much better in
Exchange 2013 because it has been simplified. The underlying architectural
changes that were made in Exchange 2013 have significant impact on the recovery
aspects of a site resilience configuration.
In Exchange 2010, mailbox (DAG) and client access (Client Access
server array) recovery were tied together. If you lost all of your Client Access
servers, the VIP for the array, or a significant portion of your DAG, you were
in a situation where you needed to do a datacenter switchover. This is a
well-documented and generally well-understood process, although it takes time to
perform, and requires human intervention to begin the process.
In Exchange 2013, if you lose your Client Access server array for
whatever reason (for example, the load balancer fails), you don't need to
perform a datacenter switchover. With the proper configuration, failover happens
at the client level and clients are automatically redirected to a second
datacenter that has operating Client Access servers, and those operating Client
Access servers proxy the communication back to the user's Mailbox server, which
remains unaffected by the outage (because you don't do a switchover). Instead of
working to recover service, the service recovers itself and you can focus on
fixing the core issue (for example, replacing the failed load balancer).
Furthermore, the namespace simplification, consolidation of server
roles, de-coupling of Active Directory site server role requirements,
separation of Client Access server array and DAG recovery, and load balancing
changes in Exchange 2013 now enable Client Access server and DAG recovery to be
separate and automatic across sites, thereby providing datacenter failover
scenarios, if you have three locations.
In Exchange 2010, you could deploy a DAG across two datacenters and
host the witness in a third datacenter and enable failover for the Mailbox
server role for either datacenter. But you didn't get failover for the solution
itself, because the namespace still needed to be manually changed for the
non-Mailbox server roles.
In Exchange 2013, the namespace doesn't need to move with the DAG.
Exchange leverages fault tolerance built into the namespace through multiple IP
addresses, load balancing (and if need be, the ability to take servers in and
out of service). Modern HTTP clients work with this redundancy automatically.
The HTTP stack can accept multiple IP addresses for a fully qualified domain
name (FQDN), and if the first IP address it tries fails hard (that is, it can't
connect), it will try the next IP address in the list. In a soft failure
(connection is lost after the session is established, perhaps due to an
intermittent failure in the service where, for example, a device is dropping
packets and needs to be taken out of service), the user might need to refresh
their browser.
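
To make the hard-failure case concrete, here is a minimal Python sketch,
assuming the namespace mail.contoso.com from the text above and the roughly
20-second timeout quoted later in this section; the function name and port are
mine, for illustration only:

import socket

def connect_first_available(fqdn, port=443, timeout=20):
    # Resolve the FQDN; DNS may hand back one address per datacenter VIP.
    infos = socket.getaddrinfo(fqdn, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)  # hard failure raises OSError
            return sock             # connected; caller owns the socket
        except OSError:
            sock.close()            # can't connect: try the next address
    raise ConnectionError("no address for %s accepted a connection" % fqdn)

# client_socket = connect_first_available("mail.contoso.com")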
This means the namespace is no longer a single point of failure as
it was in Exchange 2010. In Exchange 2010, perhaps the biggest single point of
failure in the messaging system is the FQDN that you give to users because it
tells the user where to go. In the Exchange 2010 paradigm, changing where that
FQDN goes isn't easy because you have to change DNS, and then handle DNS
latency, which in some parts of the world is challenging. And browsers have
name caches, typically about 30 minutes or more, that also have to be
handled.
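
That DNS latency is driven by the record's time to live (TTL). A quick way to
inspect it, sketched with the third-party dnspython library (pip install
dnspython); the name is again the documentation's example:

import dns.resolver  # third-party: dnspython

# Until this TTL expires, resolvers and clients may keep serving the old
# answer, which is exactly the DNS latency described above.
answer = dns.resolver.resolve("mail.contoso.com", "A")
print("addresses:", [r.address for r in answer])
print("TTL (seconds):", answer.rrset.ttl)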
One of the changes in Exchange 2013 is to enable clients to have
more than one place to go. Almost all of the client access protocols in
Exchange 2013 are HTTP based (examples include Outlook, Outlook Anywhere, EAS,
EWS, OWA, and EAC), and all supported HTTP clients have the ability to use
multiple IP addresses, thereby providing failover on the client side. You can
configure DNS to hand
multiple IP addresses to a client during name resolution. The client asks for
mail.contoso.com and gets back two IP addresses, or four IP addresses, for
example. However many IP addresses the client gets back, all of them will be
used reliably by the client. This makes the client a lot better off because if one of the IP
addresses fails, the client has one or more other IP addresses to try to connect
to. If a client tries one and it fails, it waits about 20 seconds and then tries
the next one in the list. Thus, if you lose the VIP for the Client Access server
array, recovery for the clients happens automatically, and in about 21
seconds.
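
What the client actually receives during name resolution can be seen with a
couple of lines of Python; the two addresses shown are hypothetical
documentation-range VIPs, one per datacenter:

import socket

# gethostbyname_ex returns (hostname, aliases, ip_list); with round-robin
# DNS the list contains every published A record for the namespace.
hostname, aliases, ipaddrs = socket.gethostbyname_ex("mail.contoso.com")
print(ipaddrs)  # e.g. ['192.0.2.10', '198.51.100.10'] - one VIP per site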
The benefits include the following:
- In Exchange 2010, if you lost the load balancer in your primary datacenter
and you didn't have another one in that site, you had to do a datacenter
switchover. In Exchange 2013, if you lose the load balancer in your primary
site, you simply turn it off (or maybe turn off the VIP) and repair or replace
it. Clients that aren't already using the VIP in the secondary datacenter will
automatically fail over to the secondary VIP without any change of namespace,
and without any change in DNS. Not only does that mean you no longer have to
perform a switchover, but it also means that all of the time normally associated
with a datacenter switchover recovery isn't spent. In Exchange 2010, you had to
handle DNS latency (hence, the recommendation to set the Time to Live (TTL) to 5
minutes, and the introduction of the failback URL). In Exchange 2013, you don't
need to do that because you get fast failover (20 seconds) of the namespace
between VIPs (datacenters).
- Because you can fail over the namespace between datacenters, all that's
needed to achieve a datacenter failover is a mechanism for failover of the
Mailbox server role across datacenters. To get automatic failover for the DAG,
you simply architect a solution where the DAG is evenly split between two
datacenters, and then place the witness server in a third location so that it
can be arbitrated by DAG members in either datacenter, regardless of the state
of the network between the datacenters that contain the DAG members.
- In this scenario, the administrator's efforts are geared toward simply
fixing the problem, and not spent restoring service. You simply fix the thing
that failed, while service continues to run and data integrity is
maintained. The urgency and stress level you feel when fixing a broken device is
nothing like the urgency and stress you feel when you're working to restore
service. It's better for the end user, and less stressful for the
administrator.
You can allow failover to occur without having to perform
switchbacks (sometimes mistakenly referred to as failbacks). If you lose Client
Access servers in your primary datacenter and that results in a 20 second
interruption for clients, you might not even care about failing back. At this
point, your primary concern would be fixing the core issue (for example,
replacing the failed load balancer). After it's back online and functioning,
some clients will start using it, and other clients might remain operational
through the second datacenter.
Exchange 2013 also provides functionality that enables
administrators to deal with intermittent failures. An intermittent failure is
where, for example, the initial TCP connection can be made, but nothing happens
afterward. An intermittent failure requires some sort of extra administrative
action to be taken because it might be the result of a replacement device being
put into service. While this repair process is occurring, the device might be
powered on and accepting some requests, but not really ready to service clients
until the necessary configuration steps are performed. In this scenario, the
administrator can perform a namespace switchover by simply removing the VIP for
the device being replaced from DNS. Then during that service period, no clients
will be trying to connect to it. After the replacement process has completed,
the administrator can add the VIP back to DNS, and clients will eventually start
using it.
==================
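
As a footnote to that last paragraph: pulling a VIP out of DNS and putting it
back can be scripted. Here is a sketch using dnspython's dynamic update
support, assuming the zone accepts dynamic updates; the zone, server, and VIP
values are made up, and in practice you would more likely use the Windows DNS
console or dnscmd:

import dns.query
import dns.update

ZONE, DNS_SERVER, VIP = "contoso.com", "198.51.100.53", "192.0.2.10"

# Take the failed device out of service: remove only its A record so no
# new clients resolve to it during the replacement.
out = dns.update.Update(ZONE)
out.delete("mail", "A", VIP)
dns.query.tcp(out, DNS_SERVER)

# Once the replacement is configured, publish the VIP again; clients pick
# it up as their cached answers expire.
back = dns.update.Update(ZONE)
back.add("mail", 300, "A", VIP)  # 300-second TTL, an assumed value
dns.query.tcp(back, DNS_SERVER)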