Exchange 2013 node removed automatically from DAG on VMware
Hi all,

Good day!

We are facing a very troublesome problem with Exchange 2013 on VMware.

We have two Exchange 2013 servers that hold all roles, plus a third server for the witness share. We also have two Exchange 2010 Edge servers in the DMZ.
All servers are running Windows Server 2012 R2 Std.
VMware ESXi 5.5 Update 1 is running on two hosts.

EXCH-01
EXCH-02
EDGE-01
EDGE-02
DAGFS
DC-01
DC-02

The servers are distributed evenly across the two ESXi hosts as follows:

Host1 : DC-01 / EXCH-01 / EDGE-01 / DAGFS
Host2 : DC-02 / EXCH-02 / EDGE-02

Coming to the issue: either one of the Exchange servers leaves the DAG automatically after repeated packet drops during Replication / Production network validation.

The network adapter is VMXNET3 for all VMs in VMware.

Now one of the Exchange servers shows as Down in Failover Cluster Manager, but other services such as MAPI / IMAP / POP3 are still running on the same server.

We also changed the file share witness server after getting the event in the failover cluster, but the event is still being generated frequently.

A sheet of the events is attached.

Troubleshooting actions taken:
   Changed the network adapter on all servers from E1000a to VMXNET3
   Changed the DAG file share witness server
   Removed one node from the DAG and re-joined it

Any help from Microsoft on this would be much appreciated.
July 6th, 2015 10:17am

Are you sure the virtual networks are set up correctly? Can you ping each Exchange server from the other Exchange server using the IP of the replication network? Since everything is virtual, I'd ensure the network side of your virtual infrastructure is correct (in regards to Exchange).
July 6th, 2015 10:27am

What OS are you running Exchange on? When this happens are you running backups?


July 6th, 2015 11:15am

Are any issues reported by the Validate a Configuration wizard in Failover Clustering?
July 6th, 2015 6:23pm

Hi kesa,

Thank you for your question.

As I understand it, this issue is caused by a network problem. Please contact your network administrator to make sure network connectivity is stable.

Note: check that the heartbeats work correctly and that the network is stable.

Did you install both NLB and DAG?

NLB and DAG cannot both be configured on the same all-in-one server.

Did you configure a cluster IP address, and does this IP conflict with another one?

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

July 6th, 2015 11:05pm

Hi All,

Good day!

First, thanks for your fast response.

Answers are below:

Are you sure the virtual networks are set up correctly?

    -- Yes, we have 4 physical NICs in ESXi [2 for the Exchange production network in high-availability mode, 1 dedicated to DAG replication, 1 for the DMZ Edge servers]

Can you ping each Exchange server from the other Exchange server using the IP of the replication network?

    -- Yes, all Exchange servers can ping each other without any problems.

What OS are you running Exchange on?

    -- All servers are running Windows Server 2012 R2 DC.

When this happens are you running backups?

    -- No issues occur while taking backups.

Any issue in Validate a Configuration wizard of Fail over Clustering?

    -- Yes, we got a Network warning on Validate Multiple Subnet Properties.

    The message is: The HostRecordTTL property for network name 'Name: DAG' is set to 300 (5 minutes). For local clusters the suggested value is 1200 (20 minutes).

check the heartbeats work correctly and network stability.

    -- A continuous ping on the heartbeat network is fine, but the Windows failover cluster event log shows: Cluster has missed two consecutive heartbeats for the local endpoint 172.19.100.4:~3343~ connected to remote endpoint 172.19.100.3:~3343~.

 

Did you install NLB and DAG?

    -- Only DAG is configured; there is no NLB. We use DNS round robin for client access instead.

 

NLB and DAG are not both configured on all-in-one.

    -- Yes, that is true, but it does not apply in our case.

 

Did you have configure cluster IP address and if this IP was conflicted to other?

    -- Yes, we got an IP conflict message a long time ago (4 months), but since then we have not seen it at all in the failover cluster events.

Thanks in advance for a solution.

July 7th, 2015 2:55am

Hi kesa,

Has the issue been solved?

Could you give us more details about Exchange 2013 being removed automatically from the DAG?

Best regards,

Jim

July 7th, 2015 3:38am

Hi Jim,

Thanks for your reply.

Actually, the second node has not been removed, but it shows as Down in the DAG as well as in the Failover Cluster Manager console.

This has happened before as well, but the node used to come back up after some time, once the heartbeat or production network handshake over UDP port 3343 succeeded. We found this in the failover cluster events in the application log.

Now the situation is much worse: UDP communication is not happening, node 2 has been shown as Down for many days, and the server is also rebooting every day.

Thanks in advance.

July 7th, 2015 4:36am

Are there any firewalls between these two servers?
July 7th, 2015 11:13am

Hi,

There is no firewall between the servers; they are in the same data center.

Thanks.

July 7th, 2015 11:16am


Hi kesa,

Please refer to the following link and check whether the issue persists:

https://technet.microsoft.com/en-us/library/cc773498%28v=ws.10%29.aspx

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

Jim

July 8th, 2015 1:06am

Hi Jim,

Thanks for your post.

Can we reconfigure the quorum of the failover cluster if the cluster is managed by the Exchange DAG?

Please confirm.

Thank you.

July 8th, 2015 2:31am

Hi kesa,

Yes, you can do that.

You could move the FSW to another server to check whether the issue persists.

In addition, was the link I supplied helpful?

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

Jim

July 8th, 2015 2:52am

Hi Jim,

We have already moved the FSW to another server, but we are still getting the same event afterwards; nothing has changed.

Please provide some other links for solutions.

Thanks.

July 8th, 2015 3:05am

Hi kesa,

Does Exchange really go down? Or does it just show the service as down in Failover Cluster Manager? Exchange 01 or Exchange 02? Or just a random Exchange server?

Please give us more details about the repeated packet drops of Replication / Production Network validation.

You wrote that other services such as MAPI / IMAP / POP3 are running on the same server. Do you mean the Exchange server is in fact online?

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

Jim

July 8th, 2015 4:32am

Hi Jim,

Thanks for your reply!

Will Exchange really down?

    -- No, but it reboots every day at certain times.

 just show service down in failover cluster manager?

       --Yes

Exchange 01 or Exchange 02?

    -- Only Exchange 02

 just random Exchange server?

    -- No, it is not random.

Tell us more details which is subsequent packet drops of Replication / Production Network validation.

    -- Packets are being dropped on both the Production and Replication adapters.

By your sentence which is but other services are running on same server like MAPI / IMAP / POP3. Did you means the Exchange server is online in fact?

    -- Yes, Exchange is online.

Thanks in advance.


July 8th, 2015 8:48am

Hi All,

Is the information below of any help? The details were collected from the Cluster\report folder.

File Name: ValidateStorage.log 

m_FindFileOnSmbShare: EXIT: hr 0x80070035

CprepConnectToNewSmbShares3: ERROR: Failed calling FindFirstFile on share

NetFt has 0 existing routes
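
For readers decoding the error value in the lines above: an HRESULT such as 0x80070035 packs a failure bit, a facility and a code. Facility 7 means the value wraps a Win32 error code, and Win32 error 53 is ERROR_BAD_NETPATH ("The network path was not found"). A minimal Python sketch of that decoding follows; the function names and the one-entry lookup table are illustrative only and are not part of any Microsoft tool.

```python
# Split an HRESULT into (failed, facility, code) using the standard bit layout:
# bit 31 = failure flag, bits 16-26 = facility, bits 0-15 = code.
FACILITY_WIN32 = 7

# Illustrative table; only the Win32 code seen in this thread is listed.
WIN32_ERRORS = {53: "ERROR_BAD_NETPATH: The network path was not found."}


def decode_hresult(hr):
    hr &= 0xFFFFFFFF
    failed = bool(hr >> 31)
    facility = (hr >> 16) & 0x7FF
    code = hr & 0xFFFF
    return failed, facility, code


def describe(hr):
    failed, facility, code = decode_hresult(hr)
    if facility == FACILITY_WIN32:
        return WIN32_ERRORS.get(code, "Win32 error %d" % code)
    return "facility %d, code %d" % (facility, code)


if __name__ == "__main__":
    print(decode_hresult(0x80070035))
    print(describe(0x80070035))
```

In other words, the validation test could not reach the SMB path it tried to open; the log alone does not say why.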

thanks

July 8th, 2015 10:43am

Hi kesa,

We could do some basic checks on Exchange 02:

1. All the drivers are up to date

2. Servers are properly patched

3. Necessary exceptions are made in the antivirus for the mailbox server

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

Jim

July 9th, 2015 4:43am

Hi Jim,

Thanks for your reply.

Please find the answers below:

1. All the drivers are up to date

    --- All drivers are up to date.

2. Servers are properly patched

    --- All patches are up to date.

3. Necessary exceptions are made in Antivirus for mailbox server

    --- There is no antivirus software installed on either server.

FYI, there is a ValidateStorage.log available in the cluster\report folder on both Exchange servers.

In it we can see some error lines:

0000827c.00000794::2015/07/07-04:14:20.947  m_FindFileOnSmbShare: ERROR: Failed to open file enum for {\\[IPV6 address available here%14]\ClusterTestShare_{af797d7e-eb28-4da6-bb3f-3aa96bdef18d}\*}, error=80070035

0000827c.00000794::2015/07/07-04:14:20.949  m_FindFileOnSmbShare: EXIT: hr 0x80070035

0000827c.00000794::2015/07/07-04:14:20.951  CprepConnectToNewSmbShares3: ERROR: Failed calling FindFirstFile on share {\\[IPV6 address available here]}, adjusted pathname {\\[fe80::54b6:c3c3:eec8:a6ab%14]\ClusterTestShare_{af797d7e-eb28-4da6-bb3f-3aa96bdef18d}}, error=80070035.

0000827c.00000794::2015/07/07-04:14:20.953  m_FindFileOnSmbShare: ENTER

0000827c.00000794::2015/07/07-04:14:51.989  m_FindFileOnSmbShare: ERROR: Failed to open file enum for {\\169.254.2.206\ClusterTestShare_{af797d7e-eb28-4da6-bb3f-3aa96bdef18d}\*}, error=80070035

0000827c.00000794::2015/07/07-04:14:51.992  m_FindFileOnSmbShare: EXIT: hr 0x80070035

0000827c.00000794::2015/07/07-04:14:51.994  CprepConnectToNewSmbShares3: ERROR: Failed calling FindFirstFile on share {\\169.254.2.206}, adjusted pathname {\\169.254.2.206\ClusterTestShare_{af797d7e-eb28-4da6-bb3f-3aa96bdef18d}}, error=80070035.

0000827c.00000794::2015/07/07-04:14:51.996  m_FindFileOnSmbShare: ENTER

0000827c.00000794::2015/07/07-04:15:23.037  m_FindFileOnSmbShare: ERROR: Failed to open file enum for {\\169.254.166.171\ClusterTestShare_{af797d7e-eb28-4da6-bb3f-3aa96bdef18d}\*}, error=80070035

0000827c.00000794::2015/07/07-04:15:23.040  m_FindFileOnSmbShare: EXIT: hr 0x80070035

0000827c.00000794::2015/07/07-04:15:23.042  CprepConnectToNewSmbShares3: ERROR: Failed calling FindFirstFile on share {\\169.254.166.171}, adjusted pathname {\\169.254.166.171\ClusterTestShare_{af797d7e-eb28-4da6-bb3f-3aa96bdef18d}}, error=80070035.

0000827c.00000794::2015/07/07-04:15:23.044  CprepConnectToNewSmbShares3: EXIT: hr 0x80070035

0000827c.00000794::2015/07/07-04:15:23.052  CprepCreateNewSmbShares3 ENTER

If possible, please clarify why some tests are running against different IP addresses that are not in use in our data center.
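
On the addresses in that log: 169.254.0.0/16 is the IPv4 link-local (APIPA) range that Windows assigns automatically to an adapter without a configured or DHCP address, and fe80::/10 is the IPv6 link-local range. On a failover cluster node such addresses commonly belong to the cluster's own virtual adapter (NetFT), but that is an assumption here, since the thread does not show which adapter owns them. A minimal Python sketch using the standard `ipaddress` module to confirm which addresses are link-local; the helper name is illustrative.

```python
import ipaddress


def is_link_local(address):
    # Drop an IPv6 zone index such as "%14" before parsing.
    host = address.split("%", 1)[0]
    return ipaddress.ip_address(host).is_link_local


# Addresses copied from the log lines and the heartbeat event in this thread.
samples = [
    "169.254.2.206",
    "169.254.166.171",
    "fe80::54b6:c3c3:eec8:a6ab%14",
    "172.19.100.4",
]

if __name__ == "__main__":
    for addr in samples:
        label = "link-local" if is_link_local(addr) else "not link-local"
        print(addr, label)
```

Running `ipconfig /all` on each node would show which adapter actually holds these addresses.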

thanks in advance

July 9th, 2015 8:03am

Hi Andy David,

Good day!

Sorry for the delayed reply.

As per your link, we have changed the settings below after checking our setup:

SameSubnetThreshold = 10 (from 5)

RouteHistoryLength = 20
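
For context on what that change does: the cluster marks a node down after `SameSubnetThreshold` consecutive heartbeats are missed, with one heartbeat sent every `SameSubnetDelay` milliseconds. The sketch below assumes the default delay of 1000 ms, which the thread does not confirm for this cluster; the function name is illustrative.

```python
def heartbeat_tolerance_seconds(threshold, delay_ms=1000):
    """Seconds of missed heartbeats tolerated before a node is marked down.

    delay_ms defaults to 1000, an assumed value for SameSubnetDelay.
    """
    if threshold <= 0 or delay_ms <= 0:
        raise ValueError("threshold and delay_ms must be positive")
    return threshold * delay_ms / 1000.0


if __name__ == "__main__":
    for threshold in (5, 10):
        print(threshold, heartbeat_tolerance_seconds(threshold))
```

Under that assumption, raising the threshold from 5 to 10 extends the tolerated gap from about 5 seconds to about 10 seconds, so it can only hide packet drops shorter than that window.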

But this also did not solve our issue.

Is there any other link we can use to check the issue?

Thanks.

July 9th, 2015 9:19am

This topic is archived. No further replies will be accepted.
