I have a two-node (node1 and node2) Exchange 2013 DAG with an FSW quorum on a non-Exchange server (server), all within the same site. All databases are mounted on the first node, which is working; the second node, which holds the database copies, is 'down'.
On a few occasions one of the DAG nodes has gone 'offline' in Failover Cluster Manager. In the past it either resolved itself or I took various steps to resolve it. This time, however, it will not come online again and I cannot find any way to fix it.
The Windows event log repeatedly shows these errors:
Event ID 1564
File share witness resource 'File Share Witness (\\server\EX15DAG.domain.net)' failed to arbitrate for the file share '\\server\EX15DAG.domain.net'. Please ensure that file share '\\server\EX15DAG.domain.net' exists and is accessible by the cluster.
Event ID 1573
Node 'Node2' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.
From the working node:
C:\Windows\system32>cluster node
Listing status for all available nodes:
Node           Node ID Status
-------------- ------- ---------------------
NODE1                1 Up
NODE2                2 Down
From the non-working node:
C:\Windows\system32>cluster node
Listing status for all available nodes:
Node           Node ID Status
-------------- ------- ---------------------
NODE1                1 Down
NODE2                2 Joining
So the two nodes disagree: each one reports the other as Down, and node2 reports itself as stuck in Joining. Interesting...
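To make the disagreement explicit, here is a minimal sketch that parses the two `cluster node` listings and reports every node whose status differs between the two views. The function names are my own, and it assumes the three-column layout shown in the listings (name, numeric ID, status).

```python
def parse_cluster_nodes(listing):
    """Parse `cluster node` output into {node_name: status}."""
    statuses = {}
    for line in listing.splitlines():
        parts = line.split()
        # Data rows look like: NAME <numeric id> STATUS.
        # Header and separator rows have no numeric second column.
        if len(parts) >= 3 and parts[1].isdigit():
            statuses[parts[0].upper()] = " ".join(parts[2:])
    return statuses


def diff_views(view_a, view_b):
    """Return {node: (status_in_a, status_in_b)} where the two views disagree."""
    return {
        node: (view_a.get(node), view_b.get(node))
        for node in sorted(set(view_a) | set(view_b))
        if view_a.get(node) != view_b.get(node)
    }


# Output pasted from each node (copied from the listings in this post).
FROM_NODE1 = """\
Listing status for all available nodes:
Node Node ID Status
-------------- ------- ---------------------
NODE1 1 Up
NODE2 2 Down
"""

FROM_NODE2 = """\
Listing status for all available nodes:
Node Node ID Status
-------------- ------- ---------------------
NODE1 1 Down
NODE2 2 Joining
"""

if __name__ == "__main__":
    differences = diff_views(
        parse_cluster_nodes(FROM_NODE1), parse_cluster_nodes(FROM_NODE2)
    )
    for node, (seen_by_node1, seen_by_node2) in differences.items():
        print(f"{node}: node1 sees {seen_by_node1}, node2 sees {seen_by_node2}")
```

Both rows differ here, which fits the picture of each node believing it is the only member trying to form or hold the cluster.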
The clocks on my DC and Exchange servers are all in sync.
I can access the FSW and confirm the share is present and being updated by the working node. The Exchange Trusted Subsystem group has Full Control in both the share permissions and the NTFS security.
We do not use a firewall internally, and I can confirm there are no firewall issues.
I do not have any antivirus running, so nothing is being blocked or interfered with there.
I can ping all hosts involved from all machines (both nodes, the FSW, the DC, the DAG DNS name, everything).
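Ping only proves ICMP reachability, so a TCP-level check would be a closer test of what the cluster actually uses. Below is a minimal sketch of such a check. The port numbers are assumptions based on the usual defaults (TCP 445 for SMB access to the witness share, TCP 3343 for the cluster service), not something I have confirmed for this setup, and `tcp_port_open` is just a helper name I made up. The host names are the ones used in this post.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Assumed default ports: 445 = SMB (file share witness), 3343 = cluster service.
CHECKS = [
    ("server", 445),
    ("node1", 3343),
    ("node2", 3343),
]

if __name__ == "__main__":
    for host, port in CHECKS:
        state = "open" if tcp_port_open(host, port) else "unreachable"
        print(f"{host}:{port} {state}")
```

Running it from each node in turn would show whether the failed node can reach the witness and its partner on those ports, not just by ICMP.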
I have restarted the failed node, and also the working node.
It was all fine until an unexpected host/VM failure and restart.
I figure I could remove all the database copies from the failed node, evict it from the cluster, and start again, but I would much prefer to get it joined again properly if I can.