Recently, I was doing some research for a book that I’m working on together with Exchange MVPs Paul Cunningham and Steve Goodman. It involved recovering a “failed” server using the /m:recoverserver switch. The process itself is straightforward, but depending on which server role(s) you are recovering, you might have to perform some additional post-recovery steps.
In this particular case, I was recovering a Client Access Server (single role) which also happened to be the File Share Witness for one of my Database Availability Groups.
As such, I needed to ‘reconfirm’ the recovered server as a File Share Witness. One way of doing so is to run the following command:
[sourcecode language=”powershell”]Get-DatabaseAvailabilityGroup <DAG name> | Set-DatabaseAvailabilityGroup[/sourcecode]
However, upon executing the command, I was presented with the following error message:
Given that I didn’t use a System State Backup, I was surprised to read that a File Share Witness already existed.
The first thing I did was to check the restored server itself to see if the share existed. As expected, there was nothing to see.
By default, a DAG uses the Node and File Share Majority quorum model. In that model the File Share Witness is a critical cluster resource, so you cannot simply remove it from the cluster. So, my next thought was to temporarily ‘move’ the File Share Witness to another server and then move it back. First, I executed the following command to move the FSW to another server:
[sourcecode language=”powershell”]Get-DatabaseAvailabilityGroup <DAG name> | Set-DatabaseAvailabilityGroup -WitnessServer <server name>[/sourcecode]
The command completed successfully, after which I decided to move the FSW back to the recovered server using the following command:
[sourcecode language=”powershell”]Get-DatabaseAvailabilityGroup <DAG name> | Set-DatabaseAvailabilityGroup -WitnessServer <recovered server name>[/sourcecode]
I was surprised to see that the command failed with the same error message as before:
I then took a peek at the cluster resources and found the following:
It seemed there were now TWO File Share Witness resources for the same DAG, and the one in a failed state was the one that used to live on the recovered server.
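For reference, the same view can be had from PowerShell instead of Failover Cluster Manager. This is only a minimal sketch: it assumes the FailoverClusters module is available on the machine you run it from, and `DAG1` is a hypothetical stand-in for your own DAG name (the underlying cluster carries the same name as the DAG).

[sourcecode language=”powershell”]
Import-Module FailoverClusters

# Hypothetical DAG name; replace with your own
$dagName = 'DAG1'

# List every File Share Witness resource in the cluster and its current state
Get-ClusterResource -Cluster $dagName |
    Where-Object { $_.ResourceType.Name -eq 'File Share Witness' } |
    Format-Table Name, State, OwnerGroup -AutoSize
[/sourcecode]

In a situation like the one described here, you would expect this to return more than one entry, with the stale one not being online.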
At this point, I decided to clean house and remove both resources. Before I could do so, I had to switch the quorum model to “Node Majority Only”:
[sourcecode language=”powershell”]Set-ClusterQuorum -NodeMajority[/sourcecode]
I then re-ran the command to configure the recovered server as the File Share Witness:
[sourcecode language=”powershell”]Get-DatabaseAvailabilityGroup <DAG name> | Set-DatabaseAvailabilityGroup -WitnessServer <recovered server name>[/sourcecode]
Note: when configuring the File Share Witness, the cluster’s quorum model is automatically changed back to NodeAndFileShareMajority.
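If you want to confirm the end result rather than take it on faith, a quick check along these lines should do. Again, this is a sketch under assumptions: `DAG1` is a hypothetical DAG name, the first command needs the FailoverClusters module, and the second has to be run from the Exchange Management Shell.

[sourcecode language=”powershell”]
# Hypothetical DAG name; replace with your own
$dagName = 'DAG1'

# Cluster view: the quorum type and the resource currently acting as witness
Get-ClusterQuorum -Cluster $dagName |
    Format-List Cluster, QuorumResource, QuorumType

# Exchange view: the configured witness and whether the share is in use
Get-DatabaseAvailabilityGroup -Identity $dagName -Status |
    Format-List Name, WitnessServer, WitnessDirectory, WitnessShareInUse
[/sourcecode]

Both views should point at the recovered server, with no second File Share Witness resource left behind in the cluster.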
After this series of steps, everything was back to the way it was and working as it should. I decided to double-check with Microsoft whether they had seen this before. That’s also where I got the [unofficial] name for this “issue”: ghost file share witness (thanks to Tim McMichael). If you ever land in this situation, I suggest that you contact Microsoft Support to figure out how you got there. From personal testing, however, I can say that this behaviour seems consistent when recovering a File Share Witness using the /m:recoverserver switch.