Recently, I got asked to assist with a Hybrid Configuration Wizard which was failing with the following error message:
Updating hybrid configuration failed with error ’Subtask NeedsConfiguration execution failed: Configure Mail Flow Default Receive Connector cannot be found on server <server name>. at Microsoft.Exchange.Management.Hybrid.MailFlowTask.DoOnPremisesReceiveConnectorNeedConfiguration() at…
Although the message might not reveal much information at first sight, it does contain everything we need to start troubleshooting. Typically, I would suggest you go and have a look into the Hybrid Configuration Wizard log files (located in the logging\Update-HybridConfiguration folder), but the only thing you would find there is the exact same error message.
First, we know that the HCW is trying to configure the hybrid mail flow and that it failed trying to modify the default connector that’s in place. More specifically, it was trying to modify the receive connector on the server that’s specified in the error message.
In this particular case, it wasn’t even able to find the Default Receive Connector. However, when you run Get-ReceiveConnector -Server <servername>, the receive connector does show up. How is this possible?
The Hybrid Configuration Wizard looks at more than just the existence of the connector: it also checks that the connector’s configuration is valid. As part of that, it inspects the bindings on the connector and expects both an IPv4 and an IPv6 binding to be present. So to check whether your existing connector is valid, look at its Bindings property in the output of Get-ReceiveConnector.
In this particular case, the IPv6 bindings were missing because IPv6 had been disabled on the server (which it shouldn’t be!). Re-enabling IPv6 and then either manually adding the binding to the connector or re-creating the connector solved the issue.
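To make the binding check described above concrete, here is a minimal Python sketch. It is not what the HCW actually runs; it only illustrates the idea of verifying that both address families are bound. It assumes the bindings are available as "address:port" strings such as 0.0.0.0:25 and [::]:25, and the function names are made up for this example.

```python
import ipaddress


def binding_families(bindings):
    """Return the set of IP versions (4 and/or 6) found in 'address:port' strings."""
    families = set()
    for binding in bindings:
        # The address is everything before the last colon; the rest is the port.
        address, _, _port = binding.rpartition(":")
        # IPv6 addresses are wrapped in square brackets, e.g. "[::]:25".
        address = address.strip("[]")
        families.add(ipaddress.ip_address(address).version)
    return families


def has_dual_stack_bindings(bindings):
    """True only when at least one IPv4 and one IPv6 binding are present."""
    return binding_families(bindings) == {4, 6}


if __name__ == "__main__":
    # Hypothetical binding lists, not taken from a real server.
    print(has_dual_stack_bindings(["0.0.0.0:25", "[::]:25"]))
    print(has_dual_stack_bindings(["0.0.0.0:25"]))
```

A connector that only reports an IPv4 binding, as in the case above, would fail this kind of check even though the connector itself exists.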
The moral here is that you shouldn’t disable IPv6 on an Exchange 2013 box. What’s more, it’s not supported if you do. I’ve seen companies that still disable IPv6 by default; maybe a remnant from earlier times when disabling IPv6 would actually solve issues instead of creating them. However, times have changed and the IPv6 implementation in Windows is much better now…
As part of running through the “New Migration Batch”-wizard, the remote endpoint (the on-premises Exchange server) is tested for its availability. After running this step, the following error is displayed:
By itself, this error message does not reveal much information as to what might be causing the connection issues. In the background, the wizard actually leverages the “Test-MigrationServerAvailability” cmdlet. If you run this cmdlet yourself, you will get a lot more information:
In this particular case, you’ll see that the issue is caused by a 501 response from the on-premises server. The question is of course: why? We had recently moved a number of mailboxes without encountering the issue. The only thing that had changed between then and now was that we had reconfigured the load balancers in front of Exchange to use Layer 7 instead of Layer 4. So that is why I shifted my attention to the load balancers.
While reproducing the error, I took a look at the “System Message File” log in the KEMP load balancer. This log can be found under Logging Options, System Log Files. Although I didn’t expect to see much here, I saw the following message which drew my attention:
After changing the 100-continue handling setting from its default (RFC Conformant), I could successfully complete the wizard and start a hybrid mailbox move. So the “workaround” was found. But I was wondering, why does the Load Master *think* that the request coming from Microsoft is non-RFC compliant?
The first thing I did was ask Microsoft if they could clarify what was happening. I soon got a reply that – from Microsoft’s point of view – they were respecting the RFC documentation regarding the 100 (Continue) Status. No surprise here.
After reading the RFC specifications, I decided to take some network traces to find out what was happening and maybe understand how the 501 response was triggered. The first trace I took was one from the Load Master itself. In that trace, I could actually see the following:
Effectively, Office 365 was making a call to the Exchange Web Services with an Expect: 100-continue header. As described in the RFC, the on-premises Exchange server should now respond appropriately to the 100-continue request. Instead, we can see that in the entire SSL conversation, exactly 5 seconds go by, after which Office 365 makes another call to the EWS virtual directory without having received a response to the 100-continue request. At that point, the KEMP Load Master generated the “501 Invalid Request”.
I turned back to the (by the way, excellent) support guys from KEMP and explained my findings to them. Furthermore, when I tested without Layer 7 or even without a Load Master in between, there wasn’t a delay and everything was working as expected. So I knew for sure that the on-premises Exchange 2013 server was actually replying correctly to the 100-continue request. As a matter of fact, without the KEMP LM in between, the entire ‘conversation’ between Office 365 and Exchange 2013 on-premises was perfectly following the RFC rules.
So, changing the 100-continue settings from “RFC Conformant” to “Ignore Continue-100” made sense as now KEMP would just ignore the 100-continue “rules”. But I was still interested in finding out why the LM thought the conversation was not RFC conformant in the first place. And this is where it gets interesting. There is this particular statement in the RFC documentation:
“Because of the presence of older implementations, the protocol allows ambiguous situations in which a client may send “Expect: 100-continue” without receiving either a 417 (Expectation Failed) status or a 100 (Continue) status. Therefore, when a client sends this header field to an origin server (possibly via a proxy) from which it has never seen a 100 (Continue) status, the client SHOULD NOT wait for an indefinite period before sending the request body.”
In fact, that is exactly what was happening here. Office 365 (the client) sent an initial request with Expect: 100-continue and waited for a response to that request. It waits for exactly 5 seconds and then sends the payload, regardless of whether it has received a response. In my opinion, this falls within the boundaries of the scenario described above. However, talking to the KEMP guys, there seems to be a slightly different interpretation of the RFC, which caused this mismatch and therefore led the KEMP to issue the 501.
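The client behaviour described above (wait a bounded time for an interim response, then send the body anyway) can be sketched as a small decision function in Python. This is only an illustration of the RFC passage, not how Office 365 is implemented. The function name and return strings are made up for this example, and the 5-second default simply mirrors the delay I saw in the trace.

```python
def next_client_action(interim_status, seconds_waited, timeout_seconds=5.0):
    """Decide what a client that sent 'Expect: 100-continue' does next.

    interim_status is the status code received so far, or None if nothing
    has arrived yet. The 5 second default is an assumption based on the
    observed trace, not a value mandated by the RFC.
    """
    if interim_status == 100:
        # The server (or a proxy in between) said "go ahead".
        return "send body"
    if interim_status == 417:
        # The expectation was rejected, so the body is not sent as-is.
        return "do not send body"
    if interim_status is not None:
        # Any other status is a final response to the request headers.
        return "handle final response"
    if seconds_waited >= timeout_seconds:
        # Nothing came back: do not wait indefinitely, send the body anyway.
        return "send body"
    return "keep waiting"


if __name__ == "__main__":
    print(next_client_action(None, 2.0))
    print(next_client_action(None, 5.0))
```

In the trace, Office 365 ended up in the “nothing came back in time” branch because the 100 (Continue) never made it through the Load Master.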
In the end, there is still something we haven’t worked out entirely: why the LM doesn’t send the Continue-100 status back to Office 365 even though it receives it almost instantaneously from the Exchange 2013 server.
All in all, the issue was resolved rather quickly and we know that changing the L7 configuration settings in the Load Master solves the issue (and this workaround was also confirmed as being the final solution by KEMP support, btw). Again, changing the 100-continue handling setting to “Ignore” doesn’t render the configuration (or the communication between Office 365 and Exchange on-premises) non-RFC compliant. So there’s no harm in changing it.
There are multiple ways to set up a highly available ADFS server farm. One possibility is to install multiple federation servers using the default Windows Internal Database. In that case, the first federation server is designated as being the ‘primary’ federation server. Every subsequent federation server that is added to the farm will be a ‘secondary’ federation server.
These secondary federation servers periodically poll the primary federation server for configuration changes and replicate these changes across. By default, this happens every 5 minutes.
This scenario is especially useful if you do not have a SQL server available or if you cannot make your SQL server highly available but still want to increase resiliency for your federation server farm.
Note that when using the Windows Internal Database instead of SQL, you are limited to a maximum of 5 federation servers in a farm.
If you want more information, read my previous article on the implications of a database choice in ADFS:
When installing a secondary federation server, you might see the following error in the AD FS 2.0 Application Event Log when the server tries to contact the primary federation server to replicate the configuration database:
EventID: 344
Source: AD FS 2.0
There was an error doing synchronization. Synchronization of data from the primary federation server to a secondary federation server did not occur.
Exception details:
System.IO.InvalidDataException: ADMIN0023: Incorrect value for property LastPublishedPolicyCheckTime: 12/31/1899 11:00:00 PM.
   at Microsoft.IdentityServer.PolicyModel.PropertyTypes.DateTimeProperty.Validate(Object context)
   at Microsoft.IdentityServer.PolicyModel.PropertyTypes.PropertySet.ValidateProperties(Object context)
   at Microsoft.IdentityServer.PolicyModel.Client.ClientObject.GetData()
   at Microsoft.IdentityServer.PolicyModel.Client.ClientObject.OnReadFromStore()
   at Microsoft.IdentityServer.PolicyModel.Client.SearchResult..ctor(SearchResultData data, PropertyFactoryBase factory)
   at Microsoft.IdentityServer.Service.Synchronization.SyncAdministrationManager.DoSyncForItems(List`1 itemsToSync)
   at Microsoft.IdentityServer.Service.Synchronization.SyncAdministrationManager.Sync(Boolean syncAll)
   at Microsoft.IdentityServer.Service.Synchronization.SyncAdministrationManager.Sync()
   at Microsoft.IdentityServer.Service.Policy.PolicyServer.Service.SqlPolicyStoreService.DoSyncDirect()
   at Microsoft.IdentityServer.Service.Synchronization.SyncBackgroundTask.Run(Object context)
User Action: Make sure the primary federation server is available or the service account identity of this machine matches the service account identity of the primary federation server.
In this specific case, the customer decided to geographically spread the different AD FS servers to increase the (site) resiliency of their federation server farm. However, this particular secondary federation server was located in a different time zone than the primary federation server. It seems that AD FS cannot handle the time zone difference by itself (unlike, for example, Active Directory, which converts times back to UTC).
After changing the time zone on the secondary AD FS server to match the time zone of the primary AD FS server, replication started working.
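One way to picture how a time zone difference could produce a value like the one in the error is the following Python sketch. To be clear, this is my own guess at the mechanism, not something confirmed by Microsoft. It assumes a hypothetical “minimum” date of 1 January 1900 at midnight and a one-hour offset between the two servers, and the function names are invented for this example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lower bound for a valid timestamp (an assumption for illustration).
ASSUMED_MINIMUM = datetime(1900, 1, 1, 0, 0, 0)


def reinterpret(local_value, source_offset_hours, target_offset_hours):
    """Express a wall-clock value written at one UTC offset at another offset."""
    source = timezone(timedelta(hours=source_offset_hours))
    target = timezone(timedelta(hours=target_offset_hours))
    converted = local_value.replace(tzinfo=source).astimezone(target)
    # Drop the offset again so the result can be compared with the naive minimum.
    return converted.replace(tzinfo=None)


def is_valid(value):
    """A value below the assumed minimum is rejected."""
    return value >= ASSUMED_MINIMUM


if __name__ == "__main__":
    # Minimum written at UTC+1 (assumed), read back on a server at UTC+0 (assumed).
    shifted = reinterpret(ASSUMED_MINIMUM, 1, 0)
    print(shifted)            # 1899-12-31 23:00:00
    print(is_valid(shifted))  # False
```

The shifted value is the same 12/31/1899 11:00:00 PM that shows up in the event, which at least fits the observation that aligning the time zones made the error go away.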
After installing Exchange 2010 SP2 Update Rollup 3, you might see the following error pop up in the event logs: Unhandled Exception “User setting ‘PreferredSite’ is not available.”
Although at first not much information was available, Greg Taylor already said on the 13th of June that the error is known and they were looking into it.
Apparently, Scott Schnoll also mentioned this in his presentation @ TechEd, stating the following:
(I haven’t had the chance to view the recording, but thanks to my colleague Dave for pointing this out!)
It’s good to see that the errors, although presented as critical, do no harm. Unfortunately, the more mailboxes you have, the more errors you’ll have in your event log. From what I can see, there might even be an error every few seconds, possibly flooding your event log.
Microsoft will make an Interim Update available. However, at the time of writing, no fix was available yet.