Monday, March 30, 2015

KEMP LoadMaster and Exchange 2013: Check your Server Check configuration

If you deploy a KEMP LoadMaster to load balance Exchange (which you should!), you may see unusual behavior where the LoadMaster incorrectly treats a service failure as a server failure. First I'll walk through a very typical configuration to demonstrate the issue; after that I'll explain how to fix it.

I won't go into the details of deploying the LoadMaster or Exchange 2013. If you're reading this article, I assume you have a good understanding of Exchange 2013 load balancing and the basics of working with the KEMP LoadMaster.

I start my configuration by downloading the latest Exchange 2013 Templates from the KEMP Technologies website. In this example I used Core services: MAPI, SMTP and Unified HTTP/HTTPS because I'm not going to enable ESP for this service.

Next, a new Virtual Service is created with the Exchange 2013 HTTPS Reencrypted template.

image

The next step is to assign an SSL certificate to the Virtual Service:

image

Then add the Real Servers to all nine sub-Virtual Services:

image

The result is a nice and healthy Virtual Service:

image

So far so good? Well, almost… Let's see what happens when one of our Real Servers encounters an issue. To do so, we simulate an unhealthy OWA, so that Managed Availability no longer returns a 200 OK when the /owa/healthcheck.htm URL is queried.

Set-ServerComponentState ex01 -State inactive -Component owaproxy -Requester healthapi

image
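To see what a health check against each service returns, you can query the healthcheck.htm URLs yourself. The following is a minimal Python sketch, not anything taken from KEMP or Microsoft. The list of nine protocol paths is an assumption based on the standard Exchange 2013 virtual directories, the helper names are made up for illustration, and certificate validation is switched off on the assumption that this runs against a lab server with a self-signed certificate.

```python
import ssl
import urllib.error
import urllib.request

# Assumption: the nine Exchange 2013 protocol paths that each expose a
# healthcheck.htm page. Adjust the list to match your own sub-Virtual Services.
PROTOCOL_PATHS = [
    "owa", "ecp", "ews", "Microsoft-Server-ActiveSync", "oab",
    "rpc", "mapi", "autodiscover", "powershell",
]


def healthcheck_url(server, protocol):
    """Build the health check URL for one protocol on one server."""
    return f"https://{server}/{protocol}/healthcheck.htm"


def probe(server, protocol, timeout=5.0):
    """Return the HTTP status code, or None if the server could not be reached."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE  # lab use only: skips certificate validation
    url = healthcheck_url(server, protocol)
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # for example 503 when a component is inactive
    except (urllib.error.URLError, OSError):
        return None


def is_healthy(status):
    """Only a 200 response counts as healthy."""
    return status == 200


def report(server):
    """Print one line per protocol so a single failing service stands out."""
    for protocol in PROTOCOL_PATHS:
        status = probe(server, protocol)
        state = "healthy" if is_healthy(status) else "UNHEALTHY"
        print(f"{protocol:30} {status!s:>5} {state}")
```

Calling report("ex01") after the command above should, if everything else on the server is fine, flag only owa as unhealthy.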

Now if we check the health of the Virtual Service in the KEMP WUI, we expect it to report an unhealthy Real Server for the OWA sub-Virtual Service only. Instead it displays a failed RS for all services:

image

The Warning Log shows an endless series of these messages:

Mar 30 13:56:04 lb100 l4d: Removing RS 192.168.200.182:443 from VS 192.168.200.200:443(E2013HTTPS) - EOF or Incorrect data received
Mar 30 13:56:04 lb100 last message repeated 5 times
Mar 30 13:56:13 lb100 l4d: Adding RS 192.168.200.182:443 to VS 192.168.200.200:443(E2013HTTPS)
Mar 30 13:56:13 lb100 last message repeated 5 times
Mar 30 13:56:13 lb100 l4d: Removing RS 192.168.200.182:443 from VS 192.168.200.200:443(E2013HTTPS) - EOF or Incorrect data received
Mar 30 13:56:13 lb100 last message repeated 5 times
Mar 30 13:56:22 lb100 l4d: Adding RS 192.168.200.182:443 to VS 192.168.200.200:443(E2013HTTPS)
Mar 30 13:56:22 lb100 last message repeated 5 times
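
If you want to quantify this flapping from an exported log, a short script can do the counting. This is a rough Python sketch that only understands the two message shapes shown above. The function name is made up for illustration, and it assumes that "last message repeated N times" means N additional occurrences of the preceding message, as is the usual syslog convention.

```python
import re
from collections import Counter

# Matches the "Removing RS ... from VS ..." and "Adding RS ... to VS ..." lines.
EVENT_RE = re.compile(
    r"l4d: (?P<action>Removing|Adding) RS (?P<rs>\S+) "
    r"(?:from|to) VS (?P<vs>\S+?)\((?P<name>[^)]*)\)"
)
# Matches the syslog "last message repeated N times" summary line.
REPEAT_RE = re.compile(r"last message repeated (?P<count>\d+) times")


def count_events(lines):
    """Count add/remove events per (action, real server, VS name)."""
    counts = Counter()
    previous_key = None
    for line in lines:
        event = EVENT_RE.search(line)
        if event:
            previous_key = (event["action"], event["rs"], event["name"])
            counts[previous_key] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and previous_key is not None:
            # Assumption: N repeats are in addition to the original message.
            counts[previous_key] += int(repeat["count"])
    return counts
```

Under that assumption, the eight log lines above add up to 12 removals and 12 additions of 192.168.200.182:443 in 18 seconds.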

image

Apparently the LoadMaster considers the entire RS unavailable and removes it from the VS. Typically the Drop Connections on RS failure feature is enabled, because that is what you want when load balancing Exchange. The result is that your Outlook users are disconnected and forced to reconnect every time the LoadMaster removes 'their' RS from the VS. Especially for Outlook in online mode, this results in helpdesk calls and unhappy users.


I worked with KEMP Support to troubleshoot this unexpected behavior and the root cause was found pretty fast. The Real Server Check set up by the template uses HTTP/1.1 to query the healthcheck.htm URL, as can be seen here:

image

HTTP/1.1 is a bit more efficient than the LoadMaster default of HTTP/1.0 because it can send multiple requests over one connection. Unfortunately this breaks our per-service health checks: the LoadMaster is no longer able to detect which sub-Virtual Service is the unhealthy one, and as a result the entire RS is removed from the service.
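
For readers less familiar with the two protocol versions, the difference at the request level is small. The Python sketch below only illustrates what a generic health check request looks like in each version. It does not reproduce the actual bytes the LoadMaster sends, and the function name and host name are made up for illustration.

```python
def build_health_request(path, host, use_http11):
    """Return the raw text of a GET request for a health check URL."""
    if use_http11:
        # HTTP/1.1 requires a Host header, and the connection stays open
        # by default so that further requests can reuse it.
        request_lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    else:
        # HTTP/1.0 needs no Host header, and by default the server closes
        # the connection after sending the response.
        request_lines = [f"GET {path} HTTP/1.0"]
    # A blank line terminates the request headers.
    return "\r\n".join(request_lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_health_request("/owa/healthcheck.htm", "mail.example.com", use_http11=False))
    print(build_health_request("/owa/healthcheck.htm", "mail.example.com", use_http11=True))
```

With HTTP/1.0, every check is a separate connection that ends after one response, which matches the one-check-per-service behavior we want here.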


My recommendation is to disable the Use HTTP/1.1 option on all sub-Virtual Services to restore normal behavior.


KEMP Support, as always, was great in assisting us with this issue. I filed a Feature Request asking them to update the Exchange templates so that the Use HTTP/1.1 checkbox is cleared by default.

4 comments:

Anonymous said...

Nice work and great information, thanks so much for sharing.

Unknown said...

Thanks for the information! Helped us out

Anonymous said...

Thanks! :-)

Anonymous said...

Great article. Helped me out, thanks!