Wednesday, 24 July 2013

Lync Trunk Configuration

I've been wanting to post this for a while.  Some time ago I was involved in a support case that took a long time to pinpoint and eventually resolve.  It was an intermittent issue, which I hate.  I used my usual troubleshooting method and asked for specific examples with dates and time stamps, as well as the caller and callee.  Then I got stuck into the Lync Monitoring Server logs, delving deep into the guts of each call.  The most frustrating thing was that Lync reported no failure, expected or unexpected, for any of the calls.

The issue reported was that calls which were put on hold seemed to disappear and couldn't be retrieved.  The strange thing was that the person who had been put on hold was still on hold.

The Lync solution was pretty standard except for the addition of an Asterisk FreePBX server, which was providing hold music.

Before you say anything, I know this is not a "supported" solution.  But what do you do when there is no other way of providing consistent hold music and the customer really wants music on hold?  I also know there are better solutions available now: music on hold embedded in the Aries handsets, and gateways with dedicated storage for music on hold and even MOH ports.  This was an early project and those solutions didn't exist.

Anyway, the only common characteristic I found across these calls (and, as it turns out, a lot more calls that weren't reported as problematic) was this error:
"Call terminated on mid-call media failure where both endpoints are internal"
Lync insisted that it was always the outside party that ended the call, yet the error said that the call terminated where both endpoints were internal.  The outside party, meanwhile, could still hear hold music and so assumed they were still on hold.  This is where the penny dropped.  Lync was dropping the calls because, as far as it was concerned, it no longer had control of them.

Why was this happening?  What you should know is that the Asterisk server is connected to Lync via a trunk.  There are a couple of configurable attributes on a trunk that were key to resolving the issue.

Figure 1

RTCPActiveCalls - This parameter determines whether RTCP packets are sent from the PSTN gateway, IP-PBX, or SBC at the service provider for active calls. An active call in this context is a call where media is allowed to flow in at least one direction. If RTCPActiveCalls is set to True, the Mediation Server or Lync Server client can terminate a call if it does not receive RTCP packets for a period exceeding 30 seconds.
Note that disabling the checks for received RTCP media for active calls in Lync Server elements removes an important safeguard for detecting a dropped peer and should be done only if necessary.
Default setting: True
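To make the 30-second rule concrete, here is a minimal Python sketch of the kind of inactivity check this description implies.  It is a model, not Lync's code: the CallMonitor class and its fields are hypothetical, and the only detail taken from the description is the 30-second threshold.

```python
from dataclasses import dataclass
from typing import Optional

RTCP_TIMEOUT_SECONDS = 30.0  # threshold quoted in the parameter description


@dataclass
class CallMonitor:
    """Hypothetical model of an RTCP inactivity check on one call leg."""
    rtcp_check_enabled: bool = True       # stands in for RTCPActiveCalls
    call_started_at: float = 0.0          # seconds, on any monotonic clock
    last_rtcp_at: Optional[float] = None  # time the last RTCP report arrived

    def on_rtcp_packet(self, now: float) -> None:
        # Every RTCP report from the peer resets the inactivity clock.
        self.last_rtcp_at = now

    def should_terminate(self, now: float) -> bool:
        if not self.rtcp_check_enabled:
            return False  # check disabled: silence from the peer is tolerated
        last_seen = self.last_rtcp_at if self.last_rtcp_at is not None else self.call_started_at
        return now - last_seen > RTCP_TIMEOUT_SECONDS


monitor = CallMonitor(call_started_at=0.0)
monitor.on_rtcp_packet(5.0)
print(monitor.should_terminate(30.0))  # False: 25 s since the last report
print(monitor.should_terminate(40.0))  # True: 35 s of silence
print(CallMonitor(rtcp_check_enabled=False).should_terminate(1000.0))  # False
```

With the check switched off, a peer that has genuinely gone silent is never detected this way.  That is exactly the safeguard the note above warns about.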

RTCPCallsOnHold - This parameter determines whether RTCP packets continue to be sent across the trunk for calls that have been placed on hold and no media packets are expected to flow in either direction. If Music on Hold is enabled at either the Lync Server client or the trunk, the call will be considered to be active and this property will be ignored. In these circumstances use the RTCPActiveCalls parameter.
Note that disabling the checks for received RTCP media for active calls in Lync Server elements removes an important safeguard for detecting a dropped peer and should be done only if necessary.
Default setting: True
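The interplay between these two parameters is easy to get wrong, so here is a small hypothetical Python model of which setting governs a call.  It is based only on the two descriptions above.  TrunkSettings and expects_rtcp are illustrative names, not part of any Lync API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrunkSettings:
    """Hypothetical stand-in for the two trunk parameters (both default to True)."""
    rtcp_active_calls: bool = True
    rtcp_calls_on_hold: bool = True


def expects_rtcp(settings: TrunkSettings, on_hold: bool,
                 moh_at_client: bool, moh_at_trunk: bool) -> bool:
    """Return True if the RTCP check applies to a call in this state."""
    # With music on hold at either end, media is still flowing, so the call
    # counts as active and RTCPCallsOnHold is ignored.
    moh_playing = moh_at_client or moh_at_trunk
    if on_hold and not moh_playing:
        return settings.rtcp_calls_on_hold
    return settings.rtcp_active_calls


# A held call with music supplied on the trunk side.
defaults = TrunkSettings()
hold_only = TrunkSettings(rtcp_calls_on_hold=False)
both_off = TrunkSettings(rtcp_active_calls=False, rtcp_calls_on_hold=False)

print(expects_rtcp(defaults, on_hold=True, moh_at_client=False, moh_at_trunk=True))   # True
print(expects_rtcp(hold_only, on_hold=True, moh_at_client=False, moh_at_trunk=True))  # True
print(expects_rtcp(both_off, on_hold=True, moh_at_client=False, moh_at_trunk=True))   # False
```

This assumes the music coming from the Asterisk box counts as music on hold at the trunk.  Under that assumption, disabling RTCPCallsOnHold alone would not stop the check, because a held call with music playing is treated as active and RTCPActiveCalls still applies.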

Why were these relevant?  Put simply, Lync thought the calls had ended because it wasn't receiving RTCP (RTP Control Protocol) packets.  RTCP carries statistics about a media connection, such as transmitted octet and packet counts, packet loss, jitter and round-trip delay.  Lync and other systems use this information to manage quality of service, perhaps by limiting flow or switching to a different codec.  Because Lync wasn't receiving these packets, it simply tore down the call.
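For anyone curious about what is actually in those packets, here is a minimal Python sketch that decodes an RTCP Sender Report using the layout in RFC 3550 (section 6.4.1).  It handles a single SR packet only, ignoring compound packets, padding and SDES.  The packet built at the bottom is synthetic test data, not a capture from this case.

```python
import struct
from typing import List, NamedTuple

RTCP_SR = 200  # packet type for a Sender Report (RFC 3550)


class ReportBlock(NamedTuple):
    ssrc: int             # source this block reports on
    fraction_lost: int    # fraction lost since the last report, in 1/256ths
    cumulative_lost: int  # total packets lost (signed 24-bit)
    highest_seq: int      # extended highest sequence number received
    jitter: int           # interarrival jitter, in RTP timestamp units
    lsr: int              # middle 32 bits of the NTP time of the last SR received
    dlsr: int             # delay since the last SR, in 1/65536 s


class SenderReport(NamedTuple):
    ssrc: int
    ntp_timestamp: int
    rtp_timestamp: int
    packet_count: int     # sender's packet count
    octet_count: int      # sender's octet count
    blocks: List[ReportBlock]


def parse_sender_report(data: bytes) -> SenderReport:
    if len(data) < 28:
        raise ValueError("too short for an RTCP Sender Report")
    first, packet_type, _length = struct.unpack("!BBH", data[:4])
    version, report_count = first >> 6, first & 0x1F
    if version != 2 or packet_type != RTCP_SR:
        raise ValueError("not an RTCP version 2 Sender Report")
    ssrc, ntp_msw, ntp_lsw, rtp_ts, packets, octets = struct.unpack("!6I", data[4:28])
    blocks = []
    offset = 28
    for _ in range(report_count):
        chunk = data[offset:offset + 24]
        if len(chunk) < 24:
            raise ValueError("truncated report block")
        src, loss_word, seq, jitter, lsr, dlsr = struct.unpack("!6I", chunk)
        cumulative = loss_word & 0xFFFFFF
        if cumulative & 0x800000:  # sign-extend the 24-bit field
            cumulative -= 0x1000000
        blocks.append(ReportBlock(src, loss_word >> 24, cumulative, seq, jitter, lsr, dlsr))
        offset += 24
    return SenderReport(ssrc, (ntp_msw << 32) | ntp_lsw, rtp_ts, packets, octets, blocks)


# Synthetic SR with one report block: V=2, RC=1 gives first byte 0x81; length is 12 words.
packet = (struct.pack("!BBH", 0x81, RTCP_SR, 12)
          + struct.pack("!6I", 0x1234, 1, 2, 160, 50, 8000)
          + struct.pack("!6I", 0xABCD, (25 << 24) | 3, 500, 40, 0, 0))
report = parse_sender_report(packet)
print(report.packet_count, report.octet_count)                  # 50 8000
print(report.blocks[0].fraction_lost, report.blocks[0].jitter)  # 25 40
```

Round-trip delay isn't carried directly.  Per RFC 3550, the original sender works it out as its arrival time minus LSR minus DLSR, all in 1/65536 second units.  No SR or RR arriving means none of these figures get updated.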

Setting these parameters to False stops Lync from checking for these packets.

In an elevated Lync Server Management Shell, enter the following:
Set-CsTrunkConfiguration -Identity "trunk name" -RTCPActiveCalls $False -RTCPCallsOnHold $False
You'll get the following warning when you set the configuration:

WARNING: When RTCP active calls or RTCP calls on hold is false, it is recommended that you enable session timer to periodically verify that the call is still active.
So, as well as the above, you need to enable the session timer:
Set-CsTrunkConfiguration -Identity "trunk name" -EnableSessionTimer $True
The strange thing is that all three settings essentially check whether a call is still active.  The session timer does it by periodically sending a session refresh across the trunk and waiting for a reply, and I've already explained what the other two do.  The point is that all three settings have the same goal, and here we have disabled two of them and enabled the third.  Microsoft say in the parameter descriptions that disabling these checks is a bad thing and should be done only if necessary.  But I wouldn't have to change these settings if the gateway didn't stop sending RTCP packets in the first place.  At least there is a setting to change to fix it.
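The session timer works at the SIP signalling layer rather than on the media path, which is why it still works when a held call carries no RTCP.  Here is a minimal Python sketch of the timing rule, assuming the trunk session timer follows RFC 4028 (SIP session timers).  It is simplified: the refresher sends a refresh at half the negotiated interval, and a session with no answered refresh within the interval is treated as dead.  RFC 4028 actually has the BYE sent slightly before expiry.  The SessionTimer class and the 1800-second interval are illustrative assumptions, not the values or exact behaviour Lync uses.

```python
from dataclasses import dataclass


@dataclass
class SessionTimer:
    """Hypothetical model of the refresher side of an RFC 4028 session timer."""
    session_expires: float        # negotiated session interval, in seconds
    last_refresh_at: float = 0.0  # time of the last answered refresh (or call setup)

    def refresh_due(self, now: float) -> bool:
        # RFC 4028: the refresher should send a refresh at half the interval.
        return now - self.last_refresh_at >= self.session_expires / 2

    def on_refresh_answered(self, now: float) -> None:
        # A 2xx answer to the re-INVITE or UPDATE restarts the interval.
        self.last_refresh_at = now

    def expired(self, now: float) -> bool:
        # Simplified: no answered refresh within the whole interval means the
        # peer is presumed gone and the call can be torn down.
        return now - self.last_refresh_at >= self.session_expires


timer = SessionTimer(session_expires=1800)
print(timer.refresh_due(899), timer.refresh_due(900))  # False True
timer.on_refresh_answered(900)
print(timer.expired(2699), timer.expired(2700))        # False True
```

Because the refresh is a signalling exchange, a held call that is silent on the media path still proves it is alive, provided the far end answers the refresh.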

As soon as I changed these parameters the errors stopped, and so did the dropped calls.  We have since had to make the same change on the trunk to the PSTN gateway, for the same reason.

It also turns out that I am not alone in having this issue.  You'll probably find more than a few forums mentioning the same thing.  As I said, I have been wanting to post this for a while.  I really hope it helps you if you encounter the issue, and also helps you understand why it happens in the first place.  I also hope that gateway and SBC manufacturers and SIP providers take note and ensure their products consistently send RTCP packets.

Thanks for reading.