Gradwell Blog

Understanding VoIP Call Quality

At the moment, some of our customers are experiencing phone calls dropping out whilst using our VoIP platform. If this is you, then we do apologise for the pain. I know that the customer support team is working very hard on gathering information, keeping everyone informed and fixing any localised issues.

We think the problem at the moment is caused by higher than average latency across our internal network, which is causing RTP packets and possibly DNS traffic to be dropped. However, it is not immediately obvious what the cause of that traffic is - and we are currently working very hard to understand that cause. Initially, I would say that network latency today is lower than yesterday.

This post is prompted by a customer's email to me, in which he makes an interesting observation:

...the main problem I have, which is the lack of logging or metrics from our end. When the phones go scatty, it could I guess be the phones, or the natting router, or the ISP, or Gradwell. I'm disinclined to think it is the phones or the natting router, because these don't change day to day but the level of service does. So that leaves the Internet link or Gradwell. But the problem is that when things do go wrong, we never really know why, so things never seem to get fixed - they only sort of get better for a while. The first voip service that when things went wrong would tell me _exactly_ what went wrong so that we could fix it, would get my business forever.

This, I think, really is the holy grail for VoIP providers. PSTN providers don’t have this problem because they lay a wire to the door of their customers. VoIP providers can do this too, by using their own broadband and LLU scenarios (and this is something we’re considering), but ultimately VoIP should be able to work reliably across a wide spectrum of networks, and customers will always want to use their own broadband service.

When it comes to monitoring our core network, we make test calls, we quality score our Asterisk servers and we monitor network latency. However, we don’t have the view from the customer’s end, so when we get a call disconnect it usually just looks like the customer has hung up.

In our experience, most “call drops” occur because the audio stream arrives out of order and cannot be successfully put back together. This leads to the user at either end hanging up the phone and considering the call “dropped”. We could probably understand a bit more about call flows by matching the RTP stream and packet sequence against the SIP messages that control the call.
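To make the "packet sequence" idea concrete, here is a minimal Python sketch of how reordering and loss could be counted from the 16-bit sequence number that every RTP packet carries in its fixed header. The function names (rtp_sequence, summarise_stream) are made up for illustration, and this is not how our own monitoring is implemented; it only assumes the standard 12-byte RTP version 2 header.

```python
import struct


def rtp_sequence(packet: bytes) -> int:
    """Return the 16-bit sequence number from an RTP packet's fixed header."""
    if len(packet) < 12:
        raise ValueError("an RTP fixed header is 12 bytes long")
    # Fixed header layout: V/P/X/CC, M/PT, sequence, timestamp, SSRC.
    first, _marker_pt, seq, _timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return seq


def summarise_stream(seqs):
    """Count late (reordered) and still-missing packets.

    `seqs` is the list of sequence numbers in the order they arrived.
    Sequence numbers wrap at 65536, so all comparisons are modulo 2**16.
    """
    highest = None
    reordered = 0
    missing = 0
    for seq in seqs:
        if highest is None:
            highest = seq
            continue
        delta = (seq - highest) & 0xFFFF
        if delta == 0:
            continue  # duplicate of the newest packet, ignore it
        if delta < 0x8000:
            # Packet is ahead of anything seen so far; skipped numbers are gaps.
            missing += delta - 1
            highest = seq
        else:
            # Packet is older than the newest one: it arrived late.
            reordered += 1
            missing = max(0, missing - 1)  # a late packet fills one gap
    return {"reordered": reordered, "missing": missing}
```

For example, summarise_stream([65534, 65535, 0, 2, 1, 5]) reports one reordered packet (1 arrived after 2) and two missing packets (3 and 4), and it handles the wrap from 65535 to 0 without counting it as loss. Lining counts like these up against the SIP INVITE and BYE timestamps for the same call is the sort of matching described above.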

Perhaps what we need is a customised ADSL router that understands SIP traffic, can spot when a call is “dropped”, and feeds that back to us in real time. We might be able to build something like that using DD-WRT, the Linux-based firmware for Linksys routers, and then build up a real-time view of what is causing the problems.
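As a rough illustration of what "spotting a dropped call" might mean on such a router, the Python sketch below flags a call when the inbound audio went quiet for a while before the SIP BYE was seen, which suggests the user hung up because of the audio rather than because the conversation was over. Everything here is hypothetical: the CallRecord structure, the looks_dropped name and the two-second threshold are assumptions for the sake of the example, not something we have built, and a real device would also need to allow for things like silence suppression.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRecord:
    """What a SIP-aware router might remember about one call (hypothetical)."""
    call_id: str               # value of the SIP Call-ID header
    rtp_arrivals: List[float]  # arrival times of inbound RTP packets, in seconds
    bye_time: float            # time the SIP BYE for this call was seen


def looks_dropped(call: CallRecord, max_silence: float = 2.0) -> bool:
    """Guess whether the hang-up followed a loss of inbound audio.

    The threshold is an assumed value; it would need tuning against real calls.
    """
    if not call.rtp_arrivals:
        return True  # the call was signalled but no audio ever arrived
    last_audio = max(call.rtp_arrivals)
    return call.bye_time - last_audio > max_silence
```

A call whose last audio packet arrived at 0.04s but whose BYE came at 5.0s would be flagged, while one that ended at 0.5s would not. The router could then report flagged Call-IDs back to the provider so they can be compared with the server-side view of the same call.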

I’d be interested in hearing from anyone else who has been thinking about this problem.

Update: Weds 17th: General feedback and latency monitoring shows that we have reduced our network latency to some quite sensible levels (these graphs don’t show much historical data, because we broke the old ones upgrading to some better monitoring software).
