When a customer reports a poor wireless VoIP experience and you have already ruled out the low-hanging fruit and best practices, e.g. roaming optimization, verification of a clean RF environment, and other client configuration, where do you go next?
Audio streams consist of a steady delivery of small frames (typically 200-250 bytes). The average delay between consecutive audio frames is referred to here as latency. The variability of that latency over time is known as jitter. Desired latency is dictated by the audio codec being used. For example, the G.711 codec typically sends an audio frame every 20ms and thus has a desired latency of 20ms.
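To make the two definitions concrete, here is a minimal Python sketch that computes them from a list of frame arrival times. The function name `interarrival_stats` and the sample timestamps are made up for illustration, and the jitter figure is a plain mean absolute deviation from the expected 20ms gap, not the smoothed interarrival jitter estimator that RTP itself defines.

```python
from statistics import mean


def interarrival_stats(arrival_times_s, expected_interval_s=0.020):
    """Return (average gap, mean absolute deviation from the expected gap), in seconds.

    arrival_times_s: frame arrival times in seconds, in capture order.
    expected_interval_s: the codec's desired latency (20ms assumed, as for G.711).
    """
    if len(arrival_times_s) < 2:
        raise ValueError("need at least two frames to measure a gap")
    # Gap between each frame and the one before it.
    gaps = [later - earlier
            for earlier, later in zip(arrival_times_s, arrival_times_s[1:])]
    avg_gap = mean(gaps)
    # Simplified jitter: how far, on average, each gap strays from the expected one.
    jitter = mean(abs(gap - expected_interval_s) for gap in gaps)
    return avg_gap, jitter


# Hypothetical arrival times for six frames of one stream.
times = [0.000, 0.020, 0.041, 0.060, 0.085, 0.100]
avg_gap, jitter = interarrival_stats(times)
print(f"average gap: {avg_gap * 1000:.1f} ms, jitter: {jitter * 1000:.1f} ms")
```

Note that the average gap can sit right at 20ms while individual gaps still wander, which is why jitter has to be looked at separately.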
Once the desired latency is known, it is relatively easy to plot an audio stream as a function of time using Wireshark.
- Use a display filter to isolate audio frames in one direction
- Choose ‘Statistics’ -> ‘I/O graphs’ and plot that data
- The key to the best visualization is to set an appropriate interval to get a good sense of the flow of frames.
- The interval should be set to a multiple of the desired latency of the codec being evaluated and should be less than 10x that latency.
- The default interval used by Wireshark is 1s, which produces a graph too coarse to glean much information from.
G.711 has a desired latency of 20ms, so I would set the interval to 100ms and expect to see approximately 5 frames per data point on the graph. We want the number of frames per data point to be low, e.g. no more than 10 to 15, for the best visual representation of the stream.
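The effect of the interval can be sketched in a few lines of Python. This is not how Wireshark is implemented, only an imitation of what a frames-per-interval graph counts. The function name `frames_per_interval` and the synthetic streams are assumptions for illustration, and arrival times are given in whole milliseconds to keep the bucket arithmetic exact.

```python
from collections import Counter


def frames_per_interval(arrival_times_ms, interval_ms=100):
    """Count frames in each interval-wide bucket, including empty buckets.

    arrival_times_ms: frame arrival times as integer milliseconds.
    """
    if not arrival_times_ms:
        return []
    counts = Counter(t // interval_ms for t in arrival_times_ms)
    last_bucket = max(counts)
    # A Counter returns 0 for buckets that received no frames.
    return [counts[bucket] for bucket in range(last_bucket + 1)]


# One second of a steady 20ms stream (hypothetical data): 50 frames.
steady = [i * 20 for i in range(50)]
# The same stream with every frame between 200ms and 300ms missing.
stalled = [t for t in steady if not 200 <= t < 300]

print(frames_per_interval(steady, 100))    # 5 frames in every bucket
print(frames_per_interval(stalled, 100))   # one bucket drops to 0
print(frames_per_interval(stalled, 1000))  # a single bucket of 45 frames
```

At 100ms the stall stands out as an empty data point. At the 1s default the same stall only shows as 45 frames instead of 50, which is easy to miss on a graph.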

Now that we know the basics of graphing an audio stream using Wireshark we can return to troubleshooting. There are two possible sources of “bad” RTP streams.
- The audio stream has increased latency and jitter. This shows up in an I/O graph built as described above.
- The audio was picked up poorly at the microphone (the source). In this case the audio stream can look good in an I/O graph, but when the stream is played back you can hear the sound cutting in and out.
Always start as close to the source as possible. Capture packets there to investigate the latency and jitter of the audio stream, and play back the audio to gauge how well the microphone picked up the source audio. Use over-the-air captures rather than AP captures or WLC data plane captures, as we want to see the untouched audio stream as it was originally transmitted by the wireless device.
By following an audio stream from the egress of one device to the ingress of the next device in the path, you can isolate where jitter or unwanted noise enters the stream. From there you can focus troubleshooting on that particular host or network device.
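As a rough sketch of that hop-by-hop comparison, suppose you have already measured jitter for the same stream at each capture point along the path. The capture point names, the jitter values, and the 5ms threshold below are all made up for illustration; pick a threshold that suits the codec and deployment.

```python
def first_degraded_segment(capture_points, threshold_ms=5.0):
    """Find the first segment of the path where jitter exceeds the threshold.

    capture_points: list of (name, jitter_ms) tuples in path order, source first.
    Returns (last_good_point, first_bad_point), or None if no point exceeds the
    threshold. last_good_point is None when the very first capture is already bad.
    """
    previous = None
    for name, jitter_ms in capture_points:
        if jitter_ms > threshold_ms:
            return previous, name
        previous = name
    return None


# Hypothetical measurements of one stream, ordered from source to destination.
path = [
    ("phone egress (over-the-air)", 1.2),
    ("AP wired egress", 1.5),
    ("core switch egress", 9.8),
    ("far-end ingress", 10.1),
]
print(first_degraded_segment(path))
```

With these sample numbers the jitter appears between the AP's wired side and the core switch, so that segment is where to look next.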