An important task when configuring VoIP systems is to ensure sufficient voice quality. Two factors significantly affect the voice quality of a VoIP connection: the delay of the voice signal on its way from the sender to the receiver, and the loss or late arrival of data packets. The International Telecommunication Union (ITU) has extensively tested what people perceive as sufficient voice quality and has published the results in recommendation ITU-T G.114.
A telephone connection is perceived as normal quality with a delay of no more than 100 ms and a packet loss of less than 5%, and still as good quality with a delay of no more than 150 ms and a packet loss of less than 10%. Some listeners still find the quality acceptable with up to 300 ms of delay and 20% packet loss, but this is the limit beyond which the connection is no longer usable for voice transmission.
In addition to the average delay, variations in this delay are also perceptible to the human ear. Variations in the transit time of the speech data from the sender to the receiver (jitter) are tolerable up to 10 ms; larger variations are perceived as irritating.
A VoIP connection should therefore be configured to stay within these limits: packet loss up to 10%, delay up to 150 ms, jitter up to 10 ms.
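As an illustration, the quality bands and configuration limits above can be expressed as a small check. This is a minimal sketch: `LinkMetrics`, `classify_quality` and `within_limits` are hypothetical names, and the bands are taken literally from the values quoted in this section, not from the full text of the ITU recommendation.

```python
from dataclasses import dataclass


@dataclass
class LinkMetrics:
    delay_ms: float   # average one-way delay
    loss_pct: float   # packet loss in percent
    jitter_ms: float  # variation of the one-way delay


def classify_quality(m: LinkMetrics) -> str:
    """Map delay and loss onto the perceived-quality bands quoted above (hypothetical helper)."""
    if m.delay_ms <= 100 and m.loss_pct < 5:
        return "normal"
    if m.delay_ms <= 150 and m.loss_pct < 10:
        return "good"
    if m.delay_ms <= 300 and m.loss_pct <= 20:
        return "acceptable"
    return "unusable"


def within_limits(m: LinkMetrics) -> bool:
    """Configuration target: loss up to 10 %, delay up to 150 ms, jitter up to 10 ms."""
    return m.loss_pct <= 10 and m.delay_ms <= 150 and m.jitter_ms <= 10


if __name__ == "__main__":
    sample = LinkMetrics(delay_ms=120, loss_pct=3, jitter_ms=8)
    print(classify_quality(sample), within_limits(sample))  # good True
```

Jitter is checked only against the configuration target, because the section gives a single jitter threshold rather than graded bands.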
- Jitter can be compensated for by a buffer at the receiver. This jitter buffer caches a number of incoming packets and passes them on for playback at regular intervals, which evens out the fluctuations in transit time between the individual packets.
- The delay is influenced by several components:
  - The fixed component of the delay consists of processing (packet assembly, encoding and compression at the sender and the receiver), serialization (the time for transferring the packet from the application to the interface) and propagation (the time for transmission over the WAN link).
  - The variable component is determined by the jitter or the jitter-buffer setting.
- Apart from general network losses, packet loss is significantly affected by the jitter buffer. Packets that arrive with a greater delay than the jitter buffer can compensate for are dropped, which increases packet loss. The larger the jitter buffer, the lower the loss; however, a larger jitter buffer also increases the overall delay. The jitter buffer should therefore be set as small as possible while still keeping the quality sufficient.
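The trade-off between jitter-buffer size, added delay and packet loss can be illustrated with a simplified model. This sketch assumes a fixed-size buffer whose playout schedule is anchored on the arrival of the first packet; real phones may use adaptive buffers. `late_packets` and the sample delays are hypothetical.

```python
def late_packets(delays_ms: list[float], interval_ms: float, buffer_ms: float) -> int:
    """Count packets that miss their playout slot in a fixed jitter buffer.

    Packet i is sent at i * interval_ms and arrives after delays_ms[i].
    Playout is anchored on the first packet: packet i is played at
    first_arrival + buffer_ms + i * interval_ms. Later arrivals are dropped.
    """
    first_arrival = delays_ms[0]
    dropped = 0
    for i, delay in enumerate(delays_ms):
        arrival = i * interval_ms + delay
        playout = first_arrival + buffer_ms + i * interval_ms
        if arrival > playout:
            dropped += 1
    return dropped


if __name__ == "__main__":
    # Hypothetical one-way network delays (ms) for packets sent every 20 ms
    delays = [50, 52, 58, 49, 65, 51, 55, 70]
    for buffer_ms in (0, 10, 20):
        print(buffer_ms, late_packets(delays, 20, buffer_ms))
    # prints: 0 6 / 10 2 / 20 0 (one pair per line)
```

Each extra millisecond of buffer is added directly to the end-to-end delay, so growing the buffer from 0 to 20 ms here removes all late-packet losses at the price of 20 ms more delay.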
In detail, the delay is determined mainly by the codec used, the resulting packet size, and the available bandwidth:
- The processing time is determined by the codec used. With a sampling interval of 20 ms, the codec generates a new packet exactly every 20 ms. The times for compression and similar steps are usually negligible.
- The time to transfer a packet to the interface (serialization) is the packet size divided by the available bandwidth. The following table lists this time in ms for common bandwidths and packet sizes:
Bandwidth | 1 byte | 64 bytes | 128 bytes | 256 bytes | 512 bytes | 1024 bytes | 1500 bytes |
---|---|---|---|---|---|---|---|
56 kbps | 0.14 | 9 | 18 | 36 | 73 | 146 | 215 |
64 kbps | 0.13 | 8 | 16 | 32 | 64 | 128 | 187 |
128 kbps | 0.06 | 4 | 8 | 16 | 32 | 64 | 93 |
256 kbps | 0.03 | 2 | 4 | 8 | 16 | 32 | 47 |
512 kbps | 0.016 | 1 | 2 | 4 | 8 | 16 | 23 |
768 kbps | 0.010 | 0.6 | 1.3 | 2.6 | 5 | 11 | 16 |
1536 kbps | 0.005 | 0.3 | 0.6 | 1.3 | 3 | 5 | 8 |
- Example: a 512-byte packet of an FTP transfer over a 128 kbps upstream link occupies the line for at least 32 ms, during which no voice packet can be sent.
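The serialization times in the table follow directly from dividing the packet size in bits by the bandwidth. A minimal sketch, assuming 1 kbps = 1000 bit/s (with this assumption the results agree with the table to within rounding); `serialization_ms` is a hypothetical name.

```python
def serialization_ms(packet_bytes: int, bandwidth_kbps: float) -> float:
    """Time in ms needed to clock one packet onto the link.

    Assumes 1 kbps = 1000 bit/s, so bits / (kbit/s) yields milliseconds.
    """
    return packet_bytes * 8 / bandwidth_kbps


if __name__ == "__main__":
    sizes = [1, 64, 128, 256, 512, 1024, 1500]
    for kbps in [56, 64, 128, 256, 512, 768, 1536]:
        row = [round(serialization_ms(size, kbps), 2) for size in sizes]
        print(f"{kbps:>5} kbps: {row}")
    # The FTP example: a 512-byte packet on a 128 kbps upstream link
    print(serialization_ms(512, 128))  # 32.0
```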
In addition, the packets on a VoIP connection often consist of much more than the voice payload itself: there are also IP headers and, if applicable, IPSec headers. The payload is the product of the codec data rate and the codec sampling interval (for example, 64 kbit/s × 20 ms = 1280 bits = 160 bytes for G.711). On top of this, all codecs require 40 bytes for the IP, UDP and RTP headers, and IPSec adds at least 20 bytes (the RTP and IPSec headers can be larger, depending on the configuration).
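The first three size columns of the tables that follow can be reproduced from the codec data rate and sampling interval. This is a minimal sketch under two assumptions: the payload is rounded up to whole bytes (which yields the 24 bytes listed for G.723 at 6.3 kbit/s), and Ethernet/PPPoE framing adds the 22 bytes implied by the tables. All names are hypothetical.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IP + UDP + RTP headers
# Assumption: Ethernet/PPPoE framing adds 22 bytes, as implied by the tables
# (for example a 14-byte Ethernet header plus 8 bytes of PPPoE/PPP header).
ETH_PPPOE_OVERHEAD_BYTES = 22


def voice_payload_bytes(codec_kbps: float, interval_ms: float) -> int:
    """Payload = codec data rate x sampling interval, rounded up to whole bytes."""
    bits = codec_kbps * interval_ms  # kbit/s * ms = bit
    return math.ceil(bits / 8)


def packet_sizes(codec_kbps: float, interval_ms: float) -> tuple[int, int, int]:
    """Return (payload, IP packet, Ethernet/PPPoE frame) in bytes, without IPSec."""
    payload = voice_payload_bytes(codec_kbps, interval_ms)
    ip_packet = payload + IP_UDP_RTP_HEADER_BYTES
    return payload, ip_packet, ip_packet + ETH_PPPOE_OVERHEAD_BYTES


if __name__ == "__main__":
    print(packet_sizes(64, 20))   # G.711 at 20 ms: (160, 200, 222)
    print(packet_sizes(6.3, 30))  # G.723 at 30 ms: (24, 64, 86)
```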
Without IPSec, sampling interval 20 ms:

Codec | Payload (bytes) | IP payload (bytes) | Ethernet / PPPoE (bytes) | ATM net (bit/s) | ATM gross (bit/s) |
---|---|---|---|---|---|
G711-64 | 160 | 200 | 222 | 96000.0 | 106000.0 |
G722-64 | 160 | 200 | 222 | 96000.0 | 106000.0 |
G726-40 | 100 | 140 | 162 | 76800.0 | 84800.0 |
G726-32 | 80 | 120 | 142 | 76800.0 | 84800.0 |
G726-24 | 60 | 100 | 122 | 57600.0 | 63600.0 |
G726-16 | 40 | 80 | 102 | 57600.0 | 63600.0 |
G729-8 | 20 | 60 | 82 | 57600.0 | 63600.0 |

Without IPSec, sampling interval 30 ms:

Codec | Payload (bytes) | IP payload (bytes) | Ethernet / PPPoE (bytes) | ATM net (bit/s) | ATM gross (bit/s) |
---|---|---|---|---|---|
G711-64 | 240 | 280 | 302 | 89600.0 | 98933.3 |
G722-64 | 240 | 280 | 302 | 89600.0 | 98933.3 |
G726-40 | 150 | 190 | 212 | 64000.0 | 70666.7 |
G726-32 | 120 | 160 | 182 | 64000.0 | 70666.7 |
G726-24 | 90 | 130 | 152 | 51200.0 | 56533.3 |
G726-16 | 60 | 100 | 122 | 38400.0 | 42400.0 |
G729-8 | 30 | 70 | 92 | 38400.0 | 42400.0 |
G723-6.3 | 24 | 64 | 86 | 38400.0 | 42400.0 |

With IPSec, sampling interval 20 ms:

Codec | Payload (bytes) | IP payload (bytes) | IPSec payload (bytes) | Ethernet / PPPoE (bytes) | ATM net (bit/s) | ATM gross (bit/s) |
---|---|---|---|---|---|---|
G711-64 | 160 | 200 | 260 | 282 | 134400.0 | 148400.0 |
G722-64 | 160 | 200 | 260 | 282 | 134400.0 | 148400.0 |
G726-40 | 100 | 140 | 200 | 222 | 96000.0 | 106000.0 |
G726-32 | 80 | 120 | 180 | 202 | 96000.0 | 106000.0 |
G726-24 | 60 | 100 | 160 | 182 | 96000.0 | 106000.0 |
G726-16 | 40 | 80 | 140 | 162 | 76800.0 | 84800.0 |
G729-8 | 20 | 60 | 120 | 142 | 76800.0 | 84800.0 |

With IPSec, sampling interval 30 ms:

Codec | Payload (bytes) | IP payload (bytes) | IPSec payload (bytes) | Ethernet / PPPoE (bytes) | ATM net (bit/s) | ATM gross (bit/s) |
---|---|---|---|---|---|---|
G711-64 | 240 | 280 | 340 | 362 | 102400.0 | 113066.7 |
G722-64 | 240 | 280 | 340 | 362 | 102400.0 | 113066.7 |
G726-40 | 150 | 190 | 250 | 272 | 89600.0 | 98933.3 |
G726-32 | 120 | 160 | 220 | 242 | 76800.0 | 84800.0 |
G726-24 | 90 | 130 | 190 | 212 | 64000.0 | 70666.7 |
G726-16 | 60 | 100 | 160 | 182 | 64000.0 | 70666.7 |
G729-8 | 30 | 70 | 130 | 152 | 51200.0 | 56533.3 |
G723-6.3 | 24 | 64 | 124 | 146 | 51200.0 | 56533.3 |

- IP payload: voice payload + 40-byte header (12-byte RTP, 8-byte UDP, 20-byte IP header)
- IPSec payload: IP packet + padding + 2 bytes (pad length and next header), where the padding makes the total a multiple of the IPSec initialization vector length
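The ATM columns can be derived from the Ethernet/PPPoE frame size: each frame is split into 53-byte cells carrying 48 bytes of payload each, and one frame is sent per sampling interval. This is a sketch under an assumption the text does not state: a per-frame ATM encapsulation overhead of 18 bytes (for example an 8-byte AAL5 trailer plus a 10-byte LLC/SNAP bridging header). With that value the hypothetical `atm_rates` function matches every ATM figure in the tables, but the actual encapsulation may differ.

```python
import math

ATM_CELL_BYTES = 53     # cell size on the wire (gross)
ATM_PAYLOAD_BYTES = 48  # payload bytes per cell (net)
# Assumption: 18 bytes of per-frame encapsulation overhead, e.g. an 8-byte
# AAL5 trailer plus a 10-byte LLC/SNAP bridging header. Not stated in the text.
ATM_FRAME_OVERHEAD_BYTES = 18


def atm_rates(frame_bytes: int, interval_ms: float) -> tuple[float, float]:
    """Return (net, gross) ATM bit rates in bit/s for one frame per interval."""
    cells = math.ceil((frame_bytes + ATM_FRAME_OVERHEAD_BYTES) / ATM_PAYLOAD_BYTES)
    # bits per frame * frames per second (1000 / interval_ms)
    net = cells * ATM_PAYLOAD_BYTES * 8 * 1000 / interval_ms
    gross = cells * ATM_CELL_BYTES * 8 * 1000 / interval_ms
    return net, gross


if __name__ == "__main__":
    print(atm_rates(222, 20))  # G.711, 20 ms, no IPSec: (96000.0, 106000.0)
    print(atm_rates(82, 20))   # G.729, 20 ms, no IPSec: (57600.0, 63600.0)
```

Because whole cells are always sent, small changes in frame size can jump the rate by one cell per frame, which is why G.729 and G.726-24 need the same ATM bandwidth at 20 ms.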
Note: The values in the table apply to AES encryption. With other encryption methods, the resulting packet size may vary slightly.

Note: Further information on bandwidth requirements for Voice over IP with IPSec is available in the LANCOM techpaper "Performance Analysis of Routers".

- The time for transmission over the Internet (propagation) depends on the distance (about 1 ms per 200 km) and the number of routers en route (about 1 ms per hop). It is approximately half the average round-trip time of a series of pings to the remote site.
- Many IP telephones allow the jitter buffer to be set directly as a fixed number of packets to be cached. The phones fill the buffer to 50% of this number before starting playback. The delay added by the jitter buffer therefore equals half the configured number of packets multiplied by the codec's sampling interval.
- Conclusion: In the following example, the total delay for the two codecs results from the available bandwidth, a ping time of 100 ms to the remote station, and a jitter buffer of 4 packets:
Codec | Processing | Serialization | Propagation | Jitter buffer | Total |
---|---|---|---|---|---|
G.723.1 | 30 ms + 7.5 ms look-ahead | 32 ms | 50 ms | 60 ms | 179.5 ms |
G.711 | 20 ms | 32 ms | 50 ms | 40 ms | 142 ms |
- The packet transmission time to the interface (serialization) is based on a PMTU of 512 bytes on a 128 kbps connection. For slower interfaces or other codecs, you may need to set different jitter buffer and/or PMTU values.
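The totals in the example table can be reproduced by summing the four delay components. This is a minimal sketch with hypothetical function names; it uses the serialization time of a PMTU-sized packet, half the ping round-trip time as propagation, and half the configured jitter buffer multiplied by the sampling interval, as described in this section. The distance/hop rule of thumb is included as an alternative estimate when no ping measurement is available.

```python
def serialization_ms(packet_bytes: int, bandwidth_kbps: float) -> float:
    """Line occupancy of one packet; bits / (kbit/s) gives ms."""
    return packet_bytes * 8 / bandwidth_kbps


def propagation_from_ping_ms(ping_rtt_ms: float) -> float:
    """One-way propagation is roughly half the average ping round-trip time."""
    return ping_rtt_ms / 2


def propagation_estimate_ms(distance_km: float, hops: int) -> float:
    """Rule of thumb: about 1 ms per 200 km plus about 1 ms per router hop."""
    return distance_km / 200 + hops


def jitter_buffer_ms(buffer_packets: int, interval_ms: float) -> float:
    """Phones start playback at 50 % fill, so half the configured packets add delay."""
    return buffer_packets / 2 * interval_ms


def total_delay_ms(processing_ms: float, pmtu_bytes: int, bandwidth_kbps: float,
                   ping_rtt_ms: float, buffer_packets: int, interval_ms: float) -> float:
    """Processing + serialization + propagation + jitter buffer."""
    return (processing_ms
            + serialization_ms(pmtu_bytes, bandwidth_kbps)
            + propagation_from_ping_ms(ping_rtt_ms)
            + jitter_buffer_ms(buffer_packets, interval_ms))


if __name__ == "__main__":
    # Example from the table: 128 kbps, PMTU 512 bytes, 100 ms ping, 4-packet buffer
    print(total_delay_ms(30 + 7.5, 512, 128, 100, 4, 30))  # G.723.1: 179.5
    print(total_delay_ms(20, 512, 128, 100, 4, 20))        # G.711: 142.0
```

Comparing the result with the 150 ms limit shows directly which of the levers (PMTU, buffer size, codec) has to change on a given link.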
Important: The bandwidths listed are required in both the sending and the receiving direction, and they apply to a single connection only.

Note: These explanations relate to Internet connections with very low bandwidth. Where higher bandwidths are available, reducing the PMTU has a barely perceptible influence on performance.