What is a jitter buffer?

A jitter buffer is a small memory buffer that holds incoming VoIP audio packets for a few milliseconds before playing them, so that packets arriving at uneven intervals can be reassembled and played back at a smooth, even pace. It is the mechanism that hides network jitter — the variation in packet arrival time — from your ears.

Without it, the natural unevenness of internet delivery would turn into choppy, stuttering audio. The jitter buffer trades a tiny amount of delay for a steady playback stream.

What problem it solves

Voice is sent as a stream of small packets over RTP, typically one every ~20 ms. The sender emits them at that steady rate, but the network does not preserve the spacing: packets arrive unevenly, some early, some late, some out of order. That variation is jitter.
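To make "variation in arrival time" concrete, here is a minimal sketch of a running jitter estimate in the style of the interarrival jitter formula from RFC 3550 (the RTP specification). The function name and the sample timestamps are made up for illustration, and times are in milliseconds rather than RTP timestamp units.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running jitter estimate, RFC 3550 style: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D is the change in transit time between two consecutive packets.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        # Smooth with a gain of 1/16 so a single outlier does not dominate.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Hypothetical timestamps: packets are sent every 20 ms.
send = [0, 20, 40, 60, 80]
steady = [50, 70, 90, 110, 130]   # constant 50 ms transit, so no jitter
uneven = [50, 75, 88, 121, 130]   # transit time varies packet to packet

print(interarrival_jitter(send, steady))  # 0.0
print(interarrival_jitter(send, uneven))  # greater than 0
```

A constant transit time, however long, produces zero jitter; only the packet-to-packet change counts.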

Audio playback, however, must be perfectly steady: one packet every 20 ms, in order. The jitter buffer bridges the gap. It collects arriving packets, reorders them, and releases them to the decoder on a fixed clock — absorbing the network’s unevenness.

How a jitter buffer works

Incoming packets land in the buffer instead of playing immediately.
The buffer holds them briefly (the buffer depth, e.g. 30–80 ms) and reorders any that arrived out of sequence.
It releases packets to the audio decoder at a steady interval.
Packets that arrive too late — after their playback slot has passed — are discarded, which registers as packet loss at the codec.
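The four steps above can be sketched as a small simulation. This is an illustrative model, not how any particular endpoint implements it: the function name `run_buffer`, the sample arrivals, and the assumption that the first packet to arrive is sequence number 0 are all hypothetical.

```python
import heapq

FRAME_MS = 20  # one packet per 20 ms, as in the text


def run_buffer(arrivals, depth_ms, n_frames):
    """Simulate a fixed-depth jitter buffer.

    arrivals: list of (arrival_ms, seq) pairs; the first arrival is
    assumed to be seq 0. Returns (played, lost) lists of sequence numbers.
    """
    arrivals = sorted(arrivals)            # process in arrival order
    start = arrivals[0][0] + depth_ms      # playback time of seq 0
    heap, played, lost = [], [], []
    i = 0
    for seq in range(n_frames):
        slot = start + seq * FRAME_MS      # steady playback clock
        # Step 1: packets that have arrived by this slot land in the buffer.
        while i < len(arrivals) and arrivals[i][0] <= slot:
            heapq.heappush(heap, arrivals[i][1])  # Step 2: heap keeps seq order
            i += 1
        # Step 4: packets whose slot has already passed are discarded.
        while heap and heap[0] < seq:
            heapq.heappop(heap)
        # Step 3: release the packet due in this slot, if it is here.
        if heap and heap[0] == seq:
            played.append(heapq.heappop(heap))
        else:
            lost.append(seq)               # the decoder sees a gap
    return played, lost


# Hypothetical trace: seq 2 is delayed and arrives after seq 3.
arrivals = [(50, 0), (72, 1), (118, 3), (121, 2), (130, 4)]

print(run_buffer(arrivals, depth_ms=40, n_frames=5))  # ([0, 1, 2, 3, 4], [])
print(run_buffer(arrivals, depth_ms=20, n_frames=5))  # ([0, 1, 3, 4], [2])
```

With a 40 ms depth the delayed packet still makes its slot and is put back in order; with 20 ms it arrives after its slot and is counted as lost.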

Static buffers use a fixed depth. Adaptive jitter buffers (the modern default) measure live jitter and resize continuously — growing during congestion to protect audio, shrinking when the network is clean to minimize delay.
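One simple way to picture the adaptive behavior is to tie the target depth to a running jitter estimate and clamp it between a floor and a ceiling. The sketch below assumes that approach; the class name, the multiplier of 4, and the 20 ms and 200 ms bounds are illustrative choices, not values taken from any real endpoint, and production buffers use more elaborate policies.

```python
class AdaptiveDepth:
    """Toy adaptive policy: depth = factor * jitter estimate, clamped."""

    def __init__(self, min_ms=20.0, max_ms=200.0, factor=4.0):
        self.min_ms = min_ms      # hypothetical floor
        self.max_ms = max_ms      # hypothetical ceiling
        self.factor = factor      # hypothetical safety multiplier
        self.jitter = 0.0
        self._prev_transit = None

    @property
    def depth_ms(self):
        return max(self.min_ms, min(self.max_ms, self.factor * self.jitter))

    def on_packet(self, send_ms, recv_ms):
        """Update the jitter estimate and return the new target depth."""
        transit = recv_ms - send_ms
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0   # same 1/16 smoothing as RFC 3550
        self._prev_transit = transit
        return self.depth_ms


buf = AdaptiveDepth()

# Clean network: every packet takes 50 ms, so the depth stays at the floor.
for n in range(50):
    clean_depth = buf.on_packet(n * 20, n * 20 + 50)

# Congestion: transit time swings between 50 ms and 110 ms, so the depth grows.
for n in range(50, 100):
    congested_depth = buf.on_packet(n * 20, n * 20 + (50 if n % 2 else 110))

# Network recovers: the estimate decays and the depth shrinks again.
for n in range(100, 250):
    recovered_depth = buf.on_packet(n * 20, n * 20 + 50)

print(clean_depth, congested_depth, recovered_depth)
```

The depth rises while arrival times are erratic and falls back once they settle, which is the grow-and-shrink behavior described above.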

The latency trade-off

A jitter buffer is never free: every millisecond it holds packets adds to one-way latency. If it is too small, packets delayed by high jitter arrive after their playback slot and get dropped (choppy audio). If it is too large, the added delay makes conversation feel laggy and causes people to talk over each other. The adaptive approach exists to walk this line automatically, running only as deep as the current network demands.
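The trade-off can be shown with a few lines of arithmetic. In this sketch each number is a hypothetical packet's extra delay beyond the fastest packet, and a packet counts as late when that extra delay exceeds the buffer depth; the function name and the sample values are invented for illustration.

```python
def late_fraction(extra_delay_ms, depth_ms):
    """Share of packets that would miss their playback slot at this depth."""
    late = sum(1 for d in extra_delay_ms if d > depth_ms)
    return late / len(extra_delay_ms)


# Hypothetical per-packet extra delays in milliseconds.
delays = [0, 5, 12, 3, 45, 8, 70, 2, 15, 30]

for depth in (10, 40, 80):
    # The depth itself is the latency cost; the fraction is the audio cost.
    print(depth, late_fraction(delays, depth))
# 10 0.5
# 40 0.2
# 80 0.0
```

Each step up in depth drops fewer packets but adds that many milliseconds to one-way latency, so the useful depth is the smallest one that keeps late packets acceptably rare.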

Common questions

What is the difference between jitter and a jitter buffer?

Jitter is the problem — the variation in how evenly voice packets arrive over the network. A jitter buffer is the fix — a short buffer that holds and reorders packets so they play back at a steady pace despite that variation. You measure jitter in milliseconds; a healthy VoIP call typically keeps it under ~30 ms, which the buffer absorbs invisibly.

Does a jitter buffer cause delay?

Yes, a small amount; that is the trade-off. The buffer must hold packets briefly to smooth them, and that hold time is added to one-way latency. A few tens of milliseconds is generally imperceptible in conversation. The goal is the smallest depth that still prevents choppiness, which is why adaptive buffers resize to current conditions rather than using a fixed large value.

What is an adaptive jitter buffer?

An adaptive jitter buffer measures network jitter in real time and changes its depth on the fly — deepening when packet arrival gets erratic (protecting audio at the cost of slight delay) and shrinking when the network is stable (minimizing delay). Almost all modern VoIP endpoints use adaptive buffers; static fixed-depth buffers are largely legacy.

Can a jitter buffer fix bad audio?

Only the jitter component. It smooths uneven arrival, but it cannot recover packets that are truly lost, nor undo high latency or low bandwidth. If audio is still choppy with a working buffer, the cause is usually real packet loss or congestion — fixed with QoS and a stable connection, not a bigger buffer.

QoS — prioritizing voice to reduce jitter at the source
Packet loss — what the buffer can’t recover
RTP — the protocol carrying the buffered packets
MOS score — how the resulting quality is measured