What is RTP?
RTP (Real-time Transport Protocol) is the IETF standard, defined in RFC 3550, for delivering live audio and video streams across IP networks. In a VoIP call, SIP handles the signaling — set up, modify, tear down — and RTP carries the actual encoded voice between the endpoints. SIP arranges the conversation; RTP is the conversation.
How RTP works
After SIP negotiates the codec and the UDP ports, each endpoint begins emitting a stream of RTP packets containing the encoded audio. Every packet carries:
- A sequence number so the receiver can detect lost or reordered packets.
- A timestamp the receiver uses to play audio back at the original cadence.
- An SSRC (synchronization source) identifier that labels the stream, so the receiver can tell multiple sources apart.
- A payload type that identifies the codec — G.711 µ-law, G.722 wideband, Opus, and others.
Each RTP stream is one-way, so a two-way voice call uses two streams, one in each direction. A companion protocol, RTCP (RTP Control Protocol), runs alongside, traditionally on the next-higher port, and exchanges jitter, loss, and round-trip statistics that endpoints use to adapt.
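As a rough illustration of the header fields listed above, the following Python sketch unpacks the 12-byte fixed RTP header defined in RFC 3550. The `RtpHeader` class and `parse_rtp_header` function are hypothetical names chosen for this example, and the sketch only counts CSRC entries; it does not handle header extensions or padding.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_count: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = first >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    return RtpHeader(
        version=version,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_count=first & 0x0F,
    )


# Hand-built example packet: version 2, payload type 0 (G.711 µ-law),
# sequence 1000, timestamp 160000, SSRC 0x12345678, 160 payload bytes.
example = struct.pack("!BBHII", 0x80, 0, 1000, 160000, 0x12345678) + bytes(160)
header = parse_rtp_header(example)
print(header.payload_type, header.sequence_number, header.timestamp, hex(header.ssrc))
```

A receiver would typically group packets by `ssrc`, order them by `sequence_number`, and schedule playback from `timestamp`.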
RTP vs. SIP vs. SRTP
The three protocols are often confused but have distinct jobs:
- SIP is the signaling — set up, modify, transfer, and end the session.
- RTP is the media transport — the actual audio packets between the endpoints.
- SRTP is RTP with encryption — the same media stream wrapped in AES-based confidentiality and integrity protection.
Modern deployments typically run SIP over TLS to encrypt signaling and SRTP to encrypt media. Plain SIP over UDP and plain RTP still work, but they expose the call to passive interception on the wire.
What can go wrong on the RTP path
VoIP quality is determined almost entirely by what happens to RTP packets between the two endpoints. Three failure modes dominate, each with a commonly cited rule-of-thumb threshold:
- Latency — one-way delay. Above 150 ms, conversation feels delayed; above 300 ms it breaks down.
- Jitter — variation in packet arrival time. Above 30 ms, the jitter buffer starts to drop or repeat audio frames.
- Packet loss — packets that never arrive. Above 1%, audio gaps and clicks become audible.
These three drive the Mean Opinion Score (MOS), the standard 1-to-5 rating of perceived quality. Quality of Service policies exist primarily to protect the RTP path.
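Jitter and loss can both be estimated from the RTP header fields alone. The sketch below, in Python, uses the running interarrival jitter estimator from RFC 3550 (section 6.4.1) and a simplified loss count based on sequence numbers. The function names and the sample values are made up for illustration, and the code assumes that neither the sequence number nor the timestamp wraps within the measured window.

```python
from typing import Iterable, Tuple


def interarrival_jitter(packets: Iterable[Tuple[int, int]]) -> float:
    """Running jitter estimate, in RTP clock units.

    Each item is (arrival_time, rtp_timestamp), both in RTP clock
    units (for example 8000 ticks per second for G.711).
    """
    jitter = 0.0
    previous_transit = None
    for arrival, rtp_timestamp in packets:
        transit = arrival - rtp_timestamp
        if previous_transit is not None:
            d = abs(transit - previous_transit)
            # RFC 3550 smoothing: move 1/16 of the way toward the new sample.
            jitter += (d - jitter) / 16.0
        previous_transit = transit
    return jitter


def loss_fraction(sequence_numbers: Iterable[int]) -> float:
    """Fraction of packets missing between the lowest and highest sequence seen."""
    seen = set(sequence_numbers)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return (expected - len(seen)) / expected


# Four packets sent 20 ms apart (160 ticks at 8 kHz); the third arrives
# 10 ticks late.
samples = [(0, 0), (160, 160), (330, 320), (480, 480)]
jitter_ticks = interarrival_jitter(samples)
jitter_ms = jitter_ticks / 8.0  # 8 ticks per millisecond at 8 kHz

# Ten packets expected, sequence number 3 never arrived.
loss = loss_fraction([1, 2, 4, 5, 6, 7, 8, 9, 10])
print(f"jitter = {jitter_ms:.3f} ms, loss = {loss:.0%}")
```

One-way latency cannot be derived this way, because the sender and receiver clocks are not synchronized; endpoints usually estimate round-trip time from RTCP reports instead.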
RTP port handling and NAT
RTP does not use a fixed port. Each call negotiates UDP ports from a configured range, typically 10000-20000 or 16384-32767. That makes RTP traffic awkward for firewalls and NAT gateways: every call requires a fresh port mapping. The standard solutions are:
- Session border controllers that hold the media path and present a consistent address to the public network.
- STUN, TURN, and ICE for WebRTC endpoints to discover and traverse NATs without an SBC.
- Symmetric RTP (RFC 4961), where each endpoint sends and receives on the same UDP port, so the receiver can send RTP back to the source address it saw and NAT mappings stay open.
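The symmetric behaviour in the last bullet can be sketched on the loopback interface. In this Python example, two UDP sockets stand in for the endpoints; each uses a single socket for both directions, and the reply goes to the source address observed on the incoming packet. The payloads are placeholder bytes rather than real RTP, and the OS-assigned ports are an assumption of the demo, since real endpoints use ports negotiated during call setup.

```python
import socket

# Port 0 asks the OS for any free UDP port.
caller = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
callee = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
caller.bind(("127.0.0.1", 0))
callee.bind(("127.0.0.1", 0))
caller.settimeout(2.0)
callee.settimeout(2.0)
caller_address = caller.getsockname()

try:
    # The caller sends from the same socket (and port) it receives on.
    caller.sendto(b"media from caller", callee.getsockname())

    # The callee replies to the source address it observed rather than to
    # an address advertised in signaling. Behind a NAT, the observed
    # address is the public mapping that the outbound packet opened.
    data, observed_source = callee.recvfrom(2048)
    callee.sendto(b"media from callee", observed_source)

    reply, reply_source = caller.recvfrom(2048)
finally:
    caller.close()
    callee.close()

print(observed_source == caller_address, reply)
```

On loopback there is no NAT, so the observed source simply equals the caller's bound address; the point is that return media follows the path the outbound packet took.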
RTP frequently asked questions
What does RTP stand for?
RTP stands for Real-time Transport Protocol. It is the IETF standard (RFC 3550) for delivering real-time media — primarily audio and video — over IP networks. RTP carries the encoded media payload while a separate signaling protocol such as SIP arranges the session.
What is the difference between RTP and SIP?
SIP is signaling — it sets up, modifies, and ends the call. RTP is media transport — it carries the actual encoded voice between the two endpoints once the call is up. A VoIP call uses both: SIP arranges it, RTP delivers the audio frame-by-frame.
What is the difference between RTP and SRTP?
RTP is the basic real-time transport. SRTP is the secure variant that wraps the same media payload in AES encryption and HMAC integrity protection. Production deployments increasingly require SRTP because plain RTP exposes call audio to anyone who can capture packets on the path.
What port does RTP use?
RTP does not use a single fixed port. Endpoints negotiate UDP ports during SIP setup from a configured range — commonly 10000-20000 or 16384-32767 — with one port pair per stream. RTCP traditionally uses the port immediately above each RTP port for control statistics.
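The port-pair convention can be illustrated with a short Python sketch that picks an even RTP port from a configured range and pairs it with the next-higher odd port for RTCP, as RFC 3550 recommends. The function name `pick_rtp_port_pair` and its defaults are hypothetical; a real media stack would also confirm that both ports are free by binding them.

```python
import random
from typing import Tuple


def pick_rtp_port_pair(low: int = 10000, high: int = 20000) -> Tuple[int, int]:
    """Return (rtp_port, rtcp_port) with an even RTP port inside [low, high]."""
    first_even = low + (low % 2)
    # The highest even port that still leaves room for RTCP at rtp_port + 1.
    last_even = (high - 1) - ((high - 1) % 2)
    if first_even > last_even:
        raise ValueError("range too small for an RTP/RTCP port pair")
    rtp_port = random.randrange(first_even, last_even + 1, 2)
    return rtp_port, rtp_port + 1


rtp_port, rtcp_port = pick_rtp_port_pair()
print(f"RTP on UDP {rtp_port}, RTCP on UDP {rtcp_port}")
```

Calling `pick_rtp_port_pair(16384, 32767)` applies the same logic to the other commonly used range.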
See how DialPhone fits
DialPhone runs SRTP-encrypted RTP across its business phone and contact center paths, with managed session border controllers and continuous MOS reporting — so the media path is both secure and measurable, not a black box behind the SIP signaling layer.