1. Introduction
This memorandum specifies the real-time transport protocol (RTP),
which provides end-to-end delivery services for data with real-time
characteristics, such as interactive audio and video. Those services
include payload type identification, sequence numbering, timestamping
and delivery monitoring. Applications typically run RTP on top of
UDP to make use of its multiplexing and checksum services; both
protocols contribute parts of the transport protocol functionality.
However, RTP may be used with other suitable underlying network or
transport protocols (see Section 11). RTP supports data transfer to
multiple destinations using multicast distribution if provided by the
underlying network.
Note that RTP itself does not provide any mechanism to ensure timely
delivery or provide other quality-of-service guarantees, but relies
on lower-layer services to do so. It does not guarantee delivery or
prevent out-of-order delivery, nor does it assume that the underlying
network is reliable and delivers packets in sequence. The sequence
numbers included in RTP allow the receiver to reconstruct the
sender's packet sequence, but sequence numbers might also be used to
determine the proper location of a packet, for example in video
decoding, without necessarily decoding packets in sequence.
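As an illustrative sketch (not part of the protocol definition), a
receiver might restore the sender's packet order as shown below. The
16-bit width of the sequence number comes from the RTP header format
defined later in this document. The names seq_delta and reorder, and
the assumption that every buffered packet lies within half the
sequence space of the first one received, are ours.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from a to b modulo 2**16, in [-32768, 32767]."""
    d = (b - a) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d


def reorder(packets):
    """Sort (seq, payload) tuples into sender order.

    Assumption: every packet is within 32767 of the first one received,
    so the signed distance from that packet is unambiguous.
    """
    if not packets:
        return []
    base = packets[0][0]
    return sorted(packets, key=lambda p: seq_delta(base, p[0]))


# Packets arrive out of order across the 65535 -> 0 wraparound.
received = [(65535, "b"), (1, "d"), (65534, "a"), (0, "c")]
ordered = reorder(received)
print([payload for _, payload in ordered])  # ['a', 'b', 'c', 'd']
```

Comparing signed distances rather than raw values keeps the ordering
correct when the sequence number wraps from 65535 back to 0.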
While RTP is primarily designed to satisfy the needs of multi-
participant multimedia conferences, it is not limited to that
particular application. Storage of continuous data, interactive
distributed simulation, active badge, and control and measurement
applications may also find RTP applicable.
This document defines RTP, consisting of two closely-linked parts:
o  the real-time transport protocol (RTP), to carry data that has
   real-time properties.
o  the RTP control protocol (RTCP), to monitor the quality of service
   and to convey information about the participants in an on-going
   session. The latter aspect of RTCP may be sufficient for "loosely
   controlled" sessions, i.e., where there is no explicit membership
   control and set-up, but it is not necessarily intended to support
   all of an application's control communication requirements. This
   functionality may be fully or partially subsumed by a separate
   session control protocol, which is beyond the scope of this
   document.
RTP represents a new style of protocol following the principles of
application level framing and integrated layer processing proposed by
Clark and Tennenhouse [10]. That is, RTP is intended to be malleable
Schulzrinne, et al. Standards Track [Page 4]
RFC 3550 RTP July 2003
to provide the information required by a particular application and
will often be integrated into the application processing rather than
being implemented as a separate layer. RTP is a protocol framework
that is deliberately not complete. This document specifies those
functions expected to be common across all the applications for which
RTP would be appropriate. Unlike conventional protocols in which
additional functions might be accommodated by making the protocol
more general or by adding an option mechanism that would require
parsing, RTP is intended to be tailored through modifications and/or
additions to the headers as needed. Examples are given in Sections
5.3 and 6.4.3.
Therefore, in addition to this document, a complete specification of
RTP for a particular application will require one or more companion
documents (see Section 13):
o  a profile specification document, which defines a set of payload
   type codes and their mapping to payload formats (e.g., media
   encodings). A profile may also define extensions or modifications
   to RTP that are specific to a particular class of applications.
   Typically an application will operate under only one profile. A
   profile for audio and video data may be found in the companion RFC
   3551 [1].
o  payload format specification documents, which define how a
   particular payload, such as an audio or video encoding, is to be
   carried in RTP.
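To make the role of a profile concrete, the sketch below models the
mapping from payload type codes to payload formats as a lookup table.
The entries are a small subset of the static audio assignments in
RFC 3551 and are shown for illustration only; that document remains
the authoritative list. The names AV_PROFILE_EXCERPT and
describe_payload_type are hypothetical.

```python
# Illustrative excerpt only: payload type -> (encoding name, clock rate in Hz).
AV_PROFILE_EXCERPT = {
    0: ("PCMU", 8000),
    3: ("GSM", 8000),
    5: ("DVI4", 8000),
    7: ("LPC", 8000),
    8: ("PCMA", 8000),
}


def describe_payload_type(pt, profile=AV_PROFILE_EXCERPT):
    """Return a human-readable description of a payload type code."""
    try:
        name, rate = profile[pt]
    except KeyError:
        return f"payload type {pt}: not defined in this excerpt"
    return f"payload type {pt}: {name} at {rate} Hz"


print(describe_payload_type(0))
print(describe_payload_type(99))
```

An application operating under a different profile would substitute
its own table without changing the code that reads the RTP header.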
A discussion of real-time services and algorithms for their
implementation as well as background discussion on some of the RTP
design decisions can be found in [11].
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [2]
and indicate requirement levels for compliant RTP implementations.
2. RTP Use Scenarios
The following sections describe some aspects of the use of RTP. The
examples were chosen to illustrate the basic operation of
applications using RTP, not to limit what RTP may be used for. In
these examples, RTP is carried on top of IP and UDP, and follows the
conventions established by the profile for audio and video specified
in the companion RFC 3551.
2.1 Simple Multicast Audio Conference
A working group of the IETF meets to discuss the latest protocol
document, using the IP multicast services of the Internet for voice
communications. Through some allocation mechanism, the working group
chair obtains a multicast group address and a pair of ports. One port
is used for audio data, and the other is used for control (RTCP)
packets. This address and port information is distributed to the
intended participants. If privacy is desired, the data and control
packets may be encrypted as specified in Section 9.1, in which case
an encryption key must also be generated and distributed. The exact
details of these allocation and distribution mechanisms are beyond
the scope of RTP.
The audio conferencing application used by each conference
participant sends audio data in small chunks of, say, 20 ms duration.
Each chunk of audio data is preceded by an RTP header; RTP header and
data are in turn contained in a UDP packet. The RTP header indicates
what type of audio encoding (such as PCM, ADPCM or LPC) is contained
in each packet so that senders can change the encoding during a
conference, for example, to accommodate a new participant that is
connected through a low-bandwidth link or react to indications of
network congestion.
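The framing described above might be sketched as follows. The 12-byte
fixed header layout follows the header format defined later in this
document (Section 5.1). The function names, the SSRC value, and the
choice of payload type 0 (PCMU in the RFC 3551 profile) sampled at
8000 Hz are illustrative assumptions, and the payload is placeholder
bytes rather than real encoded audio.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                                   # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def payload_type_of(packet):
    """Read the 7-bit payload type from the second header byte."""
    return packet[1] & 0x7F


# 20 ms at 8000 Hz with one byte per sample is 160 bytes of payload.
chunk = bytes(160)
packet = build_rtp_packet(
    payload_type=0, seq=1, timestamp=160, ssrc=0x12345678, payload=chunk
)
print(len(packet), payload_type_of(packet))  # 172 0
```

The resulting bytes would then be sent as the body of a UDP datagram.
Because the payload type travels in every packet, a sender can switch
encodings simply by changing the payload_type argument from one
packet to the next.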
The Internet, like other packet networks, occasionally loses and
reorders packets and delays them by variable amounts of time. To
cope with these impairments, the RTP header contains timing
information and a sequence number that allow the receivers to
reconstruct the timing produced by the source, so that in this
example, chunks of audio are played out contiguously through the speaker
every 20 ms. This timing reconstruction is performed separately for
each source of RTP packets in the conference. The sequence number
can also be used by the receiver to estimate how many packets are
being lost.
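A minimal sketch of both uses follows. It assumes an 8000 Hz media
clock, so that a 20 ms chunk advances the timestamp by 160 ticks, and
it assumes the receiver has already chosen a playout time for the
first packet from a source, including any buffering delay. The names
playout_time and packets_lost are hypothetical, and a real receiver
would keep one such state per source.

```python
def playout_time(timestamp, base_timestamp, base_playout, clock_rate=8000):
    """Local time in seconds at which a packet's media should start playing.

    base_timestamp and base_playout describe the first packet from this
    source; base_playout already includes the receiver's buffering delay.
    """
    elapsed_ticks = (timestamp - base_timestamp) & 0xFFFFFFFF  # 32-bit wrap
    return base_playout + elapsed_ticks / clock_rate


def packets_lost(seqs):
    """Expected minus received over the sequence numbers seen so far.

    Assumes no 16-bit wraparound within seqs; duplicates are counted once.
    """
    if not seqs:
        return 0
    expected = max(seqs) - min(seqs) + 1
    return expected - len(set(seqs))


# Packets arrive with variable delay, but playout stays 20 ms apart.
for ts in (0, 160, 320):
    print(playout_time(ts, base_timestamp=0, base_playout=1.0))

print(packets_lost([10, 11, 13, 14]))  # 1 (sequence number 12 is missing)
```

The playout time depends only on the timestamp carried in the packet,
not on when the packet arrived, which is what removes the effect of
variable network delay.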
Since members of the working group join and leave during the
conference, it is useful to know who is participating at any moment
and how well they are receiving the audio data. For that purpose,
each instance of the audio application in the conference periodically
multicasts a reception report plus the name of its user on the RTCP
(control) port. The reception report indicates how well the current
speaker is being received and may be used to control adaptive
encodings. In addition to the user name, other identifying
information may also be included subject to control bandwidth limits.
A site sends the RTCP BYE packet (Section 6.6) when it leaves the
conference.
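As a hedged illustration of what "how well the current speaker is
being received" can mean numerically, the sketch below computes a
loss fraction for one reporting interval as an 8-bit fixed-point
value, the convention used for the fraction lost field of the
reception report defined later in this document (Section 6.4.1). The
function name and the clamp to 255 are our assumptions.

```python
def fraction_lost(expected_interval, received_interval):
    """Loss fraction for one reporting interval, scaled to 0..255.

    expected_interval: packets expected since the previous report
    received_interval: packets actually received since the previous report
    """
    lost = expected_interval - received_interval
    if expected_interval <= 0 or lost <= 0:
        # Nothing expected, or duplicates made the apparent loss negative.
        return 0
    # Fixed point with the binary point at the left edge of the 8-bit field;
    # the clamp keeps total loss within 8 bits (an assumption of this sketch).
    return min(255, (lost << 8) // expected_interval)


print(fraction_lost(100, 75))  # 64, i.e. 64/256 = 25% lost
```

A sender that sees this value rise across successive reports could
react by switching to a lower-rate encoding.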