JavaScript Session Establishment Protocol (JSEP)
The communication model between a client and a remote host is based on the JSEP architecture, which separates signaling and media transactions into different layers.
The differentiation is shown in the following figure:
JSEP signaling and media
As an example, consider two peers, A and B, where A initiates communication with B. As the offerer, A calls the createOffer function to generate an SDP offer that describes the session, including details such as codecs, and applies it as its local configuration with the setLocalDescription function. A then sends the offer to B over the signaling channel. The remote party, B, receives the offer and stores it with the setRemoteDescription function. B then calls the createAnswer function to generate a matching answer, applies it with setLocalDescription, and sends the answer back to the initiator over the signaling channel. When A receives the answer, it stores it with setRemoteDescription, and the initial setup is complete. The same offer/answer exchange can be repeated later to modify the session. The latest JSEP specification can be read on the Internet Engineering Task Force (IETF) site at http://datatracker.ietf.org/doc/draft-ietf-rtcweb-jsep/.
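The offer/answer walkthrough above can be modeled as a small state machine. The following TypeScript sketch is a simplified, hypothetical model: JsepPeerModel and runExchange are illustrative names, not part of the WebRTC API. Its state names are borrowed from the values reported by RTCPeerConnection's signalingState property, and it deliberately omits provisional answers, rollback, re-offers, and the actual SDP content.

```typescript
// Simplified model of JSEP signaling states (hypothetical class, not the
// browser API). Only offers and final answers are modeled.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type SdpType = "offer" | "answer";

class JsepPeerModel {
  readonly name: string;
  state: SignalingState = "stable";

  constructor(name: string) {
    this.name = name;
  }

  // Corresponds to setLocalDescription(): applying our own offer or answer.
  setLocal(type: SdpType): void {
    if (type === "offer" && this.state === "stable") {
      this.state = "have-local-offer";
    } else if (type === "answer" && this.state === "have-remote-offer") {
      this.state = "stable";
    } else {
      throw new Error(`${this.name}: cannot set local ${type} in state ${this.state}`);
    }
  }

  // Corresponds to setRemoteDescription(): storing the other peer's description.
  setRemote(type: SdpType): void {
    if (type === "offer" && this.state === "stable") {
      this.state = "have-remote-offer";
    } else if (type === "answer" && this.state === "have-local-offer") {
      this.state = "stable";
    } else {
      throw new Error(`${this.name}: cannot set remote ${type} in state ${this.state}`);
    }
  }
}

// Replays the A/B walkthrough and records each peer's state after every step.
function runExchange(): string[] {
  const a = new JsepPeerModel("A");
  const b = new JsepPeerModel("B");
  const log: string[] = [];
  a.setLocal("offer"); // A: createOffer, then setLocalDescription
  log.push(`A ${a.state}`);
  b.setRemote("offer"); // B: setRemoteDescription with A's offer
  log.push(`B ${b.state}`);
  b.setLocal("answer"); // B: createAnswer, then setLocalDescription
  log.push(`B ${b.state}`);
  a.setRemote("answer"); // A: setRemoteDescription with B's answer
  log.push(`A ${a.state}`);
  return log;
}

// runExchange() returns:
// ["A have-local-offer", "B have-remote-offer", "B stable", "A stable"]
console.log(runExchange());
```

In a browser, the same sequence is driven by RTCPeerConnection's createOffer(), setLocalDescription(), setRemoteDescription(), and createAnswer() methods, and both peers end up back in the "stable" state once the answer has been applied.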
Signal and media flows
The separation between signaling and media flows is an important aspect of WebRTC call setup.
The signaling mechanism can be any of HTTP/REST, JavaScript Object Notation (JSON) via XMLHttpRequest (XHR), Session Initiation Protocol (SIP) over WebSockets, XMPP, or any custom or proprietary protocol. The media (audio/video) parameters are described using the Session Description Protocol (SDP), which is exchanged over the signaling channel, while the media itself flows from peer to peer.
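To make the role of SDP concrete, the following TypeScript sketch pulls the media kinds and codec names out of an SDP blob by reading its m= lines and a=rtpmap attributes, whose syntax is defined by the SDP specification. The listCodecs function and the trimmed sample SDP are illustrative assumptions; a browser-generated offer contains many more lines (ICE candidates, DTLS fingerprints, and so on), and a production parser would need to handle much more of the grammar.

```typescript
// Minimal, hypothetical parser: lists the codecs per media section by
// reading "m=" lines and "a=rtpmap:" attributes.
interface MediaSection {
  kind: string; // e.g. "audio" or "video"
  codecs: string[]; // encoding names from a=rtpmap, e.g. "opus"
}

function listCodecs(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt> ...
      sections.push({ kind: line.slice(2).split(" ")[0], codecs: [] });
    } else if (line.startsWith("a=rtpmap:") && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]
      const encoding = line.split(" ")[1]?.split("/")[0];
      if (encoding) {
        sections[sections.length - 1].codecs.push(encoding);
      }
    }
  }
  return sections;
}

// Trimmed, illustrative SDP (not captured from a real browser).
const sampleSdp = [
  "v=0",
  "o=- 0 0 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

// listCodecs(sampleSdp) returns:
// [{ kind: "audio", codecs: ["opus", "PCMU"] }, { kind: "video", codecs: ["VP8"] }]
console.log(listCodecs(sampleSdp));
```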
A few variants of end-to-end signaling and media flows are shown in the following figure:
The preceding figure depicts signaling over the WebRTC API in JSON format via XHR.
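As a rough illustration of JSON signaling, the following TypeScript sketch encodes and validates a signaling message before it would be sent over XHR. The envelope is an assumption: the type and sdp fields mirror the shape of RTCSessionDescriptionInit, while the from and to routing fields, and the SignalingMessage, encodeMessage, and decodeMessage names, are hypothetical choices for a custom signaling server.

```typescript
// Hypothetical JSON signaling envelope. The type/sdp pair mirrors
// RTCSessionDescriptionInit; "from" and "to" are illustrative routing fields.
type SdpType = "offer" | "answer";

interface SignalingMessage {
  from: string;
  to: string;
  type: SdpType;
  sdp: string;
}

function isSdpType(value: unknown): value is SdpType {
  return value === "offer" || value === "answer";
}

// Serializes a message for the wire, e.g. as the body of an XHR POST.
function encodeMessage(msg: SignalingMessage): string {
  return JSON.stringify(msg);
}

// Parses and validates a message body received from the signaling server.
function decodeMessage(body: string): SignalingMessage {
  const data: unknown = JSON.parse(body);
  if (typeof data !== "object" || data === null) {
    throw new Error("signaling message must be a JSON object");
  }
  const { from, to, type, sdp } = data as Record<string, unknown>;
  if (
    typeof from !== "string" ||
    typeof to !== "string" ||
    typeof sdp !== "string" ||
    !isSdpType(type)
  ) {
    throw new Error("malformed signaling message");
  }
  return { from, to, type, sdp };
}

const wire = encodeMessage({ from: "A", to: "B", type: "offer", sdp: "v=0\r\n" });
console.log(decodeMessage(wire).type); // prints "offer"
```

In a browser, the string produced by encodeMessage would typically be sent with XMLHttpRequest.send() to the signaling server, and the receiving peer would pass the decoded type and sdp fields to setRemoteDescription().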
The following figure depicts signaling over the WebRTC API using the Extensible Messaging and Presence Protocol (XMPP):
It is very popular to add SIP support to WebRTC applications through libraries such as JsSIP, sipML5, and PJSIP, but these libraries cater to the SIP/IMS (IP Multimedia Subsystem) world and are not mandatory for setting up an enterprise-level WebRTC infrastructure. In fact, it is a common misconception that WebRTC is inherently coupled with SIP; it is not.
Note
IP Multimedia Subsystem (IMS) is part of the Next Generation Network (NGN) model for IP-based communication.
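Because the offer/answer logic only needs some way to move messages between the peers, the signaling transport can be swapped freely, and SIP is just one option. The following TypeScript sketch expresses this with a hypothetical SignalingChannel interface and an in-memory LoopbackChannel standing in for SIP, XMPP, or HTTP. The names and the placeholder "offer"/"answer" payloads are assumptions for illustration, not part of any WebRTC or SIP library.

```typescript
// Hypothetical transport abstraction: the negotiation code below depends only
// on this interface, so any transport (SIP, XMPP, JSON over HTTP) could implement it.
interface SignalingChannel {
  send(message: string): void;
  onMessage(handler: (message: string) => void): void;
}

// In-memory implementation that links two endpoints directly.
class LoopbackChannel implements SignalingChannel {
  private handler: ((message: string) => void) | null = null;
  peer: LoopbackChannel | null = null;

  send(message: string): void {
    // Delivers synchronously to the linked endpoint's handler, if any.
    this.peer?.handler?.(message);
  }

  onMessage(handler: (message: string) => void): void {
    this.handler = handler;
  }
}

function createLoopbackPair(): [LoopbackChannel, LoopbackChannel] {
  const a = new LoopbackChannel();
  const b = new LoopbackChannel();
  a.peer = b;
  b.peer = a;
  return [a, b];
}

// Runs a toy offer/answer over any pair of channels; payloads are
// placeholders rather than real SDP.
function negotiate(caller: SignalingChannel, callee: SignalingChannel): string[] {
  const log: string[] = [];
  callee.onMessage((msg) => {
    log.push(`callee received ${msg}`);
    if (msg === "offer") callee.send("answer");
  });
  caller.onMessage((msg) => log.push(`caller received ${msg}`));
  caller.send("offer");
  return log;
}

const [callerChannel, calleeChannel] = createLoopbackPair();
// negotiate() returns ["callee received offer", "caller received answer"]
console.log(negotiate(callerChannel, calleeChannel));
```

With the synchronous loopback channel, the log is complete when negotiate() returns; a real network transport delivers messages asynchronously, so the caller would wait for the answer rather than read the log immediately.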