
H.323 protocol versus SIP, comparison

What will win as the de facto standard - H.323 or SIP?

Session Initiation Protocol (SIP) is a standard introduced by the Internet Engineering Task Force in 1999 to carry voice over IP. Since it was created by the IETF, it approaches voice and multimedia from the Internet, or IP, perspective. H.323 emerged around 1996, and as an International Telecommunication Union standard was designed from a telecommunications perspective. Both standards have the same objective - to enable voice and multimedia convergence with IP protocols.

As the older standard, H.323 has been embraced by many of the early VoIP players, so it has the advantage of being implemented first. Because of its Internet origins, SIP makes it easier to develop applications and has been gaining in popularity, especially in North America and with new entrants into the VoIP market. We don't believe either protocol will "win." In fact, we believe that protocol wars waste a lot of good energy that could be better spent evolving the standards, and both protocols are evolving to offer more features. The industry is making slow but sure progress on interoperability between the two. Interoperability must first happen between vendor implementations of the same protocol (SIP-to-SIP and H.323-to-H.323) and then between protocols. Many vendors now support both protocols, as their customers want the flexibility to choose based on different needs. Some, like CommWorks, even support an interoperability module between the protocols to support service interworking. We believe both standards have their advantages and disadvantages, and fully support efforts to make the two interoperable.

SIP is, more or less, equivalent to the Q.931 and H.225 components of H.323. These protocols are responsible for call setup and call signalling. Consequently, both SIP and H.323 can be used as signalling protocols in IP networks.

H.323 protocol

H.323 is an ITU VOIP protocol. It was created at about the same time as SIP, but was more widely adopted and deployed earlier. Today, most of the world's VoIP traffic is carried over H.323 networks, with billions of minutes of traffic being carried every month.

H.323's strengths lie in its ability to serve in a variety of roles, including multimedia communication (voice, video, and data conferencing), as well as applications where interworking with the PSTN is vital. H.323 was designed from the outset with multimedia communications over IP networks in mind, making it well suited to real-time multimedia communication over packet-based networks.

H.323 was designed with a good understanding of the requirements for multimedia communication over IP networks, including audio, video, and data conferencing. It defines an entire, unified system for performing these functions, leveraging the strengths of the IETF and ITU-T protocols. As a result, it might be reasonable for users to expect about the same level of robustness and interoperability as is found on the PSTN today, although this admittedly varies across the globe. H.323 was designed to scale to add new functionality. The most widely deployed use of H.323 is "Voice over IP" followed by "Videoconferencing", both of which are described in the H.323 specifications. H.323 has defined a number of features to handle failure of intermediate network entities, including "alternate gatekeepers", "alternate endpoints", and a means of recovering from connection failures.

SIP protocol

SIP, the Session Initiation Protocol, is the IETF protocol for VoIP and other text and multimedia sessions, such as instant messaging, video, online games and other services. SIP is very much like HTTP, the Web protocol, or SMTP: messages consist of headers and a message body. SIP message bodies for phone calls are described using SDP, the Session Description Protocol.
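To make the header-plus-body layout concrete, here is a minimal Python sketch that splits a SIP request into its start line, headers, and SDP body. The message itself is invented for illustration (the addresses, tag, branch and Call-ID are assumptions, not taken from any real trace), and the parser is deliberately simplified: it ignores folded header lines and compact header names, and keeps only the last value of a repeated header.

```python
# Illustrative message only: all addresses and identifiers are made up.
RAW_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.org:5060;branch=z9hG4bK74bf9\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.org>;tag=9fxced76sl\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: 3848276298220188511@client.example.org\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 client.example.org\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.101\r\n"
    "t=0 0\r\n"
    "m=audio 49172 RTP/AVP 0\r\n"
)


def parse_sip_message(raw):
    """Split a SIP message into (start_line, headers, body).

    Simplified sketch: no header folding, no compact header names,
    and a repeated header keeps only its last value.
    """
    # An empty line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as Via contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


start_line, headers, body = parse_sip_message(RAW_INVITE)
print(start_line)                # request line: method, target URI, version
print(headers["content-type"])   # tells the receiver the body is SDP
print(body.splitlines()[-1])     # the SDP media line describing the audio stream
```

The same split works for responses, whose start line is a status line (for example `SIP/2.0 200 OK`) instead of a request line.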

SIP was designed to set up a "session" between two points and to be a modular, flexible component of the Internet architecture. It has a loose concept of a call (that being a "session" with media streams), has no support for multimedia conferencing, and the integration of sometimes disparate standards is largely left up to each vendor. As a result, SIP is now a 10-year-old protocol with a vast number of interoperability problems. While SIP has been successfully deployed in some environments, those are generally "closed" environments where the means of interoperability has been PSTN gateways. SIP has not defined procedures for handling device failure. If a proxy fails, the user agent detects this through timer expiration. It is then the responsibility of the user agent to send a re-INVITE to another proxy, leading to long delays in call establishment.

* SIP is a text-based protocol that uses UTF-8 encoding
* SIP uses port 5060 both for UDP and TCP. SIP may use other transports

SIP offers all the common Internet telephony features, such as call or media transfer, call conference, and call hold.
Since SIP is a flexible protocol, it is possible to add more features and remain backward compatible. SIP does, however, suffer from NAT and firewall restrictions. (Refer to NAT and VoIP.) SIP can be regarded as the enabler protocol for telephony and voice over IP (VoIP) services. The following features of SIP play a major role in the enablement of IP telephony and VoIP:

* Name Translation and User Location: Ensuring that the call reaches the called party wherever they are located. Carrying out any mapping of descriptive information to location information. Ensuring that details of the nature of the call (Session) are supported.
* Feature Negotiation: This allows the group involved in a call (this may be a multi-party call) to agree on the features supported, recognizing that not all the parties can support the same level of features. For example, video may or may not be supported; as any form of MIME type is supported by SIP, there is plenty of scope for negotiation.
* Call Participant Management: During a call, a participant can bring other users onto the call or cancel connections to other users. In addition, users could be transferred or placed on hold.
* Call feature changes: A user should be able to change the call characteristics during the course of the call. For example, a call may have been set up as voice-only, but in the course of the call, the users may need to enable a video function. A third party joining a call may require different features to be enabled in order to participate in the call.
* Media negotiation: The inherent SIP mechanisms for negotiating the media used in a call enable selection of the appropriate codec for establishing a call between the various devices. This way, less advanced devices can participate in the call, provided the appropriate codec is selected.
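
The media negotiation idea in the last bullet can be sketched in a few lines of Python. This is only a simplified illustration, not the full SDP offer/answer procedure: real negotiation also covers payload type numbers, clock rates and stream direction. The function name and the two capability lists are hypothetical.

```python
def negotiate_codecs(offered, supported):
    """Return the offered codecs that the answering device also supports.

    The offerer's preference order is preserved; names are compared
    case-insensitively. Hypothetical helper, for illustration only.
    """
    supported_names = {name.lower() for name in supported}
    return [name for name in offered if name.lower() in supported_names]


# Hypothetical capability lists: an advanced device calls a basic one.
advanced_phone = ["opus", "G722", "PCMU", "PCMA"]
basic_phone = ["PCMA", "PCMU"]

common = negotiate_codecs(advanced_phone, basic_phone)
print(common)  # ['PCMU', 'PCMA']
```

An empty result would mean the two devices share no codec, so the call could not be established without transcoding in between.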

SIP Vs. H.323 - A Comparison table

In each row below, the first entry describes H.323 and the second describes SIP.

H.323 covers almost every service, such as capability exchange, conference control, basic signaling, QoS, registration, service discovery, and so on.

SIP is modular because it covers basic call signaling, user location, and registration. Other features are in other separate orthogonal protocols.

Call control Functionality

* Call Transfer
* Call Forwarding
* Call Holding
* Call Parking/Pickup
* Call Waiting
* Message Waiting Indication
* Name Identification
* Call Completion on Busy Subscriber
* Call Offer
* Call Intrusion

H.323 splits them across H.450, RAS, H.245 and Q.931.


Advanced Features

Multicast Signaling

Yes, location requests (LRQ) and auto gatekeeper discovery (GRQ).

Yes, e.g., through group INVITEs.

Third-party Call Control

Yes, through third-party pause and re-routing which is defined within H.323. More sophisticated control is defined by the related H.450.x series of standards.

Yes, through SIP as described in separate Internet Drafts.




Click for Dial




Large Number of Domains

The initial intent of H.323 was for the support of LANs, so it was not inherently designed for wide area addressing. The concept of a zone was added to accommodate wide area addressing.  Procedures are defined for “user location” across zones for email names. Annex G defines communication between administrative domains, describing methods to allow for address resolution, access authorization and usage reporting between administrative domains. In multi-domain searches, there is no easy way to perform loop detection. Performing the loop detection can be done (using the PathValue field), but introduces other issues related to scalability.

SIP inherently supports wide area addressing. When multiple servers are involved in setting up a call, SIP uses a loop detection algorithm similar to the one used in BGP, which can be done in a stateless manner, thus avoiding scalability issues. The SIP Registrar and redirect servers were designed to support user location.

Large Number of Calls

H.323 call control can be implemented in a stateless manner.  A gateway can use messages defined in H.225 to assist the gatekeeper in performing load balancing across gateways.

Call control can be implemented in a call stateless manner. SIP supports n to n scaling between UAs and servers. SIP takes fewer CPU cycles to generate signaling messages; therefore a server could theoretically handle more transactions. SIP has specified a method of load balancing based upon the DNS SRV record translation mechanisms.

Connection State

Stateful or Stateless.  

Stateful or Stateless.   A SIP call is independent of the existence of a transport-layer connection, but instead signals call termination explicitly.


Internationalization

Yes, H.323 uses Unicode (BMPString within ASN.1) for some textual information (h323-id), but generally has few textual parameters.

Yes, SIP uses Unicode (ISO 10646-1), encoded as UTF-8, for all text strings, affording full character set neutrality for names, messages and parameters. SIP provides for the indication of languages and language preferences.


Security

Defines security mechanisms and negotiation facilities via H.235, can also use SSL for transport-layer security.

SIP supports caller and callee authentication via HTTP mechanisms. Cryptographically secure authentication and encryption is supported hop-by-hop via SSL/TLS, but SIP could use any transport-layer or HTTP-like security mechanism, such as SSH or S-HTTP. Keys for media encryption are conveyed using SDP. SSL supports symmetric and asymmetric authentication. SIP also defines end-to-end authentication and encryption using either PGP or S/MIME.

Interoperability among Versions

Full backward compatibility in H.323 enables all implementations based on different H.323 versions to be seamlessly integrated.

In SIP, a newer version may discard some old features that are not expected to be implemented any more. This approach saves code size and reduces protocol complexity, but loses some compatibility between different versions.  

Implementation Interoperability

H.323 provides an implementers’ guide, which clarifies the standard and helps towards interoperability among different implementations.

SIP thus far has not provided an implementation agreement.


Billing

Even with H.323's direct call model, the ability to successfully bill for the call is not lost because the endpoint reports to the gatekeeper the beginning and end time of the call via the RAS protocol.

If the SIP proxy wants to collect billing information, it has no choice but to stay in the call signaling path for the entire duration of the call so that it can detect when the call completes. Even then, the statistics are skewed because the call signaling may have been delayed.


Codecs

H.323 supports any codec, standardized or proprietary, not just ITU-T codecs. There have been codepoints for MPEG and GSM, which are not ITU-T codecs, in H.323 for a long time; many vendors support proprietary codecs through ASN.1 NonStandardParameters, which is equivalent to SIP's "privately-named codec by mutual agreement"; and any codec can be signaled via the GenericCapability feature that was added in H.323v3. Payload types can be specified statically or dynamically.

SIP supports any IANA-registered codec (as a legacy feature) or other codec whose name is mutually agreed upon. Payload types can be specified statically or dynamically.

Call Forking

H.323 gatekeeper can control the call signaling and may fork the call to any number of devices simultaneously.

SIP proxies can control the call signaling and may fork the call to any number of devices simultaneously.

Transport protocol

Reliable or unreliable, e.g., TCP or UDP. Most H.323 entities use a reliable transport for signaling.

Reliable or unreliable, e.g., TCP or UDP. Most SIP entities use an unreliable transport for signaling.

Message Encoding

H.323 encodes messages in a compact binary format that is suitable for narrowband and broadband connections.

SIP messages are encoded in ASCII text format, suitable for humans to read. 


Addressing

Flexible addressing mechanisms, including URLs and E.164 numbers.

SIP only understands URL-style addresses.

PSTN Interworking

H.323 borrows from traditional PSTN protocols, e.g., Q.931, and is therefore well suited for PSTN integration. However, H.323 does not employ the PSTN's circuit-switched technology--like SIP, H.323 is completely packet-switched. How Media Gateway Controllers fit into the overall H.323 architecture is well-defined within the standard.

SIP has no commonality with the PSTN and such signaling must be "shoe-horned" into SIP. SIP has no architecture that describes the decomposition of the gateway into the Media Gateway Controller and the Media Gateways.

Loop Detection

Yes, routing gatekeepers can detect loops by looking at the CallIdentifier and destinationAddress fields in call-processing messages. If the combination of these matches an existing call, it is a loop.

Yes, the SIP message Via header facilitates this. However, there has been talk about deprecating Via as a means of loop detection due to its complexity. Instead, the Max-Forwards header seems to be the preferred method of limiting hops and therefore loops.

Minimum Ports for VoIP Call

5 (Call signaling, 2 RTP, and 2 RTCP.)

5 (Call signaling, 2 RTP, and 2 RTCP.)

Video and Data Conferencing

H.323 fully supports video and data conferencing. Procedures are in place to provide control for the conference as well as lip synchronization of audio and video streams.

SIP has limited support for video and no support for data conferencing protocols like T.120. SIP has no protocol to control the conference and there is no mechanism within SIP for lip synchronization.
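
The two approaches in the Loop Detection row can be contrasted with a small Python sketch. It is a toy model under stated assumptions, not an implementation of either standard: the class and function names are hypothetical, the gatekeeper keeps its calls in a plain in-memory set, and the SIP side models only the hop counter, not the Via-based check.

```python
class RoutingGatekeeper:
    """Toy model of the H.323 check: a (CallIdentifier, destinationAddress)
    pair that matches an existing call is treated as a loop."""

    def __init__(self):
        self.active_calls = set()

    def admit(self, call_identifier, destination_address):
        """Return True if the call is new, False if it looks like a loop."""
        key = (call_identifier, destination_address)
        if key in self.active_calls:
            return False
        self.active_calls.add(key)
        return True


def forward_sip_request(max_forwards):
    """Return the decremented Max-Forwards value, or None when the request
    must not be forwarded (a real proxy would answer 483 Too Many Hops)."""
    if max_forwards <= 0:
        return None
    return max_forwards - 1


def hops_before_rejection(initial_max_forwards):
    """Count how many times a request caught in a loop is forwarded
    before the hop limit stops it."""
    hops = 0
    remaining = initial_max_forwards
    while True:
        remaining = forward_sip_request(remaining)
        if remaining is None:
            return hops
        hops += 1


gatekeeper = RoutingGatekeeper()
print(gatekeeper.admit("call-1", "alice@example.org"))  # True: first time seen
print(gatekeeper.admit("call-1", "alice@example.org"))  # False: same pair again
print(hops_before_rejection(3))                         # 3
```

The sketch shows the trade-off described in the table: the gatekeeper check needs per-call state to recognize a loop, while Max-Forwards needs no state but only bounds the number of hops instead of identifying the loop itself.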


Revised: September 01, 2007, Copyright MATRIX, UAB