Good practices for using SIP

Last modified by Simon Morlat on 2024/11/15 17:13

Introduction

A SIP client or server generally has a huge number of parameters, which provide maximum flexibility to adapt to various use cases. For a classic deployment, with mobile or desktop apps working over the public internet, what are the best practices and the most recommended settings? This article presents the Linphone team's advice.

Use SIP/TLS

Whether or not security matters to you, SIP/TLS is definitely the best option. The other transports have painful drawbacks:

  • SIP/UDP and SIP/TCP are unsafe because of SIP-aware NAT routers. Many Wi-Fi routers, home routers and DSL routers have built-in NAT logic that handles SIP specifically. While it was originally designed to work around some common address translation issues (in Via and Contact headers, in SDP bodies, etc.), this logic has proven to be mostly broken. We have seen routers reboot because they crashed while handling a SIP message. The only solution is to encrypt the SIP messages, so that such routers do nothing specific with them beyond basic IP packet routing. Linphone and Flexisip have their own built-in NAT-aware algorithms and don't need any router assistance at all. SIP/TLS is also the best solution to encrypt SIP in Linphone and Flexisip.
  • SIP/UDP suffers from the network MTU. Although IP fragmentation exists for large UDP datagrams, fragments are very often unsupported or dropped by internet routers. As a result, there is a high risk that any SIP packet above 1440 bytes will never reach its destination. An INVITE or 200 OK bigger than the MTU is extremely frequent: if you enable more codecs, enable ICE or add a video stream, the SDP part of the INVITE grows very significantly.
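To get a feel for the MTU problem, here is a minimal Python sketch that measures the on-the-wire size of a SIP message and compares it with the 1440-byte figure quoted above. The function names, the sample INVITE and the artificially long list of ICE candidates are all made up for illustration; a real offer built by Linphone will differ.

```python
# The figure quoted in the text above; not a universal constant.
UDP_SAFE_LIMIT = 1440


def sip_message_size(start_line, headers, body=""):
    """Return the size in bytes of a SIP message once serialized."""
    body_bytes = body.encode("utf-8")
    lines = [start_line]
    lines += [f"{name}: {value}" for name, value in headers]
    lines.append(f"Content-Length: {len(body_bytes)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return len(head.encode("utf-8")) + len(body_bytes)


def fits_in_one_datagram(size, limit=UDP_SAFE_LIMIT):
    """True if a message of this size stays at or under the chosen limit."""
    return size <= limit


# Illustrative INVITE; every address and identifier here is invented.
HEADERS = [
    ("Via", "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds"),
    ("From", "<sip:alice@example.org>;tag=1928301774"),
    ("To", "<sip:bob@example.org>"),
    ("Call-ID", "a84b4c76e66710"),
    ("CSeq", "1 INVITE"),
    ("Contact", "<sip:alice@192.0.2.10:5060>"),
    ("Content-Type", "application/sdp"),
]
AUDIO_ONLY_SDP = "\r\n".join([
    "v=0",
    "o=alice 1 1 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 7078 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
]) + "\r\n"
# Pad the offer with many candidate lines to simulate a richer SDP.
EXTRA_LINES = [
    f"a=candidate:{i} 1 UDP 2130706431 192.0.2.{i} {7078 + i} typ host"
    for i in range(1, 31)
]
RICH_SDP = AUDIO_ONLY_SDP + "\r\n".join(EXTRA_LINES) + "\r\n"

if __name__ == "__main__":
    for label, sdp in (("audio only", AUDIO_ONLY_SDP), ("rich offer", RICH_SDP)):
        size = sip_message_size("INVITE sip:bob@example.org SIP/2.0", HEADERS, sdp)
        print(label, size, "bytes, fits:", fits_in_one_datagram(size))
```

The point of the sketch is only that the SDP body dominates the size: the headers stay the same while the body grows with every codec, candidate or stream added.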

Forget about keep-alives for mobile apps, use push notifications

SIP keepalive packets (\r\n\r\n), whose primary goal is to keep the connection between client and server alive across NAT routers, are a bad practice on mobiles. They drain the battery by forcing the phone to wake up so that the app can handle this traffic. Fortunately, phone manufacturers and mobile OSes have solved the problem by automatically killing apps in the background, or at least cutting their access to the network. Push notifications must be used to wake up apps each time they need to receive a message or an INVITE.

Use ICE

Unless your SIP server is a PSTN gateway, or a back-to-back user agent that needs to read the media streams established between clients, ICE coupled with a media relay service such as TURN is the only way to guarantee that RTP packets are delivered efficiently between clients. Don't rely on STUN alone, and don't expect clients to know up front which IP address and port they will use for media. This is not possible.
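When debugging, one quick check is whether an SDP offer actually carries relay candidates in addition to host and server-reflexive ones. The Python sketch below counts candidate types in "a=candidate" lines. The helper names and the sample SDP fragment are hypothetical, and the parsing relies only on the "typ" keyword of the ICE candidate attribute.

```python
from collections import Counter


def candidate_types(sdp):
    """Count ICE candidates per type (host, srflx, relay, ...) in an SDP."""
    counts = Counter()
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=candidate:"):
            continue
        fields = line.split()
        # The candidate type follows the literal keyword "typ".
        if "typ" in fields:
            idx = fields.index("typ")
            if idx + 1 < len(fields):
                counts[fields[idx + 1]] += 1
    return counts


def has_relay_candidate(sdp):
    """True if at least one relay (TURN) candidate is offered."""
    return candidate_types(sdp)["relay"] > 0


# Invented sample; addresses come from documentation ranges.
SAMPLE_SDP = """\
m=audio 7078 RTP/AVP 0
a=candidate:1 1 UDP 2130706431 192.0.2.10 7078 typ host
a=candidate:2 1 UDP 1694498815 203.0.113.7 41000 typ srflx raddr 192.0.2.10 rport 7078
a=candidate:3 1 UDP 16777215 198.51.100.20 50000 typ relay raddr 203.0.113.7 rport 41000
"""

if __name__ == "__main__":
    print(dict(candidate_types(SAMPLE_SDP)))
    print("relay offered:", has_relay_candidate(SAMPLE_SDP))
```

An offer with only host and srflx candidates corresponds to the "STUN alone" situation warned against above: it works on friendly networks and fails on the others.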

Enable AVPF if you use video

Because video codecs use differential coding, the loss of one packet makes not only the whole video frame undecodable, but also all subsequent frames that depend on it, until a key-frame (a frame coded independently of the others) is received. This is why even a 0.1% loss rate can result in video being frozen for up to 30 seconds if the encoder outputs key-frames at a 30-second interval.
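A back-of-the-envelope calculation shows why such a low loss rate still matters. The Python sketch below estimates the probability that at least one packet is lost between two key-frames. It assumes independent packet losses and a hypothetical video stream of 100 packets per second; neither assumption comes from measurements.

```python
def p_loss_within(loss_rate, packets_per_second, seconds):
    """Probability that at least one packet is lost during the interval,
    assuming each packet is lost independently with probability loss_rate."""
    packet_count = packets_per_second * seconds
    return 1.0 - (1.0 - loss_rate) ** packet_count


if __name__ == "__main__":
    # 0.1% loss, hypothetical 100 packets/s, 30 s between key-frames.
    p = p_loss_within(0.001, 100, 30)
    print(f"chance of at least one loss per key-frame interval: {p:.0%}")
```

Under these assumptions, roughly 95% of key-frame intervals contain at least one lost packet, so without a recovery mechanism nearly every interval ends with a stretch of frozen or corrupted video.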

Sending key-frames at a high rate is not a good solution: key-frames are much bigger than other frames, so keeping bitrate usage reasonable means severely degrading quality.

AVPF (RFC 4585) adds some very useful RTCP feedback messages called PLI (Picture Loss Indication), SLI (Slice Loss Indication) and RPSI (Reference Picture Selection Indication), which allow fast error recovery of video when packet transmission errors occur. A video receiver, upon detecting a lost packet, can send an RTCP PLI packet to the remote encoder to request a new key-frame, which allows the decoding process to resume shortly.

The alternative is the legacy SIP INFO request containing a VFU (Video Fast Update) request, which is horribly slow because it is carried through the whole SIP infrastructure over TCP or TLS, while RTP/RTCP is often exchanged directly over UDP between clients.

AVPF is enabled by default in Liblinphone, and is visible in the "a=rtcp-fb" attributes of the SDP messages. To be effective, both caller and callee must support it, and if the server is a back-to-back user agent, it must support it as well.
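To check from a trace whether AVPF feedback is being negotiated, one can look for these attributes in the SDP. The Python sketch below collects the "a=rtcp-fb" values per payload type. The helper names and the sample video section are made up for illustration and are not actual Liblinphone output.

```python
def rtcp_feedback(sdp):
    """Map each payload type (or '*') to the set of a=rtcp-fb values."""
    prefix = "a=rtcp-fb:"
    result = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Syntax: a=rtcp-fb:<payload type or *> <feedback value>
        payload_type, _, value = line[len(prefix):].partition(" ")
        result.setdefault(payload_type, set()).add(value.strip())
    return result


def supports_pli(sdp):
    """True if any payload type advertises PLI feedback ("nack pli")."""
    return any("nack pli" in values for values in rtcp_feedback(sdp).values())


# Invented sample video section.
SAMPLE_SDP = """\
m=video 9078 RTP/AVPF 96
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 nack pli
a=rtcp-fb:96 ccm fir
"""

if __name__ == "__main__":
    print(rtcp_feedback(SAMPLE_SDP))
    print("PLI advertised:", supports_pli(SAMPLE_SDP))
```

Running the same check on both the offer and the answer shows whether the feature survived negotiation end to end, including through any back-to-back user agent in the path.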

You can't expect to experience reliable video quality without AVPF.

Use DNS names, not IP addresses

It is a best practice to have the SIP server use its public DNS name rather than its IP address in SIP messages, for example in Record-Route headers.

Record-Route headers are typically inserted by SIP proxy servers to force an established dialog to use the same hop-by-hop path between the two clients, which is good for NAT traversal and authentication of requests.

When a Record-Route header directly contains an IP address, this creates several problems:

  • the client won't be able to match the CN or SubjectAltNames of the TLS X.509 certificate against the hostname in the SIP URI. Most of the time, X.509 certificates do not assert IP addresses, just DNS names.
  • if the network switches during a call from a pure IPv6 network to a pure IPv4 network, or vice versa, the client won't be able to reconnect the call (with a re-INVITE procedure) because the IP address targeted by the Record-Route is no longer routable through the new network.
  • on IPv6 networks equipped with a NAT64 router, IPv4 addresses are not directly routable; the client must obtain the equivalent IPv6 address from the DNS server, which requires a domain name.
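A simple way to audit a server for this issue is to scan the Record-Route headers of a captured message for IP literals. The Python sketch below does so with the standard ipaddress module. The function names, the simplified URI parsing and the sample header values are assumptions for illustration and do not cover the full SIP URI grammar.

```python
import ipaddress
import re


def route_hosts(record_route_value):
    """Extract the host part of each SIP URI in a Record-Route value."""
    hosts = []
    for uri in re.findall(r"<sips?:([^>]+)>", record_route_value):
        hostport = uri.split(";", 1)[0]          # drop URI parameters such as ;lr
        hostport = hostport.rsplit("@", 1)[-1]   # drop userinfo, if any
        if hostport.startswith("["):             # IPv6 reference: [addr]:port
            hosts.append(hostport[1:hostport.index("]")])
        else:
            hosts.append(hostport.split(":", 1)[0])
    return hosts


def is_ip_literal(host):
    """True if host is an IPv4 or IPv6 address rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def ip_literals_in(record_route_value):
    """Return the hosts in a Record-Route value that are raw IP addresses."""
    return [h for h in route_hosts(record_route_value) if is_ip_literal(h)]


if __name__ == "__main__":
    # Invented header values using documentation address ranges.
    good = "<sips:sip.example.org:5061;lr>"
    bad = "<sip:203.0.113.5:5060;transport=tcp;lr>, <sip:[2001:db8::1]:5060;lr>"
    print(ip_literals_in(good))
    print(ip_literals_in(bad))
```

Any non-empty result points at a proxy that should be reconfigured to advertise its DNS name instead.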