Frequently Asked Question


Why SIP TLS and SRTP are important

Last Updated about a month ago

VoIP calls are made up of two separate parts:

Signalling – the messages that set up, manage and end a call
Media – the actual voice audio carried during the call

In many older or poorly configured VoIP systems, both are sent unencrypted. This means anyone with access to the network path, whether through a compromised switch, a Wi-Fi network, a service provider segment, or traffic captured at a firewall, can potentially inspect or reconstruct the call.

Using SIP over TLS for signalling and SRTP for media is the standard way to reduce that exposure.

How unencrypted VoIP traffic is normally sent

A typical insecure SIP call uses:

SIP over UDP or TCP, usually on port 5060, for signalling
RTP, usually on a dynamic UDP port range, for voice media

In this model:

The phone or PBX sends SIP registration messages to the provider
The phone, PBX and SIP server exchange call setup messages such as INVITE, 100 Trying, 180 Ringing and 200 OK
The SDP information inside the SIP messages tells each side which IP address, codec and RTP port to use
Voice packets are then sent as plain RTP between endpoints or via a media relay

Without encryption, both the signalling and the audio can be exposed.
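
As a rough illustration of how much the SDP gives away, the sketch below pulls the media address, RTP port and codecs out of a plain-text INVITE. The sample message, addresses and the function name parse_sdp are invented for this example, and the parser only handles the simple IPv4 audio case.

```python
# Hypothetical capture of a plain-text SIP INVITE (all names and addresses are made up).
SAMPLE_INVITE = (
    "INVITE sip:201@pbx.example.com SIP/2.0\r\n"
    "From: <sip:200@pbx.example.com>;tag=abc\r\n"
    "To: <sip:201@pbx.example.com>\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=- 1 1 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 14000 RTP/AVP 0 8\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
)


def parse_sdp(message: str) -> dict:
    """Return the media address, RTP port and codecs advertised in the SDP body."""
    # The SDP body follows the first blank line of the SIP message.
    _, _, body = message.partition("\r\n\r\n")
    info = {"address": None, "port": None, "codecs": []}
    for line in body.splitlines():
        if line.startswith("c=IN IP4 "):
            info["address"] = line.split()[-1]
        elif line.startswith("m=audio "):
            info["port"] = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            # "a=rtpmap:0 PCMU/8000" yields "PCMU"
            info["codecs"].append(line.split()[1].split("/")[0])
    return info


print(parse_sdp(SAMPLE_INVITE))
```

Anyone who can read the INVITE gets the same three facts a phone needs to send audio: where, which port, and which codec.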

What is exposed when SIP is not encrypted

Plain SIP reveals a large amount of useful information to anyone who can capture the traffic.

Commonly exposed details include:

Extension numbers
Internal and external telephone numbers
Caller ID
Dialled numbers
SIP usernames
Authentication attempts
IP addresses of phones, PBXs and providers
Call start and end times
Codec details
RTP destination ports
Network layout and device information
User-Agent strings showing phone make, model and firmware

In some cases, if insecure authentication methods or weak configurations are used, an attacker may also gain information useful for:

Password guessing
Account enumeration
SIP registration hijacking
Toll fraud
Targeted attacks against specific devices or PBX software versions
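
To make the list above concrete, here is a minimal sketch that reads the interesting headers straight out of a captured plain-text REGISTER. The message, the phone name "ExamplePhone" and the helper exposed_headers are all made up for illustration; a real capture would contain more headers and variations.

```python
# Hypothetical plain-text REGISTER as it might appear in a packet capture.
SAMPLE_REGISTER = (
    'REGISTER sip:pbx.example.com SIP/2.0\r\n'
    'Via: SIP/2.0/UDP 192.0.2.25:5060;branch=z9hG4bK776\r\n'
    'From: "Reception" <sip:200@pbx.example.com>;tag=1928\r\n'
    'To: <sip:200@pbx.example.com>\r\n'
    'Contact: <sip:200@192.0.2.25:5060>\r\n'
    'User-Agent: ExamplePhone X1 1.2.3\r\n'
    'Authorization: Digest username="200", realm="pbx.example.com", '
    'nonce="abc123", uri="sip:pbx.example.com", '
    'response="0123456789abcdef0123456789abcdef"\r\n'
    'Content-Length: 0\r\n'
    '\r\n'
)

# Headers that reveal identities, devices and authentication attempts.
INTERESTING = ("from", "to", "contact", "user-agent", "authorization")


def exposed_headers(message: str) -> dict:
    """Collect the headers an observer would look at first."""
    head, _, _ = message.partition("\r\n\r\n")
    found = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in INTERESTING:
            found[name.strip().lower()] = value.strip()
    return found


for name, value in exposed_headers(SAMPLE_REGISTER).items():
    print(f"{name}: {value}")
```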

What is exposed when RTP is not encrypted

Plain RTP exposes the actual voice stream.

If a third party can capture the RTP packets, they may be able to:

Reconstruct the audio of the call
Listen live or play it back later
Analyse speech content
Feed the captured audio into transcription tools
Extract names, account details, card details, business information and other sensitive content

Because RTP is designed for real-time delivery rather than confidentiality, it does not protect the voice data by itself.
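
The fixed RTP header is only 12 bytes and everything after it is the codec payload in the clear. The sketch below, using a hypothetical helper parse_rtp_header, shows how little work is needed to separate the two. It ignores header extensions for brevity.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Split a plain RTP packet into header fields and payload.

    Simplified: header extensions (the X bit) are not handled.
    """
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    header_len = 12 + 4 * csrc_count  # each CSRC entry is 4 bytes
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,  # 0 = PCMU, 8 = PCMA
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }


# Made-up packet: version 2, payload type 0 (PCMU), 160 bytes of audio.
example = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + b"\xff" * 160
fields = parse_rtp_header(example)
print(fields["version"], fields["payload_type"], len(fields["payload"]))
```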

How easy it is to intercept unencrypted VoIP traffic

Intercepting unencrypted VoIP traffic is often far easier than many organisations expect.

It may be possible in any of the following situations:

On an untrusted or shared Wi-Fi network
On a flat internal LAN without proper segmentation
On a compromised endpoint or switch
On a poorly secured remote worker connection
At a firewall or router where packet captures are taken
Anywhere traffic traverses infrastructure not fully under the organisation’s control

Once packet capture is available, standard network analysis tools can identify SIP and RTP very quickly. Because the SDP in SIP usually discloses where the RTP stream will be sent, an observer can often follow the signalling and then locate the voice stream with minimal effort.

In practical terms, plain SIP and RTP are usually considered easy to inspect by anyone with moderate network knowledge and access to the traffic path.
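
As a simplified illustration of why identification is quick, the sketch below classifies a UDP payload using two crude checks. The function name and the rules are assumptions made for this example; real analysis tools use far more thorough dissection.

```python
SIP_METHODS = (b"INVITE", b"REGISTER", b"ACK", b"BYE", b"CANCEL", b"OPTIONS")


def classify_udp_payload(payload: bytes) -> str:
    """Very rough guess at whether a UDP payload is SIP or RTP."""
    first_line = payload.split(b"\r\n", 1)[0]
    # SIP responses start with "SIP/2.0 ", requests end with " SIP/2.0".
    if first_line.startswith(b"SIP/2.0 "):
        return "sip"
    if first_line.startswith(SIP_METHODS) and first_line.endswith(b" SIP/2.0"):
        return "sip"
    # RTP packets carry version 2 in the top two bits of the first byte.
    if len(payload) >= 12 and payload[0] >> 6 == 2:
        return "maybe-rtp"
    return "unknown"


print(classify_udp_payload(b"INVITE sip:201@pbx.example.com SIP/2.0\r\n\r\n"))
print(classify_udp_payload(bytes([0x80, 0x00]) + bytes(10)))
```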

How calls can be reconstructed from captured traffic

A captured unencrypted call can often be reconstructed in a straightforward way:

Capture SIP traffic

The observer reads the call setup messages
The called number, caller identity and negotiated media details are visible

Read the SDP payload

The SDP identifies:
IP addresses
UDP ports for RTP
Codecs in use

Capture the RTP streams

The observer filters traffic for the advertised RTP ports
The packets are assembled into audio in each direction

Decode the codec

Common codecs such as PCMU, PCMA, G.722 or G.729 can be decoded if the capture tool supports them

Replay or save the call

The result may be a complete or near-complete recording of the conversation

This is why unencrypted VoIP should not be treated as private merely because it is “internal” or “digital”. In many ways VoIP is easier to intercept and store than plain old two-wire analogue voice.
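
The decode and save steps can be sketched for the simplest case, PCMU (G.711 µ-law). The helper names are invented for this example, and it assumes the payloads from one direction have already been extracted with their sequence numbers. Lost packets and sequence number wrap-around are ignored.

```python
import io
import struct
import wave


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def payloads_to_wav(packets: list[tuple[int, bytes]]) -> bytes:
    """Turn (sequence number, PCMU payload) pairs into WAV file bytes."""
    # Put packets back in order; wrap-around at 65535 is not handled here.
    ordered = sorted(packets, key=lambda p: p[0])
    samples = [ulaw_to_pcm16(b) for _, payload in ordered for b in payload]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # one direction of the call
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(8000)   # PCMU is sampled at 8 kHz
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


# Two made-up 20 ms packets, deliberately out of order.
wav_bytes = payloads_to_wav([(2, b"\xff" * 160), (1, b"\x80" * 160)])
print(len(wav_bytes), "bytes of WAV data")
```

Nothing here needs a key or a password, which is the whole problem with plain RTP.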

Why SIP over TLS is important

SIP over TLS protects the signalling layer by encrypting SIP messages in transit.

This means that an observer can no longer easily read:

Telephone numbers
Call setup messages
Registration traffic
Authentication exchanges
SDP details
Device and PBX metadata

Benefits of SIP over TLS include:

Confidentiality of signalling

Call setup information is encrypted rather than sent in plain text.

Reduced credential exposure

Registration and authentication traffic is better protected against interception.

Protection against simple call metadata harvesting

Attackers cannot easily collect dialled numbers, extension details and device information from passive captures.

Improved server identity validation

TLS certificates help the client verify that it is talking to the expected server rather than an impostor.
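
The server identity check is only useful if the client actually performs it. The sketch below uses Python's standard ssl module to show a client-side setup that keeps certificate and hostname verification switched on. The function names are made up for this example, port 5061 is the conventional SIP over TLS port, and a real phone or PBX would have its own equivalent settings.

```python
import socket
import ssl


def make_sip_tls_context() -> ssl.SSLContext:
    """Client context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context()  # verification is on by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_sip_tls(host: str, port: int = 5061, timeout: float = 5.0) -> ssl.SSLSocket:
    """Open a verified TLS connection to a SIP server (host is supplied by the caller)."""
    ctx = make_sip_tls_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # The handshake fails if the certificate does not match the hostname.
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


ctx = make_sip_tls_context()
print(ctx.verify_mode, ctx.check_hostname)
```

Disabling verification to "make it work" removes the impostor protection while leaving the appearance of security.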

What SIP TLS does not do

SIP over TLS protects the signalling, but it does not automatically encrypt the voice media.

If a system uses:

SIP over TLS + plain RTP

then:

The SIP messages are protected
The actual voice stream may still be intercepted if the RTP path is visible

For full call protection, media encryption is also required.
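
One quick way to see which case applies is the transport profile on the SDP media line: RTP/AVP indicates plain RTP, while RTP/SAVP indicates SRTP. The sketch below is a simplified check with an invented function name. It only looks at the m= line, so it would not recognise setups that offer encryption in other ways.

```python
def media_protection(sdp: str) -> dict:
    """Map each media type in an SDP body to 'plain', 'srtp', 'dtls-srtp' or 'unknown'."""
    result = {}
    for line in sdp.splitlines():
        if not line.startswith("m="):
            continue
        # "m=audio 14000 RTP/AVP 0 8" gives media, port, profile
        media, _port, profile = line[2:].split()[:3]
        if profile in ("RTP/SAVP", "RTP/SAVPF"):
            result[media] = "srtp"
        elif profile.startswith("UDP/TLS/RTP/SAVP"):
            result[media] = "dtls-srtp"
        elif profile in ("RTP/AVP", "RTP/AVPF"):
            result[media] = "plain"
        else:
            result[media] = "unknown"
    return result


print(media_protection("v=0\r\nm=audio 14000 RTP/AVP 0 8\r\n"))
print(media_protection("v=0\r\nm=audio 14000 RTP/SAVP 0 8\r\n"))
```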

Why SRTP is important

SRTP stands for Secure Real-time Transport Protocol. It protects the voice media itself.

With SRTP enabled:

RTP audio is encrypted
Packet integrity is checked
Replay attacks are harder
Intercepted media cannot be readily converted into intelligible audio without the session keys

Benefits of SRTP include:

Voice confidentiality

The conversation content is protected in transit.

Media integrity

Packets cannot be altered as easily without detection.

Reduced replay risk

Captured packets are less useful for replay attacks.

Protection against straightforward call reconstruction

A packet capture alone is not enough to rebuild the call audio unless the attacker also obtains the encryption material.
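
The integrity check can be illustrated on its own. The sketch below is not a working SRTP implementation: it leaves out payload encryption, key derivation and replay tracking, and the function names are invented. It only shows the idea of an 80-bit HMAC-SHA1 tag appended to each packet, as used by the common AES_CM_128_HMAC_SHA1_80 suite.

```python
import hashlib
import hmac

TAG_LEN = 10  # 80-bit authentication tag


def _tag(auth_key: bytes, packet: bytes, roc: int) -> bytes:
    # The tag covers the packet plus a 32-bit rollover counter (ROC).
    mac = hmac.new(auth_key, packet + roc.to_bytes(4, "big"), hashlib.sha1)
    return mac.digest()[:TAG_LEN]


def add_tag(auth_key: bytes, packet: bytes, roc: int = 0) -> bytes:
    """Append the authentication tag to a packet."""
    return packet + _tag(auth_key, packet, roc)


def verify_tag(auth_key: bytes, protected: bytes, roc: int = 0) -> bool:
    """Check the tag; any change to the packet makes this fail."""
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    return hmac.compare_digest(tag, _tag(auth_key, packet, roc))


key = b"k" * 20  # made-up key for the example
protected = add_tag(key, b"\x80\x00" + bytes(10) + b"voice")
tampered = bytes([protected[0] ^ 1]) + protected[1:]
print(verify_tag(key, protected), verify_tag(key, tampered))
```

Without the key an attacker cannot produce a valid tag, so altered or forged packets are dropped by the receiver.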

Why SIP TLS and SRTP should be used together

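For secure VoIP, signalling and media must both be protected. SIP over TLS alone leaves the audio exposed, and SRTP alone leaves the call metadata readable, along with any media keys carried in the signalling.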
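
One common way of exchanging SRTP keys is SDES, where the key material travels inside the SDP as an a=crypto line. Assuming that method, the sketch below shows that anyone who can read the signalling can read the media key. The helper name and the randomly generated example line are made up for illustration. Other key exchange methods, such as DTLS-SRTP, work differently.

```python
import base64
import os


def parse_crypto_line(line: str) -> dict:
    """Parse 'a=crypto:<tag> <suite> inline:<base64 key and salt>[|...]'."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto line")
    tag, suite, key_params = line[len(prefix):].split()[:3]
    method, _, value = key_params.partition(":")
    if method != "inline":
        raise ValueError("unsupported key method")
    key_b64 = value.split("|")[0]  # drop optional lifetime and MKI fields
    return {
        "tag": int(tag),
        "suite": suite,
        "key_and_salt": base64.b64decode(key_b64),
    }


# Made-up key material: 16-byte key plus 14-byte salt for AES_CM_128_HMAC_SHA1_80.
example_material = os.urandom(30)
example_line = (
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:"
    + base64.b64encode(example_material).decode()
)

parsed = parse_crypto_line(example_line)
print(parsed["suite"], len(parsed["key_and_salt"]), "bytes of key material")
```

If that line crosses the network in plain SIP, the SRTP stream it protects can be decrypted by whoever captured it.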

Important limitation: encryption in transit is not the same as end-to-end encryption

SIP TLS and SRTP generally protect traffic in transit between systems that support it, such as:

Phone to PBX
PBX to SIP provider
Phone to hosted platform

However, the call may still be decrypted and re-encrypted at intermediate systems such as:

SBCs
Hosted PBX platforms
SIP carriers
Media relays
Recording systems

This means SIP TLS and SRTP provide strong transport security, but not always true end-to-end encryption across the whole call path.
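
The hop-by-hop nature of this protection can be modelled very simply. The sketch below is a toy model with invented names, not a description of any particular provider: each hop is encrypted or not, and every system between two hops can see the call in the clear.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    signalling_tls: bool
    media_srtp: bool


def summarise_path(hops: list[Hop]) -> dict:
    """Report which hops lack protection and whether the call is end-to-end."""
    unprotected = [
        h.name for h in hops if not (h.signalling_tls and h.media_srtp)
    ]
    return {
        "all_hops_encrypted": not unprotected,
        "unprotected_hops": unprotected,
        # With more than one hop, an intermediate system decrypts and re-encrypts.
        "end_to_end": len(hops) == 1 and not unprotected,
    }


# Hypothetical call path for illustration.
path = [
    Hop("phone -> PBX", True, True),
    Hop("PBX -> SIP provider", True, True),
    Hop("SIP provider -> PSTN", False, False),
]
print(summarise_path(path))
```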

Some SIP trunking providers support encryption along the whole path; others support it at the proxy but drop it after that. The BT PSTN, and its digital equivalent IPStream, is NOT encrypted, and all voice traffic on it is unprotected. BT will tell you this is to reduce latency, but the truth is far less palatable.

So SRTP will protect calls between phones internally and externally, calls via SIP clients on mobiles, and inter-company calls. It *may* protect calls between clients on the same SIP trunking provider, but beyond that it is not possible to be sure.

As more companies move to SIP and configure their phone systems to support encryption, more of your voice traffic will be encrypted. The rule is that a call gets the lowest level of encryption supported by both ends: if both support it, encryption is solid; if only one supports it, there will be no encryption.

The Future

There are proposals and protocols to encrypt the actual voice data, but they all suffer from weak points along the path of trust. To scramble voice data so that it cannot be decrypted along the way, a 'key' must be shared beforehand, and that is hard to do without manually sharing keys between phone systems. With SIP over TLS, the encryption keys are shared in the signalling session, but if the signalling cannot be encrypted, the keys cannot safely be sent that way.

For now, SIP over TLS and SRTP are the best we can do. They certainly provide robust protection on site and between endpoints and, if your SIP trunking provider supports them, over a fair part of the path.