Frequently Asked Question

Why SIP TLS and SRTP are important
Last Updated 4 hours ago

VoIP calls are made up of two separate parts:

  • Signalling – the messages that set up, manage and end a call
  • Media – the actual voice audio carried during the call

In many older or poorly configured VoIP systems, both are sent unencrypted. This means anyone with access to the network path, for example through a compromised switch, a Wi-Fi network, a service provider segment, or traffic captured at a firewall, can potentially inspect or reconstruct the call.

Using SIP over TLS for signalling and SRTP for media is the standard way to reduce that exposure.

How unencrypted VoIP traffic is normally sent

A typical insecure SIP call uses:

  • SIP over UDP or TCP, usually on port 5060, for signalling
  • RTP, usually on a dynamic UDP port range, for voice media

In this model:

  • The phone or PBX sends SIP registration messages to the provider
  • The SIP server sends call setup messages such as INVITE, 100 Trying, 180 Ringing and 200 OK
  • The SDP information inside the SIP messages tells each side which IP address, codec and RTP port to use
  • Voice packets are then sent as plain RTP between endpoints or via a media relay

Without encryption, both the signalling and the audio can be exposed.
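
To illustrate how much the SDP gives away, here is a minimal Python sketch that reads the media details out of a plain-text SDP body. The SDP sample, its addresses and ports, and the function name parse_sdp are made up for illustration; real SDP can contain more line types than are handled here.

```python
# Hypothetical SDP body as it might appear inside an unencrypted SIP INVITE.
# The address and port are documentation examples, not real systems.
SDP_BODY = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_sdp(sdp: str) -> dict:
    """Extract the media address, RTP port, profile and codec names from SDP text."""
    info = {"address": None, "port": None, "profile": None, "codecs": []}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            info["address"] = line[2:].split()[2]
        elif line.startswith("m=audio"):
            # m=<media> <port> <profile> <payload types...>
            parts = line[2:].split()
            info["port"] = int(parts[1])
            info["profile"] = parts[2]
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <codec>/<clock rate>
            info["codecs"].append(line.split(None, 1)[1].split("/")[0])
    return info


media = parse_sdp(SDP_BODY)
print(media)
```

Anyone who can read the signalling can obtain the same three facts (where the audio goes, on which port, and in which codec) with a few lines of code.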

What is exposed when SIP is not encrypted

Plain SIP reveals a large amount of useful information to anyone who can capture the traffic.

Commonly exposed details include:

  • Extension numbers
  • Internal and external telephone numbers
  • Caller ID
  • Dialled numbers
  • SIP usernames
  • Authentication attempts
  • IP addresses of phones, PBXs and providers
  • Call start and end times
  • Codec details
  • RTP destination ports
  • Network layout and device information
  • User-Agent strings showing phone make, model and firmware
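
As a rough illustration of the list above, the following Python sketch pulls a few of these details out of a captured plain-text SIP request. The message, the numbers, the hostnames and the device name are all invented, and exposed_details is a hypothetical helper rather than part of any SIP library.

```python
# Hypothetical INVITE as it would appear on the wire without TLS.
SIP_INVITE = (
    "INVITE sip:02079460000@sip.example.net SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: \"Reception\" <sip:201@sip.example.net>;tag=1928301774\r\n"
    "To: <sip:02079460000@sip.example.net>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 1 INVITE\r\n"
    "User-Agent: ExamplePhone T100 1.2.3\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def exposed_details(message: str) -> dict:
    """Return a few fields that a passive observer can read from plain SIP."""
    head = message.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    request_line, header_lines = lines[0], lines[1:]

    headers = {}
    for line in header_lines:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    method, uri, _version = request_line.split(" ", 2)
    return {
        "method": method,
        "dialled": uri,                      # dialled number
        "from": headers.get("from"),         # caller ID and extension
        "via": headers.get("via"),           # IP address of the phone or PBX
        "user_agent": headers.get("user-agent"),  # make, model and firmware
    }


details = exposed_details(SIP_INVITE)
for key, value in details.items():
    print(f"{key}: {value}")
```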

In some cases, if insecure authentication methods or weak configurations are used, an attacker may also gain information useful for:

  • Password guessing
  • Account enumeration
  • SIP registration hijacking
  • Toll fraud
  • Targeted attacks against specific devices or PBX software versions

What is exposed when RTP is not encrypted

Plain RTP exposes the actual voice stream.

If a third party can capture the RTP packets, they may be able to:

  • Reconstruct the audio of the call
  • Listen live or play it back later
  • Analyse speech content
  • Feed the captured audio into transcription tools
  • Extract names, account details, card details, business information and other sensitive content

Because RTP is designed for real-time delivery rather than confidentiality, it does not protect the voice data by itself.
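
The point can be seen in the packet layout itself. The sketch below parses the 12-byte fixed RTP header with Python's struct module; everything after the header is the codec payload, sent as-is. The sample packet is synthetic, and the parser is simplified: it skips CSRC entries but ignores the optional header extension.

```python
import struct


def parse_rtp(packet: bytes) -> dict:
    """Parse the fixed RTP header; the payload is returned untouched."""
    if len(packet) < 12:
        raise ValueError("packet too short to be RTP")
    # Network byte order: 2 flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    header_len = 12 + 4 * csrc_count  # simplification: header extension not handled
    return {
        "version": b0 >> 6,
        "marker": bool(b1 >> 7),
        "payload_type": b1 & 0x7F,   # 0 = PCMU, 8 = PCMA in the static table
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }


# Synthetic packet: version 2, payload type 0 (PCMU), 160 bytes of audio (20 ms at 8 kHz)
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD) + bytes([0xFF] * 160)
rtp = parse_rtp(sample)
print(rtp["version"], rtp["payload_type"], rtp["sequence"], len(rtp["payload"]))
```

Nothing in this structure hides the payload: the header tells the observer which codec is in use, and the bytes that follow are the audio.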

How easy it is to intercept unencrypted VoIP traffic

Intercepting unencrypted VoIP traffic is often far easier than many organisations expect.

It may be possible in any of the following situations:

  • On an untrusted or shared Wi-Fi network
  • On a flat internal LAN without proper segmentation
  • On a compromised endpoint or switch
  • On a poorly secured remote worker connection
  • At a firewall or router where packet captures are taken
  • Anywhere traffic traverses infrastructure not fully under the organisation’s control

Once packet capture is available, standard network analysis tools can identify SIP and RTP very quickly. Because the SDP in SIP usually discloses where the RTP stream will be sent, an observer can often follow the signalling and then locate the voice stream with minimal effort.

In practical terms, plain SIP and RTP are usually considered easy to inspect by anyone with moderate network knowledge and access to the traffic path.

How calls can be reconstructed from captured traffic

A captured unencrypted call can often be reconstructed in a straightforward way:

  1. Capture SIP traffic
     • The observer reads the call setup messages
     • The called number, caller identity and negotiated media details are visible
  2. Read the SDP payload
     • The SDP identifies:
        • IP addresses
        • UDP ports for RTP
        • Codecs in use
  3. Capture the RTP streams
     • The observer filters traffic for the advertised RTP ports
     • The packets are assembled into audio in each direction
  4. Decode the codec
     • Common codecs such as PCMU, PCMA, G.722 or G.729 can be decoded if supported
  5. Replay or save the call
     • The result may be a complete or near-complete recording of the conversation
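
The decode and save steps need very little code. As a minimal sketch, assuming the payloads have already been extracted from the RTP packets and are PCMU (G.711 mu-law) at 8 kHz, the following Python converts them to 16-bit PCM and wraps them in a WAV container. The function names and the three-byte sample input are illustrative only.

```python
import io
import struct
import wave


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def decode_ulaw(payload: bytes) -> list:
    return [ulaw_to_pcm16(b) for b in payload]


def to_wav(samples: list, rate: int = 8000) -> bytes:
    """Wrap mono 16-bit samples in a WAV container, returned as bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


# Illustrative input: silence, then the most negative and most positive values
pcm = decode_ulaw(bytes([0xFF, 0x00, 0x80]))
wav_bytes = to_wav(pcm)
print(pcm, len(wav_bytes))
```

In a real capture the payloads would be ordered by RTP sequence number first, one stream per direction.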

This is why unencrypted VoIP should not be treated as private merely because it is “internal” or “digital”. In many ways, VoIP is easier to intercept and store than plain old two-wire analogue voice.

Why SIP over TLS is important

SIP over TLS protects the signalling layer by encrypting SIP messages in transit.

This means that an observer can no longer easily read:

  • Telephone numbers
  • Call setup messages
  • Registration traffic
  • Authentication exchanges
  • SDP details
  • Device and PBX metadata

Benefits of SIP over TLS include:

  • Confidentiality of signalling

Call setup information is encrypted rather than sent in plain text.

  • Reduced credential exposure

Registration and authentication traffic is better protected against interception.

  • Protection against simple call metadata harvesting

Attackers cannot easily collect dialled numbers, extension details and device information from passive captures.

  • Improved server identity validation

TLS certificates help the client verify that it is talking to the expected server rather than an impostor.
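
The identity check only helps if the client actually validates the certificate. As a minimal sketch using Python's standard ssl module, the context below requires a valid certificate chain and a matching hostname. The function names are hypothetical, port 5061 is the conventional SIP over TLS port (an assumption about the deployment, since the server may use another), and the connection helper is defined but not called here.

```python
import socket
import ssl


def make_sip_tls_context(ca_file=None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


def open_sip_tls(host: str, port: int = 5061, ca_file=None) -> ssl.SSLSocket:
    """Open a verified TLS connection to a SIP server (not called in this sketch)."""
    ctx = make_sip_tls_context(ca_file)
    raw = socket.create_connection((host, port), timeout=5)
    # server_hostname drives both SNI and the certificate hostname check.
    return ctx.wrap_socket(raw, server_hostname=host)


ctx = make_sip_tls_context()
print(ctx.verify_mode, ctx.check_hostname)
```

A phone or PBX configured to skip certificate validation still gets encryption, but loses the protection against an impostor server.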

What SIP TLS does not do

SIP over TLS protects the signalling, but it does not automatically encrypt the voice media.

If a system uses:

SIP over TLS + plain RTP

then:

  • The SIP messages are protected
  • The actual voice stream may still be intercepted if the RTP path is visible

For full call protection, media encryption is also required.
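
One way to spot this mismatch is to look at the transport profile on the SDP media line: RTP/AVP indicates plain RTP, while profiles containing SAVP (for example RTP/SAVP) indicate SRTP. The Python sketch below applies that check; the SDP fragments and the function name media_encryption are invented for illustration.

```python
def media_encryption(sdp: str) -> dict:
    """Map each media type in the SDP to True if its profile indicates SRTP."""
    result = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <profile> <payload types...>
            media, _port, profile = line[2:].split()[:3]
            # "SAVP" appears in RTP/SAVP, RTP/SAVPF and UDP/TLS/RTP/SAVP(F)
            result[media] = "SAVP" in profile
    return result


# Hypothetical media lines: one plain RTP offer, one SRTP offer
plain_offer = "v=0\nc=IN IP4 192.0.2.10\nm=audio 49170 RTP/AVP 0 8\n"
secure_offer = "v=0\nc=IN IP4 192.0.2.10\nm=audio 49170 RTP/SAVP 0 8\n"

plain_result = media_encryption(plain_offer)
secure_result = media_encryption(secure_offer)
print(plain_result, secure_result)
```

If the signalling runs over TLS but this check reports plain RTP, the audio is still exposed on the media path.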

Why SRTP is important

SRTP stands for Secure Real-time Transport Protocol. It protects the voice media itself.

With SRTP enabled:

  • RTP audio is encrypted
  • Packet integrity is checked
  • Replay attacks are harder
  • Intercepted media cannot be readily converted into intelligible audio without the session keys

Benefits of SRTP include:

  • Voice confidentiality

The conversation content is protected in transit.

  • Media integrity

Packets cannot be altered as easily without detection.

  • Reduced replay risk

Captured packets are less useful for replay attacks.

  • Protection against straightforward call reconstruction

A packet capture alone is not enough to rebuild the call audio unless the attacker also obtains the encryption material.
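
To show why replayed packets are less useful, here is a simplified Python sketch of a sliding replay window of the kind SRTP receivers keep. It is an illustration only: the class name and window size are assumptions, the index is a plain integer rather than the full SRTP packet index, and a real implementation updates the window only after the packet has passed authentication.

```python
class ReplayWindow:
    """Track recently seen packet indexes and reject duplicates or stale ones."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = -1   # highest index accepted so far
        self.seen = 0       # bitmask: bit i set means (highest - i) was accepted

    def accept(self, index: int) -> bool:
        if index > self.highest:
            # Newer packet: slide the window forward and mark it as seen.
            shift = index - self.highest
            self.seen = ((self.seen << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False    # too old to track, treat as a replay
        if self.seen & (1 << offset):
            return False    # already accepted once
        self.seen |= 1 << offset
        return True         # late but new packet


window = ReplayWindow()
results = [window.accept(i) for i in (0, 1, 1, 5, 3, 3, 100, 30)]
print(results)
```

Plain RTP has no equivalent check, so a captured packet can be re-sent and will be played as if it were new.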

Why SIP TLS and SRTP should be used together

For secure VoIP, signalling and media must both be protected.


Important limitation: encryption in transit is not the same as end-to-end encryption

SIP TLS and SRTP generally protect traffic in transit between systems that support it, such as:

  • Phone to PBX
  • PBX to SIP provider
  • Phone to hosted platform

However, the call may still be decrypted and re-encrypted at intermediate systems such as:

  • SBCs
  • Hosted PBX platforms
  • SIP carriers
  • Media relays
  • Recording systems

This means SIP TLS and SRTP provide strong transport security, but not always true end-to-end encryption across the whole call path. 

Some SIP trunking providers support encryption along the path; others support it at the proxy but drop it after that. The BT PSTN, and its digital equivalent IPStream, is NOT encrypted and all voice traffic is unprotected. BT will tell you it's to reduce latency, but the truth is far less palatable.

So, SRTP will protect your calls between phones internally and externally, via SIP clients on mobiles, and on inter-company calls. It *may* protect calls between clients on the same SIP trunking provider, but beyond that it's not possible to be sure.

As more and more companies move to SIP and configure their phone systems to support encryption, more of your voice traffic will be encrypted. The rule is that a call gets the lowest level of encryption supported by both ends: if both support it, encryption is solid; if only one supports it, there will be no encryption.
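
The rule above, combined with the hop-by-hop nature of the protection, can be modelled in a few lines of Python. This is a toy model of the reasoning, not a description of any real negotiation code; the leg names and support flags are invented.

```python
def leg_is_encrypted(a_supports_srtp: bool, b_supports_srtp: bool) -> bool:
    """A leg is encrypted only when both of its ends support encryption."""
    return a_supports_srtp and b_supports_srtp


def first_unprotected_leg(legs):
    """Return the name of the first leg that falls back to plain RTP, or None."""
    for name, a_supports, b_supports in legs:
        if not leg_is_encrypted(a_supports, b_supports):
            return name
    return None


# Hypothetical call path: (leg name, near end supports SRTP, far end supports SRTP)
call_path = [
    ("phone to PBX", True, True),
    ("PBX to SIP provider", True, True),
    ("SIP provider to PSTN", True, False),
]

weak_point = first_unprotected_leg(call_path)
print(weak_point)
```

The call as a whole is only as private as its weakest leg, which is why the provider's support matters as much as your own configuration.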


The Future

There are proposals and protocols to encrypt the actual voice data end to end, but these all suffer from weak points along the path of trust. To scramble voice data so that it can't be decrypted, a 'key' must be shared beforehand. That's hard to do without manually sharing keys between phone systems. With SIP over TLS, the encryption keys are shared in the signalling session, but if we can't encrypt that, we can't send keys that way.

For now, SIP/TLS and SRTP are the best we can do. They certainly provide robust protection on site and between endpoints and, if your SIP trunking provider supports them, over a fair part of the path.
