Mobile phones are still phones. 5G devices need the ability to fall back to LTE when network voice support isn’t available. Here’s how and why.
Although 5G is marketed as the main driver for enhanced data services with its eMBB, URLLC, and mMTC services, voice and video remain key elements to subscribers. Indeed, GSMA estimates that the number of worldwide voice subscriptions will increase by 1.2 billion in 2025 compared to 2020. Operators must, therefore, offer an increasing amount of voice services.
This article presents the technical details of how voice services can be incorporated in a 5G network. It describes IP multimedia subsystem (IMS) support in a 5G system, presents the various deployment options, and explains interim solutions where a device camps in a 5G network but voice services are transferred to legacy technology. The devil, of course, is in the details.
Old vs. new technologies
Telecom networks have evolved from circuit-switched 2G networks, with an initial focus on telephony, to fully packet-switched 4G networks focused on internet data communications. Yet, voice and video services incorporate many technology features. To quote the common proverb “the devil is in the details.” There will not be just one single technical solution in the 5G system offering voice services.
Due to 5G’s extended flexibility and various network deployment scenarios, operators need to adapt their service introduction scheme to the underlying infrastructure scenarios. To put it simply: two major circumstances influence the methodology of introducing voice services into 5G.
First, we need to consider the radio access network (RAN) within the 5G system — whether 5G new radio (NR) is offered in addition to LTE as non-standalone access (NSA, or option 3 deployment) or whether there is a 5G standalone (SA mode, or option 2 deployment) network. To go further into the details, the NSA mode includes network deployment options offering dual-connectivity scenarios where either LTE is the primary radio access technology (EN-DC) or 5G is the primary radio access technology (NE-DC).
The second question is which type of core network is used, Evolved Packet Core (EPC) or the 5G core (5GC), and whether the operator will offer voice services over it. In a dual-connectivity scenario, there can be a voice service restriction indicated by the radio access technology (RAT). This description concentrates on voice or speech services, though 5G may certainly offer video or other communication services, e.g. Rich Communication Services (RCS). These are managed in a very similar way to the voice services. A marginal difference is the support of emergency services. From a signaling perspective, a network distinguishes between an emergency voice call and a general voice call. Regarding protocol and transport, emergency and general voice calls are handled in a similar way, except for quality of service (QoS) profiles, but a network may indicate the support of both services as separate offerings.
There is a small difference between legacy networks and a 5G network offering voice services, as the latter exchanges connection parameters and service access policies during the registration procedure. The user equipment (UE) indicates its capabilities to the network. In the reverse direction, the network offers subscribed services, i.e. voice or video calls, to the UE. In detail, the offering of voice services can be described as a per-UE policy. The network offers its services during the registration procedure in the registration accept message (the attach accept message in LTE) and not as a general system information indication to all devices. The main reason is to sustain a high level of flexibility, especially with respect to the types of UEs. For example, there may be a machine-type device without voice capability. By contrast, the indication that a network supports emergency services is broadcast via system information. Thus, depending on legal requirements, an anonymous emergency call could be supported without a subscriber identity module (SIM card).
Voice over NR is voice over IP incorporating the IP multimedia subsystem (IMS) infrastructure previously introduced in LTE. Its advantage is a management and orchestration system that guarantees QoS for each application from an end-to-end perspective, as opposed to VoIP provided via a traffic-channel-only approach. The purpose of IMS is the establishment, control, and maintenance of a protocol data unit (PDU) session, including all relevant data bearers with the corresponding QoS flows, for the best end-user quality of experience.
The network must establish at least two data bearers: one for the content, i.e. the speech packets containing the encoded audio itself, and a second bearer for IMS signaling. As in VoLTE, there is a major difference between voice over IMS in a 5G system (5GS) and voice services offered by external applications, e.g. so-called over-the-top (OTT) speech services. This is because OTT speech may operate transparently to the connectivity network and there is no IMS management to ensure QoS. This raises the question: how to connect IMS to the 5G core representing the next generation network?
For certain reasons such as time-to-market acceleration, stepwise network deployments, disaggregation of network entities, and the coexistence with legacy technologies, there is no single 5G deployment scenario. The following section will shed some light on the plethora of 5G deployment options supporting voice services (Figure 1).
The evolutionary paths describe whether, in an NSA connection, voice is supported by Evolved Universal Terrestrial Radio Access (E-UTRA) only, and whether the simultaneous NR data connection is sustained or suspended. This option is referred to as the voice over LTE in EN-DC setup. Evolved Packet System (EPS) fallback describes the scenario where the 5GC does not offer voice services. When needed, the voice call is transferred to an EPS connection (VoLTE), which also includes a RAT change from 5G NR to LTE. The advantage is that the UE camps on 5G NR and the handover to the legacy network is executed only when a voice call is set up.
Another fallback mode is RAT fallback. The assumption in this mode is that the core network supports voice connections, but the current RAT, presumably NR, does not. When that occurs, a voice connection transfers from NR to E-UTRA, representing a RAT change only. Voice over NR (VoNR) indicates a scenario where the NR network does support voice services and the 5GC offers a connection to IMS. The primary deployment focus of VoNR is standalone operation (SA) where the 5GC connects to IMS supporting voice services. VoNR also works in non-standalone (NSA) operation modes like E-UTRA and NR dual connectivity (EN-DC).
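The four options above can be summarized as a small decision function. This is only a sketch of the prose description: the names `VoiceOption` and `select_voice_option` and the three boolean inputs are illustrative assumptions, and the NSA branch is simplified to voice over LTE even though VoNR may also work in NSA modes.

```python
from enum import Enum


class VoiceOption(Enum):
    VONR = "Voice over NR"
    RAT_FALLBACK = "RAT fallback (NR to E-UTRA, 5GC kept)"
    EPS_FALLBACK = "EPS fallback (NR to LTE, 5GC to EPC)"
    VOLTE_EN_DC = "Voice over LTE in EN-DC"


def select_voice_option(standalone: bool,
                        core_supports_voice: bool,
                        nr_supports_voice: bool) -> VoiceOption:
    """Pick a voice path from the deployment facts (hypothetical helper)."""
    if not standalone:
        # NSA (option 3): LTE anchors the connection, so voice stays on E-UTRA.
        return VoiceOption.VOLTE_EN_DC
    if not core_supports_voice:
        # 5GC offers no voice: both the core and the RAT move to the legacy system.
        return VoiceOption.EPS_FALLBACK
    if not nr_supports_voice:
        # 5GC supports voice but the current RAT does not: change the RAT only.
        return VoiceOption.RAT_FALLBACK
    # Both NR and 5GC support voice and 5GC connects to IMS.
    return VoiceOption.VONR


if __name__ == "__main__":
    print(select_voice_option(True, False, True).value)
```

A real network makes this decision from signaled capabilities and operator policy, not from three flags; the function only makes the distinctions between the options explicit.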
5G supports the multimedia telephony service for IMS (MTSI), representing the application layer. The media flow consists of audio, video and “text” (here corresponding to general data such as images, text, websites, etc.), enabling modern collaboration and communication tools. To support QoS, the real-time transport protocol (RTP), the real-time streaming protocol (RTSP) and the RTP control protocol (RTCP) coordinate the media transport and handle impairments such as delayed, reordered, or misdirected packets. The transport and network layers are realized by the well-known protocols TCP, UDP and IP (IPv4 and IPv6). The RAT functions are provided by either E-UTRA or 5G NR. The session initiation protocol (SIP) and the session description protocol (SDP) handle the control plane of the voice connection. Figure 2 also contains the network protocols DHCP and DNS, as they offer supplementary services, e.g. home operator services.
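To see what this protocol stack costs per speech packet, the header sizes can simply be added up. The sketch below assumes the minimal fixed headers (12 bytes RTP, 8 bytes UDP, 20 bytes IPv4 or 40 bytes IPv6) with no options, extensions, or header compression; the 33-byte payload and the 20 ms packet interval are illustrative values, not figures from this article.

```python
RTP_HEADER_BYTES = 12   # fixed RTP header, no CSRC list or extension
UDP_HEADER_BYTES = 8
IP_HEADER_BYTES = {"ipv4": 20, "ipv6": 40}  # base headers without options


def voip_packet_bytes(payload_bytes: int, ip_version: str = "ipv6") -> int:
    """Size of one speech packet at the IP layer."""
    return (payload_bytes + RTP_HEADER_BYTES + UDP_HEADER_BYTES
            + IP_HEADER_BYTES[ip_version])


def ip_bitrate_kbps(payload_bytes: int, packet_interval_ms: float = 20.0,
                    ip_version: str = "ipv6") -> float:
    """Bit rate at the IP layer when one packet is sent per interval."""
    packets_per_second = 1000.0 / packet_interval_ms
    return voip_packet_bytes(payload_bytes, ip_version) * 8 * packets_per_second / 1000.0


if __name__ == "__main__":
    print(voip_packet_bytes(33, "ipv6"))   # 93 bytes, of which 60 are headers
    print(ip_bitrate_kbps(33, 20.0, "ipv6"))
```

With these assumptions the headers are larger than the speech payload, which is one reason efficient use of radio resources matters so much for voice.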
IMS supporting voice services in 5G
Support for IMS services, including network interfaces, protocol layers, and signaling scenarios, is a prerequisite for voice services offered in 5G. To enforce QoS, a so-called QoS flow is established between the UE and the network, characterized by parameters such as latency, priority, packet error rate, and guaranteed bit rate. To reduce signaling overhead, 5G assigns each QoS flow a 5G QoS identifier (5QI), a single value that stands for a predefined set of these parameters. All protocol layers and network functions are aware of this 5QI. The recommended 5QI profiles are: 5QI = 1 for conversational voice, 5QI = 2 for conversational video, 5QI = 5 for IMS signaling and, optionally, 5QI = 6 to 9 for concurrent media flows with lower QoS requirements.
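The recommended profiles can be pictured as a lookup table keyed by 5QI. The following sketch is illustrative only: the class and function names are invented, and the priority, delay budget, and error rate values are quoted from the standardized 5QI table in 3GPP TS 23.501 as the editor recalls it, so they should be verified against the specification before use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QosProfile:
    five_qi: int
    service: str
    guaranteed_bit_rate: bool
    priority_level: int          # lower value means higher priority
    packet_delay_budget_ms: int
    packet_error_rate: float


# Values assumed from the standardized 5QI table; verify against TS 23.501.
PROFILES = {
    1: QosProfile(1, "conversational voice", True, 20, 100, 1e-2),
    2: QosProfile(2, "conversational video", True, 40, 150, 1e-3),
    5: QosProfile(5, "IMS signaling", False, 10, 100, 1e-6),
}


def flows_for_call(with_video: bool = False) -> List[QosProfile]:
    """QoS flows for one IMS call: signaling first, then the media flows."""
    wanted = [5, 1] + ([2] if with_video else [])
    return [PROFILES[five_qi] for five_qi in wanted]


if __name__ == "__main__":
    for profile in flows_for_call(with_video=True):
        print(profile.five_qi, profile.service, profile.packet_delay_budget_ms, "ms")
```

Because only the 5QI number has to be signaled, the network and the UE can agree on a full parameter set without exchanging each value.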
Because the network treats voice as an application in a 5G system, there are no mandatory protocol-layer configurations; the settings described here are better read as recommendations. Voice is more sensitive to latency than to reliability, and efficient use of radio resources and energy consumption also play a pivotal role in a voice connection. Semi-persistent scheduling allows a quasi-constant allocation of guaranteed-bit-rate radio resources with low signaling overhead. Additionally, slot aggregation automatically repeats a speech packet, which increases reliability while keeping latency low. Energy consumption is reduced by discontinuous reception and transmission (DRX and DTX). The priority of latency over reliability also shows in two further recommendations: set the radio link control (RLC) layer to unacknowledged mode, and skip the integrity check at the Packet Data Convergence Protocol (PDCP) layer, leaving only ciphering enabled for security.
Following the trend toward high-quality audio, 3GPP developed the enhanced voice services (EVS) speech codec, which is now mandatory for 5G voice. EVS continues the tradition of the adaptive multi-rate (AMR) family of link-adaptive speech codecs. To meet the demand for better audio quality and to carry audio beyond speech, such as music, EVS uses the higher data rates offered by 5GS. Technically, EVS increases the audio bandwidth and covers the audible frequency range from 20 Hz to 20 kHz, corresponding to the typical range of the human ear. To convert the analog audio signal into a digital signal, EVS applies known methods such as amplitude quantization and discrete sampling. Compared to older-generation speech codecs, EVS provides finer quantization levels and a higher sample rate. One important aspect of EVS is its interoperability mode, which allows the codec to operate at legacy voice codec rates as well, enabling a smooth introduction of VoNR.
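The link between audio bandwidth, sample rate, and quantization can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 20 kHz upper limit comes from the text above, while the 16-bit sample width, the 20 ms frame length, and the 13.2 kbit/s rate are illustrative values that are not stated in this article and should be checked against the EVS specification.

```python
def min_sample_rate_hz(max_audio_hz: float) -> float:
    """Nyquist criterion: sample at least twice the highest audio frequency."""
    return 2.0 * max_audio_hz


def quantization_levels(bits_per_sample: int) -> int:
    """Number of amplitude steps available with a given sample width."""
    return 2 ** bits_per_sample


def frame_payload_bits(bitrate_kbps: float, frame_ms: float = 20.0) -> int:
    """Encoded bits per speech frame: kbit/s multiplied by ms gives bits."""
    return round(bitrate_kbps * frame_ms)


if __name__ == "__main__":
    print(min_sample_rate_hz(20_000))   # 40000.0 Hz needed to cover 20 kHz
    print(quantization_levels(16))      # 65536 levels (assumed 16-bit samples)
    print(frame_payload_bits(13.2))     # 264 bits, i.e. 33 bytes per frame
```

Covering the full audible range therefore requires a sample rate above 40 kHz, well beyond the 8 kHz sampling of narrowband telephony.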
Regarding the infrastructure architecture, the introduction of voice services requires some adaptation, and the flexible architecture provides new optional interfaces and functions. Firstly, the operator needs to decide which core network is incorporated and if it should support voice services. To put it simply, this leads to the decision of offering either EPS fallback or VoNR. Secondly, the core network EPS or 5GC needs to be connected to IMS via several interfaces to exchange user and signaling data. Through those interfaces, the various network entities communicate with each other. As there is no default 5G system, several entities and several interfaces can be deployed optionally, but their existence or absence may have an impact on the UE behavior.
- The N6 interface provides the data transfer between 5GC and IMS. In the 5G system, the N6 interface is already used to exchange data between 5GC and an external data network. With the introduction of voice services, the N6 interface is extended to connect to a further data network, namely the IMS.
- If both core networks, EPS and 5GC, are deployed, the N26 interface may carry signaling information between the EPS mobility management entity (MME) and the 5GC access and mobility management function (AMF). If this interface is signaled as present, the UE uses a single registration procedure, as the two core network entities coordinate mobility and registration.
- The S5 interface allows the exchange and coordination of user data between the session management function (SMF) and the user plane function (UPF) on the 5GC side and the serving gateway (SGW) on the EPS side.
- A common home subscriber server (HSS) allows the coordination of subscription profiles and access policies.
Voice over New Radio (VoNR)
VoNR describes the routing and connection control of EVS-encoded speech packets over IP using the 5G NR radio interface and the 5GC core network. Figure 3 depicts the protocol architecture of VoNR, which incorporates the IMS as described previously. A major objective is the provisioning of voice services in standalone operation of 5G, although VoNR is not restricted to 5G SA. One advantage of VoNR is the ability to use the sophisticated QoS support offered by the 5G protocol layers for the applications “voice and video”. A small drawback is that 5G may not offer the same coverage as LTE from day one of operation. Consequently, meticulous planning and deployment with overlapping coverage areas is recommended to avoid dropped calls.
EPS and RAT fallback
You may consider EPS fallback or RAT fallback as interim deployment scenarios to provide voice services in an early time-to-market approach. They do not require the full incorporation of the 5G core network. During connection setup, the call is forwarded to an incumbent LTE network. This is done through the signaling procedure of either a handover command or a channel release command containing a redirection indication. The decision to transfer the call to the legacy network can either be taken by the network during call setup or be indirectly requested by the UE. The latter is the case when a UE signals its support of voice services during the registration procedure, but confines this support to LTE only. Consequently, the UE in idle mode camps on the higher-prioritized 5G network and only moves to LTE for a speech call (Figure 4).
EPS fallback represents a change of two connections. With respect to the connection to the core network, voice packets and protocols switch from the 5GC to the EPS. With respect to the radio interface connection, a handover from 5G NR to LTE takes place. RAT fallback maintains the connection to the 5GC but changes the RAT from 5G NR to LTE. A third possible implementation of voice services incorporates an enhancement of the existing LTE base station architecture. The legacy eNB is extended to a next generation eNB (ng-eNB). The ng-eNB uses the 5G PDCP protocol layer instead of the LTE PDCP protocol layer, but the underlying radio protocols are still LTE-based. Compared to a VoLTE connection, the advantage of this approach is an end-to-end voice connection that sustains the QoS profile; otherwise, a mapping from LTE QoS to 5G QoS is required within the network.
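The difference between the two fallback types comes down to which parts of the connection change. The sketch below models a connection as a core network plus a RAT; the `Connection` class and the helper names are illustrative assumptions, not terms from any specification.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass(frozen=True)
class Connection:
    core: str  # "5GC" or "EPC"
    rat: str   # "NR" or "E-UTRA"


def eps_fallback(conn: Connection) -> Connection:
    """EPS fallback: both the core network and the RAT change."""
    return replace(conn, core="EPC", rat="E-UTRA")


def rat_fallback(conn: Connection) -> Connection:
    """RAT fallback: the 5GC connection is kept, only the RAT changes."""
    return replace(conn, rat="E-UTRA")


def changed_fields(before: Connection, after: Connection) -> List[str]:
    """Names of the connection parts that differ between two states."""
    return [f.name for f in fields(before)
            if getattr(before, f.name) != getattr(after, f.name)]


if __name__ == "__main__":
    camped = Connection(core="5GC", rat="NR")
    print(changed_fields(camped, eps_fallback(camped)))  # ['core', 'rat']
    print(changed_fields(camped, rat_fallback(camped)))  # ['rat']
```

After a RAT fallback the state is 5GC with E-UTRA, which corresponds to the ng-eNB case described above, where LTE radio is connected to the 5G core.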
Testing voice services
Testing voice services in 5G typically starts with a basic verification of proper implementation and functional behavior. Put simply, the questions are whether a call can be established and whether the voice signal is audible. These are the first test and measurement questions to be answered, followed by an enhanced analysis that determines the quality of the audio under well-known and reproducible conditions. Besides device-oriented voice testing, mobile network testing and benchmarking of deployed services in a live network verify the quality that users actually experience.
A setup for voice quality testing includes mobile radio testing capability supporting signaling and functional testing, as well as enhanced protocol procedures. These include interoperability, multi-connectivity and mobility scenarios (Figure 5). To investigate the proper audio quality, a test setup may also use audio quality test equipment, with either digital or analog interfaces to the mobile radio tester. To enable stress tests, a test setup may allow the activation of fading on the radio interface and may emulate network impairments like IP-packet disordering, delay or discarded packets.
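The network impairments mentioned above (discarded, delayed, and reordered IP packets) can be sketched in software. The function below is a toy model, not how a mobile radio tester works internally: the name `impair`, its parameters, and the uniform jitter distribution are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

Packet = Tuple[int, float]  # (sequence number, time in ms)


def impair(packets: List[Packet], loss_prob: float, base_delay_ms: float,
           jitter_ms: float, seed: Optional[int] = None) -> List[Packet]:
    """Drop, delay, and reorder packets.

    Input times are send times; output times are arrival times, and the
    result is sorted by arrival so that large jitter causes reordering.
    """
    rng = random.Random(seed)  # a fixed seed makes the test run reproducible
    delivered = []
    for seq, sent_ms in packets:
        if rng.random() < loss_prob:
            continue  # packet discarded
        arrival_ms = sent_ms + base_delay_ms + rng.uniform(0.0, jitter_ms)
        delivered.append((seq, arrival_ms))
    delivered.sort(key=lambda packet: packet[1])
    return delivered


if __name__ == "__main__":
    stream = [(seq, 20.0 * seq) for seq in range(10)]  # one packet every 20 ms
    for seq, arrival in impair(stream, loss_prob=0.1, base_delay_ms=30.0,
                               jitter_ms=60.0, seed=1):
        print(seq, round(arrival, 1))
```

Seeding the random generator mirrors the lab requirement of reproducible conditions: the same seed yields the same pattern of losses and delays on every run.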
In addition to the RAT technologies 5G and LTE, such a setup may also support legacy RAT such as 2G or 3G and non-cellular technologies like Bluetooth or Wi-Fi, as these technologies also offer voice services. A technological aspect not discussed here is voice over non-3GPP technologies.
An obvious requirement of a voice over 5G test setup is the capability to emulate the IMS network and its signaling protocols: SIP, SDP and data provisioning. The audio quality is typically expressed as a mean opinion score (MOS) value, derived from algorithms such as the perceptual objective listening quality analysis (POLQA) algorithm published by the ITU. The advantage of a lab-based test setup is that tests are repeatable and run under predefined, reproducible conditions.
To monitor the quality of certain applications like voice or video and to fulfil the KPI requirements, field or drive testing is necessary. Here, a passive device like a scanner is extended by a device that can actively set up a connection, and analysis on the application quality can be performed. In addition, network operators may like to compare their network quality in a benchmarking process against other networks or monitor the entire network via multiple samples and a statistical analysis to obtain a summarized view.
For a more detailed explanation of VoNR, see White paper: 5G Voice over New Radio (VoNR).
Reiner Stuhlfauth holds a graduate engineer’s degree in telecommunication from the University of Kaiserslautern. He started his career with a position as Network Planning Engineer at a German network operator. In 1999 he joined Rohde & Schwarz as trainer for wireless communication standards and took the position of Technology Manager in 2015. Reiner is one of five co-authors of a book on 5G technology, published by Rohde & Schwarz.