The Utah VHF Society - Observations about the codec used for D-Star

Purpose of this page:

A few comments about this page:

With the continued interest in digital voice communications in amateur radio, we decided to run a few tests using D-Star radios to ascertain the behavior of the codec when subjected to sounds other than those of the human voice.

We felt it important to understand how the voice of a communicator - via either an analog or a digital transmission - might be affected in less-than-ideal conditions. Particularly in light of the recent re-emphasis on the use of amateur radio in emergency communications systems, we wanted to provide some characterization of how these transmissions might behave in such situations - for example, those where the speaker's voice may be in severe competition with other sounds.

It should be noted that the intent of this analysis was to provide a reference for those who might consider the implementation of a digital radio system: It should come as no surprise that the digital voice system used in D-Star is somewhat more "fragile" than an old-fashioned analog system. This not-unexpected result is a logical consequence of the "lossy" coding typical of low-rate speech-only codecs - the one used in D-Star being one of the better-performing codecs in this class.

As can be seen from this page, the codec was subjected to sounds that it was NOT intended to be able to handle (such as music) in order to observe how it would break down and to provide some insight into how it responds when presented with other, more-realistic situations.

In some ways, the results were rather surprising: Some listeners were, frankly, appalled at the results, but the opinion of many was along the lines of "That's better than I thought it would be..."

To be sure, being able to make sense out of a degraded transmission - either digital or analog - is a skill acquired through experience, practice, and training. It was noted that so-called "skilled" operators (e.g. those that regularly work pileups on HF and are rather used to picking fragments of voice out of chaos) were generally able to make out the gist of what was being said on both the analog and digital transmissions, but that the analog transmissions were noticeably more "copyable."

It was also noted that the un-skilled listener (a random person, unaccustomed to having to dig "speech fragments" out of such chaos or even a casual user of FM voice) had noticeably more difficulty deciphering the degraded digital speech than the analog.

Again, such a result was not surprising, once again showing that experience and practice are of paramount importance in any critical traffic-handling situation.

Brief mention has also been made on this page with reference to the degradation of signals due to the "digital cliff" - that is, the rather un-graceful drop-off in perceived quality that occurs when digital signals degrade below a certain point. Again, recognizing and knowing how to deal with these sorts of situations is another important facet of training and experience.

Finally, links have been provided to observations made by various public-service entities about digital systems (some of which use somewhat more sophisticated codecs at higher bit rates which are arguably more resistant to the demonstrated degradations) and analog systems (both trunked and un-trunked) and how they perform in a number of environments. While these observations may not always be directly applicable to many amateur-radio situations (e.g. trunking versus non-trunked systems, and the ability of the experienced amateur to arbitrarily choose a frequency, mode and signal path as necessary) they are well worth a read by any would-be system designer and emergency planner!

While I'm sure that at least some of this has already been hashed and re-hashed, revisiting this and related topics is likely to be worthwhile.

First off, let's make something absolutely clear to the reader:

This page is not intended to "bash" D-Star or its codec, but rather to educate the user about peculiarities intrinsic to the codec used by D-Star.

Some of the observations made on this page may not apply solely to D-Star, but to other low-rate digital voice codec/systems as well.

The codec used in D-Star is, by design, a lossy codec: That is, data reduction is accomplished by preserving only those fundamental characteristics of the human voice that are required to adequately reproduce it. As one might expect, with lower bit rates, this representation becomes less precise and, inevitably, deviations from the original source become increasingly obvious.

As digital voice systems become increasingly commonplace, the users of these systems must not lose sight of their limitations - particularly in comparison with traditional analog systems. These limitations become apparent when these codecs attempt to encode and replicate increasingly-complex sounds - specifically, when these other sounds are in competition with the human speaker: As the original voice sounds become "diluted" with extraneous noise, these codecs can fail, unable to make sense of what is being fed to them.

Particularly with low-rate codecs - such as the one used with D-Star - this becomes increasingly problematic as the increasingly-complex sounds can no longer be accurately represented in the limited bandwidth available. The result of this breakdown in encoding is that the intelligibility of the speech is further degraded - possibly to the point of unintelligibility - while the same speech, conveyed on an analog system, may still be understandable to the experienced listener.

The purpose of this page is to demonstrate the various ways in which both D-Star and analog signals are affected when the speaker is in competition with other sounds. By being aware of the nature of these complex interactions one may, through experience and training, be able to avoid (as much as possible) or at least deal with those situations in which intelligibility may be - or is being - compromised by other sounds: As any experienced communicator knows, such problems can seriously impede effective traffic-handling which, under the worst of conditions, can result in loss of life and property.

Basic principles of D-Star's codec:

It is fortunate that human speech is more-or-less composed of two different types of sounds:

Voiced sounds, such as vowels. These sounds are represented by a fundamental frequency (produced by our vocal cords) as well as a myriad of harmonics (shaped by interactions of our mouth, tongue and nasal cavities, for example). In addition to vowels, voiced consonants such as "M" are included in this group.
Unvoiced sounds (such as fricatives), found in many consonants. Many consonants consist of clicks or noises, such as the "K" or "S" sounds - and are, in essence, just bursts of spectrally-shaped noise. As with the voiced sounds, their timbre is altered by various resonant structures in our head such as the mouth, tongue and nasal cavities.

Note that some sounds (such as those represented by "D" or "B") contain both voiced and un-voiced components.

For speech-only use it is generally enough to optimize the codec to work based on the assumption that it will encounter only the above conditions. In doing so, one not only reduces the complexity of the codec (which can also reduce the cost of the hardware required to implement it) but also makes it more likely that such a codec - with an innate, limited repertoire of capabilities - can operate at very low bit rates. It is the implementation of this rather simplified acoustic model that causes these classes of codecs to be very poor at reproducing spectrally-complex sounds (those with non-harmonically-related content) such as music and other "non-voice" sounds.
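The two-class model described above can be illustrated with a minimal sketch. The Python below is NOT the D-Star codec's algorithm: it merely synthesizes one 20 millisecond frame of a "voiced" sound (a fundamental plus harmonics) and one of an "unvoiced" sound (plain random noise), then counts zero crossings as one crude way of telling the two apart. The function names, the 8 kHz sample rate and the harmonic amplitudes are all assumptions chosen for illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed for illustration


def voiced_frame(f0, n_harmonics=5, n_samples=160):
    """Sum of a fundamental and its harmonics (a crude 'vowel')."""
    frame = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        # Harmonic k sits at exactly k * f0, with a 1/k amplitude roll-off.
        frame.append(sum(math.sin(2 * math.pi * k * f0 * t) / k
                         for k in range(1, n_harmonics + 1)))
    return frame


def unvoiced_frame(n_samples=160, seed=0):
    """White noise (a crude 'S' sound, before any spectral shaping)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def zero_crossings(frame):
    """Count sign changes between adjacent samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
```

A 100 Hz "voiced" frame changes sign only a handful of times in 20 milliseconds, while the noise frame changes sign many times. A sound such as a musical chord, with several unrelated fundamentals at once, fits neither description well.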

Compander:

It should be mentioned that the D-Star codec includes another feature: A "compander." This device maintains a more-or-less constant audio level, bringing up the microphone gain if the speaker's voice is quiet, and reducing it if it is too loud. This is, in most cases, a desirable feature as it can greatly improve intelligibility - particularly if the speaker is a soft-spoken one. The caveat is that this same compander can also increase the level of background noise and contribute to "codec confusion," as we shall see.

Comment about the analog audio on the IC-91AD:

On the IC-91AD and, to my knowledge, all other Icom D-Star capable radios, there is no "compander" in the audio chain used for either "normal" (±5 kHz) or "narrow" (±2.5 kHz) deviation - only the usual limiter/clipper arrangement found in typical FM transmitters. It is for this reason that one must be careful in making A/B comparisons between D-Star and analog: If doing so, one must take into account the fact that the compander may "fix" - when using D-Star - an audio level that is inappropriately low for the FM analog modulator! It has been noted that, in some "A/B" comparisons between D-Star and FM found on the web, the orchestrators have been remiss in their attention to this particular (important) detail!

"Breaking" the codec - and subsequent analysis:

For further testing, audio files were constructed for the purpose of testing the effects of non-speech audio (as well as speech plus "other" sounds). To do this, the following setup was used to transmit and receive the test files:

One IC-91AD Handie-Talkie, used as a transmitter, was connected to a dummy load and set to low power.
For the transmitting radio, the test audio was input through the external microphone connector from an audio player on which uncompressed .WAV files were played. The audio level was set so that the peak deviation was just ±5 kHz when in analog ("FM") mode - that is, barely hitting the radio's clipper.
A second IC-91AD was placed next to the unit that was transmitting, tuned to the same frequency with a rubber-duck antenna connected. The audio output of this radio was connected to a digital audio recorder, with the resulting audio being recorded in an uncompressed .WAV format. In this configuration, signal quality was verified and found to be excellent.
The audio content was played (and recorded) twice:

First, using FM (±5 kHz) mode
Again using D-Star audio (DV) mode.

The two resulting audio files (one using FM and the other using D-Star) were synchronized (to remove the offset caused by D-Star's intrinsic processing delay) and merged into a single stereo audio file with the channels as follows:

LEFT Channel: D-Star audio

RIGHT Channel: FM audio

For this web page, the resulting files were subsequently edited and encoded as .MP3 at a bit rate high enough to avoid further audible contribution of compression artifacts.
For convenience, files containing only the D-Star audio or only the FM audio are also provided and these are about half the size of the corresponding stereo file.
There were no bit errors during reception of the D-Star transmissions: What is heard are simply artifacts of the codec itself!
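The merging step described above can be sketched as follows. This is only an illustration of the idea, not the tool actually used for these files: the function names are hypothetical, the samples are assumed to be 16-bit integers at a common sample rate, and the D-Star delay (in samples) is assumed to have been measured beforehand, as its actual value is not stated here.

```python
import struct
import wave


def merge_to_stereo(dstar, fm, dstar_delay):
    """Interleave two mono sample lists as left/right pairs.

    dstar_delay is the number of samples by which the D-Star recording
    lags the FM recording; that many samples are dropped from the start
    of the D-Star track so that the two channels line up.
    """
    aligned = dstar[dstar_delay:]
    n = min(len(aligned), len(fm))
    stereo = []
    for i in range(n):
        stereo.append(aligned[i])  # left channel: D-Star
        stereo.append(fm[i])       # right channel: FM
    return stereo


def write_stereo_wav(path, interleaved, sample_rate=8000):
    """Write interleaved 16-bit samples to an uncompressed stereo .WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))
```

Dropping samples from the later track is the simplest way to line the two up; the cost is that the merged file is shortened by the length of the delay.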

Important Notes relating to playback of the audio files:

When playing back the stereo file, remember that the left channel contains the D-Star audio and the right channel contains the FM audio.
In order to properly separate the two channels, it is strongly recommended that you either:

Wear headphones, listening with one ear at a time
Adjust the "balance" using the computer's mixer so as to hear just one channel at a time from the speakers/headphones.

Individual files are also available that contain only D-Star or FM audio.

Tests with music:

Please note: The authors are fully aware that the codec used in D-Star is not intended to be able to faithfully reproduce sounds other than speech.

Let's first try an acid test: Music. Unlike human speech, music need not have much harmonically-related content at all, with different notes of different amplitudes and timbres occurring all at once - not to mention the inclusion of both notes and noise (drums, cymbals, etc.) at the same time! To do this, a simple tune was pounded out on a synthesizer using a number of different instrument/note combinations. Another file contains a more complicated clip with multiple instruments and voices.

"Music file 1" was designed to have several distinct sections to demonstrate various properties of the codec:

(0:00-0:21) - Piano, single note melody with no reverb and minimal sustain.
(0:21-0:45) - Piano, single note melody with some reverb and sustain.
(0:45-1:11) - Piano, melody with (mostly) non-harmonically related chords.
(1:11-1:50) - Piano, single-note melody with bass and cymbal accompaniment.
(1:50-2:28) - Melody played by an "orchestra".
(2:28-3:30) - Melody carried by a "choir".

Music file 1 - (stereo, D-Star=left, FM=right, 3:30, MP3, 3.12 MB)

Music file 2 - (stereo, D-Star=left, FM=right, 0:30, MP3, 428 kB)

(FM-only file)
(D-Star only file)
"Music file 2" contains an excerpt from the song "Benson, Arizona" from the movie "Dark Star" by John Carpenter. This clip is used in accordance with "Fair Use" provisions of U.S. copyright laws.

(0:00-0:21) In this section, the melody is quite recognizable. It is interesting to note, however, that the attack and, to a lesser extent, the decay of the piano note are considerably altered. This alteration could be explained somewhat by the amplitude compressor intrinsic to D-Star, but much of the change in the dynamics of the "attack" is due to the rather coarse "frame rate" of the audio codec which - coupled with the very "lossy" nature of the compression - cannot respond to the quickly-changing properties of the piano.
(0:21-0:45) When the sustain is added, it becomes apparent that the built-in companding of the codec is considerably altering the attack and decay dynamics of the note. As notes are transitioned, it sounds less like a piano as the pitches of the notes tend to "slide" into each other during the transition, resembling more a sort of "sliding" wind instrument rather than a percussion instrument. Another interesting change from the original is that the dynamics (that is, the difference in loudness between notes) are pretty much lost.
(0:45-1:11) When a lower, non-harmonically-related note is added to the tune, the codec becomes extremely confused, seeming to "lock" onto the dominant note. Of course, with a single voice, this situation is not likely to arise as it is reasonable to expect only a single, strong fundamental to be present in the human voice. It is also interesting to note that, during the "overlap" of two notes, bursts of noise (not necessarily related to the actual notes) often appear.
(1:11-1:50) For this section a bass guitar and cymbal were added. At times, the codec "locks" onto the bass note instead of the piano melody - sometimes switching mid-note as the amplitude of one diminishes and the other becomes dominant and "captures" the codec. It is interesting to note, however, that the resulting note pitch through the codec is often just plain wrong! The most obvious example of this is in the first few seconds of the portion of the file where the bass note is solo: Comparing it to the FM version, you can hear that some of the notes aren't even close! (It's also interesting to hear what happens to the cymbal during the piece...) This "note inaccuracy" is intrinsic to the codec's finite spectral resolution: Again, considering the nature of the human voice, such alterations don't impair intelligibility, but do contribute to the somewhat "robotic" sound apparent in D-Star encoded speech - particularly with adult male speech.
(1:50-2:28) The "orchestra" represents a fairly complex sound with many non-harmonically-related components and, not surprisingly, the codec interprets these mostly as noise. When the melody comes along - with strong "horn" components - it takes over (or "captures") the codec and is (mostly) recognizable. The martial bass accompaniment, clearly audible in the FM version, sounds more like bursts of noise in this version. Again, this demonstrates the aspect of the codec in which it attempts to discern which audio components are likely to be those most important in the conveyance of information: Unlike with speech, there aren't readily apparent harmonic relationships between the lower-frequency components and their harmonics, so it is not surprising that the codec would "assume" that such a complex sound was more likely to be an unvoiced speech component.
(2:28-3:30) The "choir" of synthesized voices also has some significant non-harmonically-related components, but many (varying) harmonically-related ones as well. It is interesting to note the vacillation of the codec when trying to reproduce a single note, with it alternating between a recognizable representation of the original sound and an odd "buzzing" sound that is, itself, often related to the original note.

"Music 2"

This clip is even more complex, including strong single-note components in addition to non-harmonically-related notes and a multi-voiced chorus. Careful listening reveals many of the same characteristics that are apparent in the "Music 1" clip. Interestingly, much of the voice is still understandable despite obvious degradation, but we'll discuss why this is later on.

What does all of this show us? It would seem to demonstrate many of the properties of the codec that make it especially suitable for the low-bitrate representation of the human voice, but it also demonstrates that, if significant audio energy is present besides that of the human voice, it is likely to break down in some odd (but understandable) ways as the codec - which seems to be able to generate only one sound at a time - seems to be "captured" by the dominant (usually louder) sound(s).

Response of the codec to single-frequency tones:

Another interesting phenomenon - one that may be lost on those without "musical" ears - is that, with the D-Star music clips, many of the notes are rendered off-pitch. Without knowing the precise internals of the codec, one would have to guess that this is a result of the limited "spectral" resolution of the codec itself, which has only finite resolution when it comes to reproducing frequencies in a precise manner. To be sure, accurate preservation of absolute frequencies is not particularly important in casual, human speech and the deficiencies of the codec in this regard become readily apparent only when music is played.

This leads to an interesting question: How does this codec react when presented with audio containing only a single frequency? To find out, a number of files were created and run through the codec:

Discrete tones - (stereo, D-Star=left, FM=right, 0:26, MP3, 414 KB)

(FM-only file)
(D-Star only file)
This file contains the following 5-second segments with these tones:

150 Hz
440 Hz
1000 Hz
837.5 Hz
1234.375 Hz
328.125 Hz

Slow sweep - (stereo, D-Star=left, FM=right, 2:01, MP3, 1.85 MB)

(FM-only file)
(D-Star only file)
This file consists of a slow sweep from 300 Hz to 2.5 kHz over a period of 120 seconds.

Analysis:

The "FM" (right channel) audio is provided mostly as a basis of comparison.

The "D-Star" (left channel) audio does show some interesting properties:

In the "Discrete Tones" file - especially if one is listening to the "stereo" version, in which both tones are heard simultaneously, on speakers rather than headphones - one can tell that there is, in fact, a difference in tone frequencies between the two channels, the difference varying, probably dependent upon how far away the tone is from the discrete step in the codec's resolution. The 150 Hz tone, which is below the intended frequency range of both the codec and the audio circuitry of the radio, is barely detectable - if at all.

In order to do additional testing a sound file was created with a slow, phase-continuous frequency change from 300 to 2500 Hz over a period of 120 seconds. In this case, one can clearly hear the discrete frequency steps as the codec reproduces the tones. In certain places, one can hear a "glitch" as the codec's "indecision" seems to, briefly, cause the tone to degrade from a sine wave to more of a burst of noise: Again, these are not the result of bit errors or extraneous noise, but rather peculiarities of the codec.
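A phase-continuous sweep of the sort described above can be generated with a short sketch like the following. The function name and the 8 kHz default sample rate are assumptions, and the sweep is assumed to be linear in frequency (the shape of the sweep actually used is not stated here).

```python
import math


def linear_sweep(f_start, f_end, duration, sample_rate=8000):
    """Generate a phase-continuous linear sine sweep as floats in [-1, 1]."""
    n_samples = int(duration * sample_rate)
    phase = 0.0
    samples = []
    for n in range(n_samples):
        # Instantaneous frequency rises linearly from f_start toward f_end.
        freq = f_start + (f_end - f_start) * n / n_samples
        samples.append(math.sin(phase))
        # Accumulating the phase sample by sample (rather than computing
        # sin(2*pi*f*t) directly) avoids jumps as the frequency changes.
        phase += 2 * math.pi * freq / sample_rate
    return samples
```

Calling `linear_sweep(300, 2500, 120)` would produce the equivalent of the 120-second test signal. Because the phase never jumps, any "glitches" heard after such a signal has passed through the codec cannot be blamed on the test signal itself.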

How about if we change the tone at a faster rate?

Fast sweep - (stereo, D-Star=left, FM=right, 1:01, MP3, 953 kB)

(FM-only file)
(D-Star only file)
This file consists of a fast sweep from 300 Hz to 2.5 kHz over a period of 1 second.

Clearly, something different is happening here: The codec is not responding to this as a series of single-frequency tones and the result is that we hear an odd-sounding ascending noise sequence that somewhat resembles the original sweep. The breakdown of the codec isn't too surprising, as the "dwell time" of the tone on any single frequency is too short for the codec to consider it to be a single tone.

This leads one to ask how long in duration the tone needs to be in order for it to be rendered in a way that differs relatively little from the original? At the time of writing, we haven't run any tests to determine this, but if one looks carefully at the transitions between the tones in the above clips, one can tell that it is short - but definitely finite!

Comment: Considering that D-Star's codec has a 50 Hz audio "frame rate" (that is, a 20 millisecond frame period), it is likely that the dwell time of a tone would have to be a discrete multiple of that period to be successfully detected and rendered. This would also imply that the codec can change the sound that it is producing only at discrete multiples of that same period - that is, no sound could change faster than 50 times per second - probably slower! (With the codec's base coding rate of 2400 bits/sec - not including forward error correction - that would imply that there are only 48 bits per "frame" of sound, some of which are likely to be overhead rather than actual representations of the sounds being encoded.)
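The arithmetic in the comment above, applied to the two sweep files, can be worked through in a few lines. The bit rate and frame rate are those quoted in the text; the helper name is hypothetical, and the sweeps are assumed to be linear in frequency.

```python
BIT_RATE = 2400    # bits/s of voice data, excluding FEC (from the text)
FRAME_RATE = 50    # frames/s (from the text)

bits_per_frame = BIT_RATE // FRAME_RATE   # 48 bits
frame_ms = 1000 / FRAME_RATE              # 20.0 milliseconds


def hz_per_frame(f_start, f_end, seconds):
    """How far a linear sweep moves in frequency during one codec frame."""
    return (f_end - f_start) / seconds / FRAME_RATE


slow = hz_per_frame(300, 2500, 120)   # roughly 0.37 Hz per frame
fast = hz_per_frame(300, 2500, 1)     # 44 Hz per frame
```

Under these assumptions the tone in the slow sweep moves by about a third of a hertz per frame and so looks essentially steady to the codec, whereas in the fast sweep it moves by 44 Hz within a single frame. That is more than the 31.25 Hz tone resolution quoted from the data sheet in the DTMF section below.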

Testing with DTMF tones:

At this point, one may wonder about DTMF tones and other types of signaling that might be passed via the D-Star codec?

According to available data sheets, the codec contains some built-in utilities that allow the detection and regeneration of certain types of tone-signaling, such as DTMF, which permits their precise transmission and subsequent reproduction. Provision is made for the detection and regeneration of single-frequency tones from 156.25 to 3812.5 Hz with a resolution of 31.25 Hz. The latter would certainly explain some of the behavior that has been observed. Although not known for certain, it seems likely that the frequency-determining portions of the codec's algorithm use fixed-size frequency steps (akin to the "bin size" of an FFT, for example); if this is the case, a 256-point FFT is implied, as the data sheet indicates that the codec's sampling rate is 8 kHz (8000 / 256 = 31.25).

If this were the case for all frequencies, this would imply that the codec would be unable to pass DTMF signaling, as a resolution of just 31.25 Hz is not enough to maintain industry specifications of frequency accuracy. Also, as seen from our music tests, the codec is unable to produce two simultaneous tones! Again, reference to the codec's data sheet tells us something else: There is a built-in decoder for DTMF as well as some other common "call-progress" tones (such as ringing, busy, dial tone, etc.) that might be encountered on a telecommunications channel.
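To get a feel for the 31.25 Hz step size discussed above, the following sketch rounds the tones from the "Discrete tones" file to the nearest step. It assumes, as a guess and nothing more, that the codec simply snaps a tone to the closest multiple of 31.25 Hz; the names are hypothetical and the 256-point figure is the one inferred in the text rather than a documented value.

```python
SAMPLE_RATE = 8000   # Hz, per the data sheet as cited in the text
FFT_POINTS = 256     # inferred in the text, not confirmed
STEP = SAMPLE_RATE / FFT_POINTS   # 31.25 Hz


def nearest_step(freq):
    """Return (closest multiple of STEP, error in Hz) for a tone.

    Note: Python's round() sends exact half-steps to the even multiple,
    so a tone precisely between two steps could equally well go either way.
    """
    quantized = round(freq / STEP) * STEP
    return quantized, quantized - freq


for tone in (440, 1000, 837.5, 1234.375, 328.125):
    q, err = nearest_step(tone)
    print(f"{tone:9.3f} Hz -> {q:8.2f} Hz (error {err:+.3f} Hz)")
```

On this model the 1000 Hz tone lands exactly on a step, 440 Hz is 2.5 Hz away from one, and both 1234.375 Hz and 328.125 Hz fall exactly halfway between two adjacent steps.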

In order to test the behavior when presented with such sounds, a file was created with two parts: The first half consists of all 16 standard DTMF tones (0-9, *, #, and A-D) at various cadences followed by the same 16 DTMF tones - except with their frequencies shifted down by about 5%. (For this test, we did not simulate the various "call progress" tones used in telephony.) The original idea was to observe the DTMF tones as they passed through the system, and then again when "off-frequency" tones were presented to the codec - but the results of this test were rather unexpected:

DTMF tones - (stereo, D-Star=left, FM=right, 0:14, MP3, 229 kB)

(FM-only file)
(D-Star only file)
The first part of this file (0-7 seconds) contains all 16 standard DTMF digits, while the second part (7-14 seconds) contains the same DTMF digits, but with the frequencies lowered by about 5%.
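For those wishing to construct a similar test file, the following sketch generates the tone pair for any DTMF key, optionally scaled in frequency. The row and column frequencies are the standard DTMF values; the function names, the 0.95 scale factor (standing in for "about 5%"), the tone duration and the sample rate are assumptions, as the exact parameters used for the file above are not given.

```python
import math

ROWS = (697, 770, 852, 941)        # Hz, standard DTMF low group
COLS = (1209, 1336, 1477, 1633)    # Hz, standard DTMF high group
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key, shift=1.0):
    """Return the (low, high) frequencies for a key, scaled by shift."""
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if c >= 0:
            return ROWS[r] * shift, COLS[c] * shift
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration=0.1, shift=1.0, sample_rate=8000):
    """Sum of the two sine waves for a key, scaled to stay within [-1, 1]."""
    low, high = dtmf_pair(key, shift)
    return [0.5 * (math.sin(2 * math.pi * low * n / sample_rate) +
                   math.sin(2 * math.pi * high * n / sample_rate))
            for n in range(int(duration * sample_rate))]
```

For example, `dtmf_samples("5")` gives an on-frequency digit and `dtmf_samples("5", shift=0.95)` gives the same digit with both tones lowered by 5%.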

As is readily apparent, the codec crashed and burned when presented with the "on-frequency" DTMF tones, but it seemed to reproduce the "off-frequency" tones to some degree - although not very well! The reason for this anomaly is unknown at this point and further tests will have to be conducted to see if this is a problem related to this specific radio, but it is likely the result of serendipity in the frequency relationships of the "5% lowered" tones.

Ignoring the frequency anomaly for the moment, let's analyze the response of the codec to the DTMF tones: Because of the inbuilt delay of the D-Star codec, it is not as important that the tones be decoded in real time; rather, the codec can afford to wait for a short time and decide if the DTMF tone is, in fact, valid. In comparison with conventional DTMF decoders - such as those found on autopatches - this is an advantage, as it allows an even better job to be done of preventing "falsing" - that is, the erroneous detection of non-DTMF audio as a DTMF tone.

It is also important to realize that the D-Star codec does not transmit DTMF (or single-frequency) tones in the same way that it transmits voice! These tones appear to be special cases for the codec: Instead of transmitting an approximate representation of the sound that it is hearing (as would be the case for a voice component, in which the timbre and frequency are symbolically represented), the codec transmits a special set of codes that indicate to the receiver that this is, in fact, a specific type of tone. This is easy to verify by carefully observing the data waveform of the D-Star baseband signal when precise, single-frequency tones or DTMF signals are transmitted, and such is strongly implied in the data sheet for the codec as well!

Comment: This has the implication that it should be fairly easy to generate "canned" waveforms for the purpose of generating single-frequency and DTMF signals without the need of a codec simply by "capturing" the bit pattern generated when such tones are being transmitted.

Comments:

Upon discovery of the codec's inability to properly pass DTMF tones, the accuracy of the original tones being played back through the codec was re-checked and found to be well within specifications, so we are currently at a loss to explain this if, in fact, the DTMF detection/regeneration is enabled.
Note that when one presses the keys on the IC-91AD to generate DTMF tones while transmitting using D-Star, the tones emitted at the receiver are "clean." This likely means that the radio's computer sends instructions to the codec to generate a specific DTMF tone, rather than having the audio tone that one hears from the speaker (or would be transmitted in FM mode) decoded and subsequently interpreted by the codec.
Based on the observations above it is likely that the "DTMF Decoding" feature of the codec is, in fact, not fully-implemented - something that would explain the lack of "clean" DTMF tones resulting from externally-applied audio in the above test. In other words, you cannot reliably pass DTMF signaling through a D-Star link system unless that audio is generated from a D-Star radio itself! This means that if you are using a D-Star system as a gateway or as a relay of an analog channel, you should not expect DTMF control to be possible through that link.

Let's move on to some scenarios that are more likely to be encountered in real-world operation.

Tests with multiple voices:

Another test was with multiple voices. For this test, different text is read by a male voice and gradually faded to a female voice.

(0:00-0:15) Male voice only
(0:15-0:35) Slow mixing-in of female voice
(0:35-1:10) Both voices of equal average level
(1:10-1:23) Fading-out of male voice

Male-Female voice mix file - (stereo, D-Star=left, FM=right, 1:23, MP3, 1.26 MB)

Analysis:

The "FM" (right channel) audio is provided mostly as a basis of comparison: Aside from reduced frequency response and noted loss of fidelity - as would be expected from a "voice" channel - the original program material (the two voices) sounds pretty much like the original: There is the understandable difficulty in understanding them separately when they are both at the same amplitude. The trained ear, however, can distinguish much of what an individual speaker says.

The D-Star (left channel) audio demonstrates, again, what happens when the codec is presented with two audio signals (voices, in this case) of roughly-equal amplitude.

If you listen very carefully, you can observe an interesting property: Unlike the analog, you never hear both voices simultaneously, but rather one voice at a time. It is fortunate that human speech is not only redundant in its nature (that is, it is easy to infer what was missed by listening to what came immediately before and after the "missing" portion) but that it's also full of pauses. During those instances where both voices are seemingly present, note that the codec actually switches between the male and female voice rapidly, giving the illusion that both voices are present simultaneously. The result - even though it contains many "mixed signal" artifacts - is surprisingly understandable.

In this (and other) voice-containing clips there is another interesting artifact of the compression: A slight "waver" in the frequency of the voice - most easily noted on the lower-frequency male voice. This is likely related to the codec's finite frequency resolution and its innate inability (unlike that of analog) to reproduce frequencies precisely matching those of the original, as we have seen in the tone tests above. With higher frequencies (as in the music and with the female voice) these finite-sized steps (and subsequent errors) become proportionally smaller and are less noticeable.

It has also been suggested that some of this "wavering" may, in fact, be intentional in order to make the re-created voices sound slightly less robotic than they would if subtle changes in voice pitch had been "locked" to discrete (and "wrong") frequencies. It could also be that there's enough spectral spread near the fundamental frequency of the voice to cause a degree of "indecision" as to which frequency, exactly, is to be reproduced.

Crowd noise:

How about many voices? Up to this point we have explored scenarios that are unlikely to occur in casual operation - that is, the playing of music and the presence of two simultaneous voices of roughly equal amplitude. A much more likely scenario is that of a crowd. For this, there are two files to demonstrate what happens when noise from a crowd of people is in competition with the speaker:

Female voice with crowd mix - (stereo, D-Star=left, FM=right, 1:16, MP3, 1.17 MB)

Male voice with crowd mix - (stereo, D-Star=left, FM=right, 1:26, MP3, 1.32 MB)

Each file starts with a voice alone, and then gradually (starting at about 15 seconds) the crowd noise is increased in amplitude to the point where it equals that of the voice (by approximately 30 seconds). The amplitude of the speaker's voice is then gradually reduced during the last 15 seconds or so, leaving only the crowd noise.

In the D-Star (left channel) audio, unlike the situation in which just two speakers are competing for "codec time," the codec has trouble when faced with a (more-or-less) constant background noise - particularly in light of the fact that this noise (the crowd) consists of many voices overlaying each other. With this constant noise, the codec has trouble finding "holes" in which clearly-audible syllables of the speaker can be heard and the voice becomes largely unintelligible. Note: Keep in mind that upon repeated playbacks, one will likely become more familiar with the text being read and, despite the adverse conditions, be able to understand more of what is being said than would be the case for a first-time listener.

Through careful scrutiny, the clip also demonstrates another property: When the speaker's voice is competing with that of the crowd, one hears the speaker's voice only when its peak level exceeds that of the noise of the crowd, effectively "capturing" the codec.

The lessons here should be obvious, no matter what sort of system (analog or digital) you are using: Do your best to make sure that your voice is the dominant one! Soft-spoken, mic-shy users may find themselves competing unfavorably with the crowd noise: If the sound-level of their voice is too quiet, the background sounds may simply override and "capture" the codec, removing too many traces of the speaker's voice to be audible.

There is another "gotcha", however: The D-Star codec includes an audio compander - that is, a built-in device that equalizes the audio level such that audio that is either too high or too low gets attenuated or amplified to maintain an overall, constant level. While this feature is generally useful, it can be a detriment when other noises are present. During pauses between words and syllables it will happily increase the gain, bringing up the background and, possibly, causing the sort of "codec confusion" that has been demonstrated here.
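The gain-during-pauses behavior described above can be sketched with a toy automatic leveler. This is only an illustration under stated assumptions: the actual algorithm, time constants and gain limits of the D-Star compander are not known to us, and the names and numbers below (`agc`, `max_gain`, `step`, the voice and noise levels) are hypothetical.

```python
def agc(levels, target=1.0, max_gain=20.0, step=0.2):
    """Toy leveler (NOT the D-Star algorithm): each frame, slew the gain
    a little way toward whatever gain would bring the input to `target`."""
    gain = 1.0
    history = []
    for level in levels:
        wanted = min(max_gain, target / max(level, 1e-9))
        gain += step * (wanted - gain)          # gradual gain change
        history.append((gain, level * gain))    # (gain used, output level)
    return history

# Hypothetical linear amplitudes: background is 26 dB below the voice.
VOICE, NOISE = 1.0, 0.05

# 10 frames of speech over the background, then a 30-frame pause.
frames = [VOICE + NOISE] * 10 + [NOISE] * 30
result = agc(frames)

gain_speech = result[9][0]            # gain at the end of the speech
gain_pause = result[-1][0]            # gain at the end of the pause
noise_out_speech = NOISE * gain_speech
noise_out_pause = result[-1][1]

print("gain while speaking:", round(gain_speech, 2))
print("gain after pause:   ", round(gain_pause, 2))
print("background level while speaking:", round(noise_out_speech, 3))
print("background level after pause:   ", round(noise_out_pause, 3))
```

In this sketch the background ends up close to the same output level that the voice had, which is the effect described in the text: the quieter the talker and the longer the pause, the more the background is "helped" by the leveler.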

At this point in the analysis, a possibly-unrelated question arose: How well does the codec handle un-voiced audio, such as a whisper? It was expected that, because the timbre of unvoiced sounds makes an important contribution to the conveyance of information in human speech, the designers of the codec would have assured that it would be capable of reproducing such speech. To test this, another file was created - but without any competition from background noises:

"Whisper" file - (stereo, D-Star=left, FM=right, 1:19, MP3, 1.2 MB)

Analysis:

The "FM" (right channel) audio is provided mostly as a basis of comparison: Aside from reduced frequency response and noted loss of fidelity - as would be expected from a "voice" channel - the program material sounds pretty much like the original.

The D-Star (left channel) audio is also quite intelligible: There are the expected artifacts associated with the audio compression, but there is little degradation in intelligibility as there is no competition from unrelated noises.

Tests with other ambient noises:

It is expected that amateur radio will continue to make valuable contributions to public service in the future, in both emergency and non-emergency situations. In these and normal, everyday situations one can reasonably expect that the amateur radio operator will experience adverse conditions - including those where there is significant background noise - that may impact intelligibility.

With the advent of digital voice systems in both the public and private communications services some concern has been raised about the ability of the codecs being used to cope with situations where the speaker's voice is being affected somehow - either by high ambient noise from nearby equipment, or significant alteration of the voice by, say, the breathing apparatus of a fire fighter.

More recently, there has been some "pushback" by emergency responders (see below) - with some municipalities either abandoning their new digital-voice systems in favor of the older, analog ones or rejecting their adoption outright! In many cases, these concerns are not unfounded as there are documented cases where the various shortcomings of these systems have been a significant contributor to the loss of life of emergency workers - either by deficiencies in the topology of the system itself, by the inability of the codec to handle human speech degraded by external factors (a breathing mask or ambient noise) or by a combination of both. Additionally, there have been recommendations that the mandated rollout of narrowband, digital communications systems be halted pending the amelioration of such concerns!

It is also important to realize that the codec used in D-Star operates at a lower coding rate than many of those used in public service. With the lower coding rate used in D-Star, it is arguably more important that the user be aware that extraneous sounds can more-easily overwhelm the codec, causing it to further degrade the speech - and this doesn't take into account that D-Star's codec lacks some of the sophisticated adaptive noise reduction techniques (present in some other codec implementations) that could, in theory, reduce problems associated with some external noise sources.

Briefly, the problem of system topology (i.e. how the signal gets from the person transmitting to the person receiving) may be mitigated in amateur radio use because of our potential ability to recognize those situations in which one is unable to get into a repeater and make other arrangements - such as choosing another repeater, improving one's antenna, increasing output power, or simply switching to simplex where short-range communications is adequate: The caveat here is that training is required (whether the system being used is analog or digital) so that the operator (and/or net control operator) is capable of recognizing such situations and taking the appropriate action!

The "Generator" test:

To test the ability of the codec to deal with various types of noises, a few more audio files were created:

Female voice with generator - (stereo, D-Star=left, FM=right, 1:09, MP3, 1.05 MB)

Male voice with generator - (stereo, D-Star=left, FM=right, 1:29, MP3, 1.37 MB)

Each file starts with a voice, alone, and then gradually (starting at about 12-15 seconds) the sound of the generator is increased in amplitude to the point where it equals that of the voice (by approximately 30 seconds.) In the case of the file with the male voice the speaker is faded out, leaving only the generator noise, and one can hear the generator stop at the end of the file.

Analysis:

The "FM" (right channel) audio is provided mostly as a basis of comparison: Aside from reduced frequency response and noted loss of fidelity - as would be expected from a "voice" channel - the program material sounds pretty much like the original. Tests indicate that even when the levels of the speaker and generator are equal, it is still possible for an experienced operator to "dig out" much of what is being said.

The D-Star (left channel) audio begins to suffer from effects similar to those noted in the "crowd noise" test above. Because the sound of the generator is more constant, there are few "gaps of silence" in which the speaker's voice can be inserted. What is heard of the speaker's voice are mostly voice peaks that have "captured" the codec: While a few words and syllables are distinguishable, there is too little information left in order to be able to make much sense out of what is being said.

It is also interesting to note that the interfering sound is not readily recognizable as being that of a gasoline engine in the D-Star clip and a would-be net control may have a difficult time determining what the interfering sound might actually be. It is hoped that common sense would indicate to an operator that being next to a generator (or in any situation in which there was considerable competition from noise) would be a bad idea, but this example demonstrates such if there had been any doubt!

The "Other noises" test:

How about a few other types of noises? The next two audio files include a mix of sounds similar to those that one might expect in various emergency situations:

Female voice and various noises - (stereo, D-Star=left, FM=right, 1:08, MP3, 1.03 MB)

Male voice and various noises - (stereo, D-Star=left, FM=right, 1:00, MP3, 0.94 MB)

As is apparent from the audio file, the voice is mixed with the sounds (mostly sirens) of various emergency vehicles. The final segment, albeit somewhat less likely to occur in a typical situation, has the speaker mixed with three crashes of thunder.

Analysis:

The "FM" (right channel) audio is provided mostly as a basis of comparison: Aside from reduced frequency response and noted loss of fidelity - as would be expected from a "voice" channel - the program material sounds pretty much like the original. As you might expect, some syllables are lost - simply as a result of the speaker's voice (and, possibly, the radio's clipper) being overwhelmed by the sound: Aside from these brief instances, the audio is generally copyable by an experienced listener, although a few "fills" may be required upon retransmission.

The D-Star (left channel) audio shows a degradation similar to that already demonstrated. One can hear a sort of "capture effect" where the level of the voice is less than that of the offending noise - an example of this being during the siren, where the voice completely disappears during the syllabic pauses: This is in contrast with the FM audio, where the two sounds are simply mixed together, albeit in a non-linear way at times. While a skilled listener can extract much of the information from this clip, significant portions of it are lost and were real traffic being passed, it would likely require a re-transmission.

Again, it cannot be overstressed that in the case of either analog or digital transmission, proper training of the operator is of great importance: In the example above an experienced operator would recognize that the ambient noise would have likely made reception of the transmission difficult and would have either paused to wait for the "QRM" to pass, or would have asked the recipient if "fills" were necessary.

What about "Narrow" FM?

Another mode available on D-Star capable radios is "FM-N" or "Narrow" FM. This is the same as "normal" FM used by amateurs on the VHF and UHF bands for decades, except that the peak deviation is limited to ±2.5 kHz instead of ±5 kHz.

The advantage of narrower deviation is that somewhat less bandwidth is required for a "Narrow" FM voice channel than a "Normal" FM voice channel - but it would be wrong to assume that it was half as wide!

Why? A quick consultation of "Carson's Rule" provides the answer: While the deviation itself is half as wide, one still needs to modulate the same audio bandwidth as before. What this means is that, according to Carson's Rule, if your voice bandwidth extended to 2.5 kHz, you'd still need at least 5 kHz of bandwidth even if your deviation was set to zero!
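As a quick worked example, Carson's Rule estimates the occupied bandwidth as twice the sum of the peak deviation and the highest modulating frequency. The sketch below uses the 2.5 kHz voice bandwidth from the paragraph above; the function name is ours and the figures are approximations, not measurements.

```python
def carson_bandwidth_khz(peak_deviation_khz, max_audio_khz):
    """Carson's Rule: BW ~= 2 * (peak deviation + highest audio frequency)."""
    return 2.0 * (peak_deviation_khz + max_audio_khz)

AUDIO_KHZ = 2.5  # voice bandwidth used in the example in the text

normal = carson_bandwidth_khz(5.0, AUDIO_KHZ)   # "Normal" FM, +-5 kHz deviation
narrow = carson_bandwidth_khz(2.5, AUDIO_KHZ)   # "Narrow" FM, +-2.5 kHz deviation
zero = carson_bandwidth_khz(0.0, AUDIO_KHZ)     # hypothetical zero deviation

print("Normal FM:", normal, "kHz")   # 15.0 kHz
print("Narrow FM:", narrow, "kHz")   # 10.0 kHz
print("Zero dev.:", zero, "kHz")     # 5.0 kHz
print("Narrow/Normal ratio:", round(narrow / normal, 2))
```

With these numbers, halving the deviation reduces the estimated bandwidth to about two-thirds of the original, not to one-half.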

In practice, one could never set the deviation to zero: As one decreases deviation, the "FM Advantage" (that is, the "quieting" effect of the FM system) begins to diminish. In fact, the use of "Narrow" FM suffers from this to an extent as compared to "Normal" FM, but the effect is not readily obvious to the casual observer.

It turns out that using half the deviation in "Narrow" FM allows the system designer to tighten channel spacing from 20 kHz to closer to 15 kHz for geographically-adjacent systems, and down to 12.5 kHz (instead of 15 kHz for "normal" FM) for those systems with adequate geographical separation. It should go without saying that these benefits would not be possible unless all receivers used on these systems also incorporated correspondingly narrower filtering in their IF's: It is not enough to simply reduce transmit deviation!

Also remember that Carson's Rule is an approximation: It cannot take into account all situations, such as the instantaneous wider bandwidth of an FM transmission that can occur during transient spectral peaks that may happen during modulation.

In tests done by commercial and public-service entities, it seems that "Narrow" and "Normal" FM are pretty-much equal in their coverage and audio quality when properly implemented. It should also be noted that in many cases, "Narrow" FM radio systems include amplitude-companding techniques - using a compressor at the transmitter and a matching "de-compressor" at the receiver to maximize the signal/noise ratio across the link and to minimize effects of the S/N loss associated with the narrower deviation. It is important to note that not all implementations of "Narrow" FM, particularly those used in amateur radio, include the use of a "compander" system!
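The benefit of a compressor/"de-compressor" pair can be shown with simple decibel arithmetic. The sketch below assumes a 2:1 compressor and a matching 1:2 expander, and it assumes that the expander's gain is set by the speech level alone (that is, the link noise is small by comparison). The ratio, the noise floor and the speech level are all hypothetical values chosen for illustration and do not describe any particular radio.

```python
def compress_db(level_db):
    """Assumed 2:1 compressor: halve the level (in dB below the 0 dB reference)."""
    return level_db / 2.0

def expander_gain_db(received_db):
    """Matching 1:2 expander: output = 2 * input in dB, so the gain applied
    is numerically equal to the received level in dB."""
    return received_db

LINK_NOISE_DB = -40.0   # hypothetical noise added by the radio link
SPEECH_DB = -30.0       # hypothetical quiet talker, relative to full level

# Without companding the quiet speech rides only 10 dB above the link noise.
snr_plain = SPEECH_DB - LINK_NOISE_DB

# With companding the speech crosses the link at a higher level...
sent_db = compress_db(SPEECH_DB)
gain_db = expander_gain_db(sent_db)
# ...and the expander then lowers speech and link noise together.
speech_out = sent_db + gain_db
noise_out = LINK_NOISE_DB + gain_db
snr_companded = speech_out - noise_out

print("S/N without companding:", snr_plain, "dB")
print("S/N with companding:   ", snr_companded, "dB")
print("Speech restored to:    ", speech_out, "dB")
```

Under these assumptions the speech comes out at its original level while the noise heard underneath it is pushed further down, which is the intent of companding on a "Narrow" FM link.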

Overall comments:

Once again, let's make something absolutely clear to the reader:

This page is not intended to "bash" D-Star or its codec, but rather to educate the user about peculiarities intrinsic to the codec used by D-Star.

It is worth reiterating that both experience and proper training can allow operators to recognize and remedy those situations in which communications may be compromised by extraneous noises, regardless of the medium over which the voice is conveyed!

In order to do this it is very much worthwhile to be familiar with the way in which the communications systems are affected (and often degraded) by such noises. Knowing this improves the likelihood that the operator(s) will be able to recognize what is happening and figure out how to mitigate it: Doing so can not only improve communications efficiency, but it can save time - and maybe lives! Because of the "lossy" nature of the codec used for D-Star (and other digital voice transmission systems) one must accept the fact that, compared to conventional analog systems, it is likely that the users will experience types and degrees of degradation that might not be observed on an analog system.

One interesting result of the analysis was the comparison of "intelligibility" by a "skilled" amateur radio operator - one who is rather used to digging out weak, degraded signals from amongst noise and QRM - and the "casual" listener - an "average" person who has not had to do so. As you might expect, the "skilled" listener, being used to noise and QRM, was more-able to understand what was being said when the voice was degraded - although intelligibility through the codec was notably worse than analog. In the case of the un-skilled listener, however, the "digital-degraded" speech was disproportionately more difficult to understand than the "analog-degraded" voice. In retrospect, this isn't too surprising as the "skilled" operator is used to trying to make sense out of what is heard when only random syllabic fragments are present and is less-likely to be distracted by "other" sounds.

By careful analysis of the sound files above, here are some observations made that apply to the codec used in D-Star:

The codec can produce only one sound at a time due to the "digital capture effect."

While, in an analog system, two audio sources simply "mix", the codec will simply "capture" whichever audio source has the most energy. This is readily apparent in the digital system: On close analysis of the "Male-Female voice mix" clip above, one hears only the male or female voice at any given instant - never both at the same time. With the highly-intermittent and redundant nature of human speech, it is possible that there will be "holes" in which the sounds of the other speaker can be placed, providing the listener with enough information to be able to make some sense out of what is being said.
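The difference between "mixing" and "capture" can be illustrated with a deliberately crude model. This is not how the codec works internally: it is a winner-takes-all toy in which each frame passes only the louder source, and every energy value below is made up for the illustration.

```python
def analog_mix(a, b):
    """Analog behavior: both sources are simply added together."""
    return [x + y for x, y in zip(a, b)]

def capture(a, b):
    """Toy "digital capture": per frame, only the stronger source survives."""
    return [x if x >= y else y for x, y in zip(a, b)]

def frames_heard(speaker, competitor):
    """Count frames in which the speaker is active and wins the frame."""
    return sum(1 for s, c in zip(speaker, competitor) if s > 0 and s >= c)

# Made-up per-frame energies: syllables separated by pauses (0.0).
speaker     = [0.9, 0.8, 0.0, 0.7, 0.9, 0.0, 0.0, 0.8, 0.6, 0.0]
other_voice = [0.0, 0.9, 0.8, 0.0, 0.0, 0.9, 0.7, 0.0, 0.9, 0.8]
steady_noise = [0.85] * len(speaker)   # e.g. a crowd or a generator

mixed = analog_mix(speaker, other_voice)
captured = capture(speaker, other_voice)

heard_vs_voice = frames_heard(speaker, other_voice)
heard_vs_noise = frames_heard(speaker, steady_noise)

print("speaker frames heard against another voice:", heard_vs_voice, "of 6")
print("speaker frames heard against steady noise: ", heard_vs_noise, "of 6")
```

In this toy model the intermittent competitor leaves "holes" that let most of the speaker's syllables through, while the steady noise passes only the speaker's loudest peaks.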

In cases where the codec cannot distinguish between the two sources of sound (the speaker, and another voice or background noise) the codec is also likely to produce unexpected results. In the examples above, one can hear many instances where sounds are produced that resemble neither audio source. In these cases the codec has mistaken the combined sound as simply noise, or as a random mix of spectral components, and produced its "best guess." Again, depending on the skill of the listener, these can be distracting, or different enough in their sound that they can be readily ignored.

When presented with a "constant" background noise (such as a crowd, generator or siren) there is less opportunity for the speaker's voice to find a "hole" in which a few syllables can be passed by the codec. In this case, it may be that only voice peaks are able to override the offending noise: Whether or not this will yield sufficient information for the listener to be able to understand the speaker depends on not only the extent to which enough intelligible syllables get through, but also the skill of the listener.

One interesting property of the codec is that, in many cases, the background noise becomes unrecognizable. The implication of this is that the listener - say, a net control station - may not be able to as easily diagnose intelligibility problems associated with a station, especially if the operator of that station is inexperienced and is unaware of the problem with the ambient noise!

The compander built into the codec can, in some instances, make the situation worse.

One of the properties of the D-Star codec that can greatly improve intelligibility is the fact that it has a built-in compander. As mentioned before, this will increase the amplitude of quiet audio and decrease the amplitude of loud audio to maximize intelligibility. In contrast, analog FM transmitters are typically outfitted only with a limiter/clipper to prevent peak audio from exceeding the desired channel bandwidth and, to some extent, reduce the peak-to-average ratio of the speech being transmitted.

It is not uncommon, however, to observe that many analog users seem to have "quiet" audio. There are several possible reasons for this:

The user is talking too quietly.
The microphone may simply be too far away from the user's mouth.
The radio's microphone gain may be too low. This may be remedied through a simple menu setting, but it could involve adjustments internal to the radio or microphone, or even a defective microphone!
The radio may be incorrectly configured - such as being set for "FM-Narrow" on a channel that is configured for ±5 kHz.
A combination of the above.

The problem of "quiet" audio is particularly troublesome when the signal is weak: Because FM has the property of getting "noisy" when signals degrade, this can have a particularly deleterious effect when the user's audio is already low! Compounding this is the fact that many operators (particularly newcomers to FM communications) are unaware of the fact that increasing power will have NO EFFECT on the loudness of the audio: While increasing power may improve intelligibility by reducing the amount of noise, it does nothing to address the root cause (that of low audio levels) in the first place!
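The point that transmitter power does not change the loudness of the recovered audio can be checked with a small, noise-free FM model. This is a sketch only: it works on the complex envelope of the signal rather than on a real RF carrier, the sample rate and test tone are arbitrary, and the receiver is assumed to be scaled so that ±5 kHz deviation yields full-level audio.

```python
import cmath
import math

FS = 48000.0           # sample rate in Hz (arbitrary for this sketch)
RX_DEVIATION = 5000.0  # assumed receiver scaling: +-5 kHz gives audio of +-1.0

def fm_modulate(audio, deviation_hz, carrier_amplitude):
    """Frequency-modulate: the audio steers the phase, not the amplitude."""
    phase = 0.0
    out = []
    for sample in audio:
        phase += 2.0 * math.pi * deviation_hz * sample / FS
        out.append(carrier_amplitude * cmath.exp(1j * phase))
    return out

def fm_demodulate(signal):
    """Recover audio from the phase change between successive samples."""
    audio = []
    for prev, cur in zip(signal, signal[1:]):
        dphi = cmath.phase(cur * prev.conjugate())
        audio.append(dphi * FS / (2.0 * math.pi * RX_DEVIATION))
    return audio

def peak(samples):
    return max(abs(v) for v in samples)

# 10 ms of a 1 kHz test tone at full level.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / FS) for n in range(480)]

low_power = peak(fm_demodulate(fm_modulate(tone, 5000.0, 1.0)))
high_power = peak(fm_demodulate(fm_modulate(tone, 5000.0, 10.0)))  # 20 dB more power
narrow = peak(fm_demodulate(fm_modulate(tone, 2500.0, 10.0)))      # half the deviation

print("audio peak, low power: ", round(low_power, 3))
print("audio peak, high power:", round(high_power, 3))
print("audio peak, narrow dev:", round(narrow, 3))
```

In this model the recovered audio level follows the deviation and ignores the carrier amplitude. Note that the sketch contains no noise: on a real link, more power reduces the noise heard along with the audio, which is the improvement in intelligibility mentioned above, but it still does not make quiet audio any louder.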

With the compander built into D-Star, many of these "sins" are masked as it can seemingly compensate for low audio from the user - regardless of the cause. It does, however, have a potential pitfall when in the presence of ambient noises: When the user stops speaking, it will faithfully increase the audio gain and bring the amplitude of those noises up - something that can lead to "codec confusion" which, in turn, can easily cause "listener confusion!"

One common scenario is that of a user with "chronically low audio" - a situation often caused by the operator talking too quietly and/or too far from the microphone. On FM, this situation may be obvious to the observer, but the compander in D-Star can compensate for this, further reinforcing a bad habit. If high ambient noise levels are added to the mix, the situation is worse as it may be less likely that the "quiet" speaker's voice will be able to sufficiently overcome the background noise. While this situation is bad enough no matter what system one is using, it can be made even worse when a compander brings up the background noise and the "digital capture" effect compounds the problem.

Degradation may be compounded by loss of signal integrity.

Up to this point only scant mention was made about degradation due to loss of signal integrity and in the examples above, it was assumed that there were no bit errors in the received D-Star signals. As we know, this is not necessarily the case in real-world situations.

In the analog world, we are familiar with the tell-tale signs of a weak analog signal - specifically:

White noise, crackle or "popcorn" due to poor "quieting" of a weak signal.
Distortion due to multipath.
"Choppiness" of audio due to squelch-clamping or the signal dropping momentarily below the squelch threshold.

With experience and proper training an operator can recognize these situations on other signals (or, through received reports, his own signal) and attempt to remedy it by increasing power, moving the antenna location, or even having another person relay the message.

D-Star and other digital systems, unfortunately, do not as readily lend themselves to allowing the signal quality to be ascertained by observation. While an oft-touted advantage of the digital systems is their ability to maintain noise-free communications even with degraded signals, this facility comes with a price: The so-called "digital cliff." Unlike analog systems, which will often degrade somewhat "gracefully" (e.g. gradual deterioration) or manifest intermittent degradation (such as multipath in a mobile environment), such degradation may not be readily obvious in a digital environment. The effects include:

Until sufficient data is lost, the digital signal may not show obvious signs of degradation - until it abruptly disappears or becomes badly corrupted.
Loss of digital sync. It is possible that if the signal is momentarily lost, the receiver's codec may lose sync: Even if the signal quality recovers immediately, it may take several seconds before it re-acquires lock, possibly causing critical information to be lost.
With weak or degraded signals it can often take 2-5 seconds after sync loss before a transmission is recovered and decoded.
If a transmission is short - or if the critical information is contained at the very beginning of the transmission - the 2-5 seconds that it may take to lock to a degraded signal may result in its content being lost entirely, with the possible result of the recipient being unaware that a transmission ever occurred!
When signals are degraded, callsign/routing information is also degraded. This means that a weak signal with numerous bit errors - even if somewhat "copyable" - may fail to get routed properly in a network unless/until valid data is received. This means that portions (or all) of transmissions may be lost on a data network!
The so-called "R2-D2" effect. If the signals are sufficiently degraded (either due to weak or multipath-distorted signals, or even due to data loss or "timing jitter" on an internet connection - something that is likely on a congested, public internet connection) the resulting speech can be hopelessly garbled.
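The effect of the 2-5 second lock time on short transmissions is simple arithmetic, sketched below. The lock times are the range quoted above; the transmission lengths and the function name are hypothetical examples.

```python
def seconds_copied(tx_seconds, lock_seconds):
    """Audio that remains after the receiver has taken `lock_seconds` to lock."""
    return max(0.0, tx_seconds - lock_seconds)

FAST_LOCK, SLOW_LOCK = 2.0, 5.0   # the 2-5 second range quoted in the text

for tx in (1.5, 4.0, 10.0):       # hypothetical transmission lengths in seconds
    best = seconds_copied(tx, FAST_LOCK)
    worst = seconds_copied(tx, SLOW_LOCK)
    print(f"{tx:4.1f} s transmission: {worst:.1f} to {best:.1f} s copied")
```

A brief acknowledgment or a callsign given at the very start of a transmission can therefore vanish completely on a degraded path, while a longer transmission loses only its beginning.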

The above degradation is not unexpected as many of these are the same sorts of degradations that one could expect on any radio link or data network. In many cases, experience and proper training can mitigate these factors, allowing operators to recognize and, hopefully, remedy them.

It is unfortunate, however, that D-Star radios of current manufacture are sadly lacking in any sort of utilities that can assist the detection and diagnosing of such problems:

The S-Meter is of limited use:

Users should realize that the radio's signal level meter will respond to a signal of any sort - including noise and interference - and it is not necessarily useful in determining the cause of signal degradation, such as interference or multipath.
Be aware that many S-Meters have very limited dynamic range and will "peg" even on fairly weak signals, making them of limited use in optimizing a radio link, assuming that such can be done on the basis of signal strength alone.

There is no obvious means of determining Bit-Error-Rate (BER):

Despite the fact that both the modem and the codec used in most (if not all) D-Star radios of current manufacture have built-in BER diagnostics, the radios do not (yet) make this information available to the user in any meaningful way.
This is somewhat surprising as even inexpensive, consumer grade wireless appliances (such as wireless LAN cards and wireless telephones) have such facilities - even if they may not be readily known by the casual user.
Having such information available would allow the D-Star user to make a better determination if the radio link is "solid" or "on the edge."

Ironically, one of the best ways currently available to determine viability of a radio path for use as a D-Star link is done by first testing the path using analog to see if it is "clean" and then switching to D-Star!

Additional system complexity demands more/better training:

Remember, unlike most commercial or public service radio systems, we amateurs are allowed great flexibility in the way that we configure our networks in that we can choose our frequencies and radio paths at will! Unlike our "professional" counterparts, we have access to many knobs that we can "twiddle" to change the configuration of our system. While this can provide unparalleled communications flexibility, it also requires that the users be very familiar with their radio equipment.

Those considering implementation of a digital radio system must consider that in order for it to be used effectively as a communications tool under adverse conditions, not only do the aforementioned considerations pertaining to the codec need to be taken into account, but also the fact that operating a D-Star radio is arguably more complicated than operating an analog one: Not only does it require additional consideration in terms of how the operator uses the radio as compared to its analog counterpart (taking into account both audio and signal integrity considerations) but additional configuration and networking features may further complicate matters if system/radio configuration is not adequately thought out beforehand.

It is important to realize that the system designer can create or avoid problems during system design and configuration. By avoiding unnecessarily complex routing and using fairly simple network topology, one can increase the likelihood that such a system is not only more likely to be usable by the uninitiated and hurriedly-trained operators, but that its maintenance and diagnosing of problems is also likely to be simpler, should difficulties arise! It should also be realized that it is possible that there may be times when it is best not to use an existing D-Star network - such as when it is more appropriate for local communications to use simplex channels, or when it is possible to accommodate necessary communications with other means, leaving the larger, more comprehensive networks available for more-critical use.

Remember: An emergency is not a good time for someone to try to figure out how to use their radio: Training and practice beforehand are essential!

Final comments:

It is expected that the reader of this page take it simply as an explanation of observations made and put out of his/her mind any perceived biases for or against D-Star or any other digital voice system! It is also expected that readers will do their own research and testing rather than relying solely on this (or any other) reference!

Having said that, it is believed, by the authors, that the tests and conclusions above reflect properties of the systems that should be considered in the implementation of a digital communications system and the training to use it!

Notes and disclaimers:

Some of the above information has been determined using available test gear and Icom D-Star radios and such information is believed to be valid. It is likely that this information will, in the future, be updated and techniques refined.
It is up to you, the reader, to verify that this information is, in fact, correct and suitable for your needs. We cannot be held responsible for the use/misuse of the above information!
If you find that the above information is incorrect or incomplete, please contact the frequency coordinator using the link below.
The text in the above files is read from the Declaration of Independence and The Time Machine by H.G. Wells. Copyrighted content is used in accordance with U.S. "fair use" laws.
Your mileage may vary!

Other Utah VHF Society links related to D-Star:

Utah VHF Society - D-Star Channel Spacing recommendations - Recommendations of channel for D-Star analog channels
Using conventional test gear to evaluate and test D-Star systems - This page covers some aspects of D-Star and analog signals and related test equipment that may make it easier to evaluate the performance of D-Star systems and links.
D-Star repeater installation - Experiences, problems encountered with the installation of a D-Star Stack at a very busy radio site - and solutions to these problems.

The following are FAQ's provided by the Utah VHF society. Note that these may topically overlap the links above:

Misc. links related to D-Star:

http://en.wikipedia.org/wiki/D-STAR - This has a general overview of D-Star.
http://www.arrl.org/FandES/field/regulations/techchar/D-STAR.pdf - This document specifies various aspects of D-Star and its protocols.
http://www.ccarc.net/images/CCARC-Spectrum%20Committee%20Report-%20Rev%203.pdf - This is a document produced by the Colorado frequency coordination body discussing D-Star channel spacing.
http://groups.yahoo.com/group/dstar_digital - This group harbors discussions and information about D-Star.
http://dstarutah.org - The Utah D-Star group

For a report about the testing and implementation of digital voice systems by various public safety agencies, look at these links:

Digital Project Working Group Interim Report, May 2008 - A report by the International Association of Fire Chiefs about studies of analog and digital voice systems.
Phoenix Fire Department Radio System Safety report - This report contains detailed analysis by the City of Phoenix pertaining to real-world use of both analog and digital radio systems, linked from this page.
Radio problems during a fatal fire. A sobering example of a tragedy that can, in part, be attributed to shortcomings of the communications system may be found here; the page includes audio files and some brief analysis of what happened. Linked from Daryl Jones' blog. As with any blog, please use your judgment to weigh possible bias of the author and its effect on the facts.

Please note that some of the problems associated with digital, trunked systems do not necessarily apply to amateur radio implementations of digital voice technology. It is, however, well-worth the time for a designer of a radio network - amateur or professional - to become familiar with such systems and take advantage of lessons learned by others, applying them to system design, and to the training of those who are destined to use it!

The above list is, by no means, exhaustive: Other information may be found via web searches.

This matter is open for discussion: If you have concerns or opinions one way or another, please make them known to the frequency coordinator at the email address below.

Questions, updates, or comments pertaining to this web page may be directed to the frequency coordinator.


Updated 20121220