The Utah VHF Society Observations about the audio codec used for D-Star
Purpose of this page:
A few comments about this page:
With the continued
interest in digital voice communications in amateur
radio, we decided to run a few tests using D-Star radios
to ascertain the behavior of the codec when subjected to
sounds other than those of the human voice.
We felt it important to be able to understand how the
voice of a communicator - either via an analog or
digital transmission - might be affected in conditions
that were less-than-ideal: Particularly in light of
recent re-emphasis on the facility of using amateur
radio in emergency communications systems, we wanted to
provide some characterization of how they might behave
in "less-than-ideal" situations - such as those where
the speaker's voice may be in severe competition with
other sounds.
It should be noted that the intent of this analysis was
to provide a reference for those who might consider the
implementation of a digital radio system: It should come
as no surprise that the digital voice system used in
D-Star is somewhat more "fragile" than old-fashioned
analog systems. This not-unexpected result is a logical
consequence of the "lossy" coding typical of low-rate
speech-only codecs - the one used in D-Star being one of
the better-performing codecs in this class.
As can be seen from this page, the codec was subjected to
sounds that it was NOT intended to be
able to handle (such as music) in order to observe how
it would break down and provide some insight to how it
responded when presented with other, more-realistic
situations.
In some ways, the results were rather surprising: Some
listeners were, frankly, appalled at the results, but
the opinion of many was along the lines of "That's
better than I thought it would be..."
To be sure, being able to make sense out of a degraded
transmission - either digital or analog - is a skill
acquired through experience, practice, and training. It
was noted that so-called "skilled" operators (e.g. those
that regularly work pileups on HF and are rather used to
picking fragments of voice out of chaos) were generally
able to make out the gist of what was being said on both
the analog and digital transmissions, but that the
analog transmissions were noticeably more "copyable."
It was also noted that the un-skilled listener (a random
person, unaccustomed to having to dig "speech fragments"
out of such chaos or even a casual user of FM voice) had
noticeably more difficulty deciphering the degraded
digital speech than the analog.
Again, such a result was not surprising, once again
showing that experience and practice are of paramount
importance in any critical traffic-handling situation.
Brief mention has also been made on this page with
reference to the degradation of signals due to the
"digital cliff" - that is, the rather un-graceful
drop-off in perceived quality that occurs when digital
signals degrade below a certain point. Again,
recognizing and knowing how to deal with these sorts of
situations is another important facet of training and
experience.
Finally, links have been provided to observations made
by various public-service entities related to digital
(some of which use somewhat more sophisticated codecs at
higher bit rates which are arguably more-resistant to
the demonstrated degradations) and analog (both trunked
and un-trunked) systems and how they perform in a number
of environments. While these observations may not always
be directly applicable to many amateur-radio situations
(e.g. trunking versus non-trunked systems, and the
ability of the experienced amateur to arbitrarily choose
a frequency, mode and signal path as necessary) they are
well-worth a read by any would-be system designer and
emergency planner!
While I'm sure that at least some of this has already
been hashed and re-hashed, revisiting this and related
topics is likely to be worthwhile.
First off, let's make something absolutely clear
to the reader:
This page is not
intended to "bash" D-Star or its codec, but rather to educate
the user about peculiarities intrinsic to the codec used by
D-Star.
Some of the observations made on this page may not
apply solely to D-Star, but to other low-rate digital voice
codec/systems as well.
The codec used in D-Star is, by design, a lossy
codec: That is, data reduction is accomplished by preserving
only those fundamental characteristics of the human voice that are
required to adequately reproduce it. As one might expect,
with lower bit rates, this representation becomes less-precise
and, inevitably, deviations from the original source become
increasingly obvious.
As digital voice systems become increasingly commonplace, their
limitations - particularly in comparison with traditional analog
systems - must not escape the attention of their users.
These limitations become apparent when these codecs
are attempting to encode and replicate increasingly-complex sounds
- specifically, when these other sounds are in competition with
the human speaker: As the original voice sounds become
"diluted" with extraneous noise, these codecs can fail, unable to
make sense of what is being fed to them.
Particularly with low-rate codecs - such as the one used with
D-Star - this becomes increasingly problematic as the
increasingly-complex sounds can no longer be accurately
represented in the limited bandwidth available. The result
of this breakdown in encoding is that the intelligibility of the
speech is further degraded - possibly to the point of unintelligibility - while the
same speech, conveyed on an analog system, may still be
understandable to the experienced listener.
The purpose of this page is to demonstrate the various ways in
which both D-Star and analog signals are affected when the speaker
is in competition with other sounds. By being aware of the
nature of these complex interactions one may, through experience
and training, be able to avoid as much as possible or be able to
deal with those situations in which intelligibility may be - or is
being - compromised by other sounds: As any experienced
communicator knows, such problems can seriously impede effective
traffic-handling which, under the worst of conditions, can result
in loss of life and property.
Basic principles of D-Star's codec:
It is fortunate that human speech is more-or-less composed of
two different types of sounds:
Voiced sounds, such as vowels.
These sounds are represented by a fundamental frequency
(produced by our vocal cords) as well as the myriad of
harmonics (produced by interactions of our mouth, tongue and
nasal cavities, for example.) In addition to vowels,
voiced consonants such as "M" are included in this group.
Unvoiced sounds (fricatives) such as
consonants. Many consonants consist of clicks or
noises, such as the "K" or "S" sounds - and are, in essence,
just bursts of spectrally-shaped noise. As with the
voiced sounds, their timbre is altered by various resonant
structures in our head such as the mouth, tongue and nasal
cavities.
Note that some sounds (such as those represented
by "D" or "B") contain both voiced and un-voiced components.
For speech-only use it is generally enough to optimize the codec
to work based on the assumption that it will encounter only
the above conditions. In doing so, one not only reduces the
complexity of the codec (which can also reduce the cost of the
hardware required to implement it) but also makes it more likely
that such a codec - with an innate, limited repertoire of
capabilities - can operate at very low bit rates. It is the
implementation of this rather simplified acoustic model that
causes these classes of codecs to be very poor at reproducing
spectrally-complex (with non-harmonically related content) sounds
such as music and other "non-voice" sounds.
Compander:
It should be mentioned that the D-Star codec includes another
feature: A "compander." This device maintains a
more-or-less constant audio level, bringing up the microphone gain
if the speaker's voice is quiet, and reducing it if it is too
loud. This is, in most cases, a desirable feature as it can greatly improve
intelligibility - particularly if the speaker is a soft-spoken
one. The caveat is that this same compander can also
increase the level of background noise and contribute to "codec
confusion," as we shall see.
Comment about the analog audio on the IC-91AD:
On the IC-91AD and, to my knowledge, all
other Icom D-Star capable radios, there is no
"compander" in the audio chain used for both "normal" (±5
kHz) and "narrow" (±2.5 kHz) deviation - only the usual
limiter/clipper arrangement found in typical FM
transmitters. It is for this reason that one must be
careful in making A/B comparisons between D-Star and
analog: If doing so, one must take into account the fact
that the compander may "fix" - when using D-Star - an audio
level that is inappropriately low for the FM analog
modulator! It has been noted that, in some "A/B"
comparisons between D-Star and FM found on the web, the
orchestrators have been remiss in their attention to this
particular (important) detail!
"Breaking" the codec - and
subsequent analysis:
For further testing, audio files were constructed for the purpose
of testing the effects of non-speech audio (as well as speech plus
"other" sounds). To do this, the following setup was used to
transmit and receive the test files:
One IC-91AD Handie-Talkie, used as a
transmitter, was connected to a dummy load and set to low
power.
For the transmitting radio, the test audio
was input through the external microphone connector from an
audio player on which uncompressed .WAV files were
played. The audio level was set so that the peak level
was just ±5 kHz when in analog ("FM") mode - that is,
barely hitting the radio's clipper.
A second IC-91AD was placed next to the
unit that was transmitting, tuned to the same frequency with a
rubber-duck antenna connected. The audio output of this
radio was connected to a digital audio recorder, with the
resulting audio being recorded in an uncompressed .WAV
format. In this configuration, signal quality was
verified and found to be excellent.
The audio content was played (and recorded)
twice:
First, using FM (±5 kHz) mode
Again using D-Star audio (DV) mode.
The two resulting audio files (one using FM
and the other using D-Star) were merged and synchronized (to
compensate for D-Star's intrinsic processing delay) into a single
stereo audio file with the channels as follows:
LEFT Channel: D-Star audio
RIGHT Channel: FM audio
On this web page, the resulting files were
subsequently edited and encoded as .MP3 at a bit rate high
enough to avoid further audible contribution of
compression artifacts.
For convenience, files containing only
audio from the D-Star and FM tests are also provided and
these are about half the size of the corresponding stereo
file.
There were no bit errors
during reception of the D-Star transmissions: What
is heard are simply artifacts of the codec itself!
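The merge-and-synchronize step described above can be sketched in a few lines. This is only an illustration, not the procedure actually used to prepare the files on this page: the function names are hypothetical, the audio is assumed to be already loaded as plain lists of samples, and the delay is estimated with a simple brute-force cross-correlation. Because the codec alters the waveform considerably, a real alignment would more likely compare loudness envelopes, or simply be done by ear in an audio editor.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`.

    Brute-force cross-correlation over lags 0..max_lag. Hypothetical helper,
    for illustration only.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(delayed) - lag)
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def merge_to_stereo(dstar, fm, max_lag):
    """Pair samples as (left, right) = (D-Star, FM) after removing the D-Star lag."""
    lag = estimate_delay(fm, dstar, max_lag)
    aligned = dstar[lag:]                 # drop the codec's processing delay
    n = min(len(aligned), len(fm))
    return [(aligned[i], fm[i]) for i in range(n)], lag


# Toy example: the "D-Star" track is the "FM" track delayed by 3 samples.
fm_track = [0, 0, 1, -1, 2, 0, 0, 0, 0, 0]
dstar_track = [0, 0, 0] + fm_track
stereo, lag = merge_to_stereo(dstar_track, fm_track, max_lag=5)
```

In the toy example the estimated lag is 3 samples and the two channels line up exactly; with real recordings the match would only ever be approximate.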
Important Notes relating to playback
of the audio files:
When playing back the stereo file, remember
that the left channel contains the D-Star audio and
the right channel contains the FM audio.
In order to properly separate the two
channels, it is strongly recommended that you either:
Wear headphones, listening to one ear
at-a-time
Control the "balance" using the
computer's mixer as to hear just one channel at a time from
the speakers/headphones.
Individual files are also available that
contain only D-Star or FM audio.
Tests with music:
Please note: The authors are fully aware
that the codec used in D-Star is not intended to
be able to faithfully reproduce sounds other than
speech.
Let's first try an acid test: Music. Unlike human
speech, music need not have much harmonically-related content at
all, with different notes of different amplitudes and timbres
occurring all at once - not to mention the inclusion of both notes
and noise (drums, cymbals, etc.) at the same
time! To do this, a simple tune was pounded out on a
synthesizer using a number of different instrument/note
combinations. Another file contains a more complicated clip
with multiple instruments and voices.
"Music file 1" was designed to have several distinct sections to
demonstrate various properties of the codec:
(0:00-0:21) - Piano, single note
melody with no reverb and minimal sustain.
(0:21-0:45) - Piano, single note
melody with some reverb and sustain.
(0:45-1:11) - Piano, melody with
(mostly) non-harmonically related chords.
(1:11-1:50) - Piano, single-note
melody with bass and cymbal accompaniment.
"Music file 2" contains an excerpt
from the song "Benson, Arizona" from the movie "Dark
Star" by John Carpenter. This clip is used in
accordance with "Fair Use"
provisions of U.S. copyright laws.
Analysis: The "FM" (right channel) audio is provided mostly as a
basis of comparison: Aside from reduced frequency response
and noted loss of fidelity - as would be expected from a "voice"
channel - the original program material (the music) sounds pretty
much like the original.
The "D-Star" (left channel) audio does show some interesting
properties:
"Music 1"
In this section, the melody is quite
recognizable. It is interesting to note, however, that
the attack and, to a lesser-extent, the decay of the piano
note is considerably altered. This alteration could be
explained somewhat by the amplitude compressor intrinsic to
D-Star, but much of the change in the dynamics of the "attack"
is due to the rather coarse "frame rate" of the audio codec
which - coupled with the very "lossy" nature of the compression -
cannot respond to quickly-changing properties of the piano.
When the sustain is added, it becomes
apparent that the built-in companding of the codec is
considerably altering the attack and decay dynamics of the
note. As notes are transitioned, it sounds less like a
piano as the pitches of the notes tend to "slide" into each
other during the transition, resembling more a sort of
"sliding" wind instrument rather than a percussion
instrument. Another interesting change from the original
is that the dynamics (that is, the difference of loudness
between notes) is pretty much lost.
When a lower, non-harmonically-related note
is added to the tune, the codec becomes extremely confused,
seeming to "lock" onto the dominant note. Of course,
with a single voice, this situation is not likely to arise as
it is reasonable to expect only a single, strong fundamental
to be present in the human voice. It is also
interesting to note that, during the "overlap" of two notes,
bursts of noise (not necessarily related to the actual notes)
often appear.
For this section a bass guitar and cymbal
were added. At times, the codec "locks" onto the bass
note instead of the piano melody - sometimes switching
mid-note as the amplitude of one diminishes and the other
becomes dominant and "captures" the codec. It is
interesting to note, however, that the resulting note pitch
through the codec is often just plain wrong! The most
obvious example of this is in the first few seconds of the
file where the bass note is solo: Comparing it to the FM
version, you can hear that some of the notes aren't even
close! (It's also interesting to hear what happens
to the cymbal during the piece...) This "note
inaccuracy" is intrinsic to the codec's finite spectral
resolution: Again, considering the nature of the human
voice, such alterations don't impair intelligibility, but do
contribute to the somewhat "robotic" sound apparent in D-Star
encoded speech - particularly with adult male speech.
The "orchestra" represents a fairly complex
sound with many non-harmonically-related components and, not
surprisingly, the codec interprets these mostly as
noise. When the melody comes along - with strong "horn"
components - it takes over (or "captures") the codec and is
(mostly) recognizable. The martial bass accompaniment,
clearly audible in the FM version, sounds more like bursts of
noise in this version. Again, this demonstrates the
aspect of the codec in which it attempts to discern which
audio components are likely to be those most important in the
conveyance of information: Unlike with speech, there
aren't readily apparent harmonic relationships between
lower-frequency components and its harmonics, so it is not
surprising that the codec would "assume" that such a complex
sound was more likely to be an unvoiced speech component.
The "choir" of synthesized voices also has
some significant non-harmonically-related components as well,
but many (varying) harmonically-related ones. It is
interesting to note the vacillation of the codec when trying
to reproduce a single note, with it alternating between a
recognizable representation of the original sound and an odd
"buzzing" sound that is, itself, often related to the original
note.
"Music 2"
This clip is even more complex, including both strong single-note
components in addition to non-harmonically-related notes and a
multi-voiced chorus. A careful listening reveals many of the
same characteristics that reveal themselves in the "Music 1"
clip. Interestingly, much of the voice is still
understandable despite obvious degradation, but we'll discuss why
this is later on.
What does all of this show us? It would seem to demonstrate
many of the properties of the codec that make it especially suitable
for the low-bitrate representation of the human voice, but it also
demonstrates that, if significant audio energy is present besides
that of the human voice, it is likely to break down in some odd
(but understandable) ways as the codec - which seems to be able to
generate only one sound at-a-time - seems to be "captured" by the
dominant (usually louder) sound(s).
Response of the codec to single-frequency tones:
Another interesting phenomenon - one that may be lost to those
without "musical" ears - is that, with the D-Star music clips,
many of the notes are rendered off-pitch. Without knowing
the precise internals of the codec, one would have to guess that
this is a result of the limited "spectral" resolution of the codec
itself, having only finite resolution when it came to reproducing
frequencies in a precise manner. To be sure, accurate
preservation of absolute frequencies is not particularly important
in casual, human speech and the deficiencies of the codec in this
regard become readily apparent only when music is played.
This leads to some interesting questions: How does this
codec react when presented with audio containing only a single
frequency? To do this, a number of files were created and
run through the codec:
This file consists of a slow sweep from
300 Hz to 2.5 kHz over a period of 120 seconds.
Analysis: The "FM" (right channel) audio is provided mostly as a
basis of comparison.
The "D-Star" (left channel) audio does show some interesting
properties:
In the "Discrete Tones" file - especially if one is listening to
the "stereo" version in which both tones are heard simultaneously,
and on speakers rather than headphones - one can tell that
there is, in fact, a difference in tone frequencies between the
two channels, the difference varying, probably dependent upon how
far away the tone is from the discrete step in the codec's
resolution. The 150 Hz tone, which is below the intended
frequency range of both the codec and the audio circuitry of the
radio, is barely detectable - if at all.
In order to do additional testing a sound file was created
with a slow, phase-continuous frequency change from 300 to 2500 Hz
over a period of 120 seconds. In this case, one can clearly
hear the discrete frequency steps as the codec reproduces the
tones. In certain places, one can hear a "glitch" as the
codec's "indecision" seems to, briefly, cause the tone to degrade
from a sine wave to more of a burst of noise: Again,
these are not the result of bit errors or
extraneous noise, but rather peculiarities of the codec.
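For those wishing to repeat the experiment, a phase-continuous sweep of this sort can be generated along the following lines. This is a minimal sketch rather than the tool used to make the test files on this page: the function name and the 8 kHz sample rate in the example are assumptions chosen purely for illustration. The phase is accumulated sample-by-sample so that the waveform never "jumps" as the frequency changes.

```python
import math


def linear_sweep(f_start, f_end, duration, sample_rate):
    """Generate a phase-continuous linear sine sweep as a list of floats.

    Hypothetical helper: the instantaneous frequency rises linearly from
    f_start to f_end while the phase is accumulated one sample at a time.
    """
    n = int(duration * sample_rate)
    phase = 0.0
    samples = []
    for i in range(n):
        samples.append(math.sin(phase))
        freq = f_start + (f_end - f_start) * i / n
        phase += 2.0 * math.pi * freq / sample_rate
    return samples


# Example: a one-second version of the 300 Hz to 2.5 kHz sweep.
# The slow sweep described above would use duration=120.0 instead.
sweep = linear_sweep(300.0, 2500.0, duration=1.0, sample_rate=8000)
```

The resulting list could then be scaled and written to an uncompressed .WAV file with whatever audio library is at hand.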
How about if we change the tone at a faster rate?
Fast sweep - (stereo, D-Star=left, FM=right, 1:01, MP3, 953 kB)
This file consists of a fast sweep from
300 Hz to 2.5 kHz over a period of 1 second.
Clearly, something different is happening
here: The codec is not responding to this as a
series of single-frequency tones and the result is that we hear an
odd-sounding ascending noise sequence that somewhat resembles the
original sweep. The breakdown of the codec isn't too
surprising, as the "dwell time" of the tone on any single
frequency is too short for the codec to consider it to be a single
tone.
This leads one to ask how long in duration the tone needs to be in
order for it to be rendered in a way that differs relatively
little from the original? At the time of writing, we haven't
run any tests to determine this, but if one looks carefully at the
transitions between the tones in the above clips, one can tell
that it is short - but definitely finite!
Comment: Considering that D-Star's codec has a
50 Hz audio "frame rate" (one frame every 20 milliseconds) it is
likely that the dwell time of a tone would have to be a discrete
multiple of that period to be successfully detected and
rendered. This would also imply that the codec can change the
sound that it is producing only at discrete multiples of that
same period - that is, no sound could change faster than 50 times
per second - probably slower! (With the codec's base coding
rate of 2400 bits/sec - not including forward error correction -
that would imply that there are only 48 bits per "frame" of
sound, some of which are likely to be overhead rather than
actual representations of the sounds being encoded.)
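The arithmetic in the comment above, along with a rough idea of how far each sweep moves during a single frame, can be worked out as follows. The frame rate and bit rate are the figures quoted in the comment; the helper name is made up for this example, and the assumption that the codec "sees" a sweep one frame at a time is just that - an assumption.

```python
FRAME_RATE_HZ = 50        # codec frames per second (figure quoted above)
VOICE_BIT_RATE = 2400     # bits per second, excluding forward error correction

frame_ms = 1000 / FRAME_RATE_HZ                    # duration of one frame
bits_per_frame = VOICE_BIT_RATE // FRAME_RATE_HZ   # bits available per frame


def hz_per_frame(f_start, f_end, seconds):
    """How far a linear sweep moves, in Hz, during one codec frame."""
    return (f_end - f_start) / seconds / FRAME_RATE_HZ


slow = hz_per_frame(300, 2500, 120)   # the 120-second sweep
fast = hz_per_frame(300, 2500, 1)     # the 1-second sweep

print(f"{frame_ms:.0f} ms per frame, {bits_per_frame} bits per frame")
print(f"slow sweep: {slow:.2f} Hz per frame")
print(f"fast sweep: {fast:.1f} Hz per frame")
```

During the slow sweep the tone moves by well under 1 Hz per frame, so it looks like a steady tone for many frames in a row; during the fast sweep it moves by 44 Hz within each frame, which is consistent with the codec no longer treating it as a single-frequency tone.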
Testing with DTMF tones:
At this point, one may wonder about DTMF tones and other types of
signaling that might be passed via the D-Star codec?
According to available data sheets, the codec contains some
built-in utilities that allow the detection and regeneration of
certain types of tone-signaling, such as DTMF, which permits their
precise transmission and subsequent reproduction. Provision
is made for the detection and regeneration of single-frequency
tones from 156.25 to 3812.5 Hz with a resolution of 31.25
Hz. The latter would certainly explain some of the behavior
that has been observed. Although not known for certain, it
seems likely that the "bins" (as in an FFT, for example) in
various frequency-determining portions of the
codec's algorithm have fixed-sized frequency steps and if this is
the case, a 256-point FFT is implied, as the data sheet indicates
that the codec's sampling rate is 8 kHz.
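The relationship between the sampling rate, the implied FFT size and the 31.25 Hz step can be checked with a few lines of arithmetic. Note that the 256-point FFT is only the inference made above, and the idea that a tone is simply rounded to the nearest step is an assumption made for this sketch: The codec's actual behavior is not known to us.

```python
SAMPLE_RATE_HZ = 8000     # from the codec's data sheet
FFT_POINTS = 256          # inferred above, not confirmed
STEP_HZ = SAMPLE_RATE_HZ / FFT_POINTS


def nearest_step(freq_hz):
    """Round a frequency to the nearest multiple of STEP_HZ (assumed model)."""
    return round(freq_hz / STEP_HZ) * STEP_HZ


print(f"step size: {STEP_HZ} Hz")
print(f"tone range: step {156.25 / STEP_HZ:.0f} to step {3812.5 / STEP_HZ:.0f}")
for tone in (440.0, 1000.0, 2000.0):
    print(f"{tone} Hz -> {nearest_step(tone)} Hz")
```

Both ends of the quoted tone range (156.25 and 3812.5 Hz) fall on exact multiples of 31.25 Hz, which fits the fixed-step idea. Under the rounding assumption a tone can be off by as much as half a step (about 15.6 Hz), an error that is proportionally larger at low frequencies.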
If this were the case for all frequencies, this would
imply that the codec would be unable to pass DTMF signaling, as a
resolution of just 31.25 Hz is not enough to maintain industry
specifications of frequency accuracy. Also, as seen from our
music tests, the codec is unable to produce two
simultaneous tones! Again, reference to the codec's data
sheet tells us something else: There is a built in decoder
for DTMF as well as some other common "call-progress" tones (such
as ringing, busy, dial tone, etc.) that might be encountered on a
telecommunications channel.
In order to test the behavior when presented with such sounds, a
file was created with two parts: The first half consists of
all 16 standard DTMF tones (0-9, *, #, and A-D) at various
cadences followed by the same 16 DTMF tones - except with their
frequencies shifted down by about 5%. (For this test, we
did not simulate the various "call progress" tones used in
telephony.) The original idea was to observe the DTMF
tones as they passed through the system, and then again when
"off-frequency" tones were presented to the codec - but the
results of this test were rather unexpected:
The first part of this file (0-7 seconds)
contains all 16 standard DTMF digits while the second half
(7-14 seconds) contains the same DTMF digits, but with the
frequency lowered by about 5%.
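A test signal of this kind could be produced with something like the sketch below. The row and column frequencies are the standard DTMF values, but everything else (the function names, the 8 kHz sample rate, the tone duration and the equal-amplitude mixing) is an assumption for illustration and not a description of how the file on this page was made. The "scale" parameter is what shifts both tones of a digit down by about 5%.

```python
import math

DTMF_ROWS = (697, 770, 852, 941)        # low-group frequencies, Hz
DTMF_COLS = (1209, 1336, 1477, 1633)    # high-group frequencies, Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key, scale=1.0):
    """Return the (low, high) tone pair for a key, optionally scaled."""
    if len(key) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(key)
            if c >= 0:
                return DTMF_ROWS[r] * scale, DTMF_COLS[c] * scale
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key, duration, sample_rate=8000, scale=1.0):
    """Generate one DTMF digit as a list of floats in the range -1..1."""
    low, high = dtmf_frequencies(key, scale)
    n = int(duration * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low * i / sample_rate)
               + math.sin(2 * math.pi * high * i / sample_rate))
        for i in range(n)
    ]


on_frequency = dtmf_tone("5", duration=0.1)                # standard digit
off_frequency = dtmf_tone("5", duration=0.1, scale=0.95)   # about 5% low
```

Stringing together all 16 digits at each setting, with gaps of silence between them, would give a file with the same two-part layout as the one described above.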
As is readily apparent, the codec crashed and
burned when presented with the "on-frequency" DTMF tones, but it
seemed to reproduce the "off frequency" tones to some degree -
although not very well! The reason for this anomaly is
unknown at this point and further tests will have to be conducted
to see if this is a problem related to this specific radio, but it
is likely a result of serendipity and of the particular frequency
relationships among the "5% lowered" tones.
Ignoring the frequency anomaly for the moment, let's analyze the
response of the codec to the DTMF tones: Because of the
inbuilt delay of the D-Star codec, it is not as important that one
decodes the tones real time, but rather it can afford to sit
around for a short time and decide if the DTMF tone is, in fact,
valid. In comparison with conventional DTMF decoders - such
as those found on auto patches - this is an advantage, as it
allows one to do even a better job when it comes to preventing
"falsing" - that is, the erroneous detection of non-DTMF audio as
a DTMF tone.
It is also important to realize that the D-Star codec does not
transmit DTMF (or single-frequency) tones in the same way that
it transmits voice! These tones appear to be special
cases for the codec: Instead of transmitting an approximate
representation of the sound that it is hearing (as would be the
case for a voice component, in which the timbre and frequency are
symbolically represented), the codec transmits a special
set of codes that indicate to the receiver that this is, in fact,
a specific type of tone. This is easy to verify by
carefully observing the data waveform of the D-Star baseband
signal when precise, single-frequency tones or DTMF signals are
transmitted, and such is strongly implied in the data sheet for
the codec as well!
Comment: This has the implication that it should be
fairly easy to generate "canned" waveforms for the purpose of
generating single-frequency and DTMF signals without the need of
a codec simply by "capturing" the bit pattern generated when
such tones are being transmitted.
Comments:
Upon discovery of the codec's inability to
properly pass DTMF tones, the accuracy of the original tones
being played back through the codec was re-checked and found
to be well within specifications, so we are currently at a
loss to explain this if, in fact, the DTMF
detection/regeneration is enabled.
Note that when one presses the keys on the
IC-91AD to generate DTMF tones while transmitting using
D-Star, the tones emitted at the receiver are "clean."
This likely means that the radio's computer sends instructions
to the codec to generate a specific DTMF tone, rather than
having the audio tone that one hears from the speaker (or
would be transmitted in FM mode) decoded and subsequently
interpreted by the codec.
Based on the observations above it is
likely that the "DTMF Decoding" feature of the codec is, in
fact, not fully-implemented - something that would explain the
lack of "clean" DTMF tones resulting from externally-applied
audio in the above test. In other words, you
cannot reliably pass DTMF signaling through a D-Star link
system unless that audio is generated from a
D-Star radio itself! This means that if you
are using a D-Star system as a gateway or as a relay of an
analog channel, you should not expect DTMF
control to be possible through that link.
Let's move on to some scenarios that are more likely to be
encountered in real-world operation.
Tests with multiple voices:
Another test was with multiple voices. For this test,
different text is read by a male voice and gradually faded to a
female voice.
Analysis: The "FM" (right channel) audio is provided mostly as a
basis of comparison: Aside from reduced frequency response
and noted loss of fidelity - as would be expected from a "voice"
channel - the original program material (the two voices) sounds
pretty much like the original: There is the understandable
difficulty in understanding them separately when they are both at
the same amplitude. The trained ear, however, can
distinguish much of what an individual speaker says.
The D-Star (left channel) audio demonstrates, again, what happens
when the codec is presented with two audio signals (voices, in
this case) of roughly-equal amplitude.
If you listen very carefully, you can observe an interesting
property: Unlike the analog, you never hear both voices simultaneously,
but rather one voice at a time. It is
fortunate that human speech is not only redundant in its nature
(that is, it is easy to infer what was missed by listening to what
came immediately before and after the "missing" portion) but that
it's also full of pauses. During those instances where both
voices are seemingly present, note that the codec actually switches
between the male and female voice rapidly, giving the illusion
that both voices are present simultaneously. The result -
even though it contains many "mixed signal" artifacts - is
surprisingly understandable.
In this (and other) voice-containing clips there is another
interesting artifact of the compression: A slight "waver"
in the frequency of the voice - most easily noted on the
lower-frequency male voice. This is likely related to the
codec's finite frequency resolution and its innate inability
(unlike that of analog) to reproduce frequencies precisely
matching those of the original as we have seen in the tone tests
above. With higher frequencies (as in the music and with the
female voice) these finite-sized steps (and subsequent errors)
become proportionally smaller and are less-noticeable.
It has also been suggested that some of this "wavering" may, in
fact, be intentional in order to make the re-created voices sound
slightly less robotic than they would be if subtle changes in
voice pitch had been "locked" to discrete (and "wrong")
frequencies. It could also be that there's enough spectral
spread near the fundamental frequency of the voice to cause a
degree of "indecision" in which frequency, exactly, is to be
reproduced.
Crowd noise:
How about many voices? Up to this point we
have explored scenarios that are unlikely to occur in casual
operation - that is, the playing of music and the presence of two,
simultaneous voices of roughly equal amplitude. A much more
likely scenario is that of a crowd. For this, there are two
files to demonstrate what happens when noise from a crowd
of people is in competition with the speaker:
Each file starts with a voice, alone and then
gradually (starting at about 15 seconds) the crowd noise is
increased in amplitude to the point where it equals that of the
voice (by approximately 30 seconds.) The amplitude of the
speaker's voice is then gradually reduced during the last 15
seconds or so, leaving only the crowd noise.
Analysis: The "FM" (right channel) audio is provided mostly as a
basis of comparison: Aside from reduced frequency response
and noted loss of fidelity - as would be expected from a "voice"
channel - the original program material sounds pretty much like
the original: In each case, there is the understandable
difficulty in understanding the voice when it is at roughly the
same amplitude as the crowd noise, but the trained ear can dig
out most of the words.
With the D-Star (left channel) audio, unlike the
situation in which there are just two speakers competing for
"codec time," the codec has trouble when faced with a
(more-or-less) constant background noise - particularly in light
of the fact that this noise (the crowd) consists of many voices
overlaying each other. With this constant noise, the codec
has trouble finding "holes" in which clearly-audible syllables
of the speaker can be heard and the voice becomes largely
unintelligible. Note: Keep in mind that upon
repeated playbacks, one will likely become more-familiar with
the text being read and be able to understand more of what is
being said than would be the case for a first-time listener
despite adverse conditions.
Through careful scrutiny, the clip also
demonstrates another property: When the speaker's voice is
competing with that of the crowd, one hears the speaker's voice
only when its peak level exceeds that of the noise of the crowd,
effectively "capturing" the codec.
The lessons here should be obvious, no matter
what sort of system (analog or digital) you are using: Do
your best to make sure that your voice is the dominant
one! Soft-spoken, mic-shy users may find themselves
competing unfavorably with the crowd noise: If the
sound-level of their voice is too quiet, the background sounds
may simply override and "capture" the codec, leaving too few
traces of the speaker's voice for it to be understood.
There is another "gotcha", however: The
D-Star codec includes an audio compander - that is, a built-in
device that equalizes the audio level such that either too high
or too low audio gets attenuated or amplified to maintain an
overall, constant level. While this feature is generally
useful, it can be a detriment when other noises are
present. During pauses between words and syllables it will
happily increase the gain, bringing up the background and,
possibly, causing the sort of "codec confusion" that has been
demonstrated here.
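The way in which a level-equalizing stage brings up the background during pauses can be sketched with a very simple automatic gain control. This is a minimal illustration only: the target level, maximum gain and attack/release constants are assumed values, and it is not a model of the actual circuitry or firmware in any D-Star radio.

```python
def simple_agc(samples, target=0.5, max_gain=8.0, attack=0.5, release=0.01):
    """Very simple gain control (all parameters are assumed values).

    Tracks the signal envelope (fast when the level rises, slowly when
    it falls) and applies the gain needed to bring that envelope to
    'target', limited to 'max_gain'. Returns (output, gain_history).
    """
    envelope = target  # start at unity gain
    out, gains = [], []
    for s in samples:
        level = abs(s)
        rate = attack if level > envelope else release
        envelope += rate * (level - envelope)
        gain = min(max_gain, target / max(envelope, 1e-9))
        gains.append(gain)
        out.append(s * gain)
    return out, gains


# Speech at a level of 0.5 followed by a long pause containing only
# background noise at one-tenth of that level.
speech = [0.5] * 200
pause = [0.05] * 2000
out, gains = simple_agc(speech + pause)
print("gain during speech:", gains[199])
print("gain at end of pause:", gains[-1])
print("background level at end of pause:", out[-1])
```

During the pause the gain climbs until it reaches its limit, so background noise that was ten times quieter than the voice ends up nearly as loud as the voice itself.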
At this point in the analysis, a
possibly-unrelated question arose: How well does the codec
handle un-voiced audio, such as a whisper? It
was expected that, because the timbre of unvoiced sounds makes an
important contribution to the conveyance of information in human
speech, the designers of the codec would have ensured that it
would be capable of reproducing such speech. To test this,
another file was created - but without any competition from
background noises:
Analysis: The "FM" (right channel) audio is provided mostly as a
basis of comparison: Aside from reduced frequency response
and noted loss of fidelity - as would be expected from a "voice"
channel - the FM audio sounds pretty much like
the original program material.
The D-Star (left channel) audio is also quite intelligible:
There are the expected artifacts associated with the audio
compression, but there is little degradation in intelligibility as
there is no competition from unrelated noises.
Tests with other ambient noises:
It is expected that amateur radio will continue to make valuable
contributions to public service in the future, in both emergency
and non-emergency situations. In these and normal, everyday
situations one can reasonably expect that the amateur radio
operator will experience adverse conditions - including those
where there is significant background noise - that may impact
intelligibility.
With the advent of digital voice systems in both the public and
private communications services some concern has been raised about
the ability of the codecs being used to cope with situations where
the speaker's voice is being affected somehow - either by high
ambient noise from nearby equipment, or significant alteration of
the voice by, say, the breathing apparatus of a fire fighter.
More recently, there has been some "pushback" by emergency
responders (see below)
- with some municipalities either abandoning their new
digital-voice systems in favor of the older, analog ones or
rejecting their adoption outright! In many cases, these
concerns are not unfounded as there are documented cases where the
various shortcomings of these systems have been a significant
contributor to the loss of life of emergency workers - either through
deficiencies in the topology of the system itself, through the
inability of the codec to handle human speech degraded by external
factors (a breathing mask or ambient noise) or a combination of
both. Additionally, there have been recommendations that the
mandated rollout of narrowband, digital communications systems be
halted pending the amelioration of such concerns!
It is also important to realize that the codec used in D-Star
operates at a lower coding rate than many of those
used in public service. With the lower coding rate used in
D-Star, it is arguably more important that the user be aware that
extraneous sounds can more-easily overwhelm the codec, causing it
to further degrade the speech - and this doesn't take into account
that D-Star's codec lacks some of the sophisticated adaptive noise
reduction techniques (present in some other codec implementations)
that could, in theory, reduce problems associated with some
external noise sources.
Briefly, the problem of system topology (i.e. how the signal gets
from the person transmitting to the person receiving) may be
mitigated in amateur radio use because of our potential ability to
recognize those situations in which one is unable to get into a
repeater and make other arrangements - such as choosing another
repeater, improving one's antenna, increasing output power, or
simply switching to simplex where short-range communications is
adequate: The caveat here is that training is required (whether
the system being used is analog or digital) so that the
operator (and/or net control operator) is capable of recognizing
such situations and taking the appropriate action!
The "Generator" test:
To test the ability of the codec to deal with various types of
noises, a few more audio files were created:
Each file starts with a voice, alone, and then
gradually (starting at about 12-15 seconds) the sound of the
generator is increased in amplitude to the point where it equals
that of the voice (by approximately 30 seconds). In the case
of the file with the male voice the speaker is faded out, leaving
only the generator noise and one can hear the generator stop at
the end of the file.
Analysis:
The "FM" (right channel) audio is provided mostly as a basis of
comparison: Aside from reduced frequency response and noted
loss of fidelity - as would be expected from a "voice" channel -
the FM audio sounds pretty much like the original program
material. Tests indicate that even when the levels of the
speaker and generator are equal, it is still possible for an
experienced operator to "dig out" much of what is being said.
The D-Star (left channel) audio begins to suffer from effects
similar to those noted in the "crowd noise" test above.
Because the sound of the generator is more constant, there are few
"gaps of silence" in which the speaker's voice can be
inserted. What is heard of the speaker's voice are
mostly voice peaks that have "captured" the codec: While a
few words and syllables are distinguishable, there is too little
information left in order to be able to make much sense out of
what is being said.
It is also interesting to note that the interfering sound is not
readily recognizable as being that of a gasoline
engine in the D-Star clip and a would-be net control may have a
difficult time determining what the interfering sound might
actually be. It is hoped that common sense would indicate to
an operator that being next to a generator (or in any situation in
which there is considerable competition from noise) would be a
bad idea, but this example demonstrates it in case there was any
doubt!
The "Other noises" test:
How about a few other types of noises? The next two audio
files include a mix of sounds similar to those that one might
expect in various emergency situations:
As is apparent from the audio file, the voice is
mixed with the sounds (mostly sirens) of various emergency
vehicles. The final segment, albeit somewhat less likely to
occur in a typical situation, has the speaker mixed with three
crashes of thunder.
Analysis:
The "FM" (right channel) audio is provided mostly as a basis of
comparison: Aside from reduced frequency response and noted
loss of fidelity - as would be expected from a "voice" channel -
the FM audio sounds pretty much like the original program
material. As you might expect, some syllables are lost -
simply as a result of the speaker's voice (and, possibly, the
radio's clipper) being overwhelmed by the sound: Aside from
these brief instances, the audio is generally copyable by an
experienced listener, although a few "fills" may be required upon
retransmission.
The D-Star (left channel) audio shows a degradation similar to
that already demonstrated. One can hear a sort of "capture
effect" where the level of the voice is less than that of the
offending noise - an example of this being during the siren, where
the voice completely disappears during the syllabic pauses:
This is in contrast with the FM audio, where the two sounds are
simply mixed together, albeit in a non-linear way at times.
While a skilled listener can extract much of the information from
this clip, significant portions of it are lost and were real
traffic being passed, it would likely require a re-transmission.
Again, it cannot be overstressed that in the case of either analog
or digital transmission, proper training of the operator is of
great importance: In the example above an experienced
operator would recognize that the ambient noise would have likely
made reception of the transmission difficult and would have either
paused to wait for the "QRM" to pass, or would have asked the
recipient if "fills" were necessary.
What about "Narrow" FM?
Another mode
available on D-Star capable radios is "FM-N" or "Narrow"
FM. This is the same as "normal" FM used by
amateurs on the VHF and UHF bands for decades, except
that the peak deviation is limited to ±2.5 kHz instead
of ±5 kHz.
The advantage of narrower deviation is that somewhat
less bandwidth is required for a "Narrow" FM voice
channel than a "Normal" FM voice channel - but it would
be wrong to assume that it was half as wide!
Why? A quick consultation of "Carson's
Rule" provides the answer: While the
deviation itself is half as wide, one still needs to
modulate the same audio bandwidth as before. What
this means is that, according to Carson's Rule, if your
voice bandwidth extended to 2.5 kHz, you'd still need at
least 5 kHz of bandwidth even if your deviation was set
to zero!
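The arithmetic behind this is easily checked. Carson's Rule estimates the occupied bandwidth as twice the sum of the peak deviation and the highest modulating frequency. The sketch below assumes a highest audio frequency of 3 kHz for the "Normal" and "Narrow" cases (an assumed, typical figure for a voice channel) and uses the 2.5 kHz figure from the text for the zero-deviation case.

```python
def carson_bandwidth_khz(peak_deviation_khz, max_audio_khz):
    """Carson's Rule: bandwidth = 2 * (peak deviation + highest audio frequency)."""
    return 2.0 * (peak_deviation_khz + max_audio_khz)


AUDIO_KHZ = 3.0  # assumed highest audio frequency for a voice channel

normal = carson_bandwidth_khz(5.0, AUDIO_KHZ)   # 16.0 kHz
narrow = carson_bandwidth_khz(2.5, AUDIO_KHZ)   # 11.0 kHz
zero_dev = carson_bandwidth_khz(0.0, 2.5)       # 5.0 kHz, as noted in the text

print("Normal FM:", normal, "kHz")
print("Narrow FM:", narrow, "kHz")
print("Narrow/Normal ratio:", narrow / normal)
print("Zero deviation, 2.5 kHz audio:", zero_dev, "kHz")
```

Under these assumptions "Narrow" FM occupies roughly 69% of the bandwidth of "Normal" FM rather than half.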
In practice, one could never set the deviation to
zero: As one decreases deviation, the "FM
Advantage" (that is, the "quieting" effect of the FM
system) diminishes. In fact, "Narrow" FM suffers
from this to an extent as compared to "Normal" FM, but
the effect is not readily obvious to the casual
observer.
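To put a rough number on this loss: for a signal well above the FM threshold, the recovered audio signal-to-noise ratio is proportional to the square of the peak deviation, all else being equal. The sketch below assumes the same received signal level and the same audio bandwidth in both cases and ignores any benefit from a narrower IF filter or from companding, so it should be taken as an approximation only.

```python
import math


def deviation_snr_change_db(new_deviation_khz, old_deviation_khz):
    """Change in recovered-audio S/N (dB) when only the deviation changes.

    Assumes operation above the FM threshold with the same received
    signal level and the same audio bandwidth in both cases.
    """
    return 20.0 * math.log10(new_deviation_khz / old_deviation_khz)


change = deviation_snr_change_db(2.5, 5.0)
print("Narrow vs. Normal FM: %.1f dB" % change)
```

This works out to a penalty of about 6 dB for halving the deviation, which is noticeable on weak signals but easily overlooked on strong ones.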
It turns out that using half the deviation in "Narrow"
FM allows the system designer to tighten channel spacing
from 20 kHz to closer to 15 kHz for
geographically-adjacent systems, and down to 12.5 kHz
(instead of 15 kHz for "normal" FM) for those systems
with adequate geographical separation. It should
go without saying that these benefits would not be
possible unless all receivers used
on these systems also incorporated correspondingly
narrower filtering in their IF's: It is not enough
to simply reduce transmit deviation!
Also remember that Carson's Rule is an
approximation: It cannot take into account all
situations, such as the instantaneous wider bandwidth of
an FM transmission that can occur during transient
spectral peaks that may happen during modulation.
In tests done by commercial and public-service entities,
it seems that "Narrow" and "Normal" FM are pretty-much
equal in their coverage and audio quality when properly
implemented. It should also be noted that in many
cases, "Narrow" FM radio systems include
amplitude-companding techniques - using a compressor at
the transmitter and a matching "de-compressor" at the
receiver to maximize the signal/noise ratio across the
link and to minimize effects of the S/N loss associated
with the narrower deviation. It is important to
note that not all implementations of "Narrow" FM,
particularly those used in amateur radio, include the
use of a "compander" system!
Overall comments:
Once again, let's make something absolutely clear to the reader:
This page is not
intended to "bash" D-Star or its codec, but rather to
educate the user about peculiarities intrinsic to the codec
used by D-Star.
It is worth reiterating that both experience and
proper training can allow operators to recognize and remedy those
situations in which communications may be compromised by
extraneous noises, regardless of the medium over which the
voice is conveyed!
In order to do this it is very much worthwhile to be familiar with
the way in which the communications systems are affected (and
often degraded) by such noises. Knowing this improves the
likelihood that the operator(s) will be able to recognize what is
happening and figure out how to mitigate it: Doing so can
not only improve communications efficiency, but it can save time -
and maybe lives! Because of the "lossy" nature of the codec
used for D-Star (and other digital voice transmission systems) one
must accept the fact that, compared to conventional analog
systems, it is likely that the users will experience types and
degrees of degradation that might not be observed on an analog
system.
One interesting result of the analysis was the comparison of
"intelligibility" by a "skilled" amateur radio operator - one who
is rather used to digging out weak, degraded signals from amongst
noise and QRM - and the "casual" listener - an "average" person
who has not had to do so. As you might expect, the "skilled"
listener, being used to noise and QRM, was more-able to understand
what was being said when the voice was degraded - although
intelligibility through the codec was notably worse than
analog. In the case of the un-skilled listener, however, the
"digital-degraded" speech was disproportionately more difficult to
understand than the "analog-degraded" voice. In retrospect,
this isn't too surprising as the "skilled" operator is used to
trying to make sense out of what is heard when only random
syllabic fragments are present and is less-likely to be distracted
by "other" sounds.
By careful analysis of the sound files above, here are some
observations made that apply to the codec used in D-Star:
The codec can produce only one
sound at a time due to the "digital capture effect."
While, in an analog system, two audio sources
simply "mix", the codec will simply "capture" whichever audio
source has the most energy. This is readily apparent
for the digital system: In close analysis of the "Male-Female
voice mix" clip above one hears only the male or
female voice at any given instant - never both at the same
time. With the highly-intermittent and redundant nature of
human speech, it is possible that there will be "holes" in which
the sounds of the other speaker can be placed, providing the
listener with enough information to be able to make some sense out
of what is being said.
In cases where the codec cannot distinguish between the two
sources of sound (the speaker, and another voice or background
noise) the codec is also likely to produce unexpected
results. In the examples above, one can hear many instances
where sounds are produced that resemble neither audio
source. In these cases the codec has mistaken the combined
sound as simply noise, or as a random mix of spectral components,
and produced its "best guess." Again, depending on the
skill of the listener, these can be distracting, or different
enough in their sound that they can be readily ignored.
When presented with a "constant" background noise (such as a
crowd, generator or siren) there is less opportunity for the
speaker's voice to find a "hole" in which a few syllables can be
passed by the codec. In this case, it may be that only voice
peaks are able to override the offending noise: Whether or
not this will yield sufficient information for the listener to be
able to understand the speaker depends on not only the extent to
which enough intelligible syllables get through, but also the
skill of the listener.
One interesting property with the codec is that, in many cases,
the background noise becomes unrecognizable. The implication
of this is that the listener - say, a net control station - may
not be able to as easily diagnose intelligibility problems
associated with a station, especially if the operator of that
station is inexperienced and is unaware of the problem with the
ambient noise!
The compander built into the
codec can, in some instances, make the situation worse.
One of the properties of the D-Star codec that
can greatly improve intelligibility is the fact that it has a
built-in compander. As mentioned before, this will increase
the amplitude of quiet audio and decrease the amplitude of
loud audio to maximize intelligibility. In contrast, analog
FM transmitters are typically outfitted only with a
limiter/clipper to prevent peak audio from exceeding the desired
channel bandwidth and, to some extent, reduce the peak-to-average
ratio of the speech being transmitted.
It is not uncommon, however, to observe that many analog users
seem to have "quiet" audio. There are several possible
reasons for this:
The user is talking too quietly.
The microphone may simply be too far away
from the user's mouth.
The radio's microphone gain may be too
low. This may be remedied through a simple menu setting,
but it could involve adjustments internal to the radio or
microphone, or even a defective microphone!
The radio may be incorrectly configured -
such as being set for "FM-Narrow" on a channel that is
configured for ±5 kHz.
A combination of the above.
The problem of "quiet" audio is particularly
troublesome when the signal is weak: Because FM has the
property of getting "noisy" when signals degrade, this can have a
particularly deleterious effect when the user's audio is already
low! Compounding this is the fact that many operators
(particularly newcomers to FM communications) are unaware of the
fact that increasing power will have NO EFFECT on
the loudness of the audio: While increasing power may
improve intelligibility by reducing the amount of noise, it does
nothing to address the root cause (that of low audio levels) in
the first place!
With the compander built into D-Star, many of these "sins" are
masked as it can seemingly compensate for low audio from the user
- regardless of the cause. It does, however, have a
potential pitfall when in the presence of ambient noises:
When the user stops speaking, it will faithfully increase the
audio gain and bring the amplitude of those noises up - something
that can lead to "codec confusion" which, in turn, can easily
cause "listener confusion!"
One common scenario is that of a user with "chronically low audio"
- a situation often caused by the operator talking too quietly
and/or too far from the microphone. On FM, this situation
may be obvious to the observer, but the compander in D-Star can
compensate for this, further reinforcing a bad habit.
If high ambient noise levels are added to the mix, the situation is
worse as it may be less likely that the "quiet" speaker's voice
will be able to sufficiently overcome the background noise.
While this situation is bad enough no matter what system one is
using, it can be made even worse when a compander brings up the
background noise and the "digital capture" effect compounds
the problem.
Degradation may be compounded
by loss of signal integrity.
Up to this point only scant mention was made
about degradation due to loss of signal integrity and in the
examples above, it was assumed that there were no bit
errors in the received D-Star signals. As we
know, this is not necessarily the case in real-world situations.
In the analog world, we are familiar with the tell-tale signs of a
weak analog signal - specifically:
White noise, crackle or "popcorn" due to
poor "quieting" of a weak signal.
Distortion due to multipath.
"Choppiness" of audio due to
squelch-clamping or the signal dropping momentarily below the
squelch threshold.
With experience and proper training an operator
can recognize these situations on other signals (or, through
received reports, his own signal) and attempt to remedy it by
increasing power, moving the antenna location, or even having
another person relay the message.
D-Star and other digital systems, unfortunately, do not as readily
lend themselves to allowing the signal quality to be ascertained
by observation. While an oft-touted advantage of the digital
systems is their ability to maintain noise-free communications
even with degraded signals, this facility comes with a
price: The so-called "digital cliff." Unlike analog
systems which will often degrade somewhat "gracefully" (e.g.
gradual deterioration) or manifest intermittent degradation (such
as multipath in a mobile environment) such degradation may not be
readily obvious in a digital environment. Problems include:
Until sufficient data is lost, the digital
signal may not show obvious signs of degradation - until it
abruptly disappears or becomes badly corrupted.
Loss of digital sync. It is possible
that if the signal is momentarily lost, the receiver's codec
may lose sync: Even if the signal quality recovers
immediately, it may take several seconds before it re-acquires
lock, possibly causing critical information to be lost.
With weak or degraded signals it can often
take 2-5 seconds after sync loss before a transmission
is recovered and decoded.
If a transmission is short - or if the
critical information is contained at the very beginning of the
transmission - the 2-5 seconds that it may take to lock to a
degraded signal may result in its content being lost entirely,
with the possible result of the recipient being unaware that a
transmission ever occurred!
When signals are degraded, callsign/routing
information is also degraded. This means that a weak
signal with numerous bit errors - even if somewhat "copyable"
- may fail to get routed properly in a network unless/until
valid data is received. This means that portions (or
all) of transmissions may be lost on a data network!
The so-called "R2-D2" effect. If the
signals are sufficiently degraded (either due to weak or
multipath signals, or even due to data loss or "timing
jitter" on an internet connection - something that is likely
on a congested, public internet connection) the resulting
speech can be hopelessly garbled.
The above degradation is not unexpected as many
of these are the same sorts of degradations that one could expect
on any radio link or data network. In many cases, experience
and proper training can mitigate these factors, allowing operators
to recognize and, hopefully, remedy them.
It is unfortunate, however, that D-Star radios of current
manufacture are sadly lacking in any sort of utilities that can
assist in the detection and diagnosis of such problems:
The S-Meter is of limited use:
Users should realize that the radio's
signal level meter will respond to a signal of any
sort - including noise and interference - and it is not
necessarily useful in determining the cause of signal
degradation, such as interference or multipath.
Be aware that many S-Meters have very
limited dynamic range and will "peg" even on fairly weak
signals, making them of limited use in optimizing a radio
link, assuming that such can be done on the basis of
signal-strength, alone.
There is no obvious means of determining
Bit-Error-Rate (BER):
Despite the fact that both the modem and
the codec used in most (if not all) D-Star radios of current
manufacture have built-in BER diagnostics, the radios do not
(yet) make this information available to the user in any
meaningful way.
This is somewhat surprising as even
inexpensive, consumer grade wireless appliances (such as
wireless LAN cards and wireless telephones) have such
facilities - even if they may not be readily known by the
casual user.
Having such information available would
allow the D-Star user to make a better determination if the
radio link is "solid" or "on the edge."
Ironically, one of the best ways
currently available to determine viability of a radio path for
use as a D-Star link is done by first testing the path using
analog to see if it is "clean" and then switching to
D-Star!
Additional system complexity
demands more/better training:
Remember, unlike most commercial or
public service radio systems, we amateurs are allowed great
flexibility in the way that we configure our networks in that
we can choose our frequencies and radio paths at will!
Unlike our "professional" counterparts, we have access to many
knobs that we can "twiddle" to change the configuration of our
system. While this can provide unparalleled communications
flexibility, it also requires that the users be very
familiar with their radio equipment.
Those considering implementation of a digital radio system must
recognize that in order for it to be used effectively as a
communications tool under adverse conditions, not only must the
aforementioned considerations pertaining to the codec be
taken into account, but also the fact that operating a D-Star radio
is arguably more complicated than operating an analog one:
Not only does it require additional consideration in terms of how
the operator uses the radio as compared to its analog counterpart
(taking into account both audio and signal integrity
considerations) but additional configuration and networking
features may further complicate matters if system/radio
configuration is not adequately thought out beforehand.
It is important to realize that the system designer can create or
avoid problems during system design and configuration. By
avoiding unnecessarily complex routing and using fairly simple
network topology, one can increase the likelihood that such a
system will be usable by uninitiated and
hurriedly-trained operators, and that its maintenance and the
diagnosis of problems will be simpler should
difficulties arise! It should also be realized that it is
possible that there may be times when it is best not to
use an existing D-Star network - such as when it is more
appropriate for local communications to use simplex channels, or
when it is possible to accommodate necessary communications with
other means, leaving the larger, more comprehensive networks
available for more-critical use.
Remember: An emergency is not a good time
for someone to try to figure out how to use their
radio: Training and practice beforehand is essential!
Final comments:
It is expected that the reader of this page take it simply as
an explanation of observations made and put out of his/her mind
any perceived biases for or against D-Star or any other digital
voice system! It is also expected that readers will
do their own research and testing rather than relying solely on
this (or any other) reference!
Having said that, it is believed, by the authors, that the tests
and conclusions above reflect properties of the systems that
should be considered in the implementation of a digital
communications system and the training to use it!
Notes and disclaimers:
Some of the above information has been
determined using available test gear and Icom D-Star radios
and such information is believed to be valid. It is
likely that this information will, in the future, be updated
and techniques refined.
It is up to you, the reader, to
verify that this information is, in fact, correct and suitable
for your needs. We cannot be held responsible for the
use/misuse of the above information!
If you find that the above information is
incorrect or incomplete, please contact the frequency
coordinator using the link below.
The text in the above files is read from
the Declaration of Independence and The Time
Machine by H.G. Wells. Copyrighted content is used
in accordance with U.S. "fair use" laws.
D-Star repeater
installation - Experiences, problems encountered
with the installation of a D-Star Stack at a very busy
radio site - and solutions to these problems.
The following are FAQ's provided by the
Utah VHF society. Note that these may topically overlap
the links above:
Radio problems during a fatal fire. This page contains a sobering
example of a tragedy that can, in part, be attributed to
shortcomings of the communications system, and it
includes audio files and some brief analysis of what
happened. Linked
from Daryl Jones' blog. As with any
blog, please use your judgment to weigh possible bias of the
author and its effect on the facts.
Please note that some of the problems
associated with digital, trunked systems do not necessarily
apply to amateur radio implementations of digital voice
technology. It is, however, well-worth the time for a
designer of a radio network - amateur or professional - to
become familiar with such systems and take advantage of
lessons learned by others, applying them to system design, and
to the training of those who are destined to use it!
The above list is, by no
means, exhaustive: Other information may be found via web
searches.
This matter is open for
discussion: If you have concerns or opinions one way
or another, please make them known to the frequency
coordinator at the email address below.
Questions, updates, or comments pertaining
to this web page may be directed to the frequency coordinator.