The performance characteristics of the STARREPORTER method make it
superior in both execution speed and bandwidth preservation.
In addition, its ability to process data sampled at any rate and to
provide a continuous trade-off between quality and compression ratio
makes STARREPORTER a more flexible system.
All of these points can be
demonstrated by timing execution speeds and examining the spectral
characteristics of the decompressed output signal.
However, the final evaluation of any audio compression technique
must involve listening to the quality of the output signal. No
matter how efficiently an algorithm performs, we must not lose track
of the ultimate goal - to produce audio output that is
indistinguishable from the original recorded signal.

StarReporter is distributed as a zipped file <strrpt1.zip>
that contains a setup program. After downloading, double-click the
file and follow WinZip's instructions to extract the program. Allow
StarReporter to create its own folder.
NOTE: We strongly recommend that you open a new folder for
the demo program installation that is different from the one in
which the expanded installation files reside. This step allows for
easier removal of the demo program with the Win95 Add/Remove
Programs utility.
A series of sampled data files in various encoding formats are
provided with the program. These will be placed in a Samples folder
below the demo program folder at the time of installation.
The STARREPORTER Multimedia demo is fully functioning with the
exception that it will only save 10 second grabs of data. Routines
such as Capture, Speak, Copy, Cut, Append, Scale, and Trim & Pad
are all available. You may also change sampling rates of individual
files.

A New Method for Audio Compression
(Ref: Dr Roy C Snell)
Most computer
users have employed such software as PKZIP, LHARC and DriveSpace
either directly or indirectly for compressing data and program
files. These programs involve lossless compression (during the
encoding stage) which reduces the size of a file but permits the
compressed file to be expanded (during the decoding stage) to an
exact copy of the original. A file that has been reduced in size
uses less disk storage space and can be transmitted more quickly to
a remote site over a network such as the Internet.
Since no data
is lost, the only drawback to using such file compression software
is the time required to compress the data initially and the time it
takes to decompress the file each time it is accessed. A file
containing an executable program or a data file whose contents do
not change need only be compressed once so the decompression time is
the most critical. The efficiency of today's compression software,
combined with the speed of most computer systems, has made automatic
compress/decompress procedures such as the DriveSpace option in
Microsoft's MS-DOS a viable option for many computer users.
In
discussions of file compression techniques, the compression ratio is
usually defined as the size of the original file divided by the size
of the compressed file. For example, if the original file is 300,000
bytes in length and is compressed to 100,000 bytes, the compression
ratio would be 3.0. The value will vary from one file to another but
a compression ratio of around 2.0 is generally considered to be
average for most files. A computer user who takes advantage of this
technology can expect to store, on average, twice as much actual
data on a disk as would be possible in uncompressed form. In
addition, file transmissions over a computer network would take only
half as long as they would in the uncompressed form. These benefits
can be achieved at the expense of the minimal processing time
required for compressing and decompressing the data.
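
The definition above can be checked with a few lines of Python. This is a minimal sketch for illustration only; the function name `compression_ratio` is our own and is not part of any compression package.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Size of the original file divided by the size of the compressed file."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


# The example from the text: 300,000 bytes compressed to 100,000 bytes.
print(compression_ratio(300_000, 100_000))  # 3.0

# At a ratio of 2.0, a transfer at a fixed line speed takes half as long.
print(1 / compression_ratio(200_000, 100_000))  # 0.5
```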
2. Multimedia Data Storage Formats
For many
years, it was possible to characterise computer files as containing
program code, program data or text. The advent of computer-based
multimedia applications has resulted in a storage requirement for
several new types of data. These include full-colour digitised
images, full-colour movie images and high-quality digitised audio. A
number of internationally recognized formats exist (GIF, JPEG, TIFF)
for the storage of still images. In the case of movie and video
images, an international organization called the Moving Picture
Experts Group (MPEG) has developed storage standards for both the
video and audio portions of the signal.
A major
problem encountered in multimedia applications is dealing with files
containing high-quality digitized audio signals. In the case of the
standard two-channel audio stored on consumer Compact Discs (CDs),
44,100 data values per channel must be stored each second. Since
each data value occupies two bytes of storage space, a total of
2 x 2 x 44,100 = 176,400 bytes is needed to store a second of audio
data. This means that five minutes of high-quality, dual-channel
audio will occupy nearly 53 million bytes (300 x 176,400 =
52,920,000) of storage space. For most applications,
such data rates place unacceptable demands on the disk storage
capacity of the system and on the transmission rates of any networks
that are used to transfer the audio data to remote users.
For lower
quality audio the storage requirements are significantly reduced.
For example, most telephone conversations are handled digitally at
some point during the transmission process and the equivalent data
rate is only 8000 bytes per second (usually expressed as 64,000
bits/second or bps). For audio of telephone quality, five minutes of
single channel data would require only 2.4 million bytes of storage.
The lowest audio data rates in general use are those provided by
voice-mail systems. The techniques used reduce storage requirements
by an additional factor of two to 32,000 bps so that five minutes of
speech can be stored in 1.2 million bytes.
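
The storage figures in this section follow from simple arithmetic, sketched below. The helper names `pcm_bytes` and `bit_rate_bytes` are hypothetical and chosen only for this illustration; the rates are the ones quoted in the text.

```python
def pcm_bytes(samples_per_second: int, bytes_per_sample: int,
              channels: int, seconds: int) -> int:
    """Storage needed for uncompressed sampled audio."""
    return samples_per_second * bytes_per_sample * channels * seconds


def bit_rate_bytes(bits_per_second: int, seconds: int) -> int:
    """Storage needed for a stream quoted as a bit rate (8 bits per byte)."""
    return bits_per_second * seconds // 8


FIVE_MINUTES = 5 * 60  # seconds

cd_stereo = pcm_bytes(44_100, 2, 2, FIVE_MINUTES)
telephone = bit_rate_bytes(64_000, FIVE_MINUTES)
voice_mail = bit_rate_bytes(32_000, FIVE_MINUTES)

print(f"{cd_stereo:,}")   # 52,920,000
print(f"{telephone:,}")   # 2,400,000
print(f"{voice_mail:,}")  # 1,200,000
```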
3. Compression Problems for Audio Data
No matter how much storage is needed for audio data, the use of
compression software with a compression ratio of 2.0 or greater
would significantly reduce the storage requirements. Unfortunately,
standard compression techniques perform very poorly on audio data
with typical compression ratios ranging between 1.0 and 1.10.
Similar problems are encountered when the usual methods are applied
to digitised images.
An approach
that is effective for multimedia data is commonly referred to as
lossy compression. Normal lossless compression techniques involve
storing a version of the data that can be expanded to an exact copy
of the original file. The production of an exact duplicate of the
original is clearly a requirement for files containing program code,
data and text. In the case of audio and video data, a viewer or
listener is unable to discern minor differences between the original
and a reconstruction. This is particularly true for the moving
images produced in digital video systems. By permitting minor
variations in the reconstructed signal, lossy techniques can produce
significantly higher compression ratios, which can reach 100.0 -
200.0 for moving pictures.
As described
in the previous section, a number of methods exist for processing
low quality audio signals at data rates as low as 32,000 bps. In
this paper we will restrict the remaining discussion to the
compression of high quality audio. By high quality, we mean 16 bit
data captured at sampling rates of at least 10,000 samples per
second and up to the standard CD rate of 44,100 samples per second.
A number of interface boards are available for under $200 that are
capable of producing high quality audio output for such signals.
However, multimedia application programs seldom use that capacity
due to the high storage requirements for the audio data.
Because the human ear is, in many instances, more discerning than
the human eye, the techniques used for the processing of
high-quality digital audio generally produce compression ratios no
higher than 5.5 to 7.5. For a single channel of CD quality audio,
these figures represent data rates from approximately 128,000 bps
down to 96,000 bps (requiring from 4.8 down to 3.6 million bytes of
storage for five minutes of audio). While this compression ratio is
well below what can be achieved in the lossy compression of digitised
images, the storage savings produced can be significant for many
applications.
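
The relationship between compression ratio and data rate for a single channel of 16 bit CD audio can be sketched as follows. The function names are our own, and the code assumes uncompressed linear samples as described in the text.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int = 16,
                 channels: int = 1) -> int:
    """Bit rate of the uncompressed signal in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def ratio_for_target(sample_rate_hz: int, target_bps: int,
                     bits_per_sample: int = 16, channels: int = 1) -> float:
    """Compression ratio needed to reach a target bit rate."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) / target_bps


print(pcm_bit_rate(44_100))  # 705600 bps for one channel of CD audio
for target_bps in (128_000, 96_000):
    # prints 128000 5.5125, then 96000 7.35
    print(target_bps, ratio_for_target(44_100, target_bps))
```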
4. MPEG Audio Compression
For a number
of years, the Moving Picture Experts Group (MPEG) has been involved
in the specification of standards covering the storage of both the
video and audio components of digital movies. The actual
implementation of the standards and techniques is left to
individuals and firms who are providing the technology to the
marketplace. In this paper we will deal only with the audio portion
of the MPEG standards.
The MPEG
committee has chosen to recommend three compression methods and
named them Audio Layer-1, Layer-2, and Layer-3. Each of the methods
supports the range of data rates shown below. The rates are quoted
in kilobits per second (kbps) and the amount of storage space
required to store five minutes of compressed audio at each range of
rates is indicated.
Layer-1: from 32 kbps to 448 kbps (1.2 million to 16.8 million bytes)
Layer-2: from 32 kbps to 384 kbps (1.2 million to 14.4 million bytes)
Layer-3: from 32 kbps to 320 kbps (1.2 million to 12.0 million bytes)
There is a
natural trade-off between storage requirements and the quality of
the audio output produced from the compressed data. While analytic
measurements can be made to compare the original and compressed
signals, the most effective tests are carried out using human
listeners.
Tests that have been conducted involve a listening sequence that
plays the original signal (A) followed by a pair of outputs (BC)
containing the original and the coded signal played in random order.
The listener has to evaluate both B and C with a number between 1.0
and 5.0 according to the following scale.
5.0 = no noticeable differences
4.0 = perceptible, but not annoying (first differences noticeable)
3.0 = slightly annoying
2.0 = annoying
1.0 = very annoying
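
One trial of the listening sequence described above, together with the rating scale, might be sketched as follows. This is an illustration under our own assumptions, not the procedure used in the actual tests; `make_trial` and `describe` are hypothetical names.

```python
import random

# Labels taken from the five-point scale listed above.
SCALE = {
    5: "no noticeable differences",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}


def make_trial(rng: random.Random) -> dict:
    """A is always the original; B and C hold the original and the
    coded signal in random order, hidden from the listener."""
    order = ["original", "coded"]
    rng.shuffle(order)
    return {"A": "original", "B": order[0], "C": order[1]}


def describe(score: float) -> str:
    """Translate a score between 1.0 and 5.0 into the scale's wording."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must lie between 1.0 and 5.0")
    lower = int(score)
    if score == lower:
        return SCALE[lower]
    return f"between '{SCALE[lower]}' and '{SCALE[lower + 1]}'"


print(make_trial(random.Random()))
print(describe(5.0))  # no noticeable differences
print(describe(3.7))  # between 'slightly annoying' and 'perceptible, but not annoying'
```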
For low
bitrates around 64 kbps per channel, Layer-2 scored between 2.1 and
2.6, whereas Layer-3 scored between 3.6 and 3.8. For higher bitrates
of 128 kbps or more, Layer-2 and Layer-3 achieved quite similar
results and even trained listeners found it difficult to detect
differences between original and reconstructed signal. In general,
the best of the MPEG standards is Layer-3. According to the
listening tests, for a data rate of 128 kbps (compression ratio of
5.5 for CD quality data) the reconstructed output was difficult to
distinguish from the original. At 96 kbps (7.35 compression ratio)
the differences were barely perceptible but not annoying, and at 64
kbps (11.02 compression ratio) they fell somewhere between slightly
annoying and perceptible.
Additional
tests have been carried out to evaluate the MPEG compression
standards for voice commentary in broadcast applications. For most
speech signals sampled at 15 kHz, Layer-3 with a data rate of 60
kbps (compression ratio 4.0) scored 4.4 on listening tests.
5. Frequency Domain Characteristics of MPEG Audio Compression
The MPEG Audio Layer-3 standard has received wide acceptance by the
multimedia community. Many Layer-3 software encoding and decoding
packages have been developed and specialized hardware is available
that is capable of decoding both the audio and video components of
MPEG data for real-time applications. Despite this wide acceptance,
there are significant problems with MPEG audio. One problem,
relating to the quality of the decoded audio signal, derives from
the process that is used to implement the compression that is
specified by the MPEG standard.
Understanding
the data rates used when capturing audio data involves some
knowledge of the characteristics of the human ear. The upper
frequency limit for the human auditory system is generally
acknowledged to be 20,000 Hz or below. This limit is achieved
occasionally among the young and falls off with increasing age. For
digital systems, a result called Shannon's Sampling Theorem
specifies that any signal must be sampled (or captured) at a rate
that is greater than twice the highest frequency present. To obtain
the complete audio range for human listeners, the sampling rate must
be above 40,000 samples/second, which explains, in part, why the
rate for CDs was chosen to be 44,100 Hz.
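
The sampling theorem's rule of thumb can be written down directly. The sketch below uses function names of our own choosing and simply restates the factor of two given in the text.

```python
def min_sampling_rate(highest_frequency_hz: float) -> float:
    """The sampling rate must exceed this value (twice the highest frequency)."""
    return 2.0 * highest_frequency_hz


def max_representable_frequency(sample_rate_hz: float) -> float:
    """Upper frequency limit for a given sampling rate (half the rate)."""
    return sample_rate_hz / 2.0


print(min_sampling_rate(20_000))            # 40000.0
print(max_representable_frequency(44_100))  # 22050.0
```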
The basic
principles behind MPEG audio compression are derived from a science
called psychoacoustics which deals with the way the human brain
perceives sound. One theory in psychoacoustics states that, when a
weak sound is present near the same frequency as a loud sound, the
brain cannot distinguish the weak sound - so MPEG completely removes
the weak sound from the signal. Another psychoacoustic principle is
that the ear is less sensitive to high frequency sounds than to low
frequency sounds. Therefore, the MPEG implementation removes much of
the high frequency energy from the original signal during the
compression process.
To
demonstrate the effect of MPEG coding, an audio signal was captured
at the CD data rate of 44,100 Hz and then compressed using MPEG
Layer-3 coding. Figure 1a) shows the energy distribution in a
2-second segment of the original signal while Figure 1b) represents
the same segment following MPEG coding. In each case, the horizontal
axis represents the range of audio frequencies from 0 to 22,050 Hz
with the amount of energy at each frequency displayed in the
vertical direction.

Figure 1: a) Original Spectrum b) MPEG Layer-3 compression at 128 kbps
It is apparent that, while the signal contains much of its energy in
the lower frequency range, significant amounts of energy can be
observed at the higher frequencies as well. Figure 1b) demonstrates
that MPEG coding has modified the frequency components of the signal
lying above 16,000 Hz. The energy content at the lower frequencies
matches that of the original signal while significant amounts of
energy are removed in the higher frequencies. In fact, the higher
the MPEG compression ratio, the lower the frequency at which major
energy modifications occur.
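
A comparison of this kind can be reproduced in outline by computing the energy spectrum of a short frame and measuring how much of the energy lies above a chosen frequency. The sketch below is an illustration under our own assumptions: it uses a naive discrete Fourier transform on a synthetic two-tone test signal, not the recordings behind the figures, and the function names are hypothetical.

```python
import cmath
import math


def energy_spectrum(samples):
    """Energy |X[k]|^2 for bins 0..N/2 using a naive DFT.
    The cost is O(N^2), which is acceptable for short frames."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def energy_fraction_above(samples, sample_rate_hz, cutoff_hz):
    """Fraction of the total spectral energy found above cutoff_hz."""
    spectrum = energy_spectrum(samples)
    bin_width = sample_rate_hz / len(samples)
    total = sum(spectrum)
    high = sum(e for k, e in enumerate(spectrum) if k * bin_width > cutoff_hz)
    return high / total if total else 0.0


RATE = 44_100
N = 441  # a 10 msec frame, giving bins 100 Hz wide

# Synthetic signal: a 1 kHz tone plus a weaker 18 kHz tone (half amplitude).
signal = [math.sin(2 * math.pi * 1_000 * i / RATE)
          + 0.5 * math.sin(2 * math.pi * 18_000 * i / RATE)
          for i in range(N)]

# The 18 kHz tone carries a quarter of the energy of the 1 kHz tone,
# so about one fifth of the total energy lies above 16,000 Hz.
print(round(energy_fraction_above(signal, RATE, 16_000), 3))
```

Applying the same measurement to a frame before and after coding would give a single number for the loss of high-frequency energy that the figures show graphically.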

Figure 2: a) MPEG Layer-3 compression at 96 kbps b) MPEG Layer-3 compression at 64 kbps

Figure 3: a) Spectrum of original 40 msec signal b) Spectrum of MPEG Layer-3 compression at 96 kbps
The graphs in Figure 2 indicate that the frequency modification and
energy loss are even more extreme for the higher compression ratios
corresponding to 96 kbps and 64 kbps, while Figure 3 shows that
similar behaviour also occurs during shorter time periods. Despite
the theories of acoustic masking presented by the experts in
psychoacoustics, it is clear that much of the compression in MPEG
Layer-3 derives from the removal of significant portions of the
signal energy at the higher frequencies. However, the energy that is
clearly present right up to 22,050 Hz is the major reason that
sampling rates as high as 44,100 Hz are used in the first place. If
there were no interest in these higher frequencies, a lower sampling
rate could have been chosen. For high-quality audio, a compression
technique that preserves the full range of frequency information is
clearly desirable.
6. Computational Complexity of MPEG Audio Compression
In typical
applications that use audio compression, the compression algorithm
is used once to convert the original data to the shorter compressed
format. Following that, each time the audio is to be listened to,
the decompression algorithm must be used to expand the data back to
its original length.
From the
viewpoint of computational complexity, the process of compressing
the audio data does not require a fast algorithm since it is carried
out only once. For most applications, the decompression algorithm
must produce output data at a rate that is faster than the rate at
which it will be output to a listener through the digital/analog
converter in a computer-based sound system. For two channel, CD
quality sound, this means that 2 x 44,100 = 88,200 data values must
be produced by the decoding software every second.
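
The real-time requirement can be expressed as a simple throughput check. This is a minimal sketch; the function names and the sample measurements are hypothetical and do not describe any particular decoder.

```python
def required_values_per_second(sample_rate_hz: int, channels: int) -> int:
    """Number of data values the decoder must deliver each second."""
    return sample_rate_hz * channels


def keeps_up(values_decoded: int, elapsed_seconds: float,
             sample_rate_hz: int = 44_100, channels: int = 2) -> bool:
    """True if the measured decoding rate is faster than the playback rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    measured = values_decoded / elapsed_seconds
    return measured > required_values_per_second(sample_rate_hz, channels)


print(required_values_per_second(44_100, 2))  # 88200
print(keeps_up(1_000_000, 8.0))    # True: 125,000 values per second
print(keeps_up(1_000_000, 12.5))   # False: 80,000 values per second
```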
The MPEG
scheme uses the Fourier or Discrete Cosine transformation to convert
the data to the frequency domain for processing. During the
compression stage, extensive calculations are carried out to
optimise the selections of which frequencies are to be removed from
each input frame. A combination of the optimisation and the
frequency domain transformations produces a complex algorithm that
executes quite slowly on standard computing platforms such as PCs
and Macintoshes. As we indicated earlier, this is not generally a
significant problem since compression takes place only once and is
under no real-time processing constraints.
On the other
hand, the requirement to carry out a frequency domain transformation
during MPEG decompression seriously affects the ability of the
algorithm to produce real-time audio output on general-purpose
computers. Almost all MPEG decoders are implemented using special
signal processing hardware. This means that the playback of MPEG
coded data is seriously restricted in its application in the PC
world.
7. Specifications for a General Purpose Audio Compression System
The extensive
efforts of the ISO-MPEG Committee have produced very clear
specifications for an audio compression standard. Clearly, the best
approach to discussing a set of alternative specifications is to
compare and contrast them with those prepared by MPEG.
7.1 Quality Considerations
The MPEG
standard establishes, as a basic principle, that the reconstituted
audio signal should be virtually indistinguishable from the
original. This feature has been ignored by the developers of many
audio compression schemes (particularly those designed for voice
rather than music) who have concentrated on compression ratios at
the expense of quality.
The greatly improved performance of computer-based audio output
devices (e.g. PC sound cards) has provided most computer systems with
the ability to produce high-fidelity sound output. This can be
achieved only through high sampling rates (20 kHz or above) and
16-bit signal resolution.
It is this type of data that must be the target for sound
compression algorithms and a key design goal of the
compression/decompression process should be the production of output
that cannot be distinguished from the original audio signal.
7.2 Full Bandwidth in reconstructed signal
One major
deficiency of the MPEG approach is the serious reduction produced in
the bandwidth of the reconstituted signal. A goal for audio
compression should be the preservation of the entire bandwidth of
the original data. There is no point in recording audio with a
bandwidth of 22,050 Hz (CD quality) and then proceeding to remove
almost all of the energy above 12,000 to 16,000 Hz.
7.3 Fast decompression on general-purpose computers
Another vital
requirement for an audio compression scheme is an algorithm whose
computational complexity will allow real-time decompression at high
data rates on a general-purpose computer without specialised signal
processing hardware. This requirement is not satisfied by the
current MPEG procedures.
7.4 No restrictions on sampling rate
The overall
goal in the MPEG specifications is to produce a compressed audio
data stream at certain specified bit rates. This requirement has
produced algorithms that support a set of standard sampling rates and
standard target bit rates. An example might be 44,100 Hz data
compressed to a 128,000 bps data stream. In our terminology, this
would correspond to a compression ratio of 5.5125. While the ability
to select target bit rates is significant for specialised hardware
such as laser disk players, most applications are not restricted in
this manner.
While certain
sampling rates are more frequently used than others, the varying
demands of signal representation for the speaking voice, the singing
voice and musical instruments are tied to a number of different
rates. An audio compression algorithm should accept data sampled at
any rate.
In our view,
one major application of compression technology will be in a
reduction of the storage required for audio data on such devices as
computer disks and CD-ROMs. The other will be a reduction in the
time necessary to transmit audio data over a network connection such
as the Internet. Neither of these situations requires a fixed
sampling rate for the original data or a fixed bit rate for
transmission.
7.5 User selection of output quality
A natural
tradeoff exists between the compression ratio and the quality of the
reconstructed audio signal. In short, the more you compress the
original signal the poorer will be the quality of the output. In
this area of technology, a basic goal is the development of an
algorithm that will produce maximum compression with no discernible
degradation in quality.
However, in
some applications, a user is willing to accept a minor reduction in
quality in order to increase the compression ratio. Any system
should provide a user with this flexibility by allowing the choice
of several quality levels during the compression process.
Reference: Dr Roy C Snell