The performance characteristics of the STARREPORTER method make it superior in terms of execution speed and preservation of signal bandwidth.

In addition, its ability to process data sampled at any rate and to provide a continuous trade-off between quality and compression ratio makes STARREPORTER a more flexible system.

All of these points can be demonstrated by timing execution speeds and examining the spectral characteristics of the decompressed output signal.

However, the final evaluation of any audio compression technique must involve listening to the quality of the output signal. No matter how efficiently an algorithm performs, we must not lose sight of the ultimate goal: to produce audio output that is indistinguishable from the original recorded signal.

The StarReporter Multimedia demo for Win95 (3.2 MB) is available for download, together with a copy of the Star Audio Player for playing sound files. A shortened version of the technical report follows; the full version of the technical report is also available for download.

StarReporter is distributed as a zipped file (strrpt1.zip) that contains a setup program. After downloading, double-click the file and follow WinZip's instructions to extract the program. Allow StarReporter to create its own folder.

NOTE: We strongly recommend that you install the demo program in a new folder, separate from the folder that contains the extracted installation files. This makes it easier to remove the demo program with the Win95 Add/Remove Programs utility.

A series of sampled data files in various encoding formats is provided with the program. During installation, these files are placed in a Samples folder below the demo program folder.

The STARREPORTER Multimedia demo is fully functional, except that it will save only 10-second grabs of data. Routines such as Capture, Speak, Copy, Cut, Append, Scale, and Trim & Pad are all available. You may also change the sampling rates of individual files.

A New Method for Audio Compression
(Ref: Dr Roy C Snell)

Most computer users have employed software such as PKZIP, LHARC and DriveSpace, either directly or indirectly, to compress data and program files. These programs use lossless compression: the encoding stage reduces the size of a file, and the decoding stage expands the compressed file back to an exact copy of the original. A file that has been reduced in size uses less disk storage space and can be transmitted more quickly to a remote site over a network such as the Internet.
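
To make the lossless property concrete, here is a minimal sketch in Python. It uses the standard library's `zlib` module (DEFLATE) as a stand-in for the programs named above; it is not the code of PKZIP, LHARC or DriveSpace, and the sample data is made up for illustration.

```python
import zlib

def lossless_roundtrip(data: bytes) -> tuple[bytes, int]:
    """Encode with DEFLATE, then decode; return the restored bytes and the compressed size."""
    compressed = zlib.compress(data, 9)      # encoding stage
    restored = zlib.decompress(compressed)   # decoding stage
    return restored, len(compressed)

original = b"Program code, program data and text must survive unchanged. " * 50
restored, compressed_size = lossless_roundtrip(original)
print(restored == original)            # True: an exact copy of the original
print(len(original), compressed_size)  # the compressed form is much smaller
```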

Since no data is lost, the only drawback to using such file compression software is the time required to compress the data initially and the time it takes to decompress the file each time it is accessed. A file containing an executable program, or a data file whose contents do not change, need only be compressed once, so the decompression time is the most critical. The efficiency of today's compression software, combined with the speed of most computer systems, has made automatic compress/decompress procedures such as the DriveSpace option in Microsoft's MS-DOS a viable option for many computer users.

In discussions of file compression techniques, the compression ratio is usually defined as the size of the original file divided by the size of the compressed file. For example, if the original file is 300,000 bytes in length and is compressed to 100,000 bytes, the compression ratio would be 3.0. The value will vary from one file to another but a compression ratio of around 2.0 is generally considered to be average for most files. A computer user who takes advantage of this technology can expect to store, on average, twice as much actual data on a disk as would be possible in uncompressed form. In addition, file transmissions over a computer network would take only half as long as they would in the uncompressed form. These benefits can be achieved at the expense of the minimal processing time required for compressing and decompressing the data.
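
The definition and the two consequences of an average ratio of 2.0 reduce to a few lines of arithmetic. The helper names below are illustrative, the 1,000,000-byte disk is an arbitrary example, and the model assumes every file compresses by the same ratio, which real files do not.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Size of the original file divided by the size of the compressed file."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return original_size / compressed_size

def effective_capacity(disk_bytes: int, ratio: float) -> float:
    """Bytes of original data that fit on a disk if every file compresses by `ratio`."""
    return disk_bytes * ratio

def transfer_fraction(ratio: float) -> float:
    """Transmission time relative to the uncompressed file over the same network."""
    return 1 / ratio

print(compression_ratio(300_000, 100_000))  # 3.0, the example in the text
print(effective_capacity(1_000_000, 2.0))   # 2000000.0: twice as much data
print(transfer_fraction(2.0))               # 0.5: half the transmission time
```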

2. Multimedia Data Storage Formats

For many years, it was possible to characterise computer files as containing program code, program data or text. The advent of computer-based multimedia applications has resulted in a storage requirement for several new types of data. These include full-colour digitised images, full-colour movie images and high-quality digitised audio. A number of internationally recognized formats exist (GIF, JPEG, TIFF) for the storage of still images. In the case of movie and video images, an international organization called the Moving Picture Experts Group (MPEG) has developed storage standards for both the video and audio portions of the signal.

A major problem encountered in multimedia applications is dealing with files containing high-quality digitized audio signals. In the case of the standard two-channel audio stored on consumer Compact Discs (CDs), 44,100 data values per channel must be stored each second. Since each data value occupies two bytes of storage space, the two channels together need 2 x 44,100 x 2 = 176,400 bytes to store one second of audio. This means that five minutes of high-quality, dual-channel audio will occupy almost 53 million bytes (176,400 x 300 = 52,920,000) of storage space. For most applications, such data rates place unacceptable demands on the disk storage capacity of the system and on the transmission rates of any networks that are used to transfer the audio data to remote users.
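
A quick check of the CD storage arithmetic in Python, using only the figures stated above:

```python
SAMPLE_RATE = 44_100   # data values per channel per second (CD standard)
BYTES_PER_VALUE = 2    # 16-bit samples
CHANNELS = 2           # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_VALUE * CHANNELS
five_minutes = bytes_per_second * 5 * 60

print(bytes_per_second)  # 176400
print(five_minutes)      # 52920000
```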

For lower quality audio the storage requirements are significantly reduced. For example, most telephone conversations are handled digitally at some point during the transmission process and the equivalent data rate is only 8000 bytes per second (usually expressed as 64,000 bits/second or bps). For audio of telephone quality, five minutes of single channel data would require only 2.4 million bytes of storage. The lowest audio data rates in general use are those provided by voice-mail systems. The techniques used reduce storage requirements by an additional factor of two to 32,000 bps so that five minutes of speech can be stored in 1.2 million bytes.
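
The lower-rate figures follow from one conversion, bits per second times seconds divided by eight bits per byte. A small sketch (the function name is illustrative):

```python
def storage_bytes(bits_per_second: int, seconds: int) -> float:
    """Bytes needed to hold `seconds` of audio at a given bit rate (8 bits per byte)."""
    return bits_per_second * seconds / 8

FIVE_MINUTES = 5 * 60
for label, bps in [("telephone", 64_000), ("voice mail", 32_000)]:
    print(f"{label}: {storage_bytes(bps, FIVE_MINUTES):,.0f} bytes")
# telephone: 2,400,000 bytes
# voice mail: 1,200,000 bytes
```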


3. Compression Problems for Audio Data

No matter how much storage is needed for audio data, the use of compression software with a compression ratio of 2.0 or greater would significantly reduce the storage requirements. Unfortunately, standard compression techniques perform very poorly on audio data with typical compression ratios ranging between 1.0 and 1.10. Similar problems are encountered when the usual methods are applied to digitised images.
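
A minimal sketch of the contrast, assuming Python's `zlib` as the general-purpose lossless compressor and a synthetic tone-plus-noise signal in place of real recorded audio. Both are assumptions, so the ratios printed will not necessarily match the 1.0 to 1.10 range quoted above, but the audio ratio lands far below the text ratio.

```python
import math
import random
import struct
import zlib

def zlib_ratio(data: bytes) -> float:
    """Compression ratio (original / compressed) achieved by zlib at its highest level."""
    return len(data) / len(zlib.compress(data, 9))

def synthetic_pcm(n: int, seed: int = 0) -> bytes:
    """Hypothetical test signal: 440 Hz tone plus Gaussian noise, 16-bit mono at 44,100 Hz."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        value = 8000 * math.sin(2 * math.pi * 440 * i / 44_100) + rng.gauss(0, 1500)
        samples.append(max(-32768, min(32767, int(value))))  # clamp to 16-bit range
    return struct.pack(f"<{n}h", *samples)  # little-endian signed 16-bit

text = b"Lossless compressors thrive on repeated patterns in text. " * 200
audio = synthetic_pcm(20_000)
print(f"text ratio:  {zlib_ratio(text):.2f}")
print(f"audio ratio: {zlib_ratio(audio):.2f}")
```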

An approach that is effective for multimedia data is commonly referred to as lossy compression. Normal lossless compression techniques involve storing a version of the data that can be expanded to an exact copy of the original file. The production of an exact duplicate of the original is clearly a requirement for files containing program code, data and text. In the case of audio and video data, a viewer or listener is unable to discern minor differences between the original and a reconstruction. This is particularly true for the moving images produced in digital video systems. By permitting minor variations in the reconstructed signal, lossy techniques can produce significantly higher compression ratios, which can reach 100.0 - 200.0 for moving pictures.
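
The simplest lossy step is to quantize each sample to a coarser grid before applying a lossless back end. The sketch below is a hypothetical illustration, not a scheme described in this report: it rounds 16-bit samples to multiples of an assumed `STEP` and then uses `zlib`. The reconstruction is no longer an exact copy, but every error is bounded by half the step and the data compresses further.

```python
import math
import random
import struct
import zlib

STEP = 64  # hypothetical quantizer step; larger steps discard more detail

def make_signal(n: int, seed: int = 1) -> list[int]:
    """Hypothetical 16-bit test signal: 440 Hz tone plus noise at 44,100 Hz."""
    rng = random.Random(seed)
    return [max(-32768, min(32767, int(8000 * math.sin(2 * math.pi * 440 * i / 44_100)
                                       + rng.gauss(0, 1500)))) for i in range(n)]

def quantize(samples: list[int], step: int) -> list[int]:
    """The lossy step: keep only the index of the nearest multiple of `step`."""
    return [round(s / step) for s in samples]

def dequantize(codes: list[int], step: int) -> list[int]:
    return [c * step for c in codes]

def packed_size(values: list[int]) -> int:
    """Bytes after packing as 16-bit integers and applying lossless zlib compression."""
    return len(zlib.compress(struct.pack(f"<{len(values)}h", *values), 9))

original = make_signal(20_000)
codes = quantize(original, STEP)
restored = dequantize(codes, STEP)
raw_bytes = 2 * len(original)

print(f"lossless ratio: {raw_bytes / packed_size(original):.2f}")
print(f"lossy ratio:    {raw_bytes / packed_size(codes):.2f}")
print("exact copy?", restored == original)  # False: small errors were introduced
print("max error:", max(abs(a - b) for a, b in zip(original, restored)))  # at most STEP / 2
```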

As described in the previous section, a number of methods exist for processing low quality audio signals at data rates as low as 32,000 bps. In this paper we will restrict the remaining discussion to the compression of high quality audio. By high quality, we mean 16 bit data captured at sampling rates of at least 10,000 samples per second and up to the standard CD rate of 44,100 samples per second. A number of interface boards are available for under $200 that are capable of producing high quality audio output for such signals. However, multimedia application programs seldom use that capacity due to the high storage requirements for the audio data.

Because the human ear is, in many instances, more discerning than the human eye, the techniques used for processing high-quality digital audio generally produce compression ratios no higher than 5.5 to 7.5. For a single channel of CD quality audio, these figures represent data rates from approximately 128,000 bps down to 96,000 bps (requiring from 4.8 down to 3.6 million bytes of storage for five minutes of audio). While these compression ratios are well below what can be achieved in the lossy compression of digitised images, the storage savings produced can be significant for many applications.
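
The conversion from compression ratio to bit rate divides the raw rate of one 16-bit CD channel (44,100 x 16 = 705,600 bps) by the ratio. A short sketch with illustrative function names; note that a ratio of 7.5 gives 94,080 bps, close to the rounded 96,000 bps quoted above, which corresponds to a ratio of 7.35.

```python
def compressed_bps(sample_rate: int, ratio: float, bits: int = 16, channels: int = 1) -> float:
    """Bit rate left after dividing the raw PCM bit rate by the compression ratio."""
    return sample_rate * bits * channels / ratio

def five_minute_bytes(bps: float) -> float:
    """Storage for 300 seconds at the given bit rate."""
    return bps * 300 / 8

for ratio in (5.5, 7.5):
    bps = compressed_bps(44_100, ratio)
    print(f"ratio {ratio}: {bps:,.0f} bps, {five_minute_bytes(bps):,.0f} bytes")
# ratio 5.5: 128,291 bps, 4,810,909 bytes
# ratio 7.5: 94,080 bps, 3,528,000 bytes
```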

4. MPEG Audio Compression

For a number of years, the Moving Picture Experts Group (MPEG) has been involved in the specification of standards covering the storage of both the video and audio components of digital movies. The actual implementation of the standards and techniques is left to individuals and firms who are providing the technology to the marketplace. In this paper we will deal only with the audio portion of the MPEG standards.

The MPEG committee has chosen to recommend three compression methods and named them Audio Layer-1, Layer-2, and Layer-3. Each of the methods supports the range of data rates shown below. The rates are quoted in kilobits per second (kbps) and the amount of storage space required to store five minutes of compressed audio at each range of rates is indicated.

Layer-1: from 32 kbps to 448 kbps (1.2 million to 16.8 million bytes)
Layer-2: from 32 kbps to 384 kbps (1.2 million to 14.4 million bytes)
Layer-3: from 32 kbps to 320 kbps (1.2 million to 12.0 million bytes)
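
The storage figures in the list follow directly from the bit rates; a short check in Python (the dictionary simply restates the list above):

```python
# Bit-rate ranges (kbps) for each layer, as listed above.
LAYER_RANGES_KBPS = {"Layer-1": (32, 448), "Layer-2": (32, 384), "Layer-3": (32, 320)}

def five_minute_millions(kbps: int) -> float:
    """Millions of bytes for 300 s of audio (1 kbps = 1,000 bits/s, 8 bits per byte)."""
    return kbps * 1_000 * 300 / 8 / 1e6

for layer, (low, high) in LAYER_RANGES_KBPS.items():
    print(f"{layer}: {five_minute_millions(low):.1f} to {five_minute_millions(high):.1f} million bytes")
# Layer-1: 1.2 to 16.8 million bytes
# Layer-2: 1.2 to 14.4 million bytes
# Layer-3: 1.2 to 12.0 million bytes
```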

There is a natural trade-off between storage requirements and the quality of the audio output produced from the compressed data. While analytic measurements can be made to compare the original and compressed signals, the most effective tests are carried out using human listeners.

Tests that have been conducted involve a listening sequence that plays the original signal (A) followed by a pair of outputs (B and C) containing the original and the coded signal in random order. The listener has to grade both B and C with a number between 1.0 and 5.0 according to the following scale.

5.0 = no noticeable differences
4.0 = perceptible, but not annoying (first differences noticeable)
3.0 = slightly annoying
2.0 = annoying
1.0 = very annoying
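
The presentation order and grading can be sketched as follows. The helper names are hypothetical. Averaging the coded-signal grades over listeners (`mean_grade`) is an assumed way of combining results, because the text does not say how individual grades were aggregated.

```python
import random

def make_trial(rng: random.Random) -> tuple[str, str]:
    """Presentation order for B and C: the original and the coded signal, shuffled."""
    pair = ["original", "coded"]
    rng.shuffle(pair)
    return pair[0], pair[1]

def coded_grade(order: tuple[str, str], grade_b: float, grade_c: float) -> float:
    """Return the grade the listener gave to whichever of B or C was the coded signal."""
    for grade in (grade_b, grade_c):
        if not 1.0 <= grade <= 5.0:
            raise ValueError("grades must lie on the 1.0 to 5.0 scale")
    return grade_b if order[0] == "coded" else grade_c

def mean_grade(grades: list[float]) -> float:
    """Assumed aggregation: the plain average over listeners."""
    return sum(grades) / len(grades)

order = make_trial(random.Random(42))  # the listener is not told which is which
print(order)
```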

For low bitrates around 64 kbps per channel, Layer-2 scored between 2.1 and 2.6, whereas Layer-3 scored between 3.6 and 3.8. For higher bitrates of 128 kbps or more, Layer-2 and Layer-3 achieved quite similar results and even trained listeners found it difficult to detect differences between original and reconstructed signal. In general, the best of the MPEG standards is Layer-3. According to the listening tests, for a data rate of 128 kbps (compression ratio of 5.5 for CD quality data) the reconstructed output was difficult to distinguish from the original. At 96 kbps (7.35 compression ratio) the differences are barely perceptible but not annoying and at 64 kbps (11.02 compression ratio) they are somewhere between slightly annoying and perceptible.
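
The compression ratios quoted for each bit rate can be reproduced by dividing the raw rate of one 16-bit CD channel by the coded rate. This assumes a single channel, as the ratios in the text do.

```python
CD_MONO_BPS = 44_100 * 16  # 705,600 bits/s for one 16-bit channel

def ratio_for_bitrate(kbps: int, raw_bps: int = CD_MONO_BPS) -> float:
    """Compression ratio implied by coding one CD-quality channel at `kbps`."""
    return raw_bps / (kbps * 1_000)

for kbps in (128, 96, 64):
    print(f"{kbps} kbps -> ratio {ratio_for_bitrate(kbps):.4f}")
# 128 kbps -> ratio 5.5125
# 96 kbps -> ratio 7.3500
# 64 kbps -> ratio 11.0250
```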

Additional tests have been carried out to evaluate the MPEG compression standards for voice commentary in broadcast applications. For most speech signals sampled at 15 kHz, Layer-3 with a data rate of 60 kbps (compression ratio 4.0) scored 4.4 in listening tests.

5. Frequency Domain Characteristics of MPEG Audio Compression

The MPEG Audio Layer-3 standard has received wide acceptance by the multimedia community. Many Layer-3 software encoding and decoding packages have been developed, and specialized hardware is available that is capable of decoding both the audio and video components of MPEG data for real-time applications. Despite this wide acceptance, there are significant problems with MPEG audio. One problem, relating to the quality of the decoded audio signal, derives from the process that is used to implement the compression specified by the MPEG standard.

Understanding the data rates used when capturing audio data involves some knowledge of the characteristics of the human ear. The upper frequency limit for the human auditory system is generally acknowledged to be 20,000 Hz or below. This limit is achieved occasionally among the young and falls off with increasing age. For digital systems, a result called Shannon's Sampling Theorem specifies that any signal must be sampled (or captured) at a rate that is greater than twice the highest frequency present. To obtain the complete audio range for human listeners, the sampling rate must be above 40,000 samples/second, which explains, in part, why the rate for CDs was chosen to be 44,100 Hz.
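
The sampling requirement, and what goes wrong when it is violated, can be shown in a few lines. The function names are illustrative; `alias_frequency` computes where a pure tone above half the sampling rate reappears as a false lower tone after sampling.

```python
def nyquist_rate(max_freq_hz: float) -> float:
    """Twice the highest frequency; the sampling rate must be strictly greater than this."""
    return 2 * max_freq_hz

def alias_frequency(freq_hz: float, fs_hz: float) -> float:
    """Frequency at which a pure tone appears after being sampled at fs_hz."""
    folded = freq_hz % fs_hz
    return fs_hz - folded if folded > fs_hz / 2 else folded

print(nyquist_rate(20_000))             # 40000: the CD rate of 44,100 Hz exceeds it
print(alias_frequency(18_000, 44_100))  # 18000: below 22,050 Hz, captured correctly
print(alias_frequency(30_000, 44_100))  # 14100: above 22,050 Hz, folds back down
```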

The basic principles behind MPEG audio compression are derived from a science called psychoacoustics, which deals with the way the human brain perceives sound. One theory in psychoacoustics states that, when a weak sound is present near the same frequency as a loud sound, the brain cannot distinguish the weak sound, so MPEG completely removes the weak sound from the signal. Another psychoacoustic principle is that the ear is less sensitive to high-frequency sounds than to low-frequency sounds. Therefore, the MPEG implementation removes much of the high-frequency energy from the original signal during the compression process.
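
The masking idea can be illustrated with a toy rule on a magnitude spectrum: drop any component that is much weaker than the loudest component within a few neighbouring bins. This is a deliberately simplified stand-in with an assumed window and threshold; MPEG's actual psychoacoustic models are considerably more elaborate.

```python
def mask_weak_components(magnitudes: list[float], window: int = 2, ratio: float = 0.1) -> list[float]:
    """Zero any bin weaker than `ratio` times the loudest bin within `window` bins of it."""
    result = []
    for i, m in enumerate(magnitudes):
        lo, hi = max(0, i - window), min(len(magnitudes), i + window + 1)
        loudest_neighbour = max(magnitudes[lo:hi])
        result.append(0.0 if m < ratio * loudest_neighbour else m)
    return result

# A loud component at bin 2 with weak neighbours, plus an isolated weak component at bin 7.
spectrum = [0.0, 0.05, 1.0, 0.02, 0.0, 0.0, 0.0, 0.03, 0.0]
print(mask_weak_components(spectrum))
# [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.03, 0.0]
```

The weak components next to the loud one are removed, while the equally weak but isolated component survives because nothing nearby masks it.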

To demonstrate the effect of MPEG coding, an audio signal was captured at the CD data rate of 44,100 Hz and then compressed using MPEG Layer-3 coding. Figure 1a) shows the energy distribution in a 2-second segment of the original signal while Figure 1b) represents the same segment following MPEG coding. In each case, the horizontal axis represents the range of audio frequencies from 0 to 22,050 Hz with the amount of energy at each frequency displayed in the vertical direction.



Figure 1: a) Original Spectrum b) MPEG Layer-3 compression at 128 kbps

It is apparent that, while the signal contains much of its energy in the lower frequency range, significant amounts of energy can be observed at the higher frequencies as well. Figure 1b) demonstrates that MPEG coding has modified the frequency components of the signal lying above 16,000 Hz. The energy content at the lower frequencies matches that of the original signal while significant amounts of energy are removed in the higher frequencies. In fact, the higher the MPEG compression ratio, the lower the frequency at which major energy modifications occur.
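
Measurements like those behind Figures 1 to 3 amount to comparing spectral energy above and below a cutoff. The Python sketch below does this for a synthetic two-tone signal (1,000 Hz and 18,000 Hz, an assumed test signal rather than the recording used here), with a direct DFT chosen for self-containment rather than speed. The "low-passed" version simply omits the 18 kHz tone to mimic energy removed above 16,000 Hz.

```python
import cmath
import math

def power_spectrum(signal: list[float]) -> list[float]:
    """|X_k|^2 for bins 0..N/2 using a direct DFT (O(N^2); fine for short frames)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) ** 2
            for k in range(n // 2 + 1)]

def energy_fraction_above(signal: list[float], fs: float, cutoff_hz: float) -> float:
    """Share of spectral energy in bins whose centre frequency exceeds cutoff_hz."""
    power = power_spectrum(signal)
    n = len(signal)
    above = sum(p for k, p in enumerate(power) if k * fs / n > cutoff_hz)
    return above / sum(power)

FS = 44_100
N = 441  # 100 Hz bin spacing, so both test tones fall exactly on a bin
full = [math.sin(2 * math.pi * 1000 * t / FS) + 0.5 * math.sin(2 * math.pi * 18_000 * t / FS)
        for t in range(N)]
lowpassed = [math.sin(2 * math.pi * 1000 * t / FS) for t in range(N)]  # 18 kHz tone removed

print(f"{energy_fraction_above(full, FS, 16_000):.3f}")       # 0.200
print(f"{energy_fraction_above(lowpassed, FS, 16_000):.3f}")  # 0.000
```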

Figure 2: a) MPEG Layer-3 compression at 96 kbps b) MPEG Layer-3 compression at 64 kbps

Figure 3: a) Spectrum of original 40 msec signal b) Spectrum of MPEG Layer-3 compression at 96 kbps

The graphs in Figure 2 indicate that the frequency modification and energy loss are even more extreme for the higher compression ratios corresponding to 96 kbps and 64 kbps, while Figure 3 shows that similar behaviour also occurs over shorter time periods. Despite the theories of acoustic masking presented by experts in psychoacoustics, it is clear that much of the compression in MPEG Layer-3 derives from the removal of significant portions of the signal energy at the higher frequencies. However, the energy that is clearly present right up to 22,050 Hz is the major reason that sampling rates as high as 44,100 Hz are used in the first place. If there were no interest in these higher frequencies, a lower sampling rate could have been chosen. For high-quality audio, a compression technique that preserves the full range of frequency information is clearly desirable.

6. Computational Complexity of MPEG Audio Compression

In typical applications that use audio compression, the compression algorithm is used once to convert the original data to the shorter compressed format. Following that, each time the audio is to be listened to, the decompression algorithm must be used to expand the data back to its original length.

From the viewpoint of computational complexity, the process of compressing the audio data does not require a fast algorithm since it is carried out only once. For most applications, the decompression algorithm must produce output data at a rate that is faster than the rate at which it will be output to a listener through the digital/analog converter in a computer-based sound system. For two channel, CD quality sound, this means that 2 x 44,100 = 88,200 data values must be produced by the decoding software every second.
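
The real-time requirement can be phrased as a throughput test: a decoder keeps up only if it produces at least 88,200 values per second for CD-quality stereo. In this sketch the `decode` callable and its bytes-in, values-out interface are hypothetical, and the placeholder decoder merely copies bytes; timing uses `time.perf_counter`.

```python
import time
from typing import Callable, Sequence

def required_rate(sample_rate: int = 44_100, channels: int = 2) -> int:
    """Values the decoder must produce per second to keep up with playback."""
    return sample_rate * channels

def measured_rate(decode: Callable[[bytes], Sequence[int]], encoded: bytes, repeats: int = 5) -> float:
    """Values per second produced by `decode` (a hypothetical decoder interface)."""
    start = time.perf_counter()
    produced = sum(len(decode(encoded)) for _ in range(repeats))
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer reading
    return produced / elapsed

def is_real_time(values_per_second: float, sample_rate: int = 44_100, channels: int = 2) -> bool:
    return values_per_second >= required_rate(sample_rate, channels)

def placeholder_decoder(data: bytes) -> list[int]:
    """Stand-in that just copies bytes; a real decoder would reconstruct samples."""
    return list(data)

print(required_rate())  # 88200, as in the text
print(is_real_time(measured_rate(placeholder_decoder, bytes(88_200))))
```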

The MPEG scheme uses the Fourier or Discrete Cosine transformation to convert the data to the frequency domain for processing. During the compression stage, extensive calculations are carried out to optimise the selection of which frequencies are to be removed from each input frame. The combination of this optimisation and the frequency domain transformations produces a complex algorithm that executes quite slowly on standard computing platforms such as PCs and Macintoshes. As we indicated earlier, this is not generally a significant problem since compression takes place only once and is under no real-time processing constraints.
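
To see why a frequency-domain step is costly, here is a direct DCT-II and its inverse. This is a simplified stand-in chosen for clarity, not the filter bank MPEG actually specifies, and the 512-sample frame size is an arbitrary assumption. Each frame of N samples costs N x N multiplications when computed directly, so one 44,100 Hz channel needs 44,100 x N multiplications per second. Fast transform algorithms reduce the per-frame cost to roughly N log N, but the decoder must still run the inverse transform continuously during playback.

```python
import math

def dct_ii(frame: list[float]) -> list[float]:
    """Direct DCT-II: each of the N outputs needs N multiply-adds, so N*N per frame."""
    n = len(frame)
    return [sum(x * math.cos(math.pi / n * (t + 0.5) * k) for t, x in enumerate(frame))
            for k in range(n)]

def inverse_dct_ii(coeffs: list[float]) -> list[float]:
    """Exact inverse of dct_ii (a scaled DCT-III), the step a decoder must perform."""
    n = len(coeffs)
    return [coeffs[0] / n + (2 / n) * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5) * k)
                                          for k in range(1, n))
            for t in range(n)]

def direct_multiplies_per_second(sample_rate: int, frame_size: int) -> int:
    """(sample_rate / N) frames per second x N*N multiplies = sample_rate * N."""
    return sample_rate * frame_size

frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
restored = inverse_dct_ii(dct_ii(frame))
print(max(abs(a - b) for a, b in zip(frame, restored)) < 1e-9)  # True
print(direct_multiplies_per_second(44_100, 512))                # 22579200 per channel
```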

On the other hand, the requirement to carry out a frequency domain transformation during MPEG decompression seriously affects the ability of the algorithm to produce real-time audio output on general-purpose computers. Almost all MPEG decoders are implemented using special signal processing hardware. This means that the playback of MPEG coded data is seriously restricted in its application in the PC world.

7. Specifications for a General Purpose Audio Compression System

The extensive efforts of the ISO-MPEG Committee have produced very clear specifications for an audio compression standard. Clearly, the best approach to discussing a set of alternative specifications is to compare and contrast them with those prepared by MPEG.

7.1 Quality Considerations

The MPEG standard establishes, as a basic principle, that the reconstituted audio signal should be virtually indistinguishable from the original. This feature has been ignored by the developers of many audio compression schemes (particularly those designed for voice rather than music) who have concentrated on compression ratios at the expense of quality.

The greatly improved performance of computer-based audio output devices (e.g. PC sound cards) has provided most computer systems with the ability to produce high-fidelity sound output. This can be achieved only through high sampling rates (20 kHz or above) and 16-bit signal resolution. It is this type of data that must be the target for sound compression algorithms, and a key design goal of the compression/decompression process should be the production of output that cannot be distinguished from the original audio signal.

7.2 Full Bandwidth in reconstructed signal

One major deficiency of the MPEG approach is the serious reduction produced in the bandwidth of the reconstituted signal. A goal for audio compression should be the preservation of the entire bandwidth of the original data. There is no point in recording audio with a bandwidth of 22,050 Hz (CD quality) and then proceeding to remove almost all of the energy above 12,000 to 16,000 Hz.

7.3 Fast decompression on general-purpose computers

Another vital requirement for an audio compression scheme is an algorithm whose computational complexity will allow real-time decompression at high data rates on a general-purpose computer without specialised signal processing hardware. This requirement is not satisfied by the current MPEG procedures.

7.4 No restrictions on sampling rate

The overall goal in the MPEG specifications is to produce a compressed audio data stream at certain specified bit rates. This requirement has produced algorithms that support a set of standard sampling rates and standard target bit rates. An example might be 44,100 Hz data compressed to a 128,000 bps data stream. In our terminology, for a single 16-bit channel this would correspond to a compression ratio of 5.5125 (705,600 / 128,000). While the ability to select target bit rates is significant for specialised hardware such as laser disk players, most applications are not restricted in this manner.

While certain sampling rates are more frequently used than others, the varying demands of signal representation for the speaking voice, the singing voice and musical instruments are tied to a number of different rates. An audio compression algorithm should accept data sampled at any rate.

In our view, one major application of compression technology will be in a reduction of the storage required for audio data on such devices as computer disks and CD-ROMs. The other will be a reduction in the time necessary to transmit audio data over a network connection such as the Internet. Neither of these situations requires a fixed sampling rate for the original data or a fixed bit rate for transmission.

7.5 User selection of output quality

A natural tradeoff exists between the compression ratio and the quality of the reconstructed audio signal. In short, the more you compress the original signal the poorer will be the quality of the output. In this area of technology, a basic goal is the development of an algorithm that will produce maximum compression with no discernible degradation in quality.

However, in some applications, a user is willing to accept a minor reduction in quality in order to increase the compression ratio. Any system should provide a user with this flexibility by allowing the choice of several quality levels during the compression process.
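
One simple way to offer a choice of quality levels is to let each level select a quantizer step size: a coarser step gives a higher compression ratio and a lower signal-to-noise ratio. In this hedged sketch the level names, the step sizes, the test signal and the quantize-plus-`zlib` encoder are all assumptions for illustration; it does not describe how StarReporter implements its quality settings.

```python
import math
import random
import struct
import zlib

# Hypothetical quality settings mapped to quantizer step sizes (assumed values).
QUALITY_STEPS = {"high": 4, "medium": 16, "low": 64, "lowest": 256}

def make_signal(n: int, seed: int = 2) -> list[int]:
    """Hypothetical 16-bit test signal: 440 Hz tone plus noise at 44,100 Hz."""
    rng = random.Random(seed)
    return [max(-32768, min(32767, int(8000 * math.sin(2 * math.pi * 440 * i / 44_100)
                                       + rng.gauss(0, 1500)))) for i in range(n)]

def encode(samples: list[int], step: int) -> bytes:
    """Quantize to multiples of `step`, then compress the codes losslessly."""
    codes = [round(s / step) for s in samples]
    return zlib.compress(struct.pack(f"<{len(codes)}h", *codes), 9)

def decode(blob: bytes, step: int) -> list[int]:
    raw = zlib.decompress(blob)
    return [c * step for c in struct.unpack(f"<{len(raw) // 2}h", raw)]

def snr_db(original: list[int], restored: list[int]) -> float:
    """Signal-to-noise ratio of the reconstruction, in decibels."""
    signal = sum(s * s for s in original)
    noise = sum((s - r) ** 2 for s, r in zip(original, restored))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

samples = make_signal(20_000)
results = {}
for level, step in QUALITY_STEPS.items():
    blob = encode(samples, step)
    results[level] = (2 * len(samples) / len(blob), snr_db(samples, decode(blob, step)))
    print(f"{level:>7}: ratio {results[level][0]:5.2f}, SNR {results[level][1]:5.1f} dB")
```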

Worldwide Distributors | Media Down Under Pty Ltd
2 Rosebridge Avenue Castle Cove NSW 2069 | Sydney, Australia
Tel (612) 9417 7335 | Fax (612) 9417 7709 | Mobile 0412 292 822 | starreporter@mediadownunder.com