This document currently deals with sampled sound issues only. Contributions dealing with synthesizers and waveform tables are welcome.
Audio applications tend to be hard to port, as this is a domain where interfaces are not standardized at all, though approaches don't vary much between operating systems.
The ossaudio emulation is possibly the simplest way, but it won't always work, and it is usually not such a great idea. Note that the compatibility header redefines ioctl. If the code to port uses ioctl for more than audio, you will have to #undef ioctl and use the bare form, _oss_ioctl, for the audio calls.
Interfaces predating sys/audioio.h are obsolete. Also, many ports tend to be incorrectly coded and to work on only one type of machine. Some changes are bound to be necessary, though. Read through the next part.
OpenBSD has its own audio layer, provided by the sndio library and documented in sio_open(3). Until it is merged into this page, you can find further information about programming for this API in the guide "hints on writing and porting audio code". sndio allows user processes to access audio(4) hardware and the aucat(1) audio server in a uniform way. It supports full-duplex operation, and, when used with the aucat(1) server, it supports resampling and format conversions on the fly.
YOU SHOULDN'T ASSUME ANYTHING ABOUT THE AUDIO HARDWARE USED.
Wrong code checks only the a_info.play.precision field against 8 or 16 bits, and assumes unsigned or signed samples based on SoundBlaster behavior. You should check the sample type explicitly, and code according to that. A simple example:
	AUDIO_INITINFO(&a_info);
	a_info.play.encoding = AUDIO_ENCODING_SLINEAR;
	a_info.play.precision = 16;
	a_info.play.sample_rate = 22050;
	error = ioctl(audio, AUDIO_SETINFO, &a_info);
	if (error)
		/* deal with it */
	error = ioctl(audio, AUDIO_GETINFO, &a_info);
	switch (a_info.play.encoding) {
	case AUDIO_ENCODING_ULINEAR_LE:
	case AUDIO_ENCODING_ULINEAR_BE:
		if (a_info.play.precision == 8)
			/* ... */
		else
			/* ... */
		break;
	case ...
	default:
		/* don't forget to deal with what you don't know!  For instance: */
		fprintf(stderr,
		    "Unsupported audio format (%d), ask ports@ about that\n",
		    a_info.play.encoding);
	}
	/* now don't forget to check what sampling frequency you actually got */
This is about the smallest code fragment that will deal with most issues. Note that you request an encoding without endianness (e.g., AUDIO_ENCODING_SLINEAR), and you retrieve an encoding with endianness (e.g., AUDIO_ENCODING_SLINEAR_LE).
Considering that a soundcard does not have to use the same endianness as your platform, you should be prepared to deal with that. The easiest way is probably to prepare a full audio buffer, and to use swab(3) if an endianness change is required.
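As a sketch of that approach (the function name and buffer layout here are illustrative, not from the original text), converting a block of 16-bit samples with swab(3) could look like this:

	#include <stdint.h>
	#include <stddef.h>
	#include <unistd.h>	/* swab(3) */

	/*
	 * Byte-swap a block of 16-bit samples.  swab(3) does not allow
	 * overlapping buffers, so the destination must be distinct from
	 * the source.
	 */
	void
	swap_sample_bytes(const int16_t *in, int16_t *out, size_t nsamples)
	{
		swab(in, out, (ssize_t)(nsamples * sizeof(int16_t)));
	}

This works identically on little- and big-endian hosts, since it only exchanges adjacent byte pairs.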
Dealing with external samples usually amounts to:
Hardware may have some weird limitations, such as being unable to get over 22050 Hz in stereo, but up to 44100 Hz in mono. In such cases, you should give the user a chance to state his preferences, then try your best to give the best performance possible. For instance, it is stupid to limit the frequency to 22050 Hz because you are outputting stereo. What if the user does not have a stereo sound system connected to his audio card output?
It is also stupid to hardcode soundblaster-like limitations into your program. You should be aware of these, but do try to get over the 22050 Hz/stereo barrier and check the results.
Samples don't always use the full range of values they could. First, samples recorded with a low gain will not sound very loud on the machine, forcing the user to turn the volume up. Second, on machines with badly isolated audio, low sound output means you mostly hear your machine's heartbeat, and not the sound you expected. Finally, dumb conversion from 16 bits to 8 bits may leave you with only 4 bits of usable audio, which makes for awfully bad quality.
If possible, the best solution is probably to scan the whole stream
you are going to play ahead of time, and to scale it so that it fits
the full dynamic range. If you can't afford that, but you can manage
to get a bit of look-ahead on what you're going to play, you can
adjust the volume boost on the fly; you just have to make sure
that the boost factor stays at a low frequency compared to the
sound you want to play, and that you get absolutely no
overflows -- those will always sound much worse than the
improvement you're trying to achieve.
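A minimal sketch of the scan-ahead approach (the function name and the signed 16-bit format are assumptions for illustration): find the peak first, then scale every sample so that the peak reaches full range, which guarantees no overflows:

	#include <stdint.h>
	#include <stdlib.h>	/* abs() */
	#include <stddef.h>

	/*
	 * Scale a buffer of signed 16-bit samples so that its peak uses
	 * the full dynamic range.  Scanning the whole buffer first is
	 * what makes overflow impossible.
	 */
	void
	normalize_samples(int16_t *buf, size_t n)
	{
		size_t i;
		int32_t peak = 1;	/* avoid dividing by zero on silence */

		for (i = 0; i < n; i++) {
			int32_t v = abs(buf[i]);
			if (v > peak)
				peak = v;
		}
		for (i = 0; i < n; i++)
			buf[i] = (int16_t)((int32_t)buf[i] * 32767 / peak);
	}

The intermediate product fits in 32 bits (32768 * 32767 < 2^31), so no widening beyond int32_t is needed.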
As sound volume perception is logarithmic, using arithmetic shifts is usually enough. If your data is signed, you should explicitly code the shift as a division, as the C >> operator is not portable on signed data.
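For instance (a sketch; the function name is made up), attenuating signed samples by a power of two, written as a division rather than a shift:

	#include <stdint.h>
	#include <stddef.h>

	/*
	 * Attenuate signed samples by 'shift' bits.  Coded as a division:
	 * the result of >> on negative values is implementation-defined
	 * in C, while division is well-defined (truncation toward zero).
	 */
	void
	attenuate(int16_t *buf, size_t n, int shift)
	{
		int32_t divisor = 1 << shift;
		size_t i;

		for (i = 0; i < n; i++)
			buf[i] /= divisor;
	}

Any decent compiler turns the division by a constant power of two into a shift-plus-correction sequence, so the portable form costs essentially nothing.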
If all else fails, you should at least try to provide the user with a volume scaling option.
Low-end applications usually don't have much to worry about. Keep in mind that some of us do use OpenBSD on low-end 68030, and that if a sound application can run on that, it should.
Don't forget to run benches. Theoretical optimizations are just that: theoretical. Some hard figures should be collected to check what's a sizeable improvement, and what's not.
For high performance audio applications, such as MPEG-1 layer 3 decoding, some points should be taken into account: write(2), as a system call, incurs a high cost compared to internal audio processing. The AUDIO_GETENC ioctl should be used to retrieve all the formats that the audio device provides.
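The enumeration loop could be sketched as follows (the function is hypothetical; on non-OpenBSD systems it compiles to a stub so that only the shape of the loop is shown):

	#include <stdio.h>
	#ifdef __OpenBSD__
	#include <sys/types.h>
	#include <sys/ioctl.h>
	#include <sys/audioio.h>
	#endif

	/*
	 * Print every encoding the audio device supports, flagging
	 * emulated ones.  Returns the number of encodings found, or -1
	 * where the native audio(4) interface is unavailable.
	 */
	int
	list_encodings(int fd)
	{
	#ifdef __OpenBSD__
		audio_encoding_t enc;
		int n = 0;

		for (enc.index = 0; ioctl(fd, AUDIO_GETENC, &enc) == 0;
		    enc.index++) {
			printf("%s (%d bits)%s\n", enc.name, enc.precision,
			    (enc.flags & AUDIO_ENCODINGFLAG_EMULATED) ?
			    " [emulated]" : "");
			n++;
		}
		return n;
	#else
		(void)fd;
		return -1;
	#endif
	}

Incrementing the index field until the ioctl fails is how AUDIO_GETENC is meant to be iterated; a real program would pick its preferred native format from the list instead of printing it.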
Be especially aware of the AUDIO_ENCODINGFLAG_EMULATED flag. If your application is already able to output all kinds of weird formats, and is reasonably optimized for that, try to use a native format at all costs. On the other hand, the emulation code present in the audio device can be assumed to be reasonably optimal, so don't replace it with quickly hacked-up code.
A model you may have to follow to get optimal results is to first compile a small test program that enquires about the specific audio hardware available, then proceed to configure your program so that it deals optimally with this hardware. You may reasonably expect people who want good audio performance to recompile your port when they change hardware, provided it makes a difference.
Considering that OpenBSD is not real time, you may still wish to write audio applications that are mostly real time, for instance games. In such a case, you will have to lower the blocksize so that the sound effects don't get out of synch with the current game. The problem with this is that the audio device may get starved, which yields horrible results.
In case you simply want audio to be synchronized with some graphics
output, but the behavior of your program is predictable, synchronization
is easier to achieve. You just play your audio samples, and ask the
audio device what you are currently playing with
AUDIO_GETOOFFS
, then use that information to
post-synchronize graphics. Provided you ask sufficiently often (say,
every tenth of a second), and as long as you have enough horse-power to
run your application, you can get very good synchronization that way.
You might have to tweak the figures by a constant offset, as there is
some lag between what the audio reports, what's currently playing, and
the time it takes for XWindow to display something.
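For the post-synchronization computation, a small helper can convert the played-byte count into seconds (the function name is hypothetical, and it assumes the byte count comes from the AUDIO_GETOOFFS result together with the parameters negotiated via AUDIO_SETINFO):

	/*
	 * Convert a played-bytes count, as reported by AUDIO_GETOOFFS,
	 * into seconds, given the negotiated sample rate, channel count
	 * and bytes per sample.  Illustrative helper only.
	 */
	double
	played_seconds(unsigned long bytes, unsigned rate, unsigned channels,
	    unsigned bytes_per_sample)
	{
		return (double)bytes /
		    ((double)rate * channels * bytes_per_sample);
	}

The constant offset mentioned above would then be added to this value before deciding which frame to display.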
In the case of audio applications, working with the original program's author is very important. If his code only works with SoundBlaster cards, for instance, there is a good chance he will have to cope with other technology soon. If you don't send your comments to him by then, your work will have been useless.
It may also be that the author has already noticed whatever problems you are currently dealing with, and is addressing them in his current development tree. If the patches you are writing amount to more than a handful of lines, cooperation is almost certainly a very good idea.