Audio & DSP

Learn digital signal processing by building it from scratch in Python: generate and sample signals, implement the DFT and FFT yourself, window and analyze spectra, design FIR and IIR filters by hand, build delay, reverb and modulation effects, detect pitch, and assemble a working synthesizer. The math, not the library calls.

11 projects, 275 hands-on levels, run in your browser.

Syllabus

Foundations: Code Through Audio: Never written code before? Start here. You will learn the absolute basics of Python using sounds, samples, frequencies, and waveforms as your playground. By the end you are ready for Project 1.
Signals & Sampling: Sound is a continuous pressure wave, but a computer can only hold a list of numbers. This project builds the bridge: generate the basic waveforms by hand, sample them at a chosen rate, and confront the two facts that govern all of digital audio, namely aliasing above the Nyquist frequency and the quantization noise of finite bit depth. Everything later in the track is built on these samples.
Synthesis: A synthesizer turns numbers into notes. This project builds the pieces of a voice: the ADSR envelope that shapes a note's loudness over time, additive synthesis that stacks harmonics into rich tones, amplitude and frequency modulation, the wavetable oscillator at the heart of most digital synths, and the mixing that lets many notes sound at once. By the end you can synthesize a chord from scratch.
The Fourier Transform: Every signal is a sum of sinusoids, and the Fourier transform finds them. This project builds the Discrete Fourier Transform straight from its definition, reads off the magnitude and phase of each frequency bin, inverts the transform to get the signal back, and finally implements the Fast Fourier Transform, the divide-and-conquer algorithm that makes all of modern spectral audio possible. You will never call np.fft the same way again.
Spectral Analysis: A single FFT of a whole song is useless: it tells you which frequencies appear, but not when. Real analysis chops the signal into short overlapping frames, tapers each with a window to tame spectral leakage, and transforms them one at a time. This project builds the windows, the framing, the Short-Time Fourier Transform and the spectrogram it produces, and the parabolic interpolation that pinpoints a peak between bins.
Convolution & FIR Filters: A filter shapes which frequencies pass through, and convolution is the operation that applies it. This project builds convolution from its definition, then designs Finite Impulse Response filters: the moving average, the windowed-sinc low-pass that is the workhorse of audio, high-pass by spectral inversion, and band-pass by cascading. You will compute a filter's frequency response and see exactly what it does to a spectrum.
IIR Filters: Where a FIR filter looks only at the current and past inputs, an IIR filter also feeds its own past outputs back in. That feedback buys steep filtering with very few coefficients, at the cost of possible instability. This project builds one-pole smoothers, the general difference equation, the second-order biquad that is the building block of every parametric equalizer, the RBJ cookbook coefficients, and the pole-zero analysis that tells you whether a filter will sing or blow up.
Audio Effects: Now make it sound good. This project builds the effects rack: delay and feedback echo, the Schroeder reverb that stacks comb and allpass filters into a sense of space, the modulation effects (chorus, flanger, vibrato) that all come from one fractionally-interpolated delay line, distortion by waveshaping, and the dynamics processors (noise gate, envelope follower, compressor) that control loudness. These are the boxes on every guitarist's pedalboard and every mixing engineer's channel strip.
Pitch & Time: How does a tuner know the note, and how does a DAW change a singer's pitch without changing the tempo? This project answers both. It detects pitch in the time domain with zero-crossings and autocorrelation (the YIN idea), resamples to shift pitch, overlap-adds frames to stretch time independently of pitch, and maps frequencies onto the musical scale of notes and cents.
Features & Coding: Machines do not listen to waveforms; they listen to features. This project builds the descriptors that drive speech recognition, music tagging, and audio codecs: loudness measures, the spectral-shape features (centroid, rolloff, flux) that capture timbre, the perceptual mel scale and its triangular filterbank, mu-law companding that squeezes audio into fewer bits, and onset detection that finds where the beats fall.
Capstone: A Subtractive Synthesizer: Everything comes together as an instrument. A subtractive synth starts with a harmonically-rich oscillator, carves it with a filter, shapes its loudness with an ADSR envelope, and sweetens it through effects. This capstone builds that signal chain end to end, wires it into a voice that plays a MIDI note, renders a melody, and proves it works by detecting the pitch back out of the synthesized sound. You will have built a real synthesizer from first principles.
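
As a taste of the from-scratch approach the Fourier Transform project describes, here is a minimal sketch, not the course's own code. It writes the DFT straight from its definition, adds a recursive radix-2 FFT, and checks that the two agree. The function names, the 16-sample test signal, and the power-of-two length restriction are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, written directly from the definition."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 FFT. Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Hypothetical test signal: exactly 3 cycles of a sine across 16 samples.
N = 16
signal = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]

slow = dft(signal)
fast = fft(signal)

# Largest disagreement between the two transforms (rounding error only).
max_diff = max(abs(a - b) for a, b in zip(slow, fast))

# Strongest bin in the lower half of the spectrum; the sine lands on bin 3.
peak_bin = max(range(N // 2), key=lambda k: abs(fast[k]))

print(max_diff, peak_bin)
```

Because the sine completes a whole number of cycles in the frame, all of its energy lands in one bin (and that bin's mirror image), with magnitude N / 2. A frequency that falls between bins would smear across its neighbors instead, which is the leakage that the Spectral Analysis project tames with windows.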

Key concepts

Additive synthesis: Building a tone by summing harmonics with chosen amplitudes. A sawtooth contains every harmonic, with amplitudes falling off as 1/k; a square wave uses only the odd ones.
ADSR: The four stages of a classic synth envelope: Attack, Decay, Sustain, Release. They define how a note rises, settles, holds, and fades.
Aliasing: When a frequency above the Nyquist limit is sampled, it masquerades as a different, lower frequency. The single most important pitfall in digital audio, preven…
Allpass filter: A filter that passes every frequency at equal gain but shifts the phase of each, used to diffuse echoes in reverb without coloring the sound.
Autocorrelation: How well a signal matches a delayed copy of itself. It peaks at a lag equal to the period, making it a robust way to find pitch; the YIN algorithm refines the same idea.
Band-pass filter: A filter that passes a band of frequencies between a low and a high cutoff, made by cascading a high-pass and a low-pass.
Biquad: A second-order IIR section with two poles and two zeros, the building block of parametric equalizers. The RBJ cookbook turns a cutoff and Q into its coefficien…
Bit depth: The number of bits used to store each sample. Each bit adds about 6 dB of dynamic range, so 16-bit audio has about 96 dB.
Chorus: Mixing a signal with one or more slightly delayed, pitch-modulated copies to sound like several players at once.
Clipping: Flattening any sample that exceeds a threshold. Hard clipping is harsh and buzzy; soft clipping with a tanh curve is warm and tube-like.
Comb filter: A delay with feedback, producing a spectrum of regularly spaced peaks like a comb. The resonant building block of reverb.
Companding: Compressing a signal's dynamic range before quantization and expanding it after, so uniform quantization spends bits where the ear needs them most.
Convolution: Sliding a kernel across a signal, multiplying and summing. It is how every linear filter is applied: the output is the input convolved with the filter's im…
Crest factor: The ratio of a signal's peak to its RMS. High for spiky, transient sounds like drums; about 1.41 (the square root of 2) for a sine; lower still for heavily compressed or clipped signals.
Cutoff frequency: The frequency at which a filter starts to attenuate, conventionally the point where its response has dropped by 3 dB. It is the dividing line between the passband and the stopband.
Decibel (dB): A logarithmic measure of level relative to a reference, matching how loudness is perceived: dB = 20 log10(amplitude / reference). Doubling amplitude adds about 6 dB.
Delay line: A buffer that holds past samples so they can be read back later. The foundation of echo, reverb, chorus, and flanging.
DFT: The Discrete Fourier Transform: it decomposes a signal into the sinusoids that make it up, giving the amplitude and phase of each frequency bin.
Distortion: Reshaping the waveform to add harmonics, from a hard digital clip to the smooth saturation of a tube. Also called waveshaping.
Echo: A delayed, attenuated copy of a signal. A feedback echo repeats and fades, each repeat quieter than the last.
Envelope: The contour of a sound's amplitude over time. Shaping it is what turns a raw tone into a plucked, swelled, or sustained note.
Envelope follower: A circuit or algorithm that tracks a signal's amplitude over time with a fast attack and slow release. The level detector inside compressors and auto-wah.
Feedback: Routing a system's output back into its input. It gives IIR filters and echoes their sustain and resonance, but too much makes them unstable.
FFT: The Fast Fourier Transform, a divide-and-conquer algorithm that computes the DFT in O(N log N) instead of O(N^2). The engine of all practical spectral audio.
FIR filter: A Finite Impulse Response filter: output is a weighted sum of past inputs only. Always stable, can have exactly linear phase, but needs many taps for a sharp c…
Flanger: A chorus with a very short, swept delay mixed near 50/50, producing a sweeping, jet-like comb-filter sound.
FM synthesis: Frequency modulation: one oscillator bends the frequency of another, generating rich sidebands from very few operators. The sound of 1980s digital synths.
Frequency: How many cycles a wave completes per second, in hertz, perceived as pitch. Doubling the frequency raises the pitch by an octave.
Frequency bin: One output of the DFT, corresponding to frequency k * sample_rate / N. The bin spacing, sample_rate / N, is the transform's frequency resolution.
Fundamental frequency: The lowest frequency of a periodic sound and the pitch you hear. Its harmonics sit at integer multiples above it.
Harmonic: A sinusoid at an integer multiple of a fundamental frequency. The mix of harmonic amplitudes gives an instrument its timbre.
High-pass filter: A filter that passes frequencies above the cutoff and removes those below, often used to strip rumble and DC.
IIR filter: An Infinite Impulse Response filter: it feeds past outputs back in. Steep filtering from very few coefficients, but it can be unstable.
Impulse response: A filter's output when fed a single unit impulse, which fully characterizes a linear filter. For a FIR filter it is just the coefficients.
LFO: A low-frequency oscillator, too slow to hear, used to modulate a parameter like amplitude (tremolo), pitch (vibrato), or filter cutoff.
Low-pass filter: A filter that passes frequencies below the cutoff and attenuates those above. The most common filter in subtractive synthesis.
Magnitude spectrum: The absolute value of each complex DFT bin: how much of each frequency is present. What a spectrum analyzer plots.
Mel scale: A perceptual frequency scale, roughly logarithmic, on which equal steps sound equally far apart. Mel-spaced filters are the front end of MFCCs.
MFCC: Mel-Frequency Cepstral Coefficients: a compact description of a spectrum's shape on the mel scale, the classic feature set behind speech recognition.
Mu-law: A companding curve that compresses amplitude logarithmically so quiet sounds get finer quantization steps. The telephone audio standard.
Nyquist frequency: Half the sample rate, and the highest frequency a sampled signal can represent. Anything above it folds back as a lower frequency.
Onset detection: Finding where notes start by detecting sudden rises in energy or spectral flux, then picking peaks. The first step in rhythm and tempo analysis.
Phase: Where in its cycle a sinusoid starts, in radians or degrees. Two waves of the same frequency can reinforce or cancel depending on their relative phase.
Pitch: The perceived highness of a sound, set by its fundamental frequency. Detected from zero-crossings or, more reliably, autocorrelation.
Pole: A root of an IIR filter's denominator. Poles inside the unit circle mean a stable filter; a pole near the circle gives a resonant peak that rings.
Q factor: How sharp and resonant a filter's peak is. Higher Q means a narrower, taller peak that rings longer.
Quantization: Rounding each sample to one of a finite number of levels set by the bit depth. The rounding error is quantization noise, which sets the noise floor.
Resampling: Computing a signal's samples at new time positions, to change sample rate or playback speed. Linear interpolation is the simplest method, and resampling is…
Resonance: A pronounced peak in a filter's response near the cutoff, set by the Q. High resonance makes the squelchy filter sweeps of electronic music.
Reverb: The sound of a space: thousands of overlapping echoes. The Schroeder design builds it from parallel comb filters and series allpass filters.
RMS: Root mean square, the effective level of a signal and the best simple correlate of perceived loudness: the square root of the mean of the squared samples.
Sample: A single measurement of a sound wave's amplitude at one instant. A digital signal is just a list of samples taken at a steady rate.
Sample rate: How many samples are taken per second, in hertz. CD audio is 44,100 Hz. The rate sets the highest frequency that can be captured, the Nyquist frequency.
Sinusoid: A pure sine (or cosine) wave, the atom of sound: every signal is a sum of sinusoids. Defined by its frequency, amplitude, and phase.
Spectral centroid: The magnitude-weighted average frequency of a spectrum, its center of mass, and the main correlate of perceived brightness.
Spectral flux: How much a spectrum changed between two frames. A spike in flux signals a note onset, the basis of beat tracking.
Spectral leakage: When a frequency does not fall exactly on a bin, its energy smears across neighboring bins. Tapering each frame with a window function reduces it.
Spectrogram: The magnitude of the STFT, a time-frequency image showing how a sound's spectrum evolves. The picture behind every audio editor.
STFT: The Short-Time Fourier Transform: chop the signal into short overlapping windowed frames and FFT each, giving frequency content as it changes over time.
Synthesis: Generating sound from scratch with math. Additive stacks sinusoids; subtractive filters a rich waveform; FM modulates one oscillator with another.
Timbre: The tone color that distinguishes a flute from a trumpet at the same pitch and loudness. It comes from the harmonic spectrum and how it evolves over time.
Time stretching: Changing a sound's duration without changing its pitch, by repositioning analysis frames at a different synthesis hop and overlap-adding them.
Tremolo: Modulating a signal's amplitude with an LFO, a periodic swell in loudness.
Vibrato: Modulating pitch with an LFO by reading from a wobbling delay line, which needs fractional-delay interpolation. The basis of chorus and flanging too.
Waveform: The shape of a signal over time. Sine, square, sawtooth, and triangle are the classic synthesis waveforms, each with a distinct harmonic spectrum.
Wavetable: One cycle of a waveform stored in a table; an oscillator steps through it at a rate set by the pitch. The cheap, flexible oscillator inside most digital synths.
Window function: A taper (Hann, Hamming, Blackman) applied to a frame before the FFT, fading its edges to zero to cut spectral leakage, at the cost of a little resolution.
Zero: A root of a filter's numerator, a frequency the filter pushes down toward silence. Poles boost, zeros cut.
Zero-crossing rate: How often a waveform crosses zero. A cheap pitch and noisiness cue: low for voiced tones, high for hiss and noise.
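
Several of the formulas quoted in this glossary fit in a few lines of plain Python. The sketch below is illustrative only: the function names, the 44,100 Hz and 1024-point figures, and the choice of mu = 255 for the companding curve are assumptions for the example, not taken from the course material.

```python
import math


def amplitude_to_db(ratio):
    """Level in dB of an amplitude ratio (amplitude / reference)."""
    return 20 * math.log10(ratio)


def bin_frequency(k, sample_rate, n):
    """Center frequency in Hz of DFT bin k for an n-point transform."""
    return k * sample_rate / n


def alias_frequency(f, sample_rate):
    """Frequency at which a real sinusoid of f Hz shows up after sampling."""
    f = f % sample_rate
    return min(f, sample_rate - f)  # fold anything above Nyquist back down


def rms(samples):
    """Square root of the mean of the squared samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def crest_factor(samples):
    """Ratio of peak absolute value to RMS."""
    return max(abs(s) for s in samples) / rms(samples)


def mu_law_compress(x, mu=255):
    """Mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


# Ten whole cycles of a unit sine across 1000 samples.
sine = [math.sin(2 * math.pi * 10 * t / 1000) for t in range(1000)]

print(amplitude_to_db(2))                     # about 6.02: doubling adds ~6 dB
print(amplitude_to_db(2 ** 16))               # about 96.3: 16 bits of range
print(bin_frequency(10, 44100, 1024))         # bin spacing is 44100 / 1024 Hz
print(alias_frequency(30000, 44100))          # 14100: folded below Nyquist
print(crest_factor(sine))                     # close to sqrt(2)
print(mu_law_expand(mu_law_compress(0.1)))    # round trip returns about 0.1
```

The companding pair shows why mu-law helps: a quiet sample of 0.1 compresses to a value well above 0.1, so a uniform quantizer applied after compression gives quiet passages finer effective steps.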