How Sound Is Stored Digitally
Learn how computers turn sound into numbers using sampling, sample rate and bit depth, why CD audio uses 44,100 samples per second, and how file size is calculated — with worked examples and a quiz.
Key takeaways
- Sound is a wave; computers store it by measuring the wave's height many times per second
- Each measurement is called a sample, and the result is a list of numbers
- Sample rate is how many samples are taken per second, measured in hertz (Hz)
- Bit depth is how precisely each sample is recorded; more bits means finer detail
- A higher sample rate and bit depth give better quality but a larger file
Sound is a wave
Clap your hands and you push the air, creating a wave that ripples outward until it reaches your ears. A microphone feels that same wave as a wiggle in air pressure and turns it into a wiggling electrical signal. The trouble is that this signal is smooth and continuous — it has a value at every instant — but a computer can only store numbers. So how do we squeeze an endless wiggle into a list of numbers?
The answer is the same trick computers use for everything: turn it into numbers, which underneath are just 0s and 1s, as explained in How Computers Store Data in Binary. For sound, the method is called sampling.
Sampling: measuring the wave
To store a wave, the computer measures its height at tiny, regular time intervals — thousands of times every second. Each measurement is a sample. String the samples together and you have a list of numbers that traces the shape of the wave:
time:    0   1   2   3   4   5   6   7
sample:  0  18  31  25  -4 -22 -30 -15
Played back fast enough, those numbers recreate the original sound. The more often you measure, the more faithfully the list follows the true curve, just as a film creates smooth motion from many still frames shown quickly.
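Here is a minimal sketch of sampling in Python. The function name sample_wave and the scale factor of 30 are made up for this illustration; it measures one cycle of a sine wave eight times and rounds each height to a whole number.

```python
import math

def sample_wave(frequency, sample_rate, count):
    """Measure a sine wave's height `count` times at `sample_rate` samples per second."""
    samples = []
    for n in range(count):
        t = n / sample_rate                         # time of this measurement, in seconds
        height = math.sin(2 * math.pi * frequency * t)
        samples.append(round(height * 30))          # illustrative scale: whole numbers from -30 to 30
    return samples

# One full cycle of a 1 Hz wave, measured 8 times per second
print(sample_wave(frequency=1, sample_rate=8, count=8))
# [0, 21, 30, 21, 0, -21, -30, -21]
```

The list rises, falls below zero and climbs back, which is the shape of the wave written down as numbers.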
Sample rate
The number of samples taken per second is the sample rate, measured in hertz (Hz). Common values:
| Sample rate | Where it is used |
|---|---|
| 8,000 Hz | Telephone calls (just enough for speech) |
| 44,100 Hz | CD-quality music |
| 48,000 Hz | Video and film audio |
A rate of 44,100 Hz means the wave is measured 44,100 times every second. That sounds extreme, but it is what is needed to capture the full range of sounds a human ear can hear, which reaches up to about 20,000 Hz. With too low a rate, high notes vanish or sound wrong: the audio version of a blurry, low-resolution photo.
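To see how a high note can "sound wrong", the sketch below samples two tones at the telephone rate of 8,000 Hz. The helper sample_tone and the chosen frequencies are assumptions for this demonstration. A 7,000 Hz tone is too high for this rate, and its samples turn out to be identical to those of a 1,000 Hz tone flipped upside down, so on playback it would be heard as the wrong note.

```python
import math

def sample_tone(frequency, sample_rate, count):
    """Return `count` samples of a sine wave at `frequency` Hz."""
    return [math.sin(2 * math.pi * frequency * n / sample_rate)
            for n in range(count)]

SAMPLE_RATE = 8000  # telephone-quality rate from the table above
low = sample_tone(1000, SAMPLE_RATE, 16)
high = sample_tone(7000, SAMPLE_RATE, 16)

# Compare each 7,000 Hz sample with the matching 1,000 Hz sample, sign flipped.
mirrored = all(math.isclose(h, -l, abs_tol=1e-9) for l, h in zip(low, high))
print("7000 Hz is indistinguishable from a flipped 1000 Hz tone:", mirrored)
```

Once the samples are taken, nothing in the list of numbers says which of the two tones was really there.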
Bit depth: how precise each sample is
The sample rate decides how often you measure. Bit depth decides how precisely you record each measurement. With more bits per sample, the height can be stored more finely:
- 8-bit audio stores each sample as one of 256 levels — fine for retro game sounds, but a bit grainy.
- 16-bit audio (CD standard) gives 65,536 levels — smooth and clear to the ear.
- 24-bit audio gives over 16 million levels, used in professional studios.
Low bit depth adds a faint background "fuzz" called quantisation noise, because the true height has to be rounded to the nearest available level.
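The rounding can be sketched in a few lines of Python. This is a simplified model, not how any particular audio format does it: the function quantise is hypothetical and assumes heights between -1.0 and 1.0 with evenly spaced levels, whereas real 16-bit audio stores whole numbers from -32,768 to 32,767.

```python
def quantise(height, bits):
    """Round a height in the range -1.0..1.0 to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)               # gap between neighbouring levels
    index = round((height + 1.0) / step)    # number of the closest level
    return -1.0 + index * step

true_height = 0.3
for bits in (3, 8, 16):
    stored = quantise(true_height, bits)
    error = abs(stored - true_height)
    print(f"{bits:2d} bits: {2 ** bits:6d} levels, rounding error {error:.6f}")
```

The rounding error printed on each line is the quantisation noise for that one sample. It shrinks as the bit depth grows because the levels sit closer together.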
Working out the file size
Because audio is just a list of numbers, you can calculate exactly how big a recording is. Multiply together: sample rate × bytes per sample × number of channels × seconds.
A 16-bit sample is 2 bytes. Stereo means 2 channels (left and right). For one second of CD audio:
44,100 samples/sec × 2 bytes × 2 channels = 176,400 bytes per second
That is about 176 KB every second, or over 10 MB per minute, before any compression. This is exactly why formats like MP3 exist — to shrink that down to a size you can stream and store.
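The formula is easy to turn into a small helper. The function name audio_bytes is invented for this sketch, and it assumes uncompressed audio whose bit depth is a whole number of bytes.

```python
def audio_bytes(sample_rate, bit_depth, channels, seconds):
    """Size of uncompressed audio: rate x bytes per sample x channels x seconds."""
    bytes_per_sample = bit_depth // 8   # 16 bits -> 2 bytes
    return sample_rate * bytes_per_sample * channels * seconds

per_second = audio_bytes(44100, 16, 2, 1)
per_minute = audio_bytes(44100, 16, 2, 60)

print(per_second)               # 176400
print(per_minute / 1_000_000)   # 10.584 (megabytes)
```

Changing any one argument shows how it affects the size: mono halves it, and 24-bit audio makes it half as big again.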
A complete worked example
This Python program generates a one-second beep from scratch by computing samples of a sine wave, then saves it as a real .wav file using the built-in wave and math modules — no extra installs needed. Run it and play beep.wav.
import wave, math, struct

sample_rate = 44100   # samples per second
duration = 1.0        # seconds
frequency = 440       # Hz -> the musical note A

samples = []
for n in range(int(sample_rate * duration)):
    # height of the wave at this instant, scaled to a 16-bit range
    t = n / sample_rate
    value = math.sin(2 * math.pi * frequency * t)
    samples.append(int(value * 30000))   # 30000 keeps it within 16-bit limits

# Write the list of numbers into a standard WAV file
with wave.open("beep.wav", "w") as f:
    f.setnchannels(1)        # mono
    f.setsampwidth(2)        # 2 bytes = 16-bit samples
    f.setframerate(sample_rate)
    for s in samples:
        f.writeframes(struct.pack("<h", s))   # "<h" = one 16-bit number

print("Wrote", len(samples), "samples")
print("File size estimate:", len(samples) * 2, "bytes")
Reading the code: the loop builds 44,100 samples, one for each measurement in the second. For every sample it computes the height of a 440 Hz sine wave at that moment and scales it into the 16-bit range. The wave module then writes those numbers into a proper audio file. The printed sample count (44,100) and size estimate (about 88 KB) match the formula above for one second of 16-bit mono audio.
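If you want to check that a WAV file really contains what the formula predicts, the wave module can read the settings back. The sketch below is a variation on the program above, not a replacement for it: as an assumption for the demonstration, it writes to an in-memory buffer (io.BytesIO) instead of beep.wav so that it leaves no file behind, and it packs all the samples in one call.

```python
import io
import math
import struct
import wave

sample_rate = 44100
samples = [int(30000 * math.sin(2 * math.pi * 440 * n / sample_rate))
           for n in range(sample_rate)]          # one second of a 440 Hz tone

# Write the WAV data into memory instead of onto disk
buffer = io.BytesIO()
with wave.open(buffer, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(sample_rate)
    f.writeframes(struct.pack(f"<{len(samples)}h", *samples))

total_bytes = len(buffer.getvalue())

# Read the settings back out of the finished file
buffer.seek(0)
with wave.open(buffer, "rb") as f:
    info = (f.getnchannels(), f.getsampwidth(), f.getframerate(), f.getnframes())

print("channels, bytes per sample, sample rate, sample count:", info)
print("audio data:", len(samples) * 2, "bytes")
print("whole file:", total_bytes, "bytes")
```

The whole file comes out slightly larger than the estimate because a WAV file begins with a short header recording the sample rate, bit depth and number of channels.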
Try it yourself
- Change the note. Set frequency = 880 and listen. It should sound exactly one octave higher (double the frequency).
- Lower the quality. Set sample_rate = 8000 and compare. A pure 440 Hz beep still sounds much the same, because it sits far below the 4,000 Hz that this rate can capture, but richer sounds such as music would lose their high detail. The file is much smaller.
- Do the maths. Calculate the size of a 3-minute stereo CD-quality song: 44100 × 2 × 2 × 180 bytes. Convert to megabytes by dividing by 1,000,000.
Challenge: Make a short two-note tune. Generate half a second at 440 Hz, then half a second at 523 Hz (the note C), join the two sample lists together, and write them to one WAV file. Then experiment: what happens to the sound if you add two sine waves of different frequencies into each sample instead of playing them one after another? Halve the scale of each wave first (15000 instead of 30000) so the sum stays within the 16-bit range; otherwise struct.pack will raise an error. You have just discovered how chords are built.
Quick quiz
What does a computer measure to record a sound?
The computer samples the wave's height at regular intervals, turning it into numbers.
What is the sample rate?
Sample rate is the number of measurements taken per second, in hertz (Hz).
How many samples per second does CD-quality audio use?
CD audio uses 44,100 samples per second (44.1 kHz).
What does a higher bit depth give you?
More bits per sample means each measurement can be recorded more precisely.
What happens to file size if you raise the sample rate?
More samples per second means more numbers to store, so a larger file.
FAQ
Why does CD audio use 44,100 samples per second?
Human ears hear up to about 20,000 Hz. A rule called the Nyquist theorem says you must sample at more than twice the highest frequency you want to capture, so you need over 40,000 samples per second. 44,100 was chosen because it sat just above that limit and fitted neatly with the video equipment used to master early CDs. It captures everything a person can hear.
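The Nyquist rule can be checked with a little arithmetic. In this sketch the function name highest_capturable_hz is made up, and 20,000 Hz is the approximate hearing limit mentioned above rather than an exact figure.

```python
HEARING_LIMIT_HZ = 20000   # approximate upper limit of human hearing

def highest_capturable_hz(sample_rate):
    """Frequencies must stay below half the sample rate to be captured."""
    return sample_rate / 2

for rate in (8000, 44100, 48000):
    limit = highest_capturable_hz(rate)
    covers_hearing = limit > HEARING_LIMIT_HZ
    print(f"{rate} Hz sampling captures tones below {limit:.0f} Hz; "
          f"covers human hearing: {covers_hearing}")
```

The telephone rate falls well short of the hearing limit, which is fine for speech, while the CD rate clears it with a little room to spare.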
Why is an MP3 so much smaller than uncompressed audio?
An MP3 uses compression. It cleverly throws away parts of the sound that human ears are unlikely to notice (for example, a quiet noise hidden right after a loud one) and stores what remains more efficiently. The file shrinks dramatically with little obvious loss in quality. See the lesson on how data is compressed for the general idea.