Coding · Ages 11-13 · Beginner · 10 min read

How Sound Is Stored Digitally

Learn how computers turn sound into numbers using sampling, sample rate and bit depth, why CD audio uses 44,100 samples per second, and how file size is calculated — with worked examples and a quiz.

Key takeaways

  • Sound is a wave; computers store it by measuring the wave's height many times per second
  • Each measurement is called a sample, and the result is a list of numbers
  • Sample rate is how many samples are taken per second, measured in hertz (Hz)
  • Bit depth is how precisely each sample is recorded; more bits means finer detail
  • A higher sample rate and bit depth give better quality but a larger file

Sound is a wave

Clap your hands and you push the air, creating a wave that ripples outward until it reaches your ears. A microphone feels that same wave as a wiggle in air pressure and turns it into a wiggling electrical signal. The trouble is that this signal is smooth and continuous — it has a value at every instant — but a computer can only store numbers. So how do we squeeze an endless wiggle into a list of numbers?

The answer is the same trick computers use for everything: turn it into numbers, which underneath are just 0s and 1s, as described in the lesson How Computers Store Data in Binary. For sound, the method is called sampling.

Sampling: measuring the wave

To store a wave, the computer measures its height at tiny, regular time intervals — thousands of times every second. Each measurement is a sample. String the samples together and you have a list of numbers that traces the shape of the wave:

time:    0   1   2   3   4   5   6   7
sample:  0  18  31  25  -4 -22 -30 -15

Played back fast enough, those numbers recreate the original sound. The more often you measure, the more closely the list follows the true curve, just as a film creates smooth motion from many still frames shown quickly.
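To see sampling in action, here is a minimal Python sketch that measures a slow sine wave eight times in one second. The function name sample_wave and the scale factor of 30 are made up for this illustration; they are not part of any library.

```python
import math

def sample_wave(frequency, sample_rate, count):
    """Measure a sine wave `count` times, taking `sample_rate` samples per second."""
    samples = []
    for n in range(count):
        t = n / sample_rate                             # time of this measurement, in seconds
        height = math.sin(2 * math.pi * frequency * t)  # wave height, from -1 to 1
        samples.append(round(height * 30))              # scale to small whole numbers
    return samples

# One full cycle of a 1 Hz wave, measured 8 times
print(sample_wave(frequency=1, sample_rate=8, count=8))
# prints [0, 21, 30, 21, 0, -21, -30, -21]
```

The list rises, falls and comes back up, which is the shape of the wave written down as numbers.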

Sample rate

The number of samples taken per second is the sample rate, measured in hertz (Hz). Common values:

Sample rate    Where it is used
8,000 Hz       Telephone calls (just enough for speech)
44,100 Hz      CD-quality music
48,000 Hz      Video and film audio

A rate of 44,100 Hz means the wave is measured 44,100 times every second. That sounds extreme, but it is what is needed to capture the full range of sounds a human ear can hear. Too low a rate and high notes vanish or sound wrong — the audio version of a blurry, low-resolution photo.
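Here is a small sketch of what "sound wrong" can mean. A sample rate can only capture notes below half its value (the rule explained in the FAQ at the end of this lesson), so a rate of 8,000 Hz tops out at 4,000 Hz. The helper sample_tone is a made-up name for this example. It samples a 7,000 Hz tone and a 1,000 Hz tone at 8,000 Hz and compares the two lists.

```python
import math

def sample_tone(frequency, sample_rate, count):
    """Return `count` samples of a sine wave at the given frequency."""
    return [math.sin(2 * math.pi * frequency * n / sample_rate)
            for n in range(count)]

rate = 8000
high = sample_tone(7000, rate, 16)   # above the 4,000 Hz limit for this rate
low = sample_tone(1000, rate, 16)    # comfortably below the limit

# Every 7,000 Hz sample equals the 1,000 Hz sample flipped upside down,
# so adding each pair together gives (almost exactly) zero.
matches = all(abs(h + l) < 1e-9 for h, l in zip(high, low))
print(matches)   # prints True
```

Because the two lists carry the same information, the computer cannot tell the high note from the low one, and on playback you would hear 1,000 Hz instead of 7,000 Hz.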

Bit depth: how precise each sample is

The sample rate decides how often you measure. Bit depth decides how precisely you record each measurement. With more bits per sample, the height can be stored more finely:

  • 8-bit audio stores each sample as one of 256 levels — fine for retro game sounds, but a bit grainy.
  • 16-bit audio (CD standard) gives 65,536 levels — smooth and clear to the ear.
  • 24-bit audio gives over 16 million levels, used in professional studios.

Low bit depth adds a faint background "fuzz" called quantisation noise, because the true height has to be rounded to the nearest available level.
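The rounding can be sketched in a few lines of Python. This is a simplified model, not how any particular audio format does it: it assumes the wave height lies between -1.0 and 1.0 and spreads the levels evenly across that range. The function name quantise is invented for this example.

```python
def quantise(height, bits):
    """Round a height between -1.0 and 1.0 to the nearest level for this bit depth."""
    levels = 2 ** bits            # 8 bits -> 256 levels, 16 bits -> 65,536 levels
    step = 2.0 / (levels - 1)     # gap between neighbouring levels
    return round(height / step) * step

true_height = 0.3
for bits in (4, 8, 16):
    stored = quantise(true_height, bits)
    error = abs(stored - true_height)
    print(bits, "bits:", 2 ** bits, "levels, rounding error", error)
```

Each extra bit doubles the number of levels, so the rounding error shrinks quickly as the bit depth goes up. That leftover error is the quantisation noise.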

Working out the file size

Because audio is just a list of numbers, you can calculate exactly how big a recording is. Multiply together: sample rate × bytes per sample × number of channels × seconds.

A 16-bit sample is 2 bytes. Stereo means 2 channels (left and right). For one second of CD audio:

44,100 samples/sec × 2 bytes × 2 channels = 176,400 bytes per second

That is about 176 KB every second, or over 10 MB per minute, before any compression. This is exactly why formats like MP3 exist — to shrink that down to a size you can stream and store.
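The same formula can be written as a short Python function. The name audio_bytes and its parameter names are chosen just for this sketch, and it assumes the bit depth is a whole number of bytes (8, 16 or 24 bits).

```python
def audio_bytes(sample_rate, bit_depth, channels, seconds):
    """Size of uncompressed audio: rate x bytes per sample x channels x seconds."""
    bytes_per_sample = bit_depth // 8    # 16 bits -> 2 bytes
    return sample_rate * bytes_per_sample * channels * seconds

one_second = audio_bytes(44100, 16, 2, 1)
one_minute = audio_bytes(44100, 16, 2, 60)
print(one_second)               # prints 176400
print(one_minute / 1_000_000)   # prints 10.584 (megabytes)
```

Changing any one input shows how it affects the size: mono halves it, and 8-bit samples halve it again.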

A complete worked example

This Python program generates a one-second beep from scratch by computing samples of a sine wave, then saves it as a real .wav file using the built-in wave and math modules — no extra installs needed. Run it and play beep.wav.

import wave, math, struct

sample_rate = 44100      # samples per second
duration = 1.0           # seconds
frequency = 440          # Hz -> the musical note A

samples = []
for n in range(int(sample_rate * duration)):
    t = n / sample_rate                             # time of this sample, in seconds
    value = math.sin(2 * math.pi * frequency * t)   # height of the wave, from -1 to 1
    samples.append(int(value * 30000))   # scale up; 30000 stays below the 16-bit limit of 32767

# Write the list of numbers into a standard WAV file
with wave.open("beep.wav", "w") as f:
    f.setnchannels(1)        # mono
    f.setsampwidth(2)        # 2 bytes = 16-bit samples
    f.setframerate(sample_rate)
    for s in samples:
        f.writeframes(struct.pack("<h", s))   # "<h" = one 16-bit number

print("Wrote", len(samples), "samples")
print("File size estimate:", len(samples) * 2, "bytes")

Reading the code: the loop builds 44,100 samples, one for each measurement in the second. For every sample it computes the height of a 440 Hz sine wave at that moment and scales it into the 16-bit range. The wave module then writes those numbers into a proper audio file. The printed sample count (44,100) and size estimate (88,200 bytes, about 88 KB) match the formula above for one second of 16-bit mono audio. The file on disk is slightly larger, because a WAV file begins with a short header that describes the format.
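If you want to check those numbers yourself, the sketch below builds the same beep in memory and then reads the details back with the wave module. It uses io.BytesIO as a stand-in for beep.wav so that nothing is written to disk; that choice, and the variable names, are assumptions made for this example.

```python
import io
import math
import struct
import wave

sample_rate = 44100
samples = [int(30000 * math.sin(2 * math.pi * 440 * n / sample_rate))
           for n in range(sample_rate)]          # one second of a 440 Hz tone

buffer = io.BytesIO()                            # in-memory stand-in for beep.wav
with wave.open(buffer, "wb") as f:
    f.setnchannels(1)                            # mono
    f.setsampwidth(2)                            # 2 bytes = 16-bit samples
    f.setframerate(sample_rate)
    f.writeframes(struct.pack(f"<{len(samples)}h", *samples))

total_bytes = len(buffer.getvalue())             # whole file, header included
data_bytes = len(samples) * 2                    # the formula's answer

buffer.seek(0)
with wave.open(buffer, "rb") as f:               # read the details back
    frames = f.getnframes()
    rate = f.getframerate()

print("Samples stored:", frames, "at", rate, "Hz")
print("Audio data:", data_bytes, "bytes")
print("Header:", total_bytes - data_bytes, "bytes")
```

The audio data matches the formula, and the few extra bytes are the header that tells a music player the sample rate, bit depth and number of channels.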

Try it yourself

  1. Change the note. Set frequency = 880 and listen — it should sound exactly one octave higher (double the frequency).
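  2. Lower the sample rate. Set sample_rate = 8000 and compare. The beep still plays, because 440 Hz is far below the 4,000 Hz that an 8,000 Hz rate can capture, and the file is much smaller. Real music recorded at this rate would lose its high detail.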
  3. Do the maths. Calculate the size of a 3-minute stereo CD-quality song: 44100 × 2 × 2 × 180 bytes. Convert to megabytes by dividing by 1,000,000.

Challenge: Make a short two-note tune. Generate half a second at 440 Hz, then half a second at 523 Hz (the note C), join the two sample lists together, and write them to one WAV file. Then experiment: what happens to the sound if you add two sine waves of different frequencies into each sample instead of playing them one after another? Hint: scale each wave by 15000 instead of 30000 so the total stays within the 16-bit range, otherwise struct.pack will raise an error. You have just discovered how chords are built.

Quick quiz

What does a computer measure to record a sound?

What is the sample rate?

How many samples per second does CD-quality audio use?

What does a higher bit depth give you?

What happens to file size if you raise the sample rate?

FAQ

Why does CD audio use 44,100 samples per second?

Human ears hear up to about 20,000 Hz. A rule called the Nyquist theorem says you must sample at more than twice the highest frequency you want to capture, so you need over 40,000 samples per second. 44,100 was chosen because it sat just above that limit and fitted neatly with the video equipment used to master early CDs. It captures everything a person can hear.

Why is an MP3 file so much smaller?

An MP3 uses compression. It cleverly throws away parts of the sound that human ears are unlikely to notice (for example, a quiet noise hidden right after a loud one) and stores what remains more efficiently. The file shrinks dramatically with little obvious loss in quality. See the lesson on how data is compressed for the general idea.