Does compressing a file ever make it bigger?

It can. Compression relies on patterns and repetition. Data that is already random or already compressed (like a JPEG inside a ZIP) has almost no patterns left, so the file may shrink only slightly or even grow a little because of the extra bookkeeping the format adds.

Why does saving a JPEG over and over reduce quality?

JPEG is lossy, so each save throws away a bit more detail. Re-saving a JPEG decompresses the already-damaged image and compresses it again, adding new losses on top of the old ones — a build-up sometimes called generation loss.

When should I use lossless instead of lossy?

Use lossless (PNG, ZIP, FLAC) when every bit matters — text, code, spreadsheets, or images with sharp edges and text. Use lossy (JPEG, MP3, MP4) for photos, music, and video where small, unnoticeable losses are an acceptable trade for much smaller files.

How Data Is Compressed

Why we squeeze data

Every file on a computer is stored as bits — 1s and 0s. A photo, a song, or a document might be made of millions of them. Compression is the art of storing the same information using fewer bits, so files take less space and travel faster across a network.

Compression is everywhere: ZIP archives, the JPEG photos on your phone, the MP3 and streaming audio you listen to, and the video you watch online. Without it, a single high-quality movie could fill an entire hard drive.
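To get a feel for the scale involved, here is a rough back-of-the-envelope calculation. The resolution, colour depth, frame rate, and running time are all assumed for illustration; they are not figures from this article.

```python
# Assumed figures for illustration only.
WIDTH, HEIGHT = 1920, 1080      # pixels per frame (1080p)
BYTES_PER_PIXEL = 3             # one byte each for red, green, and blue
FRAMES_PER_SECOND = 24
DURATION_SECONDS = 2 * 60 * 60  # a two-hour film


def uncompressed_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size of raw video when every pixel of every frame is stored in full."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * seconds


total = uncompressed_video_bytes(
    WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND, DURATION_SECONDS
)
print(f"{total / 1e12:.2f} TB")  # prints 1.07 TB
```

Under these assumptions the picture alone comes to about a terabyte, before any sound is added.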

There are two big families of compression, and the difference between them matters a lot.

Lossless: get it all back

Lossless compression shrinks a file in a way that is completely reversible. When you decompress it, you get back exactly the original data, bit for bit — nothing is lost. This is essential for things where every detail counts: text, program code, spreadsheets, and images with sharp lines.

ZIP files, PNG images, and FLAC audio all use lossless compression. The trick is to find and remove repetition and patterns in the data, then describe them more briefly. Let's see two classic methods.

Run-length encoding

Imagine a row of pixels in a simple image, where W is white and B is black:

WWWWWWWWWWWWBBBWWWWWWWWW

That is 24 letters. Run-length encoding (RLE) replaces each "run" of the same value with a count and the value:

12W 3B 9W

Now we store three short pairs instead of 24 letters. RLE works brilliantly on data with long runs, such as plain backgrounds, scanned documents, and simple icons. It works poorly on noisy data where values keep changing, because then there are no long runs to shorten and the added counts can even make the result longer.
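The row of pixels above can be encoded and decoded with a few lines of Python. This is a minimal sketch of RLE, not the exact scheme any particular file format uses; the function names `rle_encode` and `rle_decode` are made up for this example.

```python
from itertools import groupby


def rle_encode(text):
    """Return a list of (count, symbol) pairs, one per run."""
    return [(len(list(run)), symbol) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original string from (count, symbol) pairs."""
    return "".join(symbol * count for count, symbol in pairs)


row = "W" * 12 + "B" * 3 + "W" * 9   # 12 white, 3 black, 9 white
pairs = rle_encode(row)
print(" ".join(f"{count}{symbol}" for count, symbol in pairs))  # prints 12W 3B 9W
assert rle_decode(pairs) == row      # lossless: the round trip is exact
```

Calling `rle_encode("WBWBWB")` returns six pairs for six letters, which shows why data without runs gains nothing from this method.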

Huffman coding

Normally every character uses the same number of bits — for example 8 bits each. But in real text, some symbols appear far more often than others. Huffman coding takes advantage of this by giving common symbols short codes and rare symbols long codes.

Suppose a message uses only four letters with these frequencies:

A: very common   →  code 0
B: common        →  code 10
C: rare          →  code 110
D: rare          →  code 111

The letter A now takes just 1 bit instead of 8. Because A appears so often, the average number of bits per letter drops well below 8, and the whole message gets smaller. Decompression still works perfectly because no code is the start of another code, so the decoder always knows where one symbol ends. Real ZIP tools combine ideas like this with pattern-matching to do even better.
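As a sketch, the code table above can be used to encode and decode a message. The message itself is a hypothetical example chosen to be mostly A. Note that this only applies a ready-made table; building the table from measured frequencies, which is the heart of Huffman's method, is not shown here.

```python
# Code table copied from the example in the text.
CODES = {"A": "0", "B": "10", "C": "110", "D": "111"}


def encode(message):
    """Join the code of each letter into one string of bits."""
    return "".join(CODES[letter] for letter in message)


def decode(bits):
    """Read bits left to right, emitting a letter whenever a full code is seen."""
    lookup = {code: letter for letter, code in CODES.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # safe because no code is the start of another
            letters.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(letters)


message = "AAAABAACAABD"  # hypothetical message: 8 As, 2 Bs, 1 C, 1 D
bits = encode(message)
assert decode(bits) == message
print(len(bits), "bits instead of", 8 * len(message))  # prints 18 bits instead of 96
```

Here the average works out to 1.5 bits per letter. A message made mostly of C and D would do far worse, which is why the short codes must go to the common letters.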

Lossy: throw away what you won't miss

Lossy compression takes a bolder approach: it permanently discards some data to make files much smaller. The cleverness is in choosing data your senses are unlikely to miss. You cannot get the original back exactly — but if it is done well, you cannot tell.
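One simple way to see what "permanently discards" means is to round numbers to a coarser scale. The sketch below is a toy, not the actual JPEG or MP3 method, and the brightness values and step size are assumed for illustration. Real JPEG applies a comparable rounding step to frequency data rather than to raw pixel values.

```python
def quantize(samples, step):
    """Lossy step: keep only which multiple of `step` each value is nearest to."""
    return [round(value / step) for value in samples]


def dequantize(levels, step):
    """Best-effort reconstruction; the rounded-off remainder cannot be recovered."""
    return [level * step for level in levels]


original = [52, 54, 61, 66, 70, 61, 64, 73]  # hypothetical brightness values
levels = quantize(original, step=10)         # small numbers, cheaper to store
restored = dequantize(levels, step=10)

print(levels)    # prints [5, 5, 6, 7, 7, 6, 6, 7]
print(restored)  # prints [50, 50, 60, 70, 70, 60, 60, 70]
```

The restored values are close to the originals but not equal to them, and nothing stored in `levels` says what the differences were. A larger step saves more space and loses more detail.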

JPEG (photos). Human eyes are very sensitive to brightness but much less sensitive to fine colour detail and tiny changes between neighbouring pixels. JPEG keeps the important structure of an image and throws away subtle detail we are unlikely to notice. This can shrink a photo to a small fraction of its original size. Push it too far, though, and you start to see blocky artefacts — that is the lost data showing through.

MP3 and streaming audio. These use a model of human hearing. If a loud sound and a much quieter sound of similar pitch happen at the same moment, you cannot hear the quiet one: it is masked. Lossy audio compression removes sounds you would not have heard anyway, plus frequencies that are too high or too low to matter, which typically cuts a track to a fraction of its original size.

Video (MP4, etc.) goes further still by noticing that most of one frame looks almost identical to the frame before it, so it only stores what changed.
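The idea of storing only what changed can be sketched with two tiny "frames". This is a simplified illustration with made-up pixel values and hypothetical function names; real video codecs are far more elaborate, and this difference step on its own loses nothing.

```python
def frame_delta(previous, current):
    """Record only the positions whose value changed since the previous frame."""
    return {
        index: new
        for index, (old, new) in enumerate(zip(previous, current))
        if old != new
    }


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for index, value in delta.items():
        frame[index] = value
    return frame


frame1 = [10, 10, 10, 80, 80, 10, 10, 10]  # hypothetical row of pixels
frame2 = [10, 10, 10, 10, 80, 80, 10, 10]  # the bright patch moved one step right
delta = frame_delta(frame1, frame2)
print(delta)                               # prints {3: 10, 5: 80}
assert apply_delta(frame1, delta) == frame2
```

Only two of the eight values had to be stored for the second frame. In a scene where little moves, the saving is much larger.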

Lossless vs lossy: choosing well

              Lossless              Lossy
Reversible?   Yes, exact            No, data is discarded
Typical use   Text, code, PNG, ZIP  Photos, music, video
Size saving   Modest                Often huge
Risk          None                  Quality loss if overdone

The rule of thumb: use lossless when every bit matters, and lossy when a small, unnoticeable loss is worth a much smaller file.

A limit worth knowing

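Compression depends on patterns. Data that is already random, or already compressed, has almost no patterns left to exploit. That is why zipping a folder of JPEGs barely shrinks them. It is also why no lossless algorithm can shrink every possible file: there are fewer possible short files than long ones, so a reversible scheme cannot give every long file its own shorter version, and if some files get smaller, others must get bigger. Compression trades away predictability, and you can only do that once.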
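You can watch this limit in action with `zlib`, the compression module in Python's standard library, which uses the same DEFLATE method found in most ZIP files. The sketch below compresses one very repetitive input and one random input, then compresses each result a second time. The data sizes are arbitrary choices for the demonstration.

```python
import os
import zlib

repetitive = b"ABCD" * 25_000      # 100,000 bytes of an obvious pattern
random_data = os.urandom(100_000)  # 100,000 bytes with no pattern to find

for label, data in [("repetitive", repetitive), ("random", random_data)]:
    once = zlib.compress(data)
    twice = zlib.compress(once)    # compressing the already-compressed output
    print(f"{label}: {len(data)} -> {len(once)} -> {len(twice)} bytes")
```

On a typical run the repetitive data shrinks to a tiny fraction of its size, while the random data comes out slightly larger than it went in because of the format's bookkeeping. The second pass gains little or nothing in either case. Exact numbers vary between runs and zlib versions.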

Try this activity

Be a compressor. Take a short string with lots of repetition, such as AAAAABBBBBBBBCCAAAA, and write its run-length encoding by hand. Count the characters before and after. Then try a string with no repeats, like ABCDEFG, and explain why RLE makes it longer. Finally, list three files on your device that are probably lossy (photos, songs, videos) and three that must be lossless (a document, your code, a spreadsheet).

To understand the 1s and 0s being compressed, see How Images and Sound Are Stored as Data, and for the patterns algorithms search for, Lists and Arrays.

Key takeaways