Working with CSV Data in Python
Read and write spreadsheet-style data in Python with the csv module: parse rows, use DictReader and DictWriter, total a column, filter records and save results. Runnable code, a complete data program and a quiz.
Key takeaways
- A CSV file stores a table as plain text, with commas separating columns and newlines separating rows
- import csv and csv.reader(file) gives each row as a list of strings
- csv.DictReader gives each row as a dictionary keyed by the header names, which is far easier to read
- Values from a CSV are always strings, so convert numbers with int() or float() before doing maths
- csv.writer / csv.DictWriter save data back out; use newline="" when opening files for csv
Data in rows and columns
Spreadsheets are everywhere โ class grades, sports stats, science measurements, sales figures. Behind a huge number of them sits one of the simplest, most useful file formats ever invented: CSV, short for Comma-Separated Values. In this lesson you'll learn to read CSV data into Python, analyse it, filter it, and write results back out. This is the doorway to real data work.
You'll get the most from this if you already know about reading and writing files in Python and feel comfortable with dictionaries. Let's open some data.
What a CSV file actually is
A CSV file is just plain text. Each line is a row; within a row, commas separate the columns. Usually the first row holds the headers (the column names). Here's a tiny file, scores.csv:
name,subject,score
Amara,Maths,88
Ben,Maths,72
Chloe,Science,95
Dev,Science,61
You could open that in Excel and see a neat table, or in a text editor and see exactly the characters above. That dual nature โ readable by humans and by every program โ is why CSV is so popular.
Reading with csv.reader
Python has a built-in csv module for this. The simplest tool, csv.reader, hands you each row as a list of strings:
import csv
with open("scores.csv", newline="") as f:
reader = csv.reader(f)
for row in reader:
print(row)
This prints lists like ['name', 'subject', 'score'], then ['Amara', 'Maths', '88'], and so on. Two details matter:
- We open the file inside a
withblock (so it closes itself) and addnewline="", which the csv module needs to handle line endings correctly across operating systems. - Every value is a string, even
'88'. You can't add strings as numbers, so we'll convert when needed.
To skip the header row and total the scores, we'd index into each list with row[2] and convert:
import csv
total = 0
with open("scores.csv", newline="") as f:
reader = csv.reader(f)
next(reader) # skip the header row
for row in reader:
total += int(row[2]) # row[2] is the score, as text
print("Total of all scores:", total)
next(reader) reads and discards the first row. int(row[2]) turns the score text into a number so we can add it.
A nicer way: csv.DictReader
Remembering that the score is row[2] is fragile โ add a column and everything shifts. csv.DictReader fixes this by using the header row as keys, so each row becomes a dictionary:
import csv
with open("scores.csv", newline="") as f:
reader = csv.DictReader(f)
for row in reader:
print(row["name"], "got", row["score"])
Now row["score"] always means the score, no matter the column order. This is much clearer and is the approach you'll usually want. (DictReader skips the header automatically, since it turns it into the keys.)
Doing real analysis
With DictReader, calculations read almost like English. Let's find the average score and the top student:
import csv
names = []
scores = []
with open("scores.csv", newline="") as f:
reader = csv.DictReader(f)
for row in reader:
names.append(row["name"])
scores.append(int(row["score"])) # convert to number
average = sum(scores) / len(scores)
print(f"Average score: {average:.1f}")
best_index = scores.index(max(scores))
print("Top student:", names[best_index])
We collect names and (converted) scores into two lists. sum(scores) / len(scores) gives the mean, formatted to one decimal place. max(scores) finds the highest value, and scores.index(...) tells us where it is so we can look up the matching name.
Writing CSV back out
After processing data you often want to save the result. csv.writer writes lists of values as rows:
import csv
rows = [
["name", "score", "result"],
["Amara", 88, "Pass"],
["Ben", 72, "Pass"],
["Dev", 61, "Pass"],
]
with open("results.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(rows)
writerows writes every list as one CSV line. Note the "w" mode (write) and again newline="". There's also csv.DictWriter, which writes dictionaries โ you give it the column names up front:
import csv
people = [
{"name": "Amara", "score": 88},
{"name": "Chloe", "score": 95},
]
with open("out.csv", "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=["name", "score"])
writer.writeheader() # writes name,score
writer.writerows(people)
writeheader() writes the column titles, then writerows writes each dict, matching its keys to the fieldnames.
Worked example: pass/fail report
Let's build a complete program that reads scores.csv, decides who passed (60 or more), prints a summary, and writes a new CSV with a result column.
import csv
PASS_MARK = 60
def load_scores(path):
"""Return a list of {name, subject, score(int)} dicts."""
records = []
with open(path, newline="") as f:
reader = csv.DictReader(f)
for row in reader:
records.append({
"name": row["name"],
"subject": row["subject"],
"score": int(row["score"]),
})
return records
def add_results(records):
"""Add a 'result' key of Pass or Fail to each record."""
for r in records:
r["result"] = "Pass" if r["score"] >= PASS_MARK else "Fail"
return records
def save_report(records, path):
with open(path, "w", newline="") as f:
writer = csv.DictWriter(
f, fieldnames=["name", "subject", "score", "result"]
)
writer.writeheader()
writer.writerows(records)
# Run the pipeline
data = load_scores("scores.csv")
data = add_results(data)
passed = sum(1 for r in data if r["result"] == "Pass")
print(f"{passed} of {len(data)} students passed.")
save_report(data, "report.csv")
print("Saved report.csv")
Walking through it:
load_scoresreads the file withDictReaderand converts each score to anintas it builds a clean list of dictionaries โ a tidy in-memory copy of the table.add_resultsloops through and tags each record"Pass"or"Fail"using a conditional expression ("Pass" if ... else "Fail").save_reportusesDictWriterto write the enriched data โ now with four columns โ back to a new file.- The middle section counts the passes with a generator expression (
sum(1 for r in data if ...)) and prints a summary before saving.
Each function does one stage of the work: load, transform, save. That clean separation is exactly how professional data pipelines are structured.
Try it yourself
Extend the report program:
- Add a function
subject_average(records, subject)that returns the average score for one subject only (e.g. just"Maths"). Filter the records, convert nothing extra (scores are already ints), and divide the sum by the count. - Sort the students from highest to lowest score before saving. Hint:
data.sort(key=lambda r: r["score"], reverse=True). - Add a
gradecolumn: 90+ is "A", 70+ is "B", 60+ is "C", otherwise "F". Use if / elif / else inside a loop. - Challenge: handle a missing file by catching
FileNotFoundErrorinload_scoresand returning an empty list with a friendly message.
CSV is your gateway to data analysis. When your files grow large and your calculations get heavier, the next step is dedicated libraries โ see using Python libraries (modules) to discover tools built for exactly this kind of work.
Quick quiz
Test yourself and earn XP
What does CSV stand for?
CSV means Comma-Separated Values: a text format where commas separate the fields in each row.
What does csv.DictReader give you for each row?
DictReader uses the first row as keys, so each later row becomes a dict like {'name': 'Sam', 'score': '90'}.
If a CSV cell holds 90, what type is it when you read it?
Everything read from a CSV is text. You must call int() or float() to do arithmetic on it.
Why open a CSV file with newline="" for writing?
Without newline="", the csv writer can produce a blank line between every row on some systems.
Which class writes dictionaries back out as CSV rows?
csv.DictWriter takes a fieldnames list and writes dicts as rows, matching keys to columns.
FAQ
It is tempting, but splitting on commas breaks the moment a value itself contains a comma, like "Smith, John" or an address. Real CSV files wrap such values in quotes, and the csv module understands those quoting rules correctly. Using import csv saves you from a whole class of subtle bugs, so prefer it over line.split(',').
Yes. Spreadsheet programs can save (export) a sheet as a .csv file, which is plain text. Python's csv module reads that directly. Going the other way, a CSV your program writes will open straight back up in Excel or Sheets as a table. CSV is the universal bridge between code and spreadsheets.
Keep exploring
More in Coding