File Handling in Python
Definition: File handling refers to the ability to read from and write to files on the file system. In Data Science, you constantly work with files such as CSVs, text files, JSON, configuration files, and log files, so understanding file handling in Python is essential.
---
Why File Handling Matters in Data Science
- Loading datasets from CSV, JSON, or text files.
- Saving model outputs, predictions, and reports.
- Logging experiment results.
- Reading configuration files for pipelines.
---
Opening a File
Syntax: file = open("filename", "mode")
File Modes:
| Mode | Description | Creates File? | Overwrites? |
|---|---|---|---|
| "r" | Read (default) | ❌ No (error if not found) | ❌ No |
| "w" | Write | ✅ Yes | ✅ Yes (erases existing) |
| "a" | Append | ✅ Yes | ❌ No (adds to end) |
| "x" | Create (exclusive) | ✅ Yes (error if exists) | ❌ No |
| "r+" | Read + Write | ❌ No (error if not found) | ❌ No (does not erase existing) |
| "b" | Binary mode (combine with the above, e.g. "rb", "wb") | N/A | N/A |
---
Reading Files
Method 1: `read()` reads the entire file as a single string.
```python
file = open("data.txt", "r")
content = file.read()
print(content)
file.close()
```
Method 2: `readline()` reads one line at a time.
```python
file = open("data.txt", "r")
line1 = file.readline()
line2 = file.readline()
file.close()
```
Method 3: `readlines()` reads all lines into a list.
```python
file = open("data.txt", "r")
lines = file.readlines()  # ["line1\n", "line2\n", ...]
file.close()
```
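To see how the three reading methods differ, the sketch below writes a small sample file (the name `sample.txt` and its three lines are made up for the demo) and reads it back each way. It also shows one extra option not listed above: looping over the file object directly, which yields one line at a time and avoids loading a large file into memory at once.

```python
import os
import tempfile

# Hypothetical sample file created just for this demo.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# read(): the whole file as one string.
with open(path, "r") as f:
    whole = f.read()

# readline(): one line per call, newline included; "" signals end of file.
with open(path, "r") as f:
    first = f.readline()
    second = f.readline()
    third = f.readline()
    at_eof = f.readline()

# readlines(): every line in a list, newlines included.
with open(path, "r") as f:
    lines = f.readlines()

# Iterating over the file object yields lines lazily, one at a time.
with open(path, "r") as f:
    stripped = [line.rstrip("\n") for line in f]

print(repr(whole))
print(repr(first), repr(second), repr(third), repr(at_eof))
print(lines)
print(stripped)
```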
---
Writing to Files
```python
# Write mode (overwrites file)
file = open("output.txt", "w")
file.write("Hello, World!\n")
file.write("Data Science is fun!")
file.close()

# Append mode (adds to end)
file = open("output.txt", "a")
file.write("\nNew line appended!")
file.close()
```
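A detail worth checking in the writing example is that `write()` never adds a newline on its own, which is why the example writes `"\n"` explicitly; it also returns the number of characters written. The sketch below repeats the same steps inside a temporary directory (an assumption made so the demo does not overwrite a real `output.txt`) and reads back the final content.

```python
import os
import tempfile

# Same steps as the writing example, but in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(path, "w") as file:
    n1 = file.write("Hello, World!\n")  # returns the number of characters written
    file.write("Data Science is fun!")  # no newline is added automatically

with open(path, "a") as file:
    file.write("\nNew line appended!")

with open(path, "r") as file:
    final_text = file.read()

print(n1)  # 14
print(final_text)
```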
---
The with Statement (Best Practice)
The `with` statement automatically closes the file when the block is exited, even if an error occurs. Always use `with` for file handling.
```python
with open("data.txt", "r") as file:
    content = file.read()
    print(content)
# File is automatically closed here
```
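The claim that `with` closes the file "even if an error occurs" can be verified through the file object's `closed` attribute. In this sketch the file name and the `ValueError` are stand-ins chosen for the demo; the exception simulates a failure while the file is open.

```python
import os
import tempfile

# Hypothetical data file created for the demo.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("some data\n")

# Normal exit: the file is closed as soon as the block ends.
with open(path, "r") as file:
    content = file.read()
closed_after_normal_exit = file.closed

# Exceptional exit: the file is closed before the exception propagates.
try:
    with open(path, "r") as file:
        raise ValueError("simulated failure while processing")
except ValueError:
    pass
closed_after_error = file.closed

print(closed_after_normal_exit, closed_after_error)  # True True
```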
---
Reading & Writing Comparison
| Operation | Method | Description |
|---|---|---|
| Read entire file | file.read() | Returns one big string |
| Read one line | file.readline() | Returns the next line, including its trailing newline (`""` at end of file) |
| Read all lines | file.readlines() | Returns list of lines |
| Write string | file.write(str) | Writes a string |
| Write list | file.writelines(list) | Writes each string in the list as-is (no newlines are added) |
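A common surprise with `writelines()` is that it does not insert line breaks between items. The sketch below, using a made-up list of strings and a temporary file, writes the list once as-is and once with explicit newlines, then reads both results back.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
rows = ["alpha", "beta", "gamma"]  # hypothetical data

# writelines() writes each string exactly as given.
with open(path, "w") as f:
    f.writelines(rows)
with open(path, "r") as f:
    glued = f.read()

# Add the separators yourself to get one item per line.
with open(path, "w") as f:
    f.writelines(row + "\n" for row in rows)
with open(path, "r") as f:
    one_per_line = f.readlines()

print(repr(glued))   # 'alphabetagamma'
print(one_per_line)  # ['alpha\n', 'beta\n', 'gamma\n']
```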
---
Working with CSV Files
CSV (Comma-Separated Values) is the most common data format in Data Science.
Using the csv module:
```python
import csv

# Reading CSV
# newline="" lets the csv module handle line endings itself (recommended by the csv docs)
with open("data.csv", "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Each row is a list of strings

# Writing CSV
with open("output.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age", "City"])
    writer.writerow(["Rahul", 21, "Delhi"])
```
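Writing a CSV and reading it back with the `csv` module shows that every field comes back as a string, so the age written as the number `21` returns as `"21"`. This sketch reuses the rows from the example above and assumes a temporary directory in place of the working directory.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Write the same rows as the example above.
with open(path, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age", "City"])
    writer.writerow(["Rahul", 21, "Delhi"])

# Read them back; csv.reader yields every field as a string.
with open(path, "r", newline="") as file:
    rows = list(csv.reader(file))

header, first = rows[0], rows[1]
age = int(first[1])  # convert explicitly when a number is needed

print(rows)  # [['Name', 'Age', 'City'], ['Rahul', '21', 'Delhi']]
```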
Using Pandas (Preferred in Data Science):
```python
import pandas as pd

df = pd.read_csv("data.csv")  # Read
df.to_csv("output.csv", index=False)  # Write
```
---
Working with JSON Files
JSON (JavaScript Object Notation) is commonly used in APIs and web data.
```python
import json

# Reading JSON
with open("data.json", "r") as file:
    data = json.load(file)  # Returns dict or list

# Writing JSON
with open("output.json", "w") as file:
    json.dump(data, file, indent=4)
```
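A `json.dump` followed by `json.load` does not always return an identical object, because JSON has fewer types than Python. In the sketch below, the `data` dict is hypothetical and the file lives in a temporary directory: the tuple comes back as a list and the integer key `1` comes back as the string `"1"`.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.json")

# Hypothetical record used only for this demo.
data = {"name": "Rahul", "age": 21, "scores": (90, 85), 1: "one"}

with open(path, "w") as file:
    json.dump(data, file, indent=4)

with open(path, "r") as file:
    loaded = json.load(file)

# Tuples become lists and non-string keys become strings.
print(loaded)  # {'name': 'Rahul', 'age': 21, 'scores': [90, 85], '1': 'one'}
```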
---
File Handling Summary Table
| Format | Module | Read | Write |
|---|---|---|---|
| Text (.txt) | Built-in | open().read() | open().write() |
| CSV (.csv) | csv / pandas | csv.reader() / pd.read_csv() | csv.writer() / df.to_csv() |
| JSON (.json) | json | json.load() | json.dump() |
| Excel (.xlsx) | pandas / openpyxl | pd.read_excel() | df.to_excel() |
| Pickle (.pkl) | pickle | pickle.load() | pickle.dump() |
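The table lists pickle without an example. A minimal sketch, assuming a hypothetical `results` dict and a temporary file path, shows a round trip with `pickle.dump` and `pickle.load`. Pickle files are binary, so they are opened with `"wb"` and `"rb"`, and unlike JSON the tuple and set survive unchanged. The Python documentation warns that unpickling untrusted data can execute arbitrary code, so only load pickle files you trust.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "model_outputs.pkl")

# Hypothetical object: pickle can store most Python objects, including tuples and sets.
results = {"predictions": [0.91, 0.12, 0.77], "classes": ("cat", "dog"), "seen_ids": {3, 7}}

# Pickle files are binary, so use "wb" / "rb".
with open(path, "wb") as file:
    pickle.dump(results, file)

with open(path, "rb") as file:
    restored = pickle.load(file)  # only unpickle files from a trusted source

print(restored == results)  # True
```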
Summary
- File handling allows reading from and writing to files on disk.
- Always use `with open()` to ensure files are properly closed.
- File modes (`r`, `w`, `a`, `x`) determine the operation and behavior.
- CSV and JSON are the most common formats in Data Science.
- Pandas provides the simplest interface for reading/writing tabular data.