Image Processing for NABirds Dataset
Table of Contents
- Introduction
  - Overview of the NABirds dataset
  - Objectives of the notebook
- Read and Combine
  - Loading dataset components
  - Merging bounding box, class labels, train/test splits, and image paths
  - Displaying the merged dataset
- Build Uniform Size and Cropping
  - Identifying the largest bounding box
  - Cropping and resizing images to a uniform dimension
  - Saving preprocessed images
- Transition to the Training Notebook
  - Summary of image processing
  - Rationale for using a separate training notebook
  - Next steps in the workflow
Introduction
This notebook focuses on the image processing pipeline for the NABirds dataset, preparing the data for machine learning model training. The NABirds dataset contains rich information about North American bird species, including image paths, bounding boxes, and class labels. This notebook covers the following steps:
Data Preparation:
- Loading and merging multiple dataset files such as bounding boxes, class labels, train/test splits, and image paths.
- Creating a unified dataset containing all necessary metadata for each image.
Image Processing:
- Identifying the largest bounding box dimensions in the dataset.
- Cropping each image to a square around its bounding box and resizing it to a consistent size of 224x224 pixels.
- Saving the processed images into a new directory for further use.
The processed images will serve as the input for training a machine learning model to classify bird species. Running these steps locally means that only the cropped 224x224 images, rather than the full-resolution originals, need to be transferred to the training environment.
Objectives of This Notebook
- To preprocess images by cropping and resizing them uniformly.
- To prepare a clean, structured dataset for efficient machine learning model training.
- To save preprocessed images into a separate directory for seamless integration with the training pipeline.
The output of this notebook will be used in a separate notebook for training and predicting bird species using the preprocessed dataset.
import os
import pandas as pd
import time
import cv2
import shutil
from tqdm import tqdm
Declare the initial data path, where the NABirds dataset was downloaded and extracted
data_path = r"C:\Users\kevin\Desktop\SEIS764_AI\FinalProject\nabirds"
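Before loading anything, it can help to confirm that the extracted folder actually contains the files this notebook reads. The following is a minimal sketch and not part of the original notebook: `REQUIRED_FILES` and `find_missing_files` are hypothetical names, and the file list is simply taken from the loading code in this notebook.

```python
from pathlib import Path

# Metadata files and image folder that this notebook reads from data_path.
# (Assumption: this list mirrors the file names used in the loading cells.)
REQUIRED_FILES = [
    "bounding_boxes.txt",
    "image_class_labels.txt",
    "train_test_split.txt",
    "classes.txt",
    "sizes.txt",
    "images.txt",
    "images",
]


def find_missing_files(data_dir, required=REQUIRED_FILES):
    """Return the names in `required` that do not exist under `data_dir`."""
    root = Path(data_dir)
    return [name for name in required if not (root / name).exists()]
```

Calling `find_missing_files(data_path)` should return an empty list when the dataset was extracted completely; a non-empty list names what to re-download or re-extract before running the remaining cells.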
start_time = time.time()
# Load each NABirds metadata file into its own DataFrame, then merge them

# Load bounding box information
file_path = os.path.join(data_path, "bounding_boxes.txt")
with open(file_path, 'r') as file:
    rows = [line.split() for line in file if line.strip()]
bounding_boxes = pd.DataFrame(
    [{'UUID': r[0], 'bb_x': int(r[1]), 'bb_y': int(r[2]),
      'bb_width': int(r[3]), 'bb_height': int(r[4])} for r in rows]
)

# Load class labels
file_path = os.path.join(data_path, "image_class_labels.txt")
with open(file_path, 'r') as file:
    rows = [line.split() for line in file if line.strip()]
class_labels = pd.DataFrame(
    [{'UUID': r[0], 'class': int(r[1])} for r in rows]
)

# Load train/test split
file_path = os.path.join(data_path, "train_test_split.txt")
with open(file_path, 'r') as file:
    rows = [line.split() for line in file if line.strip()]
train_test_split = pd.DataFrame(
    [{'UUID': r[0], 'is_training_image': int(r[1])} for r in rows]
)

# Load class descriptions (split only once: descriptions may contain spaces)
file_path = os.path.join(data_path, "classes.txt")
with open(file_path, 'r') as file:
    rows = [line.strip().split(" ", 1) for line in file if line.strip()]
class_descriptions = pd.DataFrame(
    [{'class': int(r[0]), 'description': r[1]} for r in rows]
)

# Load image sizes
file_path = os.path.join(data_path, "sizes.txt")
with open(file_path, 'r') as file:
    rows = [line.split() for line in file if line.strip()]
image_sizes = pd.DataFrame(
    [{'UUID': r[0], 'im_width': int(r[1]), 'im_height': int(r[2])} for r in rows]
)

# Load image paths
file_path = os.path.join(data_path, "images.txt")
with open(file_path, 'r') as file:
    rows = [line.split() for line in file if line.strip()]
image_paths = pd.DataFrame(
    [{'UUID': r[0], 'path': r[1]} for r in rows]
)

# Merge dataframes (inner joins on UUID, plus class descriptions on class)
merged_data = (bounding_boxes
               .merge(class_labels, on="UUID")
               .merge(train_test_split, on="UUID")
               .merge(class_descriptions, on="class")
               .merge(image_sizes, on="UUID")
               .merge(image_paths, on="UUID"))
merged_data['class'] = merged_data['class'].astype(str)

# Display the first rows of the merged dataset and the number of rows
print("Merged DataFrame Head:")
print(merged_data.head(2))
print(f"Number of rows in the dataset: {len(merged_data)}")

end_time = time.time()
time_elapsed = round((end_time - start_time), 2)
print(f"\n\n\n Time to run this chunk of code: {time_elapsed} seconds")
Merged DataFrame Head:
                                   UUID  bb_x  bb_y  bb_width  bb_height  \
0  0000139e-21dc-4d0c-bfe1-4cae3c85c829    83    59       128        228
1  0000d9fc-4e02-4c06-a0af-a55cfb16b12b   328    88       163        298

  class  is_training_image   description  im_width  im_height  \
0   817                  0  Oak Titmouse       296        341
1   860                  0      Ovenbird       640        427

                                        path
0  0817/0000139e21dc4d0cbfe14cae3c85c829.jpg
1  0860/0000d9fc4e024c06a0afa55cfb16b12b.jpg
Number of rows in the dataset: 48562

 Time to run this chunk of code: 0.98 seconds
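The merges above use pandas' default inner join, so an image missing from any one file would be dropped without warning. The sketch below shows one way such a merge could be checked; it is an illustration rather than the notebook's method, `merge_on_uuid` and `unmatched_uuids` are hypothetical helper names, and the tiny tables are made up rather than real NABirds rows.

```python
import pandas as pd


def merge_on_uuid(left, right):
    """Inner-join two per-image tables on UUID.

    validate="one_to_one" makes pandas raise a MergeError if either
    table contains a duplicated UUID.
    """
    return left.merge(right, on="UUID", how="inner", validate="one_to_one")


def unmatched_uuids(left, right):
    """Return the UUIDs that appear in only one of the two tables."""
    both = left[["UUID"]].merge(
        right[["UUID"]], on="UUID", how="outer", indicator=True
    )
    return both.loc[both["_merge"] != "both", "UUID"].tolist()


# Made-up example tables: "c" has no path and "d" has no bounding box.
boxes = pd.DataFrame({"UUID": ["a", "b", "c"], "bb_width": [10, 20, 30]})
paths = pd.DataFrame({"UUID": ["a", "b", "d"], "path": ["1/a.jpg", "1/b.jpg", "2/d.jpg"]})

merged = merge_on_uuid(boxes, paths)
print(f"Rows kept by the inner join: {len(merged)}")
print(f"UUIDs dropped by the inner join: {unmatched_uuids(boxes, paths)}")
```

Applied to the real tables, an empty result from `unmatched_uuids` for each pair would indicate that no image was dropped on the way to the final row count.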
start_time = time.time()

# Define directories
base_data_dir = data_path
augmented_data_dir = os.path.join(base_data_dir, "processed_images")
image_dir = os.path.join(base_data_dir, "images")
bounding_boxes_file = os.path.join(base_data_dir, "bounding_boxes.txt")
image_paths_file = os.path.join(base_data_dir, "images.txt")

# Load bounding boxes and image paths
# (sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas)
bounding_boxes = pd.read_csv(
    bounding_boxes_file, sep=r"\s+", header=None,
    names=["UUID", "bb_x", "bb_y", "bb_width", "bb_height"]
)
image_paths = pd.read_csv(
    image_paths_file, sep=r"\s+", header=None, names=["UUID", "path"]
)

# Merge bounding boxes and paths
metadata = pd.merge(bounding_boxes, image_paths, on="UUID")

# Step 1: Find the largest bounding box dimensions (reported for reference;
# the crop below uses each bird's own bounding box, not this value)
max_bb_width = metadata["bb_width"].max()
max_bb_height = metadata["bb_height"].max()
largest_box_size = int(max(max_bb_width, max_bb_height))
print(f"Largest box dimensions: Width={max_bb_width}, Height={max_bb_height}, Square={largest_box_size}")

# Step 2: Crop each image to a square centered on the bird, then resize
if os.path.exists(augmented_data_dir):
    shutil.rmtree(augmented_data_dir)  # Clear processed images directory
os.makedirs(augmented_data_dir, exist_ok=True)


def crop_and_save_image(img_path, bb_x, bb_y, bb_width, bb_height, save_path, box_size):
    """Crop the image around the bird's bounding box and resize to uniform dimensions."""
    img = cv2.imread(img_path)
    if img is None:
        print(f"Image not found: {img_path}")
        return False
    img_height, img_width = img.shape[:2]

    # Calculate the bird's center
    bird_center_x = bb_x + bb_width / 2
    bird_center_y = bb_y + bb_height / 2

    # Use the longer side of this bird's bounding box as the side of the
    # square crop (change from Kevin O'Neill)
    max_side = max(bb_width, bb_height)

    # Calculate the box coordinates (centered on bird, clamped to image bounds)
    x_min = max(0, int(bird_center_x - max_side / 2))
    y_min = max(0, int(bird_center_y - max_side / 2))
    x_max = min(img_width, int(bird_center_x + max_side / 2))
    y_max = min(img_height, int(bird_center_y + max_side / 2))

    # Crop the image; skip it if the box lies entirely outside the image,
    # because cv2.resize raises an error on an empty array
    cropped_img = img[y_min:y_max, x_min:x_max]
    if cropped_img.size == 0:
        print(f"Empty crop, skipping: {img_path}")
        return False
    resized_img = cv2.resize(cropped_img, (box_size, box_size))

    # Save the processed image
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    cv2.imwrite(save_path, resized_img)
    return True


# Process all images, resizing every crop to 224x224
for _, row in tqdm(metadata.iterrows(), total=len(metadata)):
    img_path = os.path.join(image_dir, row["path"])
    save_path = os.path.join(augmented_data_dir, row["path"])
    crop_and_save_image(
        img_path, row["bb_x"], row["bb_y"], row["bb_width"], row["bb_height"],
        save_path, 224
    )
print("All images cropped and resized to uniform dimensions.")

end_time = time.time()
time_elapsed = round((end_time - start_time), 2)
print(f"\n\n\n Time to run this chunk of code: {time_elapsed} seconds")
Largest box dimensions: Width=1024, Height=1024, Square=1024
100%|████████████████████████████████████████████████████████████████████████| 48562/48562 [18:18<00:00, 44.22it/s]
All images cropped and resized to uniform dimensions.

 Time to run this chunk of code: 1101.1499044895172 seconds
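To make the crop geometry easier to inspect, the arithmetic inside `crop_and_save_image` can be pulled out into a pure function that needs no image files. This is a sketch for illustration: `square_crop_box` and `crop_size` are hypothetical names, the first example uses the Oak Titmouse row printed in the merged dataset earlier, and the second box is invented to show what happens at an image edge.

```python
def square_crop_box(bb_x, bb_y, bb_width, bb_height, img_width, img_height):
    """Return (x_min, y_min, x_max, y_max) using the same arithmetic as
    crop_and_save_image: a square with the bounding box's longer side,
    centered on the bird and clamped to the image bounds."""
    center_x = bb_x + bb_width / 2
    center_y = bb_y + bb_height / 2
    max_side = max(bb_width, bb_height)
    x_min = max(0, int(center_x - max_side / 2))
    y_min = max(0, int(center_y - max_side / 2))
    x_max = min(img_width, int(center_x + max_side / 2))
    y_max = min(img_height, int(center_y + max_side / 2))
    return x_min, y_min, x_max, y_max


def crop_size(box):
    """Return (width, height) of a crop box."""
    x_min, y_min, x_max, y_max = box
    return x_max - x_min, y_max - y_min


# Oak Titmouse row from the merged dataset: 128x228 box in a 296x341 image.
titmouse = square_crop_box(83, 59, 128, 228, 296, 341)
print(titmouse, crop_size(titmouse))  # (33, 59, 261, 287) (228, 228)

# Invented box touching the left edge: clamping makes the crop non-square.
edge = square_crop_box(0, 0, 100, 300, 400, 300)
print(edge, crop_size(edge))  # (0, 0, 200, 300) (200, 300)
```

When the square fits inside the image, the crop keeps the bird's proportions. When clamping cuts the square short, as in the second example, resizing to 224x224 stretches the crop along the shorter axis; padding the crop back to a square before resizing would be one possible alternative, though the notebook does not do this.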
Transition to the Training Notebook
Process Summary
This notebook completes the preprocessing stage of the NABirds dataset. The following tasks were successfully executed:
- Data Merging:
  - Unified bounding box, class label, train/test split, and image path data.
- Image Preprocessing:
  - Cropped and resized bird images to a uniform dimension of 224x224 pixels.
  - Saved preprocessed images in the processed_images directory for efficient access during training.
Rationale for Moving to a Separate Training Notebook
To streamline the workflow, we are transitioning the training and prediction stages to a separate notebook. The reasons for this modular approach include:
Decoupling Computation:
- Image preprocessing is computationally expensive and time-intensive. By isolating it in this notebook, we ensure that the preprocessing stage is completed once and stored for future use.
Cloud Resource Optimization:
- Training models will be conducted in a Google Colab environment, leveraging its GPU resources. By transferring only the preprocessed images, we avoid unnecessary reprocessing and make efficient use of Colab's resources.
Workflow Modularity:
- Splitting the workflow into distinct notebooks allows for easy debugging, updates, and enhancements to each stage without disrupting other parts of the project.
Next Steps
- Upload Processed Images to Colab:
  - Transfer the processed_images directory to Google Colab.
  - Transfer the bounding_boxes.txt file to your Google Colab runtime.
- Training and Prediction:
  - Build and train a machine learning model for bird species classification.
  - Evaluate the model's performance on test data and make predictions.
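Transferring tens of thousands of small image files one by one tends to be slow, so it may be easier to bundle the processed_images directory into a single archive first. The sketch below uses only the standard library; `archive_processed_images` is a hypothetical helper, and the archive name is an assumption rather than something the training notebook is known to expect.

```python
import shutil
from pathlib import Path


def archive_processed_images(data_dir, archive_base):
    """Zip data_dir/processed_images and return the path of the .zip file.

    `archive_base` is the output path without the ".zip" extension.
    Paths inside the archive start with "processed_images/".
    """
    data_dir = Path(data_dir)
    if not (data_dir / "processed_images").is_dir():
        raise FileNotFoundError(f"No processed_images folder under {data_dir}")
    return shutil.make_archive(
        str(archive_base), "zip",
        root_dir=str(data_dir), base_dir="processed_images",
    )
```

For example, `archive_processed_images(data_path, os.path.join(data_path, "processed_images_upload"))` would write `processed_images_upload.zip` next to the dataset, which can then be uploaded and unpacked in the training environment.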
Keeping preprocessing and training in separate notebooks means each stage can be rerun or modified independently as we move on to training the bird classification model.