Real-Time Object Detection (YOLO)

This project showcases a real-time object detection web application built using Python, Flask, Socket.IO, and OpenCV. Leveraging the YOLO (You Only Look Once) deep learning framework, the application can detect objects within images sent from a client interface in real-time. The backend, developed in Python, processes the images, detects objects, and sends the results back to the client via Socket.IO. The client interface, implemented using Next.js and Firebase for authentication, provides a seamless user experience for interacting with the object detection functionality. This project demonstrates the integration of machine learning models into web applications for real-time processing and interaction.

Usage

1. Backend Python Usage Overview: Real-Time Object Detection
from flask import Flask, jsonify, request
from flask_socketio import SocketIO, emit
import cv2
import numpy as np
# from flask_cors import CORS
import base64
from PIL import Image
import io
import json
app = Flask(__name__)
app.config["SECRET_KEY"] = "secret"
socketio = SocketIO(app)
socketio.init_app(app, cors_allowed_origins="*")
# CORS(app, origins="http://localhost:4200")
# Load YOLO
net = cv2.dnn.readNet("assets/yolov3.weights", "assets/yolov3.cfg")
classes = []
with open("assets/coco.names", "r") as f:
classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
# Function to perform object detection on an image
def detect_objects(image):
# Resize and normalize image
img = cv2.resize(image, None, fx=0.4, fy=0.4)
height, width, channels = img.shape
# Detect objects
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)
# Process detections
class_names_detected = []
confidences = []
boxes = []
for out in outs:
for detection in out:
scores = detection[5:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > 0.5:
# Object detected
center_x = int(detection[0] * width)
center_y = int(detection[1] * height)
w = int(detection[2] * width)
h = int(detection[3] * height)
# Rectangle coordinates
x = int(center_x - w / 2)
y = int(center_y - h / 2)
boxes.append([x, y, w, h])
confidences.append(float(confidence))
if classes[class_id] not in class_names_detected:
class_names_detected.append(classes[class_id])
return boxes, confidences, class_names_detected
# Take in base64 string and return PIL image
def stringToImage(base64_string):
imgdata = base64.b64decode(base64_string)
return Image.open(io.BytesIO(imgdata))
# convert PIL Image to an RGB image( technically a numpy array ) that's compatible with opencv
def toRGB(image):
return cv2.cvtColor(np.array(image), cv2.COLOR_BGR2RGB)
@socketio.on("connect")
def handle_connect():
origin = request.headers.get("Origin")
print("New connection from origin:", origin)
@socketio.on("image")
def handle_image(data):
try:
print("getting data.....")
base64Image = data["img"]
# Decode base64 image
image_decoded = stringToImage(base64Image)
img_colored = toRGB(image_decoded)
# Perform object detection
boxes, confidences, items = detect_objects(img_colored)
# Emit detected objects
emit(
"detected_objects",
{"boxes": boxes, "confidences": confidences, "items": items},
)
except Exception as e:
print("Error:", e)
if __name__ == "__main__":
socketio.run(app, debug=True, host="localhost")
Utilizing Flask and Flask-SocketIO, this Python backend integrates a YOLO (You Only Look Once) deep learning model with OpenCV for real-time object detection. Upon receiving base64-encoded image data via WebSocket connections, the backend decodes, processes, and analyzes the images. Detected objects, complete with bounding box coordinates and confidence scores, are promptly communicated back to the client interface via Socket.IO for instantaneous visualization. Robust error handling ensures smooth operation, while performance optimizations, including image resizing and asynchronous processing, enhance scalability and responsiveness.