Short answer: yes, this is feasible. You can pull the Dahua XVR RTSP streams onto your Orange Pi (6 TOPS NPU), run real-time object detection (cars, people, pets, etc.) and face recognition there, and serve a processed live stream (MJPEG or WebRTC) or alerts. Below is a practical plan plus two code pipelines, based on well-established patterns, that you can run on Ubuntu 22.04 (CPU-first, with optional acceleration notes for the NPU). Deployment and performance tips follow.
Overview — components & approach
- Input (RTSP from Dahua XVR)
The XVR exposes per-channel RTSP URLs. Typical template (replace credentials & channel): rtsp://<user>:<pass>@10.10.0.30:554/cam/realmonitor?channel=<N>&subtype=0
Here subtype=0 is the main stream; subtype=1 selects the lower-resolution sub stream, which is cheaper to decode. We’ll make the pipeline accept the RTSP URL as a parameter.
- Processing computer
Orange Pi (6 TOPS NPU) running Ubuntu 22.04. We’ll start with CPU-only code so it runs immediately; later I explain how to use the NPU or other acceleration (ONNX, vendor SDK) for much better FPS.
- Detection & recognition
- Object Detection: YOLO (small variant), or MobileNet-SSD for very light CPU usage. Detect persons, cars, dogs, cats, etc.
- Face Recognition: use face_recognition (dlib-based embeddings) or a lightweight embedding model (InsightFace/ArcFace/ONNX) to compare faces to a local database.
- Output
- Serve processed frames over HTTP as MJPEG (easy) — viewable in browser or in a small web UI.
- Optionally: send MQTT/REST events when objects/faces are detected, save snapshots, or integrate with a dashboard.
- Privacy & legal — keep in mind local laws around face recognition. Secure access to streams and person database.
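The RTSP URL template from the Input item above can be assembled from its parts rather than edited by hand. This is a minimal sketch: the function name `dahua_rtsp_url` and its parameters are hypothetical, and it assumes your RTSP client decodes percent-encoded credentials, which is worth verifying against your OpenCV/FFmpeg build.

```python
from urllib.parse import quote, urlencode


def dahua_rtsp_url(host, user, password, channel=1, subtype=0, port=554):
    """Build a Dahua-style RTSP URL from its parts (hypothetical helper).

    subtype=0 selects the main stream, subtype=1 the sub stream.
    """
    # Percent-encode credentials so characters like '@' or ':' do not break the URL
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"channel": channel, "subtype": subtype})
    return f"rtsp://{auth}@{host}:{port}/cam/realmonitor?{query}"


if __name__ == "__main__":
    print(dahua_rtsp_url("10.10.0.30", "admin", "p@ss:word", channel=2, subtype=1))
```

The result can be exported as RTSP_URL before starting either script, so passwords with special characters do not need manual escaping.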
Two practical pipelines
I give two pipelines: A — lightweight, easy, lower-CPU (OpenCV DNN MobileNet-SSD + face_recognition), and B — higher-accuracy (YOLOv8/YOLOv5 via Ultralytics + face_recognition). Use A first to verify everything, then migrate to B if you need accuracy and can afford CPU.
I assume you can run commands on the Orange Pi and can install packages. Replace RTSP_URL with the channel URL of your XVR. I won’t repeat the question about RTSP credentials; the code reads the full URL, credentials included, from the RTSP_URL environment variable.
Common prerequisites (Ubuntu 22.04)
Open a terminal on the Orange Pi:
# system packages
sudo apt update
sudo apt install -y python3 python3-venv python3-pip build-essential libatlas-base-dev libopenblas-dev libhdf5-dev
# (optional) improve video capture:
sudo apt install -y ffmpeg v4l-utils
Create a venv:
python3 -m venv ~/cctv-env
source ~/cctv-env/bin/activate
pip install --upgrade pip
PIPELINE A — Lightweight (recommended first test)
- Object detection: MobileNet-SSD (OpenCV DNN) — very light on CPU.
- Face recognition: face_recognition (dlib-based). Note: dlib is usually compiled from source during installation; on ARM this can take a long time, which is normal.
Install Python deps
pip install opencv-python-headless numpy flask pillow imutils face_recognition
If face_recognition fails to build, you may need:
sudo apt install -y libboost-all-dev cmake libopenblas-dev libblas-dev liblapack-dev
and then retry.
Download models
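Place models in ~/cctv/models/:
- MobileNet-SSD prototxt + caffemodel (common filenames): MobileNetSSD_deploy.prototxt (model architecture) and MobileNetSSD_deploy.caffemodel (weights).
- Do not confuse these with deploy.prototxt.txt and res10_300x300_ssd_iter_140000.caffemodel: that pair is OpenCV's face detector, a different network that does not match the 21-class object list used by the script.
(If you don’t have the files, fetch them from the official sources and put them under ~/cctv/models/.)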
Project structure
~/cctv/
  models/
    MobileNetSSD_deploy.prototxt
    MobileNetSSD_deploy.caffemodel
  faces/        # known persons: folder per person with sample images
  app.py        # main script (below)
app.py (lightweight pipeline)
Save this full script as ~/cctv/app.py and run it from ~/cctv so the relative paths resolve:
#!/usr/bin/env python3
# Lightweight real-time CCTV: MobileNet-SSD + face_recognition -> MJPEG stream
import os
import time

import cv2
import face_recognition
import numpy as np
from flask import Flask, Response

# CONFIG
RTSP_URL = os.environ.get("RTSP_URL", "rtsp://user:pass@10.10.0.30:554/cam/realmonitor?channel=1&subtype=0")
MODEL_PROTO = "./models/MobileNetSSD_deploy.prototxt"
MODEL_WEIGHTS = "./models/MobileNetSSD_deploy.caffemodel"
KNOWN_FACES_DIR = "./faces"
CONF_THRESHOLD = 0.5

# MobileNet-SSD classes (the 20 PASCAL VOC classes plus background)
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair",
           "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]

# Load detection model
net = cv2.dnn.readNetFromCaffe(MODEL_PROTO, MODEL_WEIGHTS)

# Load known faces: one encoding per image, labelled with the folder name
known_encodings = []
known_names = []
print("Loading known faces...")
for name in os.listdir(KNOWN_FACES_DIR):
    person_dir = os.path.join(KNOWN_FACES_DIR, name)
    if not os.path.isdir(person_dir):
        continue
    for fname in os.listdir(person_dir):
        path = os.path.join(person_dir, fname)
        img = face_recognition.load_image_file(path)
        encs = face_recognition.face_encodings(img)
        if encs:
            known_encodings.append(encs[0])
            known_names.append(name)
print(f"Loaded {len(known_names)} known faces")

# Video capture (one shared capture, so this serves a single viewer reliably)
cap = cv2.VideoCapture(RTSP_URL)
if not cap.isOpened():
    print("ERROR: cannot open stream. Check RTSP URL.")
    # The script still starts; gen_frames() keeps retrying the connection.

app = Flask(__name__)


def gen_frames():
    global cap
    while True:
        ret, frame = cap.read()
        if not ret:
            # Stream dropped or never opened: reopen the capture and retry
            cap.release()
            time.sleep(0.5)
            cap = cv2.VideoCapture(RTSP_URL)
            continue

        # Resize for speed; keep the original size to scale boxes back
        h, w = frame.shape[:2]
        in_frame = cv2.resize(frame, (300, 300))
        blob = cv2.dnn.blobFromImage(in_frame, 0.007843, (300, 300), 127.5)
        net.setInput(blob)
        detections = net.forward()

        # Run face recognition on a half-size RGB copy.
        # cvtColor returns a contiguous array; a [:, :, ::-1] view can fail in dlib.
        small = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)
        rgb_small = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
        face_locations = face_recognition.face_locations(rgb_small)
        face_encodings = face_recognition.face_encodings(rgb_small, face_locations)

        # Annotate faces
        for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
            # Scale back to full-frame coordinates
            top *= 2; right *= 2; bottom *= 2; left *= 2
            matches = face_recognition.compare_faces(known_encodings, face_encoding, tolerance=0.5)
            name = "Unknown"
            if True in matches:
                name = known_names[matches.index(True)]
            cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
            cv2.putText(frame, name, (left, top - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

        # Annotate object detections
        for i in range(detections.shape[2]):
            conf = float(detections[0, 0, i, 2])
            if conf < CONF_THRESHOLD:
                continue
            idx = int(detections[0, 0, i, 1])
            label = CLASSES[idx] if idx < len(CLASSES) else str(idx)
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            startX, startY, endX, endY = (int(v) for v in box)
            cv2.rectangle(frame, (startX, startY), (endX, endY), (255, 0, 0), 2)
            text = f"{label}: {conf:.2f}"
            cv2.putText(frame, text, (startX, startY - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

        # Encode as JPEG and emit one multipart chunk
        ret2, jpeg = cv2.imencode('.jpg', frame, [int(cv2.IMWRITE_JPEG_QUALITY), 70])
        if not ret2:
            continue
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + jpeg.tobytes() + b'\r\n')


@app.route('/video_feed')
def video_feed():
    return Response(gen_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')


@app.route('/')
def index():
    return "<html><body><h2>Camera Feed</h2><img src='/video_feed' /></body></html>"


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, threaded=True)
Run:
cd ~/cctv
export RTSP_URL="rtsp://admin:YourPassword@10.10.0.30:554/cam/realmonitor?channel=1&subtype=0"
python app.py
Open http://<orangepi-ip>:5000/ in a browser to see the processed MJPEG stream.
PIPELINE B — Higher accuracy (YOLOv8/YOLOv5 + face_recognition)
Use this when you want better detection accuracy. It’s heavier on CPU. If your Orange Pi has an NPU, you can export the model to ONNX and convert it with the vendor toolchain so that inference runs on the NPU (see "Performance tuning & NPU acceleration").
Install (CPU) — simpler with Ultralytics
pip install ultralytics opencv-python-headless flask face_recognition numpy pillow
ultralytics provides YOLOv8 models, and pip install ultralytics will also install PyTorch. If no suitable PyTorch wheel is available for your ARM board, either install a CPU-only PyTorch build for ARM, or export the model to ONNX on another machine and run it with pip install onnxruntime. If PyTorch errors appear, fall back to the ONNX model with onnxruntime.
Example app_yolo.py (YOLO detection + face_recognition)
#!/usr/bin/env python3
# Use ultralytics YOLO (v8) for object detection; face_recognition for faces
import os
import time

import cv2
import face_recognition
from flask import Flask, Response
from ultralytics import YOLO

RTSP_URL = os.environ.get("RTSP_URL", "rtsp://user:pass@10.10.0.30:554/cam/realmonitor?channel=1&subtype=0")
KNOWN_FACES_DIR = "./faces"
MODEL_PATH = os.environ.get("YOLO_MODEL", "yolov8n.pt")  # yolov8n: smallest YOLOv8 model
# COCO class names to keep (COCO uses "motorcycle", not "motorbike")
TARGETS = {"person", "car", "truck", "bus", "dog", "cat", "bicycle", "motorcycle"}

# Load model
print("Loading YOLO model:", MODEL_PATH)
model = YOLO(MODEL_PATH)

# Known faces: one encoding per image, labelled with the folder name
known_encodings = []
known_names = []
for name in os.listdir(KNOWN_FACES_DIR):
    person_dir = os.path.join(KNOWN_FACES_DIR, name)
    if not os.path.isdir(person_dir):
        continue
    for f in os.listdir(person_dir):
        img = face_recognition.load_image_file(os.path.join(person_dir, f))
        encs = face_recognition.face_encodings(img)
        if encs:
            known_encodings.append(encs[0])
            known_names.append(name)
print("Known faces:", known_names)

cap = cv2.VideoCapture(RTSP_URL)
app = Flask(__name__)


def gen_frames():
    global cap
    while True:
        ret, frame = cap.read()
        if not ret:
            # Stream dropped: reopen the capture and retry
            cap.release()
            time.sleep(0.3)
            cap = cv2.VideoCapture(RTSP_URL)
            continue

        # Ultralytics takes OpenCV frames (BGR numpy arrays) directly; no RGB conversion
        results = model.predict(source=frame, imgsz=640, conf=0.35, iou=0.45, verbose=False)
        for r in results:  # one result per input image
            for box in r.boxes:
                x1, y1, x2, y2 = box.xyxy[0].int().tolist()
                conf = float(box.conf[0])
                cls = int(box.cls[0])
                label = model.names.get(cls, str(cls))
                # Filter classes of interest (car, person, dog, cat etc.)
                if label in TARGETS:
                    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
                    cv2.putText(frame, f"{label} {conf:.2f}", (x1, y1 - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

        # Face recognition on a half-size, contiguous RGB copy
        small = cv2.cvtColor(cv2.resize(frame, (0, 0), fx=0.5, fy=0.5), cv2.COLOR_BGR2RGB)
        face_locations = face_recognition.face_locations(small)
        face_encodings = face_recognition.face_encodings(small, face_locations)
        for (top, right, bottom, left), enc in zip(face_locations, face_encodings):
            top *= 2; right *= 2; bottom *= 2; left *= 2
            matches = face_recognition.compare_faces(known_encodings, enc, tolerance=0.5)
            name = "Unknown"
            if True in matches:
                name = known_names[matches.index(True)]
            cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
            cv2.putText(frame, name, (left, top - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

        # Encode as JPEG and emit one multipart chunk
        ret2, jpeg = cv2.imencode('.jpg', frame, [int(cv2.IMWRITE_JPEG_QUALITY), 70])
        if not ret2:
            continue
        yield (b'--frame\r\nContent-Type: image/jpeg\r\n\r\n' + jpeg.tobytes() + b'\r\n')


@app.route('/video_feed')
def video_feed():
    return Response(gen_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')


@app.route('/')
def index():
    return "<html><body><h2>YOLO Camera</h2><img src='/video_feed' /></body></html>"


if __name__ == '__main__':
    app.run(host='0.0.0.0', threaded=True, port=5000)
Run:
export RTSP_URL="rtsp://admin:YourPass@10.10.0.30:554/cam/realmonitor?channel=1&subtype=0"
export YOLO_MODEL="yolov8n.pt"   # or yolov5n.pt etc
python app_yolo.py
Notes:
- If PyTorch installation fails on ARM/Orange Pi, convert yolov8n.pt to ONNX (on an x86 machine), copy the .onnx file to the Orange Pi, and use onnxruntime to run inference (more portable on CPU, and often faster).
- You can set model = YOLO("yolov8n.pt"), which downloads the weights on first use, or download the small model locally beforehand.
Face database & workflow
- Create faces/<PersonName>/ and drop 3–10 face images per person (frontal and various angles).
- The scripts automatically load encodings at start. To add new faces at runtime you can create an API to add images and recompute encodings or reload.
Simple script to add a face (example, add_face.py):
# run with args: python add_face.py "Alice" file1.jpg file2.jpg
import os
import shutil
import sys

if len(sys.argv) < 3:
    sys.exit("usage: python add_face.py NAME IMAGE [IMAGE ...]")
name = sys.argv[1]
os.makedirs(f"faces/{name}", exist_ok=True)
for f in sys.argv[2:]:
    shutil.copy(f, f"faces/{name}/{os.path.basename(f)}")
print("Added")
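Both scripts label a face with the first known encoding that falls within the tolerance, which can pick the wrong person once the database holds several similar faces. A variant that picks the closest encoding is sketched below. The helper name `best_match` is hypothetical. It uses plain Euclidean distance with the same `<=` tolerance test that face_recognition applies, and it is written with the standard library only, so a NumPy version would be faster on large databases.

```python
import math


def best_match(known_encodings, known_names, encoding, tolerance=0.5):
    """Return (name, distance) of the closest known face, or ("Unknown", None)."""
    best_name, best_dist = "Unknown", None
    for known, name in zip(known_encodings, known_names):
        dist = math.dist(known, encoding)  # Euclidean distance between embeddings
        if dist <= tolerance and (best_dist is None or dist < best_dist):
            best_name, best_dist = name, dist
    return best_name, best_dist
```

Inside the annotation loop this could stand in for the compare_faces call, for example `name, _ = best_match(known_encodings, known_names, face_encoding)`. The returned distance is also useful for logging how confident a match was.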
Performance tuning & NPU acceleration
- Resize frames before detection (we used 300×300 or 640). Smaller size → higher FPS.
- Use smaller models: yolov8n or yolov5n, or tiny YOLOv4.
- Batching is not applicable to streaming single frames.
- NPU on Orange Pi (the 6 TOPS NPU): you must use the vendor SDK (e.g., rknn or nncase, or the specific Orange Pi SDK, depending on the SoC) to convert models (ONNX, TensorFlow) to NPU format. Then run inference via that SDK. Steps:
- Export model to ONNX (on an x86 machine if conversion tools aren’t available on Orange Pi).
- Convert ONNX to vendor runtime model using the NPU compiler.
- Use their runtime (C or Python) to run inference and get boxes/embeddings.
- Use ONNX Runtime with its CPU execution provider; it often beats raw PyTorch on CPU. (The OpenVINO backend targets Intel hardware, so it is unlikely to help on an ARM board.)
- Use multiple processes: one process reads RTSP, one runs detection, one serves HTTP — avoids GIL bottlenecks.
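One further lever in the same spirit, not listed above, is to run the detector only on every n-th frame and reuse the previous boxes in between. The sketch below assumes a hypothetical wrapper class `EveryNth` around whatever detection function you use. It trades box freshness for throughput, so fast-moving objects will show slightly lagging boxes.

```python
class EveryNth:
    """Run an expensive function on every n-th call and reuse its last result otherwise."""

    def __init__(self, fn, n=3):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.fn = fn
        self.n = n
        self._count = 0
        self._last = None

    def __call__(self, frame):
        if self._count % self.n == 0:
            self._last = self.fn(frame)  # fresh inference on this frame
        self._count += 1
        return self._last  # may be a stale result from an earlier frame
```

For example, `detect = EveryNth(run_detector, n=3)` (with `run_detector` standing for your own function) would call the model on frames 0, 3, 6, and so on, while every frame is still drawn and streamed.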
Reliable RTSP handling
- OpenCV VideoCapture is easy but can be flaky for long RTSP sessions. Use ffmpeg/gstreamer-based capture or rtsp-simple-server (since renamed MediaMTX)/ffmpeg pipelines for more robust reconnection.
- Example: use ffmpeg to pipe frames into your app or to restream to an internal RTSP server that you then read from.
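If you stay with OpenCV capture, a small wrapper that reopens the stream after repeated read failures covers many of the drop-outs described above. This is a sketch under assumptions: the class `ReconnectingCapture`, the `open_with_opencv` helper, and the thresholds are all hypothetical names and values, and the opener is injectable so the logic can be exercised without a camera.

```python
import time


def open_with_opencv(url):
    """Default opener; imports cv2 lazily so the wrapper can be tested without it."""
    import cv2
    return cv2.VideoCapture(url)


class ReconnectingCapture:
    """Wrap a capture object and reopen it after repeated read failures."""

    def __init__(self, url, opener=open_with_opencv, max_failures=10,
                 retry_delay=1.0, sleep=time.sleep):
        self.url = url
        self.opener = opener
        self.max_failures = max_failures
        self.retry_delay = retry_delay
        self.sleep = sleep
        self.failures = 0
        self.cap = opener(url)

    def read(self):
        """Return a frame, or None if this attempt failed."""
        ok, frame = self.cap.read()
        if ok:
            self.failures = 0
            return frame
        self.failures += 1
        if self.failures >= self.max_failures:
            # Too many consecutive failures: drop the session and reconnect
            self.cap.release()
            self.sleep(self.retry_delay)
            self.cap = self.opener(self.url)
            self.failures = 0
        return None
```

In the streaming loop, a `None` return simply means "skip this iteration". The failure threshold avoids tearing down the session on a single dropped frame.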
Alerts, logging & snapshot storage
Add code hooks where you detect a target (person, face match, car) to:
- save a snapshot image,
- publish an MQTT event,
- send an HTTP webhook,
- store detection metadata to a local DB (SQLite).
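A hook for the SQLite option might look like the sketch below. Without some rate limiting, a person standing in view would generate an event on every frame, so it suppresses repeats of the same label within a cooldown window. The class name `DetectionLog`, the table layout, and the 30-second default are assumptions for illustration.

```python
import sqlite3
import time


class DetectionLog:
    """Store detections in SQLite, skipping repeats of a label within a cooldown."""

    def __init__(self, db_path="detections.db", cooldown=30.0, clock=time.time):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS detections (ts REAL, label TEXT, confidence REAL)"
        )
        self.cooldown = cooldown
        self.clock = clock
        self._last_seen = {}  # label -> timestamp of the last stored event

    def record(self, label, confidence):
        """Store the event and return True, or return False if still in cooldown."""
        now = self.clock()
        last = self._last_seen.get(label)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_seen[label] = now
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO detections VALUES (?, ?, ?)", (now, label, confidence)
            )
        return True
```

A `True` return is a natural place to also save a snapshot or publish the MQTT/webhook event. Note that a sqlite3 connection is bound to the thread that created it by default, so in the Flask app it should be created inside the generator that uses it.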
Security & deployment
- Run the Flask app behind a reverse proxy (nginx) with HTTPS and basic auth.
- Secure the RTSP credentials and restrict access to the Orange Pi.
- Consider retention policies for face data and snapshots.
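Until a reverse proxy is in place, a basic-auth check inside the app is a possible stopgap. The sketch below parses an HTTP Basic `Authorization` header using only the standard library. The function name `check_basic_auth` is hypothetical, and this is not a substitute for HTTPS, since Basic credentials travel effectively in clear text.

```python
import base64
import hmac


def check_basic_auth(header, expected_user, expected_password):
    """Return True if an 'Authorization: Basic ...' header matches the credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through comparison timing
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pass_ok
```

In a Flask route the header would come from `request.headers.get("Authorization")`; on a failed check, respond with status 401 and a `WWW-Authenticate: Basic` header so the browser prompts for credentials.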
Troubleshooting & tips
- If face_recognition has build problems (usually dlib failing to compile), install the build dependencies listed under Pipeline A and retry, or use a pure ONNX/InsightFace pipeline.
- If CPU is too slow: convert detectors to ONNX and use onnxruntime with the CPU execution provider; or compile models for the Orange Pi NPU.
- Start with one camera (channel) to benchmark FPS and CPU. Then scale.
Quick checklist to get started right now
- On Orange Pi: create venv and install pip packages (see prerequisites).
- Create ~/cctv/models/ and put MobileNet-SSD model files there.
- Create ~/cctv/faces/<PersonName>/ with sample photos.
- Save app.py (lightweight) and run with RTSP_URL env var.
- Open browser to http://<orangepi-ip>:5000/.
Privacy & legal reminder
Face recognition can be subject to strict regulations. Ensure you have lawful basis, signage, and secure storage. Limit who can access the stream and face database.
If you want, I can:
- produce an optimized ONNX pipeline and provide conversion commands (I can write the conversion scripts here),
- give you a systemd service file to auto-run the app on boot,
- or provide a small web UI (HTML + JS) with detection logs and snapshot gallery.
Tell me which next step you’d like (e.g., “give me ONNX conversion + onnxruntime code for the Orange Pi” or “show me how to robustly re-stream the Dahua channels if RTSP drops”), and I’ll produce the exact scripts and config.