Objective and Design Philosophy
The goal of this project is to develop a smart security system capable of detecting motion at an entry point, capturing and processing image data, and securely communicating visual alerts via wireless protocols. The system is designed to function as a dependable, low-power video monitoring device within the smart home system. Emphasis is placed on robust wireless communication and on minimizing latency for real-time streaming.
Core Components
The system is built around the following hardware and software elements:
- ArduCAM Mini 5MP Plus (OV5642): Captures 320 x 240 JPEG images at 10 frames per second and compresses them on board to reduce the transmission load
- Arduino UNO: Controls the ArduCAM and transmits JPEG image data line-by-line over UART
- Raspberry Pi Pico W: Reconstructs the JPEG and conditionally transmits the image via MQTT
- HC-SR501 PIR Motion Sensor: Connected to the Arduino to detect motion events. It triggers a notification on the Smart Mobile Device to prompt the user to open the live feed.
Implementation Strategy
The architecture separates camera control and network communication into two distinct subsystems (Arduino and Pico W) to simplify firmware design and isolate real-time sensor capture from communication logic. JPEGs are encoded and buffered on the ArduCAM module, then read out by the Arduino and streamed over UART with start/end markers. The Pico W parses and validates the stream, reconstructs the JPEG, and selectively publishes it to MQTT.
Motion detection is handled by the Arduino, and motion state changes are transmitted alongside the image headers. On the HUB, the MQTT topic camera/motion receives real-time motion state updates, while camera/frames/latest receives base64-encoded JPEG payloads.
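The topic layout above can be sketched as follows. This is a minimal CPython sketch, assuming the Pico W builds plain (topic, payload) pairs and hands them to an MQTT client; the class name `HubPublisher`, the "ON"/"OFF" motion payloads, and the publish-only-on-change rule are illustrative assumptions, not the project's actual firmware. On MicroPython, `binascii.b2a_base64` can stand in for `base64.b64encode` (it appends a trailing newline), and each pair could be passed to `MQTTClient.publish(topic, msg)` from `umqtt.simple`.

```python
import base64

MOTION_TOPIC = "camera/motion"
FRAME_TOPIC = "camera/frames/latest"


class HubPublisher:
    """Hypothetical helper that turns Arduino data into MQTT messages."""

    def __init__(self):
        self.last_motion = None  # unknown until the first header arrives

    def motion_messages(self, motion):
        """Return a camera/motion message only when the PIR state changes."""
        if motion == self.last_motion:
            return []
        self.last_motion = motion
        # "ON"/"OFF" payloads are an assumption for illustration.
        return [(MOTION_TOPIC, "ON" if motion else "OFF")]

    @staticmethod
    def frame_message(jpeg):
        """Base64-encode a reconstructed JPEG for camera/frames/latest."""
        payload = base64.b64encode(jpeg).decode("ascii")
        return (FRAME_TOPIC, payload)


if __name__ == "__main__":
    pub = HubPublisher()
    print(pub.motion_messages(True))   # first state seen: one message
    print(pub.motion_messages(True))   # unchanged state: no message
    topic, payload = pub.frame_message(b"\xff\xd8\x00\xff\xd9")
    print(topic, payload)
```

Base64 expands the payload by a factor of 4/3, so a 15,000-byte JPEG becomes a 20,000-character MQTT message.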
Engineering Challenges and Solutions
- The Arduino UNO uses 5 V logic, while the Pico W's GPIO pins operate at 3.3 V. To bring the Arduino's TX signal down to a level safe for the Pico's RX pin, we implemented a voltage divider on the TX line.
- At the original 9600 baud rate, JPEG transfer was too slow for real-time video. We increased the baud rate to 57600, the highest rate at which we achieved successful image transfer. We also optimized the UART buffering to accommodate full-frame streaming.
- JPEGs were either unreadable or misaligned because of incomplete parsing. We added stream-parsing logic that detects the JPEG start-of-image (0xFFD8) and end-of-image (0xFFD9) markers and discards any bytes before the start marker or after the end marker.
- Because the start and end markers were occasionally split across multiple reads, we introduced a rolling-window buffer and byte-by-byte UART parsing so that markers are caught regardless of alignment.
- Images failed to transfer after a soft reboot, so we added UART buffer flush routines and handshake logic between the Arduino and Pico to realign the stream.
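The voltage divider on the TX line follows the standard unloaded-divider relation V_out = V_in × R2 / (R1 + R2), with R1 between the Arduino TX pin and the Pico RX node and R2 from that node to ground. The sketch below checks a candidate resistor pair; the 1 kOhm / 2 kOhm values are hypothetical, since the actual resistors are not specified.

```python
def divider_vout(v_in, r1_ohms, r2_ohms):
    """Unloaded voltage-divider output: R1 from source to node, R2 from node to ground."""
    return v_in * r2_ohms / (r1_ohms + r2_ohms)


# Hypothetical values: 1 kOhm in series, 2 kOhm to ground.
v_rx = divider_vout(5.0, 1_000, 2_000)
print(f"Pico RX sees about {v_rx:.2f} V for a 5 V Arduino TX high level")
```

A 1:2 ratio maps a 5 V high level to roughly 3.3 V. Keeping the resistances in the low-kilohm range also keeps the RC time constant formed with pin and wiring capacitance short, which matters as the baud rate rises.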
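The effect of the baud-rate change can be estimated from UART framing. Assuming the common 8N1 format (one start bit, eight data bits, one stop bit, so 10 bit times per byte) and ignoring marker, header, and handshake overhead, the ideal transfer time is bytes × 10 / baud. This is a back-of-the-envelope sketch, not a measurement of the system.

```python
def uart_transfer_seconds(num_bytes, baud, bits_per_byte=10):
    """Ideal UART transfer time; 10 bits per byte assumes 8N1 framing."""
    return num_bytes * bits_per_byte / baud


for baud in (9600, 57600):
    t = uart_transfer_seconds(10_000, baud)
    print(f"10,000-byte JPEG at {baud} baud: {t:.2f} s")
```

Under these assumptions, moving from 9600 to 57600 baud shortens each transfer by a factor of six, for example from about 10.4 s to about 1.7 s for a 10,000-byte frame.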
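The marker-based parsing described in the list can be sketched as a byte-by-byte state machine with a two-byte rolling window (the previous byte plus the current one), so that a 0xFF at the end of one read followed by 0xD8 or 0xD9 at the start of the next is still recognized. This is a CPython sketch under assumptions: the class name `JpegStreamParser`, its `feed()` interface, and the 32 KB overflow limit are hypothetical and not taken from the project's Pico W firmware. Within the entropy-coded image data, every literal 0xFF byte is stuffed as 0xFF 0x00, so a spurious 0xFF 0xD9 cannot occur there; header segments such as an embedded EXIF thumbnail could still contain one, which this sketch does not handle.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker


class JpegStreamParser:
    """Reassembles JPEG frames from arbitrarily chunked UART reads."""

    def __init__(self, max_frame_bytes=32 * 1024):
        self.prev = None          # previous byte: first half of the rolling window
        self.in_frame = False     # True between SOI and EOI (0xFF 0xD9)
        self.buf = bytearray()
        self.max_frame_bytes = max_frame_bytes

    def feed(self, chunk):
        """Consume one read's worth of bytes; return any completed frames."""
        frames = []
        for b in chunk:
            if not self.in_frame:
                # Outside a frame, everything is discarded until SOI appears.
                if self.prev == 0xFF and b == 0xD8:
                    self.in_frame = True
                    self.buf = bytearray(SOI)
            else:
                self.buf.append(b)
                if self.prev == 0xFF and b == 0xD9:
                    frames.append(bytes(self.buf))
                    self.in_frame = False
                    self.buf = bytearray()
                elif len(self.buf) > self.max_frame_bytes:
                    # EOI never arrived: drop the partial frame and resynchronize.
                    self.in_frame = False
                    self.buf = bytearray()
            self.prev = b
        return frames


if __name__ == "__main__":
    parser = JpegStreamParser()
    # Garbage before and after, with both markers split across reads.
    reads = [b"boot noise\xff", b"\xd8\x01\x02\xff", b"\xd9trailing"]
    for read in reads:
        for frame in parser.feed(read):
            print(len(frame), frame.hex())
```

The overflow limit gives the same resynchronizing effect as a buffer flush if an end marker is lost, because the parser returns to discarding bytes until the next start marker.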
Performance
System performance was evaluated in terms of image capture latency, transmission delay, and data consistency. With the ArduCAM configured to capture images at a reduced resolution of 320 x 240 pixels, average JPEG file sizes ranged from 8 to 15 KB depending on scene complexity. Simpler, low-texture scenes yielded smaller files and faster transmission times, while high-contrast or detailed scenes increased JPEG size and introduced transmission lag.
The total end-to-end latency from image capture to image availability in Home Assistant was measured at approximately 1 second under worst-case conditions. The ArduCAM consistently completes image capture in 100 ms, and the remaining 900 ms is split between UART transmission and Wi-Fi/MQTT publishing. UART transfer accounts for the bulk of this delay: the residual 200-400 ms was conservatively attributed to JPEG parsing, base64 encoding, MQTT transmission, broker processing, and Home Assistant rendering, which leaves roughly 500-700 ms for the UART transfer itself. Precise per-layer timing was not instrumented, but these estimates are consistent with the expected performance of similar MQTT-over-Wi-Fi pipelines. Transmission time depended strongly on image complexity, since more detailed images require more bytes over both UART and Wi-Fi; this produced occasional latency variability of ±0.5 seconds.
While the system is not intended for high-frame-rate streaming, it is well optimized for periodic or event-driven image transmission. Power consumption could be further reduced by using motion detection to trigger capture of a set number of frames rather than capturing continuously. Additionally, implementing the circuit on a PCB could provide a more reliable UART connection and a higher baud-rate ceiling. For practical purposes, this latency is acceptable for a security camera that lets the user monitor the area outside their room.
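The motion-triggered capture suggested above could be sketched as a small counter: each motion event arms a burst of N frames, and capture stops once the burst is spent. This is a hypothetical design sketch; the `BurstCapture` name and the default burst of 10 frames are assumptions, not values chosen or measured in the project.

```python
class BurstCapture:
    """Captures a fixed number of frames after each motion event, then idles."""

    def __init__(self, burst_frames=10):  # hypothetical default burst length
        self.burst_frames = burst_frames
        self.remaining = 0

    def on_motion(self):
        # A new event re-arms the full burst, extending capture while motion continues.
        self.remaining = self.burst_frames

    def should_capture(self):
        """Call once per capture slot; True while the burst has frames left."""
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


if __name__ == "__main__":
    cam = BurstCapture(burst_frames=3)
    cam.on_motion()
    print([cam.should_capture() for _ in range(5)])  # [True, True, True, False, False]
```

Re-arming on every event, rather than adding to the count, keeps the camera and radio idle as soon as the scene has been quiet for one full burst.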