Deep learning helps detect distracted driving behavior

May 7, 2020
Current vision-based advanced driver assistance systems may recognize when drivers become drowsy, but they aren't built to detect eating, drinking, or smartphone use. A new system is designed to account for that shortcoming.

A new advanced driver assistance system (ADAS) under development by ARRK Engineering (Munich, Germany; www.arrkeurope.com) detects sources of driver inattention caused by elements other than drivers falling asleep at the wheel.

Some ADAS detect driver drowsiness via analysis of driving behavior, for instance if a car begins drifting out of its lane. Other systems use camera-based computer vision to monitor head and eye motion to gauge whether a driver is drowsy. If the driver’s head droops down, for example, that’s a reasonable indication that the driver is falling asleep. A driver’s eyes closing entirely and remaining closed represents an almost sure indication. In either case, the ADAS can emit an alert and rouse the driver.

While current vision-based ADAS may readily detect the radical changes in a driver’s viewing direction and head position caused by falling asleep, registering the subtler changes associated with other sources of driver inattention, such as eating, drinking, or using a smartphone, presents a challenge.

ARRK Engineering’s new driver distraction system targets these shortcomings. The system consists of two FLIR Machine Vision (Richmond, BC, Canada; www.flir.com/mv) FL3-U3-13Y3M-C cameras, each with 940 nm infrared (IR) LEDs and a 70° field of view, attached to the A-pillars on either side of the driver. Both cameras run at 30 fps and capture 8-bit greyscale images at 1280 x 1024 resolution. A Raspberry Pi 3 Model B+ single-board computer synchronizes image capture by sending a trigger signal to both cameras.
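
The article does not specify how the synchronization signal is generated; the sketch below assumes a single Raspberry Pi GPIO pin wired to the hardware trigger input of both cameras and pulsed at the stated 30 fps. The pin number and pulse width are illustrative assumptions.

```python
# Minimal sketch of a 30 fps trigger pulse from a Raspberry Pi GPIO pin.
# Assumes one GPIO output (pin 18 here, chosen arbitrarily) feeds the
# trigger input of both cameras so their exposures start together.
import time
import RPi.GPIO as GPIO

TRIGGER_PIN = 18          # hypothetical BCM pin number
FRAME_INTERVAL = 1 / 30   # 30 fps, as stated in the article

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIGGER_PIN, GPIO.OUT, initial=GPIO.LOW)

try:
    while True:
        GPIO.output(TRIGGER_PIN, GPIO.HIGH)   # rising edge starts exposure
        time.sleep(0.001)                      # short pulse width (assumed)
        GPIO.output(TRIGGER_PIN, GPIO.LOW)
        time.sleep(FRAME_INTERVAL - 0.001)
finally:
    GPIO.cleanup()
```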

Each camera features a Midwest Optical Systems (Palatine, IL, USA; www.midopt.com) LP780 IR long-pass filter that blocks most light below 780 nm, ensuring the cameras capture illumination primarily from the IR LEDs rather than from ambient light. Blocking visible daylight also eliminates shadow effects in the driver’s immediate area that could interfere with the accuracy of the face recognition algorithms.

To generate images for training and testing, 16 subjects of varying gender and age pretend to drive a stationary vehicle, simulating driving behaviors such as moving the steering wheel, looking at the side and rearview mirrors, looking out of the side windows, and paying normal attention to the road.

Test subjects wear hats, eyeglasses, or sunglasses of different varieties; eat and drink different foods and beverages; and pretend to use their own smartphones, so that a variety of smartphone models appear in the images.

Test subjects act out five categories of distracted behavior: no visible distraction, talking on smartphone, manual smartphone use, eating or drinking, and holding food or beverage. Each test subject switches between these activities during their image capture session.
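
For illustration, the five annotation categories can be represented as a simple index-to-label map; the numeric ordering below is arbitrary and not taken from ARRK Engineering's annotation scheme.

```python
# The five distraction classes named in the article, keyed by an
# arbitrary class index used for illustration only.
DISTRACTION_CLASSES = {
    0: "no visible distraction",
    1: "talking on smartphone",
    2: "manual smartphone use",
    3: "eating or drinking",
    4: "holding food or beverage",
}
```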

Image sets generated by 13 of the test subjects, annotated with the appropriate driver distraction categories, train four convolutional neural network (CNN) models: VGG-16, VGG-19, ResNeXt-34, and ResNeXt-50.

The CNNs train for 50 epochs on a workstation with an AMD (Santa Clara, CA, USA; www.amd.com) Ryzen Threadripper 1920X processor, 64 GB of RAM, and an NVIDIA (Santa Clara, CA, USA; www.nvidia.com) GeForce RTX 2080 Ti graphics processing unit (GPU). ARRK Engineering also uses the Adam adaptive learning rate optimization algorithm (bit.ly/VSD-ADAM).
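
As a rough sketch of this training setup, the following PyTorch code fine-tunes a ResNeXt-50 for the five distraction classes with the Adam optimizer over 50 epochs. The article specifies only the model families, epoch count, optimizer, and hardware; the learning rate, batch size, input size, and image-folder layout are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.models import resnext50_32x4d

# Preprocessing for the 8-bit greyscale frames; 224 x 224 input is an assumed size.
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical directory of annotated left-camera training images, one subfolder per class.
train_set = ImageFolder("train/left_camera", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)  # assumed batch size

model = resnext50_32x4d(weights=None)            # one of the four model families tested
model.fc = nn.Linear(model.fc.in_features, 5)    # five distraction classes
model = model.cuda()                             # e.g. a GeForce RTX 2080 Ti

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed learning rate
criterion = nn.CrossEntropyLoss()

for epoch in range(50):                          # 50 epochs, per the article
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```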

The CNNs then label images taken from the recording sessions of the remaining three test subjects with one of the five distraction categories. Images from the left camera and the right camera are classified by separately trained models.
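
A minimal inference sketch, reusing the label map defined above and assuming one separately trained model per camera (left_model, right_model, and the frame variables are illustrative names only):

```python
import torch

@torch.no_grad()
def classify_frame(model, frame_tensor):
    """Return the predicted distraction label for one preprocessed frame."""
    model.eval()
    logits = model(frame_tensor.unsqueeze(0).cuda())  # add batch dimension
    return DISTRACTION_CLASSES[int(logits.argmax(dim=1))]

# Each camera's frames go through its own trained model:
# left_label = classify_frame(left_model, left_frame)
# right_label = classify_frame(right_model, right_frame)
```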

“The use of two cameras, each with a separately trained CNN model, enables ideal case differentiation for the left and right side of the face,” says Benjamin Wagner, Senior Consultant for Driver Assistance Systems at ARRK Engineering.

The ResNeXt-34 and ResNeXt-50 models performed best, with 92.88% accuracy on the left camera and 90.36% accuracy on the right camera. The test results serve as a proof of concept for a driver assistance system that recognizes when drivers eat, drink, or use a smartphone and emits an alert when the driver engages in these hazardous behaviors.

ARRK Engineering plans to further develop the system by analyzing whether classifying objects such as smartphones and beverages, and determining where the driver is holding them, potentially with bounding box detection and semantic segmentation techniques, can improve the system’s accuracy.

About the Author

Dennis Scimeca

Dennis Scimeca is a veteran technology journalist with expertise in interactive entertainment and virtual reality. At Vision Systems Design, Dennis covered machine vision and image processing with an eye toward leading-edge technologies and practical applications for making a better world. Currently, he is the senior editor for technology at IndustryWeek, a partner publication to Vision Systems Design. 
