For my CS283: Computer Vision Final Project, I created an application to control Google Maps using your hand and a webcam.
It uses the Chrome API to access the webcam, and frames are sent to a Tornado server that runs the hand tracking pipeline, annotates the frame, and sends back the displacement vector to update the view of the map. I am using OpenCV for it’s Haar Cascade libraries.
You can view the code on github and setup an instance of the server locally!
You can also view the paper I wrote to describe the process I used in the pipeline. Below is a summary of the problem statement and the stages in the pipeline.
Given a poorly trained Haar Cascade Classifier (250 positive samples and 100 negative samples) to recognize hands, this project assembles a pipeline to improve the quality of the tracking. These steps include:
- Face detection and removal of faces.
- Background subtraction.
- Use a simplified Kalman-Filter-esque technique to estimate the bounding box of the hand. This assumes that a hand moves in a smooth manner.
- Use our hand classifier to detect the largest hand within the bounding box.
- Compute the optical flow of points within the bounding box using Lucas-Kanade.
- Use the optical flow and the measured position of the hand to correct our Kalman-Filter estimate.