We plan to implement real-time hand gesture recognition that runs on mobile phones (iOS) without using Apple's API: the system will capture the motion of the hands and interpret the meaning associated with each gesture.
The project has various potential applications, including enhancing the mobile gaming experience, enabling hands-free control of devices, and facilitating communication for disabled users.
We will train a CNN in PyTorch for classification, but since only CoreML models can run on iOS devices, we will use Apple's CoreML conversion tool (coremltools) to convert the trained model into a form suitable for mobile deployment. We will also experiment with the iPhone's depth sensor to see whether the depth map improves the accuracy of hand detection and gesture classification.
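To make the export step concrete, below is a minimal sketch of the PyTorch-to-CoreML conversion we have in mind. It assumes a stock MobileNetV2 backbone with 18 output classes as a stand-in for our eventual CNN, a 224x224 RGB input, and simple [0, 1] pixel scaling; these are illustrative placeholders rather than final design decisions.

```python
import torch
import torchvision
import coremltools as ct

NUM_CLASSES = 18  # gesture classes in the dataset described below

# Placeholder backbone; the final architecture may differ.
model = torchvision.models.mobilenet_v2(num_classes=NUM_CLASSES)
model.eval()

# TorchScript trace with a dummy input; 224x224 RGB is an assumed input size.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Convert to CoreML. Using an ImageType input lets the iOS app feed camera
# frames directly; the scale must match whatever normalization is used
# during training (here, plain division by 255 as a placeholder).
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.ImageType(name="image", shape=example.shape, scale=1 / 255.0)],
)
mlmodel.save("HandGestureClassifier.mlpackage")
```

If the depth-sensor experiment pans out, the input would likely become a multi-channel tensor (RGB plus a depth map), in which case a `ct.TensorType` input would replace the `ImageType` above.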
For training and testing the models, we use the HaGRID dataset (https://arxiv.org/abs/2206.08219).
This dataset contains 552,992 samples divided into 18 gesture classes. The annotations consist of hand bounding boxes with gesture labels and a markup of the leading hand. The dataset was designed for building HGR systems used in video conferencing services, home automation, the automotive sector, and services for people with speech and hearing impairments, which matches our focus on controlling devices: all 18 gestures are functional, familiar to most people, and naturally map to actions. The data were collected via crowdsourcing, with various parameters taken into account to ensure diversity. The accompanying paper also reviews existing HGR datasets and the challenges of using them, and it provides baselines for the hand detection and gesture classification tasks that we can use as reference points.
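For training, we will need to turn these annotations into (hand crop, gesture label) pairs. The sketch below is a minimal PyTorch Dataset assuming a single JSON annotation file that maps image names to normalized [x, y, w, h] hand boxes and gesture labels; the field names, file layout, and image extension are assumptions to be checked against the actual dataset release.

```python
import json
import os

from PIL import Image
from torch.utils.data import Dataset


class HandGestureCrops(Dataset):
    """Yields (hand crop, class index) pairs for gesture classification."""

    def __init__(self, image_dir, annotation_path, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        with open(annotation_path) as f:
            # Assumed layout: {image_name: {"bboxes": [[x, y, w, h], ...],
            #                               "labels": ["gesture", ...]}}
            ann = json.load(f)
        # Build the class index from the labels actually present in the file.
        labels = sorted({lab for entry in ann.values() for lab in entry["labels"]})
        self.label_to_idx = {lab: i for i, lab in enumerate(labels)}
        self.samples = [
            (name, box, lab)
            for name, entry in ann.items()
            for box, lab in zip(entry["bboxes"], entry["labels"])
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        name, (x, y, w, h), lab = self.samples[idx]
        # .jpg extension is an assumption about how the images are stored.
        img = Image.open(os.path.join(self.image_dir, name + ".jpg")).convert("RGB")
        W, H = img.size
        # Normalized [x, y, w, h] -> pixel corners, then crop the hand region.
        crop = img.crop((x * W, y * H, (x + w) * W, (y + h) * H))
        if self.transform is not None:
            crop = self.transform(crop)
        return crop, self.label_to_idx[lab]
```

Wrapped in a standard DataLoader with torchvision transforms (resize to the model's input size plus normalization), this would supply batches to the training loop.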