Playing with the Leap Motion controller
I've been wanting to play around with the Leap Motion controller, and the recent Codeaholics hackathon was perfect for it. We got a full weekend to mess around with the device and SDK. The 42labs also had a number of other cool toys like the Oculus Rift and the Sphero.
I've never used Unity (or any 3D software) before, so I wanted to try to build something. The first thing I built was a simple drum set. The Leap Motion samples are pretty good, and I had almost no trouble with the actual Leap SDK; most of my issues were with figuring Unity out. Writing small bits of script code and associating them with 3D objects in the GUI reminded me of writing code-behind for Visual Basic forms.
There's a fundamental problem with using 3D tracking devices (like the Leap or Kinect) to control a virtual 3D world. There is always going to be a dissonance, because while your hands exist in both worlds, the other objects exist in only one of the two. So you have to choose: either a one-to-one mapping of hands, which means your virtual hands can't behave with realistic physics since they will pass through other virtual objects, or virtual hands with physics, which throws off the one-to-one mapping whenever the virtual hand hits a collision. Neither option is perfect, and unless there is some sort of haptic feedback, you just have to pick one based on your app.
My second project was a binary typing system: each finger acts as one bit, so a single hand can show a five-bit number, and a pose can be detected by checking the extended property on each finger. This becomes harder with the higher numbers, since some poses are pretty hard to pull off with your hand (try 01010), so you might need a couple of tries. I mapped the numbers to ASCII. To account for the fact that it's hard to do some of the poses on the first try, I came up with a simple mechanism for typing: the right hand performs the individual letter, and the left hand confirms it with a "squeeze" gesture. This way, you can see the recognized letter and confirm it by squeezing it onto the end of the text. This works pretty well, though two-hand tracking does get a bit wonky. I'm not sure if that was a systemic problem or if I wasn't dealing with the data correctly (probably the latter).
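The decoding itself is simple. Something like this sketch captures the idea (the function names, the thumb-first bit ordering, and the 1 = 'a' offset are illustrative assumptions, not the project's exact code):

```javascript
// Each finger is one bit; `extended` is an array of five booleans,
// e.g. built from frame.fingers[i].extended in the Leap JS SDK.
function poseToNumber(extended) {
  // Fold the five booleans into a 5-bit number, first finger = most significant bit.
  return extended.reduce((acc, isExtended) => (acc << 1) | (isExtended ? 1 : 0), 0);
}

// Map the values 1..26 onto 'a'..'z'; 0 (a fist) means "no letter".
function poseToLetter(extended) {
  const n = poseToNumber(extended);
  if (n < 1 || n > 26) return null;
  return String.fromCharCode('a'.charCodeAt(0) + n - 1);
}

console.log(poseToLetter([false, false, false, false, true])); // 00001 → 'a'
console.log(poseToLetter([false, true, false, true, false]));  // 01010 → 'j'
```

With this mapping the awkward 01010 pose from above comes out as 'j'.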
The third project was an offshoot of the previous one. The difficulty with the previous system came from the naive lettering scheme (your hand isn't really suited to showing binary numbers). I wanted to find some way to implement the American Sign Language alphabet, but without manually defining each pose in code. I turned to brain.js, a neural network trainer with a really easy-to-use API. For training, you give it sets of numbers (constrained to the 0–1 range) along with the expected output for each set. If the net trains successfully, it will predict an output for any arbitrary input you throw at it. I used a single frame of fingertip position data, normalized, for the training. The end product is a small app in which you can define symbols and train the model by performing the pose. The results are surprisingly good depending on the set of poses. ASL has some challenges, as a number of poses look really similar to the Leap. But if you choose a set of poses with a good distance between them, the model recognizes the gesture almost every time. In the demo video I train it to recognize rock-paper-scissors.
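As a concrete example, the normalization step might look something like this sketch (the palm-relative scheme and the 200 mm range are assumptions for illustration, not the project's exact code):

```javascript
// Turn one frame of fingertip positions into a feature vector for brain.js.
// Making the positions palm-relative and squashing them into roughly 0..1
// means the network sees comparable inputs wherever the hand hovers.
function normalizeFrame(palm, fingertips, range = 200 /* assumed mm scale */) {
  const features = [];
  for (const tip of fingertips) {
    // Translate to palm-relative coordinates, then scale into [0, 1].
    features.push((tip.x - palm.x) / range + 0.5);
    features.push((tip.y - palm.y) / range + 0.5);
    features.push((tip.z - palm.z) / range + 0.5);
  }
  // 15 numbers for five fingertips — one training input for
  // net.train([{ input, output }]) in brain.js.
  return features;
}
```

A fingertip sitting exactly on the palm position maps to (0.5, 0.5, 0.5), and fingertips within ±100 mm of the palm stay inside the 0–1 range the network expects.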
There are probably a few more things I could do with this. A modified version of ASL would probably work pretty well with this system. Some ASL letters are dynamic in nature, so a single frame won't capture the pose; multiple frames would also probably help with noise reduction for the other letters. If the training error were sufficiently low, I could probably start appending to a sentence with just a single hand. Alternating between the letter and a neutral pose (like an open palm) would let you type out any word, and together with movement gestures for backspace and punctuation it should be pretty usable. You could probably increase speed by using both hands and alternating gestures.
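The alternating letter/neutral-pose loop could be as simple as a small state machine — a sketch (names and structure are illustrative, not an existing implementation):

```javascript
// One letter is committed per neutral→letter cycle: after a letter is typed,
// further recognitions are ignored until the hand returns to the neutral pose.
// This keeps transitional or held poses from being typed repeatedly.
function createTyper() {
  let armed = true; // set once a neutral pose (e.g. open palm) is seen
  let text = '';
  return {
    onPose(letter) {
      if (letter === null) {
        armed = true;        // neutral / unrecognized pose re-arms the typer
      } else if (armed) {
        text += letter;      // commit the recognized letter
        armed = false;       // ignore repeats until the next neutral pose
      }
    },
    getText: () => text,
  };
}

const typer = createTyper();
['h', null, 'i', 'i', null, 'i'].forEach(p => typer.onPose(p));
console.log(typer.getText()); // "hii" — the repeated 'i' without a neutral pose in between is ignored
```

The same structure would extend naturally to two hands: two typers, with each hand's neutral pose arming the other.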
I've read both praise and criticism of the Leap Motion device. I think the current generation of the hardware does suffer from limited range and tracking issues. I'm not sure if the latter is because of noisy data, or if more can be done on the software side. The v2 skeletal SDK is already much better than the original. It's a pity there isn't currently a way to access the raw data, because I'm sure there is a lot that can be done to improve the gesture recognition. There are also interesting possibilities for 3D face scanning, lip reading, etc.
The sensing range of the device is something that's definitely limiting what can be done. It's a lot smaller than the natural range of arm movement, so it's very easy to inadvertently move out of range. Of course, the tradeoff is that you get really accurate finger positions within that box, something the Kinect cannot do. It will be interesting to see if upcoming devices like the Myo can break that tradeoff barrier.