Eye tracking technology: The future of maximum accuracy input?

Fruit Ninja, a popular mobile game, requires split-second discrimination to obliterate fruit while avoiding the occasional bomb. It’s designed to be played on a touchscreen device or with Kinect hand movements, but playing it with your eyes alone is as exhilarating as it is revealing. If the hand is like an engine that idles at 600 RPM and can reach redline at 6000 RPM, one might liken the eye to a turbine idling at 20,000 RPM that can spool to 100,000 RPM and back in milliseconds.

The godlike control of Fruit Ninja’s sword has been bestowed by eye tracking hardware known as Senseye, made by the Danish company Utechzone. It comprises a power strip-sized assemblage of infrared LEDs and cameras that is tucked beneath the monitor. This device is bulky, and hardware developers are now wrestling with the Promethean task of doing the same with the existing flesh of smartphones or tablets.

The i Beam tablet, from NTT DoCoMo, is one of the front runners that promises full eye control using dual integrated front-facing cameras. Other developers claim to need only the single, native camera of a more phone-sized device. They are all hoping to avoid the premature release missteps that have dogged technologies like Smart Stay for the Samsung Galaxy S III. Smart Stay was designed to keep the device screen active while the user was looking at it, but in practice it worked only within a narrow range of ambient light and of device position and angle relative to the head, and only when that position was held steady.

The transition of eye tracking from merely being an assistive technology to potentially becoming a mandatory one is in part being aided by seemingly unrelated advances in 3D viewing. Developers are becoming more familiar with advanced signal processing techniques like sensor fusion, Kalman filters, and wavelet transforms, among others, and we are seeing the mind-jarring results with visual apps that leap out of the screen as the device is tilted or manipulated. 3D mapping, orientation, and the ability to flit effortlessly through media galleries are among the beneficiaries of these new techniques. While these apps garner considerable positioning information about the device on which they run, they have none about the person they are presenting information to.
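
To give a flavour of what a Kalman filter does with a jittery sensor reading, here is a minimal one-dimensional sketch in Python. It is an illustration only, not the method used by any app mentioned here: the class name `ScalarKalman`, the constant-value motion model, and the two noise variances are all assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter for a slowly drifting quantity,
    such as a device tilt angle. All default values are illustrative."""
    estimate: float = 0.0           # current best guess of the quantity
    variance: float = 1.0           # uncertainty attached to that guess
    process_var: float = 1e-3       # assumed drift of the true value per step
    measurement_var: float = 0.25   # assumed noise variance of the sensor

    def update(self, measurement: float) -> float:
        # Predict: the value is modelled as constant, so only uncertainty grows.
        self.variance += self.process_var
        # Correct: weight the new reading by how much we trust it
        # relative to the running estimate.
        gain = self.variance / (self.variance + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= (1.0 - gain)
        return self.estimate


# Feeding a steady reading pulls the estimate toward it, quickly at first
# and then more cautiously as confidence builds.
kf = ScalarKalman()
for _ in range(50):
    smoothed = kf.update(10.0)
```

A real sensor fusion pipeline would track several quantities at once and model how they move between samples; the scalar case simply shows the predict-then-correct rhythm.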

Apple filed patents as early as 2010, when Nintendo was gearing up to release the 3DS, for integrating sensor data to establish a precisely defined point in space for a device. Presumably the thinking at the time was that this could be combined with knowledge of the location and orientation of the user’s head and eyes to create a convincing presentation of 3D content.

Eye tracking on Android was meanwhile given a boost when the OpenCV API port was announced, giving developers a choice between a C++ wrapper and the JavaCV API. The image processing libraries in OpenCV have traditionally been straightforward to use and easy to compile with existing programs, and they can reliably produce the coordinates of features on the human face, even as the face moves.

Successful eye tracking depends on accurate separation of movements of the head from movements of the eye within the head to ascertain gaze direction. Nathan Myhrvold showcased OpenCV’s power when he successfully tracked and zapped mosquitoes with a laser using his “backyard Star Wars” device.
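
The head-versus-eye separation can be made concrete with a deliberately simplified sketch. It assumes a single horizontal axis, a flat screen facing the user, and angles already measured by some upstream tracker; the function name `gaze_point_x` and its parameters are hypothetical and are not part of OpenCV or any product named here.

```python
import math


def gaze_point_x(head_yaw_deg: float, eye_yaw_deg: float,
                 head_offset_cm: float, distance_cm: float) -> float:
    """Horizontal screen position (cm) that the user is looking at.

    head_yaw_deg   -- rotation of the head relative to the screen normal
    eye_yaw_deg    -- rotation of the eye within the head
    head_offset_cm -- sideways position of the head relative to screen centre
    distance_cm    -- distance from the eye to the screen plane
    """
    # Gaze direction is the head rotation plus the eye-in-head rotation.
    gaze_rad = math.radians(head_yaw_deg + eye_yaw_deg)
    # Project the gaze ray onto the screen plane.
    return head_offset_cm + distance_cm * math.tan(gaze_rad)


# A head turned 10 degrees one way with the eyes rotated 10 degrees back
# lands on the same spot as looking straight ahead.
straight = gaze_point_x(0.0, 0.0, 0.0, 50.0)
compensated = gaze_point_x(10.0, -10.0, 0.0, 50.0)
```

The point of the sketch is that an error in either angle shifts the computed gaze point just as much as an error in the other, which is why a tracker that watches only the pupil, or only the face, cannot tell where the user is looking.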

Hands-free scrolling through a web page, or turning pages of an e-book, is certainly a convenience, but the real advantage to using the eyes for control is speed. To draw a comparison we can take a look at one of the intrinsic muscles of the hand, like the adductor pollicis (the little bump that swells up when you draw the thumb inwards). The brain drives muscles like this one with relatively slow pulse trains, or “spikes”, from its motor neurons. These pulse trains max out at a frequency of perhaps a few tens of hertz when a command to move is initiated from the brain.

At baseline, when there is no movement or requirement for force, these muscle drivers idle against the onset of rigor mortis with just a few spontaneous spikes occurring on second timescales. But the motor neurons that control the six eye muscles fire away in constant readiness at a very high spontaneous rate, with peak rates ramping up to several hundred hertz at the onset of a saccade, the technical term for a stereotypical eye movement.

What this all translates into, in terms of control, is extremely low latency both to first initiate movement, and also to effect subsequent multistep eye movements. These rapid compound eye movements are essentially pre-programmed ballistic endeavours. In games like Fruit Ninja, this equates to a lot more sliced fruit than would be possible with ordinary mouse control. The drawback is that in order to discriminate between the good fruit and the bad bombs one has to look at them, which in the absence of further control is equivalent to selecting them. The user thus quickly develops an innate emotional fear of the bombs and adapts by relying more on peripheral vision. The longer term physiologic effects of this remain to be seen.

One way to approach the observation versus selection problem in eye control would be to incorporate features similar to those used in products like the Ion Wireless Air Mouse Glove, available in the US from Bellco as a replacement for an ordinary mouse. It can be quite difficult to use the glove for tasks like grabbing the corner of a window for resizing, but for relaxed armchair scrolling, or effecting a mouse click by a near effortless twitch of the thumb, it is second to none. The key feature of interest here is a little pause button which takes the position tracking function of the glove offline when the user wants to make a non-purposeful gesture like scratching the chin.

Methods to toggle out the eye control function with a blink or quick glance to a “home” position on the screen are being actively explored, but a clear and practical mechanism has yet to emerge. An obvious problem is that without regular blinks, dry eyes and discomfort would soon result.
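
As a rough illustration of the “home” position idea, the sketch below flips eye control on or off once the gaze has rested inside a designated screen region for a set number of consecutive samples. Everything in it is an assumption made for the example: the class name `HomeToggle`, the rectangular region, and the dwell threshold are not drawn from any shipping product.

```python
from dataclasses import dataclass


@dataclass
class HomeToggle:
    """Toggle eye control when the gaze dwells in a 'home' rectangle."""
    home: tuple               # (x_min, y_min, x_max, y_max) in pixels
    dwell_samples: int = 5    # consecutive samples needed (assumed value)
    active: bool = True       # whether eye control is currently live
    _count: int = 0           # samples seen inside the region so far
    _armed: bool = True       # prevents repeated toggles during one long stare

    def feed(self, x: float, y: float) -> bool:
        """Process one gaze sample and return whether eye control is live."""
        x0, y0, x1, y1 = self.home
        inside = x0 <= x <= x1 and y0 <= y <= y1
        if not inside:
            # Leaving the region resets the dwell and re-arms the toggle.
            self._count = 0
            self._armed = True
            return self.active
        if self._armed:
            self._count += 1
            if self._count >= self.dwell_samples:
                self.active = not self.active
                self._armed = False
                self._count = 0
        return self.active


# A brief glance through the corner does nothing; a sustained one toggles.
toggle = HomeToggle(home=(0, 0, 50, 50), dwell_samples=3)
for _ in range(3):
    state = toggle.feed(10, 10)
```

Requiring the gaze to leave the region before it can toggle again is one simple guard against a long stare flipping the state back and forth. Whether users would find such a gesture comfortable is exactly the open question.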

One remedy might be a program running in the background that signals when the eye tracking software is offline with a comforting blue dot in a corner of the screen. The regularity of the dot’s appearance could be set according to a predefined delay, or possibly a detected change in the reflective or refractive property of the eye due to dryness. Compulsory blinks might then be trained in a Pavlovian fashion, such that neither the dot nor incidentals of the blink attain disruptive conscious perception.
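
The predefined-delay variant of this idea is simple enough to sketch. The code below assumes that some upstream tracker reports each detected blink with a timestamp; the class name `BlinkReminder` and the six-second default are hypothetical, chosen only to make the logic visible.

```python
class BlinkReminder:
    """Decide when to show the 'tracking offline' dot that invites a blink."""

    def __init__(self, max_interval_s: float = 6.0):
        # Longest time allowed between blinks before the dot appears
        # (an assumed value, not a physiological recommendation).
        self.max_interval_s = max_interval_s
        self.last_blink_s = 0.0

    def record_blink(self, now_s: float) -> None:
        """Call whenever the tracker detects a blink."""
        self.last_blink_s = now_s

    def dot_visible(self, now_s: float) -> bool:
        """True when the dot should be shown and eye control paused."""
        return now_s - self.last_blink_s >= self.max_interval_s


reminder = BlinkReminder(max_interval_s=6.0)
reminder.record_blink(0.0)
pause_tracking = reminder.dot_visible(6.0)  # six seconds without a blink
```

The dryness-based trigger mentioned above would replace the fixed delay with a measurement, but the surrounding logic would stay much the same.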

There is still much to be done to bring eye tracking into a peaceable coexistence with the normal functioning of the eye. The full resolution and speed theoretically achievable for selection or gesture with the eye is unlikely to be fully tapped with present-day hardware and will require additional breakthroughs. Peripherals like contact lenses impregnated with fluorescent microbeads or magnetic agents have been investigated and could eventually find application. Current software can already ascertain physiologic variables like rate of droop for the eyelid and constriction of the pupil to estimate arousal. These parameters, delivered for little additional cost once the pupil or other source of contrast has been tracked, await future application.

Eye tracking, once the province of £20,000 specialised instruments, now has the attention of all the major smart device players. The question facing us is – are the demonstrations now circulating among trade shows and conventions niche gimmicks, high-wire circus acts, representing the limits of current hardware tweaked to the extreme after much practice and patience, or do they flow surely and easily from our current technologies, achieved inevitably like a child beginning to walk? If the latter, many of us will soon be all too eager to surrender our blinks to our machines.

Image Credit: BYU