When Microsoft launched the Kinect nine years ago it was touted as a revolution in gaming and more. This turned out to be utter rubbish.
Allowing users to operate and interact with the Xbox 360 using voice and gestures, Kinect won many accolades. USA Today compared it to the futuristic control scheme seen in Minority Report. The New York Times' David Pogue said players would feel a "crazy, magical, omigosh rush the first time you try the Kinect".
While third-party developers found some after-market uses for Kinect's sensor features, the gaming line itself flopped and was eventually discontinued. At the start of 2018, Microsoft finally killed Kinect, just as Apple was parading the iPhone X, whose facial recognition used the same infrared dot-projection tech, acquired through Apple's 2013 purchase of PrimeSense, the company that had created that feature on the Kinect in the first place.
By miniaturising and much improving the infrared dot array for the iPhone X, Apple took a previously failing piece of tech, uprated the spec and found a better use for it. This is a tactic that Google is hoping to emulate with the Pixel 4's “Motion Sense” tech.
Over the last five years, Ivan Poupyrev, the technical projects lead at Google’s Advanced Technology and Projects division, has been working on Project Soli, an effort to miniaturise radar down to the size of a computer chip. Poupyrev, the man also behind Google's Project Jacquard wearable tech that led to the Levi's Commuter jacket, has laboured to create a chip small enough to fit in a phone, one that emits electromagnetic waves capable of detecting movement of less than a millimetre. It can perform this task 3,000 times a second and sounds impressively accurate.
It is this that powers the Pixel 4's Motion Sense, and Google claims the technology works better than 3D cameras. But this is irrelevant.
When the Pixel 4 launches in October, it will have limited gesture controls that will include the obvious options of stopping alarms, skipping songs and silencing calls. Such abilities are at best unremarkable, and if Motion Sense ends there then it will be disappointing in the extreme.
Google is, no doubt, counting on all of us getting swept up in a wave of enthusiasm for this new mobile UI, while developers take gesture control and run with it, dreaming up all manner of hitherto unthought-of interactions that are better suited to hand or head movements than to physical touch.
Sadly, the chances are the immediate future will not be like this. So far, the track record for gesture UI is, well, shocking, riddled with examples where motion detection faltered, camera recognition was laggy, processing speed glacial and, crucially, where it should never have been used in the first place.
Moving on from those heady days of waving your arms maniacally in front of the first Kinect, trying desperately to feel that crazy, magical, omigosh rush, car companies such as BMW employed gesture recognition in vehicles thinking it would be better, or safer, than pressing buttons while driving.
Now, the vast majority of in-car BMW tech works well, but I could never get on with its in-cabin gesture controls. Frequently it misidentified the command I was attempting, forcing me into multiple attempts. But, worse than this, not only was it distracting and thus perhaps not the safer option after all, it also took much longer than pressing a button on the steering wheel, which I was already holding, that did the same job.
In short, the tech should never have been employed in that manner. Voice control in cars? Yes please. Spinning your arms around to turn the volume up and down looking to outside observers like you’re in some sort of desperate mobile silent disco? No thank you.
This falls firmly into that old category of "just because you can do a thing doesn't mean you should". Samsung was so excited it could put Wi-Fi in a washing machine it went ahead and did so, never stopping to think whether it would add any value whatsoever. Now that Google has brilliantly, amazingly shrunk super-accurate radar down to a chip small enough to put in a phone, should it actually do that?
Google’s own promo video championing this shift in mobile tech shows a woman swiping in front of a Pixel 4 floating at head height a few feet away from her. This is deliberate. The misleading art direction hides the simple futility of gesturing to a phone you're already holding.
Let’s ask an expert. Let’s ask the man behind the gesture-control systems dreamt up for both the aforementioned Minority Report and Iron Man, John Underkoffler.
“The first threshold that would recommend or disqualify the use of gesture is whether it's easier, faster and more accurate to use than the existing UI in any device that it is augmenting,” Underkoffler says. “Is it better to raise your hand in front of the phone to change to the next song? Is that better than a button on the screen? It would have to be better than simply using the touchscreen it augments that is already in place.”
“For me, as someone who's dedicated more than a quarter century to working on UI, it's really only interesting if it's more than a gimmick, if it's the first step toward something that looks more like an entire language or an approach to UI. If it's a proposition that stalls at a few gestures that are vague and non-spatial, then ultimately I don't think that will progress the field.”
“Non-spatial” is the key phrase there. Underkoffler’s work on Minority Report focused on a way of efficiently manipulating a vast screen with hundreds of windows and files, one too large to physically interact with using human reach. Likewise, with Iron Man, Underkoffler was tasked with working out how one might manipulate a holographic UI in three dimensions. For both these situations, using gesture control spatially makes complete and utter sense. Conversely, replacing touching a button on a device you are holding with a swipe of the hand in mid-air does not.
“The important problem is the match between what gesture is ultimately very good at and the display space,” Underkoffler says. “Gesture attached to a small device that you have intimate access to because you're already holding it? That's a tricky proposition. What I was working to show in Iron Man, and more fundamentally earlier in Minority Report, was the value of a large display space. The characters are working with immense amounts of data. And the best way for them to literally get their minds wrapped around it is to see as much of it at the same time as possible. That's why the displays are large, or holographic.”
“That gets us right to what is the most powerful aspect of gestural interfaces, which is that they are capable of being spatial. Human pointing is a phenomenal capability. It's wired into all of us and no one has to learn it. By the time you're eight or 12 months old, you know how to point at things that you want. You're trying to communicate your mind and your will, and other people's attention who are watching you, to a distant thing that you can't touch.”
None of this fits the bill of someone brandishing a mobile phone with a six-inch display. And Underkoffler thinks Google could be coming at things from the wrong direction. “If I were working at Google, or Samsung, or any major handset manufacturer, I would turn the problem inside out,” he offers. “I would make the phone itself the instrument of gesture. Attach the precise kind of spatial location and orientation sensing capabilities to the phone so that you can both use it as we know how to use it today, but also as a device that you can point around the room to control various other devices: interact with your TV, other displays or environmental controls. That would be something incredibly powerful.” This indeed sounds very much like a consumer version of the remote wand that controls Mezzanine, the g-speak-based conferencing tech Underkoffler's company Oblong sells to businesses.
However, I can see instances where Motion Sense would be useful in a small space. It could fix the issue of tiny smartwatch screens in one stroke. Gesture control may be comparatively useless on a smartphone, but when there isn’t enough screen real estate then waving, pinching and blinking may be the perfect means to accurately and deftly control a device on the wrist with a 1.3in display. It could even make carrying a phone obsolete.
There is no doubt the technology unveiled here for the Pixel 4 is exciting; after all, depth-sensing tech ended up being a boon for robotics and machine vision. It’s the implementation of it we have to be hopeful for, and not just from Google. Samsung this week announced the new Galaxy Tab S6 with an upgraded S Pen that, via Bluetooth Low Energy remote control, allows gesture control too, which the company is calling “S Pen Air” actions. You can bet your house the imminent Note 10 will have this ability as well.
“There is a temptation to deploy technology solely for technology's sake, or in pursuit of what are effectively marketing concerns,” says Underkoffler. “And we're at such an early stage with the Pixel phone that we're right to wonder if we're in that territory. But clearly this is just the beginning of what's going to happen with Soli embedded into devices.”
“The best outcome would be that this is just the first step. The early iPhone had an incredibly limited vocabulary. It didn't even have copy, cut and paste. But it was enough to get people started. So that's what we don't know yet. We don't know which of those two zones we're in: one of them is disappointing, and we’ve seen it many times. The other would be exciting: that we're in the very modest, earliest days of something that does turn out to be transformative.”
I want this to be the dawn of a new UI, not yet another damp squib. I want to live in a world where whole environments respond to my physical commands. But is Google capable of breaking out beyond the obvious orders of forwarding songs to make the most of this new miniaturised radar tech and change our world, yet again? Fingers crossed, everybody.