Are We Closer to Living Life In Layers?
At Mobile World Congress we will have another peek at the future, but not everyone is seeing it with an augmented reality overlay. I have had a sneak peek at some of what lies ahead, and I have a world view on augmented reality that colors everything I see. Welcome to my layer of reality.
I was able to sit down with DigitalOptics, who are introducing their MEMS autofocus lens technology today and discussing a host of visual processing software and visual processor technologies aimed at improving real-time experiences. The mems|cam is a silicon-actuated camera module featuring rapid autofocus, able to move a lens in and out much faster than a traditional voice coil motor (VCM).
Silicon stage (left), spring (middle), and electrostatic comb drive (right) of the mems|cam. The tiny mems|cam has silicon actuators claimed to deliver micron-level positioning accuracy and highly uniform tilt while focusing rapidly, providing faster response time and better accuracy for autofocus in compact smartphones.
The ability to rapidly focus while consuming less power is most useful in video recording, where a combination of intelligent focus and rapid response can deliver smoother autofocus. These visual benefits also matter for rapidly and automatically extracting features in real time. It is very difficult for a camera to focus in odd lighting conditions because it cannot quickly figure out whether the thing it is focusing on is the right thing. Similarly, it is difficult to detect and extract features when they are blurry. Simply put, real-time visual processing and augmented reality based on extracted features need fast focus.
The ideal is to move focus intelligently, completing the shift within the frame-to-frame window of opportunity, before we can notice it. The mems|cam is getting close, with a better chance of completing transitions between frames, or at least settling before the next frame arrives. Tracking moving objects or retaining feature detection while the camera is in motion will benefit from faster response times. In addition, compact designs require accurate positioning between the lens and the sensor, and here the mems|cam appears to excel.
In a full focus mode, with an apples-to-apples comparison using the same autofocus algorithm over 5 autofocus cycles, DigitalOptics claims the mems|cam can complete focus in under 200 ms, compared to 600 ms or more with a VCM.
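To put those focus times in perspective, here is a back-of-the-envelope calculation. The 200 ms and 600 ms figures are the claims quoted above; the 30 FPS video frame rate is an assumption for illustration:

```python
import math

FPS = 30
frame_budget_ms = 1000 / FPS  # ~33.3 ms between frames at 30 FPS

def frames_consumed(focus_time_ms: float, fps: int = FPS) -> int:
    """Whole frame intervals an autofocus sweep spans at the given frame rate."""
    return math.ceil(focus_time_ms / (1000 / fps))

# mems|cam claim: under 200 ms; traditional VCM: 600 ms or more
print(f"frame budget: {frame_budget_ms:.1f} ms")
print(f"mems|cam sweep spans ~{frames_consumed(200)} frames")
print(f"VCM sweep spans ~{frames_consumed(600)} frames")
```

Even at the claimed 200 ms, a full five-cycle sweep still spans several frames, which is why completing smaller focus shifts inside a single ~33 ms frame window remains the ideal described above.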
DigitalOptics data showing the autofocus time of a mems|cam-powered phone compared to phones like the iPhone 5, Galaxy S3, and HTC One X+, all disguised to avoid directly alienating any future customers
DigitalOptics compares mems|cam to VCM
Integrating Feature Recognition into the Image Processor
DigitalOptics is working on visual processing core logic with the goal of moving image processing code into the visual processor, improving feature detection and classification performance while reducing the compute required of the application processor.
Mainstream applications for these technologies require a combination of capabilities: firmware that talks to accelerating hardware in the image processor to create experiences otherwise out of reach of today's ARM-based application processors, not unlike video decoders and the image processors in everyday mobile cameras.
DigitalOptics demonstrated one such application, Face Beautification, currently shipping in OEM handsets in China. Combining real-time facial recognition at oblique angles, target region decomposition, targeted image enhancement including color correction and softening, and finally augmented overlay and display yields something extraordinary: better looking people. DigitalOptics Face Beautification makes a phone that always sees a more beautiful you and makes your friends better looking in real time. The effect is superior to mere softening, blending, or color correction because it combines a contextual understanding of which facial feature it is working on and applies algorithms appropriate to each one.
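The region-aware approach is easy to illustrate. The sketch below is not DigitalOptics' pipeline; it is a minimal numpy stand-in in which a hypothetical hard-coded box plays the role of a detected facial region and a simple mean filter plays the role of the softening step, blended back so only the targeted region is enhanced:

```python
import numpy as np

def soften(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Crude softening: a k x k sliding-window mean filter on a grayscale patch."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def beautify(img: np.ndarray, region: tuple) -> np.ndarray:
    """Soften only inside a detected region, then blend with the original.

    `region` is (top, left, height, width). In a real pipeline this box would
    come from a face detector; here it is a hypothetical hard-coded stand-in.
    """
    y, x, h, w = region
    out = img.astype(float)
    patch = out[y:y + h, x:x + w]
    # 50/50 blend of original and softened pixels, applied only in the region
    out[y:y + h, x:x + w] = 0.5 * patch + 0.5 * soften(patch)
    return out

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64)).astype(float)  # stand-in for a video frame
result = beautify(frame, (16, 16, 32, 32))
```

The context-free alternative, blurring the whole frame, would smear everything; restricting the enhancement to a classified region is what lets per-feature algorithms (eyes, skin, lips) be applied selectively.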
Deploying feature detection and classification while executing interesting algorithms at 1080P and 30 FPS is no small feat. Yet, it is this capability that will make augmented reality viable, just as moving video decode to specialized processing blocks accelerated delivery of video to mainstream platforms. The difference, this time, is that the PC did not blaze the trail – these algorithms are being reduced to specialized silicon on mobile platforms first.
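The scale of that feat is easy to quantify with simple arithmetic on the 1080p, 30 FPS figures above:

```python
width, height, fps = 1920, 1080, 30  # 1080p at 30 FPS, as cited above

pixels_per_frame = width * height            # pixels to process per frame
pixels_per_second = pixels_per_frame * fps   # raw sustained pixel throughput
frame_budget_ms = 1000 / fps                 # time for ALL per-frame work

print(f"{pixels_per_frame:,} pixels per frame")
print(f"{pixels_per_second:,} pixels per second")
print(f"{frame_budget_ms:.1f} ms to detect, decompose, enhance, and overlay each frame")
```

Even a single operation per pixel is over 62 million operations per second before any detection or classification begins, which is why reducing these algorithms to dedicated silicon matters.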
Real-time beautification is a very mainstream way to justify a lot of really cool image processing.
That’s augmented reality at work – making your world slightly better in an almost invisible layer upon the real world. Anyone who uses their cell phone as a mirror or has friends who prefer better looking memories will instantly get it.
The depth of DigitalOptics' IP portfolio for real-time image processing is impressive, with sophisticated logic blocks supporting hardware-accelerated algorithmic detection, decomposition, definition of regions of interest, geometric rectification, and region-specific algorithmic processing. Making people beautiful is a wonderful application of this technology, but it holds more promise and sophistication than what you might see on the surface.
Frameworks, Fashion, and Maps
Augmented reality is like an awkward teenager trying to fit in and solve the problem of social integration and, ultimately, distinction: making experiences mobile enough to be accessible, inconspicuous enough to be socially acceptable, yet exceptional enough to make you aspire to own them. But as any teenager will tell you, you can barely wear a hoodie, black trench coat, or nose ring in public without undue scrutiny, much less the world's most cybernetic HUD-enabled spy glasses, however fashionable the design or attractive the advertising model.
Augmented reality requires the health and advancement of an interwoven ecosystem. Progress in augmented reality requires advancement in social integration (mobility and social acceptability) and in technology: cameras, visual processing, application processing, cloud connectivity, geospatial data, and marker-based and markerless tracking.
Right now, the space suffers consumer-limiting deficits. Further, the assumptions generally made about how these are overcome are frustrating. For example, facial recognition is useful for detecting faces for image enhancement, but will be completely shut down when we try to use it to map social identities. The thought of Jenny's or Tommy's facial data permanently floating around the web, bound to their identity data, will make identity theft look like child's play – and Europe is already shutting it down.
An augmented overlay (or even well-blended enhancement) for places – you know, businesses, geospatial offers, geocaches, or spontaneous meet-ups – while intriguing, is up against a monopolist who is tough to beat. So far, the map seems to trump the augmented overlay in virtually all use cases except the randomly wandering lost tourist on foot, looking for something important like an ATM, café, or public restroom, and not minding that they appear to be taking a photo of everything around them with a cell phone.
Maps can be used in cars. Maps can be used on phones. Maps show everything around you at once, as if you had a 360 degree eyeball with perfect depth perception. Maps are amazing, even the ones that don't automatically update your position or offer full search and immediate placement of found targets. This is utterly disappointing, as the map is really old technology and ought to have been outdone long ago. But alas, it has not, and in its current mobile + cloud implementation it has become spectacularly useful and popular.
Augmented Reality and Glass
Also likely to make a flyby at Mobile World Congress is Google Glass, which is not solely for augmented reality, yet it inherits all of the challenges and deficits whether it wants them or not. I am a fan of Glass like I am a fan of Jackass – you know the guy in the shopping cart is gonna hit the cement wall head-on, but you respect the fact that he has got to know that he is about to hit the cement wall head-on. Google is going straight at the problem of the norms associated with head-mounted displays and cameras and saying !*@#&$ it, we are just going to do it.
If Sergey Brin wearing Google Glass is news then the world may not yet be ready for wearable computing. If Sergey would like to know the alternatives I’ll be in California this weekend and I would be glad to stop by.
But the problem is not only one of norms. Augmented reality of place and thing is not particularly enticing, and is already resolved by your existing mobile phone. What's more, placing an overlay over your eye costs you a lot of style points. Today's socially aware youth are a few steps closer than previous generations, but they are remarkably sensitive to the challenges posed by an always-on, social surveillance society. Today, it is not OK to film everyone all the time, even if you are there. Acceptable contexts begin with those where you would pull out your old-school camcorder or smartphone: recording the noteworthy and fun moments of people with whom you have a legitimate or spontaneous social connection. The more limited the acceptable use scenarios of Glass, the more its use is relegated to niche activities or quality time standing on a rooftop peering down at Gotham.
In the pessimistic case, placing the screen in my hand avoids most of the embarrassment of headgear while enabling me to capture and augment the moments that matter. Glass may actually disguise the fact that there is a camera attached to your head, and over time, this disguise may become plausibly deniable urban camouflage.
I am not a distant onlooker who thinks it is professionally acceptable to criticize Google Glass while sitting on my Aeron doing nothing to push these use cases forward. However, we must acknowledge the challenges and not simply try to market our way around them. Our society will retain conservative habits until enough progress, erosion, and aesthetic improvement has occurred and we become comfortable with change.
Wearing cybernetic headgear is going to be a problem for the next few years. There are alternatives, but these may be no less cavalier in their assumptions about what society will at first put up with and ultimately come to love. Look at tablets, a non-existent category before 2010 and now everywhere. Looking ahead, Apple is working on a wearable iOS device, I hesitate to call it a watch, and everyone seems to believe we’ll all have one a week after launch. Both of these categories appear nearer to consumers and less aggressive in their assumptions for evolving norms and social sensitivities. Yet disruptions will happen, and chances are, they are upon us and we are simply blind to what is right in front of our nose.
This brings us back to the brilliance of using existing socially acceptable platforms that are coupled with super smart logic and improved hardware to get around the problem. Making people look prettier is awesome if not particularly hardcore augmented reality.
The aspirations of augmented reality are far-reaching, from advertising to gaming to social to imaging. Placing messages, social overlays, My Little Pony toys, and real dragons in augmented layer space is fundamentally cool, but will benefit from an irresistible raison d'être before it can break through. Many AR firms want advertisers to believe that people will look at ads if they do something really cool as soon as you point your phone at them. As I try to tear my children away from the free-to-play but completely unaugmented 'Littlest Pet Shop' mobile app, I often wonder whether it could possibly be any harder if the pink poodle were actually sitting in an augmented layer in the same room. Hard to say, and frankly, hard to imagine.
Placing hardware processing in support of augmented layers in image processing core logic is a big deal. As we consider a future of augmented reality applications and platforms, multiple sensors simultaneously expanding our world view, and several cameras all simultaneously active and observing, we are going to need efficient processing of real-time visual data in ways that would give today's PCs a real headache. We are on the cusp of fusing interesting data with interesting people and places to construct layer upon layer of experiences that transform daily life and our moment-to-moment perception of real space. Augmented reality is about to have its moment, if only the titans figure out what, and the upstarts start to tell them how and why.
Simon Solotko develops augmented reality experiences and technology as the President of All Future Parties, and virtual reality solutions as an advisor to Sixense Entertainment. He has been neglecting his post at Bright Side of News, spending his days hard at work building Community as a Service with SixDI, an innovative solution for rapidly reaching and engaging large audiences. At some point he’ll write a story on technology careers in the post-career era. He’ll talk to almost anyone except the real crazies. Feel free to reach out to @solotko on Twitter.