A couple of weeks ago Gartner published their yearly “Hype Cycle for Emerging Technologies” report. While analysts have a somewhat mixed reputation for accurately predicting the future, their opinion should certainly be taken into account as one factor when drawing a picture of the future. (And kooaba has been featured in several of the Gartner reports.)
So what do they say about visual recognition? Both this year’s and last year’s Hype Cycles are shown below. While last year image recognition was listed as a separate point, this year it’s “wrapped” under “automatic content recognition”. Both points are still before the peak of inflated expectations, so hype about visual recognition has not reached its peak yet. Interestingly, Augmented Reality, which heavily relies on visual recognition for its future, is beyond the peak already.
Both technologies are estimated to reach the plateau of productivity in 5-10 years, though. That seems like an awfully long time. But when put in context with, e.g., cloud computing (2-5 years, but it’s quite established already) or speech recognition (think Siri, 2-5 years), the timing seems just right to build services in that area. Not too early (like Quantum Computing, more than 10 years), not too late either.
So why does it take 5 or more years until visual recognition is there big time?
One factor is well described in Michio Kaku’s best-selling book “Physics of the Future”. In that book Kaku attempts to predict the next 100 years of technology, arguing that things in fact a) change much faster than we think and b) can be quite accurately predicted by “insiders” with today’s knowledge of science. (Kaku himself is a physics professor.) While Kaku paints an extremely optimistic picture of the future, where self-driving cars, contact lenses with augmented reality, and a widespread presence of robots are just around the corner, he specifically points out the following regarding the state of visual recognition:
“Given the glaring limitations of computers compared to the human brain, one can appreciate why computers have not been able to accomplish two key tasks that humans perform effortlessly: pattern recognition and common sense. These two problems have defied solution for the past half century. This is the main reason why we do not have robot maids, butlers, and secretaries. The first problem is pattern recognition. Robots can see much better than a human, but they don’t understand what they are seeing.”
Basically he’s saying visual recognition is really hard, in fact so hard that it makes it a bottleneck for many other technologies. We couldn’t agree more.
Besides robots, these technologies include real augmented reality, visual search, contextual advertising for images and videos, automatic organization and tagging of your photo and video collections, self-driving cars (to some extent), and more. So the potential for the technology is huge.
We once had a chance to chat with Niklas Zennström, founder of Skype, and he said that he likes what we (at kooaba) do because it’s such a hard problem. We couldn’t say it better, and that’s why we love our work here at kooaba.
In fact, even with all the challenges, significant progress is being made by academic researchers around the world, big companies like Google, and startups like us. In the coming months we plan to add some new technologies to our portfolio, which will enable more applications and bring the future a step closer. We believe 2013 and 2014 will bring amazing progress, moving visual recognition rapidly closer towards the plateau of productivity. This would be a good time to check out our API.