Sunday, April 4, 2010

Renaissance of computer vision

Before I get into the renaissance, I think I should explain what computer vision is all about and why either the field itself or its renaissance interests me. Here is a simple reason why: computer vision is close to me because it was the topic of my doctoral thesis. I did my PhD in computer vision and image processing through the better part of the 1990s. But what is computer vision? Well, very loosely speaking, it is the field of science concerned with getting computers to do everything that our eyes do. There is an element of all of it - seeing, observing, analysing, relating - all for improving overall understanding. They say, "A picture is worth a thousand words", but really only for humans. When the maxim is applied to computers trying to do what humans do, even today we may many times be better off with words ;-)
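To make that idea a little concrete, here is a minimal sketch (in Python, with toy data and function names of my own choosing, not from any particular library) of one of the most basic things computer vision does: detecting edges, the places where image intensity changes sharply, which is a first step toward "seeing" shapes.

```python
# Minimal edge detection on a toy grayscale image (plain Python, no libraries).
# Each pixel is an intensity from 0 to 255; an "edge" here is simply a large
# jump between a pixel and its left neighbour.

def horizontal_edges(image, threshold=50):
    """Return a same-sized grid marking 1 wherever intensity jumps sharply."""
    edges = []
    for row in image:
        edge_row = [0]  # first column has no left neighbour
        for x in range(1, len(row)):
            diff = abs(row[x] - row[x - 1])
            edge_row.append(1 if diff > threshold else 0)
        edges.append(edge_row)
    return edges

# A toy 4x6 image: a dark region on the left, a bright region on the right.
toy_image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]

edge_map = horizontal_edges(toy_image)
# The vertical boundary between dark and bright shows up as 1s in column 3.
```

Real systems use far more robust operators (Sobel, Canny and the like), but even this toy version shows the flavour: the computer is not given "there is a boundary here", it has to compute it from raw intensities.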

Humans have always been fascinated by the exploration of the unknown, and almost always there are certain innovations that enable a plethora of research on a particular topic around that time epoch.

Let's look at the timeline. Getting computers to do what human eyes do could not, for a start, have been possible without computers. So this field could not have existed or flourished before the 1970s, when computers did not exist in abundance. The bursting of personal computers onto the scene around the 1980s paved the way for the existence of computer vision. Also, in the decade of the 1980s, a certain David Marr at MIT applied cognitive theory and neurological studies to understand how computers could be used to mimic the human visual system. David Marr and his contemporaries were primarily from established fields of science, exploring whether computers could be applied to understand their fields better.

The 1990s were, then, that romantic time for computer vision research - a time when a fundamental theory of the human visual system was in place, offering enormous challenges and opportunities for refining those theories and exploring new ones. So, in that way, I was lucky to have belonged to the right time! We are talking of a duration of only around 10-20 years that saw rapid strides in the field of computer vision. By the time I joined my PhD program it was a "happening" field, but by the time I graduated, although it still was "happening", a lot of research was already in place and the practitioners had started talking of 'saturation' of the field. There was very little left to innovate from an algorithmic point of view, or so it was believed then.

Whenever we talk of a particular discipline, it is important to talk of the related disciplines that either enhance or hinder its progress. Hardware, in the case of computer vision, was one such limitation then. A real-time replication of a human eye required both hardware capable of seeing as fast as human eyes do and a computer whose software was capable of thinking as fast as human brains do. Generally, progress is only as fast as the weakest link, and computer vision took a (relatively speaking) back seat in frontline science.

While inter-disciplinary dependencies define the growth of a field, I would also like to draw your attention to my earlier comment about new innovations preceding a huge spurt of research in related fields. There are two major revolutions that we should look at in this context.

First, let us not forget that 1991 was the year the Internet opened up to the wider public and 1994 was the year the Web took off. These two fascinations, which had escaped even the science fiction writers of the 1970s, impacted human society world-wide in a big way. Many new innovations happened around the web, and computer vision benefited from this too, just as all other fields did.

Second, around 2000, telecommunication got a big boost through mobile phones. Mobiles reached many consumers who had been deprived of telecommunications before, and that enabled research around mobile phones. Mobile phone features are mostly driven by consumer interest, and music and video remained the top two (outside of it first being a phone).

Semiconductor research boomed around the same time and enabled the manufacture of extremely sleek, high-resolution CCD image capture devices on something as portable as a mobile phone - something that, 10 years earlier, was difficult to manufacture in volume, irrespective of size! Today it is estimated that mobile phones constitute almost 80% of the total CCD image sensor market, and almost 50% of image sensor revenue in the next 5 years is likely to be enabled by mobile phones!

Now let me come to the renaissance. Computer vision is back, and back to benefit a large part of human civilization, through devices that have both the hardware and software capability (that I talked of before) in abundance on a small device. What is then required is only a market that comes up with requirements to showcase these capabilities. Gaming is a huge market in telecommunications today, and whether it is motion-based sensors such as accelerometers or 3-D games, they feed on the work of 30-40 years of computer vision research. Whether it is open-source Android-based phones, closed-source Apple iPhones, or even Google App Engine in the cloud, all are enabling a huge interest in machine or computer vision.

The field that looked saturated 10 years ago is back in demand - maybe not for its research potential, but for its potential to put that research to use for humans. Either way, computer vision as a field would benefit and would be richer.

I was part of the golden years of technology evolution through the 1990s, and continue to be so in the early part of the 21st century. I benefited from the field that evolved at that time around the evolution of computers, semiconductors, the Internet and the global sharing of knowledge. I have benefited from having worked with mobile phones and wireless communication through their evolution. Hopefully, I can now benefit the community at large with this renaissance!
