Investing in Artificial Visual Intelligence

Written By Jason Stutman

Posted May 19, 2015

With the exception of the brain, the eye is the single most complex organ in the human body.

It’s composed of over 2 million working parts, and the information it retrieves accounts for nearly 65% of the neural pathways in our brain.

In fact, research suggests that 90% of all the information learned throughout our lifetimes comes directly from the eye. The reason, of course, is that the visual world is incredibly dense with information.

In many ways, vision is absolutely essential to intelligence as we know it. Without the ability to perceive and make sense of light waves, we simply would not be as knowledgeable about the world as we are today.

When it comes to artificial and robotic intelligence, the same rules will apply. That is, vision is just as crucial for machines as it is for humans.

The reality is that if we’re ever to develop truly intelligent computer systems, they’ll need to accurately interpret the outside world, just as we do.

Eventually, the machines will do it even better.

Artificial Perception

Machines decoding the physical world into digital information is nothing new. We’re all familiar with Apple’s Siri, which can interpret spoken language, or the mobile app Shazam, which can instantly identify the name and artist of almost any song in the world.

Computer vision systems, though, have yet to become as ubiquitous as machines that can hear. Sure, every modern phone has a camera attached to it, but interpreting the images they collect is an entirely different story.

A good analogy is that of foreign language characters. Anyone with eyes can see Japanese, Korean, or Chinese, but unless you know the language, you won’t actually be able to understand any of it.

When you take a photo with your mobile phone, it doesn’t know what it’s looking at any more than you would understand The Art of War printed in its original Chinese. Your phone might be able to locate your face, but it will locate your dog’s face just as well, and it will do so without knowing the difference.

Your phone does not know if you’re happy or sad. It doesn’t know if you’re dressed well. It doesn’t know if you just cut your hair or not.

With cameras as ubiquitous as they are today, though, this ineptitude won’t last for very long. There is simply too much value in computers that can “see” for this kind of technology to be overlooked.

For instance, we’ve already seen MicroBlink’s “PhotoMath” app, which can take a photo of math problems in a textbook and solve them in seconds.

We’ve also seen the Google Translate app, which can look at real-world text and translate it between multiple languages:

[Image: Google Translate app translating real-world text]

These two apps should be seen as indications of where we’re headed. Of course, it will take more than just recognizing characters for computers to truly become visually intelligent.

Image Recognition

Recognizing characters is easy enough. There are only so many numbers and letters to upload into a computer’s memory bank. The real challenge is breaking down less defined physical imagery.
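To see why a small, fixed alphabet makes character recognition tractable, consider a toy sketch: store one reference bitmap per character and label an unknown glyph with whichever template it differs from least. The 5×3 bitmaps below are invented stand-ins for real glyph data, not any production OCR method.

```python
# Toy character recognition by template matching.
# Each character is a 5x3 bitmap stored as a tuple of rows of '0'/'1';
# an unknown glyph gets the label of the template it differs from least.

TEMPLATES = {
    "0": ("111", "101", "101", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
    "7": ("111", "001", "010", "010", "010"),
}

def distance(a, b):
    """Count mismatched pixels between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(glyph):
    """Return the template label with the fewest mismatched pixels."""
    return min(TEMPLATES, key=lambda label: distance(TEMPLATES[label], glyph))

# A "7" with one pixel flipped is still closest to the "7" template.
noisy_seven = ("111", "001", "011", "010", "010")
print(recognize(noisy_seven))  # → 7
```

With only a few dozen letters and digits, exhaustive comparison like this is cheap; the approach falls apart for open-ended photographs, where there is no finite template set to match against.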

Last week, Wolfram Research revealed the Image Identification Project, an algorithm intended to identify what an image depicts. You simply visit the project’s website and drag and drop an image into the appropriate slot.

If the image is straightforward, it will tell you what you’re looking at:

[Image: Wolfram Image Identification result for a straightforward photo]

If the image is not straightforward, the results are usually still accurate, albeit a bit less descriptive:

[Image: Wolfram Image Identification result for a less straightforward photo]


Now, I’ve chosen the image above not just for comedic purposes but also to point out that Wolfram’s Image Identification Project is still quite basic. Yes, that is indeed a placental mammal, but if you were to ask any human what they see here, the response would obviously be quite different.

The true holy grail of image recognition is a program that could see the image above and say, “A cat holding an AK-47 with a nuclear explosion in the background,” or better yet, “Wow, that’s absolutely ridiculous!”

This is why every year, companies from around the world join together at the Embedded Vision Summit — a technology conference focused entirely on vision-enabled applications — in an effort to push things to the next level.

Beyond Standard Search

At this year’s Embedded Vision Summit, Chinese search engine giant Baidu (NASDAQ: BIDU) may have done just that. Using a massive supercomputer named Minwa, the company claimed a new record on the ImageNet image recognition benchmark.

Beating even Google out for the gold, Minwa recorded an error rate of just 4.58%. For perspective, human error on the same test is about 5.1%.
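Those percentages are simply the fraction of test images the system labels incorrectly. A minimal sketch of how such a score is computed (the labels below are invented for illustration; real benchmarks like ImageNet use thousands of categories and variants such as top-5 error):

```python
# Compute a classification error rate: the fraction of predictions
# that disagree with the ground-truth labels. Labels are made up.

def error_rate(predictions, truths):
    """Return the fraction of predictions that do not match the truth."""
    misses = sum(p != t for p, t in zip(predictions, truths))
    return misses / len(truths)

truths      = ["cat", "dog", "car", "bird", "cat"]
predictions = ["cat", "dog", "car", "cat",  "cat"]

print(f"{error_rate(predictions, truths):.2%}")  # → 20.00%
```

On this scale, the gap between 5.1% and 4.58% means the machine mislabels roughly one fewer image per two hundred than a human does.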

In doing this, the company has moved us one step closer to what you might call “real-world search” — that is, the ability to take a picture of something with your phone and have it immediately tell you what you’re looking at.

But applications for image identification move well beyond mobile apps and search engines.

Autonomous cars, for instance, will benefit immensely from computers that can tell the difference between, say, a harmless plastic bag and a small child in the middle of the road. Security cameras, too, will benefit from advanced image recognition by identifying not only faces but emotions as well.

As far as visual intelligence goes, the technology is still in its infancy. For investors, this is certainly a good thing, as it allows us to get in ahead of the curve.

One route to take is to invest in image recognition software. The other is to take a stake in companies producing physical sensors and cameras.

Keep an eye out as we delve into those companies in the coming weeks.

Until next time,


Jason Stutman
