August 21, 2016

Cats & Dogs: An intelligent look at AI

I am not a data scientist or an expert in machine learning. However, I strongly believe that modern approaches to machine learning have been neither “intelligent” nor “learning”. I am not the first person to point this out, but perhaps my approach is novel enough to add some insight. An infant who has seen a dog and a cat a few times would likely be able to point to the correct animal when asked which one is the dog. A machine needs an incredible number of samples to “learn” which is which. You see clear examples of just how unintelligent these systems are in big mistakes like this one from Google, which identified people with darker skin as gorillas. To suppress that result they had to make an exception. This isn’t the same as a parent scolding a child who ran across the street. When a child is scolded, the hope is that they understand the severe danger they can put themselves in and therefore take additional caution. A neural network makes no such distinction; the exception is merely a directive with a higher priority, since the network doesn’t really “know” what it’s looking at.

Others have discussed this topic in passing (here, and here, as well as here), but I haven’t seen an example of human-like learning. Let’s take a look. Using Wikipedia as a simple reference point for the evolution of computer vision made things a great deal clearer to me. The argument is not limited to vision, though for the moment let’s use it as our example.

Let’s separate visual comprehension (such as being able to look at a picture of a dog or a cat and correctly classify it) into two components:

characteristics or traits
colors, textures, and depth.

Today many facial recognition systems use measurements of parts of the face to tell one person from another. For a simpler case, let’s look at the dog and the cat. I think that if you enumerate the characteristics of a dog, ordered from most important to least important, you will end up with a list quite similar to the one for a cat. As you may recall, Google had a little game for improving their image search. One of the things it did was tell you which keywords you could not use. With that in mind, take your dog and cat lists and make a second list for each that contains only the traits the other animal lacks. Those exclusive lists are the unique identifiers that help you distinguish a dog from a cat. These factors may not be the most important traits of the respective animal; however, they are unique to it.
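The two-list exercise above can be sketched in a few lines of Python. This is only an illustration: the trait lists are made up for the example, and `exclusive_traits` and `classify` are hypothetical helper names, not part of any real vision system.

```python
# Hypothetical trait lists, ordered from most to least important.
# The traits are invented for illustration only.
dog_traits = ["four legs", "fur", "tail", "barks", "wags tail"]
cat_traits = ["four legs", "fur", "tail", "meows", "retractable claws"]


def exclusive_traits(own, other):
    """Return the traits in `own` that never appear in `other`, keeping order."""
    other_set = set(other)
    return [trait for trait in own if trait not in other_set]


dog_only = exclusive_traits(dog_traits, cat_traits)
cat_only = exclusive_traits(cat_traits, dog_traits)


def classify(observed):
    """Vote using only the exclusive traits; shared traits carry no signal."""
    dog_score = sum(1 for trait in observed if trait in dog_only)
    cat_score = sum(1 for trait in observed if trait in cat_only)
    if dog_score > cat_score:
        return "dog"
    if cat_score > dog_score:
        return "cat"
    return "unknown"


print(dog_only)                    # traits unique to the dog list
print(cat_only)                    # traits unique to the cat list
print(classify(["fur", "barks"]))  # one exclusive dog trait observed
```

Note that the shared traits ("four legs", "fur", "tail") drop out entirely, which is the point: they may be important to each animal, but they do nothing to separate one from the other.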

If we separated the mechanism that identifies those traits from the traits themselves, we would begin to approach something I would be comfortable calling intelligence. What we lack today is the instinctive, low-level capacity for self-learning. Vision is a single domain, and a very complex one. The point is that we need to work on building the base intelligence rather than the domain-specific intelligence. Natural language processing is awful; I can very easily trick even the most advanced systems out there. I had great fun playing with the publicly available Watson APIs, one of which attempted to identify the “tone” of a document. This is a very tricky task, and the API was quite easily broken. It was thinking like a robot: if I can identify “positive” adjectives or look for “negative” words, I can make assumptions and draw a general inference about the tone. Of course, with a little imagination I can use very beautiful and poetic imagery to describe some very dark stuff! I pretended to be a cannibal writing a letter to someone he wanted as his next meal. It was fun, but also real reassurance that this isn’t intelligence of any sort. It’s an improved Webster’s dictionary.
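To make the failure mode concrete, here is a minimal sketch of the word-counting approach described above. It is not how Watson actually works (I have no knowledge of its internals); the word lists and the `naive_tone` function are assumptions invented for this example.

```python
import re

# Tiny hypothetical lexicons; a real system would use far larger ones.
POSITIVE = {"beautiful", "tender", "delightful", "lovely", "sweet", "warm"}
NEGATIVE = {"awful", "terrible", "hate", "ugly", "horrible", "cruel"}


def naive_tone(text):
    """Label text by counting positive and negative words, ignoring meaning."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(1 for word in words if word in POSITIVE)
    negative_hits = sum(1 for word in words if word in NEGATIVE)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


# A letter in the spirit of the cannibal experiment: pleasant words, dark intent.
letter = (
    "What a lovely evening it will be. You look so tender and sweet, "
    "and dinner with you will be delightful."
)
print(naive_tone(letter))
```

The counter sees four pleasant words and nothing negative, so it reports a positive tone. It has no way to notice who is on the menu.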

What is the new thing that is trying to embrace the world? The virtual assistant. Siri sucks; it’s Apple, what do you expect? Google has gotten better. I’ve read from a number of sources that Hound is supposed to be the next evolution of the VA. One of the main things I used to test was compound questions. I ask “What is the tallest building in the world?” and get the answer “in Dubai”. Then I ask “What is the population of Dubai?”, an easy one as well. But “What is the population of the tallest building in the world?”…ehhh, nope. What it can do, which is an improvement, is remember context. So I can ask what the tallest building in the world is, then ask a follow-up question: what is the population there? It understands that the pronoun refers to the answer to the previous question. This is not about a wealth of information. It also isn’t about natural language processing. It’s about a much more “intelligent” vehicle that drives these basic processes.
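The difference between remembering context and actually composing questions can be shown with a toy sketch. Everything here is hypothetical: the `Assistant` class, the fact table, and the placeholder population value are invented for illustration and do not reflect how Hound or any real assistant is built.

```python
# Hypothetical fact table: question -> (answer, entity to remember or None).
# The population value is a placeholder, not real data.
FACTS = {
    "what is the tallest building in the world": ("in Dubai", "Dubai"),
    "what is the population of dubai": ("<population of Dubai>", None),
}


class Assistant:
    """Toy assistant that remembers one entity from the previous answer."""

    def __init__(self):
        self.last_entity = None

    def ask(self, question):
        normalized = question.lower().rstrip("?").strip()
        # Context memory: rewrite a trailing "there" using the remembered entity.
        if self.last_entity and normalized.endswith(" there"):
            prefix = normalized[: -len("there")]
            normalized = prefix + "of " + self.last_entity.lower()
        hit = FACTS.get(normalized)
        if hit is None:
            return "I don't know"
        answer, entity = hit
        if entity:
            self.last_entity = entity
        return answer


assistant = Assistant()
print(assistant.ask("What is the tallest building in the world?"))
print(assistant.ask("What is the population there?"))
# The compound question is not decomposed into the two steps above.
print(assistant.ask("What is the population of the tallest building in the world?"))
```

The follow-up works because the pronoun is swapped for a remembered string. The compound question fails because nothing in the lookup knows how to answer one question with the result of another, which is the missing "vehicle" the paragraph above is pointing at.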

I hope to discuss more about what this looks like soon. Until then, I have little to no fear: the small steps we take in supervised or unsupervised learning are quite literally teaching the dumb. Our problem isn’t the method of “learning”…it’s our student.