The term Artificial Intelligence (AI) has shifted in meaning since it was first coined, so to describe an intelligence that can match or surpass humans, the term Artificial General Intelligence (AGI) was introduced. AGI refers to an AI that can successfully perform any intellectual task a human can. We don’t have AGI yet, and I anticipate it will be some time before we do. So for now, we will stick to the AI technology we currently have.
AI and Data Quality
Present-day, state-of-the-art AI depends heavily on the quality of its training data. Crappy input results in crappy output and ill-informed decisions; precise, high-quality input fosters precise, high-quality decisions. This is very similar to how we humans rely on information quality when making informed decisions.
That is why dumping a company’s important data into a big data lake, without knowing what goes in, and hoping that AI will somehow become clever by processing all that information, is a false hope. It is also why data analyst jobs are in ever-higher demand, and why data automation tools like Discovery Hub® – which can move, merge and clean data from many sources – are quite logically seen as imperative in the context of AI.
The Wolf Conundrum
To illustrate what we are dealing with, let’s look at the ability of neural networks to identify images. A neural network was developed that was surprisingly accurate at detecting dog breeds from provided images. It successfully matched virtually any dog’s picture to its breed – until, that is, people found that one very clear picture of a husky was recognized by it as a wolf.
Researchers had no idea why it kept failing on this one picture while succeeding on nearly every other picture they tested, including pictures of actual wolves. Unfortunately, it is very difficult to see how AI "thinks" and what it "looks at", so it took a while for the researchers to finally spot the problem. Did the AI look at the eyes, muzzle, or jaw of the dog when drawing its conclusion? Nope.
What they finally found was that the AI looked only at the corners of the picture and ignored the entire dog! What was in the corners? Snow. There was snow in the corners, and nearly every wolf picture shown to the AI during training had snow in the background. Therefore, the AI "realized" that snow means a wolf.
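The snow-means-wolf failure can be reproduced in miniature. The following sketch is entirely hypothetical – the feature names, probabilities, and model are my own invention, not the network from the story – but it shows the same mechanism: when a background feature correlates with the label more strongly than the animal itself does, a model trained on that data leans on the background.

```python
import math
import random

random.seed(0)

# Hypothetical toy dataset: each "image" is reduced to two numeric features.
# The bias is deliberate: snow appears in 95% of wolf photos but only 5% of
# dog photos, while the animal's own trait is a genuine but weaker signal.
def make_dataset(n=400):
    data = []
    for _ in range(n):
        is_wolf = random.random() < 0.5
        has_snow = 1.0 if random.random() < (0.95 if is_wolf else 0.05) else 0.0
        pointed_ears = 1.0 if random.random() < (0.9 if is_wolf else 0.6) else 0.0
        data.append(((has_snow, pointed_ears), 1.0 if is_wolf else 0.0))
    return data

# Minimal logistic regression trained with plain stochastic gradient descent.
def train(data, lr=0.5, epochs=100):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

w, b = train(make_dataset())
# The weight on the background feature ends up much larger than the weight
# on the animal feature: the model has effectively learned "snow means wolf".
print(f"snow weight: {w[0]:.2f}, ears weight: {w[1]:.2f}")
```

Nothing in the code tells the model to prefer the background; the biased data alone pushes it there, which is exactly the point of the wolf story.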
The quality of input data is essential for AI – far more crucial than people usually assume when they see how clever AI can be. You can tell this wolf story to whoever thinks that AI can somehow find patterns in complete chaos. It cannot.
Even high-quality data – tens of thousands of pictures of only dogs and wolves, and not a single picture of a cow pretending to be a dog – can make AI biased. Given that most of the time we cannot know how AI reaches any specific decision, it is critical not to bias it with unreliable training data.
Utilization and Challenges with AI
To complete the picture, we must first understand what information AI can use and, secondly, what the biggest challenge in machine learning is. Without this, it is difficult to grasp the idea of AI.
AI can process any information that can be converted to numbers, e.g. text, database content, images or video (pixels), voice (raw sound samples), etc. Any information can come in and any information can come out, as long as it can be converted to numbers. Since computers store all of their information as numbers, everything our computers store – including the entire content of the Internet – can be processed by AI.
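The "everything is numbers" point is easy to see concretely. A minimal sketch (the 2×2 image and the 8 kHz sample rate are arbitrary illustrative choices, not anything from the story above):

```python
# Text: each character maps to a numeric code point.
text = "wolf"
text_as_numbers = [ord(c) for c in text]
print(text_as_numbers)  # [119, 111, 108, 102]

# Image: a tiny 2x2 grayscale picture, where each pixel is already
# a brightness value between 0 and 255.
image = [[0, 255],
         [128, 64]]

# Sound: one second of audio at a hypothetical 8 kHz sample rate is simply
# a list of 8000 amplitude values; here, a flat (silent) signal.
silence = [0.0] * 8000

# All three are now just collections of numbers, which is all a model ever sees.
print(len(silence), image[1][0])  # 8000 128
```

Whether the numbers started life as letters, pixels, or sound samples makes no difference to the model downstream.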
The data formats of input and output don’t even have to match, e.g. detecting human faces in pictures or a video stream (pixels come in, classification data like text or numbers comes out), text-to-speech synthesis (text comes in, a sound file comes out), etc. Surprising as it is, AI doesn’t really care what comes in and what comes out. A video stream as input is as good as tables from a SQL Server database. From AI’s perspective, it makes no difference.
Despite this flexibility, the biggest challenge in machine learning is telling the AI what we want. The AI needs to know how far its predictions are from our expectations. If what we want is either simple or can be formulated in a simple way, then the AI can automatically check whether its results are right or wrong, and it can train further by itself. We can leave it overnight and, when we come back in the morning, it will have accomplished astonishing results with almost no supervision from our side. Unfortunately, most of what we want from AI is not simple to formulate and requires human intelligence to supervise the training process.
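In machine-learning practice, "telling the AI what we want" usually takes the form of a loss function: a single number measuring how far predictions are from our expectations. A minimal sketch using mean squared error (one common choice among many):

```python
# Loss function: the smaller the number, the closer the predictions
# are to what we wanted.
def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Targets encode "this is a wolf (1.0), this is not (0.0)".
good = mean_squared_error([0.9, 0.1], [1.0, 0.0])  # close to what we wanted
bad = mean_squared_error([0.1, 0.9], [1.0, 0.0])   # far from it
print(good, bad)  # roughly 0.01 and 0.81
```

When a formula like this can be computed automatically, training can run unattended overnight; when what we want resists being written as a formula, a human has to stay in the loop.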
What We Have Learned
AI can use any information stored in a computer or on the Internet to train from, and it can make any predictions or transformations. There are no limits to what can serve as AI’s input and output. It is very difficult, however, to peek inside a neural network and see why it produces a given output from a given input. We may not even know whether a result is biased. This will be a challenge when we let AI make autonomous decisions affecting humans in the future.
We also learned that most of the time we need to supervise the learning process and prepare the training data manually, because there are few tasks where a machine can learn entirely by itself. The quality of AI training data, which only humans can provide, is very important.
In Part Two of this blog series, I will go deeper into the two types of machine learning.