
Introduction to Neural Networks Part 1 : Concepts

Monday, December 14th, 2009

Neural networks might sound complicated and abstract (and indeed they can be), but they can also be very useful in solving a wide variety of problems in the field of Artificial Intelligence and Machine Learning, such as product recommendation and categorization. Let’s not be mistaken: these areas can be very big business, as demonstrated by the Netflix Prize. Whilst the simple examples in this series would not have netted you the $1 million prize, it will introduce some of the basics behind this fascinating subject and give you some code to start programming with.

Before we start, I want to give a very quick definition of a Neural Network. It is far from being accurate and all-encompassing, but it will give you enough to get you through this tutorial. For now, this is what you need to know:

A Neural Network is a “machine” which takes an input and approximates a desired output, based on what it has been “trained” to do.

Symbol Processors and Elementary Mathematics

When we were first taught about basic functions in Maths class, we were asked to think of them like a “machine” or a “black box”. You feed a number into this box, and receive a different number out, depending on what the function was supposed to do. For example, a function named “Plus 2” would take any input and return that input + 2.

A simple "function" as taught in elementary Maths classes. Notice the function being approximated is static and hard-coded

This is what we’ve all implicitly known as a Symbol Processor. Feed in a symbol, operate on it, receive a new symbol.
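As a small sketch, the “Plus 2” machine could be written as a Groovy closure. The name plusTwo is just an illustrative choice; the point is that the behaviour is fixed in the code itself:

```groovy
// A hard-coded "machine": feed a number in, get that number + 2 out.
def plusTwo = { x -> x + 2 }

println plusTwo(3)                    // prints 5
println([1, 2, 3].collect(plusTwo))   // prints [3, 4, 5]
```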

Not too far down the road we were shown more complex examples, such as non-linear functions. Using a suitably complex non-linear function, you could get your “machine” to return any output you see fit. We were also shown functions that had more than one input:

Same again, but non-linear and multiple inputs

On a side note, this is how the preceding function might be written in Groovy:

// f(x, y) = x^-2 + e^y, applied to each [x, y] input pair
[[1, 8], [2, 7], [3, 6]].collect { it[0] ** -2 + Math.E ** it[1] }

Learning through doing

But what if we want to describe a particular function for which we do not have an equation, but only a set of example data? That is, we know what output values we want to receive from a particular set of inputs, and would like the function to approximate this (and fill in the gaps as best it can).

This is where Neural Networks come into play. Think of one of these as a moldable function, which we shape through training.

For example, let’s pretend we wanted to create a function which simulated the thought processes of a pet gorilla. It takes inputs in the form of thoughts and ideas, and returns what it thinks is acceptable behavior:

| Concept(s) | Rating |
|---|---|
| People | Good |
| Eating | Good |
| Eating, People | Bad |
| Eating, Food | Good |
| Talk | Good (I'm an excellent monkey-trainer!) |
| Talk, [to] People | Good |
| Talk, [to] Food | Bad |
| Talk, [to] People, [whilst] Eating | Bad (Very bad manners) |
| Talk, [to] Food, [in front of] People | Disaster! |

This big mess would require quite some algebraic function to replicate! What would that even look like if we were to build it in Mathematics? We could, on the other hand, use a lookup table to model this, but then any in-between state would be ambiguous. Life is never black and white.
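To make the lookup-table idea concrete, here is a rough sketch in Groovy. The names ratings and rate are illustrative only, and the rows are simply the gorilla's rules keyed by the set of concepts involved. Any combination that was never listed has no answer at all:

```groovy
// The gorilla's rules as a plain lookup table.
// Keys are sets of concepts, so the order of the concepts does not matter.
def ratings = [
    (['People'] as Set)                  : 'Good',
    (['Eating'] as Set)                  : 'Good',
    (['Eating', 'People'] as Set)        : 'Bad',
    (['Eating', 'Food'] as Set)          : 'Good',
    (['Talk'] as Set)                    : 'Good',
    (['Talk', 'People'] as Set)          : 'Good',
    (['Talk', 'Food'] as Set)            : 'Bad',
    (['Talk', 'People', 'Eating'] as Set): 'Bad',
    (['Talk', 'Food', 'People'] as Set)  : 'Disaster!'
]

// Look up a combination of concepts; returns null if it was never listed.
def rate = { List concepts -> ratings.get(concepts as Set) }

println rate(['People', 'Eating'])   // a listed combination
println rate(['Talk', 'Eating'])     // never listed, so the table cannot say
```

The second lookup is the weakness: the table can only repeat what it was told and cannot make a sensible guess for anything in between.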

The astute amongst you might have noticed the clue within the title. Neural Networks are how the living brain performs computation, and Artificial Neural Networks (ANNs) are our way of harnessing that technique for computational purposes.

Similarly, our brains were given a training set - a set of rules which we had to learn similar to the above – there was never any mathematics involved.

Inside the Box

So what is an ANN? Well, what it looks like does depend on what you want it to do (indeed, not all parts of the brain look the same, so why can’t we cheat as well?). Generally speaking, an ANN contains neurons and synapses (just like a human brain), with each synapse linking two neurons together, one at either end. As with normal brain activity, a “charge” is passed from neuron to neuron via these synapses. The degree to which this transmission occurs is specified by the weight of the synapse, i.e. a synapse with weight w=0.0 will transfer no charge, whereas one with w=1.0 will transmit all of it.
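One simple way to picture a weighted synapse is as a multiplication of the incoming charge by the weight. Treat this as an assumption for illustration (the closure name transmit is made up), as the precise mathematics is left for Part 2:

```groovy
// Charge arriving at the far end of a synapse, modelled (as an assumption)
// as the source neuron's charge scaled by the synapse weight.
def transmit = { charge, weight -> charge * weight }

println transmit(1.0, 0.0)   // weight 0.0: no charge gets through
println transmit(1.0, 1.0)   // weight 1.0: all of the charge gets through
println transmit(1.0, 0.5)   // anything in between passes only a fraction
```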

The human brain contains millions of such networks (or one big network if you think of it like that), but when using ANNs we can simplify to just a handful:

A small Artificial Neural Network

The network above is what is known as a Multilayer Perceptron (MLP), and consists of at least 3 layers:

  • An input layer which represents the inputs to the function
  • An output layer which represents the outputs of the function
  • One or more hidden layers which encode the complexity

Using a suitably complex network, which has previously been exposed to appropriate training data (to be explored in Part 2), the following sequence takes place:

  • We decide what the network represents. For the network shown above, we may wish to say that the 1st input means “Eat” and the 2nd means “People”. The output would mean “Good” or “Bad”
  • We show our inputs to the input layer. Each input takes the form of a number between 0 and 1. For example, inputting a charge of [1, 0] would mean “Eat”, “Not People”
  • The charge from these neurons is propagated to the next layer, depending on the weight of each individual synapse. This is where the magic happens.
  • The hidden neurons pass on their signals to the output layer, once again based on the weight of each connecting synapse.
  • The signal reaches the output layer, where we can extract our result.

"Charged" neurons, shown glowing in yellow

The key thing to keep in mind here is that what you get out of the network is entirely dependent on the weights of the individual synapses, different combinations of which can produce entirely different outcomes. In effect, the knowledge the network stores IS the network itself. Complexity increases with the number of neurons and synapses, so the larger the network, the more patterns we can approximate.
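The sequence described above can be sketched in a few lines of Groovy. Several things here are assumptions made purely for illustration: the network shape (2 inputs, 3 hidden neurons, 1 output), every weight value, and the use of a sigmoid to squash each neuron's charge into the range 0 to 1. None of them are taken from the figure or from the later parts of this series:

```groovy
// Assumed activation: squashes any weighted sum into a charge between 0 and 1.
def sigmoid = { x -> 1 / (1 + Math.exp(-x)) }

// One layer: each row of layerWeights holds the synapse weights feeding one neuron.
def layer = { List inputs, List layerWeights ->
    layerWeights.collect { row ->
        sigmoid((0..<inputs.size()).sum { i -> inputs[i] * row[i] })
    }
}

// Inputs -> hidden layer -> output layer; returns the single output charge.
def forward = { List inputs, List hiddenWeights, List outputWeights ->
    layer(layer(inputs, hiddenWeights), outputWeights)[0]
}

// Hypothetical weights: 3 hidden neurons with 2 incoming synapses each.
def hiddenWeights = [[0.5, -0.4], [0.9, 0.1], [-0.3, 0.8]]
// Two alternative sets of output synapses (1 output neuron, 3 incoming synapses).
def outputWeightsA = [[2.0, -1.0, 1.5]]
def outputWeightsB = [[-2.0, 1.0, -1.5]]

def eatNotPeople = [1, 0]   // "Eat", "Not People"
println forward(eatNotPeople, hiddenWeights, outputWeightsA)
println forward(eatNotPeople, hiddenWeights, outputWeightsB)
```

The same input produces a charge above 0.5 with the first set of output weights and below 0.5 with the second, which is the sense in which the weights are the knowledge.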

Training

Because the weights of the synapses in an ANN are randomized at first, the network will most likely produce entirely the wrong result. To combat this, we train the network. When showing inputs to the network and observing an output, we decide how close this is to our desired output. For example, if we stimulate “Eat, People” and the network returns “Good” when we wanted “Bad”, then we tinker with the weights in the network ever so slightly in order to get closer to the desired outcome. Just like training a real gorilla, after many repetitions of “Show, Observe, Adjust Behavior” the network learns the data.
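As a taster of the “Show, Observe, Adjust” loop, here is a deliberately tiny sketch. It is not a full Multilayer Perceptron: it trains a single neuron using the classic perceptron learning rule, with an extra bias term that this article has not introduced, and with fixed starting weights instead of random ones so that the run is repeatable. All names and values are illustrative assumptions:

```groovy
// Three rows of the gorilla's rules; inputs are [eat, people], 1 = "Good", 0 = "Bad".
def trainingSet = [
    [inputs: [1, 0], target: 1],   // Eating         -> Good
    [inputs: [0, 1], target: 1],   // People         -> Good
    [inputs: [1, 1], target: 0]    // Eating, People -> Bad
]

def weights = [0.0, 0.0]   // fixed start for repeatability (normally randomized)
def bias = 0.0             // assumed extra term, not covered in the article
def learningRate = 0.5     // how hard we "tinker" on each mistake

// Observe: the neuron fires (1) only if its weighted sum is positive.
def predict = { inputs ->
    def sum = bias + weights[0] * inputs[0] + weights[1] * inputs[1]
    sum > 0 ? 1 : 0
}

// Show every example, observe the answer, adjust the weights if it was wrong.
20.times {
    trainingSet.each { example ->
        def error = example.target - predict(example.inputs)
        weights[0] += learningRate * error * example.inputs[0]
        weights[1] += learningRate * error * example.inputs[1]
        bias += learningRate * error
    }
}

trainingSet.each { example ->
    println "${example.inputs} -> ${predict(example.inputs) == 1 ? 'Good' : 'Bad'}"
}
```

Twenty passes are more than enough for these three rows. Richer rule sets need hidden layers and a gentler adjustment rule.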

And this relates to the real world... how?

Simple. For example, you can quite quickly build a function which outputs a list of films your customers might like to rent, based on ones they have already rented. All you have to do is get data in the form “She rented film X, then rented film Y”, continually show it to the network, and sit back and watch it learn (OK, it might not be that simple, but my point stands).
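One possible way to turn “rented X, then rented Y” records into something a network can be shown is to give every film its own input and output neuron. The sketch below assumes that encoding, and the catalogue, rentals and names are all made up for illustration:

```groovy
// Hypothetical catalogue: each film gets one input neuron and one output neuron.
def catalogue = ['Alien', 'Brazil', 'Casablanca']

// Hypothetical rental history, each pair meaning "rented X, then rented Y".
def rentals = [['Alien', 'Brazil'], ['Brazil', 'Casablanca']]

// Encode a title as a list of charges: 1 for that film, 0 for every other film.
def oneHot = { title -> catalogue.collect { it == title ? 1 : 0 } }

// Each training example pairs "what was rented" with "what was rented next".
def trainingSet = rentals.collect { pair ->
    [inputs: oneHot(pair[0]), target: oneHot(pair[1])]
}

trainingSet.each { println it }
```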

And talking of Netflix, we’ve arrived back at the start.

That’s where I will leave you for today. In Part 2, I will examine more closely how these networks operate, and the mathematics behind them. In Part 3, we will be programming a basic Multilayer Perceptron in Groovy. Finally, Part 4 will discuss some alternatives to MLPs and their uses.