A **neural network** is a network of interacting **processing units**. Traditional neural networks tend to have a small number of hidden layers, guided by the Universal Approximation Theorem, and operate on hand-engineered features. Modern neural networks combine stochastic techniques and biology-inspired techniques with many hidden layers, learn features automatically, and are referred to as **deep learning**.

#### Feedforward Neural Networks

Feedforward networks can be viewed as extensions of logistic regression. A simple example uses a single hidden layer of $H$ hidden units over input data with $d$ dimensions. Such a network has:

$H$ $d$-dimensional weight vectors, one per hidden unit

One $H$-dimensional vector of output weights

In general, $h_k(x)$ and $f(x)$ are non-linear functions of $x$. You can add more hidden units and you can add more hidden layers.
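The forward pass of such a single-hidden-layer network can be sketched as follows; the sizes, random weights, and choice of tanh hidden units are illustrative assumptions, not prescribed by these notes.

```python
import numpy as np

# Hypothetical single-hidden-layer feedforward network:
# d-dimensional input, H hidden units, one scalar output.
rng = np.random.default_rng(0)
d, H = 3, 4

W = rng.normal(size=(H, d))   # H d-dimensional weight vectors (input -> hidden)
b = np.zeros(H)               # hidden biases
alpha = rng.normal(size=H)    # H-dimensional output weight vector

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = np.tanh(W @ x + b)    # h_k(x): non-linear hidden activations
    return sigmoid(alpha @ h) # f(x): non-linear output in (0, 1)

x = rng.normal(size=d)
print(forward(x))             # a value in (0, 1)
```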

#### Relationship with Logistic Regression

A trivial neural network with no hidden layers reduces to a logistic regression function $f(x)$. With one hidden layer, the output unit is itself a logistic regression over the hidden activations:
$f(x) = \frac{1}{1 + e^{-\sum_{k=1}^{H} \alpha_k h_k(x)}}$, where $h_k(x)$ is the output of the $k$-th hidden unit and the $\alpha_k$ are output weights to be learned.

The complication of neural networks is that their training objectives are not guaranteed to be convex. Feedforward networks are also easier to overfit, since adding hidden units increases capacity. The advantage over logistic regression is that neural networks are inherently more expressive.
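To make the reduction concrete: with no hidden layer, the sum runs over the raw inputs, which is exactly logistic regression. The weights below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With no hidden layer, the "features" are the inputs themselves,
# so f(x) is plain logistic regression (alpha is a made-up example).
alpha = np.array([0.5, -1.0, 2.0])

def f(x):
    return sigmoid(alpha @ x)  # f(x) = 1 / (1 + e^{-sum_k alpha_k x_k})

print(f(np.ones(3)))           # sigmoid(0.5 - 1.0 + 2.0) = sigmoid(1.5)
```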

### Activation Functions

The processing unit inside the network applies an **activation function** to inputs.

#### ReLUs

The rectified linear unit (ReLU) computes $\mathrm{ReLU}(x) = \max(0, x)$: positive inputs pass through unchanged, negative inputs are clamped to zero.

#### ELUs

The exponential linear unit (ELU) computes $\mathrm{ELU}(x) = x$ for $x > 0$ and $\alpha(e^x - 1)$ otherwise, giving smooth, non-zero outputs for negative inputs.
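Both activations are one-liners; a minimal sketch (the test inputs are arbitrary):

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x) elementwise
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha * (e^x - 1) otherwise
    return np.where(x > 0, x, alpha * np.expm1(x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))  # [0. 0. 3.]
print(elu(x))   # negative inputs map to alpha * (e^x - 1)
```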

### Learning

Let's say we have $d$-dimensional data.

For a single layer of $H$ hidden units, we need to learn:

$d$ biases, one for each visible unit

$H$ biases, one for each hidden unit

$d \cdot H$ weights, one for each edge between them

$H$ probabilities $p(\underline{h}=1)$, the probability that each hidden unit is activated

$d$ probabilities $p(\underline{v}=1)$, the probability that each visible unit is activated
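The bias and weight counts above can be tallied directly; the sizes here are hypothetical (e.g. $28 \times 28$ pixel inputs):

```python
# Parameter count for one layer of H hidden units over d-dimensional data.
d, H = 784, 100          # hypothetical sizes

n_visible_biases = d     # one bias per visible unit
n_hidden_biases = H      # one bias per hidden unit
n_weights = d * H        # one weight per visible-hidden edge

total = n_visible_biases + n_hidden_biases + n_weights
print(total)             # 784 + 100 + 78400 = 79284
```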

#### Backpropagation
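Backpropagation computes the gradient of the loss by applying the chain rule from the output layer backwards through the hidden layers. A minimal sketch for the single-hidden-layer network, assuming a sigmoid output, tanh hidden units, and cross-entropy loss (all choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, H = 3, 4
W = rng.normal(scale=0.1, size=(H, d))   # input -> hidden weights
b = np.zeros(H)                          # hidden biases
alpha = rng.normal(scale=0.1, size=H)    # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, y, lr=0.1):
    """One gradient-descent step on a single example (x, y); returns the loss."""
    global W, b, alpha
    h = np.tanh(W @ x + b)                       # forward: hidden activations
    f = sigmoid(alpha @ h)                       # forward: output
    loss = -(y * np.log(f) + (1 - y) * np.log(1 - f))
    # backward: for sigmoid output + cross-entropy, dL/d(pre-output) = f - y
    delta_out = f - y
    grad_alpha = delta_out * h
    delta_h = delta_out * alpha * (1.0 - h**2)   # tanh'(z) = 1 - tanh(z)^2
    grad_W = np.outer(delta_h, x)
    alpha -= lr * grad_alpha
    W -= lr * grad_W
    b -= lr * delta_h
    return loss

x, y = np.array([1.0, -1.0, 0.5]), 1.0
losses = [step(x, y) for _ in range(50)]
print(losses[0], losses[-1])                     # loss shrinks over training
```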

## Basic Architectures

### Restricted Boltzmann Machines

Restricted Boltzmann Machines (RBMs) form a bipartite graph with hidden and visible units. Nodes in the same layer are *restricted* from having dependencies or connections to each other. Many neural networks are analogous to stacks of RBMs ^{1)}.
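The payoff of the bipartite restriction is conditional independence: given the visible units, every hidden unit can be sampled independently, and vice versa, which makes block Gibbs sampling cheap. A sketch with hypothetical sizes and weights:

```python
import numpy as np

rng = np.random.default_rng(2)
d, H = 6, 3
W = rng.normal(scale=0.1, size=(H, d))  # one weight per visible-hidden edge
b_v = np.zeros(d)                       # visible biases
b_h = np.zeros(H)                       # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(v):
    p_h = sigmoid(W @ v + b_h)          # p(h_j = 1 | v), per unit
    return (rng.random(H) < p_h).astype(float), p_h

def sample_visible(h):
    p_v = sigmoid(W.T @ h + b_v)        # p(v_i = 1 | h), per unit
    return (rng.random(d) < p_v).astype(float), p_v

v = (rng.random(d) < 0.5).astype(float)
h, p_h = sample_hidden(v)
v2, p_v = sample_visible(h)             # one step of block Gibbs sampling
print(p_h, p_v)
```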

### Autoencoders

Comparable to information compression techniques, autoencoders restrict the number of hidden units to fewer than the number of input dimensions. This bottleneck forces the hidden layer to learn a compressed representation that captures useful structure in the input signal.
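The bottleneck structure can be sketched as an encode/decode pair; the sizes, random weights, and linear decoder here are illustrative assumptions.

```python
import numpy as np

# Undercomplete autoencoder sketch: d inputs are squeezed through
# H < d hidden units and then reconstructed.
rng = np.random.default_rng(3)
d, H = 8, 3                      # bottleneck: H < d

W_enc = rng.normal(scale=0.1, size=(H, d))
W_dec = rng.normal(scale=0.1, size=(d, H))

def encode(x):
    return np.tanh(W_enc @ x)    # H-dimensional compressed code

def decode(code):
    return W_dec @ code          # linear reconstruction of the input

x = rng.normal(size=d)
code = encode(x)
x_hat = decode(code)
print(code.shape, x_hat.shape)   # (3,) (8,)
```

Training would minimize the reconstruction error $\|x - \hat{x}\|^2$ over a dataset; only the shapes are demonstrated here.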