## About

These are my notes for Convolutional Neural Networks, taught by Andrew Ng on Coursera. I provide lesson notes and assignments here to deepen my understanding of neural networks. You can view my GitHub for the programming assignments.

## Content

This week has two main parts, as listed in the title:

- Face Recognition
- Neural Style Transfer

Both are quite interesting. By the end of this week’s course, we’ll have built a face recognition system and a neural style transfer machine. Let’s learn the basic techniques behind these two fun applications.

### Face Recognition

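This topic covers two related tasks:

- Face Verification: given an image and a claimed identity, decide whether the image shows that person (a 1:1 comparison).
- Face Recognition: given an image, decide which of the K people in the database it shows, if any (a 1:K comparison).

When you implement the assignment, you’ll find that face recognition is built directly on top of face verification, so there isn’t much difference between them.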

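Face recognition usually has to work from a single example image of each person. This is the one-shot learning problem. Instead of training a classifier over identities, which would need retraining every time someone new is added, we learn a function that maps each image to a 128-dimensional encoding vector. We then compare the distance between the encoding of the input image and the encoding of each image in the database. The encodings come from a convolutional network. Running two images through the same network with shared parameters and comparing the outputs is called a Siamese network. See the *DeepFace* paper by Taigman et al. for a detailed description. The goal of learning is therefore:

- if $x^{(i)}$ and $x^{(j)}$ show the same person, $\|f(x^{(i)}) - f(x^{(j)})\|^2$ should be small;
- if they show different people, $\|f(x^{(i)}) - f(x^{(j)})\|^2$ should be large.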

To train the network, we use the **Triplet Loss** as our cost function. Call the image saved in the database the anchor, abbreviated as `A`. Call an image of the same person the positive, `P`, and an image of a different person the negative, `N`. We want the anchor to be closer to the positive than to the negative by at least a margin $\alpha$:

$$\|f(A)-f(P)\|^2 - \|f(A)-f(N)\|^2 + \alpha \le 0$$

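Turning this constraint into a loss, given three images $A$, $P$, $N$ and the margin $\alpha$:

$$L(A,P,N) = \max\left(\|f(A)-f(P)\|^2 - \|f(A)-f(N)\|^2 + \alpha,\ 0\right)$$

The cost sums this loss over all $m$ training triplets:

$$J = \sum_{i=1}^{m} L\left(A^{(i)}, P^{(i)}, N^{(i)}\right)$$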
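The following worked example shows how the hinge behaves with margin $\alpha = 0.2$. The squared distances are made up for illustration and do not come from the course:

$$\|f(A)-f(P)\|^2 = 0.5,\ \|f(A)-f(N)\|^2 = 0.6 \;\Rightarrow\; L = \max(0.5 - 0.6 + 0.2,\ 0) = 0.1$$

$$\|f(A)-f(P)\|^2 = 0.5,\ \|f(A)-f(N)\|^2 = 0.9 \;\Rightarrow\; L = \max(0.5 - 0.9 + 0.2,\ 0) = 0$$

In the second case the negative is already far enough from the anchor, so the triplet contributes nothing to the cost.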

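**Attention**: If A, P and N are chosen at random, the triplet constraint is easily satisfied for most triplets. Their loss is then zero, and they contribute nothing to the gradient. You therefore need to choose "hard" triplets, where $\|f(A)-f(P)\|^2$ is close to $\|f(A)-f(N)\|^2$. Only then do the parameter updates actually improve the model.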
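One common way to pick hard triplets is "semi-hard" negative mining. This strategy comes from the FaceNet paper (Schroff et al.), not from this course. The sketch below assumes the encodings are already computed as NumPy arrays. The function name `select_semi_hard_negative` is hypothetical.

```python
import numpy as np


def select_semi_hard_negative(enc_anchor, enc_positive, enc_negatives, alpha=0.2):
    """Return the index of a 'hard' negative for one (anchor, positive) pair.

    enc_anchor, enc_positive: encodings of shape (d,).
    enc_negatives: encodings of other people's images, shape (k, d).
    """
    # Squared L2 distances, matching the triplet-loss formula
    d_ap = np.sum(np.square(enc_anchor - enc_positive))
    d_an = np.sum(np.square(enc_negatives - enc_anchor), axis=1)
    # Semi-hard: farther than the positive but still inside the margin,
    # so the triplet has a positive loss without being the very hardest case
    semi_hard = np.where((d_an > d_ap) & (d_an < d_ap + alpha))[0]
    if semi_hard.size > 0:
        return int(semi_hard[np.argmin(d_an[semi_hard])])
    # Otherwise fall back to the hardest negative (closest to the anchor)
    return int(np.argmin(d_an))
```

Semi-hard negatives still produce a non-zero loss. FaceNet reports that always taking the very closest negative can make training collapse early, which is why this sketch prefers the semi-hard ones.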

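Note that face verification can also be treated as a binary classification problem instead of using the triplet loss. You feed the element-wise absolute difference of the two encodings into a logistic regression (sigmoid) unit. Its output should be 1 if the two images show the same person and 0 otherwise.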
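The lecture’s binary-classification variant computes $\hat{y} = \sigma\left(\sum_k w_k \left|f(x^{(i)})_k - f(x^{(j)})_k\right| + b\right)$. Below is a minimal NumPy sketch of that head, assuming the encodings are already computed. The function name and the way `w` and `b` are passed in are hypothetical. In practice these parameters would be learned together with the network.

```python
import numpy as np


def same_person_probability(enc_1, enc_2, w, b):
    """Logistic-regression head on top of two face encodings.

    enc_1, enc_2: encodings of shape (d,).
    w: weight vector of shape (d,); b: scalar bias (hypothetical, caller-supplied).
    """
    features = np.abs(enc_1 - enc_2)      # |f(x_i)_k - f(x_j)_k| for every k
    z = np.dot(w, features) + b
    return 1.0 / (1.0 + np.exp(-z))       # sigmoid: near 1 means "same person"
```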

### Neural Style Transfer

This technique was introduced in the paper *A Neural Algorithm of Artistic Style* by Gatys et al. You can read the paper yourself if you like. Here I only cover how to define the cost function that we optimize.

Before implementing this technique, it helps to understand what the layers of a neural network actually compute. For a visualization, see *Visualizing and Understanding Convolutional Networks* by Zeiler and Fergus. Shallow layers respond to simple features such as edges and colors, while deeper layers respond to more complex patterns and objects. As before, we define three abbreviations: the content image `C`, the style image `S`, and the generated image `G`. Next, we define the content cost function and the style cost function. We then add them together with different weights to get our final cost function.

**Content Cost Function**

We pass both C and G through a pre-trained VGG network, which serves as our mapping function, much like the encoder in face recognition. We then take the activations of a chosen hidden layer $l$ for each image. The content cost is the squared L2 distance between these two activations. The smaller the content cost, the more similar the content of the two pictures.

**Style Cost Function**

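Again, we use the VGG network as our mapping function. This time we look at the activations of hidden layers rather than the final output, because the style of an image is captured by the correlations between feature channels. Suppose the activation of your chosen layer $l$ has shape $n_H \times n_W \times n_C$. From it we compute a Gram matrix, which measures how correlated the activations of each pair of channels are. The style matrix of layer $l$ for the style image $S$ is:

$$G^{[l](S)}_{kk'} = \sum_{i=1}^{n_H}\sum_{j=1}^{n_W} a^{[l](S)}_{ijk}\, a^{[l](S)}_{ijk'}, \qquad k, k' = 1, \dots, n_C$$

$G^{[l](G)}$ is computed the same way from the generated image $G$. The style cost for layer $l$ is then:

$$J^{[l]}_{style}(S,G) = \frac{1}{(2 n_H n_W n_C)^2} \sum_{k=1}^{n_C}\sum_{k'=1}^{n_C}\left(G^{[l](S)}_{kk'} - G^{[l](G)}_{kk'}\right)^2$$

The overall style cost combines several layers with weights $\lambda^{[l]}$: $J_{style}(S,G) = \sum_l \lambda^{[l]} J^{[l]}_{style}(S,G)$.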

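Finally, we define the overall cost function as a weighted sum of the two:

$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$

Minimize this cost with respect to the pixels of $G$, keeping the VGG weights fixed, and you get your own neural style transfer machine!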

## Assignment 1 Face Recognition

### The Triplet Loss

We’ve already covered how the triplet loss works. Here is the code:

```python
def triplet_loss(y_true, y_pred, alpha = 0.2):
    ...
```
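Only the signature of the graded function survives here, so below is a NumPy sketch of the same computation. It is an illustration, not the assignment’s TensorFlow code. It assumes `y_pred` is a tuple `(anchor, positive, negative)` of arrays with shape `(m, 128)`, and it ignores `y_true`, just as the loss itself does.

```python
import numpy as np


def triplet_loss(y_true, y_pred, alpha=0.2):
    """NumPy sketch of the triplet loss, summed over a batch of m triplets.

    y_true: unused; kept only to mirror the Keras-style signature.
    y_pred: (anchor, positive, negative), each of shape (m, 128).
    """
    anchor, positive, negative = y_pred
    # Squared L2 distance per example, summed over the encoding axis
    pos_dist = np.sum(np.square(anchor - positive), axis=-1)
    neg_dist = np.sum(np.square(anchor - negative), axis=-1)
    basic_loss = pos_dist - neg_dist + alpha
    # Hinge at zero for each triplet, then sum over the batch
    return float(np.sum(np.maximum(basic_loss, 0.0)))
```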

### verify function

The verify function takes three steps:

1. Compute the encoding of the image from `image_path`.
2. Compute the distance between this encoding and the encoding of the identity image stored in the database.
3. Accept the match if the distance is less than 0.7; otherwise reject it.

Code:

```python
# GRADED FUNCTION: verify
```
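Below is a NumPy sketch of steps 2 and 3. It assumes the encoding of the input image has already been computed, since step 1 needs the trained model, which is left out here. `verify_encoding` is a hypothetical helper name. The distance is the plain L2 norm, compared against the 0.7 threshold listed in step 3.

```python
import numpy as np


def verify_encoding(encoding, identity, database, threshold=0.7):
    """Return (distance, is_match) for a claimed identity.

    encoding: encoding of the input image, shape (128,), precomputed (assumption).
    database: dict mapping a person's name to their stored encoding.
    """
    # Step 2: L2 distance between the input encoding and the stored one
    dist = float(np.linalg.norm(encoding - database[identity]))
    # Step 3: accept only if the distance is below the threshold
    return dist, dist < threshold
```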

### face recognition

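You’ll find there’s little difference between this and face verification. Loop over the database dictionary and compare the input encoding with each stored encoding one by one. Keep the identity with the smallest distance. If even that smallest distance is above the threshold, the person is not in the database.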

```python
# GRADED FUNCTION: who_is_it
```
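Here is a matching sketch for recognition. It uses the same assumptions as verification: precomputed encodings, a `database` dict mapping names to encodings, and a threshold of 0.7. `identify_encoding` is a hypothetical name. It returns `None` as the identity when nobody in the database is close enough.

```python
import numpy as np


def identify_encoding(encoding, database, threshold=0.7):
    """Return (min_distance, name) of the closest database entry, or (min_distance, None)."""
    min_dist, identity = float("inf"), None
    # Compare the input encoding with every stored encoding
    for name, db_enc in database.items():
        dist = float(np.linalg.norm(encoding - db_enc))
        if dist < min_dist:
            min_dist, identity = dist, name
    # Even the closest match is too far away: treat as unknown
    if min_dist > threshold:
        return min_dist, None
    return min_dist, identity
```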

## Assignment 2 Deep Learning & Art: Neural Style Transfer

One thing to mention: we use transfer learning instead of training a new CNN from scratch. Here we take a VGG-19 network that has already been trained on ImageNet and use it for our Neural Style Transfer.

### Computing the content cost

We would like the “generated” image G to have similar content as the input image C. Suppose you have chosen some layer’s activations to represent the content of an image. In practice, you’ll get the most visually pleasing results if you choose a layer in the middle of the network–neither too shallow nor too deep. (After you have finished this exercise, feel free to come back and experiment with using different layers, to see how the results vary.)

So, suppose you have picked one particular hidden layer to use. Now, set the image C as the input to the pretrained VGG network, and run forward propagation. Let $a^{(C)}$ be the hidden layer activations in the layer you had chosen. (In lecture, we had written this as $a^{(C)[l]}$, but here we’ll drop the superscript $[l]$ to simplify the notation.) This will be an $n_H \times n_W \times n_C$ tensor. Repeat this process with the image G: set G as the input, and run forward propagation. Let $a^{(G)}$ be the corresponding hidden layer activation. We define the content cost function as:

$$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C}\sum_{\text{all entries}} \left(a^{(C)} - a^{(G)}\right)^2$$

```python
# GRADED FUNCTION: compute_content_cost
```
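Below is a NumPy sketch of the content cost $J_{content}(C,G) = \frac{1}{4 n_H n_W n_C}\sum \left(a^{(C)} - a^{(G)}\right)^2$. It assumes the activations are arrays of shape `(n_H, n_W, n_C)`. The assignment itself works with TensorFlow tensors that also carry a batch dimension of 1. `content_cost` is a hypothetical name.

```python
import numpy as np


def content_cost(a_C, a_G):
    """Content cost between activations a_C and a_G, each of shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = a_G.shape
    # Sum of squared differences over every entry, then normalize
    return float(np.sum(np.square(a_C - a_G)) / (4 * n_H * n_W * n_C))
```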

### Style Cost Function

The style matrix is also called a “Gram matrix.” In linear algebra, the Gram matrix $G$ of a set of vectors $(v_1, \dots, v_n)$ is the matrix of dot products, whose entries are $G_{ij} = v_i^T v_j$ = `np.dot(v_i, v_j)`. In other words, $G_{ij}$ compares how similar $v_i$ is to $v_j$: if they are highly similar, you would expect them to have a large dot product, and thus $G_{ij}$ to be large.

Note that there is an unfortunate collision in the variable names used here. We are following common terminology used in the literature, but $G$ is used to denote the Style matrix (or Gram matrix) as well as to denote the generated image $G$. We will try to make sure which $G$ we are referring to is always clear from the context.

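In NST, you can compute the Style matrix by multiplying the “unrolled” filter matrix by its transpose:

$$G_{gram} = A_{unrolled} A_{unrolled}^T$$

Here $A_{unrolled}$ has shape $(n_C, n_H \times n_W)$: each row holds the activations of one filter across all spatial positions.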

The result is a matrix of dimension $(n_C, n_C)$, where $n_C$ is the number of filters. The value $G_{ij}$ measures how similar the activations of filter $i$ are to the activations of filter $j$.

```python
# GRADED FUNCTION: gram_matrix
```
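Below is a NumPy sketch of the unrolling step and the Gram matrix $G_{gram} = A A^T$. It assumes activations of shape `(n_H, n_W, n_C)`. `unroll` is a hypothetical helper name. Reshaping to `(n_H * n_W, n_C)` and then transposing puts each filter’s activations in its own row.

```python
import numpy as np


def unroll(a):
    """Unroll activations of shape (n_H, n_W, n_C) into shape (n_C, n_H * n_W)."""
    n_H, n_W, n_C = a.shape
    return a.reshape(n_H * n_W, n_C).T


def gram_matrix(A):
    """Gram matrix of an unrolled activation matrix A of shape (n_C, n_H * n_W)."""
    return A @ A.T   # shape (n_C, n_C); entry (i, j) = dot(row i, row j)
```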

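After generating the Style matrix (Gram matrix), your goal will be to minimize the distance between the Gram matrix of the “style” image S and that of the “generated” image G. For now, we are using only a single hidden layer $a^{[l]}$, and the corresponding style cost for this layer is defined as:

$$J_{style}^{[l]}(S,G) = \frac{1}{4 \times n_C^2 \times (n_H \times n_W)^2} \sum_{i=1}^{n_C}\sum_{j=1}^{n_C}\left(G^{(S)}_{ij} - G^{(G)}_{ij}\right)^2$$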

where $G^{(S)}$ and $G^{(G)}$ are respectively the Gram matrices of the “style” image and the “generated” image, computed using the hidden layer activations for a particular hidden layer in the network.

```python
# GRADED FUNCTION: compute_layer_style_cost
```
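Below is a NumPy sketch of the single-layer style cost $J^{[l]}_{style}(S,G) = \frac{1}{4 n_C^2 (n_H n_W)^2}\sum_{i,j}\left(G^{(S)}_{ij} - G^{(G)}_{ij}\right)^2$. It again assumes activations of shape `(n_H, n_W, n_C)`. The sketch unrolls each activation, builds both Gram matrices, and compares them. `layer_style_cost` is a hypothetical name.

```python
import numpy as np


def layer_style_cost(a_S, a_G):
    """Style cost for one layer; a_S and a_G have shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = a_G.shape
    # Unroll to (n_C, n_H * n_W): one row per filter
    S = a_S.reshape(n_H * n_W, n_C).T
    G = a_G.reshape(n_H * n_W, n_C).T
    # Gram matrices of the style and generated activations, each (n_C, n_C)
    GS = S @ S.T
    GG = G @ G.T
    return float(np.sum(np.square(GS - GG)) / (4 * n_C**2 * (n_H * n_W) ** 2))
```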

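You can combine the style costs for different layers as follows:

$$J_{style}(S,G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S,G)$$

Here the weights $\lambda^{[l]}$ control how much each layer contributes to the overall style.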
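Below is a sketch of the weighted combination, assuming the per-layer costs have already been computed. The function name and the use of dicts keyed by layer name are illustrative assumptions.

```python
def combined_style_cost(layer_costs, layer_weights):
    """Weighted sum of per-layer style costs.

    layer_costs: dict layer_name -> J_style^[l](S, G), already computed.
    layer_weights: dict layer_name -> lambda^[l].
    """
    return float(sum(layer_weights[name] * cost for name, cost in layer_costs.items()))
```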

### Total Cost Function

Finally, let’s create a cost function that minimizes both the style and the content cost. The formula is:

$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$

**Exercise**: Implement the total cost function which includes both the content cost and the style cost.

```python
# GRADED FUNCTION: total_cost
```
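Below is a sketch of the total cost $J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$. The default weights `alpha=10` and `beta=40` are assumptions; treat them as hyperparameters to tune. A larger `beta` relative to `alpha` pushes the generated image toward the style at the expense of content.

```python
def total_cost(J_content, J_style, alpha=10, beta=40):
    """Weighted sum of the content cost and the style cost."""
    return alpha * J_content + beta * J_style
```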

### Integrate all

Finally, let’s put everything together to implement Neural Style Transfer!

Here’s what the program will have to do:

- Create an Interactive Session
- Load the content image
- Load the style image
- Randomly initialize the image to be generated
- Load the VGG-19 model
- Build the TensorFlow graph:
  - Run the content image through the VGG-19 model and compute the content cost
  - Run the style image through the VGG-19 model and compute the style cost
  - Compute the total cost
  - Define the optimizer and the learning rate
- Initialize the TensorFlow graph and run it for a large number of iterations, updating the generated image at every step.

## Summary

That’s all for the CNN part. I’ve learned to read some foundational papers and the basics of Convolutional Neural Networks. Next, I’ll move on to RNNs and sequence models!