TensorFlow’s eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later.
A TensorFlow variable is created and manipulated via the tf.Variable class. A tf.Variable represents a tensor whose value can be changed by running ops on it.
```python
my_variable = tf.Variable(tf.zeros([1, 2, 3]))
```
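For example, a minimal sketch of "running ops on it": assign and assign_add are methods of tf.Variable that change the value in place.

```python
import tensorflow as tf

my_variable = tf.Variable(tf.zeros([1, 2, 3]))
my_variable.assign(tf.ones([1, 2, 3]))      # overwrite the value in place
my_variable.assign_add(tf.ones([1, 2, 3]))  # element-wise add, like +=
print(my_variable.numpy())                  # eager execution: a concrete value
```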
Operations on tensors include math operations (such as tf.add and tf.reduce_mean), array operations (such as tf.concat and tf.tile), and string manipulation ops (such as tf.substr).
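A short eager-mode sketch of a few of these ops (toy values, just to show the shapes):

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

print(tf.add(a, b))               # element-wise addition
print(tf.reduce_mean(a))          # mean over all elements -> 2.5
print(tf.concat([a, b], axis=0))  # stack along rows -> shape (4, 2)
print(tf.tile(a, [1, 2]))         # repeat columns -> shape (2, 4)
```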
See also the ragged tensor guide: https://www.tensorflow.org/guide/ragged_tensor
```python
import tensorflow as tf
```
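Continuing from the import above, a minimal ragged tensor example in the spirit of the linked guide (rows may have different lengths):

```python
rt = tf.ragged.constant([[3.0, 1.0, 4.0, 1.0], [5.0, 9.0], [2.0]])
print(rt.shape)                    # (3, None): the second dimension is ragged
print(rt.row_lengths())            # [4, 2, 1]
print(tf.reduce_mean(rt, axis=1))  # per-row means: [2.25, 7.0, 2.0]
```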
Convolutional Neural Network
Sparse Connectivity
In a fully connected network, every hidden unit is connected to every input. A 1000×1000 image already has 10^6 input pixels, so with 10^6 hidden units a single layer needs on the order of 10^12 parameters. Such a huge parameter count is expensive to train and encourages overfitting, which is why we use sparsely (locally) connected layers instead.
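As a rough illustration (the layer sizes are made up just to make the counts concrete), compare the parameter counts of a fully connected layer and a small convolutional layer in Keras:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(1000, 1000, 1))  # a 1000x1000 grayscale image

# Fully connected: every hidden unit sees every pixel.
flat = tf.keras.layers.Flatten()(inputs)
dense = tf.keras.layers.Dense(100)(flat)        # ~10^8 weights for only 100 units

# Sparsely connected: each unit sees a 3x3 patch, weights are shared.
conv = tf.keras.layers.Conv2D(100, 3)(inputs)   # 3*3*1*100 + 100 = 1,000 params

model = tf.keras.Model(inputs, [dense, conv])
model.summary()  # compare the two layers' parameter counts
```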
Parameter Sharing
The same kernel (set of weights) is reused at every position in the image, so many hidden units share the same feature detector instead of each learning its own.
Equivariance
Equivariance means that when a feature shifts in the input image, the convolution's response shifts with it, so the network can detect a feature regardless of where it appears.
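A small sketch of this property, assuming a random image and kernel: shifting the input shifts the convolution output by the same amount (away from the borders, where padding and the circular shift interfere).

```python
import tensorflow as tf

img = tf.random.normal([1, 8, 8, 1])      # batch, height, width, channels
shifted = tf.roll(img, shift=1, axis=2)   # shift the image right by 1 pixel
kernel = tf.random.normal([3, 3, 1, 1])

out = tf.nn.conv2d(img, kernel, strides=1, padding="SAME")
out_shifted = tf.nn.conv2d(shifted, kernel, strides=1, padding="SAME")

# Away from the borders, conv(shifted input) == shift(conv(input)).
diff = tf.roll(out, shift=1, axis=2) - out_shifted
print(tf.reduce_max(tf.abs(diff[:, :, 2:-2, :])))  # ~0
```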
Input Layer
The input layer has width, height, and depth. Before training we usually preprocess the data with normalization, centralization (mean subtraction), and whitening.
Normalization puts all features on the same "range". Common choices are max-min normalization and variance normalization (useful when a feature has no clear fixed range). When we search for the best solution with gradient descent, normalized data converge faster, because no single large-scale feature dominates the gradient and stretches the loss surface.
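A small sketch of both schemes on one toy feature column (the values are made up):

```python
import numpy as np

x = np.array([10.0, 20.0, 15.0, 40.0, 5.0])  # one feature column

# Max-min normalization: squeeze the feature into [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# Variance (z-score) normalization: zero mean, unit variance,
# useful when the feature has no clear fixed range.
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)
print(x_zscore)
```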
Centralization subtracts each feature's mean so that every feature is centered at 0. Without centering, the data may take large values and the weights W may be forced to become very small to compensate, which can lead to overfitting.
Whitening decorrelates the features and gives each of them variance 1. A common recipe is PCA whitening: rotate the data onto its principal components, then divide each component by its standard deviation.
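A sketch of PCA whitening on a toy data matrix (rows are samples; the mixing matrix is made up to introduce correlations). Note that it starts with the centralization step from the previous paragraph:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

X -= X.mean(axis=0)                        # centralization first
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)     # principal components
X_pca = X @ eigvecs                        # rotate onto the components
X_white = X_pca / np.sqrt(eigvals + 1e-5)  # divide by the standard deviation

# After whitening the covariance is (approximately) the identity:
print(np.round(X_white.T @ X_white / len(X), 2))
```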
Gradient Descent
The loss surface of a neural network is non-convex and has many local minima. Initializing the weights at random points means different runs start in different basins, which helps gradient descent avoid getting stuck in one particular poor local minimum.
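A minimal sketch of this idea, assuming a made-up one-dimensional non-convex loss: each run starts from a random point, and restarting from several points reduces the chance of ending in a poor minimum.

```python
import tensorflow as tf

w = tf.Variable(tf.random.normal([]))   # random initial point
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = tf.sin(3.0 * w) + w ** 2  # non-convex: several local minima
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))

# Different random starts can converge to different minima.
print(w.numpy())
```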