PyTorch

What is PyTorch

The official website (https://pytorch.org/tutorials/) describes PyTorch as:

  • A replacement for NumPy to use the power of GPUs
  • A deep learning research platform that provides maximum flexibility and speed
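
As a quick, minimal sketch of the first point (this example is not from the tutorial), you can create a tensor and move it onto a GPU when one is available:

import torch

# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3)      # created on the CPU by default
x = x.to(device)          # moved to the chosen device
y = torch.ones_like(x)    # created directly on the same device as x
print(x + y)              # the addition runs on that device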

Difference between TensorFlow and PyTorch

The most important difference between the two frameworks is how they define the computational graph. TensorFlow creates a static graph, while PyTorch uses a dynamic graph. What does this mean? In TensorFlow you first have to define the entire computation graph of the model and then run your ML model, whereas in PyTorch you can define and manipulate the graph on the go. This is particularly helpful when working with variable-length inputs in RNNs.
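
As a small sketch of what "define-by-run" means in practice (the module and sizes below are made up purely for illustration), the forward pass can use ordinary Python control flow, and the graph is rebuilt on every call:

import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super(DynamicNet, self).__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        # the number of times the layer is applied depends on the input,
        # so the computation graph can differ from one call to the next
        for _ in range(int(x.sum().abs()) % 3 + 1):
            x = torch.relu(self.linear(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 10))   # the graph is built as this line runs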

Tensors

from __future__ import print_function
import torch

x = torch.empty(5, 3)
x
# out
tensor([[1.1210e-44, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])
x = torch.rand(5, 3)
x
# out
tensor([[0.9122, 0.0691, 0.9595],
        [0.2535, 0.0617, 0.5030],
        [0.3705, 0.4274, 0.8880],
        [0.0304, 0.0172, 0.9135],
        [0.9683, 0.9874, 0.5131]])
x = torch.zeros(5, 3, dtype=torch.long)
x
# out:
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

Reuse the input tensor

x = x.new_ones(5, 3, dtype=torch.double)
x
# out
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

Override dtype

x = torch.randn_like(x, dtype=torch.float)
x
# Out:
tensor([[-0.6438, -1.6627, -1.0903],
        [ 0.3002,  0.4009, -0.7618],
        [ 0.1420,  0.9419,  0.1807],
        [-1.2571,  0.0923,  0.3649],
        [-0.2423,  0.1674,  0.6538]])

AUTOGRAD: Automatic Differentiation

The autograd package is central to all neural networks in PyTorch. Let's first look at it briefly, and then move on to training our first neural network.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code runs, and that every single iteration can be different.

Let's look at this in simpler terms with some examples.

Tensor

torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its .grad attribute.

To stop a tensor from tracking history, you can call .detach() to detach it from the computation history and prevent future computation from being tracked.

To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():. This is particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True for which we don't need the gradients.
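
A minimal sketch (not from the original tutorial) of the two mechanisms just described, .detach() and torch.no_grad():

import torch

x = torch.ones(2, 2, requires_grad=True)
y = (x * 2).sum()

# detach() returns a new tensor that shares data but is cut off from the graph
z = y.detach()
print(y.requires_grad, z.requires_grad)   # True False

# inside torch.no_grad() no operations are tracked, e.g. during evaluation
with torch.no_grad():
    w = x * 2
print(w.requires_grad)                    # False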

There is one more class that is very important for autograd: a Function.

Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of computation. Each tensor has a .grad_fn attribute that references the Function that created the tensor (except for tensors created by the user, whose grad_fn is None).

If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds a single element of data), you don't need to specify any arguments to backward(); however, if it has more elements, you need to specify a gradient argument whose shape matches the tensor.

import torch

x = torch.ones(2, 2, requires_grad=True)
x
# out
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
# use a tensor operation
y = x + 2
y
# out
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

y.grad_fn
# out
<AddBackward0 at 0x7fac10f9d390>

# more operation
z = y * y * 3
out = z.mean()
out

# out
tensor(27., grad_fn=<MeanBackward0>)

# requires_grad defaults to False
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)

a.requires_grad_(True)

print(a.requires_grad)

b = (a * a).sum()
print(b.grad_fn)

# out
False
True
<SumBackward0 object at 0x7fac11114940>

Gradients

Now let's backprop. Since out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()
x.grad
#out
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

Since the output is $out = \frac{1}{4}\sum_i 3(x_i+2)^2$, the derivative is $\frac{\partial out}{\partial x_i} = \frac{3}{2}(x_i+2)$. Hence $\frac{\partial out}{\partial x_i}\big|_{x_i=1} = \frac{9}{2} = 4.5$.

In math, the gradient of the vector valued function is defined as a Jacobian matrix:

$$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$$
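
In this case autograd computes a vector-Jacobian product: for a non-scalar y you pass a vector v to backward() and x.grad receives $J^T v$. A small sketch (the values of v are arbitrary):

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                        # y is a vector, not a scalar

# for a non-scalar output, backward() needs a "gradient" argument v with the
# same shape as y; autograd then accumulates J^T v into x.grad
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)
print(x.grad)                    # dy_i/dx_i = 2, so this is 2 * v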

Neural Network

Neural networks are in the torch.nn package.

Now we can use nn, which depends on autograd, to define models and differentiate them. An nn.Module contains layers and a forward(input) method that returns the output.

A simple feed-forward network takes the input, feeds it through several layers one after the other, and finally produces an output.

A typical training procedure for a neural network is as follows:

  • Define the neural network that has some learnable parameters (or weights)
  • Iterate over a dataset of inputs
  • Process input through the network
  • Compute the loss (how far is the output from being correct)
  • Propagate gradients back into the network’s parameters
  • Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient

Define Network

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
net

#out
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Once you define the forward function, the backward function (where gradients are computed) is automatically defined for you by autograd. You can use any of the Tensor operations in the forward function.

The learnable parameters of a model are returned by net.parameters()
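
For instance, a quick way to inspect them (the sizes follow from the Net defined above):

params = list(net.parameters())
print(len(params))       # 10: a weight and a bias tensor for each of the 5 layers
print(params[0].size())  # conv1's weight: torch.Size([6, 1, 3, 3])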

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

net.zero_grad()
out.backward(torch.randn(1, 10))

Loss Function

output = net(input)
target = torch.randn(10) # a dummy target, for example
target = target.view(1, -1) # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
#out
tensor(0.8399, grad_fn=<MseLossBackward>)

print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU
#out
<MseLossBackward object at 0x7fac110ef518>
<AddmmBackward object at 0x7fac110d9b00>
<AccumulateGrad object at 0x7fac110d97f0>

Backprop

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

#out
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0171, -0.0057, -0.0081, 0.0024, 0.0039, 0.0138])

Update the weights

The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):

weight = weight - learning_rate * gradient

We can implement this using simple Python code:

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

In neural network training, however, we usually want to use different update rules such as SGD with momentum, RMSProp, or Adam. The torch.optim package implements these:

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update

Training A Classifier

For any kind of data, such as images, text, audio or video, we can use standard Python packages to load the data into a NumPy array and then convert that array into a torch.*Tensor (a small sketch of this conversion follows the list below).

  • For images, packages such as Pillow, OpenCV are useful
  • For audio, packages such as scipy and librosa
  • For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful
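
As a minimal sketch of the NumPy-to-Tensor conversion mentioned above (the random array simply stands in for whatever such a loader returns):

import numpy as np
import torch

arr = np.random.rand(32, 32, 3).astype(np.float32)  # a stand-in HWC "image"
t = torch.from_numpy(arr)       # shares memory with the NumPy array
t = t.permute(2, 0, 1)          # reorder to CHW, the layout PyTorch layers expect
print(t.shape)                  # torch.Size([3, 32, 32])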

The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.

We will do the following steps in order:

  1. Load and normalize the CIFAR10 training and test datasets using torchvision
  2. Define a Convolutional Neural Network
  3. Define a loss function
  4. Train the network on the training data
  5. Test the network on the test data
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# load and normalize the CIFAR10 training and test datasets
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

# peek at a batch of test images as well
dataiter = iter(testloader)
images, labels = next(dataiter)
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
Author: shixuan liu
Link: http://tedlsx.github.io/2020/01/07/pytorch/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stated otherwise.