PyTorch is a deep learning research platform that provides maximum flexibility and speed.
Difference between TensorFlow and PyTorch
The most important difference between the two frameworks is how they define computational graphs. TensorFlow builds a static graph, while PyTorch uses a dynamic graph. What does this mean? In TensorFlow, you first have to define the entire computation graph of the model and then run your ML model. In PyTorch, you can define and manipulate the graph on the go, which is particularly helpful when working with variable-length inputs in RNNs.
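As a small illustration of what a dynamic graph buys you, ordinary Python control flow can depend on the data itself and autograd still tracks it. The loop below is a toy sketch, not part of the original comparison:

import torch

x = torch.randn(3, requires_grad=True)
y = x
while y.norm() < 10:       # the number of iterations depends on the values in x
    y = y * 2
y.sum().backward()         # the graph that was built on the fly is differentiated
print(x.grad)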
Tensors
from __future__ import print_function
import torch

x = torch.empty(5, 3)
x
# out
tensor([[1.1210e-44, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])
x = torch.rand(5, 3)
x
# out
tensor([[0.9122, 0.0691, 0.9595],
        [0.2535, 0.0617, 0.5030],
        [0.3705, 0.4274, 0.8880],
        [0.0304, 0.0172, 0.9135],
        [0.9683, 0.9874, 0.5131]])
x = torch.ones(2, 2, requires_grad=True)
x
# out
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

# use a tensor operation
y = x + 2
y
# out
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
y.grad_fn
# out
<AddBackward0 at 0x7fac10f9d390>
# more operations
z = y * y * 3
out = z.mean()
out
# out
tensor(27., grad_fn=<MeanBackward0>)
# requires_grad defaults to False
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
# out
False
True
<SumBackward0 object at 0x7fac11114940>
Gradients
Now for backprop. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).
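Continuing the example above (out was computed from x through y = x + 2 and z = y * y * 3), running the backward pass fills in x.grad with d(out)/dx, which works out to 4.5 for every element:

out.backward()
print(x.grad)
# out
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])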
We can now use autograd to define models and differentiate them; the nn package depends on autograd for this. An nn.Module contains layers and a forward(input) method that returns the output.
A simple feed-forward network takes an input, feeds it through several layers one after the other, and finally produces an output.
A typical training procedure for a neural network is as follows:
Define the neural network that has some learnable parameters (or weights)
Iterate over a dataset of inputs
Process input through the network
Compute the loss (how far is the output from being correct)
Propagate gradients back into the network’s parameters
Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
A network of this kind can be defined as follows (the layer definitions in __init__ are filled in here so the code runs; they are chosen to be consistent with the 16 * 5 * 5 flattening and the 10-dimensional output used below):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # affine operations: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
net
You only have to define the forward function; the backward function (where gradients are computed) is automatically defined for you by autograd. You can use any of the Tensor operations in the forward function.
The learnable parameters of a model are returned by net.parameters()
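For example (the parameter count and weight size below assume the conv1 layer defined in the network above):

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
# out
10
torch.Size([6, 1, 5, 5])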
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

net.zero_grad()
out.backward(torch.randn(1, 10))
Loss Function
output = net(input)
target = torch.randn(10)      # a dummy target, for example
target = target.view(1, -1)   # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
# out
tensor(0.8399, grad_fn=<MseLossBackward>)

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
# out
<MseLossBackward object at 0x7fac110ef518>
<AddmmBackward object at 0x7fac110d9b00>
<AccumulateGrad object at 0x7fac110d97f0>
Backprop
net.zero_grad()  # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

# out
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0171, -0.0057, -0.0081,  0.0024,  0.0039,  0.0138])
Update the weights
The simplest update rule used in practice is Stochastic Gradient Descent (SGD):
weight = weight - learning_rate * gradient
We can implement this using simple Python code:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
In practice, however, we often want to use other update rules such as Nesterov-SGD, Adam, RMSProp, and so on. The torch.optim package implements all of these methods.
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()        # does the update
Training A Classifier
For any kind of data, such as image, text, audio, or video data, we can use standard Python packages that load the data into a NumPy array. We can then convert this array into a torch.*Tensor.
For images, packages such as Pillow, OpenCV are useful
For audio, packages such as scipy and librosa are useful
For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful
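For instance, an array produced by any of these loaders can be converted with torch.from_numpy; the array shape below is just an illustrative example:

import numpy as np
import torch

img = np.random.rand(32, 32, 3)   # stand-in for an image loaded as a NumPy array
t = torch.from_numpy(img)         # shares memory with the underlying NumPy array
t = t.permute(2, 0, 1).float()    # reorder to channels-first and convert to float32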
The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.
We will do the following steps in order:
Load and normalize the CIFAR-10 training and test datasets using torchvision
Define a Convolutional Neural Network
Define a loss function
Train the network on the training data
Test the network on the test data
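As a sketch of the first step, the data can be loaded and normalized with torchvision as below; the batch size and the (0.5, 0.5, 0.5) normalization constants are the usual tutorial choices, not requirements:

import torch
import torchvision
import torchvision.transforms as transforms

# convert PIL images to tensors and scale them to the [-1, 1] range
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)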
The classifier network is a small CNN similar to the one above, but with 3 input channels; the __init__ below is filled in so the code runs and matches the forward pass that was given:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
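Putting the remaining steps together, a training loop might look like the following sketch; the loss function, optimizer settings, epoch count, and logging interval are the standard CIFAR-10 tutorial choices and are assumptions here:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data              # get a mini-batch from the DataLoader
        optimizer.zero_grad()              # zero the parameter gradients
        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # backward pass
        optimizer.step()                   # update the weights

        running_loss += loss.item()
        if i % 2000 == 1999:               # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')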