Parts (1) through (7) of this series covered the theory behind logistic regression. In this post, a logistic model is trained on the training set and then used to recognize cat images in the test set.

This post is based on the first programming assignment of the course neural-networks-deep-learning: PA1-Logistic Regression with a Neural Network mindset.

Packages

The course provides the required image sets and a corresponding script:

  • datasets/train_catvnoncat.h5: training set
  • datasets/test_catvnoncat.h5: test set
  • lr_utils.py: code for loading the image sets

First, import the required packages:

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

Overview and preprocessing of the problem set

Load the data into numpy ndarrays.

# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
  • load_dataset(): a function defined in lr_utils.py
  • train_set_x_orig: all image samples of the training set
  • train_set_y: whether each training image contains a cat or not (0 or 1)
  • test_set_x_orig: all image samples of the test set
  • test_set_y: whether each test image contains a cat or not (0 or 1)

An example of a cat image from the training set:

# Example of a picture
index = 25
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:,index]) + ", it's a '" + classes[np.squeeze(train_set_y[:,index])].decode("utf-8") +  "' picture.")
Fig.1 a cat image

An example of a non-cat image from the training set:

# Example of a picture
index = 26
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:,index]) + ", it's a '" + classes[np.squeeze(train_set_y[:,index])].decode("utf-8") +  "' picture.")
Fig.2 a non-cat image

Check the shape of the image sets

Before training, the image sets need some preprocessing; the first step is to know the shape of the dataset ndarrays.

### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]

print("train shape: " + str(train_set_x_orig.shape) )
print("test shape: " + str(test_set_x_orig.shape) )

print("m_train:" + str(m_train) + " num_px:" + str(num_px) + " channel:" + str(train_set_x_orig.shape[3]) )
print("m_test:" + str(test_set_x_orig.shape[0]) )
### END CODE HERE ###

train shape: (209, 64, 64, 3)
test shape: (50, 64, 64, 3)
m_train:209 num_px:64 channel:3
m_test:50

From the output we can see:

  • Training set (train_set_x_orig): 209 images, each 64x64 pixels, with three channels (r, g, b) per pixel
  • Test set (test_set_x_orig): 50 images, otherwise identical to the training set

Reshape the image sets

As another preprocessing step, reshape the image sets from dimension (number of images, 64, 64, 3) to dimension (64x64x3, number of images):

train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
print("train_set_x_flatten: " + str(train_set_x_flatten.shape) )
print("train_set_y: " + str(train_set_y.shape) )

test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T
print("test_set_x_flatten:" + str(test_set_x_flatten.shape) )
print("test_set_y: " + str(test_set_y.shape) )

train_set_x_flatten: (12288, 209)
train_set_y: (1, 209)
test_set_x_flatten:(12288, 50)
test_set_y: (1, 50)

Now train_set_x_flatten and test_set_x_flatten correspond to the matrix $X$ discussed earlier in this series:

$$X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix} \in \mathbb{R}^{n \times m}$$

where each column $x^{(i)}$ is one flattened image, with $n = 12288$ and $m = 209$ (training set) or $m = 50$ (test set).
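The flatten step is easy to get wrong, so here is a minimal toy sketch (not from the assignment; the toy array is made up for illustration) showing that reshape(m, -1).T puts all pixel values of one image into a single column:

# Toy illustration: 2 "images" of 2x2 pixels with 3 channels, shape (2, 2, 2, 3)
toy = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Flatten each image into one column: first reshape to (m, -1), then transpose
toy_flatten = toy.reshape(toy.shape[0], -1).T
print(toy_flatten.shape)     # (12, 2) -- column i holds all pixel values of image i
print(toy_flatten[:, 0])     # [ 0  1  2 ... 11], i.e. exactly the first image

# By contrast, toy.reshape(-1, toy.shape[0]) would mix pixels from different images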

standardize dataset

train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.

Dividing by 255, the maximum pixel intensity, brings all values into the range [0, 1].

Building the parts of our algorithm

sigmoid function

Implement the sigmoid function of logistic regression:

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))
    ### END CODE HERE ###
    
    return s

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(9.2) = " + str(sigmoid(9.2)))

sigmoid(0) = 0.5
sigmoid(9.2) = 0.9998989708060922
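sigmoid() is applied element-wise when given a numpy array, which the propagation code below relies on; a quick check (not part of the assignment):

# sigmoid() works element-wise on numpy arrays as well as on scalars
print(sigmoid(np.array([0, 2, -2])))   # approximately [0.5, 0.881, 0.119]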

Initializing parameters

Initialize the parameters w and b; here they are simply initialized to zero:

# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim,1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b

# test
dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))

w = [[0.]
 [0.]]
b = 0

Forward and backward propagation

Implement the propagate() function to compute the cost function and its gradients. As discussed earlier in this series, forward propagation computes the cost function, and backward propagation computes the partial derivatives of the cost with respect to w and b.

The cost function:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)})\,\right], \qquad a^{(i)} = \sigma(w^{T}x^{(i)} + b)$$

and its partial derivatives with respect to w and b:

$$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^{T}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$$

# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation
    """
    
    m = X.shape[1] # number of samples
    
    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    Z = np.dot(w.T,X)+b
    A = sigmoid(Z)
    cost = np.sum(-(Y * np.log(A) + (1-Y) * np.log(1-A) ) ) / m
    ### END CODE HERE ###
    
    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dZ = A - Y
    dw = np.dot(X,dZ.T) / m
    db = np.sum(dZ) / m
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
    
# test
w, b, X, Y = np.array([[1], [2]]), 2, np.array([[1,2], [3,4]]), np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

dw = [[0.99993216]
 [1.99980262]]
db = 0.49993523062470574
cost = 6.000064773192205

Now dw contains the partial derivatives of the cost function with respect to all components of w, and db is the partial derivative of the cost function with respect to b.
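As an optional sanity check (not part of the assignment), the analytic db returned by propagate() can be compared with a numerical finite-difference estimate; a minimal sketch using the same toy inputs as above:

# Sketch: numerically verify db from propagate() via central differences
eps = 1e-7
w_chk, b_chk = np.array([[1.], [2.]]), 2.
X_chk, Y_chk = np.array([[1., 2.], [3., 4.]]), np.array([[1., 0.]])

grads_chk, _ = propagate(w_chk, b_chk, X_chk, Y_chk)
_, cost_plus = propagate(w_chk, b_chk + eps, X_chk, Y_chk)
_, cost_minus = propagate(w_chk, b_chk - eps, X_chk, Y_chk)
db_numeric = (cost_plus - cost_minus) / (2 * eps)

print(grads_chk["db"], db_numeric)   # the two estimates should agree closely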

Optimization

With dw and db computed, gradient descent iteratively updates w and b to drive the cost function toward its minimum and obtain the corresponding parameters.

# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    
    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    
    costs = []
    
    for i in range(num_iterations):
        
        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ### 
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs
    
# test
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

w = [[0.1124579 ]
 [0.23106775]]
b = 1.5593049248448891
dw = [[0.90158428]
 [1.76250842]]
db = 0.4304620716786828

predict

The parameters are now trained and stored in w and b. Apply the trained model to the image sets to evaluate how well the parameters perform.

# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1] # number of samples
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T,X) + b) # logistic model
    ### END CODE HERE ###
    
    # Convert probabilities A[0,i] to actual predictions (0 or 1); np.where is
    # already vectorized over all examples, so no explicit loop is needed
    ### START CODE HERE ### (≈ 4 lines of code)
    Y_prediction = np.where(A < 0.5, 0.0, 1.0)  # below 0.5: no cat; 0.5 or above: cat
    ### END CODE HERE ###
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction
    
# test
print("predictions = " + str(predict(w, b, X)))

predictions = [[1. 1.]]

Merge all functions into a model

Combine all the functions implemented above:

# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    ### START CODE HERE ###
    # initialize parameters with zeros (≈ 1 line of code)
    w,b = initialize_with_zeros(X_train.shape[0]) # shape[0] - number of features
    
    # Gradient descent (≈ 1 line of code)
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = params["w"]
    b = params["b"]
    
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

Run the model on the image sets:

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %

After 2000 iterations, the value of the cost function decreases steadily, as shown in the figure below, which matches expectations. The accuracy on the training set exceeds 99%, while the accuracy on the test set is only 70%.

Fig.3 cost-iter-plot
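The curve in Fig.3 can be reproduced from the costs recorded in d, for example with something like:

# Plot the cost recorded every 100 iterations during optimization
costs = np.squeeze(d["costs"])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()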

An example of a misclassified image

Applying the trained parameters to the test set gives only 70% accuracy, i.e. 30% of the images are misclassified. One misclassified example:

# Example of a picture that was wrongly classified.
index = 5
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0, index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0, index])].decode("utf-8") +  "\" picture.")
Fig.4 recognition fail
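To list every misclassified test image rather than a single example, the mismatches between d["Y_prediction_test"] and test_set_y can be collected; a small sketch:

# Indices of test images whose prediction disagrees with the true label
mislabeled = np.where(d["Y_prediction_test"] != test_set_y)[1]
print("misclassified test indices: " + str(mislabeled))
print("misclassified: " + str(len(mislabeled)) + " / " + str(test_set_y.shape[1]))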

Further analysis

Compare how different learning rates affect the convergence speed and the resulting accuracy.

learning_rates = [0.01, 0.001, 0.005, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %
-------------------------------------------------------
learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %
-------------------------------------------------------
learning rate is: 0.005
train accuracy: 97.60765550239235 %
test accuracy: 70.0 %
-------------------------------------------------------
learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %

Fig.5 compare learning rate

If the learning rate is too large, the cost function oscillates around the minimum; if it is too small, many more iterations are needed to converge, which degrades the algorithm's performance.