Let me try implementing the "Logistic Regression with a Neural Network" exercise from Coursera's Deep Learning Specialization (the original exercise is written in Python) in Julia.
# dataset loading
using HDF5
function load_dataset()
    train_set_x_orig = nothing
    train_set_y_orig = nothing
    test_set_x_orig = nothing
    test_set_y_orig = nothing
    classes = nothing
    # read the training set
    h5open("datasets/train_catvnoncat.h5", "r") do file
        train_set_x_orig = read(file["train_set_x"])
        train_set_y_orig = read(file["train_set_y"])
        train_set_y_orig = reshape(train_set_y_orig, :, 1)
    end
    # read the test set and the class labels
    h5open("datasets/test_catvnoncat.h5", "r") do file
        test_set_x_orig = read(file["test_set_x"])
        test_set_y_orig = read(file["test_set_y"])
        classes = read(file["list_classes"])
        test_set_y_orig = reshape(test_set_y_orig, :, 1)
    end
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
end
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset();
In this exercise, we build a logistic regression model that, given a square image, decides whether it shows a cat (y=1) or not (y=0).
The training-set and test-set images look as follows.
Note: the dimension order differs from what you get when loading the data with the h5py library in Python. In particular, the image data is read with its dimensions in (channels, width, height) order, which needs some care.
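The reversed axis order comes from HDF5.jl filling Julia's column-major arrays from the row-major file layout. If the Python-style layout is preferred, it can be recovered with permutedims; the snippet below is only an illustration (the name train_set_x_py is introduced here), and the rest of this post keeps the HDF5.jl order.
# Recover the Python/h5py axis order (m, num_px, num_px, 3) by reversing the dimensions
train_set_x_py = permutedims(train_set_x_orig, (4, 3, 2, 1))
size(train_set_x_py)   # (209, 64, 64, 3)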
# Get the dataset sizes
m_train = size(train_set_y, 1)
m_test = size(test_set_y, 1)
num_px = size(train_set_x_orig, 3)
println("Number of training examples: m_train = $m_train")
println("Number of testing examples: m_test = $m_test")
println("Height/Width of each image: num_px = $num_px")
println("Each image is of size: (3, $num_px, $num_px)")
println("train_set_x shape: ", size(train_set_x_orig))
println("train_set_y shape: ", size(train_set_y))
println("test_set_x shape: ", size(test_set_x_orig))
println("test_set_y shape: ", size(test_set_y))
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (3, 64, 64)
train_set_x shape: (3, 64, 64, 209)
train_set_y shape: (209, 1)
test_set_x shape: (3, 64, 64, 50)
test_set_y shape: (50, 1)
using PyPlot
# Example of a picture
index = 26
# The image dimensions are in (channels, width, height) order, so permute to (height, width, channels) for display
image = permutedims(train_set_x_orig[:, :, :, index], (3, 2, 1))
imshow(image)
display(gcf())
println("y = ", train_set_y[index], ", it's a '", String(classes[train_set_y[index] + 1]), "' picture.")
y = 1, it's a 'cat' picture.
We now flatten the dataset: each image of shape (3, num_px, num_px) is reshaped into a vector of shape (num_px * num_px * 3, 1).
# Flatten the dataset
train_set_x_flatten = reshape(train_set_x_orig, :, m_train)
test_set_x_flatten = reshape(test_set_x_orig, :, m_test)
size(train_set_x_flatten), size(test_set_x_flatten)
((12288, 209), (12288, 50))
Next we standardize the data by dividing by 255, the maximum pixel value, which also converts the values to floating point.
train_set_x = train_set_x_flatten / 255.0;
test_set_x = test_set_x_flatten / 255.0;
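Note that dividing the pixel array by 255.0 already promotes the result to Float64 in Julia, so no explicit cast is needed. A quick check (assuming the images were read as UInt8, which is how they are stored in the HDF5 files):
# The division promotes the element type and scales the pixel values into [0, 1]
eltype(train_set_x)    # Float64
extrema(train_set_x)   # values now lie within [0.0, 1.0]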
In logistic regression, for an example $x^{(i)}$ we compute
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$$
$$ \mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)})\tag{3}$$
Equation (3) is the loss function; its average over the training set is the cost function $J$:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{4}$$
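As a quick numerical illustration of the loss in equation (3), with made-up values of $a$ and $y$ (not taken from the dataset): a confident correct prediction incurs a small loss, while a confident wrong one is penalized heavily.
# Loss from equation (3) for a single example
loss(a, y) = -y * log(a) - (1 - y) * log(1 - a)
loss(0.9, 1)   # ≈ 0.105  (y = 1, confident and correct)
loss(0.1, 1)   # ≈ 2.303  (y = 1, confident and wrong)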
We proceed in the following steps:
- Initialize the model parameters.
- Learn the parameters by minimizing the cost function.
- Use the learned parameters to measure prediction accuracy on the test set.
First, we define the sigmoid function.
sigmoid(z) = 1 ./ (1 .+ exp.(-z))
#test
x = [0.5, 0, 2.0]
println(sigmoid(x))
[0.6224593312018546, 0.5, 0.8807970779778823]
Next we define a function that initializes the parameters $w$ and $b$ with zeros.
function initialize_with_zeros(dim)
    w = zeros(dim)
    b = 0.0
    return w, b
end
#test
dim = 2
w, b = initialize_with_zeros(dim)
@assert typeof(b) == Float64
println("w = $w")
println("b = $b")
w = [0.0, 0.0]
b = 0.0
Next we implement propagate(), which computes the cost function and its gradients. The gradients are computed as
$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{5}$$
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{6}$$
Before implementing it, let us derive the gradient formulas (5) and (6) for the case of a single example ($m=1$) with scalar $x$.
$$
\begin{aligned}
z &= wx + b \\
a &= \sigma(z) \\
J &= \frac{-1}{m}\left( y \log(a) + (1-y) \log (1-a)\right)
\end{aligned}
$$
The individual derivatives are
$$
\begin{aligned}
\frac{\partial z}{\partial w} &= x \\
\frac{\partial z}{\partial b} &= 1 \\
\frac{\partial a}{\partial z} &= \sigma(z) (1 - \sigma(z))\\
\frac{\partial J}{\partial a} &= \frac{-1}{m}\left( \frac{y-a}{a(1-a)}\right)
\end{aligned}
$$
so that
$$
\begin{aligned}
\frac{\partial J}{\partial w} &= \frac{\partial J}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial w}\\
&= \frac{-1}{m}\left( \frac{y-a}{a(1-a)}\right) (a(1-a)) x\\
&= \frac{x}{m}\left(a-y\right)
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial J}{\partial b} &= \frac{\partial J}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial b}\\
&= \frac{-1}{m}\left( \frac{y-a}{a(1-a)}\right) (a(1-a))\\
&= \frac{1}{m}\left(a-y\right)
\end{aligned}
$$
which has the same form as equations (5) and (6).
function propagate(w, b, X, Y)
    m = length(Y)
    # forward pass: activations and cost
    A = sigmoid(w' * X .+ b)
    cost = -sum(Y .* log.(A') + (1 .- Y) .* log.(1 .- A')) / m
    # backward pass: gradients of the cost with respect to w and b
    dw = (X * (A' - Y)) / m
    db = sum(A' .- Y) / m
    grads = Dict("dw" => dw, "db" => db)
    return grads, cost
end
#test
w = [1., 2]
b = 1.5
X = [1. -2. -1. ; 3. 0.5 -3.2]
Y = [1, 1, 0]
grads, cost = propagate(w, b, X, Y)
println("dw = ", grads["dw"])
println("db = ", grads["db"])
println("cost = $cost")
dw = [0.2507153166184082; -0.06604096325829123;;]
db = -0.12500404500439652
cost = 0.15900537707692405
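As an optional sanity check (not part of the original exercise), we can compare the analytic gradients returned by propagate() with central finite differences of the cost, using the small test case above. The helper below is a minimal sketch; numerical_grads is a name introduced here for illustration.
# Compare analytic gradients with central finite differences of the cost
function numerical_grads(w, b, X, Y; h=1e-6)
    dw_num = similar(w)
    for j in eachindex(w)
        wp = copy(w); wp[j] += h
        wm = copy(w); wm[j] -= h
        dw_num[j] = (propagate(wp, b, X, Y)[2] - propagate(wm, b, X, Y)[2]) / (2h)
    end
    db_num = (propagate(w, b + h, X, Y)[2] - propagate(w, b - h, X, Y)[2]) / (2h)
    return dw_num, db_num
end
dw_num, db_num = numerical_grads(w, b, X, Y)
println("max |dw - dw_num| = ", maximum(abs.(grads["dw"] .- dw_num)))
println("|db - db_num| = ", abs(grads["db"] - db_num))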
Next we implement parameter optimization with gradient descent.
function optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=false)
    w = copy(w)
    b = copy(b)
    costs = Float64[]
    dw = zeros(size(w))
    db = 0.0
    for i in 0:num_iterations-1
        # compute the current gradients and cost
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # gradient descent update
        w -= learning_rate * dw
        b -= learning_rate * db
        # record the cost every 100 iterations
        if i % 100 == 0
            push!(costs, cost)
            if print_cost
                println("Cost after iteration $i: $cost")
            end
        end
    end
    params = Dict("w" => w, "b" => b)
    grads = Dict("dw" => dw, "db" => db)
    return params, grads, costs
end
params, grads, costs = optimize(w, b, X, Y)
println("w = ", params["w"])
println("b = ", params["b"])
println("dw = ", grads["dw"])
println("db = ", grads["db"])
print("costs = ", costs)
w = [0.8095604646163251; 2.050820199511423;;]
b = 1.5948713189708588
dw = [0.1786050467423819; -0.04840655526979531;;]
db = -0.08888460336847771
costs = [0.15900537707692405]
We now implement a function that makes predictions with the optimized parameters. Prediction proceeds in two steps:
- compute the predicted value $\hat{y}$
- binarize the prediction with a threshold of 0.5
function predict(w, b, X)
    m = size(X, 2)
    Y_prediction = zeros(1, m)
    # compute probabilities and threshold them at 0.5
    A = sigmoid(w' * X .+ b)
    Y_prediction .= A .> 0.5
    return Y_prediction
end
#test
w = [0.1124579, 0.23106775]
b = -0.3
X = [1. -1.1 -3.2 ; 1.2 2. 0.1]
println("predictions = ", predict(w, b, X))
predictions = [1.0 1.0 0.0]
Finally, we implement a function that assembles the whole model.
function model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=false)
    w, b = initialize_with_zeros(size(X_train, 1))
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = params["w"]
    b = params["b"]
    Y_prediction_test = predict(w, b, X_test)    # output is a row vector
    Y_prediction_train = predict(w, b, X_train)  # output is a row vector
    if print_cost
        println("train accuracy: ", 1 - sum(abs.(Y_prediction_train' - Y_train))/length(Y_train))
        println("test accuracy: ", 1 - sum(abs.(Y_prediction_test' - Y_test))/length(Y_test))
    end
    # convert Y_prediction to a column vector before storing it
    d = Dict("costs" => costs, "Y_prediction_test" => Y_prediction_test', "Y_prediction_train" => Y_prediction_train', "w" => w, "b" => b, "learning_rate" => learning_rate, "num_iterations" => num_iterations)
    return d
end
model (generic function with 4 methods)
Let's run the training.
num_iterations=2000
learning_rate=0.005
print_cost=true
logistic_regression_model = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations, learning_rate, print_cost)
Cost after iteration 0: 0.6931471805599453
Cost after iteration 100: 0.5845083636993086
Cost after iteration 200: 0.46694904094655476
Cost after iteration 300: 0.37600686694802077
Cost after iteration 400: 0.33146328932825125
Cost after iteration 500: 0.30327306747438293
Cost after iteration 600: 0.2798795865826048
Cost after iteration 700: 0.26004213692587574
Cost after iteration 800: 0.2429406846779662
Cost after iteration 900: 0.2280042225672607
Cost after iteration 1000: 0.21481951378449643
Cost after iteration 1100: 0.20307819060644988
Cost after iteration 1200: 0.19254427716706857
Cost after iteration 1300: 0.18303333796883514
Cost after iteration 1400: 0.17439859438448876
Cost after iteration 1500: 0.16652139705400332
Cost after iteration 1600: 0.15930451829756614
Cost after iteration 1700: 0.152667324712965
Cost after iteration 1800: 0.14654223503982347
Cost after iteration 1900: 0.1408720757031016
train accuracy: 0.9904306220095693
test accuracy: 0.7
Dict{String, Any} with 7 entries:
"w" => [0.00961402; -0.0264683; … ; -0.0294478; 0.0237811;;]
"Y_prediction_test" => [1.0; 1.0; … ; 1.0; 0.0;;]
"b" => -0.0159062
"learning_rate" => 0.005
"Y_prediction_train" => [0.0; 0.0; … ; 0.0; 0.0;;]
"num_iterations" => 2000
"costs" => [0.693147, 0.584508, 0.466949, 0.376007, 0.331463, 0.…
# Plot the cost
costs = logistic_regression_model["costs"]
fig, ax = subplots()
ax.plot(costs)
ax.set_ylabel("cost")
ax.set_xlabel("iterations (per hundreds)")
ax.set_title("Learning rate = " * string(logistic_regression_model["learning_rate"]))
display(fig)
Finally, let's vary the learning rate and see how the training results change.
learning_rates = [0.01, 0.005, 0.001, 0.0001]
models = Dict{String, Any}()
for lr in learning_rates
println("Training a model with learning rate: ", lr)
models[string(lr)] = model(train_set_x, train_set_y, test_set_x, test_set_y, 1500, lr, false)
println("\n-------------------------------------------------------\n")
end
fig, ax = subplots()
for lr in learning_rates
ax.plot(models[string(lr)]["costs"], label=string(models[string(lr)]["learning_rate"]))
end
ax.set_ylabel("cost")
ax.set_xlabel("iterations (hundreds)")
legend = ax.legend(loc="upper right", shadow=true)
display(fig)
Training a model with learning rate: 0.01
-------------------------------------------------------
Training a model with learning rate: 0.005
-------------------------------------------------------
Training a model with learning rate: 0.001
-------------------------------------------------------
Training a model with learning rate: 0.0001
-------------------------------------------------------
Interestingly, with a learning rate of 0.01 the cost oscillated up and down during the iterations. It then converged and reached the lowest final value, but while the training error is small, the test error is worse than with a learning rate of 0.005, which suggests overfitting; the check below puts numbers on this.
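The test accuracy for each learning rate can be recomputed from the predictions stored in the returned dictionaries. This is a small sketch using the models dictionary built above ("Y_prediction_test" holds a column vector of 0/1 predictions after the transpose in model()):
# Recompute the test accuracy for each learning rate from the stored predictions
for lr in learning_rates
    pred = models[string(lr)]["Y_prediction_test"]
    acc = 1 - sum(abs.(pred .- test_set_y)) / length(test_set_y)
    println("learning rate = $lr: test accuracy = $acc")
end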
This experiment shows how the learning rate affects model training. Going forward, I would like to examine whether deep unfolding can further improve this convergence behavior.