
Linear Regression and Stochastic Gradient Descent

Objectives


  1. Deepen your understanding of linear regression, the closed-form solution, and stochastic gradient descent.
  2. Practice on a small-scale dataset.
  3. Get a feel for the process of optimization and hyperparameter tuning.

Experimental Data


Linear regression uses the Housing dataset from LIBSVM Data, which contains 506 samples with 13 attributes each. Download the scaled version yourself and split it into a training set and a validation set.

Procedure

Closed-Form Solution for Linear Regression

  1. Load the experimental data with the load_svmlight_file function from the sklearn library.
  2. Split the dataset into a training set and a validation set with train_test_split; no test set is used in this experiment.
  3. Choose a loss function.
  4. Write down the closed-form solution formula (see the lecture slides for the derivation; a summary follows this list).
  5. Compute the parameter W from the closed-form solution.
  6. Evaluate on the training set to obtain the loss value loss_train, and on the validation set to obtain loss_val.
  7. Output the values of loss, loss_train and loss_val.
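For reference, a minimal summary of the formulas assumed by the code below (X already contains the bias column, N is the number of training samples): the mean-squared-error loss and its closed-form minimizer are

$$
L(W) = \frac{1}{2N}\sum_{i=1}^{N}\left(x_i^{\top}W - y_i\right)^2,
\qquad
W^{*} = (X^{\top}X)^{-1}X^{\top}y
$$

which matches the compute_loss and normal_equation functions in the code section; see the lecture slides for the full derivation.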

Linear Regression with Stochastic Gradient Descent

  1. Load the experimental data with the load_svmlight_file function from the sklearn library.
  2. Split the dataset into a training set and a validation set with train_test_split; no test set is used in this experiment.
  3. Initialize the linear model parameters; all-zero, random, or normal-distribution initialization are all possible choices.
  4. Choose a loss function and derive its gradient (see the lecture slides for the derivation).
  5. Randomly pick one sample from the training set and compute the gradient G of the loss on that sample.
  6. Take the negative gradient direction, denoted D = -G.
  7. Update the model parameters as W ← W + ηD, where η is the learning rate, a manually tuned hyperparameter (the update is summarized right after this list).
  8. Evaluate on the training set to obtain the loss value loss_train, and on the validation set to obtain loss_val.
  9. Repeat steps 5-8 a number of times, and output the values of loss_train and loss_val.
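In symbols, a minimal sketch of steps 5-7, where (x_i, y_i) is the randomly drawn sample and η the learning rate:

$$
G = \nabla_W L(W; x_i, y_i), \qquad D = -G, \qquad W \leftarrow W + \eta D
$$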

Code Examples


Import dependencies

import numpy as np
import pandas as pd
import sklearn.datasets as sd
import sklearn.model_selection as sms
import matplotlib.pyplot as plt
import math
import random

Load the data

# Load the experimental data
X, y = sd.load_svmlight_file('housing_scale.txt',n_features = 13)

Split into training and validation sets

# Split the dataset into training and validation sets
X_train, X_valid, y_train, y_valid = sms.train_test_split(X, y)
# Convert the sparse matrices to ndarray and reshape the targets to column vectors
X_train = X_train.toarray()
X_valid = X_valid.toarray()
y_train = y_train.reshape(len(y_train),1)
y_valid = y_valid.reshape(len(y_valid),1)
X_train.shape, X_valid.shape, y_train.shape, y_valid.shape

Initialize model parameters

# Initialize the linear model parameters; all-zero, random, or normal-distribution initialization are all options (alternatives sketched below)
theta = np.zeros((14, 1))
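The cell above uses all-zero initialization. If you want to try the other two options mentioned in the comment, a minimal sketch (the 0.01 scale factor here is an arbitrary choice, tune as needed):

theta = np.random.rand(14, 1) * 0.01     # uniform random initialization in [0, 0.01)
# or
theta = np.random.randn(14, 1) * 0.01    # normal-distribution initialization (mean 0, std 0.01)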

Define the loss function

# Choose a loss function; on the training set its value is recorded as loss
def compute_loss(X, y, theta):
    hx = X.dot(theta)                           # predictions
    error = np.power((hx - y), 2).mean() / 2    # mean squared error divided by 2
    # reg = np.power(theta[1:theta.shape[0]], 2).mean()  # optional regularization term (unused)
    return error

Add a bias column to X

X_train = np.concatenate((np.ones((X_train.shape[0],1)), X_train), axis = 1)
X_valid = np.concatenate((np.ones((X_valid.shape[0],1)), X_valid), axis = 1)
X_train.shape, X_valid.shape

Check the current training loss

loss = compute_loss(X_train, y_train, theta)
loss

Closed-Form Solution

Define the closed-form solution function

# Closed-form solution: W = (X^T X)^{-1} X^T y
def normal_equation(X, y):
    return (np.linalg.inv(X.T.dot(X))).dot(X.T).dot(y)
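One caveat worth noting: np.linalg.inv raises an error when X^T X is singular or nearly so. A slightly more robust variant (a sketch, not part of the original experiment) replaces the inverse with the Moore-Penrose pseudo-inverse:

# Closed-form solution via pseudo-inverse; np.linalg.pinv tolerates singular X^T X
def normal_equation_pinv(X, y):
    return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)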

Compute the closed-form solution

theta = normal_equation(X_train, y_train)
theta

Training loss of the closed-form solution

loss_train = compute_loss(X_train, y_train, theta)
loss_train

Validation loss of the closed-form solution

loss_valid = compute_loss(X_valid, y_valid, theta)
loss_valid

Gradient Descent

Define the gradient function

def gradient(X, y, theta):
    # Unaveraged least-squares gradient: X^T (X theta - y)
    return X.T.dot(X.dot(theta) - y)
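For reference, the expression returned above is the gradient of the unaveraged squared error:

$$
\nabla_\theta \left[ \tfrac{1}{2}\,\lVert X\theta - y\rVert^{2} \right] = X^{\top}(X\theta - y)
$$

Note that compute_loss divides by the number of samples while this gradient does not; the missing 1/N factor only rescales the effective learning rate.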

Define the descent function

def descent(X, y, theta, alpha, iters, X_valid, y_valid):
    loss_train = np.zeros((iters, 1))
    loss_valid = np.zeros((iters, 1))
    for i in range(iters):
        grad = gradient(X, y, theta)
        theta = theta - alpha * grad    # gradient descent update
        loss_train[i] = compute_loss(X, y, theta)
        loss_valid[i] = compute_loss(X_valid, y_valid, theta)
    return theta, loss_train, loss_valid

Full-batch gradient descent

# Full-batch gradient descent
theta = np.zeros((14,1))
alpha = 0.001
iters = 25
opt_theta, loss_train, loss_valid = descent(X_train, y_train, theta, alpha, iters, X_valid, y_valid)
loss_train.min(), loss_valid.min()

iteration = np.arange(0, iters, step = 1)
fig, ax = plt.subplots(figsize = (12,8))
ax.set_title('Train')
ax.set_xlabel('iteration')
ax.set_ylabel('loss')
plt.plot(iteration, loss_train, 'b', label='Train')
# plt.plot(iteration, loss_valid, 'r', label='Valid')
plt.legend()
plt.show()

Trying Adam, Momentum, RMSprop, SGD, and Mini-Batch

Load a new dataset

# This is a new dataset, used to check the effect of the Adam method
X, y = sd.load_svmlight_file('cpusmall_scale.txt',n_features = 13)
# Split the dataset into training and validation sets
X_train, X_valid, y_train, y_valid = sms.train_test_split(X, y)
# Convert the sparse matrices to ndarray
X_train = X_train.toarray()
X_valid = X_valid.toarray()
y_train = y_train.reshape(len(y_train),1)
y_valid = y_valid.reshape(len(y_valid),1)
X_train = np.concatenate((np.ones((X_train.shape[0],1)), X_train), axis = 1)
X_valid = np.concatenate((np.ones((X_valid.shape[0],1)), X_valid), axis = 1)
X_train.shape, X_valid.shape, y_train.shape, y_valid.shape

Define the stochastic gradient descent function

# Stochastic gradient descent driver; batch_size = 1 gives plain SGD
def stochastic_descent(X, y, theta, alpha, iters, batch_size, X_valid, y_valid, opt = 'None'):
    # Parameter initialization
    # beta1 is the Momentum coefficient: the larger it is, the more past gradients influence the current direction
    # beta2 is the RMSprop decay rate; epsilon prevents division by zero
    data = pd.DataFrame(np.concatenate((y.reshape(y.size, 1), X), axis = 1))
    loss_train = np.zeros((iters, 1))
    loss_valid = np.zeros((iters, 1))

    v1 = np.zeros((X.shape[1], 1))
    v2 = np.zeros((X.shape[1], 1))
    beta1 = 0.9
    beta2 = 0.999
    epsilon = 1e-8
    t = 1

    for i in range(iters):
        # Sample a mini-batch with replacement; column 0 is y, the remaining columns are features
        batch = data.sample(batch_size, replace=True)
        batch = batch.values
        data_X = batch[:, 1:theta.size + 1]
        data_y = batch[:, 0].reshape(-1, 1)   # keep y as a column vector so the gradient stays (n_features, 1)

        grad = gradient(data_X, data_y, theta)

        if(opt == 'Momentum'):
            v1 = beta1 * v1 + (1 - beta1) * grad
            theta = theta - alpha * v1
        elif(opt == 'RMSprop'):
            v2 = beta2 * v2 + (1 - beta2) * grad * grad
            theta = theta - alpha * grad / (np.sqrt(v2 + epsilon))
        elif(opt == 'Adam'):
            v1 = beta1 * v1 + (1 - beta1) * grad
            v2 = beta2 * v2 + (1 - beta2) * grad * grad
            # v1 = v1 / (1 - np.power(beta1, t))   # bias correction (disabled)
            # v2 = v2 / (1 - np.power(beta2, t))
            t = t + 1
            theta = theta - alpha * v1 / (np.sqrt(v2) + epsilon)
        elif(opt == 'None'):
            theta = theta - alpha * grad

        loss_train[i] = compute_loss(X, y, theta)
    return theta, loss_train
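For reference, the three optimizer branches above implement the following update rules (without bias correction, matching the commented-out lines; the Momentum branch uses the exponential-moving-average form rather than classical momentum). Here g_t is the mini-batch gradient and η the learning rate:

$$
\begin{aligned}
\text{Momentum:}\ & v_t = \beta_1 v_{t-1} + (1-\beta_1)\,g_t, & \theta &\leftarrow \theta - \eta\,v_t \\
\text{RMSprop:}\ & s_t = \beta_2 s_{t-1} + (1-\beta_2)\,g_t^{2}, & \theta &\leftarrow \theta - \eta\,g_t / \sqrt{s_t + \epsilon} \\
\text{Adam:}\ & \text{both updates above}, & \theta &\leftarrow \theta - \eta\,v_t / (\sqrt{s_t} + \epsilon)
\end{aligned}
$$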

Comparing Mini-batch and SGD

alpha = 0.001
iters = 1000

# Stochastic gradient descent (batch_size = 1)
SGD_theta = np.zeros((14,1))
batch_size = 1

SGD_theta, SGD_train = stochastic_descent(X_train, y_train, SGD_theta, alpha, iters, batch_size, X_valid, y_valid)
SGD_valid = compute_loss(X_valid, y_valid, SGD_theta)
print(SGD_train.min())
print(SGD_valid)

# Mini-batch gradient descent (batch_size = 16)
Mini_theta = np.zeros((14,1))
batch_size = 16

Mini_theta, Mini_train = stochastic_descent(X_train, y_train, Mini_theta, alpha, iters, batch_size, X_valid, y_valid)
Mini_valid = compute_loss(X_valid, y_valid, Mini_theta)
print(Mini_train.min())
print(Mini_valid)


iteration = np.arange(0, iters, step = 1)
fig, ax = plt.subplots(figsize = (12,8))
ax.set_title('SGD vs MiniBatch')
ax.set_xlabel('iteration')
ax.set_ylabel('loss')
plt.plot(iteration, SGD_train,'b', label='SGD_train')
plt.plot(iteration, Mini_train,'r', label='Mini_train')
plt.legend()
plt.show()

Comparing Mini-batch and Momentum

alpha = 0.0001
iters = 1000
Mini_theta = np.zeros((14,1))
batch_size = 16

Mini_theta, Mini_train = stochastic_descent(X_train, y_train, Mini_theta, alpha, iters, batch_size, X_valid, y_valid)
Mini_valid = compute_loss(X_valid, y_valid, Mini_theta)
print(Mini_train.min())
print(Mini_valid)


Momentum_theta = np.zeros((14,1))

Momentum_theta, Momentum_train = stochastic_descent(X_train, y_train, Momentum_theta, alpha, iters, batch_size, X_valid, y_valid, 'Momentum')
Momentum_valid = compute_loss(X_valid, y_valid, Momentum_theta)
print(Momentum_train.min())
print(Momentum_valid)

right = iters
iteration = np.arange(0, right, step = 1)
fig, ax = plt.subplots(figsize = (12,8))
ax.set_title('Momentum vs MiniBatch')
ax.set_xlabel('iteration')
ax.set_ylabel('loss')
plt.plot(iteration, Momentum_train[0:right],'b', label='Momentum_train')
plt.plot(iteration, Mini_train[0:right],'r', label='Mini_train')
plt.legend()
plt.show()

Comparing Mini-batch and RMSprop

alpha = 0.01
iters = 500

RMSprop_theta = np.zeros((14,1))

RMSprop_theta,RMSprop_train = stochastic_descent(X_train, y_train, RMSprop_theta, alpha, iters, batch_size, X_valid, y_valid, 'RMSprop')
RMSprop_valid = compute_loss(X_valid, y_valid, RMSprop_theta)
print(RMSprop_train.min())
print(RMSprop_valid)

right = iters
iteration = np.arange(0, right, step = 1)
fig, ax = plt.subplots(figsize = (12,8))
ax.set_title('RMSprop vs MiniBatch')
ax.set_xlabel('iteration')
ax.set_ylabel('loss')
plt.plot(iteration, RMSprop_train[0:right],'b', label='RMSprop_train')
plt.plot(iteration, Mini_train[0:right],'r', label='Mini_train')
plt.legend()
plt.show()

Comparing Mini-batch and Adam

alpha = 0.01
iters = 500
Adam_theta = np.zeros((14,1))

Adam_theta,Adam_train = stochastic_descent(X_train, y_train, Adam_theta, alpha, iters, batch_size, X_valid, y_valid, 'Adam')
Adam_valid = compute_loss(X_valid, y_valid, Adam_theta)
print(Adam_train.min())

print(Adam_valid)

right = iters
iteration = np.arange(0, right, step = 1)
fig, ax = plt.subplots(figsize = (12,8))
ax.set_title('Adam vs MiniBatch')
ax.set_xlabel('iteration')
ax.set_ylabel('loss')
plt.plot(iteration, Adam_train[0:right],'b', label='Adam_train')
plt.plot(iteration, Mini_train[0:right],'r', label='Mini_train')
plt.legend()
plt.show()

Author: WJZheng
Link: https://wellenzheng.github.io/2020/04/09/%E7%BA%BF%E6%80%A7%E5%9B%9E%E5%BD%92%E4%B8%8E%E9%9A%8F%E6%9C%BA%E6%A2%AF%E5%BA%A6%E4%B8%8B%E9%99%8D/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.
