[CS231n] Assignment 1 - SVM

jiwon152 2023. 7. 25. 17:21

KNN

- space inefficient : the classifier must store all of the training data

- classification is expensive : it must compute the distance to every training example

-> Use an SVM (linear classifier) instead

 

SVM

Linear Classification - uses a score function and a loss function: minimize the loss function with respect to the parameters of the score function.
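
For reference, the full multiclass SVM (hinge) loss being minimized can be written as follows, with margin Δ = 1 (as in the code below) and regularization strength λ = reg:

    L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right) \;+\; \lambda \sum_{k,l} W_{k,l}^2, \qquad s = x_i W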

 

For CIFAR-10 we have a training set of N = 50,000 images, each with D = 32 x 32 x 3 = 3072 pixels, and K = 10, since there are 10 distinct classes.

Each 32*32*3 image is flattened into a 3072-dimensional vector.
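
A bias dimension (a constant 1) is also appended to every image, which is why the weight matrix in the code below has shape (3073, 10) rather than (3072, 10). A minimal sketch of that preprocessing, assuming X_train holds the raw (N, 32, 32, 3) images (variable names are illustrative):

import numpy as np

# flatten each 32x32x3 image into a 3072-dimensional row vector
X_train = np.reshape(X_train, (X_train.shape[0], -1)).astype(np.float64)  # (N, 3072)
# bias trick: append a column of ones so the bias is folded into W
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])            # (N, 3073)
# small random initial weights, one column per class
W = np.random.randn(3073, 10) * 0.0001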

linear_svm.py

1. def svm_loss_naive(W, X, y, reg):

A naive implementation with explicit loops over examples and classes.

- the gradient is computed inline, inside the same loops as the loss

import numpy as np

def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)  # initialize the gradient as zero, shape (3073, 10)

    # compute the loss and the gradient
    num_classes = W.shape[1]  # 10
    num_train = X.shape[0]    # e.g. 500 for the dev set
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)  # (10, ) = array of class scores w . x
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:  # skip the correct class
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # gradient of the margin with respect to W:
                # d(scores[j]) / dW[:, j] = X[i]   for j != y[i]
                dW[:, j] += X[i]
                # d(-correct_class_score) / dW[:, y[i]] = -X[i]
                dW[:, y[i]] -= X[i]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss and the gradient.
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W  # d(reg * sum(W * W)) / dW = 2 * reg * W

    return loss, dW
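
The notebook verifies this analytic gradient against a numerical estimate (its grad_check_sparse helper). A minimal, self-contained version of that check with plain numpy, assuming a small dev batch X_dev, y_dev (illustrative names):

# numerically check a few entries of dW against a centered finite difference
loss, dW = svm_loss_naive(W, X_dev, y_dev, reg=0.0)
h = 1e-5
for _ in range(10):
    ix = tuple(np.random.randint(d) for d in W.shape)  # random weight entry
    W[ix] += h
    loss_plus, _ = svm_loss_naive(W, X_dev, y_dev, reg=0.0)
    W[ix] -= 2 * h
    loss_minus, _ = svm_loss_naive(W, X_dev, y_dev, reg=0.0)
    W[ix] += h  # restore the original weight
    grad_numeric = (loss_plus - loss_minus) / (2 * h)
    print('numerical: %f analytic: %f' % (grad_numeric, dW[ix]))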

2. def svm_loss_vectorized(W, X, y, reg):

Loss calculation:

scores = X.dot(W)  # (500, 10)
correct_scores = scores[np.arange(X.shape[0]), y]  # (500, ), the correct-class score for each example
# note that for the correct class, scores[y] - correct_score + 1 == 1
margins = np.maximum(0, scores - correct_scores[:, np.newaxis] + 1)  # (500, 10)
# zero out the correct class -> ensure its hinge loss contribution is 0
margins[np.arange(X.shape[0]), y] = 0
# normalize
loss = np.sum(margins) / X.shape[0]
# add regularization
loss += reg * np.sum(W * W)

Gradient calculation:

# create a binary mask that is 1 wherever margins > 0
mask = (margins > 0).astype(int)  # same shape as margins, (500, 10)
# for the correct class column mask[np.arange(X.shape[0]), y], each positive margin
# contributes -X[i], so subtract the number of positive margins in that row
mask[np.arange(X.shape[0]), y] -= np.sum(mask, axis=1)
# dW = X^T . mask accumulates +X[i] / -X[i] into the appropriate columns
dW = np.transpose(X).dot(mask)
# normalize
dW /= X.shape[0]
# add regularization
dW += 2 * reg * W  # d(reg * sum(W * W)) / dW = 2 * reg * W

Check the time difference between the naive and vectorized implementations.
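
A sketch of that comparison, following the structure of the notebook's timing cell (X_dev, y_dev and the reg value are illustrative):

import time

tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

tic = time.time()
loss_vectorized, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# the losses should match, and the gradient difference should be ~0
print('loss difference: %f' % (loss_naive - loss_vectorized))
print('gradient difference: %f' % np.linalg.norm(grad_naive - grad_vectorized, ord='fro'))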

linear_classifier.py

1. def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100, batch_size=200, verbose=False)

SGD implementation.

For each iteration, sample batch_size random indices to build X_batch and y_batch, evaluate the loss and gradient on that minibatch, and update the weights using the gradient and the learning rate. Only the sampling and the update step are filled in below; a sketch of the full loop follows the snippet.

# sample random indices to build the minibatch
random_indices = np.random.choice(X.shape[0], size=batch_size, replace=False)
X_batch = X[random_indices]  # (200, 3073)
y_batch = y[random_indices]  # (200, )
# perform parameter update
self.W -= learning_rate * grad
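
For context, a minimal sketch of how these filled-in pieces sit inside the train() loop; the surrounding scaffold is provided by the assignment (in LinearSVM, self.loss calls svm_loss_vectorized), so details like the loss history are illustrative:

num_train = X.shape[0]
loss_history = []
for it in range(num_iters):
    # sample a minibatch
    random_indices = np.random.choice(num_train, size=batch_size, replace=False)
    X_batch = X[random_indices]
    y_batch = y[random_indices]

    # evaluate loss and gradient on the minibatch
    loss, grad = self.loss(X_batch, y_batch, reg)
    loss_history.append(loss)

    # gradient descent step
    self.W -= learning_rate * grad

    if verbose and it % 100 == 0:
        print('iteration %d / %d: loss %f' % (it, num_iters, loss))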

 

2. predict(self, X)

Implement the prediction function using the trained weights: the predicted class is the one with the highest score.

y_pred = np.argmax(X.dot(self.W), axis=1)  # (N, )

 

Hyperparameter tuning

# Provided as a reference. You may or may not want to change these hyperparameters
learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=1500, verbose=True)
        y_train_pred = svm.predict(X_train)
        y_val_pred = svm.predict(X_val)
        train_accuracy = np.mean(y_train == y_train_pred)
        val_accuracy = np.mean(y_val == y_val_pred)
        results[(lr, reg)] = (train_accuracy, val_accuracy)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)

 

Results

Visualize results

 

Test accuracy

 

Visualize the learned weights for each class.
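
Each column of W can be reshaped back into a 32x32x3 image and shown as a class "template". A sketch along the lines of the notebook's provided visualization cell (matplotlib assumed):

import matplotlib.pyplot as plt

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
w = best_svm.W[:-1, :]          # strip out the bias row, (3072, 10)
w = w.reshape(32, 32, 3, 10)    # one 32x32x3 template per class
w_min, w_max = np.min(w), np.max(w)
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # rescale the weights to 0..255 so they can be displayed as an image
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()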