Info

Source code: https://github.com/lenguyen1807/mnist-cpp

Tutorial: Let's build a simple Neural Network together (Part 1) (only in Vietnamese for now, my English writing is terrible 🥲).

What I've learned:

  • How the PyTorch training (and testing) loop works. I used the same approach; I'm not sure whether it's the correct way or not 🥲, but it looks like this:
/* Training part */
for (size_t i = 0; i < trainImgs.size(); i++)
{
    // forward pass
    MatrixPtr logits = nn.Forward(trainImgs[i]->data);
 
    // Apply softmax
    MatrixPtr pred = SoftMax(logits);
 
    // calculate loss with label
    double loss = CrossEntropyLoss(*pred, *(trainImgs[i]->label));
    trainLoss += loss;
 
    // calculate accuracy
    size_t predLabel = pred->ArgMax();
    size_t trueLabel = trainImgs[i]->label->ArgMax();
    trainCorrect += (predLabel == trueLabel) ? 1.0 : 0.0;
 
    // zero all previous gradients
    nn.ZeroGrad();
 
    // backward pass (calculate gradients)
    nn.Backward("Cross Entropy", pred, trainImgs[i]->label);
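    // (With softmax followed by cross-entropy, the gradient of the loss
    // w.r.t. the logits simplifies to pred - label, so the combined
    // "Cross Entropy" backward pass presumably starts from that difference.)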
 
    // optimize
    nn.Optimize();
}
 
/* Testing part */
for (const auto& img : testImgs)
{
    // forward pass
    MatrixPtr logits = nn.Forward(img->data);
 
    // Apply softmax
    MatrixPtr pred = SoftMax(logits);
 
    // calculate loss with label
    testLoss += CrossEntropyLoss((*pred), *(img->label));
 
    // calculate accuracy
    size_t predLabel = pred->ArgMax();
    size_t trueLabel = img->label->ArgMax();
    testCorrect += (predLabel == trueLabel) ? 1.0 : 0.0;
}
  • I also learned how to manage memory in C++ using smart pointers (std::shared_ptr and std::unique_ptr). Although it wasn't entirely successful, it's alright 😑 (a small sketch follows this list).
  • And finally I managed to finish a side project (instead of dropping it after a few lines of code). I'm looking forward to building a new project on a Deep Learning Compiler (this is the first step); the next step is building an AutoGrad system like PyTorch, or a smaller one like TinyGrad 1.
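
For the smart-pointer part, here is a minimal sketch of the idea, not the actual code from the repo: Matrix is simplified and MatrixPtr is assumed to be a std::shared_ptr<Matrix>, which may not match the real definitions. Each layer uniquely owns its parameters with std::unique_ptr, while activations that several places need to read (the next layer, the loss, the backward pass) are handed out through std::shared_ptr.

#include <cstddef>
#include <memory>
#include <vector>

// Simplified stand-in for the project's Matrix class.
struct Matrix
{
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
};

// Assumption: MatrixPtr is a shared handle, because a forward output may be
// read by the next layer, the loss function and the backward pass at once.
using MatrixPtr = std::shared_ptr<Matrix>;

class Linear
{
public:
    Linear(std::size_t in, std::size_t out)
        // unique_ptr: the layer is the single owner of its parameters.
        : m_weight(std::make_unique<Matrix>(out, in)),
          m_bias(std::make_unique<Matrix>(out, 1))
    {
    }

    MatrixPtr Forward(const MatrixPtr& input)
    {
        // The activation is created here but returned with shared ownership,
        // so it stays alive as long as anyone still holds it.
        auto output = std::make_shared<Matrix>(m_weight->rows, input->cols);
        // ... y = W x + b would be computed here ...
        return output;
    }

private:
    std::unique_ptr<Matrix> m_weight;
    std::unique_ptr<Matrix> m_bias;
};

int main()
{
    Linear layer(784, 10);
    MatrixPtr input = std::make_shared<Matrix>(784, 1);
    MatrixPtr logits = layer.Forward(input); // freed automatically, no delete
}

The rule of thumb is: unique_ptr documents single ownership (weights belong to exactly one layer), shared_ptr is for values whose lifetime has to outlive any single owner.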

What can be improved:

  • Implement mini-batch training.
  • Add momentum to SGD (a rough sketch of mini-batch + momentum follows this list).
  • Implement more optimization algorithms (AdaGrad, AdamW).
  • Implement Dropout (to prevent overfitting).
  • Implement Batch Normalization.
  • Use the GPU (or at least multi-threading on the CPU).
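
As a rough sketch of the first two items (not the project's actual API, all names here are made up for illustration): a mini-batch averages the gradients of several samples before doing a single update, and SGD with momentum keeps a running velocity per parameter, v = mu * v - lr * g, then applies w += v.

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical flat parameter/gradient buffers; the real project keeps
// its parameters inside each layer instead.
struct Params
{
    std::vector<double> values;
    std::vector<double> grads;
};

// SGD with momentum: v = mu * v - lr * g, then w += v.
class MomentumSGD
{
public:
    MomentumSGD(std::size_t size, double lr, double mu)
        : m_velocity(size, 0.0), m_lr(lr), m_mu(mu)
    {
    }

    void Step(Params& p)
    {
        for (std::size_t i = 0; i < p.values.size(); i++)
        {
            m_velocity[i] = m_mu * m_velocity[i] - m_lr * p.grads[i];
            p.values[i] += m_velocity[i];
        }
    }

private:
    std::vector<double> m_velocity;
    double m_lr;
    double m_mu;
};

// Mini-batch idea: average the per-sample gradients, then take one
// optimizer step instead of one step per sample.
void TrainBatch(Params& p, MomentumSGD& opt,
                const std::vector<std::vector<double>>& perSampleGrads)
{
    std::fill(p.grads.begin(), p.grads.end(), 0.0);
    for (const auto& g : perSampleGrads)
        for (std::size_t i = 0; i < p.grads.size(); i++)
            p.grads[i] += g[i] / perSampleGrads.size();
    opt.Step(p);
}

With mu = 0 this reduces to plain SGD, so it could probably replace the current Optimize() step without changing the rest of the loop.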

Note: Looking back after a while, this project is pretty bad: performance is poor and the implementation is really rough 🥲.

Showcase

Result

Footnotes

  1. https://github.com/tinygrad/tinygrad