Aman Arora’s Blog

I work as a Data Science Lead at REA Group, where we work with property data, text and images. Part of my job is to experiment with the latest research, and in this blog I document my learnings as I go. I am also the author of the timm docs.

You will often see me writing about research papers (mostly in the field of Computer Vision), explaining the theory in simple language alongside a PyTorch implementation. Please feel free to subscribe to receive regular updates about new blog posts.

During my time as a Machine Learning Engineer at Weights and Biases, I also wrote the following blog posts:

Previous blog posts
Title Link
Explained: Characterizing Signal Propagation to Close the Performance Gap in Unnormalized ResNets [link]
Revisiting ResNets: Improved Training and Scaling Strategies [link]
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases [link]
Is MLP-Mixer a CNN in Disguise? [link]
Are fully connected and convolution layers equivalent? If so, how? [link]
A faster way to get working and up-to-date conda environments using “fastchan” [link]
Inside Hugging Face’s Accelerate! [link]

The Annotated CLIP (Part-2)

Learning Transferable Visual Models From Natural Language Supervision
Multimodal
Transformers
Clip

This post is part 2 of a two-part series of blog posts on CLIP (for part 1, please refer to my previous blog post). In this post, we present the PyTorch code behind CLIP for model building and training. This blog post is itself a working Jupyter Notebook.

Mar 11, 2023
Aman Arora

The Annotated CLIP (Part-1)

Learning Transferable Visual Models From Natural Language Supervision
Multimodal
Transformers

This post is part 1 of a two-part series of blog posts on CLIP. In this post, we present an introduction to CLIP in an easy-to-digest manner. We also compare CLIP to other research papers and look at the background and inspiration behind CLIP.

Mar 3, 2023
Aman Arora

Swin Transformer

Hierarchical Vision Transformer using Shifted Windows
Computer Vision
Model Architecture
Transformers

Swin Transformer Model Architecture explained with PyTorch implementation line-by-line.

Jul 4, 2022
Aman Arora

The Annotated DETR

End-to-End Object Detection with Transformers
Computer Vision
Model Architecture
Object Detection
Transformers

DETR Model Architecture explained with PyTorch implementation line-by-line.

Jul 26, 2021
Aman Arora

The sad state of AI and tech startups in Australia today and what can we do about it

AI
Jeremy Howard
“Did you know that Australia’s…
May 15, 2021
Aman Arora

Adam and friends

Adam, SGD, RMSProp from scratch in PyTorch.
Computer Vision

Basic optimizers from scratch in PyTorch with working notebook.

Mar 13, 2021
Aman Arora

Vision Transformer

An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale
Computer Vision
Model Architecture
Transformers

In this blog post, we will look at the Vision Transformer architecture in detail and also re-implement it in PyTorch from scratch.

Jan 18, 2021
Aman Arora

The EfficientDet Architecture in PyTorch

Computer Vision
Model Architecture
Object Detection

In this blog post, we will look at how to implement the EfficientDet architecture in PyTorch from scratch.

Jan 13, 2021
Aman Arora

EfficientDet - Scalable and Efficient Object Detection

Computer Vision
Model Architecture
Object Detection

As part of this blog post I will explain how EfficientDets work step-by-step.

Jan 11, 2021
Aman Arora

Top 100 solution - SIIM-ACR Pneumothorax Segmentation

Computer Vision
Kaggle
Image Segmentation

In this blog post, we will look at an image segmentation problem in PyTorch, using the SIIM-ACR Pneumothorax Segmentation competition as an example, and create a solution that gets us to a top-100 leaderboard position on Kaggle.

Sep 6, 2020
Aman Arora

GeM Pooling Explained with PyTorch Implementation and Introduction to Image Retrieval

Computer Vision

In this blog post, we will look at GeM pooling and the research paper Fine-tuning CNN Image Retrieval with No Human Annotation. We also implement GeM pooling from scratch in PyTorch.

Aug 30, 2020
Aman Arora

U-Net: A PyTorch Implementation in 60 lines of Code

U-Net: Convolutional Networks for Biomedical Image Segmentation
Computer Vision
Model Architecture
Image Segmentation

As part of this blog post we will implement the U-Net architecture in PyTorch in 60 lines of code.

Aug 30, 2020
Aman Arora

SIIM-ISIC Melanoma Classification - my journey to a top 5% solution and first silver medal on Kaggle

Winning solution for SIIM-ISIC Melanoma Classification
Computer Vision
Kaggle

As part of this blog post, I share my winning solution for the SIIM-ISIC Melanoma Classification Kaggle competition.

Aug 23, 2020
Aman Arora

EfficientNet

Rethinking Model Scaling for Convolutional Neural Networks
Computer Vision
Model Architecture

A look at the SOTA at the time of writing (August 2020), with a top-1 accuracy of 88.5% on ImageNet.

Aug 13, 2020
Aman Arora

Group Normalization

Computer Vision

In this blog post, we will look at the Group Normalization research paper and also implement Group Normalization in PyTorch from scratch.

Aug 9, 2020
Aman Arora

DenseNet Architecture Explained with PyTorch Implementation from TorchVision

Densely Connected Convolutional Networks
Programming
Computer Vision
Model Architecture

In this blog post, we introduce dense blocks and transition layers, and look at the TorchVision implementation of DenseNet step-by-step.

Aug 2, 2020
Aman Arora

Squeeze and Excitation Networks Explained with PyTorch Implementation

Squeeze-and-Excitation Networks
Computer Vision
Model Architecture

In this blog post, we re-implement Squeeze-and-Excitation networks in PyTorch step-by-step, with very minor updates to the ResNet implementation in torchvision.

Jul 24, 2020
Aman Arora

Label Smoothing Explained using Microsoft Excel

Computer Vision

In this blogpost, we re-implement Label Smoothing in Microsoft Excel step by step.

Jul 18, 2020
Aman Arora

An introduction to PyTorch Lightning with comparisons to PyTorch

Programming
Computer Vision

In this blog post, we will go through an introduction to PyTorch Lightning and implement tricks such as gradient accumulation, 16-bit precision training, and TPU/multi-GPU support, all in a few lines of code. We will use PyTorch Lightning to work on the SIIM-ISIC Melanoma Classification challenge on Kaggle.

Jul 12, 2020
Aman Arora

What is Focal Loss and when should you use it?

Computer Vision
Loss Function

In this blog post, we will understand what Focal Loss is and when it is used. We will also dive into its math and implement it step-by-step in PyTorch.

Jun 29, 2020
Aman Arora

The Annotated GPT-2

Better language models and their implications
NLP
Transformers

This post presents an annotated version of the paper in the form of a line-by-line implementation in PyTorch. This document itself is a working notebook, and should be a completely usable implementation.

Feb 18, 2020
Aman Arora