My Projects

I used a Netflix dataset containing information about the movies and TV shows on the platform. The model was built in Python with TensorFlow and relies mainly on two attributes from the dataset: title and description. I applied TfidfVectorizer to the descriptions and used the resulting vectors to cluster the movies and shows into 10 groups with the K-Means algorithm. Finally, I built a BERT model that classifies new films and series into the relevant cluster from their descriptions, which helps indicate their genre. The Transformer model achieved an accuracy of 86%.
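
The clustering step can be sketched roughly as follows, assuming scikit-learn. The descriptions here are made-up placeholders, not rows from the Netflix dataset, and the sketch uses 2 clusters instead of 10 only because the toy sample is tiny.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder descriptions (hypothetical, not taken from the real dataset).
descriptions = [
    "A detective hunts a serial killer in a rainy city.",
    "Two police officers investigate a murder and chase the killer.",
    "A detective and a journalist solve a murder case.",
    "A young couple falls in love during a summer in Paris.",
    "Two friends discover love and romance on a road trip.",
    "A wedding planner finds love and romance at work.",
]

# Turn each description into a sparse TF-IDF vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(descriptions)

# Group the vectors with K-Means (the project used 10 clusters; 2 here).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Assign an unseen description to the nearest existing cluster.
new_description = ["A detective investigates a murder."]
new_label = kmeans.predict(vectorizer.transform(new_description))

print(labels, new_label)
```

In this sketch a new title is assigned by nearest centroid. In the project itself that assignment was done by the BERT classifier, which is not shown here.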

Propaganda Classification on Online Content

I developed an NLP-based binary classification model on data collected from political speeches, news and tweets. First, I performed text-processing tasks such as POS tagging and lemmatization using the spaCy library and used Word2vec to generate word embeddings. I then implemented three models: Logistic Regression, a Support Vector Machine and a Keras Sequential neural network. Finally, I evaluated the classifiers and found the Keras Sequential model with K-fold cross-validation to be the best, with an F1 score of 83%.
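
As a rough illustration of the evaluation step, here is a minimal sketch of K-fold cross-validation scored with F1, assuming scikit-learn. The features are synthetic stand-ins for the Word2vec embeddings, and Logistic Regression is used in place of the Keras network to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data standing in for document embeddings (assumption).
X, y = make_classification(n_samples=300, n_features=50, random_state=0)

# 5 stratified folds, so each fold keeps the class balance of the full set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One of the three model families mentioned above.
clf = LogisticRegression(max_iter=1000)

# One F1 score per fold; the mean is the cross-validated estimate.
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
print(f"F1 per fold: {scores}")
print(f"Mean F1: {scores.mean():.3f}")
```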

The dataset used in this project contains booking information for a city hotel and a resort hotel over the period 2015-2017. I performed data processing and exploratory data analysis, determined the best time of year to book a hotel, and classified customers by their likelihood of cancelling a booking. I also forecast the number of future guests at a hotel using ARIMA and predicted the likelihood of a hotel receiving special requests using a Random Forest. I achieved an AUC score of 0.94 for predicting booking cancellations with an XGBoost model.
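
The way an AUC score like this is computed can be sketched as below. This is only an illustration under stated assumptions: the data is synthetic rather than the hotel bookings, and scikit-learn's GradientBoostingClassifier is swapped in for XGBoost so the example needs a single library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: label 1 plays the role of "booking cancelled" (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Hold out 25% of the rows to measure performance on unseen bookings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Gradient-boosted trees, standing in for the XGBoost model.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# AUC is computed from predicted probabilities, not from hard labels.
cancel_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, cancel_proba)
print(f"AUC: {auc:.3f}")
```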

Human Age and Gender Recognition using Convolutional Neural Networks

I developed a real-time, video-based system using OpenCV by training a CNN model on the Adience Benchmark dataset. I used a Haar cascade classifier for face detection and the Gil Levi and Tal Hassner trained CNN models (Caffe) for classification. Lastly, the classification results are uploaded to Amazon S3 (Simple Storage Service). This project was sponsored by Indicus Software Pvt. Ltd. and achieved an accuracy of 75%.

My Skills