AWS, Apache Spark, RStudio, Canopy, Visual Studio, GitHub, Android Studio
Data Science Skills: Machine Learning, Deep Learning in TensorFlow, Scikit-learn
Data Science Intern, Jun 2016 - Aug 2016, DigitasLBi, Boston
Built a machine learning ETL pipeline to filter bot-like users using Python and Spark on the AWS platform.
Performed EDA to analyze distributions, find correlations, and gain an understanding of the data (5M rows per day).
Generated user-level features that distinguish bot from human behavior and lead to meaningful clustering results.
Applied unsupervised learning (K-Means clustering) to identify bot clusters and label bots.
Built supervised models to predict bots, including a novel Multiscale Bootstrap Logistic Ensemble.
Ran stability analysis to evaluate scalability and support deployment over a month of data.