Implemented and tested a scalable Spark (Java) module for validating and normalizing raw transactional data in production pipelines.
Collaborated with the Data Science team to migrate production pipelines from Redshift to Hive.
Designed and developed ETL pipelines in Oozie.
Summer Intern, Jun 2011 - Aug 2011, Federal University of Espirito Santo, Vitória, Brazil
Developed a Color-Mixture based Traffic Sign Recognition System using German Traffic Sign Recognition Data.
Improved recognition efficiency by applying several image preprocessing techniques implemented with OpenCV in C.
Genetic Data Clustering using different Clustering Algorithms - Java, Weka: Implemented K-Means, Hierarchical Agglomerative, DBSCAN and Mixture Model based clustering for genes. Visualized data using PCA and validated clustering results using the Jaccard coefficient and correlation.
Context Based Ad Serving - Solr, SolrJ, Java, JSF, Facebook OAuth API: Built a News Search Engine for serving relevant News and Ads to maximize Return on Investment. Customized the Vickrey auction model to compute the final bid price using demographic data of users.
Simple Dynamo, Micro Implementation of Amazon's Dynamo - Java, Android: Adapted the Dynamo design to build a robust storage system that tolerates a single node failure, applying the concepts of Linearizability, Quorum, Replication and Failure Detection.
Simple Distributed Hash Table - Java, Android: Developed a storage system based on "Chord - A Scalable Peer-to-peer Lookup Service". Supported global as well as local insertion, querying and deletion on a maximum of five AVDs.
Master of Science: Computer Science and Engineering, Feb. 2015, State University of New York, Buffalo, NY
Coursework includes: Data Mining, Information Retrieval, Distributed Systems, Computer Vision.
Bachelor of Technology: Design & Manufacturing Computer Science & Engineering, Aug. 2012, Indian Institute of Information Technology