Client: US Department of Transportation
Project: Waze Data Collection Platform
- Architected a streaming data collection platform that stores data securely and supports analysis with PySpark on AWS EMR.
- Created a DevOps framework for deploying a full-stack application that provides near real-time search across Waze user and traffic data.
- Designed and implemented a data pipeline using Kinesis Firehose to stream data from the Waze REST API.
- Created automated deployment pipelines in Jenkins that cover both infrastructure and application setup.
- Provided a secure, streamed HTML5 workstation with Jupyter notebook capabilities for analyzing data within the workspace.
Client: American Heart Association
Project: Precision Medicine Platform
- Served as Senior Consultant, driving the initiative to engage the research community in using cloud services to analyze genomic data.
- Created a Jenkins infrastructure automation pipeline to deploy infrastructure into multiple accounts simultaneously using Terraform.
- Implemented a blue/green deployment process and adopted a per-sprint release cycle so that features shipped as soon as they were ready.
- Managed an offshore team and was responsible for overall researcher support and platform satisfaction.
- Enforced process-oriented operations and created a Slack bot using Amazon Lex that improved day-to-day operations.
- Led a storyboarding session with the entire team to capture new ideas and stories for further optimizing the PMP.
Client: American Heart Association
Project: DataScience Workspace
- Implemented a well-architected, secure 7-layer virtual private cloud to host various application services in their own isolated subnets.
- Developed and implemented a data science environment on AWS EMR with the tools and services researchers need to analyze health data within the platform.
- Created data governance models that support a large number of researchers accessing and analyzing centrally stored genomic data.
- Developed a shell script to bootstrap the EMR cluster into a data science environment supporting tools such as Spark, TensorFlow, Shiny, and RStudio.
- Developed an ETL process to normalize data received from different consortia so that it is unified within the AHA-PMP platform.
- Assembled an innovative solution using Amazon AppStream 2.0 that improved the overall user experience and resolved some complex problems related to dataset access.
- Developed infrastructure as code using Terraform and implemented serverless infrastructure deployment using Lambda, Step Functions, and the boto3 API.
- Implemented Chef configuration management to automate package installation through Chef recipes and to keep the deployed infrastructure compliant with the architecture.