
Jessica Claire
  • 609 Johnson Ave., Tulsa, OK 49204
  • H: (555) 432-1000
  • resumesample@example.com
Summary
  • Professional, results-oriented, skilled data engineer with 7+ years of experience in building data integration, data processing, and data-intensive applications.
  • Major contributor in building an automated ETL framework for data ingestion, data transformation, and data validation, which helped clients migrate a large number of existing traditional DBMS jobs to Databricks/Spark with end-to-end testing.
  • In-depth knowledge and expertise in Databricks, Snowflake, Hadoop/Spark, and ETL tools.
  • Built a cloud data governance tool that streamlines the overall governance of the cloud environment to ensure optimum utilization of cloud resources, ease of compliance, enhanced security, and standardization of processes for seamless scaling of the environment.
  • Expert in implementing data pipelines and orchestration using Docker-based containerized Airflow.
  • Key involvement in automating the end-to-end on-prem to cloud data migration process for multiple clients using AWS, Azure, GCP, and Databricks.
Skills and Certifications
  • Databricks Certified Developer – Apache Spark 2.x
  • Google Cloud Certified Professional Data Engineer
  • M101J: MongoDB for Java Developers

Experience
Sr. Data Engineer (Technical Lead), 03/2021 - Current
Splunk, Hawthorne, NJ

Project: Metazoo - Metadata-driven data ingestion and ETL (OCT 2022 - MAR 2023)

Client: Advent Health

Role: Cloud Data Engineer

Description: This project migrates different on-prem data sources (Oracle, MySQL, Salesforce, etc.) to Azure cloud/Snowflake, building an automated metadata-driven framework and pipelines using Azure Data Factory, creating a data lake in ADLS, and loading data into Snowflake for further reporting and analytics.

Environment: Azure, Salesforce, SQL Server, Snowflake, Python, Azure Data Factory, Azure DevOps

Key Responsibilities:

  • Built the Metazoo automation framework for Salesforce metadata generation.
  • Automated source/Salesforce schema extraction, schema processing, and job generation using a Python-based framework that maps Salesforce data to Snowflake (see the sketch after this list).
  • Built parameterized ADF pipelines driven by the extracted metadata as input parameters and ingested data into Azure Data Lake Storage.
  • Ingested the extracted Parquet data into Snowflake tables and created views on top of them for further analysis.
  • Implemented different load strategies (full/initial load, incremental load, and Type 2) while loading data into Snowflake.
  • Replicated an on-prem NiFi data pipeline in the cloud using Azure Data Factory.
  • Tested the end-to-end ADF data pipeline and performed data validation for the ingested data.
  • Documented the end-to-end process and performance analysis on Confluence.
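
A minimal sketch of the kind of metadata-driven mapping described above: Salesforce field types are translated into Snowflake column types to generate target DDL. The type map and the example metadata are hypothetical illustrations, not the actual Metazoo framework.

```python
# Sketch: generate Snowflake DDL from extracted Salesforce field metadata.
# The type map and example fields below are hypothetical.

SF_TO_SNOWFLAKE = {
    "id": "VARCHAR(18)",
    "string": "VARCHAR",
    "textarea": "VARCHAR",
    "boolean": "BOOLEAN",
    "int": "NUMBER(38,0)",
    "double": "FLOAT",
    "currency": "NUMBER(18,2)",
    "date": "DATE",
    "datetime": "TIMESTAMP_NTZ",
}

def generate_create_table(object_name: str, fields: list[dict]) -> str:
    """Build a CREATE TABLE statement for one Salesforce object."""
    columns = ",\n  ".join(
        f"{f['name']} {SF_TO_SNOWFLAKE.get(f['type'], 'VARCHAR')}" for f in fields
    )
    return f"CREATE TABLE IF NOT EXISTS {object_name} (\n  {columns}\n);"

if __name__ == "__main__":
    # Example metadata as it might be extracted from the Salesforce describe API.
    account_fields = [
        {"name": "Id", "type": "id"},
        {"name": "Name", "type": "string"},
        {"name": "AnnualRevenue", "type": "currency"},
        {"name": "CreatedDate", "type": "datetime"},
    ]
    print(generate_create_table("ACCOUNT", account_fields))
```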

Project: BDP (Business Data Intelligence Platform) (FEB 2022 - OCT 2022)

Client: MyFitnessPal

Role: Sr. Data Engineer

Project Description: MyFitnessPal is one of the best weight-loss and fitness apps, helping nearly 1 million members reach their nutrition and fitness goals every year. This project migrates their application data to a Snowflake data warehouse for their BI needs, implements ETL and data warehousing using Snowflake, and orchestrates and automates the complete end-to-end flow using Airflow jobs.

Environment: AWS S3, AWS Managed Workflows for Apache Airflow (MWAA), Snowflake

Key Responsibilities:

  • Created and managed data pipelines using MWAA Airflow DAGs to load data from AWS S3 to Snowflake (see the sketch after this list).
  • Created ETL jobs in Snowflake to copy raw data into the landing schema.
  • Implemented delta/incremental loads with Type 2, overwrite, and append load strategies from the landing/raw layer to the staging layer.
  • Transformed, curated, and cleansed raw variant data into a suitable structured format using Snowflake scripts.
  • Used Snowflake streams to identify insert, update, and delete operations on raw data.
  • Created parameterized DAGs for different environments (PROD, DEV & QA) to orchestrate and schedule the complete end-to-end ETL process.
  • Developed a custom logging Snowflake operator in Airflow for logging, debugging, and auditing of Airflow jobs.
  • Worked closely with different stakeholders, BAs, the solution architect, QA, and the BI team to achieve project goals and meet project timelines.
  • Worked on process flow, lineage, and SOP documentation.
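
A minimal sketch of an environment-parameterized MWAA DAG that copies raw S3 files into a Snowflake landing table. The connection IDs, stage, schema, and table names are hypothetical placeholders, not the actual project configuration.

```python
# Sketch: MWAA DAG copying files from an S3 stage into Snowflake, parameterized by environment.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

ENV = Variable.get("env", default_var="DEV")  # PROD / DEV / QA

with DAG(
    dag_id=f"s3_to_snowflake_{ENV.lower()}",
    start_date=datetime(2022, 2, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    copy_raw = SnowflakeOperator(
        task_id="copy_raw_to_landing",
        snowflake_conn_id=f"snowflake_{ENV.lower()}",
        sql=f"""
            COPY INTO LANDING_{ENV}.RAW_EVENTS
            FROM @LANDING_{ENV}.S3_RAW_STAGE/events/
            FILE_FORMAT = (TYPE = JSON)
            ON_ERROR = 'CONTINUE';
        """,
    )
```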

Project: Batch Ingestion & On-Prem to Cloud Data Migration (MAR 2021 - JAN 2022)

Client: HSBC

Role: Sr. Data Engineer

Project Description: This project migrates data from on-prem to Google Cloud and implements data ingestion strategies from Google Cloud Storage to BigQuery using Airflow as the orchestration tool.

Environment: Python, GCP, GCS, Google BigQuery, Airflow, Juniper

Key Responsibilities:

  • Migrated source files with different file formats (.csv, COBOL, fixed-width, .avro) from on-prem servers to Google Cloud Storage using the Juniper data migration tool.
  • Created Juniper feeds for transferring files from on-prem virtual machines to GCS buckets.
  • Developed parameterized Python scripts to perform data conversion, the audit process, and reconciliation of data before loading it into BigQuery tables.
  • Wrote a COBOL parser in Python to read fixed-width files and load them into target BigQuery tables (see the sketch after this list).
  • Replaced the existing Control-M orchestration with Airflow.
  • Created Airflow DAGs to orchestrate and schedule the complete end-to-end ingestion process.
  • Performed data validations and unit testing using Python.
  • Created interdependent DAGs in Airflow using the TriggerDagRun operator and task sensors.
  • Created SOP documents for the complete end-to-end ingestion process using Confluence.
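
A minimal sketch of a fixed-width record parser of the kind used to read COBOL-style extracts before loading them into BigQuery. The field layout (name, start, length, type) is a hypothetical copybook-derived specification; the actual load to BigQuery is out of scope here.

```python
# Sketch: convert a fixed-width (COBOL-style) extract to CSV for a BigQuery load job.
import csv
import sys

# Hypothetical layout: (field name, start offset, length, cast function).
LAYOUT = [
    ("account_id", 0, 10, str),
    ("balance",    10, 12, lambda s: int(s) / 100),  # implied 2 decimal places
    ("open_date",  22, 8,  str),                     # YYYYMMDD
    ("status",     30, 1,  str),
]

def parse_line(line: str) -> dict:
    """Slice one fixed-width record into typed fields."""
    record = {}
    for name, start, length, cast in LAYOUT:
        raw = line[start:start + length].strip()
        record[name] = cast(raw) if raw else None
    return record

def convert(fixed_width_path: str, csv_path: str) -> None:
    """Convert a fixed-width file to CSV, ready for loading into BigQuery."""
    with open(fixed_width_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=[f[0] for f in LAYOUT])
        writer.writeheader()
        for line in src:
            writer.writerow(parse_line(line.rstrip("\n")))

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```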

Client: Exeliq Consulting Inc. / Trustmark

Project: Cloud Governance

Description: The “Cloud Governance” tool streamlines the overall governance of the client-side cloud environment after migrating to the cloud. The tool ensures ease of compliance, enhanced security, optimum utilization of resources, cost optimization, and standardization of processes for seamless scaling of the environment.

Environment: Azure platform, Python, Azure Databricks, Azure AD, Azure Storage, GitHub

Github Link: https://github.com/hitesh09p/TSPL/tree/master/cloudgovernance-master

Key Responsibilities:

  • Built and set up an end-to-end cloud governance framework for the client's cloud environment.
  • Customized cloud governance per client needs.
  • Created a Python framework encoding specific local and global industry compliance standards (see the sketch after this list).
  • Optimized workloads and resource allocations for significant cost optimization.
  • Studied and tested insightful reports and recommendations for a continuous cloud cost and resource optimization process.
  • Automated centralized cloud monitoring, which enabled audit, security, and compliance of the cloud platform.
  • Created and managed role-based access control for enhanced security compliance, granular security, and policy management using the Python framework.
  • Built customized cloud resource and cost monitoring dashboards using Tableau.
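
A minimal sketch of the kind of rule-based compliance check such a governance framework might run over cloud resource metadata. The resource records, tags, and rules below are hypothetical; in practice the metadata would be pulled from the Azure APIs.

```python
# Sketch: evaluate hypothetical compliance rules against cloud resource metadata.

REQUIRED_TAGS = {"owner", "cost_center", "environment"}

def check_resource(resource: dict) -> list[str]:
    """Return a list of compliance violations for one resource record."""
    violations = []
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append(f"missing tags: {sorted(missing)}")
    if resource.get("type") == "storage_account" and not resource.get("encryption_enabled"):
        violations.append("storage encryption disabled")
    if resource.get("public_network_access", False):
        violations.append("public network access enabled")
    return violations

if __name__ == "__main__":
    resources = [
        {"name": "stgrawdata01", "type": "storage_account",
         "tags": {"owner": "data-eng"}, "encryption_enabled": False,
         "public_network_access": True},
    ]
    for r in resources:
        for v in check_resource(r):
            print(f"{r['name']}: {v}")
```
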
Data Engineer, 06/2016 - 02/2021
TSPL India City, STATE,

Project: HCA (Harbour Capital Advisors) (JAN 2020 - FEB 2021)

Role: Sr. Data Engineer

Project Description: This project migrates and re-implements Informatica ETL in Databricks PySpark and builds a Spark test automation framework.

Environment: Databricks, Spark, AWS S3, PostgreSQL, Vagrant, Informatica

Key Responsibilities:

  • Implemented the existing PostgreSQL data model in Databricks PySpark.
  • Converted RDBMS SQL stored procedures into Spark programs using Spark libraries (see the sketch after this list).
  • Migrated Informatica ETL into Spark transformations and loaded data into PostgreSQL.
  • Used data from AWS S3 for processing and uploaded data to AWS S3 using KMS security.
  • Processed input text files and dimension tables in CSV format and loaded them into PostgreSQL.
  • Parsed and extracted data from COBOL files using PySpark jobs.
  • Implemented a testing framework to compare existing processed file extracts with the new PySpark-processed files.
  • Optimized the Spark code for large data processing using Spark-recommended performance tuning techniques.
  • Debugged the existing testing framework and made changes according to requirements.
  • Upgraded the Databricks internal Hive metastore.
  • Migrated the complete local testing framework to Databricks.
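
A minimal sketch of converting a stored-procedure-style aggregation into a Databricks PySpark job that reads an extract from S3 and writes the result back to PostgreSQL over JDBC. The paths, column names, table names, and credentials are hypothetical placeholders.

```python
# Sketch: stored-procedure-style GROUP BY rewritten as a PySpark job with a JDBC load.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hca_etl").getOrCreate()

# Read the raw extract (CSV with header) from S3.
trades = spark.read.csv("s3a://hca-raw/trades/", header=True, inferSchema=True)

# Equivalent of the procedure's GROUP BY / SUM logic.
daily_positions = (
    trades
    .withColumn("trade_date", F.to_date("trade_timestamp"))
    .groupBy("account_id", "symbol", "trade_date")
    .agg(F.sum("quantity").alias("net_quantity"),
         F.sum(F.col("quantity") * F.col("price")).alias("notional"))
)

# Load the aggregated result into PostgreSQL.
(daily_positions.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/analytics")
    .option("dbtable", "public.daily_positions")
    .option("user", "etl_user")
    .option("password", "REDACTED")
    .mode("overwrite")
    .save())
```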

Project: Data Xform (FEB 2018 - DEC 2019)

Client: Ingredion

Role: Sr. Data Engineer

Description: “Data Xform” provides a seamless journey for data migration and transformation from a plethora of legacy databases to the cloud environment. It works on database discovery, assessment, and migration by using an industry-specific architecture and ensuring minimal downtime and data loss while switching over to cloud-hosted providers. The tool also ensures that integration of data across various databases is done efficiently and effectively.

Key Responsibilities:

  • Created and managed a single automated hybrid data integration framework using Apache Spark.
  • Ingested data into AWS S3 from various data sources such as CSV, Excel, SQL Server, MongoDB, etc.
  • Performed data cleansing and data profiling on raw data using Spark Scala in AWS Databricks.
  • Implemented an end-to-end automated ETL framework on the AWS Databricks platform.
  • Created Databricks templates to load data into the data mart using different load strategies such as append, upsert, overwrite, and Type 2 using Spark.
  • Created stream consumption pipelines using Kafka and Spark integration.
  • Consumed and processed near-real-time JSON data received from Kafka, reading and storing it in Databricks Delta tables using Spark (see the sketch after this list).
  • Developed a cost-efficient and fully managed cloud data transformation tool that scales on demand and reduces overhead cost.
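
A minimal sketch of the Kafka-to-Delta stream described above: near-real-time JSON messages are parsed with Spark Structured Streaming and appended to a Databricks Delta table. The broker, topic, schema, and storage paths are hypothetical placeholders.

```python
# Sketch: Structured Streaming pipeline from Kafka JSON messages into a Delta table.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("dataxform_stream").getOrCreate()

# Hypothetical message schema.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "ingestion-events")
    .option("startingOffsets", "latest")
    .load())

# Parse the Kafka value payload from JSON into columns.
events = (raw
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*"))

# Append the parsed events to a Delta table.
(events.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3a://dataxform/checkpoints/events")
    .start("s3a://dataxform/delta/events"))
```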

Project: CAT-BOMBOD (AUG 2017 - JAN 2018)

Client: Caterpillar

Description: This project automates and orchestrates the complete BOM and BOD pre- and post-validation process.

Environment: Google Cloud Composer, Google Dataproc, Google Cloud Storage, Google Cloud Functions, Google Compute Engine

Key Responsibilities:

  • Created pre-validation, canonical, ingestion, standardization, Neo4j engine, and post-validation DAGs using Google Cloud Composer.
  • Migrated a single-node GCE Airflow instance to multi-node Google Cloud Composer for orchestration and automation.
  • Implemented ETL using Google Dataproc.
  • Enabled email notifications for failed facilities and validations.
  • Used trigger, external sensor, and branch operators for inter-DAG dependencies.
  • Invoked Google Cloud Functions using the SimpleHttpOperator through Google Cloud Composer (see the sketch after this list).
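
A minimal sketch of a Cloud Composer DAG that invokes a Cloud Function via the SimpleHttpOperator and then triggers a downstream DAG for the next stage. The connection ID, endpoint, payload, and DAG IDs are hypothetical placeholders.

```python
# Sketch: Composer DAG calling a Cloud Function, then triggering a downstream DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="bom_pre_validation",
    start_date=datetime(2017, 8, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_pre_validation = SimpleHttpOperator(
        task_id="invoke_pre_validation_function",
        http_conn_id="gcf_http",        # HTTP connection pointing at the Cloud Functions host
        endpoint="bom-pre-validation",  # function name used as the endpoint
        method="POST",
        data='{"facility": "ALL"}',
        headers={"Content-Type": "application/json"},
    )

    trigger_ingestion = TriggerDagRunOperator(
        task_id="trigger_ingestion_dag",
        trigger_dag_id="bom_ingestion",
    )

    run_pre_validation >> trigger_ingestion
```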

Project: Numerator (2016 - JUL 2017)

Description: This project implements a data warehouse using Pentaho and Snowflake, and migrates Pentaho jobs to Airflow for distributed processing and automation.

Environment: Airflow, Snowflake, Pentaho, Python, shell scripts, R scripts, GitHub

Key Responsibilities:

  • Designed and implemented a POC using a Databricks Spark cluster.
  • Migrated Pentaho ETL jobs to Airflow DAGs for orchestration and automation.
  • Migrated Oracle stored procedures to Snowflake scripts (see the sketch after this list).
  • Loaded data into the Snowflake system using Airflow.
  • Implemented Airflow data pipelines, creating DAGs in Python to load data into Snowflake, with Airflow running in Docker.
  • Created and executed parameterized task workflows in Pentaho per business requirements.
  • Scheduled tasks using the Airflow scheduler.
  • Uploaded data to AWS S3 for data archival.
  • Applied performance tuning techniques to the Snowflake data model.
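
A minimal sketch of running a Snowflake script (the rewritten equivalent of an Oracle stored procedure) from Python with the Snowflake connector. The account, credentials, schemas, and table names are hypothetical placeholders.

```python
# Sketch: execute a Snowflake MERGE (rewritten from an Oracle procedure) via the connector.
import snowflake.connector

MERGE_SQL = """
MERGE INTO DW.DIM_PRODUCT AS tgt
USING STAGING.PRODUCT_UPDATES AS src
    ON tgt.PRODUCT_ID = src.PRODUCT_ID
WHEN MATCHED THEN UPDATE SET
    tgt.PRODUCT_NAME = src.PRODUCT_NAME,
    tgt.PRICE        = src.PRICE,
    tgt.UPDATED_AT   = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN INSERT (PRODUCT_ID, PRODUCT_NAME, PRICE, UPDATED_AT)
    VALUES (src.PRODUCT_ID, src.PRODUCT_NAME, src.PRICE, CURRENT_TIMESTAMP());
"""

def run_merge() -> None:
    conn = snowflake.connector.connect(
        account="xy12345.us-east-1",
        user="ETL_USER",
        password="REDACTED",
        warehouse="ETL_WH",
        database="DW",
    )
    try:
        conn.cursor().execute(MERGE_SQL)
    finally:
        conn.close()

if __name__ == "__main__":
    run_merge()
```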

Technology Mentor, 01/2015 - 11/2019
UpGrad.com City, STATE,

Big Data Professional Trainer & Mentor

  • Provided professional training and mentorship services to UpGrad candidates.
  • Delivered online big data training on various tools and platforms such as Hadoop, MapReduce, Hive, HBase, and Spark, and mentored students on data engineering interviews.
Education and Training
Bachelor of Science, Computer Engineering - Mumbai University
