LiveCareer-Resume

senior data engineer resume example with 5+ years of experience

Jessica Claire
Senior Data Engineer
  • Montgomery Street, San Francisco, CA 94105 609 Johnson Ave., 49204, Tulsa, OK
  • Home: (555) 432-1000
  • Cell:
  • resumesample@example.com
Summary

Experienced, results-oriented, resourceful and problem-solving Data Engineer with 6+ years of diverse experience in the Information Technology field. This includes developing and implementing applications for storage, querying and processing in Big Data and Cloud environments.

Certifications

AWS Certified Developer – Associate

Technical Skills
  • AWS Cloud Technologies - AWS S3, EMR, Athena, AWS Glue, Redshift Spectrum, Redshift, Lambda, Step Functions, CloudWatch, CloudTrail, SNS, SQS, IAM, DMS, DynamoDB, Snowflake and Microservices.
  • GCP Cloud Technologies - GCS, Dataproc, BigQuery, Dataflow, gsutil, Cloud Functions, Cloud Composer, Pub/Sub, Cloud SQL and Cloud Monitoring.
  • Big Data Components - Hadoop, YARN, Hive, Sqoop, Oozie, Kafka, Impala, Hue, HBase and Spark (Core, Spark SQL and Streaming).
  • Databases - Oracle, Teradata, SQL Server and MySQL.
  • Programming Languages - Python, Scala, Core Java, SQL and Shell Scripting.
  • Data Visualization - Tableau and Trifacta.
  • Orchestration - Airflow, Oozie and Autosys.
  • Tools - Docker, Kubernetes, Maven, Ansible, Jenkins, JIRA, GitHub and Bitbucket.
Experience
Senior Data Engineer, 07/2020 to Current
FactSet Research Systems Inc., Remote - United States, NY
  • Designed and implemented a scalable, secure cloud architecture on Amazon Web Services for registry-specific data from AAN, AAO and AUA.
  • Worked with the AI/ML and Analytics teams to architect and build a data lake using AWS services such as EMR, S3, Athena, Glue, Redshift Spectrum and Redshift, along with Spark, Spark SQL and Airflow.
  • Developed generic Spark frameworks to help different quantitative science teams onboard multi-format datasets from various sources.
  • Leveraged AWS S3, Lambda and Glue to build serverless, event-driven data pipelines.
  • Ingested continuous data from various microservices using Confluent Kafka Connect.
  • Developed generic Spark UDFs to perform record-level business logic operations.
  • Developed an event-driven pipeline using SNS, SQS, AWS Lambda, Glue and AWS Step Functions.
  • Created views that mask PHI/PII columns so that unauthorized teams cannot see the data in those columns.
  • Developed Python code defining the tasks, dependencies, SLA watchers and time sensors for each job to manage and automate workflows in Airflow.
  • Deployed Kafka connectors with Docker and Kubernetes, using ECR for the container images.
  • Deployed data pipelines through a CI/CD process using Jenkins and Ansible.
Senior Data Engineer, 02/2019 to 06/2020
Splunk, Acworth, GA
  • Worked with the APLA Enterprise Data Analytics team on data ingestion, transformation and consumption views for the Nike Direct forecast across APLA (Asia Pacific Latin America), built on AWS Cloud using EMR, S3, Lambda, PySpark, Hive, Airflow and Snowflake.
  • Integrated various source systems and analyzed data to support pre-channel and post-channel sales transformations.
  • Created data pipelines to ingest, aggregate and load consumer response data from an AWS S3 bucket into Hive external tables, and published the data to Snowflake as data sources for Tableau dashboards.
  • Consumed near-real-time data using Spark Streaming, with Kafka as the data pipeline system.
  • Handled JSON datasets and wrote custom Python functions to parse JSON data in Spark.
  • Built an AWS Lambda function with Boto3 to deregister unused AMIs in all application regions, reducing EC2 costs.
  • Created, debugged, scheduled and monitored Airflow jobs for ETL batch processing that loads data into Snowflake for analytics.
  • Scheduled all jobs with Airflow scripts written in Python and created custom Operators.
Data Engineer, 01/2018 to 01/2019
Verizon, Grandview, MO
  • Worked with the data stewardship and analytics teams to migrate existing on-premises data pipelines to GCP using cloud-native tools such as GCS buckets, Google Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, bq command-line utilities, Cloud Composer, PySpark, Python, Dataproc and BigQuery.
  • Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, efficient joins and transformations.
  • Developed a generic Python script to validate data between source files and BigQuery tables, and maintained the archival process in a GCS bucket.
  • Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Created BigQuery authorized views to enforce row-level security and to expose data to other teams.
  • Used Cloud Composer to run end-to-end data pipelines and to schedule jobs and their dependencies.
Data Engineer, 02/2016 to 12/2017
Principal Financial Group, Madison, WI
  • Worked with the Wholesale Loss Forecasting team, part of the Global Risk Analytics platform of BOA, to support CCAR (Comprehensive Capital Analysis and Review) cycles, which are part of federal regulation.
  • Developed PySpark scripts to migrate legacy systems from Teradata and Oracle into a Hadoop data lake, reducing organizational costs by 30%.
  • Loaded data from sources such as Teradata and Oracle into HDFS using Sqoop and then into partitioned Hive tables.
  • Ported existing HiveQL logic to Spark SQL applications for data transformation and aggregation, writing the results to Hive tables.
  • Implemented dimensional data modeling to deliver multi-dimensional star and snowflake schemas in the data lake, normalizing dimension tables where appropriate.
  • Used Impala extensively to give end users low-latency queries on Hive tables.
  • Developed application-specific common utilities in Python to perform Data Quality (DQ) checks on data before it is used by subsequent processes or published to downstream users.
  • Developed Oozie workflows and sub-workflows to orchestrate Sqoop scripts, Hive queries and Spark scripts that automate the ETL process.
Education
Master of Science: Computer Science, 12/2015, Monmouth University - West Long Branch, NJ
Bachelor of Technology: Computer Science and Engineering, 05/2013, Jawaharlal Nehru Technological University - Hyderabad, India