Data Engineer Resume Example with 4+ Years of Experience

Jessica Claire
100 Montgomery St. 10th Floor | (555) 432-1000

Experienced professional with a strong background in designing, developing, testing, implementing, and maintaining data warehousing systems and business intelligence applications across various platforms and industries. Skilled in AWS services including S3, Redshift, Redshift Spectrum, EMR, Glue, Athena, Amazon Managed Streaming for Apache Kafka (MSK), Data Pipeline, Step Functions, ECS, Network Load Balancer, API Gateway, CloudWatch, SNS, CloudFormation, and IAM roles and policies. Expertise in building ETL and ELT data pipelines using Databricks and AWS Glue. Hands-on experience with real-time data streaming solutions using Apache Spark, Spark SQL and DataFrames, Kafka, and Spark Streaming. Strong knowledge of the Spark ecosystem, including the Spark Core, Spark SQL, and Spark Streaming libraries. Familiar with NoSQL databases such as HBase, MongoDB, and Cassandra. Skilled in Databricks, including DBFS (dbutils.fs), notebooks, widgets, mounts, and secret scopes. Proficient with Delta tables and Delta Lake. Strong scripting skills in Python and Linux/UNIX shell. Extensive experience with data warehouses such as Teradata, Oracle, and SAP HANA. Proficient in importing and exporting data between HDFS and relational systems using tools such as Sqoop. Experienced in analyzing and transforming analytics data to meet requirements. Solid understanding of database concepts and expertise in creating and modifying database objects using SQL. Proficient in data migration, transformation, and integration. Knowledge of data modeling concepts such as star schema modeling, snowflake schema modeling, and fact and dimension tables. Follows best practices for data warehouse, data lake, and lakehouse architectures. Familiar with metadata and repositories within a disciplined lifecycle methodology.
Adaptable team player able to tackle big data challenges in both on-premises and cloud environments. Experienced with Agile (Scrum) and Waterfall methodologies, and skilled in using Jira for sprint management and issue tracking.

  • Languages: Java, Python, C++, Linux/UNIX shell
  • Version Control: Git, GitHub
  • Databases: MySQL, MongoDB, Cassandra (NoSQL)
  • Big Data Stack: Hadoop, Spark (Core & SQL), MapReduce, HDFS, Hive, Sqoop, Kafka, HBase
  • Cloud: AWS (S3, EC2, Redshift, Lambda, Glue, Kinesis), Snowflake
  • Python Modules: NumPy, Pandas, TensorFlow
  • IDE Tools: Eclipse, Jupyter, Anaconda, PyCharm, VS Code
08/2022 to 05/2023 Data Engineer, Splunk | Olathe, KS
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into Amazon Redshift
  • Built ETL pipelines that ingest and transform large volumes of data from multiple source systems
  • Created pipelines, data flows, and complex data transformations and manipulations using AWS Glue and PySpark on Databricks
  • Performed extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark
  • Implemented a serverless architecture using AWS Lambda with Amazon S3 and Amazon DynamoDB
  • Scheduled clusters with Amazon CloudWatch and created Lambda functions to generate operational alerts for various workflows
  • Worked with AWS EC2, IAM, S3, Lambda, EBS, Elastic Load Balancing (ELB), and Auto Scaling groups
  • Designed, developed, and deployed data pipelines for moving data across various systems
  • Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming for streaming analytics in Databricks
  • Designed and implemented high-performance data ingestion pipelines from multiple sources using Apache Spark and Databricks Delta Lake
  • Worked with developers to build Tableau dashboards using calculations, parameters, calculated fields, groups, sets, and hierarchies
  • Analyzed, visualized, and transformed analytics data to meet business requirements
  • Created and modified database objects such as tables, indexes, views, triggers, synonyms, sequences, and materialized views using SQL
  • Analyzed existing SQL scripts and reimplemented them in PySpark SQL for faster performance
  • Developed Spark applications in Python (PySpark) in a distributed environment
  • Applied data warehousing best practices for metadata and repositories within a disciplined lifecycle methodology
06/2019 to 06/2021 AWS Data Engineer, Accenture Contractor Jobs | Havre De Grace, MD
  • Created data mappings, technical designs, and loading strategies for ETL jobs that load new and existing tables
  • Worked with Kafka to build a robust, fault-tolerant data ingestion pipeline for transporting streaming data into HDFS, and implemented custom Kafka encoders for a custom input format to load data into Kafka partitions
  • Set up a Kafka broker for Spark Structured Streaming to ingest structured data against a defined schema
  • Extracted real-time guest data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and writing the results to Cassandra with the DataStax Spark-Cassandra Connector
  • Developed an Elasticsearch connector using the Kafka Connect API, with Kafka as the source and Elasticsearch as the sink
  • Used the Spark DataFrame API to perform analytics on Hive data and used DataFrame operations to perform the required data validations
  • Developed UDFs, DataFrames, and SQL queries in Spark SQL
  • Created new and modified existing data ingestion pipelines using Kafka and Sqoop to ingest database tables and streaming data into HDFS for analysis
  • Finalized naming standards for data elements and ETL jobs and created a data dictionary for metadata management
  • Developed Python ETL workflows to process data ingested with Flume into HDFS and HBase
  • Analyzed large, critical datasets using HDFS, HBase, Hive, HQL, Pig, Sqoop, and ZooKeeper
  • Developed multiple POCs using Spark and Scala, deployed them on a YARN cluster, and compared Spark performance with Hive and SQL
  • Responsible for building a scalable distributed data solution on a Hadoop cluster using the Hortonworks distribution
  • Used Amazon Elastic Compute Cloud (EC2) for computational tasks and Amazon Simple Storage Service (S3) for storage
  • Used AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS
  • Maintained AWS Data Pipeline workflows to process and move data between Amazon S3, Amazon EMR, and Amazon RDS
  • Performed data cleaning, preprocessing, and modeling using Spark and Python
  • Implemented secure, real-time, data-driven REST APIs for data consumption using AWS (API Gateway, Route 53, Certificate Manager, CloudWatch, Kinesis), Swagger, Okta, and Snowflake
11/2017 to 05/2019 ETL Developer, Children's Place | Bay Shore, NY
  • Participated in business meetings to gather requirements, and contributed to business analysis, design, review, development, and testing
  • Created ETL mappings for an operational dashboard covering various KPIs and business metrics, with drill-down into detail reports for understanding the data at a granular level
  • Involved in the complete SDLC, including requirements specification, analysis, design, development, and testing of BI and data warehouse applications
  • Prepared functional specifications, technical specifications, test plans, and other documentation required by the SDLC
  • Created jobs and scheduled packages using SQL Server Management Studio for the daily load
  • Tuned and optimized complex SQL queries in Teradata
  • Collected statistics on fact tables
  • Developed Python scripts for ETL load jobs using pandas
  • Created tables and views in Teradata according to requirements
  • Performed bulk data loads from multiple data sources (Oracle 8i, legacy systems) into the Teradata RDBMS using BTEQ, MultiLoad, and FastLoad
  • Designed, created, and tuned physical database objects (tables, views, indexes, PPI, UPI, NUPI, and USI) to support normalized and dimensional models
  • Created a cleanup process to remove the intermediate temp files used prior to the loading process
  • Created several Tableau dashboard reports and heat map charts, and supported numerous dashboards, pie charts, and heat map charts built on a Teradata database
Expected 05/2023 M.S., Computer Science | Arkansas State University
Relevant coursework: Cloud Computing, Machine Learning, Time Series Analysis, Unix Programming, Structured Programming, Object-Oriented Programming, Analysis of Algorithms, Software Security
