Jessica Claire
  • Montgomery Street, San Francisco, CA 94105
  • H: (555) 432-1000
Professional Summary

Data Engineer with over 6 years of successful experience in Big Data and related technologies. Recognized consistently for performance excellence and contributions to Big Data projects. Worked extensively with Hadoop technologies including HDFS, Spark with Scala, Pig, Hive, HBase, Sqoop, Flume, Oozie, Kafka, and Ranger; well versed in SQL, Java, and Python.

Experienced in training large groups (40-50 people) on distributed systems and Hadoop technologies, including cloud-based services. Excellent reputation for resolving problems, improving customer satisfaction, and driving business requirements. Consistently delivered projects using teams with little or no prior Big Data knowledge. Received the Project of the Year award for three consecutive years for the projects I worked on.

  • Built a tool to automate the Hadoop installation process, saving roughly 3,000 man-hours per year
  • Won the Star of the Month award, and the Project of the Year award for three consecutive years
  • As an active member of the Technology Research and Big Data Center of Excellence teams, developed several POCs that were converted into full-time projects in a short time
  • Python and Bash proficiency
  • Apache Hadoop, Hive, Pig, Spark, Spark Streaming, Kafka
  • All major Hadoop distributions (HDP, MapR, Cloudera), including EMR and HDInsight
  • DataStage, Talend, SSIS
  • Programming languages: Java, C, C#, SQL
  • AWS, Azure services
  • Snowflake
  • Databases: Greenplum, Hive, HBase, MS SQL Server, MongoDB
Work History
Senior Data Engineer, 07/2019 - Current
Thoughtworks, Norfolk, VA
  • Analyzed all ETL processes and documented the upstream and downstream dependencies for each process reading from or writing to Sybase
  • Designed data flows and the standard data quality checks needed to run each process without human intervention
  • Developed and managed a delta process that copies data nightly from Sybase to Greenplum using CDC, reducing the load on the application backend (Sybase)
  • Developed, implemented, supported, and maintained data analytics protocols, standards, and documentation
  • Contributed to internal initiatives for process improvement, efficiency, and innovation
  • Managed all client communication and business calls to prioritize process migration, and designed processes to meet business requirements
  • Managed a team of 20 offshore developers, handling handover and takeover of work items
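The nightly delta load described above can be sketched in a much simplified pure-Python form. This is an illustrative sketch only: the production process used CDC between Sybase and Greenplum, and the change-record shape here (operation, primary key, row) is hypothetical.

```python
# Simplified CDC apply step: replay captured changes (insert/update/delete)
# against a target table keyed by primary key. The real pipeline copied
# deltas nightly from Sybase to Greenplum; this record shape is illustrative.

def apply_changes(target: dict, changes: list[dict]) -> dict:
    """Apply CDC change records to target (pk -> row) and return it."""
    for ch in changes:
        op, pk = ch["op"], ch["pk"]
        if op in ("insert", "update"):
            target[pk] = ch["row"]  # upsert the captured row image
        elif op == "delete":
            target.pop(pk, None)    # tolerate deletes of unseen keys
    return target
```

Replaying only captured changes, rather than re-copying full tables, is what keeps the nightly window (and the load on the source system) small.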
Senior Hadoop Developer, 08/2018 - 07/2019
Wells Fargo, Golden, CO
  • Collaborated with cross-functional development team members to analyze potential system solutions based on evolving client requirements.
  • Gathered requirements and designed solutions to migrate GP SQL scripts to Spark using Scala.
  • Implemented data quality checks in Spark: triggered alerts for any invalid records in source files and halted the process before proceeding further, per client requirements
  • Optimized Spark jobs and configurations as per changing data size for SLA adherence.
  • Validated large data sets in Hadoop against the existing GP system by defining unit test cases using the gpfdist process, ensuring all functionality was covered
  • Developed a Python application that parses the bxml job and renders the task list and task flow as a visual diagram, giving the team a holistic view of the entire flow
  • Migrated all Spark and Greenplum jobs to Snowflake as part of a cost-cutting plan by client management
  • Migrated Pig jobs and Pig UDFs to Snowflake using SnowSQL and JavaScript functions
  • Used Airflow to orchestrate the jobs
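The data quality gate described in the first bullet above can be illustrated with a simplified pure-Python sketch. The actual implementation used Spark with Scala; the record layout and validation rules here are hypothetical stand-ins.

```python
# Simplified data quality gate: validate source records, collect alerts for
# invalid rows, and signal the caller to stop the load before any downstream
# processing. (Production version ran in Spark/Scala; fields are hypothetical.)

def validate(record: dict) -> list[str]:
    """Return the list of rule violations for one record."""
    errors = []
    if not record.get("account_id"):
        errors.append("missing account_id")
    if record.get("amount") is not None and record["amount"] < 0:
        errors.append("negative amount")
    return errors

def run_quality_gate(records: list[dict]) -> tuple[bool, list[str]]:
    """Check every record; return (ok, alerts). ok=False halts the load."""
    alerts = []
    for i, rec in enumerate(records):
        for err in validate(rec):
            alerts.append(f"record {i}: {err}")
    return (len(alerts) == 0, alerts)
```

When `ok` is False the job raises the alerts and exits instead of loading partial or corrupt data, which is the behavior the client required.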
Big Data Developer, 12/2016 - 07/2018
Cognizant Technology Solutions, Hillsboro, OR
  • Led all development initiatives as Subject Matter Expert and primary point of contact for project management staff
  • Gathered requirements and designed solutions to migrate Hive scripts to Spark using Scala
  • Migrated Hive tables from text to Avro storage to handle changing requirements and eliminate downstream dependencies
  • Designed and developed a fault-tolerant Python application to trigger jobs based on data from SQL Server, using Python and Bash
  • Wrote SQL queries and developed SSIS packages to trigger the jobs
  • Optimized GP queries for exporting data from Hadoop to GP using gpfdist, which gave a 300% performance improvement
  • Wrote Bash scripts to automate tasks such as application resets and data generation for unit testing and data quality testing
  • Developed Spark applications in Java to replicate logic implemented in the RDBMS, using RDDs, DataFrame transformations, and Spark SQL
  • Implemented resource queues in YARN to manage resources for a client
  • Developed Bash scripts to move files to an HDFS location as soon as they arrive in the landing location, and developed an Oozie workflow using Oozie datasets to trigger the data load once the data is available for a client
  • Mentored a team of six .NET developers to deliver tasks on Hadoop technologies
  • Developed Spark jobs in Java to distribute Hive data to downstream applications
Hadoop Administrator and Developer, 01/2016 - 11/2016
89 Degrees, City, STATE
  • Implemented, developed, and tested installation and updates of file servers, print servers, and application servers across all departments
  • Understood business requirements and SLAs, and proposed solutions with estimates for work items
  • Installed and configured Hadoop using the Hortonworks Data Platform, including Kerberos security, and enabled Ranger auditing for Hive and HDFS
  • Defined ACLs with Ranger for enhanced security of data in Hive and HDFS
  • Developed shell and Python scripts to perform data quality checks, and added a feature that captures the processed and rejected record counts for every data load and automatically emails that information to the client's database team
  • Designed the data flow and data warehouse architecture for processing raw data using Hive, and developed Hive scripts to process the data
  • Implemented high availability for Hadoop services such as HDFS, YARN, Hive Metastore, HiveServer2, Ranger, and Oozie, as well as for the PostgreSQL database that stores metadata for Hive, Oozie, and Ranger
  • Built a tool to automate the Hadoop installation process using Python and Bash scripts
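The record-count reporting described above can be sketched as a small pure-Python routine. This is an assumption-laden illustration: the real scripts ran in shell and Python against actual load output, and the summary fields and email format here are hypothetical.

```python
# Illustrative post-load summary: count processed vs. rejected records for a
# data load and format the plain-text email body sent to the database team.
# (Field names and message format are hypothetical.)

def summarize_load(results: list[bool]) -> dict:
    """results[i] is True if record i loaded, False if it was rejected."""
    processed = sum(1 for ok in results if ok)
    return {
        "processed": processed,
        "rejected": len(results) - processed,
        "total": len(results),
    }

def email_body(table: str, summary: dict) -> str:
    """Render the summary as the plain-text body of the notification email."""
    return (
        f"Data load summary for {table}\n"
        f"  processed: {summary['processed']}\n"
        f"  rejected:  {summary['rejected']}\n"
        f"  total:     {summary['total']}\n"
    )
```

In practice the body would be handed to the mail step (e.g. `mailx` from a shell wrapper or `smtplib` from Python) so the database team sees the counts after every load without anyone querying for them.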
Education
Bachelor of Engineering (B.E Hons): Electrical and Electronics, Expected in 05/2015
