Big Data Developer Resume Example with 7+ Years of Experience

Jessica Claire
Montgomery Street, San Francisco, CA 94105 609 Johnson Ave., 49204, Tulsa, OK
Home: (555) 432-1000 - Cell: - - -
Professional Summary
  • 4+ years of total IT experience, including 3+ years of hands-on experience in Big Data technologies.
  • Driven Big Data Developer with experience performing complex integrations while developing code. Enthusiastic technical professional with a background supporting administrators during configuration and deployment. Strong history of accuracy in deadline-driven environments.
  • Solid experience in application development and maintenance across the SDLC using technologies such as Java/J2EE, Scala, JavaScript, data structures and UNIX shell scripting.
  • Expertise in all components of the Hadoop ecosystem: Hive, Hue, Pig, Sqoop, Impala, Flume, Zookeeper, Oozie, Airflow and Apache Spark.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
  • Sound experience across the Hadoop ecosystem in ingestion, storage, querying, processing and analysis of big data.
  • Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and MapReduce programming paradigm.
  • Strong understanding of the AWS product and service suite, primarily Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES and SQS, along with their applicable use cases, best practices, implementation and support considerations.
  • Experience in importing and exporting data from different RDBMS like MySQL, Oracle and SQL Server into HDFS and Hive using Sqoop.
  • Expertise in designing clustered tables on Snowflake database to improve query performance for consumers.
  • In-depth understanding of Snowflake as a SaaS cloud data platform.
  • In-depth knowledge of Snowflake databases, schemas, table structures, credit usage, multi-cluster warehouses, data sharing, stored procedures and UDFs.
  • Experience in using Snowflake Clone and Time Travel.
  • Star and Snowflake schema modeling; proficiency in data-warehousing techniques for data cleaning, Slowly Changing Dimensions (SCD), surrogate key assignment and Change Data Capture (CDC).
  • Good understanding of NoSQL databases and hands on experience in writing applications on NoSQL databases like HBase.
  • Experience in developing custom MapReduce programs using Apache Hadoop to perform Data Transformation and analysis as per requirement.
  • Knowledge of NoSQL databases such as HBase, and experience creating Phoenix mapping tables to query HBase using SQL.
  • Hands-on experience with Scala language features: language fundamentals, classes, objects, traits, collections, case classes, higher-order functions, pattern matching, extractors, etc.
  • Experience creating internal and external tables and implementing performance-improvement techniques such as partitioned and bucketed tables in Hive.
  • Experience analyzing data using HBase and custom MapReduce programs in Java.
  • Experience creating Pig and Hive UDFs in Java to analyze data sets.
  • Experience with Spark Streaming to ingest real-time data from multiple data sources into HDFS.
  • Experience in Design & Development, tuning and maintenance of NoSQL databases such as MongoDB, HBase, Cassandra and its Integration with Hive.
  • Worked on reading multiple data formats on HDFS using Spark APIs.
  • Programming Languages: SQL, C, C++, Java, Core Java, Java 8, J2EE, Python, Scala, Pig Latin, HiveQL and Unix shell scripting
  • Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Hue, Sqoop, Storm, Kafka, Oozie, Airflow, Spark SQL, Spark Streaming, PySpark, Flume, Zookeeper, Cassandra, Spark, Cloudera, Delta Lake, Jupyter Notebook, Zeppelin
  • Cloud Technologies: AWS, EC2, S3, VPC, Lambda, Redshift, EMR, Snowflake, dbt, Databricks
  • Databases: Oracle, MySQL, SQL Server, DB2 for Mainframes; familiar with NoSQL (HBase, Cassandra, MongoDB)
  • Scripting & Query Languages: UNIX shell scripting, SQL and PL/SQL
  • Web Technologies: XML, CSS, HTML, XHTML, JavaScript, AJAX, JDBC
  • Hadoop Paradigms: MapReduce, YARN, in-memory computing, high availability, real-time streaming
  • Operating Systems: Windows, UNIX, Linux distributions (CentOS, Ubuntu), macOS
  • Other Tools: Eclipse, Tableau, JUnit, QTP, JIRA, QC (Quality Center)


  • Edureka trained and certified Apache Spark and Scala developer.
  • Successfully earned SnowPro Core certification.
  • Successfully earned certification in Apache Spark and Apache Hadoop from IBM Big Data University.
  • Successfully earned certification in Big Data Analysis with Apache Spark and Distributed Machine Learning with Apache Spark from edX BerkeleyX, sponsored by Databricks.
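To illustrate the MapReduce programming paradigm cited throughout this summary, here is a minimal, self-contained Python sketch (pure Python standing in for Hadoop's Java API; the map, shuffle and reduce phases are simulated in memory, and the word-count example is illustrative):

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word, as Hadoop does per input split."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Shuffle/sort: group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

In a real Hadoop job the shuffle is handled by the framework between distributed mappers and reducers; this sketch only shows the data flow.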
Work History
12/2018 to Current
Big Data Developer, Cognizant Technology Solutions, Coral Gables, FL
  • Involved in analysis, design, system architecture, process interface design and design documentation.
  • Responsible for developing prototypes of selected solutions and implementing complex big data projects with focus on collecting, parsing, managing, analyzing and visualizing large sets of data using multiple platforms.
  • Understand how to apply new technologies to solve big data problems and to develop innovative big data solutions.
  • Developed various data loading strategies and performed various transformations for analyzing datasets by using Cloudera Distribution for Hadoop ecosystem.
  • Worked extensively on designing and developing multiple Spark Scala ingestion pipelines, both real-time and batch.
  • Responsible for handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
  • Worked on importing metadata into Hive/Impala and migrated existing legacy tables and applications to Hadoop using Spark, Hive and Impala.
  • Worked on POCs to perform change data capture (CDC) and Slowly Changing Dimensions (SCD) in HDFS using Spark and Delta Lake, an open-source storage layer that brings ACID transactions to Apache Spark.
  • Extensively worked on a POC to ingest data from an S3 bucket into Snowflake using external stages.
  • Developed generic stored procedures using SnowSQL and JavaScript to transform and ingest transactional table data into Snowflake relational tables from external S3 stages.
  • Worked on a prototype to create an external function in Snowflake to call a remote service implemented in AWS Lambda.
  • Developed multiple POCs using Spark, deployed them on a YARN cluster, and compared the performance of Spark with Hive and Impala.
  • Responsible for Performance tuning Spark Scala Batch ETL jobs by changing configuration properties and using broadcast variables.
  • Worked on batch processing for history loads and real-time data processing for consuming live data with Spark Streaming, using the Lambda architecture.
  • Developed Streaming pipeline to consume data from Kafka and ingest into HDFS in near real time.
  • Tuned Spark Streaming applications to set the right batch interval, the correct level of parallelism and appropriate memory usage.
  • Implemented Spark SQL optimized joins to gather data from different sources and run ad-hoc queries on top of them.
  • Wrote Spark Scala Generic UDFs to perform business logic operations at record level.
  • Developed Spark code in Scala and Spark SQL for faster testing and processing of data; loaded data into Spark RDDs and performed in-memory computation to generate output with lower memory usage.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Worked on parsing and converting JSON/XML-formatted files to tabular format in Hive/Impala using Spark Scala, Spark SQL and the DataFrame APIs.
  • Worked with various file formats (Text, JSON, XML, Avro, Parquet) and compression codecs (Snappy, bz2, gzip).
  • Worked on performing transformations & actions on RDDs and Spark Streaming data.
  • Involved in converting Hive QL queries into Spark transformations using Spark RDDs, Spark SQL and Scala.
  • Installed the open-source Zeppelin Notebook for using the Spark Scala, PySpark, Spark SQL and SparkR APIs interactively via a web interface.
  • Worked on integrating Zeppelin with LDAP for multiuser support in all environments.
  • Responsible for estimating Zeppelin resource usage and configuring interpreters for optimal use.
  • Developed workflow in Oozie to automate tasks of loading data into HDFS and pre-processing data and used Zookeeper to coordinate clusters.
  • Used Zookeeper for various types of centralized configurations.
  • Met with key stakeholders to discuss and understand all major aspects of project, including scope, tasks required and deadlines.
  • Supervised Big Data projects and offered assistance and guidance to junior developers.
  • Multi-tasked to keep all assigned projects running effectively and efficiently.
  • Consistently achieved challenging production goals.

Environment: Hadoop, Cloudera distribution, Scala, Python, Spark Core, Spark SQL, Spark Streaming, Hive, HBase, Pig, Sqoop, Kafka, Zookeeper, Java 8, UNIX shell scripting, Zeppelin Notebook, Delta Lake, AWS S3, AWS Lambda, Snowflake, SnowSQL.
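The SCD/CDC proof-of-concept work described above follows a standard Slowly Changing Dimension Type 2 pattern: close out a changed row and append a new current version. A minimal sketch in pure Python on in-memory dictionaries (in the actual work this would be Spark with Delta Lake; the column names here are illustrative assumptions):

```python
def scd2_merge(current_rows, changes, as_of):
    """Apply SCD Type 2: expire changed rows and append new current versions.

    current_rows: list of dicts with keys key, value, start_date, end_date, is_current
    changes:      list of dicts with keys key, value (the CDC feed)
    as_of:        effective date of this batch (a string, for simplicity)
    """
    open_by_key = {r["key"]: r for r in current_rows if r["is_current"]}
    result = list(current_rows)
    for change in changes:
        existing = open_by_key.get(change["key"])
        if existing is not None and existing["value"] == change["value"]:
            continue  # no attribute change: keep the current version open
        if existing is not None:
            existing["end_date"] = as_of      # expire the old version
            existing["is_current"] = False
        result.append({"key": change["key"], "value": change["value"],
                       "start_date": as_of, "end_date": None, "is_current": True})
    return result
```

The same logic maps onto a Delta Lake `MERGE` with a matched-update branch (expire) and an insert branch (new version).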

07/2016 to 12/2018
Hadoop/Spark Developer, Internet Brands, Inc., Auburn Hills, MI
  • Involved in the complete big data flow of the application, from ingesting data from upstream into HDFS to processing and analyzing the data in HDFS.
  • Hands on experience in designing, developing, and maintaining software solutions in Hadoop cluster.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and Spark on YARN.
  • Worked on POCs with Apache Spark using Scala to introduce Spark into the project.
  • Built scalable distributed data solutions using the Hadoop Cloudera Distribution.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs) written in Python.
  • Developed Hadoop Streaming jobs to process terabytes of JSON/XML-format data.
  • Developed complex MapReduce jobs in Java, alongside Hive and Pig, to perform ETL, cleaning and scrubbing tasks.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
  • Developed code in Hadoop technologies and performed unit testing.
  • Involved in creating Hive tables, loading structured data and writing hive queries which will run internally in MapReduce way.
  • Designed the ETL run performance-tracking sheet for different phases of the project and shared it with the production team.
  • Involved in converting Hive/SQL queries into Spark Transformations using Spark RDDs and Scala.
  • Involved in using SQOOP for importing and exporting data between RDBMS and HDFS.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting; applied Hive optimization techniques during joins and followed best practices when writing HiveQL scripts.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables.
  • Involved in loading data from Linux file system to HDFS.
  • Created PIG scripts to load, transform and store the data from various sources into HIVE metastore.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Experienced in managing and reviewing Hadoop log files.
  • Identified and designed the most efficient and cost-effective solutions through research and evaluation of alternatives.
  • Demonstrated Hadoop practices and broad knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production.
  • Ingested semi structured data using Flume and transformed it using Pig.
  • Inspected and analyzed existing Hadoop environments for proposed product launches, producing cost/benefit analyses for use of included legacy assets.
  • Developed highly maintainable Hadoop code and followed all best practices regarding coding.

Environment: Hadoop, MapReduce, Java, Scala, Spark, Hive, Pig, Spark SQL, Spark Streaming, Sqoop, Python, Kafka, Cloudera, DB2, Scala IDE (Eclipse), Maven, HDFS.
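The Hadoop Streaming jobs mentioned above follow a simple contract: the mapper reads raw lines from stdin and emits tab-separated key/value pairs, and the reducer receives those pairs sorted by key on its stdin. A hedged word-count sketch of that contract in Python (function names are illustrative; in a real job each function would be the body of a standalone script passed to `-mapper`/`-reducer`):

```python
from itertools import groupby

def streaming_mapper(lines):
    """Hadoop Streaming mapper: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def streaming_reducer(lines):
    """Hadoop Streaming reducer: input lines arrive sorted by key,
    so consecutive lines with the same key form one group."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Equivalent shell pipeline to what the framework runs:
#   cat input.txt | mapper | sort | reducer
```

The framework's shuffle step is what guarantees the sorted-by-key input the reducer relies on.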

02/2016 to 10/2016
Java Developer Intern, IT Keysource Inc., City, STATE
  • Involved in various stages of Software Development Life Cycle (SDLC) deliverables using the Agile software development methodology.
  • Attended daily Scrum meetings, sprint grooming/reviews and demos with management and other teams.
  • Implemented the application using the Spring MVC framework and handled security using Spring Security.
  • Involved in batch processing using Spring Batch framework to extract data from database and load into corresponding Loan App tables.
  • Developed controller classes using the Spring MVC and Spring AOP frameworks.
  • Designed and Developed End to End customer self service module using annotation-based Spring MVC, Hibernate, Java Beans and jQuery.
  • Developed the user interface using JSP, jQuery, HTML5, CSS3, Node.js, Bootstrap and AngularJS.
  • Implemented functionality such as searching, filtering, sorting, categorization and validation using the Angular framework.
  • Used Angular directives, working on attribute level, element level and class level directives.
  • Implemented bean classes and configured them in the Spring configuration file for dependency injection.
  • Implemented persistence layer using Hibernate that use the POJOs to represent the persistence database.
  • Created mappings among the relations and wrote named HQL queries using Hibernate.
  • Designed a common framework for REST API consumption using Spring RestTemplate.
  • Used Design Patterns like Facade, Data Transfer Object (DTO), MVC, Singleton and Data Access Object (DAO).
  • Wrote SQL and PL/SQL queries, including stored procedures, triggers, indexes and views.
  • Used Log4j and JUnit for logging and testing.
  • Documented all the SQL queries for future testing purpose.
  • Prepared test case scenarios and internal documentation for validation and reporting.
  • Coordinated with the QA team and resolved QA defects.
  • Wrote services to store and retrieve user data from MongoDB.
  • Worked with the WebSphere application server, which handled various client requests.
  • Deployed fixes and updates using the IBM WebSphere application server.
  • Developed automated unit tests using the JUnit and Mockito frameworks.
  • Used Git to track and maintain different versions of the project.
  • Reviewed code and debugged errors to improve performance.
  • Reworked applications to meet changing market trends and individual customer demands.
  • Researched new technologies, software packages and hardware products for use in website projects.
  • Worked with business users and operations teams to understand business needs and address production questions.
  • Wrote, modified and maintained software documentation and specifications.

Environment: Java, Spring MVC, Spring Batch, Hibernate, Web Services, HTML5, CSS3, JavaScript, Bootstrap, Maven, WebSphere, Eclipse, JUnit, Mockito, jQuery, Log4j, MongoDB, Windows, Git.

01/2014 to 05/2016
Academic Coursework Project, Florida Institute of Technology, City, STATE


Securing Files in the Cloud: A desktop application addressing cloud storage security. It protects both personal files and shared files uploaded to cloud environments such as OneDrive and Dropbox. The application allows only authorized users to log in and provides an interface through which a user can save files to the cloud. Before upload, each file is encrypted with a private key, so the copy stored in the cloud is protected from hackers and other threats. The user can later download the file and decrypt it with the same private key supplied at upload time.


  • Programming Language: Java
  • Operating System: Windows
  • IDE: NetBeans, Eclipse
  • Cloud: Dropbox and Google Drive
  • Database: MySQL 5.7

Expected in 05/2016
Master of Science: Computer Information Systems
Florida Institute of Technology-Melbourne - Melbourne, FL
Expected in 05/2014
Bachelor of Science: Computer Science And Engineering
Jawaharlal Nehru Technological University - Anantapur, AP, India
