Spark Developer resume example with 8+ years of experience

Montgomery Street, San Francisco, CA 94105
Home: (555) 432-1000
Overall 8 years of experience in the IT industry, including 5+ years as a Hadoop/Spark developer using big data technologies (the Hadoop and Spark ecosystems) and 2+ years in Java/J2EE technologies and SQL.
  • Hands-on experience installing, configuring and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka and Spark.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
  • Hands-on experience in the analysis, design, coding and testing phases of the Software Development Life Cycle (SDLC).
  • Experienced in writing MapReduce programs in Java to process large data sets using map and reduce tasks.
  • Experience using accumulator variables, broadcast variables and RDD caching for Spark Streaming.
  • Worked on HBase to perform real-time analytics; experienced in CQL to extract data from Cassandra tables.
  • Hands-on experience in the main big data application phases: data ingestion, data analytics and data visualization.
  • Expertise in using Spark SQL with various data sources such as JSON, Parquet and Hive.
  • Experience with Hadoop distributions including Cloudera (CDH3, CDH5.3), Hortonworks and Amazon AWS.
  • Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop.
  • Experience creating tables, partitioning, bucketing, loading and aggregating data using Hive.
  • Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
  • Experience working with Flume to load log data from multiple sources directly into HDFS.
  • Experience collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Experience analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Experience with NoSQL column-oriented databases such as HBase and Cassandra and their integration with a Hadoop cluster.
  • Experience manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
  • Involved in cluster coordination services through ZooKeeper.
  • Good level of experience in Core Java and J2EE technologies such as JDBC, Servlets and JSP.
  • Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, multithreading, and serialization/deserialization.
  • Experience designing user interfaces using HTML, CSS, JavaScript and JSP.
  • Developed web applications in the open-source Java framework Spring, utilizing the Spring MVC framework.
  • Languages: Java/J2EE, Python, SQL, HiveQL, NoSQL, Pig Latin
04/2016 to Present
Spark Developer, Infosys Ltd, Normal, IL
  • Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Designed and created Hive external tables using shared meta-store instead of derby with partitioning, dynamic partitioning and buckets.
  • Used Impala for querying HDFS data to achieve better performance.
  • Implemented Apache Pig scripts to load data into and out of Hive.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
  • Imported the data from different sources like AWS S3, LFS into Spark RDD.
  • Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy.
  • Expert knowledge on MongoDB NoSQL data modeling, tuning, disaster recovery and backup.
  • Used Spark SQL to load JSON data, create SchemaRDDs and load them into Hive tables, and handled structured data using Spark SQL.
  • Developed Spark/MapReduce jobs to parse JSON and XML data.
  • Involved in HBase setup and in storing data into HBase for later analysis.
  • Used Scala libraries to process XML data stored in HDFS, writing the processed output back to HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
  • Wrote various Pig scripts to clean up the ingested data and created partitions for the daily data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDD in Scala.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Used Avro, Parquet and ORC data formats to store data in HDFS.
  • Used Oozie workflows to coordinate Pig and Hive scripts.
  • Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, AWS, Python, Java, JSON, SQL scripting, Linux shell scripting, Avro, Parquet, Hortonworks.
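Several bullets in this role describe loading JSON into partitioned Hive tables via Spark SQL. The idea behind dynamic partitioning can be sketched in plain Python (no Spark dependency; the field names and default-partition label are illustrative, not from the original project): group parsed JSON records by a partition column, as Hive does before writing each group to its own partition directory.

```python
import json
from collections import defaultdict

def partition_records(json_lines, partition_key):
    """Group JSON records by a partition column, mimicking Hive
    dynamic partitioning: each key maps to one partition's rows."""
    partitions = defaultdict(list)
    for line in json_lines:
        record = json.loads(line)
        # Records missing the partition key fall into a default bucket,
        # analogous to Hive's __HIVE_DEFAULT_PARTITION__.
        key = record.get(partition_key, "__default__")
        partitions[key].append(record)
    return dict(partitions)

# Hypothetical clickstream records, one JSON document per line.
lines = [
    '{"user": "a", "dt": "2016-04-01", "clicks": 3}',
    '{"user": "b", "dt": "2016-04-01", "clicks": 1}',
    '{"user": "c", "dt": "2016-04-02", "clicks": 7}',
]
parts = partition_records(lines, "dt")
```

In Spark the same grouping happens implicitly when writing with a partition column; the sketch only shows the record-to-partition mapping.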
07/2014 to 02/2016
Hadoop/Spark Developer, Infosys Ltd, Northbrook, IL
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, Application Master, Node Manager, Resource Manager, NameNode, DataNode and MapReduce concepts.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Imported required tables from RDBMS to HDFS using Sqoop, and used Storm and Kafka to get real-time streaming of data into HBase.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Good experience with the NoSQL database HBase; created HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Wrote MapReduce code that takes log files as input, parses the logs and structures them in tabular format to facilitate effective querying of the log data.
  • Developed Java code to generate, compare and merge Avro schema files.
  • Developed complex MapReduce streaming jobs in Java, implemented using Hive and Pig.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used Hive optimization techniques during joins and followed best practices when writing Hive scripts in HiveQL.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive queries to extract the processed data.
  • Teamed up with Architects to design Spark model for the existing MapReduce model and Migrated MapReduce models to Spark Models using Scala.
  • Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Implemented Spark using Scala, utilizing the Spark Core, Spark Streaming and Spark SQL APIs for faster data processing than MapReduce in Java.
  • Used Spark SQL to load JSON data, create SchemaRDDs and load them into Hive tables, and handled structured data using Spark SQL.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created HBase tables to store data in varying formats coming from different legacy systems.
  • Used Hive for transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Good understanding of Cassandra architecture, replication strategy, gossip, snitch etc.
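One bullet above mentions Java code to generate, compare and merge Avro schema files. Since Avro schemas are plain JSON documents, the comparing-and-merging step can be sketched in a few lines of Python (this is an illustrative rendition, not the original Java; the "schema A wins on conflict" rule is an assumption):

```python
import json

def merge_avro_schemas(schema_a_json, schema_b_json):
    """Merge two Avro record schemas by field name: fields unique to
    either schema are kept; for shared names, schema A's definition wins."""
    a = json.loads(schema_a_json)
    b = json.loads(schema_b_json)
    merged_fields = {f["name"]: f for f in b["fields"]}
    merged_fields.update({f["name"]: f for f in a["fields"]})  # A wins on conflict
    merged = dict(a)
    merged["fields"] = list(merged_fields.values())
    return merged

# Hypothetical old and new versions of the same record schema.
old = '{"type": "record", "name": "Event", "fields": [{"name": "id", "type": "long"}]}'
new = ('{"type": "record", "name": "Event", "fields": ['
       '{"name": "id", "type": "long"}, {"name": "ts", "type": "string"}]}')
merged = merge_avro_schemas(old, new)
```

A production merge would also have to respect Avro's schema-resolution rules (defaults, type promotion); the sketch only covers the field-name union.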
08/2012 to 06/2014
Hadoop Developer, Alight, MI
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed MapReduce jobs using Java for data transformations.
  • Developed different components of the system, such as Hadoop processes involving MapReduce and Hive.
  • Migrated ETL processes from Oracle to Hive to test easier data manipulation.
  • Responsible for developing data pipeline using Sqoop, MR and Hive to extract the data from weblogs and store the results for downstream consumption.
  • Worked with HiveQL on big data of logs to perform a trend analysis of user behavior on various online modules.
  • Used Sqoop to export data back to relational databases for business reporting.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Involved in Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring.
  • Developed Hive queries and UDFs to analyze and transform the data in HDFS.
  • Designed and implemented static and dynamic partitioning and buckets in Hive.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Debugged and identified issues reported by QA with the Hadoop jobs by configuring them to run against the local file system.
  • Implemented Flume to import streaming log data and aggregate it into HDFS.
  • Experienced in running Hadoop streaming jobs to process terabytes of data.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Involved in evaluation and analysis of the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
  • Environment: Hadoop, Cloudera Manager, Linux (Red Hat, CentOS, Ubuntu), MapReduce, HBase, Sqoop, Pig, HDFS, Flume, Python.
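This role's bullets describe MapReduce jobs for log cleaning and trend analysis of user behavior. The map/shuffle/reduce pattern those jobs follow can be sketched in plain Python (the log-line layout "DATE LEVEL message" and the level counts are illustrative assumptions, not from the original jobs):

```python
from collections import defaultdict

def map_phase(log_lines):
    """Map: parse each log line into (level, 1) pairs, dropping
    malformed lines -- the cleaning step the MapReduce jobs performed."""
    for line in log_lines:
        parts = line.split(" ", 2)          # expected: "DATE LEVEL message"
        if len(parts) == 3:
            yield parts[1], 1

def reduce_phase(pairs):
    """Shuffle + reduce: sum the counts per log level."""
    counts = defaultdict(int)
    for level, one in pairs:
        counts[level] += one
    return dict(counts)

# Hypothetical raw log lines, including one malformed entry to drop.
logs = [
    "2013-05-01 ERROR disk full",
    "2013-05-01 INFO job started",
    "garbage-line",
    "2013-05-02 ERROR timeout",
]
level_counts = reduce_phase(map_phase(logs))
```

In real Hadoop the shuffle groups keys across machines; here the grouping and summing are collapsed into one local reduce for clarity.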
09/2010 to 08/2012
Java/SQL Developer, Mindteck India Ltd, India
  • Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
  • Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
  • Used WebSphere Application Server for deploying the application.
  • Used Rational Application Developer (RAD) for developing the application.
  • Used SVN as version control system for the source code.
  • Created XML-SOAP Web Services to provide partner systems required information.
  • Used SOAP for the data exchange between the backend and user interface.
  • Developed the user interface using JSP, HTML, CSS and JavaScript to simplify the complexities of the application.
  • Developed code using various patterns like Singleton, Front Controller, Adapter, DAO, MVC Template, Builder and Factory Patterns.
  • Developed stored procedures and Triggers in PL/SQL and Wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures and triggers.
  • Utilized Java and MySQL from day to day to debug and fix issues with client processes.
  • Used JIRA tracking tool to manage and track the issues reported by QA and prioritize and take action based on the severity.
  • Wrote SQL statements, stored procedures and functions that are called from Java.
  • Used SQL queries to perform backend testing on the database.
  • Extensively used Core Java such as Multithreading, Exceptions, and Collections.
  • Hands-on experience using JBoss for EJB and JTA, and for caching and clustering purposes.
  • Generated server side SQL scripts for data manipulation and validation and materialized views.
  • Created database access layer using JDBC and SQL stored procedures.
  • Worked on Java based connectivity of client requirement on JDBC connection.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • Worked on root cause analyses for all issues occurring in batch and provided permanent fixes.

Environment: Java, JSP, HTML, CSS, RAD, JDBC, JavaScript, JBoss, Struts, Servlets, WebSphere, Windows XP, Eclipse, Apache Tomcat, EJB, XML, SOA.
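One bullet in this role mentions a database access layer built with the DAO pattern over JDBC. The same separation of SQL from business code can be sketched in Python with the standard library's sqlite3 (the `users` table and method names are illustrative, not from the original system):

```python
import sqlite3

class UserDao:
    """Data-access-object sketch: all SQL for the 'users' table lives
    here, so callers never touch the connection directly -- the same
    separation the JDBC access layer provided."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

    def add(self, name):
        # Parameterized query, the same discipline PreparedStatement enforces.
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def find(self, user_id):
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
        return {"id": row[0], "name": row[1]} if row else None

# Usage: callers work with plain dicts and never see SQL.
dao = UserDao(sqlite3.connect(":memory:"))
uid = dao.add("alice")
user = dao.find(uid)
```

The payoff of the pattern is the same in either language: swapping the database or tuning a query touches only the DAO, not its callers.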

06/2009 to 08/2010
SQL Developer, NTT DATA, India
  • Involved in Business requirement gathering, Technical Design Documents, Business use cases and Data mapping.
  • Gathered requirements for the creation of Data Flow processes for the SSIS packages.
  • Transferred data using SQL Server Integration Services (SSIS) packages; extensively used the SSIS Import/Export Wizard for ETL operations.
  • Created new database objects such as tables, procedures, functions, triggers and views using T-SQL.
  • Addressed database performance issues by tuning SQL queries and stored procedures using SQL Profiler and execution plans in Management Studio.
  • Imported and exported data between servers using tools such as Data Transformation Services (DTS).
  • Involved in installation, configuration, development, maintenance, administration and upgrades.
  • Created database maintenance plans for SQL Server performance, covering database integrity checks, statistics updates and re-indexing.
  • Wrote stored procedures for reports that use multiple data sources.
  • Converted existing reports to SSRS without any change in report output.
  • Extensively used SQL Server's ETL tooling to populate data from various data sources and converted a SAS environment to SQL Server.
  • Built SSIS packages to load data into the OLAP environment and monitored the ETL package jobs.
  • Created automated processes for activities such as database backups and sequential SSIS package runs using Control-M.
  • Involved in Performance Tuning of Code using execution plan and SQL profiler.
  • Added Indexes to improve performance on tables.
  • Environment: MS SQL Server 2005/2008, Integration Services (SSIS), Reporting Services (SSRS), T-SQL, SQL Profiler, Data Transformation Services.
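This role's bullets cover creating tables, triggers and views with T-SQL. The shape of that DDL work can be sketched with Python's built-in sqlite3 so it runs anywhere (SQLite syntax stands in for T-SQL, and the `orders`/audit schema is illustrative, not from the original databases):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
CREATE TABLE order_audit (order_id INTEGER, logged_at TEXT DEFAULT CURRENT_TIMESTAMP);

-- Trigger: every insert into orders writes an audit row.
CREATE TRIGGER trg_order_insert AFTER INSERT ON orders
BEGIN
    INSERT INTO order_audit (order_id) VALUES (NEW.id);
END;

-- View: the aggregate a reporting query would read.
CREATE VIEW v_order_totals AS
SELECT COUNT(*) AS n_orders, SUM(amount) AS total FROM orders;
""")
conn.execute("INSERT INTO orders (amount) VALUES (10.0)")
conn.execute("INSERT INTO orders (amount) VALUES (5.5)")
totals = conn.execute("SELECT n_orders, total FROM v_order_totals").fetchone()
audit_rows = conn.execute("SELECT COUNT(*) FROM order_audit").fetchone()[0]
```

In T-SQL the trigger would use the `inserted` pseudo-table instead of `NEW`, but the division of labor (base table, audit trigger, reporting view) is the same.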