Detail- and goal-oriented data scientist and analytics professional who quickly gains a deep understanding of a company's mission and applies technical skills and strategic perspective to increase its profitability. Passionate about using data to drive decisions and communicating actionable insights. Deep knowledge of a variety of data analysis techniques, programming languages, and statistical tools. Strongly motivated professional with a proven ability to meet deadlines and strong interpersonal and communication skills. My professional philosophy is working and growing together. 8 years of experience in data analysis and statistical modeling with extensive use of SAS and R.
Proficient in R-Commander, R-Studio and Base SAS, SAS/Macro, SAS/Stat, SAS/Graph, SAS/SQL.
Expert in Hypothesis testing, ANOVA, and Linear and Logistic Regression Analysis.
Adept in Factor analysis, Decision Trees, clustering (K-means/Hierarchical, DBscan) techniques.
Accomplished in text analytics using the Naive Bayes classification method in R-Studio.
Experienced in causal and mechanistic analysis of a given scenario to recognize key performance indicators (KPIs).
Proficient in Survey Design, Questionnaire Design, Design of Experiments, and Conjoint Analysis.
BI and Visualization : Tableau Desktop 8.3, R, Python and SAS
Databases : MS SQL Server, MS Access, and MySQL
Operating Systems : Windows
Other Tools : MS Office (including Excel, Word, PPT, and Access)
Statistical Tools : R-Studio, NumPy (Python), Base SAS, SAS/Macros, SAS/Graph, SAS/Stat, SAS
Factor Analysis, Logistic Regression, Text Analytics (Naive Bayes), Decision Trees, and Cluster (K-means/ Hierarchical), Forecasting/Time Series Analysis (ARIMA model)
Capable of performing time series analysis using the ARIMA model.
Adept in both structured programming with SAS Data steps and dynamic programming using SAS macros.
Extensive experience using advanced statistical PROCs such as ANOVA, GLM, and UNIVARIATE.
Deep understanding of Statistical Modeling, Multivariate Analysis and Standard Procedures.
Familiar with model testing, problem analysis, model comparison and validation.
Familiar with a large number of SAS functions and SAS data step options.
Accustomed to using shell scripting to handle SAS files and manage SAS programs.
Strong understanding of data warehousing concepts such as fact tables, dimension tables, star and snowflake schemas, metadata, and data marts.
Familiar with collecting data from various databases and cleaning data for statistical analysis and modeling.
Proficient in using SQL to manipulate data: query expressions, join statements, subqueries, etc.
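As an illustration of the kind of SQL manipulation described above, here is a minimal sketch using Python's built-in sqlite3 module; the tables, columns, and data are hypothetical:

```python
import sqlite3

# In-memory database with two hypothetical tables: customers and loans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE loans (id INTEGER PRIMARY KEY, customer_id INTEGER,
                    amount REAL, defaulted INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cam');
INSERT INTO loans VALUES (1, 1, 5000, 0), (2, 1, 2000, 1), (3, 2, 7000, 0);
""")

# Join: total loan amount per customer, including customers with no loans.
rows = conn.execute("""
    SELECT c.name, COALESCE(SUM(l.amount), 0) AS total
    FROM customers c
    LEFT JOIN loans l ON l.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Subquery: customers who have at least one defaulted loan.
defaulters = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM loans WHERE defaulted = 1)
""").fetchall()

print(rows)        # [('Ann', 7000.0), ('Bob', 7000.0), ('Cam', 0)]
print(defaulters)  # [('Ann',)]
```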
Proficient in Python scripting.
Worked with statistical functions in NumPy, visualization using Matplotlib, and Pandas for organizing data.
Used scikit-learn packages in Python for predictions.
Proficient in boosting algorithms such as Gradient Boosting, AdaBoost (adaptive boosting), and XGBoost.
Used dimensionality reduction methods such as PCA (Principal Component Analysis) and Factor Analysis.
Implemented bootstrap-based ensemble methods such as Random Forests (classification), along with K-Means clustering, KNN (k-nearest neighbors), Naive Bayes, SVM (support vector machines), decision trees, and linear and logistic regression.
Considerable understanding of RDBMS (relational database management systems), OLAP, OLTP, and querying via T-SQL.
Knowledge of the basic constructs of HDFS (Hadoop Distributed File System) and MapReduce, and use of tools like DMX-h for operations on a Hadoop cluster.
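Bootstrapping, the resampling idea underlying Random Forests, can be sketched in a few lines of NumPy; the sample data here is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample, e.g. loan amounts in thousands (illustrative data).
sample = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 4.7])

# Bootstrap: resample with replacement many times, collect the statistic.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])

# 95% percentile confidence interval for the mean.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={sample.mean():.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Random Forests apply the same idea to rows of the training set, fitting one tree per bootstrap resample and averaging the predictions.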
Profound analytical and problem solving skills along with ability to understand current business processes and implement efficient solutions to problems.
Ability to present complex data and analytics to non-analytical audience.
Detail-oriented professional, ensuring the highest level of quality in reports and data analysis.
Advanced written and verbal communication skills.
Expert in innovation and formulation of new ideas and predictive models.
Proven ability to multi-task and engage with stakeholders at various levels to process data at large scale (Big Data) with enterprise systems.
Member of Analytics club.
Data Scientist, 10/2015 to Current, Wells Fargo, Oakland, CA
Project description: Identify and predict likely defaulters on credit loans based on the applications received.
To reach this goal, we used previous customers' information such as demographics, work patterns, residence status, credit-to-debt ratio, whether they previously defaulted, and so on.
Segmentation and logistic regression were the techniques used to create a ranked list of default probabilities for new applicants.
Responsibilities: Involved in gathering requirements while uncovering and defining multiple dimensions.
Extracted data from one or more source files and Databases.
Participated in continuous interaction with Marketing and Finance teams to obtain data and ensure data quality.
Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
Explored the raw data by performing Exploratory Data Analysis (classification, splitting, cross-validation).
Converted raw data to processed data by merging, finding outliers, errors, trends, missing values and distributions in the data.
Utilized various techniques like Histogram, Bar plot, Pie-Chart, Scatter plot, Box plots to determine the condition of the data.
Conducted data exploration (dplyr, tidyr) to look for trends, patterns, groupings, and deviations in the data and understand the data diagnostics.
Designed various reports using pivot tables and different charts such as bar plots, pie charts, and histograms.
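The cleaning and exploration steps above were done in R (dplyr/tidyr); an equivalent minimal sketch in Python with pandas, using hypothetical data and column names:

```python
import numpy as np
import pandas as pd

# Hypothetical applicant data with a missing value and an outlier.
df = pd.DataFrame({
    "income": [52.0, 48.5, np.nan, 61.2, 300.0, 55.1, 49.9],
    "defaulted": [0, 0, 1, 0, 1, 0, 1],
})

# Fill missing values with the median (one common, simple choice).
df["income"] = df["income"].fillna(df["income"].median())

# Flag outliers with the 1.5 * IQR rule.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

# Group to look for patterns, e.g. mean income by default status.
print(df.groupby("defaulted")["income"].mean())
print(int(df["outlier"].sum()))  # the 300.0 row is flagged
```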
Identified the financial and non-financial independent attributes that were to be used in modeling.
Developed segmentation trees (Optimization, Pruning, Modelling) to find out high risk segment of the population.
Performed multi-dimensional segmentation analysis to discover business rules and finalize the segmentation procedure.
Used Logistic Regression to obtain the probabilities for non-defaulters and defaulters.
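A minimal from-scratch sketch of the logistic regression idea (the actual work was done in R): fit weights by gradient descent on the log-loss, then read the model outputs as default probabilities. Features, weights, and data here are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Simulated applicants: two standardized features each (e.g. a credit ratio
# and tenure -- hypothetical), label 1 = defaulted.
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = (sigmoid(X @ true_w) > rng.random(200)).astype(float)

# Fit weights and intercept by plain gradient descent on the log-loss.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * (p - y).mean()

# Model outputs are default probabilities; rank applicants by risk.
probs = sigmoid(X @ w + b)
print(w, probs[:3])
```

In practice a library fit (R's glm, or scikit-learn's LogisticRegression) replaces the hand-written loop; the probability interpretation is the same.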
Identified key performance indicators (KPIs) among all the given attributes.
Executed what-if scenario analysis to discover effective, implementable ways of reducing loan defaults.
Maintained a log of all the iterations performed in R during the data modeling process.
Built a scoring model to score loan applicants' propensity to default, with a high degree of accuracy in capturing defaulters.
Created an ROI dashboard on campaigns spending and measuring its efficacy.
Led mid-sized teams for production support and handled multiple tasks with strong interpersonal communication, technical aptitude, and the ability to learn and adapt to the environment.
Environment: R, SQL-Server, Microsoft Excel and Tableau.
Research Scientist, 11/2014 to 09/2015, Lincoln University, Oakland, CA
Project description: Set up large-scale experiments to test hypotheses, with reporting for quantitative analysis and other application platforms.
Performed fundamental machine learning analysis and methods, distributed data analysis, streaming data analysis, and automated, efficient research model selection.
Responsibilities: Collected a database for the proposed research project.
Accumulated raw data and filtered it into an RDBMS.
Performed Chi-square and ANOVA tests to identify significant differences between data samples.
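The Pearson chi-square statistic behind such tests can be computed directly from a contingency table. A sketch with a hypothetical 2x2 table:

```python
import numpy as np

# Hypothetical 2x2 contingency table: rows = group A/B, cols = outcome yes/no.
observed = np.array([[30.0, 10.0],
                     [20.0, 40.0]])

# Expected counts under independence: outer product of margins / grand total.
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot @ col_tot / observed.sum()

# Pearson chi-square statistic; df = (rows-1)*(cols-1) = 1 here.
chi2 = ((observed - expected) ** 2 / expected).sum()
print(round(chi2, 2))  # 16.67
```

The statistic is then compared against the chi-square distribution with the appropriate degrees of freedom (e.g. via scipy.stats.chi2) to get a p-value.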
Performed classification, clustering and Time series analysis in collaboration with Research faculty.
Contributed to identifying grants and funding opportunities for projects and maintaining the grant life cycle; coordinated with management and diverse academic and technical staff to identify challenges and develop strategies for maintenance and for generating a platform for projects across diverse domains (i.e., Finance, Marketing, IT research).
Coordinated with the research faculty team in identifying and developing trends in business research and the sustainability of long-term projects.
Created research report of Projects.
Environment: R, SQL, MS Excel.
Data Scientist, 08/2013 to 10/2014, Snagout
Financial Reporting, 05/2011 to 04/2013, San Leandro, CA
Project description: Project was to analyze customer behavior and trends in online shopping and design a market analysis model.
Analyzed complex patterns to support the data-driven decision process, both to generate revenue and to stay on top of ecommerce trends.
Responsibilities: Collected a database of item sales in all aspects.
Cleaned, filtered and transformed data to specified format.
Prepared the workspace for R Markdown.
Accomplished data analysis and statistical analysis; generated reports, listings, and graphs; and initiated test analysis to understand the potential of the insurer.
Embedded code, i.e., wove code and narrative into a single document format, rendering the document to create a finished output.
Customized the process, opening the door for automated, targeted reporting.
Responsible for all data reporting, data mining activities and fraud detection activities including data prep and design, model development and reporting results.
Used R to identify product performance via Classification, tree map and regression models along with visualizing data for interactive understanding and decision making.
Found outliers, anomalies, trends, and fraudulent behavior.
Used a combination of R and NoSQL for models and analysis, and deployed the same in real time.
Customized, labelled, and reused R code chunks.
Used forecasting and the ARIMA model for time-series analysis of customer behavior and purchases.
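Full ARIMA modeling is normally done with a statistics package (e.g. R's forecast, or statsmodels in Python), but the autoregressive core can be sketched with a plain least-squares fit on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a hypothetical AR(1) series: y_t = 0.7 * y_{t-1} + noise.
n = 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Fit the AR(1) coefficient by least squares on lagged values
# (the "AR" part of ARIMA; full ARIMA adds differencing and MA terms).
phi = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])

# One-step-ahead forecast from the last observation.
forecast = phi * y[-1]
print(round(phi, 2))  # close to the true 0.7
```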
Provided insights on effectively running of marketing campaigns including direct mail, email, mobile and other digital channels.
Documented all programs and procedures to ensure an accurate historical record of work completed on the assigned project, as well as to improve quality and efficacy; produced quality reports for the business team and business data manager.
Performed SEO optimization to stay at the top of web search results.
Environment: R, SAS/Macros, SQL, NoSQL, MS Excel, MS Access, Tableau.
Smart Edge Software Ltd.
Data Analyst, 08/2010 to 05/2011, Kolkata/Bhubaneswar, India
Project description: Smart Edge Software offers financial products and services for corporate and institutional clients (HDFC, ICICI etc.).
Services include sales, trading, research and origination of debt and equity.
My role here was to create core custom analytics tools and services for trading, sales and business analytics.
Responsibilities: Scraped both structured and unstructured data from disparate sources (web pages and public repositories).
Used SQL joins extensively in SQL Developer to fetch data from the MS SQL database.
Developed multiple prepared statements and stored procedures for efficient database updates to achieve speedup.
Performed outlier detection analysis on data as part of Dodd-Frank requirements.
Produced bond and attribute coverage of financial instruments present in the database on a regular basis.
Developed modules to enhance time-series and term-structure functionalities.
Performed Principal Component Analysis using PROC FACTOR to develop logic for portfolio hedging.
Used a proprietary analytics library to develop exponential and cubic spline term structures for various bond markets.
Used SAS to perform data mining/prescriptive analysis on bond data to identify under- (over-) valued bonds.
Developed scripts that updated time-series databases with trade data from internal trading systems and external sources (flat files for futures data, risk from the dbRisk system, and CSA rates from FUSE systems).
Updated the database using SQL queries for front-end manipulation (hide/show of various columns, markets, sectors, attributes, sources).
Used PROC SQL, PROC REPORT, PROC MEANS, PROC FREQ, PROC SUMMARY, PROC CONTENTS, and PROC TABULATE extensively to create sector-based reports for the credit research desk.
Held discussions with sales and research teams on timelines/deliverables of feature requests.
Performed ad-hoc analysis of trade ideas for clients as well as the sales team.
Presented application features to various audiences by reproducing trade ideas from research journals.
Involved in data preparation over multiple iterations with input from senior analysts for the problem at hand.
Assisted in creating fact and dimension tables in a star schema model based on requirements.
Implemented algorithms like Brownian Bridge construction to interpolate missing values.
Created tables in an Oracle database and stored rich/cheap data using SAS PROC SQL.
Developed scripts and ad-hoc tests to ascertain data validity and correct attribute calculation.
Performed statistical and predictive analysis on corporate market data to identify trends and buy/sell opportunities.
Optimized data access through efficient SQL coding for high throughput and low latency.
Executed rich reports after close of business to give users instant access to last-day reports.
Performed correlation and time-series analysis to recommend pairs trading strategies to management.
Performed advanced statistical analysis such as scenario analysis and back-testing as per requirements.
Created profit and loss reports for the collateral desk detailing profit at counterparty, trade, book, and desk level granularities.
Environment: R, MS Excel, Tableau Desktop 8.3, PL/SQL
Techno soft pvt ltd
Project description: The project was to analyze and create a database of all customers of a private insurance firm.
The client required all demographic, personal, and occupational details of customers in the southern and eastern zones of the state, along with the purchase type and duration of insurance.
Responsibilities: Created the Database from raw existing data.
Organized the data to required type and format for further manipulation.
Performed statistical analysis and generated reports, listings, and graphs using SAS/Base, SAS/Macros, SAS/Stat, SAS/Graph, SAS/SQL, SAS/ODS, and SAS/Access.
Used different SAS procedures such as PROC REPORT, UNIVARIATE, TABULATE, FREQ, MEANS, TRANSPOSE, SUMMARY, and DATA _NULL_.
Integrated SAS datasets into Excel using Dynamic Data Exchange, and used SAS to analyze data and produce statistical tables, listings, and graphs for reports.
Used SAS/ODS to format HTML and RTF reports.
Created SAS macros for data cleaning, reporting, and support of routine processing.
Created and maintained ad hoc SAS programs/Macros for the validation, Extraction, Presentation, manipulation, analysis, and reporting.
Used SAS/EG in multi-user environment for intermediate data manipulation, analysis and summary statistics.
Optimized existing code for efficiency and automation of SAS Programs to improve reporting efficiency.
Pulled data from the clinical database and prepared customized analysis datasets for specific reporting needs.
Transferred and migrated data from one platform to another for further analysis; extracted data from Oracle via ODBC using the SQL pass-through facility or the LIBNAME method.
Responsible for the proper coding documentation and validation of SAS programs/macros/procedures to produce the standardized display.
Environment: UNIX, SAS/Base, SAS/Macros, SAS/Graph, SAS/Stat, SAS/SQL, SAS/ODS, SQL Server, MS Excel, MS Access.
Programmer Analyst, 09/2008 to 06/2009, Vodafone Communications Pvt. Ltd, Bhubaneswar, India
Was introduced to the C programming language and completed a class project applying programming concepts.
Learned data structure concepts and SQL Server.
Applied data modeling concepts in a dummy project on the SQL Server platform.
Learned Base SAS and applied it in a class project that involved reading raw data, creating tables in a SQL library, and producing reports using SQL programming.
Developed SQL queries for data analysis and data extraction.
Environment: SQL Server, MS Excel, MS Access, C.
Master of Business Administration
Bachelor of Science
Graduate, Utkal University, India (2004-2007)