LiveCareer
Statistician Resume Example

STATISTICIAN
Summary
  • 10 years of experience as a statistician and 6 years in software development and maintenance and database management, with more than 20 publications in statistics, genetics, mathematics, and computation.
  • Designed and implemented 3 software programs for genome-wide analysis, deep database analysis, and timestamp data analysis using C, R, SQLite, and Python.
  • Proficient in classical statistical methodologies and technologies (R, SAS, S-PLUS), including analysis of experimental data using parametric and non-parametric methods.
  • Experienced in sampling and sample size estimation, missing data handling and data quality control, survival analysis, regression and modeling, prediction and validation, analysis of variance and covariance, and hypothesis testing and inference.
  • Familiar with high-throughput genomic data and analysis (GeneSpring, Partek, SAMtools, VCFtools for array and NGS data); working knowledge of high performance computing technologies such as cloud computing and local cluster computing, with modern scripting technologies such as Python.
  • Basic knowledge of ICH and FDA regulations (IND, NDA) and standards (CDISC, CDASH), and of Electronic Health Record databases and data warehouses such as NIH's CRIS and BTRIS systems.
  • Strong problem-solving ability; quick to learn new technologies and efficient at extracting knowledge from online sources.
  • Top 2% in college; GPA of 4.0 in the Master's program and 3.92 in the PhD program.
Highlights
  • R, S-PLUS, SAS
  • Python, Jupyter, Tableau
  • Oracle, SQL, Excel
  • C, C++, Java
  • JavaScript, Perl
  • Windows, Linux and Unix
Accomplishments
  • Developed a new methodology, implemented it as a computer program in R and Python, applied the program to national data, and provided recommendations based on the evidence derived from the data. The methodology was published in the proceedings of the Knowledge Discovery and Data Mining (KDD) 2016 conference in San Francisco.
  • Applied statistical methods in medical research to support investigators, from gene selection, bioinformatics searches, and biomarker validation through to the establishment of a small biotech company for breast cancer prediction. The research in this project has resulted in several publications and grants.
  • Conducted theoretical research in algebraic topology, in which I completely solved a problem that mathematicians around the world had actively worked on for more than two decades. Two publications resulted from this research.
Experience
Statistician | 02/2015 to Current | National Institutes of Health | Rockville, MD
  • Collaborative research on batched timestamp data analysis, focused on the Case Status Change Model for data from the Social Security Administration (SSA)'s Case Processing and Management System (CPMS).
  • Improved the existing method by introducing constrained optimization.
  • Proposed a new batch combination procedure to improve batch model estimation.
  • Established the asymptotic property of the match-based estimator.
  • Improved the existing method by introducing an optimal batch searching algorithm.
  • Developing alternative methods for batched timestamp data analysis.
  • Implemented the methods as an R program and applied it to real data from two SSA offices through 2014.
  • Improving the program to make use of additional information.
  • Improving the program for large datasets through parallel computing in R; working on SSA's 2012 and 2013 datasets (more than 200 offices; about 15 million rows with 31 columns and 112 million rows with 10 columns, respectively).
  • Collaborative research on multidimensional data analysis with application to SSA data:
    • Multidimensional clustering analysis, including connectivity-based clustering such as hierarchical clustering, centroid-based clustering such as k-means, distribution-based clustering such as mclust, and density-based clustering such as DBSCAN.
    • Multidimensional regression and classification, including Naïve Bayes, logistic regression, support vector machines, CART, random forests, discriminant analysis, etc.
    • Model selection, validation, and ROC analysis.
    • High-dimensional reduction and visualization (PCA, t-SNE, etc.).
  • Familiar with EHR data structures and database systems such as NIH's CRIS, and EHR data warehouse systems such as BTRIS.
  • Collaborative research in survey data analysis, Item Response Theory, and Computerized Adaptive Testing with application to functional measures for disability.
  • Collaborative research in NLP with application to SSA's disability application processing.
Statistician | 07/2014 to 01/2015
  • High-dimensional predictive modeling: developed a variable transformation method for high-dimensional predictive modeling, implemented it as an R program, and applied it to Tox21 data by participating in the 2014 Tox21 Challenge competition.
  • High performance computing: set up and tested a small network-connected cluster of laptop and desktop machines running Windows and Linux, and used the cluster to test online statistical and computing services.
Biostatistician | 01/2006 to 06/2014 | National Human Genome Center, Statistical Genetics and Bioinformatics Unit, Howard University | Washington, DC
  • Collaborative research for breast cancer, from gene selection based on microarray data, to Real Time PCR data validation and biomarker confirmation (four publications and two grants).
  • The research focuses on breast cancer versus benign cases.
  • The research resulted in several prominent cancer-related biomarkers, and a new biotech company was recently established based on those biomarkers.
  • I performed most of the statistical analysis, including study design, power and sample size determination, hypothesis testing, data analysis such as quality control, sample comparisons (t-test, Chi-square, Wilcoxon Rank test, Kolmogorov-Smirnov GOF test, Kruskal-Wallis Rank Sum test, etc.), model selection, model fitting and prediction, regression, ANOVA, cluster analysis, survival analysis, multivariate analysis, ROC curve analysis, and validation analysis.
  • Software used included S-PLUS, R, SAS, Excel, GCOS, GeneSpring, etc.
  • Databases searched: Gene Expression Omnibus (GEO), Gene Ontology (GO), and the clinical trial database ClinicalTrials.gov.
  • Data sources: samples were collected from clinical centers and hospitals through physicians and other investigators, following HHS regulations (45 CFR 46).
  • Data types included Affymetrix microarray expression data, Real-Time PCR data, image data, etc.
  • Designed and implemented a program using R and SQLite to perform automated deep database analysis for NIH's GEO microarray database (symposium presentation; manuscript in preparation).
  • The program uses two R packages for the NIH GEO database, combined with the GEO database itself, to find the most relevant genes for any user-specified disease.
  • It analyzes each dataset in the database by automatically finding the design variables and applying a uniform statistical procedure to analyze gene expression against all of these variables.
  • It combines the results from every dataset and every variable into a combined score for each gene, which can be used to select the genes most relevant to any given disease, such as breast cancer.
  • Currently I am working to implement a database-supported web application using big data computing technologies such as Hadoop-based products (Hive, H2O) and EC2, with Python, Ruby, Rails, and AWS as web application technologies.
  • Formulated the relationship between identical-by-state (IBS) and identical-by-descent (IBD) for sib-pairs and implemented it as a C program for linkage analysis based on sib-pair sharing, with applications to SNP and microsatellite genome-wide data (one publication).
  • I found a linear transformation matrix between the IBS and IBD distributions that can be used to recover the unobservable IBD from the observable IBS for sib-pairs.
  • The program built on this IBS-IBD transformation matrix is fast because it needs only some frequencies.
  • The program can generate simulated data with flexible settings for testing and validation.
  • It can be modified to analyze next-generation sequencing data.
  • Collaborative research on case-control gene expression data analysis, implemented in R (one publication).
  • This program provides a gene selection method that combines the advantages of the t-test and the rank test and overcomes their disadvantages, which arise from unpredictable gene expression distributions.
  • I participated in the algorithm development and implemented it in R, then tested and validated it by simulation against well-known similar programs, comparing power, sample size, and type I error under various settings.
  • Collaborative research on genetic network analysis for gene expression data, implemented in R (one publication). This research focuses on gene-gene interaction network analysis using a technique from image processing.
  • I participated in the algorithm development and implemented it in R, then tested and validated it by simulation against well-known similar programs, comparing power, sample size, and type I error under various settings.
  • Collaborative research on nonparametric genetic association studies, implemented in S-PLUS (two publications). This research combined DNA sequence frequencies and DNA haplotype block structures to study genome-wide association analysis for individual or family case-control data.
  • I participated in the algorithm development and implemented it in S-PLUS, then tested and validated it by simulation against well-known similar programs, comparing power, sample size, and type I error under various settings.
  • Collaborative research on facial landmark analysis, implemented in R (manuscript submitted). This clinical research studies the effects of maternal drug use during pregnancy on newborn babies.
  • The data was collected from a hospital, but because of its complexity and the lack of a competent statistician, it had waited a long time for an effective and complete analysis.
  • After joining the research, I wrote an R program to handle all of the data management and analysis, including about 100 sub-graph drawings and the corresponding analyses.
  • Within a few months, the manuscript was ready to submit.
  • Statistical support for investigators and medical students in their research, grant applications, dissertations, and theses, covering study design, power and sample size determination, and data analysis such as quality control, sample comparisons, model selection, model fitting and prediction, regression, ANOVA, cluster analysis, survival analysis, multivariate analysis, morphometric analysis, and time series analysis, across all kinds of data types and databases, including clinical trial data, spatial data (facial and geographic), time-dependent data, Real-Time PCR data, SNP data, microsatellite data, microarray data, next-generation sequencing (NGS) data, and the GEO, dbGaP, and TCGA databases.
  • Knowledge of high performance cluster computing: set up and ran a cloud system to test deep statistical analysis using the Hadoop ecosystem and R, with web applications for modeling, prediction, and machine learning analysis. Researched methods for loading genomic databases into Hadoop's HDFS so that deep analysis of big data can be performed more efficiently.
  • Currently I am working on loading NIH's GEO database into H2O, which uses Hadoop, R and JSON to do math (and statistics) online, and on performing deep analysis on a virtual cloud computing network at home.
  • Teaching medical students (statistics and segregation analysis).
  • Knowledge of clinical trial studies, including CRF design and review, study design, data collection and analysis, the Code of Federal Regulations, compliance training (HIPAA), and other requirements as in the ClinicalTrials.gov database.
  • Knowledge of survey research, including questionnaire design, sample selection, survey methodology comparison, survey study design, survey data analysis, and reporting and dissemination of survey findings.
01/2002 to 01/2003 | National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH) | Bethesda, MD
  • Special new technology training course for Database Management and Bioinformatics.
01/2002 | Pittsburgh Supercomputing Center | Pittsburgh, PA | Bioinformatics and Computation Workshop
01/2001 to 06/2006 | National Human Genome Center, Statistical Genetics and Bioinformatics Unit, Howard University | Washington, DC
  • Scientific programmer.
  • Software application development and maintenance: maintained genetic analysis software written in C, LEX, and YACC; developed, implemented, and evaluated new algorithms for the software; participated in designing and implementing the software's specification file using LEX and YACC.
  • Database management: participated in a SQL database management system for investigators using MySQL and SQLite.
  • Participated in data processing (using shell scripting, SED, AWK, Perl), quality control assurance (checking for outliers and missing and invalid values, etc.), data analysis (R, S-PLUS, SAS, Excel, etc.), and presentations and reports to management and general users.
  • Provided data services using internal and external databases to support management and general users.
  • Analyzed database queries to fine-tune performance.
  • Web applications: participated in creating database-supported web applications using C++, Java, JavaScript, VB, VBScript, HTML, ASP, FrontPage, etc.
01/1999 to 01/2000 | InfoAge Systems, Inc | Rockville, MD
Lecturer III | 08/1997 to 12/2000 | Howard University | Washington, DC
  • Taught Algebra II, Pre-Calculus, and Calculus. Developed curriculum and syllabi and administered lessons for classes of 60-100 students.
  • Advised, assisted, and evaluated the performance of a diverse group of students.
  • Participated in departmental meetings and attended weekly research seminars.
  • Collaborative research in vector bundle decomposition (one publication). We formulated the general "Obstruction Classes" that obstruct vector bundle decomposition.
  • Those Obstruction Classes lie in certain Cohomology Groups with coefficients in Homotopy Groups of certain Grassmann Manifolds on which the Fundamental Group acts.
  • We were the first to demonstrate a geometric interpretation of a Cohomology Group with "twisted" integer coefficients.
  • Prior to 1997, worked as an assistant professor.
Education
Ph.D.: Mathematics Computer Science | 1997 | Howard University | Washington | GPA 3.92/4.00
Skills
application development, ASP, AWK, big data, C, C++, Calculus, cancer, clinical research, Clustering, Chi, data management and analysis, database analysis, data analysis, data collection, data processing, data validation, data warehouse, databases, Database, Database Management, database management system, database Software, DNA, DHTML, XML, fast, focus, FORTRAN, FrontPage, Functional, GCOS, grant applications, grants, HTML, image processing, Image, interpretation, JAVA, JavaScript, JSON, Linux, machine learning, Mathematica, math, meetings, Excel, Windows, modeling, MySQL, NLP, network analysis, network, Next, Operating Systems, optimization, Oracle, PCR, PERL, programmer, Programming, publications, publication, Python, quality control, Real Time, Real-Time, Research, SAGE, SAS, Scientific, SED, seminars, shell scripting, specification, SQL, Statistical Analysis, statistics, Symposium, Tableau, Teaching, type I, Unix, validation, VBScript, VB, web-applications, web applications

© 2021, Bold Limited. All rights reserved.