H3ABioNet

Pan African Bioinformatics Network for H3Africa

Introduction to Biostatistics for Genome Wide Association testing

Workshop training materials are available at: http://training.h3abionet.org/biostats_pop_gen_2015/

Workshop venue: Institut Pasteur de Tunis, Tunis, Tunisia

Workshop dates: 16th March to 26th March, 2015

Workshop organisers: H3ABioNet and the Institut Pasteur de Tunis H3ABioNet Node

Workshop Applications open: 18th December, 2014

Workshop Applications close: 18th January, 2015

Workshop Selection notification date: 27th January, 2015

Link to workshop application form: http://surveys.h3abionet.org/biostats2015/

Workshop Sponsors: H3ABioNet

Workshop Overview: The "Introduction to Biostatistics for Genome Wide Association testing" workshop provides an introduction to the basic concepts in biostatistics required for genome wide association studies, such as data types, experimental design, statistical tests, probabilities and R. During the second week, basic statistical concepts relevant to population genetics and genome wide association tests will be introduced to build foundational knowledge for genome wide association studies. The workshop will consist of a series of theory lectures during the morning sessions and “hands on” practical lab sessions during the afternoon periods.

Intended Audience: This workshop is aimed at H3Africa project members who will be involved in the design and analyses of population data and genome wide association testing, and those who need statistical skills for their research.

Participation: Applications are open to H3Africa consortium members and partners. Participants will be selected, with priority given to those with a demonstrated need for this training. A total of 18 participants will be selected from all the applications received.

Syllabus and Tools: Participants will learn about biostatistics, R and population genetics.

Objectives: After this workshop participants should be able to:

  • Determine which statistical tests are appropriate given their data types
  • Conduct hypothesis testing and correct for multiple hypothesis testing
  • Identify the different types of sampling and determine the effective sample size required
  • Demonstrate good familiarity with R and with conducting statistical tests within R
  • Demonstrate a good background in the theory behind population genetics
  • Conduct a variety of population genetics statistical tests

Workshop limitations: This workshop will only provide a foundation for continued learning in biostatistics and population genetics and will not make you an expert in biostatistics, population genetics and genome wide association testing and studies.

Trainers and support staff: Dr. Jean-Baka Domelevo Entfellner, Prof. Ahmed Rebai, Dr. Najla Kharrat, Dr. Amel Ghouila, Dr. Kais Ghedira, Prof. Alia Benkahla, Mohamed Alibi
 
Prerequisites:
Participants will need to work through the following resources to enable them to gain the most from the workshop:
http://www.ee.surrey.ac.uk/Teaching/Unix/
http://www.r-tutor.com/r-introduction

Workshop Application: All potential applicants must complete the application form and upload a motivation letter from their supervisor. Incomplete applications will NOT be reviewed. The successful applicants for the workshop will be contacted to complete an airline ticket booking form, a short biosketch with a recent picture and the H3ABioNet workshop policy. Participants will also be requested to come to the workshop with a poster describing their research.

H3ABioNet will cover the cost of a return economy air ticket to Tunisia, accommodation on a sharing basis from the night of the 15th of March till the night of the 26th of March, 2015, and all meals from the 16th of March till the 26th of March, 2015. H3ABioNet will not cover the costs of visa fees, airport transfers from your local area or any other costs. Successful applicants have the option to upgrade their accommodation to a single room and bear the difference in cost from a double room.

Please note, if a participant is unable to attend this workshop after acceptance, their place will be passed on to applicants on the waiting list and not to other recommended members from their H3Africa programme.

Workshop Program:

Time | Topic | Trainer

16th March 2015

8:00 | Registration and Introductions |
8:30 | Pre-workshop evaluation |
9:00 | Why do we use statistics? To deal with incomplete data or partial knowledge. Types of data: numerical (discrete/continuous: integer/real), ordinal, categorical. Relationship between a sample and the underlying population. Sample statistics vs "true" underlying distributions and parameters. | Dr. Jean-Baka Domelevo Entfellner
10:30 | Tea Break |
11:00 | Validity of a statistical analysis == transferability to the underlying population of results computed on a sample. Basic descriptive statistics: mean of a sample, median, quantiles, sample variance, standard error. Graphical representations: histograms, box-and-whisker plots. | Dr. Jean-Baka Domelevo Entfellner
1:00 | Lunch |
2:00 | Manipulating data with R: data types (modes), data structures (vectors, matrices, data frames). Univariate (vector) and multivariate data (data frames). Calling functions in R. First built-in functions (length, dim, mean, sd, summary, etc.). Getting help with R. | Dr. Jean-Baka Domelevo Entfellner
3:30 | Tea Break |
4:00 | Selecting a subset of the data according to one or several criteria. Applying a function to data margins (vectorized operations). Categorical variables in R (factors). Graphical representations: histograms, pie charts, box-and-whisker plots, scatterplots. Superimposing graphs. | Dr. Jean-Baka Domelevo Entfellner
6:00 | Workshop End |

 

17th March 2015

9:00 | Introducing the theory of probability distributions. Random variables. Describing a probability distribution: density, cumulative distribution function, quantile function. Discrete/continuous distributions. Example of a discrete distribution: the binomial. Continuous distributions: uniform and normal distributions. Normal distributions: location and scale parameters, standard normal distribution and transformation operations. Importance of the normal distribution: Central-Limit Theorem. | Dr. Jean-Baka Domelevo Entfellner
10:30 | Tea Break |
11:00 | Estimation: estimate parameters of the underlying distribution from sample values. Confidence intervals (e.g. on the mean). Bivariate data, correlation measures (Pearson correlation coefficient, Spearman correlation coefficient). Introduction to study design (epidemiology/clinical). The different types of studies. Careful selection of the sample and the variables to capture: bias, multicollinearity, confounding. | Dr. Jean-Baka Domelevo Entfellner
1:00 | Lunch |
2:00 | Reading data from files into R, outputting to files. Random generation functions. p-, d-, q- and r- functions associated with a theoretical distribution. | Dr. Jean-Baka Domelevo Entfellner
3:30 | Tea Break |
4:00 | Normal distributions: playing with location and scale parameters, standard normal distribution and transformation operations. Approximate convergence of binomial and chi-square distributions to the normal. Uniform distribution. "Seeing" the Central-Limit Theorem. Scatterplots, correlation measures, confidence intervals. | Dr. Jean-Baka Domelevo Entfellner
6:00 | Workshop End |

 

18th March 2015

9:00 | The framework of hypothesis testing: null and alternative hypotheses, Type I and Type II errors, threshold values, p-value. Power of a test. Parametric tests: testing for the mean of a sample (Z-tests, T-tests -- the Student distribution), testing for the equality of means (paired or unpaired T-tests), for homoscedasticity (F-tests). | Dr. Jean-Baka Domelevo Entfellner
10:30 | Tea Break |
11:00 | Nonparametric tests (sign test, Wilcoxon signed-rank tests, Kruskal-Wallis test). Fitting distributions: Shapiro-Wilk normality test, Kolmogorov-Smirnov tests. | Dr. Jean-Baka Domelevo Entfellner
1:00 | Lunch |
2:00 | Choosing the right test according to the experimental question and setting. The built-in R functions for all these tests. | Dr. Jean-Baka Domelevo Entfellner
3:30 | Tea Break |
4:00 | Controlling the built-in test functions in R via argument passing. Interpreting their output. | Dr. Jean-Baka Domelevo Entfellner
6:00 | Workshop End |

 

19th March 2015

9:00 | Bayes' rule. The Bayesian viewpoint: prior and posterior probabilities. Monte Carlo methods to sample the Omega space. Contingency tables and hypothesis testing on categorical data: Chi-square test, McNemar's test, Fisher's exact test. Odds ratios from contingency tables. Linear and logistic regression models, R² measures. ANOVA: intra-category variance, inter-category variance. | Dr. Jean-Baka Domelevo Entfellner
10:30 | Tea Break |
11:00 | Estimating parameters of complex models, not expressed analytically: likelihood function. Maximum-likelihood estimates, Maximum a posteriori estimates. An iterative method to get these: Expectation-Maximisation algorithm. | Dr. Jean-Baka Domelevo Entfellner
1:00 | Lunch |
2:00 | (g)lm functions for linear or generalized linear model fitting. Interpreting the output of a regression analysis. Graphical representation of the model fitting. Overlaying regression lines. | Dr. Jean-Baka Domelevo Entfellner
3:30 | Tea Break |
4:00 | ANOVA methods. Maximum likelihood calculations on phylogenetic models. | Dr. Jean-Baka Domelevo Entfellner
6:00 | Workshop End |

 

20th March 2015

9:00 | Multiple hypothesis testing and procedures for the correction of p-values: FDR, Bonferroni, Benjamini-Hochberg. | Dr. Jean-Baka Domelevo Entfellner
10:30 | Tea Break |
11:00 | High-dimensional dataset analysis. Principal Component Analysis and dimension reduction. | Dr. Jean-Baka Domelevo Entfellner
1:00 | Lunch |
2:00 | Installing and loading additional R packages. Using the FactoMineR package to perform PCA. Analysis of the output of a PCA. | Dr. Jean-Baka Domelevo Entfellner
3:30 | Tea Break |
4:00 | Example of a gene expression dataset (yeast activity during alcoholic fermentation in wine processing). | Dr. Jean-Baka Domelevo Entfellner
6:00 | Workshop End |

 

21st March 2015

9:00 | Recap of topics – Open question and answer session | Dr. Jean-Baka Domelevo Entfellner
10:30 | Tea Break |
11:00 | Assessment / assignment | Dr. Jean-Baka Domelevo Entfellner
1:00 | Lunch and Workshop End |

 

22nd March 2015

Free day

23rd March 2015

9:00 | Basics of genetics: Mendelian inheritance, multiple-factor inheritance. Population genetics: organization of genetic variation, Hardy-Weinberg principle, haplotypes and linkage disequilibrium, quantitative genetics. | Prof. Ahmed Rebai
10:30 | Tea Break |
11:00 | Genetic drift and natural selection, population stratification / structure and admixture concepts. | Prof. Ahmed Rebai
1:00 | Lunch |
2:00 | Data formats and files, estimating allele/genotype frequencies, Hardy-Weinberg tests, F and D statistics (using R functions and packages). | Prof. Ahmed Rebai, Dr. Najla Kharrat
3:30 | Tea Break |
4:00 | Structure plots and K-means clustering for population stratification and relatedness (using R functions and packages). | Prof. Ahmed Rebai, Dr. Najla Kharrat
6:00 | Workshop End |

 

24th March 2015

9:00 | Molecular markers and study designs for association studies: population-based (case-control, cohort) and family-based (trios, nuclear families, pedigrees). Testing genotypic and allelic association for single markers. Correcting for population stratification (genomic control, structured association, mixed models). | Prof. Ahmed Rebai
10:30 | Tea Break |
11:00 | Haplotype inference from family and population genotypes; HapMap project; measures of linkage disequilibrium (LD); power/sample size computation in association testing. | Prof. Ahmed Rebai
1:00 | Lunch |
2:00 | R functions and packages for testing association, haplotype inference and estimating LD; use of the PLINK package. | Prof. Ahmed Rebai, Dr. Najla Kharrat
3:30 | Tea Break |
4:00 | Power/sample size calculations (analytical and empirical) using R functions and packages, interpretation of results. | Prof. Ahmed Rebai, Dr. Najla Kharrat
6:00 | Workshop End |

 

25th March 2015

9:00 | Testing association with continuous phenotypes, testing genetic association by logistic regression, estimating unadjusted/adjusted odds ratios, population attributable risk, Bayesian testing of association, methods for SNP tagging. | Prof. Ahmed Rebai
10:30 | Tea Break |
11:00 | GWAS. Testing association of multiple SNPs: using unphased genotypes (logistic regression), combining single-locus tests, haplotype-based tests. Correction for multiple testing (Bonferroni correction, FDR, permutation). Other complicating factors: missing genotypes, epistasis, gene-environment interaction, imputation. | Prof. Ahmed Rebai
1:00 | Lunch |
2:00 | Testing association of multiple SNPs using PLINK and R packages, calculating risk associated with SNPs. | Prof. Ahmed Rebai, Dr. Najla Kharrat
3:30 | Tea Break |
4:00 | Computer tools for SNP tagging and genotype imputation, or testing association in the presence of missing data. | Prof. Ahmed Rebai, Dr. Najla Kharrat
6:00 | Workshop End |

 

26th March 2015

9:00 | Challenges in design and analyses of GWAS: subject ascertainment, marker selection, analysis and interpretation, validation and replication, rare variants, intermediate phenotypes, clinical translation of GWAS findings, integrating information from GWAS. | Prof. Ahmed Rebai
10:30 | Tea Break |
11:00 | Question session | Prof. Ahmed Rebai
1:00 | Lunch |
2:00 | Possible complementary practical session if some exercises were not finished in previous days or need further explanation. | Prof. Ahmed Rebai
3:30 | Tea Break |
4:00 | Assessment and completion of the workshop survey form. | Prof. Ahmed Rebai
6:00 | Workshop End |