HRP 259 Introduction to Probability and Statistics for Epidemiology - 2007/2008 Edition

This course aims to provide epidemiologists and clinical researchers with a firm grounding in the foundations of probability and statistical theory. The course emphasizes conceptual understanding rather than a “black box” approach or rigorous mathematical proofs. Students will use software to run statistical analyses on data sets but will not receive formal instruction in data management.

Specific topics include random variables, expectation, variance, probability distributions, the Central Limit Theorem, sampling theory, hypothesis testing, confidence intervals, correlation, regression, analysis of variance, nonparametric tests, and an introduction to least squares and maximum likelihood estimation.

There will be an emphasis on analysis of biomedical data.

Contact information

Professor

Raymond R. Balise, Ph.D.
Redwood Bldg. T213D, MC 5092
Stanford, California 94305-5405

balise at stanford
Voice (650) 724-2602
Fax (650) 725-6951

Teaching Assistant

Lamiya Sheikh
lamiyas at stanford

Prerequisites

A comfortable working knowledge of Windows XP/Vista, Mac OS, or UNIX.

Lectures

Monday and Wednesday, 4:15-5:45, in room M108B of the medical school. The classroom is next to the Café in the medical school (close to Lane Library and below the computer lab); a map is available here:

http://lane.stanford.edu/graphics/maps/learningspaces_map.pdf

Office Hours

By appointment in Redwood Building T213D. Directions can be found here: www.stanford.edu/~balise/FindBalise.htm

Newsgroup

If you would like to ask a question or help others, please visit the course newsgroup, su.class.hrp259. Although the newsgroup is not strictly required for the class, you will suffer if you don’t have access to it. If you do not know how to subscribe to a newsgroup, see the instructions for Windows (http://www.stanford.edu/services/email/config/thunderbird/newsreader/pc/) or for the Mac (http://www.stanford.edu/services/email/config/thunderbird/newsreader/mac/). Screenshots of my setup can be found here: www.stanford.edu/class/hrp223/2007/newsgroup.ppt

Readings

Biostatistics: A Methodology for the Health Sciences (2nd Edition) by Gerald van Belle, Lloyd Fisher, Patrick J. Heagerty, Thomas Lumley http://www.amazon.com/Biostatistics-Methodology-Sciences-Probability-Statistics/dp/0471031852/ref=pd_bbs_sr_1/102-7869045-8398536?ie=UTF8&s=books&qid=1190400480&sr=8-1 

Optional Books

The Little SAS Book for Enterprise Guide 4.1 at SAS: http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=61054

SAS for Dummies http://www.amazon.com/SAS-Dummies-Computer-Tech/dp/0471788325/ref=pd_bbs_sr_1/102-7869045-8398536?ie=UTF8&s=books&qid=1190400573&sr=1-1

Grading

Assignments and Grading:

Class Participation       10%

Homework                  30%

Take-Home Midterm         20%

Take-Home Final Exam      40%
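To make the arithmetic of the weights above concrete, here is a minimal Python sketch that combines the four components into a weighted course score. It assumes each component has already been converted to a percentage from 0 to 100; the component names and the example student's scores are hypothetical, and the sketch ignores any curve applied under the grading policy.

```python
# Weights taken from the grading breakdown above; they sum to 1.0.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation": 0.10,
    "homework": 0.30,
    "midterm": 0.20,
    "final": 0.40,
}


def course_score(scores):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each component name in WEIGHTS to a percentage (0-100).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical student: 100, 90, 80, and 85 percent on the four components.
    student = {"participation": 100, "homework": 90, "midterm": 80, "final": 85}
    print(round(course_score(student), 2))  # 10 + 27 + 16 + 34 = 87.0
```

Because the final exam carries 40% of the weight, a 10-point change on the final moves the course score by 4 points, twice the effect of the same change on the midterm.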

Grading Policy:

1. If I grade on a curve, the class mean will be set at a B+. I generally do not give A+’s.

2. Students taking the course pass/fail (e.g., medical students) must earn a grade of B or higher to pass.

Homework: 

A short problem set will be due at the beginning of most class sessions.  Problem sets will be graded as:

  • 2 points = excellent (completed, mostly correct)
  • 1.5 points = satisfactory (completed, missing some concepts)
  • 1 point = incomplete (not finished or poor effort, but made some attempt)
  • 0 points = not handed in

You must not violate the computer virus policy below. 

Turning in Homework and Viruses

Any student who sends me a software virus (or any other malicious code) will fail the course. There will be no exceptions. Therefore, you are strongly advised to download the latest Symantec AntiVirus definitions and check your files before sending me any email. If you need virus protection, you can download the software for free here: http://www.stanford.edu/services/ess/. If you have any questions about how to update your virus definitions, ask!

Late Policy

Each assignment is due at the beginning of class on the day specified. Assignments will be downgraded 50% for each 24 hours they are late.

Computer Platforms

Many of the assignments will require you to work with a statistical analysis package. You can choose your own tools, but I will show examples using SAS/Enterprise Guide. I can provide good support for SAS on Windows and some support for R or S-Plus.

Core Lecture Material

Sept 24 – A Gentle Introduction to Statistical Modeling

  • This will be a broad overview of the entire class (and really the entire HRP 259 – 261 series).
  • Generalized Linear Models
  • Error
  • What are statistics and parameters?

PowerPoint slides are here.

van Belle Chapters 1 and 2

Assignment 1 is here.

Sept 26 – Data

  • What are variables?
  • Graphical and Numeric summaries of categorical data
  • Graphical and Numeric summaries of continuous data
  • Problems with data summaries

PowerPoint slides are here.

van Belle Chapter 3, especially 3.3.2 through Note 3.7

Assignment 2 is here.

October 1 – Statistical Programming

  • Commonly used software packages
  • An introduction to SAS Enterprise Guide
  • An introduction to R and Rcmdr

PowerPoint slides are here.

Sample datasets and EG projects are here.

October 3 – Introduction to Distribution Theory

  • Empirical vs. theoretical distributions
  • The normal distribution
  • The central limit theorem

PowerPoint slides are here.

van Belle Chapter 4, especially 4.1-4.2 and 4.4-4.5

October 8 – Introduction to Inference (Meeting in M108 in the Medical School)

  • Standard Error
  • Probability functions
  • Hypothesis testing
  • Type 1 and Type 2 errors
  • Intro to Power
  • Tails

PowerPoint slides are here.

van Belle Chapters 4 and 5, especially 4.6 and 5.8.

Sample datasets and EG projects are here.

Assignment 3 is here and is due before class starts on October 10th.

October 10 – Inference Take 2

  • Alpha vs beta (again)
  • The t distribution
  • Comparing means with unknown variance
  • How comparing three means with ANOVA works

PowerPoint slides are here.

Sample datasets and EG projects for alpha and beta discussion are here.

Sample R code for alpha and beta discussion is here.

Sample datasets and EG projects for t-tests are here.

October 15 – Comparing Groups

  • More about comparing means
  • Nonparametric analyses
  • ANOVA

PowerPoint slides are here.

van Belle Chapters 8 (especially 8.1-8.6 and 8.9) and 10 (but don’t panic over the sigma notation).

EG project to do the primary analysis in van Belle Chapter 10 is here.

EG project to do nonparametric analysis is here.

EG project to do One-way ANOVA is here.

The midterm is here and will be due on 10/28.

October 17 – Comparing Groups 2

  • More about comparing means
  • Two-way ANOVA
  • R vs SAS
  • Interactions

PowerPoint slides are here.

van Belle Chapter 10

EG project to do Two-way ANOVA is here.

Generalized Anxiety Disorder data is here.

Memory data is here.

Oct 24 – Probability (bonus lecture), starring Lamiya

PowerPoint slides are here.

Oct 29 – More on comparing two or more groups, starring Dr. Kristin Sainani

PowerPoint slides are here.

Oct 31 – Linear Models

  • Indicator variables for ANOVA
  • Predicting with a continuous variable instead of categorical
  • The regression line
  • Maximum likelihood and minimizing errors
  • Residuals
  • The math of ordinary least squares
  • Hypothesis testing with t-tests vs. F-tests

PowerPoint slides will be here.

Mortality Enterprise Guide project is here (edited for Nov 5).

Mortality Excel file is here.

van Belle Chapter 9.

Nov 5 – More on LM, Regression/Correlation

  • Confidence intervals on regression lines
  • Partitioning variance
  • Strength of association
  • Bivariate distributions
  • Covariance
  • Pearson’s r
  • Pathological correlations
  • Polynomials
  • Interpretation of correlation
  • Cautions on correlations

PowerPoint slides are here.

van Belle Chapters 9 and 11.

Nov 7 – Multiple regression

PowerPoint slides are here.

Polynomial Enterprise Guide project is here.

Nov 12 – Discrete Probability Distributions (with Kristin Sainani)

PowerPoint slides are here.

Nov 14 – The Binomial Distribution (with Kristin Sainani)

PowerPoint slides are here.

Nov 26 – Odds Ratios and Chi-square (with Kristin Sainani)

PowerPoint slides are here.

Nov 28/Dec 3 – Analyzing Categorical Data

  • Analyzing binomial data and proportions with SAS
  • Analyzing Risk Differences with SAS
  • Analyzing Odds Ratios with SAS
  • Analyzing Relative Risks with SAS
  • Chi-Square with SAS
  • Visualizing contingency tables with R
  • Common assumptions for contingency tables
  • Beyond RR: the NNT (number needed to treat)
  • Why mess with odds ratios?
  • Screening and diagnostic testing with SAS
    1. Sensitivity
    2. Specificity
    3. Predictive value of a positive test
    4. Predictive value of a negative test
  • McNemar’s Test
  • Kappa

PowerPoint slides are here.

Code for binomial data is here.

Code for heart disease data is here.

Code for AZT and cat scratch fever data is here.

Sensitivity, Specificity, PVP, PVN is here.

A bunch of the code from Fleiss’ categorical analysis book can be found here.
 

Van Belle Chapter 6.1-6.4 and Chapter 7.1-7.4
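The four screening measures listed for this lecture (sensitivity, specificity, PVP, and PVN) all come from a 2x2 table of test result versus true disease status. The following Python sketch, which is not the course's SAS code, computes them from the four cell counts; the function name and the counts in the example are made up for illustration.

```python
from typing import NamedTuple


class DiagnosticSummary(NamedTuple):
    sensitivity: float  # P(test positive | disease present)
    specificity: float  # P(test negative | disease absent)
    pvp: float          # predictive value of a positive test: P(disease | test positive)
    pvn: float          # predictive value of a negative test: P(no disease | test negative)


def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> DiagnosticSummary:
    """Compute screening measures from the cells of a 2x2 table.

    tp: diseased, test positive      fp: not diseased, test positive
    fn: diseased, test negative      tn: not diseased, test negative
    """
    return DiagnosticSummary(
        sensitivity=tp / (tp + fn),  # column of truly diseased subjects
        specificity=tn / (tn + fp),  # column of truly non-diseased subjects
        pvp=tp / (tp + fp),          # row of positive test results
        pvn=tn / (tn + fn),          # row of negative test results
    )


if __name__ == "__main__":
    # Hypothetical screening study: 100 diseased and 900 non-diseased subjects.
    result = diagnostic_summary(tp=90, fp=40, fn=10, tn=860)
    print(result)
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on disease prevalence in the sample (10% in this hypothetical example). That is why a test with 90% sensitivity can still have a modest PVP when the disease is uncommon.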

Dec 3 – Higher Order Contingency Tables

  • An analysis plan for contingency tables
  • Cochran-Mantel-Haenszel - proc freq
  • Breslow-Day - proc freq /cmh
  • 2xN tables: dose response
  • Nx2
  • N:N:N tables
  • Logistic theory vs. OLS regression
  • Odds vs. probability
  • Logits
  • Unconditional logistic regression - proc logistic

PowerPoint slides are here.

Van Belle Chapter 6.1-6.4 and Chapter 7.1-7.4

Dec 5 – Summary and Q&A Free-for-all

PowerPoint slides are here. These are essentially the slides from the first lecture, so you may want to save a tree and not print them.

The final exam is here, with a dataset for problem 1 (here); it is due Dec 11 before 10 PM.