Chapter 4 - Generalizing from a Sample

Click here to read the chapter (link works only for UC affiliates)

Lecture Slides:    Powerpoint     PDF

Learning Objectives

  • Articulate the difference between sample coefficients (b) and population coefficients (β)
  • Develop the three steps to generalize from a sample to a population
  • Explain the classical regression assumptions
  • Connect the data type to potential assumption failures and potential relevant populations

Example

  • Google flu trends:      Wikipedia  — Details on the method  —  Epic failure  —  Coronavirus
    • Google Flu Trends started as an example of correlation not indicating causation (Google searches do not cause the flu; they merely indicate its presence). It may instead be remembered as an example of correlations not being stable over time.

What We Learned

  • Three steps to generalizing from a sample:
  1. Define your population and research goal
  2. Make assumptions about your population (and how your sample represents it)
  3. Compute statistics to measure OLS accuracy
  • When defining the population, consider scope, time period, and how the X variables are determined.
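
As a rough illustration of step 3, the sketch below simulates a population with known coefficients β, draws one sample, and computes the sample coefficients b together with their standard errors. The numbers (β = 2.0 and 0.5, n = 200) and the helper name ols_with_se are made up for this example. The standard-error formula shown is the textbook one, which is only appropriate when errors are homoskedastic and uncorrelated.

```python
import numpy as np

def ols_with_se(x, y):
    """Return OLS coefficients b and their classical standard errors.

    The standard errors use s^2 (X'X)^{-1}, which assumes
    homoskedastic, uncorrelated errors.
    """
    X = np.column_stack([np.ones(len(x)), x])  # add an intercept column
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # sample coefficients b
    resid = y - X @ b
    dof = len(y) - X.shape[1]                  # n minus number of coefficients
    s2 = resid @ resid / dof                   # estimated error variance
    se = np.sqrt(np.diag(s2 * XtX_inv))
    return b, se

rng = np.random.default_rng(0)
beta = np.array([2.0, 0.5])   # hypothetical population coefficients (β)
n = 200                       # hypothetical sample size
x = rng.uniform(0, 10, size=n)
y = beta[0] + beta[1] * x + rng.normal(0, 1, size=n)

b, se = ols_with_se(x, y)
print("population beta:", beta)
print("sample b:       ", b)
print("standard errors:", se)
```

The sample b will not equal β exactly; the standard errors give a sense of how far off it is likely to be, provided the assumptions behind the formula hold.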
 
  • Three critical assumptions (CR1 to CR3) are required to generalize to a population; two more (CR4 and CR5) apply only in the situations noted:
    CR1: representative sample
    CR2: homoskedastic errors
    CR3: uncorrelated errors
    CR4: normally distributed errors (only needed in small samples)
    CR5: values of the right-hand-side variables are exogenous (only relevant if doing causality analysis)
     
  • Knowing your data type tells you which assumptions are likely to fail and which populations are potentially relevant.
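
One way to see how data type connects to assumption failures: time-series data often raise concerns about correlated errors (CR3). The sketch below is an illustration under assumed, simulated data, not a method taken from the chapter. It computes the Durbin-Watson statistic on OLS residuals, which is close to 2 when successive residuals are uncorrelated and falls toward 0 under positive serial correlation. The AR(1) coefficient of 0.8 and the helper names are hypothetical.

```python
import numpy as np

def ols_residuals(x, y):
    """Residuals from an OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

def durbin_watson(resid):
    """Sum of squared successive differences over sum of squared residuals."""
    diff = np.diff(resid)
    return float(diff @ diff / (resid @ resid))

rng = np.random.default_rng(1)
n = 500
x = np.linspace(0, 10, n)   # stands in for a time-ordered regressor

# Case A: independent errors (CR3 holds)
e_iid = rng.normal(0, 1, n)

# Case B: AR(1) errors, each error carries over 80% of the previous one
shocks = rng.normal(0, 1, n)
e_ar = np.zeros(n)
for t in range(1, n):
    e_ar[t] = 0.8 * e_ar[t - 1] + shocks[t]

dw_iid = durbin_watson(ols_residuals(x, 1 + 2 * x + e_iid))
dw_ar = durbin_watson(ols_residuals(x, 1 + 2 * x + e_ar))
print("Durbin-Watson, independent errors:", round(dw_iid, 2))
print("Durbin-Watson, AR(1) errors:      ", round(dw_ar, 2))
```

A value well below 2 in the second case signals that the classical standard errors should not be trusted for that sample.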