Are dummy variables quantitative

Analysis of cross-sectional data. Regression with dummy variables

Transcript

1 Analysis of cross-sectional data: Regression with dummy variables

2 Course schedule. Lectures: Introduction; Examples; Research designs & data structures; Variables; Continuous variables; Bivariate regression; Control of third variables; Multiple regression; Statistical inference; Significance tests I; Significance tests II; Specification of the independent variables; Specification of the regression function; Heteroscedasticity; Categorical variables; Regression with dummy variables; Logistic regression

3 Outline 1. Review: categorical variables 2. Dichotomous categorical variables a. as independent variables b. as dependent variables 3. Polytomous categorical variables

4 Definition: categorical variables. Categorical variables are characteristics that take only a limited number of values (categories). Variables with a large number of values do not count as categorical variables. If those values measure an underlying continuous property, we refer to them as continuous variables.

5 Example 1: Public administration. Survey of public administration employees in a large western German city (n = 60, mabt60.dta). Income: monthly net income in DM. Supervisor position: yes / no. Highest general education: lower secondary school (Hauptschule) / intermediate secondary certificate (Mittlere Reife) / entrance qualification for universities of applied sciences (Fachhochschulreife) / university entrance qualification (Hochschulreife).

6 Example 1: categorical variables. Survey of public administration employees in a large western German city (n = 60, mabt60.dta). Income: monthly net income in DM. Supervisor position: yes / no. Highest general education: lower secondary school (Hauptschule) / intermediate secondary certificate (Mittlere Reife) / entrance qualification for universities of applied sciences (Fachhochschulreife) / university entrance qualification (Hochschulreife).

7 Example 2: Survey of eligible voters for the Bundestag election (n = 750, appendix4.dta). Voter participation: yes / no. Age: in years. Denomination: yes / no. Party preference: SPD / CDU / CSU / FDP. Education: secondary school / secondary school leaving certificate / university entrance qualification.

8 Example 2: categorical variables. Survey of eligible voters for the Bundestag election (n = 750, appendix4.dta). Voter participation: yes / no. Age: in years. Denomination: yes / no. Party preference: SPD / CDU / CSU / FDP. Education: Hauptschule / Mittlere Reife / Fachhochschulreife / Hochschulreife.

9 Definition of dummy variable. For some statistical analyses, it is helpful to know whether a unit of observation shows a particular value of a categorical variable or not. For this purpose, a so-called dummy variable is created with the values 1 and 0: 1 = value is present, 0 = value is not present. In principle, k dummies are conceivable for a categorical variable with k values. In practice, however, only (k-1) dummies are necessary to represent the k values completely: the (omitted) k-th value can be recognized by the fact that all dummies have the value 0.
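The (k-1) coding rule can be illustrated with a minimal Python sketch. The helper name make_dummies and the five sample values are assumptions made for illustration only; the category codes (vs_hs, mr, fhsr, hsr) mirror the education categories of the public administration example.

```python
def make_dummies(values, categories, reference):
    """Return one 0/1 indicator list per category, omitting the reference category."""
    dummies = {}
    for cat in categories:
        if cat == reference:
            continue  # the reference category is identified by all dummies being 0
        dummies[cat] = [1 if v == cat else 0 for v in values]
    return dummies


# Hypothetical education values for five respondents (k = 4 categories).
education = ["vs_hs", "mr", "hsr", "vs_hs", "fhsr"]
categories = ["vs_hs", "mr", "fhsr", "hsr"]

dummies = make_dummies(education, categories, reference="vs_hs")
for name, column in dummies.items():
    print(name, column)
```

With k = 4 categories the sketch produces three indicator columns. Respondents in the reference category vs_hs have a 0 in every column.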

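10 Example: Dummy variables. Table with the columns Idnr, supervisor function, supervisor dummy, education, qual1, qual2, qual3, qual4. Sample rows (function, dummy, education): no, 0, vs/hs; no, 0, vs/hs; yes, 1, vs/hs; no, 0, fhsr; yes, 1, hsr; no, 0, mr; no, 0, vs/hs; ...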

11 Outline 1. Review: categorical variables 2. Dichotomous categorical variables a. as independent variables b. as dependent variables 3. Polytomous categorical variables

12 Regression with a dummy variable as x. Figure: monthly net income in DM, with fitted values.

13 Regression by group. Figure: monthly net income in DM against age of respondent, with fitted values, one panel per group.

14 Regression with dummy variables. Interpretation of the parameters. Regression constant: mean value of the reference group (dummy = 0); in general, the group in which all x-variables are zero. Regression coefficient (of the main effect): difference in level relative to the reference group. Regression coefficient (of the interaction effect): difference in slope relative to the reference group. Interpretation aid: use the dummy variables to define the subgroups, then write out the regression model for each group.
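The interpretation aid can be sketched in Python with NumPy. The data below are hypothetical and noise-free (constructed so that the reference group follows income = 1000 + 20 * age and supervisors follow income = 1500 + 35 * age); the variable names are assumptions, not taken from mabt60.dta.

```python
import numpy as np

# Hypothetical, noise-free data: income depends on age, with a different
# intercept and slope for supervisors (dummy = 1) than for others (dummy = 0).
age = np.array([25.0, 35.0, 45.0, 55.0, 25.0, 35.0, 45.0, 55.0])
supervisor = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
income = np.where(supervisor == 1, 1500 + 35 * age, 1000 + 20 * age)

# Design matrix: constant, age, dummy (main effect), dummy * age (interaction).
X = np.column_stack([np.ones_like(age), age, supervisor, supervisor * age])
b0, b_age, b_dummy, b_inter = np.linalg.lstsq(X, income, rcond=None)[0]

# Write out the model for each subgroup.
print(f"reference group (dummy = 0): income = {b0:.1f} + {b_age:.1f} * age")
print(f"supervisors     (dummy = 1): income = {b0 + b_dummy:.1f} + {b_age + b_inter:.1f} * age")

# Cross-check: a separate regression for supervisors gives the same line.
slope_1, intercept_1 = np.polyfit(age[supervisor == 1], income[supervisor == 1], 1)
```

By construction, the main effect of the dummy recovers the difference in level (500) and the interaction coefficient recovers the difference in slope (15) relative to the reference group.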

15 Regression with dummy variables 2. t-test: in the bivariate case with a dichotomous categorical independent variable, the t-test of the regression coefficient corresponds to a t-test for a difference in means between two groups. F-test: in the bivariate case with a polytomous categorical independent variable, the F-test of the regression corresponds to an F-test for differences in means between more than two groups (one-way analysis of variance). Separate regression models for different groups deliver the same coefficient estimates as a regression model for the total sample, provided this total model contains the grouping variable and its interaction with each independent variable. Group differences can be tested by linear restrictions (Chow test).
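The correspondence between the regression t-test and the two-group t-test can be checked numerically. The following is a minimal sketch using only the Python standard library; the income values are hypothetical, and the two-sample statistic shown is the classical version with pooled variance.

```python
import math
from statistics import mean

# Hypothetical incomes for two groups (dummy = 0 and dummy = 1).
group0 = [1800.0, 2000.0, 2200.0, 2400.0]
group1 = [2500.0, 2700.0, 3100.0]

y = group0 + group1
d = [0.0] * len(group0) + [1.0] * len(group1)
n = len(y)

# Bivariate OLS of y on the dummy d (closed-form formulas).
d_bar, y_bar = mean(d), mean(y)
sxx = sum((di - d_bar) ** 2 for di in d)
sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
slope = sxy / sxx                  # equals mean(group1) - mean(group0)
intercept = y_bar - slope * d_bar  # equals mean(group0)

# t statistic of the slope, based on the regression residuals.
rss = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y))
t_regression = slope / math.sqrt(rss / (n - 2) / sxx)

# Classical two-sample t statistic with pooled variance.
ss0 = sum((v - mean(group0)) ** 2 for v in group0)
ss1 = sum((v - mean(group1)) ** 2 for v in group1)
pooled_var = (ss0 + ss1) / (n - 2)
t_two_sample = (mean(group1) - mean(group0)) / math.sqrt(
    pooled_var * (1 / len(group0) + 1 / len(group1))
)

print(slope, mean(group1) - mean(group0))
print(t_regression, t_two_sample)
```

Both printed pairs agree up to floating-point rounding: the slope is the difference in group means, and the two t statistics coincide.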

16 Outline 1. Review: categorical variables 2. Dichotomous categorical variables a. as independent variables b. as dependent variables 3. Polytomous categorical variables

17 Regression with a dummy variable as y. Figure: voter turnout by age; y-axis: turnout (1 = yes), x-axis: age in years.

18 Limits of the linear probability model. Figure: voter turnout by age; y-axis: turnout (1 = yes), x-axis: age in years; curves: non-linear model and linear model.

19 Linear probability model. Regression model with a dummy variable as the dependent variable. Interpretation. Model predictions: probability that the dummy variable takes the value 1. Regression coefficients: change in that probability when the x-variable increases by one unit. Disadvantage: the linear model does not guarantee that the model predictions lie in the valid range of a probability, [0, 1].
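The disadvantage can be made concrete with a minimal sketch in plain Python. The six observations are hypothetical and are not taken from appendix4.dta; the function name predicted_probability is an assumption for illustration.

```python
from statistics import mean

# Hypothetical data: age in years and turnout (1 = voted, 0 = did not vote).
age = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
voted = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Bivariate OLS with the dummy as dependent variable (closed-form formulas).
x_bar, y_bar = mean(age), mean(voted)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, voted)) / sum(
    (x - x_bar) ** 2 for x in age
)
intercept = y_bar - slope * x_bar


def predicted_probability(x):
    """Linear prediction; nothing keeps it inside the interval [0, 1]."""
    return intercept + slope * x


for x in (10, 40, 70):
    print(x, round(predicted_probability(x), 3))
```

For these data the prediction at age 40 is a valid probability, while the prediction at age 70 exceeds 1 and the prediction at age 10 is negative.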

20 Outline 1. Review: categorical variables 2. Dichotomous categorical variables a. as independent variables b. as dependent variables 3. Polytomous categorical variables

21 Polytomous variable: education. Figure: income by school-leaving qualification; y-axis: average net income; categories: vs/hs, mr, fhsr, hsr.

22 Categorical vs. continuous modeling. Figure: income by duration of training, with fitted values from the categorical and the linear specification.

23 Polytomous categorical variables. Important: k values yield k dummies, but the regression uses only (k-1) dummies; the k-th (omitted) dummy defines the reference group. Interpretation of the parameters. Regression constant: mean value of the reference group (all (k-1) dummies zero); in general, the group in which all x-variables are zero. Regression coefficient (of the main effect): difference in level relative to the reference group. Regression coefficient (of the interaction effect): difference in slope relative to the reference group.
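A minimal NumPy sketch of this interpretation follows. The incomes are hypothetical, and only three education categories are used to keep the example short (so k - 1 = 2 dummies); vs_hs is chosen as the reference group by assumption.

```python
import numpy as np

# Hypothetical incomes by education category (k = 3 here; reference: "vs_hs").
education = np.array(["vs_hs", "vs_hs", "vs_hs", "mr", "mr", "hsr", "hsr"])
income = np.array([1800.0, 2000.0, 2200.0, 2300.0, 2500.0, 3000.0, 3200.0])

# k - 1 = 2 dummies; the omitted category "vs_hs" is the reference group.
d_mr = (education == "mr").astype(float)
d_hsr = (education == "hsr").astype(float)
X = np.column_stack([np.ones(len(income)), d_mr, d_hsr])

constant, b_mr, b_hsr = np.linalg.lstsq(X, income, rcond=None)[0]

print("constant:", constant)  # mean income of the reference group
print("b_mr:", b_mr)          # mean(mr) - mean(vs_hs)
print("b_hsr:", b_hsr)        # mean(hsr) - mean(vs_hs)
```

The group means in this sample are 2000, 2400 and 3100, so the constant reproduces the reference-group mean and each coefficient reproduces the difference between a group mean and the reference-group mean.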

24 Continuous versus categorical variables. Models with categorical x-variables allow the change in the y-variable to differ between the values of x. Models with continuous x-variables always assume the same change in the y-variable when x increases by one unit.
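The contrast can be sketched with NumPy by fitting both specifications to the same data. The years of training (9, 10, 12, 13) and the incomes are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical data: years of training and income, two respondents per level.
years = np.array([9.0, 9.0, 10.0, 10.0, 12.0, 12.0, 13.0, 13.0])
income = np.array([1900.0, 2100.0, 2100.0, 2300.0, 3300.0, 3500.0, 3400.0, 3600.0])

# Continuous specification: one slope, the same change for every extra year.
X_lin = np.column_stack([np.ones_like(years), years])
b_lin = np.linalg.lstsq(X_lin, income, rcond=None)[0]
rss_lin = float(np.sum((income - X_lin @ b_lin) ** 2))

# Categorical specification: constant plus one dummy per non-reference level.
levels = np.unique(years)
X_cat = np.column_stack(
    [np.ones_like(years)] + [(years == lv).astype(float) for lv in levels[1:]]
)
b_cat = np.linalg.lstsq(X_cat, income, rcond=None)[0]
rss_cat = float(np.sum((income - X_cat @ b_cat) ** 2))

# Fitted group means under the categorical model: steps differ between levels.
group_means = b_cat[0] + np.concatenate([[0.0], b_cat[1:]])
print("group means:", group_means)
print("steps between adjacent levels:", np.diff(group_means))
print("RSS linear:", rss_lin, "RSS categorical:", rss_cat)
```

The categorical model reproduces the four group means with unequal steps between them, while the linear model imposes a single slope. Because the linear model is nested in the categorical one, its residual sum of squares cannot be smaller.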

25 Continuous versus categorical variables 2. By means of suitable linear restrictions on the parameters, the model with a categorical variable can be transformed into a model with a continuous (linear) variable. Models with categorical variables are therefore the more general model class.
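One way to write out such restrictions is sketched below. The notation is an assumption introduced here, not taken from the slides: category 1 is the reference group, and x_j denotes the numeric value (for example, years of training) attached to category j.

```latex
% Categorical model with k categories (category 1 is the reference group):
E[y \mid \text{category } j] = \beta_0 + \beta_j, \qquad \beta_1 := 0, \quad j = 1, \dots, k

% Linear model, where x_j is the (assumed) numeric value attached to category j:
E[y \mid x_j] = \alpha + \gamma \, x_j

% The categorical model reduces to the linear one under the restrictions
\beta_j = \gamma \,(x_j - x_1), \quad j = 2, \dots, k
\quad\Longleftrightarrow\quad
\frac{\beta_2}{x_2 - x_1} = \frac{\beta_3}{x_3 - x_1} = \dots = \frac{\beta_k}{x_k - x_1}
```

These are k - 2 independent linear restrictions on the dummy coefficients, with beta_0 = alpha + gamma x_1. The categorical model has k parameters and the linear model has 2, which matches the count of restrictions.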

26 Finally

27 Summary. Categorical variable: variable with few values. Dummy: indicator for a value of a categorical variable. Dummy as x: the constant is the reference group; the coefficient is the difference to the reference group. Dummy as y: the prediction is the probability that y = 1; the coefficient is the change in that probability.

28 Important technical terms (German: English). Dummy-Variable: dummy variable; Referenzgruppe: reference group; Interaktionseffekt: interaction effect.

29 Further reading. Wooldridge (2003), Chapter 7 (WO), discusses the use of categorical variables in linear regression. It demonstrates the use of dummy variables as independent variables and the linear probability model with a dichotomous dependent variable.