Log-Linear Models


This paper examines three variables to assess whether they are independent or associated in explaining magazine subscriptions.  Overall, there is a statistically significant relationship among the variables: the backward elimination results (Appendix B) retain the association between newspaper subscription and response, so the three variables are not mutually independent.  In addition, three academic papers are examined to understand log-linear models in more detail.

  • Introduction

This paper performs two tasks.  First, a statistical analysis is conducted on a data set using log-linear regression. The regression explores the relationship among count data for three variables. Second, the paper reviews three academic papers to see how the log-linear model is used in different contexts.

  • Data Analysis

The data set under consideration, demo.xls, has approximately 20 variables and 6,400 observations. However, because the question under consideration is more limited in scope (a three-factor analysis), only three variables were selected for analysis: 1) newspaper subscription; 2) income; 3) response.  Income is continuous (minimum 9; maximum 1114), while newspaper subscription and response are dichotomous variables (0/1).  Because income enters the model as a factor, each of its 1,108 distinct values is treated as a separate category (Appendix A).

To analyze the data set, a log-linear model analysis was run in SPSS with backward elimination.  Three variables were included in the model; the main pre-analysis diagnostics are listed in Appendix A.  Under backward elimination, the program starts from the saturated model and, at each step, deletes the effect whose removal changes the fit least, provided that the change is not significant at the .05 level.  The steps are listed in Appendix B: the three-way interaction and the two-way interactions involving income (Income*News and Income*Response) were not significant and were removed, while the News*Response interaction and the Income effect were significant and were retained in the final model.
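As a reading aid, the two ends of the backward elimination in Appendix B can be written in standard log-linear notation. This is a sketch in conventional textbook notation, not taken from the SPSS output: $m_{ijk}$ denotes the expected count in the cell for income level $i$, newspaper subscription $j$, and response $k$, and the superscripts $I$, $N$, $R$ are labels assumed here for the three factors.

```latex
\begin{align}
\log m_{ijk} &= \lambda + \lambda^{I}_{i} + \lambda^{N}_{j} + \lambda^{R}_{k}
   + \lambda^{IN}_{ij} + \lambda^{IR}_{ik} + \lambda^{NR}_{jk} + \lambda^{INR}_{ijk}
   && \text{(saturated model, step 0)} \\
\log m_{ijk} &= \lambda + \lambda^{I}_{i} + \lambda^{N}_{j} + \lambda^{R}_{k}
   + \lambda^{NR}_{jk}
   && \text{(final generating class: News*Response, Income)}
\end{align}
```

The second line drops every term that involves income together with another factor, which is what it means for income to be independent of the newspaper and response variables in the final model.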

  • Literature Review

Three academic articles were reviewed for this paper.  Gokcekus et al. (2010) create a log-linear model to assess whether more upsets occur in men’s or women’s college basketball games.  In the model, the dependent variable (upset) is dichotomous (0/1) and is based on the difference in ranking between the two teams playing, where ranking is exogenous to the model: if the lower ranked team wins, the game is considered an upset; if the higher ranked team wins, it is not.  The model tests a number of independent variables: 1) RPI Difference (the difference in RPI ranking between the two teams); 2) Gender (a dichotomous variable for the game); 3) Top-Three Scorers (differences between the three top players); 4) Freshmen (difference in the number of freshmen between the two teams); 5) Seniors (difference in the number of seniors between the two teams). Only two of the variables were significant: gender (.0371) and RPI difference (.0430).  Based on these results, the authors found that there are more upsets in men’s games and suggest that there may therefore be more stability associated with being a woman in sports and corporate endeavors.

Veisten et al. (2010) explore how road safety experts conceive of cost-benefit analysis (CBA).  For the study, the authors surveyed 83 road safety decision makers from different countries in Europe about the use of cost-benefit analysis.  After an initial question on the use of cost-benefit analysis, a follow-up question gauged the expert’s level of confidence in the answer, a hallmark of information reference testing (IRT).  Using homogeneity and logit analysis, the authors found that a higher level of assurance (and comfort with cost-benefit analysis) was associated with economists, while a lower level of comfort was associated with non-economists (Veisten et al., 2010).

Finally, Tanner and Young (1985) put forth a new model for analyzing the structure of disagreement.  The authors create a model with two components: the first component serves as a proxy for chance (probability); the second component represents disagreement among raters (Tanner & Young, 1985).  Unlike the previous two articles, the first dealing with economics and the second with survey data, this article applies log-linear models in a psychological context.  The article builds a model to establish whether there is agreement or disagreement among the raters, that is, whether their ratings are independent or not.



Gokcekus, O., Godet, A. & Ramsey, H. (2010). Are women more predictable than men? Applied Economics 42 (1), 641-645.

Tanner, M.A. & Young, M.A. (1985). Modeling ordinal scale disagreement. Psychological Bulletin 98 (2), 408-415.

Veisten, K., Elvik, R. & Bax, C. (2010). Assessing conceptions of cost-benefit analysis among road safety decision-makers: misunderstanding or disputes?

Appendix A

Data Information

Cases        Valid              6400
             Out of Range(a)       0
             Missing               1
             Weighted Valid     6400
Categories   Income             1108
             News                  2
             Response              2

a. Cases rejected because of out of range factor values.

K-Way and Higher-Order Effects

                                                Likelihood Ratio         Pearson
                                    K     df    Chi-Square    Sig.    Chi-Square    Sig.   Number of Iterations
K-way and Higher Order Effects(a)   1   4431     32219.747    .000    124694.405    .000   0
                                    2   3322      1346.898   1.000      1460.629   1.000   2
                                    3   1107       186.123   1.000       178.648   1.000   4
K-way Effects(b)                    1   1109     30872.849    .000    123233.776    .000   0
                                    2   2215      1160.775   1.000      1281.982   1.000   0
                                    3   1107       186.123   1.000       178.648   1.000   0

df used for these tests have NOT been adjusted for structural or sampling zeros. Tests using these df may be conservative.
a. Tests that k-way and higher order effects are zero.
b. Tests that k-way effects are zero.
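
The degrees of freedom and likelihood-ratio values in the table above follow from the number of categories per factor. The short sketch below is an illustrative check, not part of the SPSS output: it recomputes the df for each order of effect from the category counts in the Data Information table and confirms that the k-way chi-squares add up to the "k-way and higher" rows. The helper name `kway_df` is made up for this example.

```python
from itertools import combinations
from math import prod

# Number of categories per factor, as reported in the Data Information table.
levels = {"Income": 1108, "News": 2, "Response": 2}


def kway_df(levels, k):
    """df for all k-way effects: sum over k-factor subsets of prod(levels - 1)."""
    return sum(
        prod(levels[factor] - 1 for factor in subset)
        for subset in combinations(levels, k)
    )


# df for the main effects (k=1), two-way (k=2) and three-way (k=3) effects.
df_by_order = {k: kway_df(levels, k) for k in (1, 2, 3)}

# "K-way and higher" df is the sum of the df from order k upward.
df_and_higher = {k: sum(df_by_order[j] for j in range(k, 4)) for k in (1, 2, 3)}

# Total number of cells in the Income x News x Response table.
total_cells = prod(levels.values())

# Likelihood-ratio chi-squares for the k-way effects, copied from the table.
lr_kway = {1: 30872.849, 2: 1160.775, 3: 186.123}
lr_and_higher = {
    k: round(sum(lr_kway[j] for j in range(k, 4)), 3) for k in (1, 2, 3)
}

print("df by order:", df_by_order)
print("df, k-way and higher:", df_and_higher)
print("cells minus one:", total_cells - 1)
print("LR chi-square, k-way and higher:", lr_and_higher)
```

The recomputed values match the table: 1109, 2215, and 1107 df for the one-, two-, and three-way effects, and 4431 df overall, which is the number of cells (4432) minus one.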

Appendix B 

Step Summary

Step(a)                          Effects                                        Chi-Square(c)     df     Sig.   Number of Iterations
0   Generating Class(b)          Income*News*Response                                    .000      0        .
    Deleted Effect         1     Income*News*Response                                 186.123   1107    1.000   4
1   Generating Class(b)          Income*News, Income*Response, News*Response          186.123   1107    1.000
    Deleted Effect         1     Income*News                                          709.267   1107    1.000   2
2   Generating Class(b)          Income*Response, News*Response                       895.389   2214    1.000
    Deleted Effect         1     Income*Response                                      402.006   1107    1.000   2
3   Generating Class(b)          News*Response, Income                               1297.395   3321    1.000
    Deleted Effect         1     News*Response                                         49.503      1     .000   2
                           2     Income                                             26213.419   1107     .000   2
4   Generating Class(b)          News*Response, Income                               1297.395   3321    1.000

a. At each step, the effect with the largest significance level for the Likelihood Ratio Change is deleted, provided the significance level is larger than .050.
b. Statistics are displayed for the best model at each step after step 0.
c. For ‘Deleted Effect’, this is the change in the Chi-Square after the effect is deleted from the model.

Goodness-of-Fit Tests

                    Chi-Square     df    Sig.
Likelihood Ratio      1297.395   3321   1.000
Pearson               1354.494   3321   1.000
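
The arithmetic behind the step summary can be retraced with a few lines of code. The sketch below is an illustration using only the values printed in Appendix B: it accumulates the chi-square changes and df of the three deleted effects to reach the final model's likelihood-ratio statistic, and it computes the p-value for the News*Response change on 1 df. The function name `chi2_sf_df1` is made up for this example; it relies on the identity P(X > x) = erfc(sqrt(x/2)) for a chi-square variable with one degree of freedom.

```python
import math

# (deleted effect, change in LR chi-square, change in df) from the step summary.
deletions = [
    ("Income*News*Response", 186.123, 1107),
    ("Income*News", 709.267, 1107),
    ("Income*Response", 402.006, 1107),
]

# The saturated model (step 0) fits perfectly: chi-square 0 on 0 df.
g2, df = 0.0, 0
for effect, delta_g2, delta_df in deletions:
    g2 += delta_g2
    df += delta_df
    print(f"after deleting {effect}: LR chi-square = {g2:.3f}, df = {df}")


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Deleting News*Response would change the chi-square by 49.503 on 1 df.
p_news_response = chi2_sf_df1(49.503)
print(f"p-value for News*Response: {p_news_response:.3g}")
```

The running total reaches 3321 df and agrees with the reported 1297.395 to within rounding of the printed values. The p-value for News*Response is far below .05, which is why that effect stays in the final model.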