Confusing Stats Terms Explained: Internal Consistency
Internal consistency refers to the general agreement between multiple items (often Likert scale items) that make up a composite score on a survey measure of a given construct. This agreement is generally measured by the correlations between the items.
For example, a survey measure of depression may include many questions that each measure a different aspect of depression, such as:
- Loss of interest in activities (X1)
- Negative Mood (X2)
- Weight Loss/Weight Gain (X3)
- Sleep Problems (X4)
- Lethargy (X5)
Assuming the items are worded appropriately and asked of an appropriate sample, we would expect each of these items to correlate with each of the other items, since they are all indicators of depression (see the correlation matrix sketch below).
To the extent that this is true, internal consistency will be high, giving us confidence that our measure of depression is reliable (more on Cronbach's Alpha below).
However, if an item is poorly worded or does not belong in the scale at all, internal consistency can suffer. For example, if we replaced the Lethargy question in our measure of depression with the new question below, internal consistency is likely to drop:
- Loss of interest in activities (X1)
- Negative Mood (X2)
- Weight Loss/Weight Gain (X3)
- Sleep Problems (X4)
- Number of letters in your last name (Y1)
Internal consistency is likely to be threatened because "Number of letters in your last name" is unlikely to correlate highly with any of the other four items: it is not really an indicator of depression (note the low correlations for Y1 in the sketch below). Thus, replacing the "Lethargy" question with the "Number of letters in your last name" question will lower the internal consistency of our Depression scale and, ultimately, the reliability of our measurement (more on Cronbach's Alpha below).
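To make this concrete, here is a minimal sketch in Python (using numpy and pandas) of the kind of correlation matrix described above. The data are simulated and the variable names are made up for illustration: four items driven by a shared underlying construct, plus one unrelated item standing in for "Number of letters in your last name."

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200  # hypothetical number of respondents

# Simulate four depression items that share a common latent construct,
# plus one unrelated item ("number of letters in your last name").
latent = rng.normal(size=n)
items = pd.DataFrame({
    "X1_interest": latent + rng.normal(scale=0.6, size=n),
    "X2_mood":     latent + rng.normal(scale=0.6, size=n),
    "X3_weight":   latent + rng.normal(scale=0.6, size=n),
    "X4_sleep":    latent + rng.normal(scale=0.6, size=n),
    "Y1_letters":  rng.normal(size=n),  # shares nothing with the construct
})

# The four X items correlate strongly with one another, while Y1's
# correlations with everything else hover near zero.
print(items.corr().round(2))
```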
Internal consistency is typically measured using Cronbach's Alpha (α). Cronbach's Alpha generally ranges from 0 to 1 (though negative values can occur; see below), with higher values indicating greater internal consistency (and ultimately reliability). Common guidelines for evaluating Cronbach's Alpha are:
- .00 to .69 = Poor
- .70 to .79 = Fair
- .80 to .89 = Good
- .90 to .99 = Excellent/Strong
…if you get a value of 1.0, then you have "complete agreement" (i.e., redundancy) among your items, so you likely need to eliminate some. Items that are in perfect agreement with each other do not each uniquely contribute to the measurement of the construct they are intended to measure, so they should not all be included in the scale. Occasionally, you may also see a negative Cronbach's Alpha value, but this is usually indicative of a coding error, too few people in your sample (relative to the number of items in your scale), or REALLY poor internal consistency.
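For the curious, here is a minimal sketch of how Cronbach's Alpha is actually computed. The formula itself is standard: alpha = k/(k-1) * (1 - (sum of the item variances) / (variance of the total score)), where k is the number of items. The data and item names are the simulated ones from the sketch above.

```python
def cronbach_alpha(df):
    """Cronbach's Alpha: k/(k-1) * (1 - sum(item variances) / var(total score))."""
    k = df.shape[1]
    item_vars = df.var(axis=0, ddof=1)      # variance of each item
    total_var = df.sum(axis=1).var(ddof=1)  # variance of the composite score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

good_scale = items[["X1_interest", "X2_mood", "X3_weight", "X4_sleep"]]
bad_scale = items  # the same four items plus the unrelated Y1_letters item

print(f"alpha, four related items:  {cronbach_alpha(good_scale):.2f}")  # high
print(f"alpha, with unrelated item: {cronbach_alpha(bad_scale):.2f}")   # noticeably lower
```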
If Cronbach's Alpha (i.e., internal consistency) is poor for your scale, there are a couple of ways to improve it:
- Eliminate items that are poorly correlated with the other items in your scale (e.g., the "Number of letters in your last name" item in the previous example; the "alpha if item deleted" sketch below shows one way to find these)
- Add highly reliable items to your scale (i.e., items that correlate with the existing items in your scale but are not redundant with them)
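A common way to carry out the first suggestion is an "alpha if item deleted" check: recompute alpha with each item left out in turn, and flag items whose removal raises alpha. A minimal sketch, reusing the simulated items and the cronbach_alpha() helper from the sketches above:

```python
# Items whose removal *raises* alpha are candidates for elimination.
for col in items.columns:
    print(f"alpha without {col}: {cronbach_alpha(items.drop(columns=col)):.2f}")
# Dropping Y1_letters should raise alpha; dropping any X item should lower it.
```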
As always, I hope this is helpful and please let me know if you have questions in the comments! What stats terms do you find confusing?