Reliability & Validity Testing

Analyses and report by:

Jordan Vossen, MS (Psychology PhD student); Ekaterina Burduli, PhD; Celestina Barbosa-Leiker, PhD; Washington State University, Spokane, WA. Fall 2017 - Spring 2018



Data were drawn from a large project in which clinicians were assessed with the MICA tool over a two-year period. Because the original dataset included multiple sessions for each clinician, we anticipated that sessions from the same clinician would be highly correlated, inflating correlations and reliability estimates. We therefore created two datasets: the first consisted of the original data with all sessions, including repeated sessions with the same clinician (N = 2274); the second was a sample dataset containing only one session per clinician (N = 1178). Overall, results show only minor differences in the correlations between the two datasets. The results below are for the analyses on the full original dataset.

Results Summary:

Correlations among the 7 global scores (5 MI Intentions items and 2 Strategic Response items) revealed that these items are highly related. All correlations are statistically significant, positive, and range from 0.66 to 0.86. This suggests, for example, that as a clinician becomes more proficient in responding to Change Talk, they also demonstrate greater proficiency in intentions to express empathy. Overall, these findings suggest that a high score on any one of the 7 global items is related to high scores on each of the other six.

The correlations between the 7 global scores and the number of questions the clinician asks are small, ranging from -0.11 to -0.004; only 4 of the 7 are statistically significant, and none are substantial. Overall, this suggests that the number of questions a clinician asks is likely unrelated to the 7 global measures. The correlations between the 7 global scores and the number of reflections a clinician makes range from 0.30 to 0.57 and are all statistically significant. This suggests that higher numbers of reflections are related to greater proficiency on the 5 Intention items and the 2 Strategic Responding items. For example, increases in the number of reflections are related to increases in proficiency in responding to Sustain Talk or in intentions to support autonomy and activation.
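As an illustrative sketch (the study dataset is not reproduced here; all variable names and values below are made up), Pearson correlations of the kind reported above can be computed from session-level scores as follows:

```python
import numpy as np

# Hypothetical scores for illustration only: rows are sessions,
# columns are one global item, a second related item, and a behavior count.
rng = np.random.default_rng(0)
empathy = rng.uniform(1, 10, size=50)
change_talk = empathy + rng.normal(0, 1.5, size=50)    # related global item
reflections = 2 * empathy + rng.normal(0, 5, size=50)  # reflection count

# Pearson correlations between pairs of measures
r_items = np.corrcoef(empathy, change_talk)[0, 1]
r_reflect = np.corrcoef(empathy, reflections)[0, 1]
print(f"item-item r = {r_items:.2f}, item-reflections r = {r_reflect:.2f}")
```

With real data, the same call applied to each pair of the 7 global items would produce the 0.66 to 0.86 range described above.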

Reliability analyses revealed that the items have good internal consistency. Cronbach's alpha for the 7 items together, for the 5 Intention items, and for the 2 Strategic Response items ranged from .85 to .96, suggesting good internal consistency for the scales.
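A minimal sketch of how Cronbach's alpha can be computed from an item-score matrix (toy data only; not the study dataset):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha. items: 2-D array, rows = sessions, columns = items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy example: three highly related items yield a high alpha
rng = np.random.default_rng(1)
base = rng.uniform(1, 10, size=100)
items = np.column_stack([base + rng.normal(0, 1, 100) for _ in range(3)])
print(f"alpha = {cronbach_alpha(items):.2f}")
```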

Convergent Validity


Six experienced MICA coders independently coded six sessions that had been coded and publicly posted by MITI experts. Two of these sessions were considered low in MI congruence, two were considered medium, and two were considered high. Coders were instructed to use the MICA manual and code each session objectively. Both MITI and MICA scores were collected and compiled for analysis.

Results Summary

We first recoded the MITI total scores from Low, Medium, and High into numeric values: sessions categorized as low corresponded to 3 (the middle value of the MICA low range), medium to 6 (the middle value of the MICA medium range), and high to 9 (the middle value of the MICA high range). For the MICA total scores, we calculated the average total MICA score across the six coders for each of the six sessions.
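The recoding and averaging steps above can be sketched as follows (the coder scores shown are hypothetical, not the study's data):

```python
# Recode of MITI categories to MICA-range midpoints, as described above
MITI_TO_NUMERIC = {"Low": 3, "Medium": 6, "High": 9}

# Illustrative MICA totals from six coders for one session (made-up values)
coder_scores = [7.5, 8.0, 8.5, 8.0, 7.0, 8.0]
mica_avg = sum(coder_scores) / len(coder_scores)

miti_score = MITI_TO_NUMERIC["High"]
print(miti_score, round(mica_avg, 2))
```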

To assess the convergent validity of the four subscales of the MICA and the MITI that were comparable (Sustain Talk, Change Talk, Partnering, and Empathy), we used the original scores that were given for the MITI and the average scores across the six coders for the MICA.

The correlation between the MICA and MITI total scores revealed that these two measures are highly related and, thus, have good convergent validity, r = 0.99, n = 6, p < .001. These findings suggest that higher scores on the MICA are associated with higher scores on the MITI.

Correlations between the four sub-measures of the MICA and MITI can be found in Table 1. Three of the four sub-measures were highly related and showed good convergent validity (correlations ranged from .95 to .98 and were statistically significant at p < .001). The correlation between the MICA Sustain Talk sub-measure and the MITI Sustain Talk sub-measure was not statistically significant (r = .47, p > .05). Similarly, the correlations between the MICA Sustain Talk sub-measure and the remaining MICA and MITI sub-measures were not statistically significant (r ranging from .46 to .52, p > .05; see Table 1 below). This may have occurred because, in one session (Overuse of Directing), the MICA coders' average Sustain Talk score was 1.3 while the equivalent MITI score was 4. The remaining sessions align closely, but with such a small sample size, a single discrepant session can substantially attenuate the correlation.
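The attenuating effect of one discrepant session on a six-point correlation can be illustrated with made-up numbers (these are not the study's actual Sustain Talk scores):

```python
import numpy as np

# Five sessions where hypothetical MICA and MITI scores line up closely
mica = [2.0, 3.0, 5.0, 6.5, 8.0]
miti = [2.2, 2.8, 5.1, 6.4, 8.2]
r_aligned = np.corrcoef(mica, miti)[0, 1]

# Add one discrepant session (e.g., MICA coders near 1.3, MITI at 4)
r_with_outlier = np.corrcoef(mica + [1.3], miti + [4.0])[0, 1]
print(f"aligned r = {r_aligned:.2f}, with outlier r = {r_with_outlier:.2f}")
```

With only six data points, a single such session pulls the coefficient down noticeably, consistent with the explanation above.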

Interrater Reliability


To assess interrater reliability of the MICA measure, we calculated the Intraclass Correlation Coefficient (ICC) across six coders who independently coded six sessions. Streiner and Norman (1995) consider an ICC above .75 to indicate acceptable interrater reliability, with ideal IRR reaching at least .90.
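As a sketch, an ANOVA-based two-way random-effects ICC can be computed as below. This is one common formulation, ICC(2,1); the report does not state which ICC variant was used, and the ratings shown are toy values:

```python
import numpy as np

def icc2_1(x: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC(2,1).
    x: 2-D array, rows = sessions (targets), columns = raters."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)  # between-session MS
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)  # between-rater MS
    sse = ((x - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                       # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Toy check: six raters in near-perfect agreement over six sessions
rng = np.random.default_rng(2)
true_scores = rng.uniform(1, 10, size=6)
ratings = true_scores[:, None] + rng.normal(0, 0.1, size=(6, 6))
print(f"ICC(2,1) = {icc2_1(ratings):.3f}")
```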

Results Summary

For each of the sub-measures of the MICA (Sustain Talk, Change Talk, Partnering, Empathy, Evoking, Support, Guide, and RQ Ratio) and for the total MICA score, the ICCs ranged from .985 to .994, with 95% confidence intervals ranging from .953 to .999, indicating excellent interrater reliability (see Table 2).