Within-case analysis

A very useful book by Goertz and Mahoney (A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences, 2012) distinguishes between within-case analysis and cross-case analysis. EvalC3 is designed primarily to facilitate cross-case analysis. But to get the maximum value from this kind of analysis, it should be well informed by within-case analysis at two different stages:

  1. At the planning stage: when selecting which attributes to include in a data set and to use when analysing that data, whether through EvalC3 or other methods such as QCA or Decision Tree algorithms. Ideally, the selection of which attributes to investigate, in terms of their possible relationship to which outcomes, would be informed by some prior notion or theory of what might be happening, rather than by random choice. The development of those views is likely to be enhanced by familiarity with the details of the cases that make up the data set.
  2. After the cross-case analysis: when good prediction rules have been found and modal (i.e. representative) cases have been identified. Once modal cases have been selected (see Selecting Cases) they can be put to use in various ways:
    1. As illustrative examples of the results correctly predicted by the model (True Positives) and/or of incorrect results (False Positives). Here within-case inspection is an important way to verify whether the attributes of the case in the data set are a correct description of the actual modal case, i.e. a measurement validity check.
    2. As sources of causal explanations. The examination of individual cases should provide much more detailed information, which could shed light on what (if any) causal mechanisms are at work that make the prediction work.
    3. As sources of complementary information which could disprove the causal explanations that are developed. These could include confounders, i.e. background factors that are a cause of both the attributes in a model and the associated outcome.

Steps to take to identify and test likely causal mechanisms

There are four types of cases that can be selected for more in-depth inquiries about any underlying causal mechanisms that may be at work.

  1. Cases which exemplify the True Positive results, where the model correctly predicted the presence of the outcome. Look within these cases to find any likely causal mechanisms connecting the conditions that make up the configuration. Two sub-types would be useful to compare:
    1. Modal cases, which represent the average characteristics of cases in this group, taking all attributes into account, not just those within the prediction model.
    2. Marginal cases, which are those most dissimilar to all other cases in this group, apart from having the same prediction model characteristics.
  2. Cases which exemplify the False Positives, where the model incorrectly predicted the presence of the outcome. There are two possible explanations that could be explored:
    1. In the False Positive cases there are one or more other factors that all the cases have in common, which are blocking the model configuration from working, i.e. from delivering the outcome.
    2. In the True Positive cases there are one or more other factors that all the cases have in common, which are enabling the model configuration to work, i.e. to deliver the outcome, but which are absent in the False Positive cases.
  3. Cases which exemplify the False Negatives, where the outcome occurred despite the absence of the attributes of the model. There are three types of interest here:
    1. There may be some False Negative cases that have all but one of the attributes found in the prediction model. These cases would be worth examining, in order to understand why the absence of a particular attribute that is part of the predictive model does not prevent the outcome from occurring. There may be some counter-balancing enabling factor at work, enabling the outcome.
    2. Where a data set has many missing data points it is possible that a number of cases have been classed as FNs because they missed specific data on crucial attributes that would have otherwise classed them as TPs. In these circumstances it would be worth investigating the incidence of missing data on each of the attributes of a good performing model, and then scanning FN cases for those which have many of the necessary attributes but where the data on the others is missing.
    3. Where models have been developed by using QCA it is possible and likely that some cases with the expected outcome are not covered by any of the “solutions” (aka models). By default these will fall into the False Negative category. They should be subject to particular attention because it is possible that the attributes that predict this outcome are outside the data set. They can only be discovered by doing a within-case investigation of these uncovered cases.
  4. Cases which exemplify the True Negatives, where the absence of the attributes of the model is associated with the absence of the outcome.
    1. There may be cases here with all but one of the model attributes. These can be found using the column sort facility in Excel. If so, the missing attribute may be viewed as an INUS attribute, i.e. an attribute that is an Insufficient but Necessary part of a configuration that is Unnecessary but Sufficient for the outcome (see Befani, 2016). It would then be worth investigating how these critical attributes have their effects by doing a detailed within-case analysis of the cases with the critical missing attribute.
    2. Caveat: INUS status cannot be claimed for an attribute if the same configuration, with all but that one essential model attribute, can also be found in the False Negatives group of cases (i.e. where the outcome is present).
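
The four case types, and the search for cases with all but one of the model attributes, can be sketched in a few lines of Python. This is only an illustration and not how EvalC3 itself works: the function names, the case data and the assumption that every attribute and the outcome are coded 0/1 are all hypothetical.

```python
from typing import Dict, List

Case = Dict[str, int]  # attribute name -> 0 or 1 (hypothetical coding)


def missing_attributes(case: Case, model: Dict[str, int]) -> List[str]:
    """Model attributes whose required value the case does not have."""
    return [a for a, v in model.items() if case.get(a) != v]


def classify(case: Case, model: Dict[str, int], outcome: str) -> str:
    """Return TP, FP, FN or TN for one case."""
    predicted = not missing_attributes(case, model)
    actual = case[outcome] == 1
    if predicted:
        return "TP" if actual else "FP"
    return "FN" if actual else "TN"


def near_misses(cases: Dict[str, Case], model: Dict[str, int],
                outcome: str, group: str) -> Dict[str, str]:
    """Cases in `group` lacking exactly one model attribute: id -> attribute."""
    found = {}
    for case_id, case in cases.items():
        missing = missing_attributes(case, model)
        if classify(case, model, outcome) == group and len(missing) == 1:
            found[case_id] = missing[0]
    return found


def inus_candidates(cases: Dict[str, Case], model: Dict[str, int],
                    outcome: str) -> List[str]:
    """Attributes missing in a True Negative near miss, excluding any that
    are also the single missing attribute in a False Negative (the caveat)."""
    in_tn = set(near_misses(cases, model, outcome, "TN").values())
    in_fn = set(near_misses(cases, model, outcome, "FN").values())
    return sorted(in_tn - in_fn)


# Hypothetical data: the model predicts the outcome when A=1 and B=1.
model = {"A": 1, "B": 1}
cases = {
    "c1": {"A": 1, "B": 1, "outcome": 1},  # TP
    "c2": {"A": 1, "B": 1, "outcome": 0},  # FP
    "c3": {"A": 1, "B": 0, "outcome": 1},  # FN, lacks only B
    "c4": {"A": 0, "B": 1, "outcome": 0},  # TN, lacks only A
    "c5": {"A": 1, "B": 0, "outcome": 0},  # TN, lacks only B
    "c6": {"A": 0, "B": 0, "outcome": 0},  # TN, lacks both
}
types = {cid: classify(c, model, "outcome") for cid, c in cases.items()}
candidates = inus_candidates(cases, model, "outcome")
```

In this made-up data set, attribute B is ruled out as an INUS candidate because case c3 shows the outcome occurring without it, which leaves A as the attribute worth a within-case investigation.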

The cases that fit each of the four types can be seen in the “View Cases” worksheet. The same worksheet will include a measure of case similarity (i.e. Hamming Distance). This measure is the average similarity of the selected case with all other cases of the same type, e.g. all True Positives. Note that a low value on Hamming Distance means a high level of similarity.
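
To make the similarity measure concrete, here is a minimal sketch of an average Hamming Distance calculation within one group of cases. It assumes binary attributes and uses a raw count of differing attributes; whether EvalC3 normalises the value is not stated here, and the function names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Case = Dict[str, int]


def hamming(a: Case, b: Case, attributes: List[str]) -> int:
    """Number of attributes on which two cases differ."""
    return sum(1 for k in attributes if a[k] != b[k])


def mean_distances(group: Dict[str, Case],
                   attributes: List[str]) -> Dict[str, float]:
    """Average Hamming distance from each case to all others in the group."""
    result = {}
    for case_id, case in group.items():
        others = [o for oid, o in group.items() if oid != case_id]
        if others:  # a single-case group has no distances to average
            total = sum(hamming(case, o, attributes) for o in others)
            result[case_id] = total / len(others)
    return result


def modal_and_marginal(group: Dict[str, Case],
                       attributes: List[str]) -> Tuple[str, str]:
    """Most similar (modal) and most dissimilar (marginal) case in the group."""
    distances = mean_distances(group, attributes)
    return min(distances, key=distances.get), max(distances, key=distances.get)


# Hypothetical True Positive group: all share the model attributes A and B,
# but differ on the other attributes C, D and E.
attributes = ["A", "B", "C", "D", "E"]
true_positives = {
    "t1": {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0},
    "t2": {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0},
    "t3": {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1},
}
distances = mean_distances(true_positives, attributes)
modal, marginal = modal_and_marginal(true_positives, attributes)
```

Here t2 has the lowest average distance and so is the modal case, while t3 has the highest and is the marginal case. All attributes are compared, not just those in the prediction model.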

Postscript: When looking within individual True Positive cases in order to find causal mechanisms at work, it may be of value to look at particular attributes in the model. Tweaking a model, by selectively removing and replacing one attribute at a time, will show which attributes make the biggest difference to the model’s overall performance. It is these attributes which should be of particular interest when looking for causal mechanisms at work within a True Positive case.
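
The tweaking described in the postscript could be sketched as follows. The sketch assumes binary attributes and uses accuracy as the performance measure, which is only one of several possible measures; the function names and data are hypothetical.

```python
from typing import Dict, List

Case = Dict[str, int]


def predicts(case: Case, model: Dict[str, int]) -> bool:
    """True when the case has every attribute value the model requires."""
    return all(case[a] == v for a, v in model.items())


def accuracy(cases: List[Case], model: Dict[str, int], outcome: str) -> float:
    """Share of cases where the prediction matches the observed outcome."""
    correct = sum(1 for c in cases if predicts(c, model) == (c[outcome] == 1))
    return correct / len(cases)


def attribute_impact(cases: List[Case], model: Dict[str, int],
                     outcome: str) -> Dict[str, float]:
    """Drop in accuracy when each attribute is removed from the model in turn."""
    baseline = accuracy(cases, model, outcome)
    impact = {}
    for attr in model:
        reduced = {a: v for a, v in model.items() if a != attr}
        impact[attr] = baseline - accuracy(cases, reduced, outcome)
    return impact


# Hypothetical data: the model predicts the outcome when A=1 and B=1.
model = {"A": 1, "B": 1}
cases = [
    {"A": 1, "B": 1, "outcome": 1},
    {"A": 1, "B": 1, "outcome": 1},
    {"A": 1, "B": 0, "outcome": 0},
    {"A": 1, "B": 0, "outcome": 0},
    {"A": 0, "B": 1, "outcome": 0},
    {"A": 0, "B": 0, "outcome": 0},
]
impact = attribute_impact(cases, model, "outcome")
most_important = max(impact, key=impact.get)
```

In this made-up example removing B costs more accuracy than removing A, so B would be the first attribute to examine when looking inside a True Positive case.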
