Analysis sequence

There are two main stages:

  1. Manual testing of pre-existing hypotheses. This is done by entering the attributes hypothesised to be important into the Design menu and observing the model’s performance in the Confusion Matrix.
  2. Algorithmic search for better models, as described in detail below.

In both stages the overall aim is to find a model with attributes that maximise the number of True Positives (TPs) and True Negatives (TNs) and minimise the number of False Positives (FPs) and False Negatives (FNs).
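To make the four counts concrete, here is a minimal sketch of how a Confusion Matrix can be tallied for one model. It is not EvalC3's own code. It assumes binary (0/1) attributes and treats a model as a set of attributes that must all be present for the outcome to be predicted. The case data and the function names `confusion_matrix` and `accuracy` are hypothetical, for illustration only.

```python
from typing import Dict, List, Sequence

# Hypothetical toy data: binary attributes A, B, C and a binary outcome.
CASES: List[Dict[str, int]] = [
    {"A": 1, "B": 1, "C": 0, "outcome": 1},
    {"A": 1, "B": 0, "C": 1, "outcome": 1},
    {"A": 0, "B": 1, "C": 1, "outcome": 0},
    {"A": 1, "B": 1, "C": 1, "outcome": 1},
    {"A": 0, "B": 0, "C": 1, "outcome": 0},
    {"A": 1, "B": 0, "C": 0, "outcome": 0},
]


def confusion_matrix(cases: List[Dict[str, int]], model: Sequence[str]) -> Dict[str, int]:
    """Count TP/FP/FN/TN, assuming the model predicts the outcome when all its attributes are present."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for case in cases:
        predicted = all(case[attr] == 1 for attr in model)
        actual = case["outcome"] == 1
        if predicted and actual:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Share of all cases the model classifies correctly: (TP + TN) / total."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


for model in (["A"], ["A", "B"]):
    counts = confusion_matrix(CASES, model)
    print(model, counts, round(accuracy(counts), 2))
```

With this toy data, attribute A alone has no FNs but one FP. The configuration A and B removes that FP at the cost of one FN. Both score 5/6 on Accuracy, which is why the individual counts matter and not just the overall score.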

With algorithmic search the overall approach is to start by searching for the simplest possible well-performing models and then progress towards more complex well-performing models.

The best strategy may depend on the objective of the analysis. When trying to understand what has happened as part of a research or evaluation exercise, we may need to find a number of models which, as a set, best account for all the outcomes. When trying to work out what best to do next, a much less comprehensive analysis may be all that is needed. We just need to find one or more models which seem to work well and in which we can have some confidence.

The best EvalC3 tool for identifying a comprehensive set of models is the Decision Tree algorithm.

The rest of the advice below is oriented towards finding a smaller number of models that best account for the outcome of interest:

  1. Start by searching for single attributes that are
    • Sufficient and Necessary – these are by definition unambiguous and essential
      • Set these search constraints: FP=0, FN=0.
    • Necessary but not Sufficient – these are by definition important, but other relevant attributes need to be identified which, together with them, enable the outcome to be present.
      • Set this search constraint: FN=0.
    • Sufficient but not Necessary – unambiguous and useful
      • Set this search constraint: FP=0.
  2. Then search for configurations of multiple attributes that are:
    • Sufficient but not Necessary. This can be done using exhaustive search, evolutionary search or single cumulative search.
    • In the process you may not find any. However, you may find configurations that are Not Sufficient and Not Necessary, but where the number of TPs is relatively high compared to the numbers of FPs and FNs. In other words, the model still scores relatively well on Accuracy.
  3. When enough well-performing models have been developed to account for enough of the outcomes, consider doing a sensitivity analysis of each model.
  4. Proceed to do within-case investigations after identifying relevant cases from the View Cases worksheet, using the guidance provided on Selecting Cases.
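The search constraints in steps 1 and 2 can be illustrated outside EvalC3. Below is a minimal sketch, not EvalC3's implementation, that makes the same assumptions as a simple reading of the steps: attributes are binary, and a model is a set of attributes that must all be present. The `classify` function labels a model by whether it meets FP=0 (sufficient) and FN=0 (necessary). The `exhaustive_search` function enumerates configurations from smallest to largest and keeps those with FP=0 and at least one TP. The data and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: binary attributes A, B, C and a binary outcome.
CASES: List[Dict[str, int]] = [
    {"A": 1, "B": 1, "C": 0, "outcome": 1},
    {"A": 1, "B": 0, "C": 1, "outcome": 1},
    {"A": 0, "B": 1, "C": 1, "outcome": 0},
    {"A": 1, "B": 1, "C": 1, "outcome": 1},
    {"A": 0, "B": 0, "C": 1, "outcome": 0},
    {"A": 1, "B": 0, "C": 0, "outcome": 0},
]
ATTRIBUTES = ["A", "B", "C"]


def confusion_matrix(cases: List[Dict[str, int]], model: Sequence[str]) -> Dict[str, int]:
    """Count TP/FP/FN/TN, assuming the model predicts the outcome when all its attributes are present."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for case in cases:
        predicted = all(case[attr] == 1 for attr in model)
        actual = case["outcome"] == 1
        if predicted and actual:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def classify(counts: Dict[str, int]) -> str:
    """Label a model using the search constraints FP=0 (sufficient) and FN=0 (necessary)."""
    if counts["FP"] == 0 and counts["FN"] == 0:
        return "Sufficient and Necessary"
    if counts["FN"] == 0:
        return "Necessary but not Sufficient"
    if counts["FP"] == 0:
        return "Sufficient but not Necessary"
    return "Not Sufficient and Not Necessary"


def exhaustive_search(
    cases: List[Dict[str, int]], attributes: Sequence[str], max_size: int
) -> List[Tuple[Tuple[str, ...], Dict[str, int]]]:
    """Return configurations with FP=0 and TP>0, simplest first, most TPs first within each size."""
    results = []
    for size in range(1, max_size + 1):
        found = []
        for model in combinations(attributes, size):
            counts = confusion_matrix(cases, model)
            if counts["FP"] == 0 and counts["TP"] > 0:
                found.append((model, counts))
        found.sort(key=lambda item: item[1]["TP"], reverse=True)
        results.extend(found)
    return results


for attr in ATTRIBUTES:
    print(attr, classify(confusion_matrix(CASES, [attr])))
for model, counts in exhaustive_search(CASES, ATTRIBUTES, max_size=3):
    print(model, counts)
```

On this toy data, A is labelled Necessary but not Sufficient. The search returns (A, B) and (A, C) before (A, B, C). Adding attributes to a sufficient configuration can only shrink the set of predicted cases, so it stays sufficient while covering fewer cases. A fuller search might therefore skip such supersets, which fits the "simplest first" approach.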

The end goal?

Ideally, it will be possible to develop one or more predictive models that have minimal numbers of False Positives. Any remaining False Positives can then be reduced through within-case analysis.

While there may be some False Negatives for any given model, these cases will be covered by other models.

Ideally there will be minimal overlap of cases covered by these different models.
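One way to check this end goal is to see which outcome-present cases each model correctly covers (its TPs), whether any are left uncovered, and where models overlap. The sketch below is hypothetical and is not an EvalC3 feature. It assumes the same binary-attribute setup as a simple conjunction model, and the data and function names are made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy data: binary attributes A, B, C and a binary outcome.
CASES: List[Dict[str, int]] = [
    {"A": 1, "B": 1, "C": 0, "outcome": 1},
    {"A": 1, "B": 0, "C": 1, "outcome": 1},
    {"A": 0, "B": 1, "C": 1, "outcome": 0},
    {"A": 1, "B": 1, "C": 1, "outcome": 1},
    {"A": 0, "B": 0, "C": 1, "outcome": 0},
    {"A": 1, "B": 0, "C": 0, "outcome": 0},
]


def covered_cases(cases: List[Dict[str, int]], model: Tuple[str, ...]) -> Set[int]:
    """Indices of outcome-present cases the model predicts (its True Positives)."""
    return {
        i
        for i, case in enumerate(cases)
        if case["outcome"] == 1 and all(case[attr] == 1 for attr in model)
    }


def coverage_report(cases: List[Dict[str, int]], models: Sequence[Tuple[str, ...]]) -> Dict[str, object]:
    """Report outcome-present cases no model covers, and pairwise overlaps between models."""
    positives = {i for i, case in enumerate(cases) if case["outcome"] == 1}
    covered = {model: covered_cases(cases, model) for model in models}
    union = set().union(*covered.values())
    overlaps = {(m1, m2): covered[m1] & covered[m2] for m1, m2 in combinations(models, 2)}
    return {"uncovered": positives - union, "overlaps": overlaps}


report = coverage_report(CASES, [("A", "B"), ("A", "C")])
print(report)
```

In this toy example each model's single False Negative is covered by the other model, so no outcome-present case is left uncovered. The only overlap is the case at index 3.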
