The basic steps

Although EvalC3 exists as a set of tools, there is a desirable sequence of use. The steps take place within three main stages:

Preparation: Download the EvalC3 Excel file

  1. Go here. You will need to provide your email address so we can keep in contact with you and learn about your experience of using EvalC3. After your request has been received, you will be given access to a Dropbox folder containing the current copy of EvalC3.
  2. Make sure that the Solver add-in is loaded in your version of Excel. Follow this link to see how to do this. It's fairly straightforward.
  3. Make sure macros are enabled

1. Preparing the data set

  1. Open the EvalC3 Excel file
    1. This will open at the Get Started worksheet. Read this first.
    2. Click on Start – Input Data, which will take you to the Input Data worksheet. Clue: Don’t use the Worksheet tabs at the bottom to progress forward with the analysis. Use the blue sequence of tabs at the top. But you can use the bottom tabs to go back, without altering the current analysis process
  2. Import data into Excel
    1. Cut and paste the relevant rows and columns of your data into the Input Data worksheet, which should be the first worksheet to open (a sketch of the expected data layout appears after this list)
      1. Make sure you have included the column header names
      2. Clue: If you want to edit your data by adding extra columns or rows, either do it now, or even before you cut and paste the data here
    2. Click on the Select Data button, which will take you to the Select Data worksheet.
  3. Select Column Types and Choose Rows 
    1. In the Select Data worksheet select a column of data which provides an easily recognizable ID for each case, and choose ID from the drop-down menu above that column of data. Clue: This drop down menu is not visible until you click on the words Outcome, ID or Attribute in the column you are looking at
      1. By default, the left-most column is automatically labelled as ID
    2. For the column of data which describes an outcome that you want to predict select Outcome from the drop-down menu above this column
      1. By default, the right-most column is automatically labelled as Outcome
    3. For each of the columns of data which you would want to treat as attributes that might be predictors of the outcome, select Attribute from the drop down menu above each of these columns.
      1. By default, all other columns are automatically labelled as Attribute
    4. For all other columns of data you don’t want to include in the analysis, choose Ignore from the drop down menu above each of these columns. These columns may be other identifiers, other attributes and other outcomes, which you may want to use in later analyses. You can come back later and change how the columns have been assigned.
    5. If you want to include some rather than all cases in the analysis, do this by using the drop down menu at the end of any ID, Attribute or Outcome name.
    6. Click on Design & Evaluate, which will take you to that worksheet.
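
For readers who find it easier to see the idea in code, here is a minimal sketch of the kind of case-by-attribute data EvalC3 expects: one ID column on the left, a set of binary (present/absent) attribute columns, and one binary outcome column on the right. The case IDs and attribute names below are purely hypothetical.

```python
# A minimal sketch of the case-by-attribute layout EvalC3 works with.
# Case IDs and column names are hypothetical examples only.
cases = [
    {"ID": "Case01", "Training": 1, "Funding": 1, "LocalPartner": 0, "Outcome": 1},
    {"ID": "Case02", "Training": 1, "Funding": 0, "LocalPartner": 1, "Outcome": 1},
    {"ID": "Case03", "Training": 0, "Funding": 1, "LocalPartner": 0, "Outcome": 0},
    {"ID": "Case04", "Training": 0, "Funding": 0, "LocalPartner": 1, "Outcome": 0},
]

# The left-most column plays the role of ID, the right-most the Outcome, and
# everything in between is an Attribute -- mirroring the defaults described above.
header = list(cases[0].keys())
print("\t".join(header))
for row in cases:
    print("\t".join(str(row[col]) for col in header))
```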

2. Design and evaluate predictive models

  1. The default approach to building a predictive model is manual.
    1. Once the Design & Evaluate view is open, look at “The Current Model” on the left side. Here you choose what values to place next to each of the attributes that are automatically listed. The drop-down menu in the Status column provides three options: N/A = ignore this attribute; 1 = this attribute is present; 0 = this attribute is absent.
      1. The default status for each attribute when this view is first opened is N/A.
    2. You also need to choose whether the Outcome is expected to be present or absent when these attributes are as described above, using the same kind of drop down menu in the Status column
    3. This combination of attribute values and the selected outcome then constitutes a predictive model
    4. The performance of this model can then be seen immediately in the Confusion Matrix under the heading “Model Evaluation”, which is explained more below (a worked code sketch also appears after this list)
    5. Click on Save above to save details of this model and its performance. You will need to save the model with a name you will recognize later.
  2. Explore alternative approaches to building a better predictive model
    1. See Search Options and Analysis Sequence on this website for more detailed information about these steps
    2. Click on Find New Models. There are three choices here:
      1. Using an exhaustive search, choose either
        1. Individual attributes, or
        2. Configuration of attributes
      2. Evolutionary search of individual attributes and attribute combinations
  3. If using exhaustive or evolutionary search of combinations of attributes
    1. Choose the performance indicator: the measure that should be maximised. Although there are 18 different measures, the most useful are highlighted with a red asterisk. For more information on these see Performance Measures. Clue: Start by using the most widely used measure: Accuracy
    2. Set constraints. These can be of two types
      1. The attributes in the model design whose values need to remain fixed, for example as present or absent
      2. Specific performance measures other than the one selected as the objective.
        1. In the Confusion Matrix
          1. Try setting False Positive = 0, to find Sufficient but Unnecessary attributes (or configurations of attributes)
          2. Try setting False Negative = 0, to find Necessary but Insufficient attributes (or configurations of attributes)
        2. In the list of performance measures
          1. Try setting True Positive Rate to >= 50%
    3. Click Okay to implement the search
      1. If using exhaustive search, watch the progress bar in order to assess whether the results will be ready within the time available. If not, cancel.
  4. View the results of the search, given the settings above. 
    1. Under the Current Model, view the attributes that have been selected (known as “the model”) as the best predictors of the outcome. If an exhaustive or evolutionary search has been used, then the attributes listed in the design menu will have been automatically updated to reflect the most recent best model found by either of these methods
    2. View the raw results of the prediction model as shown in the Confusion Matrix under Model Evaluation. See Performance Measures for more information on how to read the Confusion Matrix.
    3.  View the performance measures derived from the Confusion Matrix, as listed further below. These are used to summarise the performance of the current model in predicting the outcome of interest.
  5. Revise the results: Within the Design & Evaluate view you can tweak the values of the attributes in the new model in order to:
    1. Incrementally improve performance of the model
    2. Identify what attributes in the model contribute most/least to its overall performance. For more on this option see Sensitivity Analysis
  6. Save the results of each version of the model that you find to be of value. This will be done automatically, with a unique name, if exhaustive or evolutionary searches have been carried out. But if there has been any manual tweaking, the resulting model will then need to be saved manually
  7. Repeat the analysis if the predictive model covers only some but not all of the cases with the expected outcome, i.e. there are still a significant number of False Negatives. Look here for guidance on how to do this.
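
To make the mechanics above more concrete, here is a rough Python sketch (not EvalC3's own code) of how a model of this kind can be scored. A model assigns present (1), absent (0) or N/A to each attribute; its predictions are compared with the actual outcomes to build a Confusion Matrix; Accuracy is derived from that matrix; and an exhaustive search tries every configuration of attributes, here maximising Accuracy subject to a False Positives = 0 constraint. All case data and attribute names are hypothetical, and for simplicity the outcome is assumed to be expected as present.

```python
from itertools import product

# Hypothetical cases: binary attributes plus a binary outcome.
cases = [
    {"Training": 1, "Funding": 1, "LocalPartner": 0, "Outcome": 1},
    {"Training": 1, "Funding": 0, "LocalPartner": 1, "Outcome": 1},
    {"Training": 0, "Funding": 1, "LocalPartner": 0, "Outcome": 0},
    {"Training": 0, "Funding": 0, "LocalPartner": 1, "Outcome": 0},
    {"Training": 1, "Funding": 1, "LocalPartner": 1, "Outcome": 0},
]
ATTRIBUTES = ["Training", "Funding", "LocalPartner"]

def confusion_matrix(model, cases):
    """A model maps attribute -> 0/1; attributes not in the dict are N/A (ignored).
    The model predicts the outcome is present when every specified attribute matches."""
    tp = fp = fn = tn = 0
    for case in cases:
        predicted = all(case[a] == v for a, v in model.items())
        actual = case["Outcome"] == 1
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    # Proportion of all cases that the model classifies correctly.
    return (tp + tn) / (tp + fp + fn + tn)

# Exhaustive search: try every assignment of N/A / absent / present to each attribute
# and keep the model with the highest Accuracy, subject to False Positives = 0
# (i.e. a sufficient-but-unnecessary package of attributes).
best_model, best_score = None, -1.0
for assignment in product([None, 0, 1], repeat=len(ATTRIBUTES)):
    model = {a: v for a, v in zip(ATTRIBUTES, assignment) if v is not None}
    if not model:
        continue  # skip the empty model (everything N/A)
    tp, fp, fn, tn = confusion_matrix(model, cases)
    if fp == 0 and accuracy(tp, fp, fn, tn) > best_score:
        best_model, best_score = model, accuracy(tp, fp, fn, tn)

print("Best model:", best_model, "Accuracy:", round(best_score, 2))
```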

3. Plan case selection and investigation

  1. Click on View Cases and identify different types of cases, including:
    1. All the cases that belong to each of the four categories in the Confusion Matrix. These will be labelled and color coded. Use the Sort facility in Excel to sort the cases into the four categories.
    2. Exemplar cases for each of these four types. These are cases that have the highest degree of similarity of attributes with all other cases in the same group. There may be more than one. These will have the lowest Hamming Distance values (see the sketch after this list)
    3. Outlier cases for each of the four types. These are cases that have the lowest level of similarity of attributes with all other cases in the same group. These will have the highest Hamming Distance values
  2. Select cases for subsequent within-case investigations, to identify causal mechanisms that may be at work underlying the associations represented in the predictive model. These types of cases may be useful:
    1. Exemplar case within the True Positive group. This is where a causal mechanism needs to be found that might apply to all other cases in this group
    2. Outlier cases within the True Positive group. Ideally the same mechanism will also be found here
    3. Exemplar cases within the False Positive Cases. These are cases with the same (predictive model) attributes but different (absent) outcomes. Here the same causal mechanism might be expected, but along with other features that prevent it from working and delivering the outcome
    4. Exemplar cases within the False Negative Cases. These are cases with different (predictive model) attributes but the same (present) outcomes. Here the same causal mechanism should not be expected to be found.
    5. See within-case analysis for more information on the kinds of case selections and analyses that can be made at this stage.
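
As a rough illustration of how exemplars and outliers can be identified (not necessarily how EvalC3 computes them internally), the sketch below scores each case in a group by its average Hamming distance to the other cases in the same group: the lowest-scoring case is treated as the exemplar and the highest-scoring as the outlier. The case data are hypothetical.

```python
# Hypothetical True Positive group: each case is a tuple of binary attribute values.
group = {
    "Case01": (1, 1, 0, 1),
    "Case02": (1, 1, 1, 1),
    "Case07": (1, 0, 0, 1),
    "Case09": (0, 1, 1, 0),
}

def hamming(a, b):
    """Number of attribute positions at which two cases differ."""
    return sum(x != y for x, y in zip(a, b))

# Average Hamming distance from each case to the other cases in the same group.
avg_distance = {
    name: sum(hamming(vals, other) for o, other in group.items() if o != name) / (len(group) - 1)
    for name, vals in group.items()
}

exemplar = min(avg_distance, key=avg_distance.get)  # most similar to the rest of the group
outlier = max(avg_distance, key=avg_distance.get)   # least similar to the rest of the group
print("Exemplar:", exemplar, "Outlier:", outlier, avg_distance)
```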
