2. Select data

Clicking the Select Data button takes you to the Select Data worksheet. An example is shown below, using the same example data set.

[Screenshot: the Select Data worksheet, showing the example data set]

  1. Reading the characteristics of the data set. Above the data set itself is a series of measures that describe it:
    1. Configurations: The number of unique configurations of attributes in the data set. In this example data set there are 14, among a total of 26 cases.
      1. Click on Sort by Configuration to show the cases grouped by configuration.
    2. Consistency: The number of configurations that have consistent outcomes, i.e. all absent or all present, but not a mix of both.
    3. Diversity: The number of configurations present in this data set, as a percentage of the total number of configurations that is possible given the number of attributes. In this example, 14 / (2 to the power of 5) = 14 / 32, or about 44%.
    4. Missing data: The percentage of all the cells in the data set that contain no value (neither 0 nor 1).
  2. Select Column Types and Choose Rows
    1. By default, the leftmost column is automatically labeled as ID. To change this, click on that cell and a drop-down menu will appear that gives the option to Ignore that column, to leave it as ID, or to change it to Attribute or Outcome.
    2. By default, the rightmost column is automatically labeled as Outcome. To change this, click on that cell and choose Ignore or Attribute. You will then need to click on another column heading in the same way and change it to Outcome.
    3. Any data set being prepared for use at this stage must have exactly one ID column and one Outcome column. The data set may contain more than one outcome of interest, but only one can be labeled as Outcome at this stage, prior to going to Design and Evaluate.
    4. By default, all the columns between ID on the left and Outcome on the right are labeled as Attribute, i.e. potential predictors of the outcome. Clicking on any of these labels lets you change it to Ignore, Outcome, or ID.
    5. The status of any of the columns can be re-assigned later on. When you do this you are in effect loading a new data set. One consequence is that the findings from the analysis of the previous data selection will no longer be accessible in the View Models view – so keep a record of those findings somewhere, if they are important.
  3. Click on Design & Evaluate, which will take you to that worksheet.
  4. Optimizing the set of attributes being used
    1. This is an optional step to take before proceeding to Design and Evaluate. It can be useful when there are a large number of attributes in the data set, relative to the number of cases, and where there is no theory-led basis for removing some.
    2. Clicking the Find Optimal Attributes button opens a pop-up menu with three options:
      1. Maximize the consistency of the configurations in the data set. A high percentage means most cases with a given configuration will have the same outcome. A low percentage means that cases with the same configuration will often have a mix of outcomes, i.e. both present and absent.
      2. Maximize the diversity of the configurations in the data set. A high percentage means most of the possible configurations of the attributes are represented in the data set. A low percentage means that only a few of the possible configurations are represented.
      3. Maximize both the consistency and diversity of configurations. Neither measure may reach 100% but the highest possible measure on both will be found.
    3. For more information on when these different optimization strategies will be useful, see Selecting attributes and outcomes.
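
The four measures described in step 1 can be reproduced outside the worksheet as a check on your own data. The Python sketch below is illustrative only and is not the tool's own code. The function name `describe`, the row layout (one dictionary per case), the use of `None` for an empty cell, and the decision to skip incomplete rows when grouping cases into configurations are all assumptions made for this example. The small data set is made up and is not the example data set shown above.

```python
from collections import defaultdict


def describe(cases, attributes, outcome):
    """Compute the four Select Data measures for a list of dict rows.

    Assumptions (not taken from the tool): cell values are 1, 0 or None
    (missing), and rows with any missing attribute or outcome value are
    skipped when cases are grouped into configurations.
    """
    columns = list(attributes) + [outcome]

    # Group the outcomes seen for each unique configuration of attributes.
    outcomes_by_config = defaultdict(set)
    for row in cases:
        if any(row[c] is None for c in columns):
            continue  # assumption: incomplete rows form no configuration
        config = tuple(row[a] for a in attributes)
        outcomes_by_config[config].add(row[outcome])

    n_configs = len(outcomes_by_config)
    # A configuration is consistent when all its cases share one outcome.
    n_consistent = sum(1 for seen in outcomes_by_config.values() if len(seen) == 1)
    # Diversity: observed configurations as a share of the 2**k possible.
    diversity_pct = 100.0 * n_configs / 2 ** len(attributes)

    # Missing data: share of attribute and outcome cells holding no value.
    cells = [row[c] for row in cases for c in columns]
    missing_pct = 100.0 * sum(v is None for v in cells) / len(cells) if cells else 0.0

    return {
        "configurations": n_configs,
        "consistent": n_consistent,
        "diversity_pct": diversity_pct,
        "missing_pct": missing_pct,
    }


# Made-up illustration: two attributes (A, B) and one outcome (Y).
cases = [
    {"id": "c1", "A": 1, "B": 0, "Y": 1},
    {"id": "c2", "A": 1, "B": 0, "Y": 0},
    {"id": "c3", "A": 0, "B": 1, "Y": 1},
    {"id": "c4", "A": 0, "B": 1, "Y": 1},
    {"id": "c5", "A": 1, "B": None, "Y": 0},
]
summary = describe(cases, ["A", "B"], "Y")
print(summary)
```

In this made-up data, cases c1 and c2 share a configuration but differ in outcome, so that configuration is inconsistent, while c3 and c4 form a consistent one. Applying the same diversity formula to the example in the text, 14 configurations over 5 attributes gives 14 / 32 = 43.75%.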
