2. Select data

Clicking the Select Data button takes you to the Select Data worksheet. An example is shown below, using the same example data set.


  1. Select Column Types and Choose Rows 
    1. In the Select Data worksheet, select a column of data that provides an easily recognizable ID for each case, and choose ID from the drop-down menu above that column. Tip: this drop-down menu is not visible until you click on the word Outcome, ID or Attribute in the column you are looking at.
      1. By default, the leftmost column is automatically labelled as ID.
    2. For the column of data that describes the outcome you want to predict, select Outcome from the drop-down menu above that column.
      1. By default, the rightmost column is automatically labelled as Outcome.
      2. But any column can be nominated as the Outcome.
    3. For each column of data that you want to treat as an attribute that might predict the outcome, select Attribute from the drop-down menu above that column.
      1. By default, all other columns are automatically labelled as Attribute.
    4. For any other columns of data that you don’t want to include in the analysis, choose Ignore from the drop-down menu above each of these columns. These columns may be other identifiers, attributes or outcomes that you may want to use in later analyses. You can come back later and change how the columns have been assigned.
    5. If you want to analyze only some of the cases rather than all of them, use the drop-down menu at the end of any ID, Attribute or Outcome name.
  2. Click on Design & Evaluate, which will take you to that worksheet.
  3. Optimizing the set of attributes being used
    1. This is an optional step to take before proceeding to Design and Evaluate. It can be useful when there are a large number of attributes in the data set, relative to the number of cases, and where there is no theory-led basis for removing some.
    2. Clicking the Find Optimal Attributes button opens a pop-up menu with three options:
      1. Maximize the consistency of the configurations in the data set. A high percentage means that most cases with a given configuration have the same outcome. A low percentage means that most cases with a given configuration have a mix of outcomes, i.e. both present and absent.
      2. Maximize the diversity of the configurations in the data set. A high percentage means that most of the possible configurations are represented in the data set; a low percentage means that only a few of the possible configurations are represented.
      3. Maximize both the consistency and diversity of configurations. Neither measure may reach 100%, but the highest achievable value on both will be found.
    3. For more information on when these different optimization strategies will be useful, see Selecting attributes and outcomes.
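
To make the consistency and diversity measures described above more concrete, here is a minimal Python sketch of one plausible way to compute them. It is an illustration only, not the tool's own implementation: the function names `consistency` and `diversity`, the sample data, and the exact formulas are assumptions. In particular, it assumes binary (0/1) attributes, treats consistency as the share of cases whose outcome matches the most common outcome for their configuration, and treats diversity as the share of possible configurations that actually appear in the data.

```python
from collections import Counter, defaultdict


def consistency(rows, attributes, outcome):
    """Share of cases whose outcome matches the most common outcome
    among cases with the same configuration (assumed definition)."""
    groups = defaultdict(Counter)
    for row in rows:
        config = tuple(row[a] for a in attributes)
        groups[config][row[outcome]] += 1
    # For each configuration, count the cases that agree with its majority outcome.
    agreeing = sum(max(counts.values()) for counts in groups.values())
    return agreeing / len(rows)


def diversity(rows, attributes):
    """Share of the 2**n possible binary configurations that appear
    at least once in the data (assumed definition)."""
    observed = {tuple(row[a] for a in attributes) for row in rows}
    return len(observed) / 2 ** len(attributes)


# Hypothetical data: two binary attributes and a binary outcome.
cases = [
    {"A": 1, "B": 0, "Outcome": 1},
    {"A": 1, "B": 0, "Outcome": 1},
    {"A": 0, "B": 1, "Outcome": 0},
    {"A": 0, "B": 1, "Outcome": 1},
]

print(consistency(cases, ["A", "B"], "Outcome"))
print(diversity(cases, ["A", "B"]))
```

In this sample, the configuration A=0, B=1 has one case with each outcome, which lowers consistency, and only two of the four possible configurations occur, which limits diversity. Dropping an attribute reduces the number of possible configurations, which is why the choice of attributes affects both measures.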
