Using a Data Analysis Matrix


Sometimes you have far more data, especially attributes of cases, than you can sensibly analyse in one exercise. Developing a matrix of the kind shown below can help. It makes the planning of your analyses transparent, and thus more accountable, by showing what you will analyse and what you will not.


Here is an example I developed and used in 2015 when helping a UK consulting firm plan a data mining exercise using a data set that had 60+ cases and more than 70 potentially useful attributes. In this matrix…

  • Each blue column represents a grouping of a specific kind of case attribute. At the analysis stage, any one of these could be used as an outcome in an EvalC3 data set.
  • Each blue row represents a grouping of a specific kind of case attribute. At the analysis stage, any one of these could be used as an attribute that might be predictive of the outcome of interest in an EvalC3 data set.
  • Cells represent possible relationships between specific types of attributes (rows) and specific types of outcomes (columns)…
    • Colored (grey and yellow) cells represent those relationships that were of interest and which would be analysed
      • Initials in these cells represent the stakeholders with specific interest in this relationship
  • The cell values in the summary column on the right represent the level of confidence in the type of case attribute in each row
  • The cell values in the summary row at the bottom represent the level of interest in the potential outcome represented by each column
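
The layout described in the list above can be sketched as a small data structure. This is only a minimal illustration in Python, not the matrix actually used in 2015: the class name `AnalysisMatrix`, the group labels and the stakeholder initials are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnalysisMatrix:
    """Planning matrix: rows are predictor groups, columns are outcome groups."""

    rows: list[str]
    columns: list[str]
    # (row, column) -> initials of the stakeholders interested in that relationship.
    # A cell that is absent from this dict is one that will NOT be analysed.
    cells: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def mark(self, row: str, column: str, *initials: str) -> None:
        """Flag a row/column relationship for analysis and record who wants it."""
        if row not in self.rows or column not in self.columns:
            raise KeyError(f"unknown cell: ({row!r}, {column!r})")
        self.cells.setdefault((row, column), set()).update(initials)

    def planned(self) -> list[tuple[str, str]]:
        """Return the relationships that will be analysed, in sorted order."""
        return sorted(self.cells)


# Hypothetical group labels and initials, for illustration only.
matrix = AnalysisMatrix(
    rows=["context", "activities", "outputs"],
    columns=["outcome A", "outcome B"],
)
matrix.mark("context", "outcome A", "AB")
matrix.mark("activities", "outcome A", "AB", "CD")
matrix.mark("outputs", "outcome B", "CD")

for row, column in matrix.planned():
    print(row, "->", column, sorted(matrix.cells[(row, column)]))
```

Keeping the unmarked cells implicit makes the "what you will not analyse" side of the plan easy to list as well: it is every row and column pair that is missing from `cells`.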

The analysis that was carried out focused on the 23 colored cells. They represent 27% of all the possible types of analysis (7 × 12 = 84) that could have been undertaken.
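
The coverage figure can be checked with a few lines of Python. This is just a sketch of the arithmetic: the function name `coverage` is hypothetical, and since the text does not say which of 7 and 12 is the number of rows, the two are simply multiplied.

```python
def coverage(selected_cells: int, n_rows: int, n_columns: int) -> float:
    """Share of all row/column combinations that were selected for analysis."""
    total = n_rows * n_columns
    if not 0 <= selected_cells <= total:
        raise ValueError("selected_cells must be between 0 and n_rows * n_columns")
    return selected_cells / total


# 23 colored cells out of 7 x 12 = 84 possible combinations.
share = coverage(23, 7, 12)
print(f"{share:.0%}")  # prints 27%
```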