If a dataset has information on 10 different attributes of projects, there are 2^10 different combinations of these attributes, i.e. 1,024 possibilities, any of which might be the best predictor of an outcome of interest. EvalC3 provides a number of ways of searching through these possibilities to find the most accurate predictor:
- Hypothesis-led manual selection of attributes, based on a theory derived from past experience and/or research elsewhere. The advantage of this approach is that, where the hypothesis is correct, there may already be a good foundation of knowledge from prior research on why it works. In EvalC3 a prediction model can be developed manually by entering the relevant values into the model design (under Design) and then observing its performance. Normally this should be the first step in an analysis using EvalC3. However, there may be other solutions with an even better fit to the data, which lie outside our current understanding.
- Additional attribute search. This is an incremental form of exhaustive search. There are two main ways of using it:
  - Where there is already an existing model, the attributes of this model are treated as search constraints. An exhaustive search is then made of models with x + 1 attributes, where x is the number of attributes in the current model.
  - Where there is no existing model, the “additional attribute search” will look for the best-performing single-attribute model. This is useful when searching for single attributes that are necessary or sufficient for the outcome. If the search is repeated, it will treat the result of the first search as a constraint that has to be met, and the new model will have x + 1 attributes.
  - There is a risk, which I have not yet substantiated, that this form of incremental search will get stuck in a “local optimum”. There are two ways of checking whether this is the case, both of which are valid search strategies in their own right: exhaustive search and evolutionary search.
- Exhaustive search, where every possible combination of attributes is examined. Because it is exhaustive, the results are conclusive. However, an exhaustive search can be very time-consuming if there are many attributes (processing time doubles with each additional attribute in a data set). This problem can now be mitigated by specifying the maximum number of attributes in any model found by exhaustive search. I often run this search with a maximum of 3 or 4 attributes.
- Evolutionary search. When data sets are large (deep and/or wide), the exhaustive search described above can be too slow to be practical. Evolutionary searches are a very efficient means of searching for complex (i.e. multi-attribute) models within much larger combinatorial spaces. EvalC3 uses an existing Excel add-in, Solver, to carry out evolutionary searches. However, evolutionary searches are not necessarily as conclusive as exhaustive searches, because they sample a subset of the possible combinations of attributes rather than testing all of them. For this reason, the results of an evolutionary search should be tested by repeating the search a number of times.
- Decision Tree searches provide another option: a way of generating a whole set of models which together best predict all outcomes, both present and absent. As with the exhaustive search, it is possible to specify the depth of the tree, i.e. the maximum number of attributes in the models generated by this search.
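To make the size of the search space and the effect of capping model size concrete, here is a minimal Python sketch of counting and exhaustively searching attribute combinations. It assumes binary (0/1) attribute data, treats a model as a set of attributes that must all be present for the outcome to be predicted present, and scores models by simple accuracy. The function names, the scoring rule and the toy data are hypothetical illustrations, not EvalC3's implementation, which runs inside Excel.

```python
from itertools import combinations
from math import comb

# Hypothetical toy data: each case is (attribute values, outcome present?).
cases = [
    ((1, 0, 1, 0), True),
    ((1, 1, 1, 0), True),
    ((1, 0, 0, 1), False),
    ((0, 0, 1, 1), False),
    ((0, 1, 0, 0), False),
    ((1, 1, 1, 1), True),
]


def count_models(n_attributes, max_size=None):
    """Number of attribute subsets with at most max_size members.

    With no cap this is 2 ** n_attributes (including the empty model).
    """
    if max_size is None:
        return 2 ** n_attributes
    return sum(comb(n_attributes, k) for k in range(max_size + 1))


def accuracy(model, cases):
    """Share of cases correctly predicted by a model.

    A model is a tuple of attribute indices (an assumption of this sketch);
    it predicts the outcome is present when all those attributes equal 1.
    """
    correct = 0
    for attributes, outcome in cases:
        predicted = all(attributes[i] == 1 for i in model)
        correct += predicted == outcome
    return correct / len(cases)


def exhaustive_search(cases, max_size):
    """Score every model with 1..max_size attributes and return the best."""
    n = len(cases[0][0])
    best_model, best_score = (), accuracy((), cases)
    for size in range(1, max_size + 1):
        for model in combinations(range(n), size):
            score = accuracy(model, cases)
            # Strict comparison: ties keep the first model found,
            # and smaller models are tried first.
            if score > best_score:
                best_model, best_score = model, score
    return best_model, best_score


print(count_models(10))                      # 1024
print(count_models(10, max_size=3))          # 176
print(exhaustive_search(cases, max_size=2))  # ((0, 2), 1.0)
```

Each extra attribute doubles the uncapped count (2 ** 11 = 2048), whereas a cap of 3 attributes leaves only 176 subsets (including the empty model) to score for 10 attributes. This sketch only considers attributes that must be present; also allowing an attribute to be required absent would enlarge the space further.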
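The additional attribute search can be sketched in a similar hypothetical style: keep the current model's attributes fixed as constraints, try adding each remaining attribute in turn, and keep the best model with x + 1 attributes. Calling it first with no model and then again with the result reproduces the incremental behaviour described in the list. Binary data, presence-only models, accuracy as the performance measure, and the function names are all assumptions of this sketch rather than details taken from EvalC3.

```python
# Hypothetical toy data: each case is (attribute values, outcome present?).
cases = [
    ((1, 0, 1, 0), True),
    ((1, 1, 1, 0), True),
    ((1, 0, 0, 1), False),
    ((0, 0, 1, 1), False),
    ((0, 1, 0, 0), False),
    ((1, 1, 1, 1), True),
]


def accuracy(model, cases):
    """Share of cases correctly predicted; the model is a tuple of attribute
    indices that must all equal 1 for the outcome to be predicted present."""
    correct = 0
    for attributes, outcome in cases:
        predicted = all(attributes[i] == 1 for i in model)
        correct += predicted == outcome
    return correct / len(cases)


def additional_attribute_search(cases, current=()):
    """Best model containing all of `current` plus exactly one more attribute.

    Returns (None, -1.0) if there is no attribute left to add.
    """
    n = len(cases[0][0])
    best_model, best_score = None, -1.0
    for i in range(n):
        if i in current:
            continue  # attributes already in the model act as constraints
        model = tuple(sorted(current + (i,)))
        score = accuracy(model, cases)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score


model, score = additional_attribute_search(cases)         # no existing model
print(model, round(score, 3))  # (0,) 0.833
model, score = additional_attribute_search(cases, model)  # re-iterate: x + 1
print(model, score)            # (0, 2) 1.0
```

Because each step only extends the previous winner, this search cannot reach the best pair of attributes if that pair does not include the best single attribute. That is the local optimum risk, and why an exhaustive or evolutionary search is a useful cross-check.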
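EvalC3 relies on the evolutionary engine in Excel's Solver add-in, whose internals are not described here. As a swapped-in stand-in, the sketch below uses a plain genetic algorithm over 0/1 masks (tournament selection, uniform crossover, bit-flip mutation and elitism) to show how an evolutionary search samples the space of attribute combinations instead of enumerating it. The parameter names, defaults and toy data are hypothetical. Running it with several seeds mirrors the advice to repeat an evolutionary search and compare the results.

```python
import random

# Hypothetical toy data: each case is (attribute values, outcome present?).
cases = [
    ((1, 0, 1, 0), True),
    ((1, 1, 1, 0), True),
    ((1, 0, 0, 1), False),
    ((0, 0, 1, 1), False),
    ((0, 1, 0, 0), False),
    ((1, 1, 1, 1), True),
]


def accuracy(mask, cases):
    """Share of cases correctly predicted.

    mask[i] == 1 means attribute i must be present for the model to
    predict that the outcome is present (an assumption of this sketch).
    """
    correct = 0
    for attributes, outcome in cases:
        predicted = all(a == 1 for a, bit in zip(attributes, mask) if bit)
        correct += predicted == outcome
    return correct / len(cases)


def evolutionary_search(cases, generations=40, pop_size=20,
                        mutation_rate=0.1, seed=0):
    """Plain genetic algorithm over 0/1 masks; returns (best mask, score)."""
    rng = random.Random(seed)
    n = len(cases[0][0])

    def fitness(mask):
        return accuracy(mask, cases)

    def tournament(population):
        # Pick two random models and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best model so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(genes) for genes in zip(p1, p2)]
            # Bit-flip mutation adds or drops attributes at random.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


# Repeat the search with different seeds and compare the results.
for seed in range(3):
    print(seed, evolutionary_search(cases, seed=seed))
```

Elitism means the best score never falls from one generation to the next, but nothing guarantees that the best possible model is found. If repeated runs with different seeds keep converging on the same model, that is reassuring rather than conclusive.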
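A depth-limited decision tree search can be sketched as a recursive split on the attribute that leaves the fewest misclassified cases. Each leaf becomes one model (a set of attribute conditions plus a predicted outcome, present or absent), so the tree as a whole yields a set of models covering every case. The splitting rule, the function names and the toy data are assumptions for illustration; EvalC3's own decision tree procedure may differ.

```python
from collections import Counter

# Hypothetical toy data: each case is (attribute values, outcome present?).
cases = [
    ((1, 0, 1, 0), True),
    ((1, 1, 1, 0), True),
    ((1, 0, 0, 1), False),
    ((0, 0, 1, 1), False),
    ((0, 1, 0, 0), False),
    ((1, 1, 1, 1), True),
]


def majority(cases):
    """Predicted outcome for a leaf; ties default to 'present'."""
    counts = Counter(outcome for _, outcome in cases)
    return counts[True] >= counts[False]


def errors(cases):
    """Cases misclassified if the majority outcome is predicted."""
    counts = Counter(outcome for _, outcome in cases)
    return min(counts[True], counts[False])


def build_tree(cases, max_depth, n_attributes, path=()):
    """Return one (conditions, predicted_outcome) rule per leaf.

    conditions is a tuple of (attribute index, required value) pairs;
    max_depth caps the number of attributes in any rule.
    """
    used = {i for i, _ in path}
    candidates = [i for i in range(n_attributes) if i not in used]
    if max_depth == 0 or errors(cases) == 0 or not candidates:
        return [(path, majority(cases))]

    def split_errors(i):
        # Total misclassifications after splitting on attribute i.
        return sum(errors([c for c in cases if c[0][i] == v]) for v in (0, 1))

    best = min(candidates, key=split_errors)
    rules = []
    for v in (0, 1):
        branch = [c for c in cases if c[0][best] == v]
        if branch:  # skip empty branches
            rules += build_tree(branch, max_depth - 1, n_attributes,
                                path + ((best, v),))
    return rules


for conditions, present in build_tree(cases, max_depth=2, n_attributes=4):
    print(conditions, "->", "present" if present else "absent")
# ((0, 0),) -> absent
# ((0, 1), (2, 0)) -> absent
# ((0, 1), (2, 1)) -> present
```

The `max_depth` parameter plays the role of the tree depth setting: with `max_depth=2`, no rule uses more than two attributes.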