
5.2 EvalC3 versus QCA results

Sometimes an EvalC3 analysis of a given data set will generate different findings from those generated by a crisp-set QCA analysis of the same data set. (Example to be placed here shortly.) There are, it seems to me, at least three possible reasons:

The QCA analysis may have made use of “logical remainders”, i.e. non-existent cases representing configurations that have not been observed, but where it may be reasonable to make assumptions about whether the outcome would be present or absent in those situations. There are two types of QCA solutions that use logical remainders, known as “intermediate” and “most parsimonious”. The results of these analyses may differ from those produced by machine learning methods, such as those used by EvalC3, because the latter do not make use of logical remainders – so the set of cases they work with will not be the same.
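The distinction can be sketched in a few lines of Python. The conditions and observed configurations below are purely illustrative, not taken from any real data set: with three binary conditions there are eight logically possible configurations, and any that were never observed are the logical remainders that QCA solutions may reason about but case-based methods do not.

```python
from itertools import product

# Hypothetical crisp-set data: each case is a configuration of three
# binary conditions (A, B, C). These particular values are illustrative only.
observed = {
    (1, 1, 0),
    (1, 0, 0),
    (0, 1, 1),
}

# All logically possible configurations of three binary conditions.
all_configs = set(product([0, 1], repeat=3))

# "Logical remainders": configurations never seen in the data.
# QCA's intermediate and most-parsimonious solutions may make assumptions
# about these; case-based methods like those in EvalC3 do not.
remainders = all_configs - observed

print(len(all_configs))  # 8 possible configurations
print(len(remainders))   # 5 unobserved configurations
```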

Sometimes, when a Truth Table of all the existing configurations is developed as part of a QCA analysis, it is found that cases within a particular configuration are inconsistent, i.e. some cases of this type have the outcome present and some do not. One possible solution is to define a “sufficiency threshold”: if, say, 80% of cases with the same configuration have the outcome present, then the outcome is deemed present for the whole configuration. But in an EvalC3 analysis the basic unit of analysis is cases, not configurations, so this initial problem of inconsistency does not arise and each case retains its original outcome status. If a QCA-analysed data set is being reanalysed using EvalC3, the original outcome status will have to be reassigned to each case in an “inconsistent” configuration. So the two data sets will differ: one will have more cases with the outcome present than the other.
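A small sketch may make the difference concrete. The cases below are invented for illustration: one configuration is inconsistent (four of its five cases have the outcome, one does not), and an 80% sufficiency threshold assigns the whole configuration the outcome, while case-level analysis keeps each case's original status.

```python
# Hypothetical cases: (configuration, outcome), with 1 = outcome present.
# The configuration (A=1, B=0) is "inconsistent": 4 of its 5 cases have
# the outcome present, 1 does not. Values are illustrative only.
cases = [
    ((1, 0), 1), ((1, 0), 1), ((1, 0), 1), ((1, 0), 1), ((1, 0), 0),
    ((0, 1), 0), ((0, 1), 0),
]

# QCA-style handling: aggregate by configuration and apply a
# sufficiency threshold (here 80%) to assign ONE outcome per configuration.
threshold = 0.8
by_config = {}
for config, outcome in cases:
    by_config.setdefault(config, []).append(outcome)

qca_outcomes = {
    config: 1 if sum(outs) / len(outs) >= threshold else 0
    for config, outs in by_config.items()
}

# EvalC3-style handling: each case keeps its original outcome status,
# so the one deviant case is not overwritten.
case_outcomes = [outcome for _, outcome in cases]

print(qca_outcomes[(1, 0)])  # 1: the whole configuration is deemed "present"
print(sum(case_outcomes))    # 4: only four individual cases have the outcome
```

This is why the two data sets end up with different outcome counts: the thresholded version treats all five (1, 0) cases as having the outcome, the case-level version only four.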

Sometimes there will be insufficient diversity in a data set, so an incremental minimisation process (using the Quine-McCluskey algorithm) will not proceed very far, and may end up finding a larger number of “solutions” than would be found by simple machine learning algorithms. In the small imagined data set below there is more than one difference between any pair of configurations sharing the same outcome, so it is not possible to do a “minimisation” at all. But a simple visual scan, or a machine learning algorithm, could still identify some simple prediction rules, i.e. A*b = outcome present, a*b + A*B = outcome absent.
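The point can be checked mechanically. In the sketch below (the three configurations mirror the A*b, a*b, and A*B terms above; everything else is illustrative), Quine-McCluskey-style minimisation can only merge two configurations that share the same outcome and differ on exactly one condition, and no such pair exists, yet a single rule still predicts the outcome perfectly.

```python
# Three observed configurations over two binary conditions (A, B),
# written as (A, B) with 1 = present, 0 = absent. Illustrative only.
configs = {
    (1, 0): "present",  # A*b -> outcome present
    (0, 0): "absent",   # a*b -> outcome absent
    (1, 1): "absent",   # A*B -> outcome absent
}

def differences(x, y):
    """Number of conditions on which two configurations differ."""
    return sum(a != b for a, b in zip(x, y))

# Quine-McCluskey-style minimisation can only merge two configurations
# with the SAME outcome that differ on exactly ONE condition.
mergeable = [
    (x, y)
    for x in configs for y in configs
    if x < y and configs[x] == configs[y] and differences(x, y) == 1
]
print(mergeable)  # [] -- no pair qualifies, so no minimisation is possible

# Yet a simple prediction rule is easy to find by inspection:
# "A present AND B absent" (A*b) predicts the outcome perfectly here.
rule = lambda c: c == (1, 0)
print(all((configs[c] == "present") == rule(c) for c in configs))  # True
```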