Data sets that can be analysed by EvalC3 can come from different sources: a one-off research or evaluation exercise, or an ongoing monitoring system. As explained below, they can also be generated by participatory means.
Two types of data can be generated by participatory means: (a) outcome data, which a good predictive model should be able to identify, (b) attribute data, which may be predictive of outcomes (identified by participatory or other sources of data).
Stakeholders in a project, such as those implementing the intervention, and those experiencing its effects, are likely to have views about what works, and what does not work, which may be much wider ranging and sometimes closer to the truth, than the contents of official monitoring systems. It can be worth tapping into those views.
Attribute and outcome data can be generated using either of two related methods: (a) pile or card sorting, (b) online survey instruments.
Pile sorting is a form of ethnographic inquiry that enables us to identify participants’ view of the world, primarily the categories they use to describe their world. There are many different ways of doing pile sorting (described here) but the simplest approach is called “free sorting”. Participants are presented with a list of events, activities, people or objects that they are familiar with and are asked to sort them into piles. They are then asked to label each pile with a description of what the items in that pile have in common that makes them different from the items in the other piles. The task may be explained in ways that make the focus of the inquiry as broad or narrow as needed. For example: “Please sort these projects into two piles capturing a difference between them that you think might affect how successful they are in achieving their objectives”.
Pile sorting is typically done with one or more respondents in a face-to-face meeting. The researcher (or evaluator) notes down which cards are put in which pile and then asks the respondent an open-ended question designed to elicit the respondent’s view of what the difference is and, sometimes, why they think it is important. This qualitative information is also recorded.
Online survey instruments, such as those available via Survey Monkey, are another means of eliciting these kinds of sorting results and judgements from participants. Online surveys are more suitable when dealing with larger numbers of respondents and/or respondents based in many different locations.
A draft version of such a survey can be seen here. In this survey, cases of interest, such as projects, are listed in the rows of a matrix. Participants are then asked to sort them into two piles, using two columns of check-boxes that represent the two piles. A Comment field below the matrix is then used to capture the participants’ description of how the two piles of cases differ. The differences the survey is looking for are “differences that (might) make a difference” to the outcome of interest. The same pile sorting question can then be repeated, in order to capture more than one set of differences from the same respondent.
Aggregating pile sorting data on attributes
Regardless of which method is used (face-to-face or online survey), the results from all the participants are then aggregated into one large matrix of cases x attributes, with the attributes (and the case values on these) being those contributed by the different participants via their pile-sort responses.
The matrix of data that remains can then be imported into EvalC3 for analysis.
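The aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response records, the field names (`respondent`, `label`, `pile_a`, `pile_b`) and the project names are all hypothetical, and the coding rule (1 if a case was placed in the labelled pile, 0 if it was placed in the other pile, `None` if that respondent did not sort the case) is one possible convention, not a format required by EvalC3.

```python
# Hypothetical pile-sort responses: one record per sorting exercise.
# "label" is the respondent's description of what distinguishes pile A.
responses = [
    {"respondent": "R1", "label": "Led by local staff",
     "pile_a": ["Project 1", "Project 3"],
     "pile_b": ["Project 2", "Project 4"]},
    {"respondent": "R2", "label": "Had government co-funding",
     "pile_a": ["Project 2", "Project 3"],
     "pile_b": ["Project 1", "Project 4"]},
]


def build_attribute_matrix(responses):
    """Return {case: {attribute_column: 1, 0 or None}} for all responses."""
    cases = sorted({c for r in responses for c in r["pile_a"] + r["pile_b"]})
    matrix = {case: {} for case in cases}
    for r in responses:
        # One attribute column per sorting exercise, tagged with its source.
        column = f'{r["respondent"]}: {r["label"]}'
        for case in cases:
            if case in r["pile_a"]:
                matrix[case][column] = 1      # attribute present
            elif case in r["pile_b"]:
                matrix[case][column] = 0      # attribute absent
            else:
                matrix[case][column] = None   # case not sorted by this respondent
    return matrix


matrix = build_attribute_matrix(responses)
for case, row in matrix.items():
    print(case, row)
```

Each row of the result is a case and each column is one respondent's attribute, which is the cases x attributes layout described above. Keeping the respondent identifier in the column name makes it easier to trace an attribute back to the Comment text that defined it.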
Aggregating pile sorting of data on outcomes
As with the sorting of cases by attributes, cases can also be sorted into two piles according to the respondent’s view on whether an outcome of interest is present or not. The outcome of interest can be defined in broad or narrow terms, by varying how the sorting request is expressed. Once the sorting is completed, the pile labels can then be elicited to capture the criteria the respondent was using to judge whether the outcome was present or not.
When responses from multiple survey participants are aggregated, it is then possible to generate an aggregate (multi-criteria) achievement scale, based on the number of respondents who placed a given case in the “successful” pile.
Each aggregated score can be accompanied by text statements describing the criteria used by each respondent who placed that case in the “successful” pile. This set of statements will vary row by row in the example above, because a different subset of respondents will be involved in each row.
This process is simpler to use than the more traditional ranking methods, where each respondent constructs a full ranking of all cases, especially where there are a large number of cases.
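As a minimal sketch of how such an achievement scale could be computed, the Python below counts, for each case, how many respondents placed it in their “successful” pile and collects the criteria those respondents gave. The respondents, criteria, case names and field names are invented for illustration.

```python
# Hypothetical outcome sorts: each respondent lists the cases they placed
# in the "successful" pile and the criterion they used to judge success.
outcome_sorts = [
    {"respondent": "R1", "criterion": "Objectives met on time",
     "successful": ["Project 1", "Project 3"]},
    {"respondent": "R2", "criterion": "Benefits still visible a year later",
     "successful": ["Project 3"]},
    {"respondent": "R3", "criterion": "Community took over management",
     "successful": ["Project 2", "Project 3"]},
]
all_cases = ["Project 1", "Project 2", "Project 3", "Project 4"]


def achievement_scale(outcome_sorts, all_cases):
    """Return (scores, criteria): count of 'successful' placements per case,
    and the list of criteria statements behind each count."""
    scores = {case: 0 for case in all_cases}
    criteria = {case: [] for case in all_cases}
    for sort in outcome_sorts:
        for case in sort["successful"]:
            scores[case] += 1
            criteria[case].append(sort["criterion"])
    return scores, criteria


scores, criteria = achievement_scale(outcome_sorts, all_cases)
for case in all_cases:
    print(case, scores[case], criteria[case])
```

Cases that no respondent judged successful keep a score of zero and an empty list of criteria, so every case appears on the scale.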
The data that is generated can be analysed in two ways. One is by using any of the four algorithms in EvalC3 to develop the best possible predictive model, and then to follow-up with within-case inquiries to look for supporting evidence of any causal mechanisms at work.
The other is to engage the same participants in proposing and testing their own hypotheses about what combinations of attributes best predict the outcome of interest, by using the manual design facility in EvalC3. The performance of models generated by participants and algorithms can then be compared, along with any supporting evidence from within-case analyses.
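To make the idea of comparing model performance concrete, here is a small Python sketch, written outside EvalC3 and not using any of its functions. It assumes a model is simply a set of attributes that must all be present for the outcome to be predicted, and it tallies the four cells of a confusion matrix for that model. The attribute names, case data and the two candidate models are hypothetical.

```python
# Hypothetical data set: 1 = present, 0 = absent.
cases = [
    {"local_staff": 1, "co_funding": 1, "outcome": 1},
    {"local_staff": 1, "co_funding": 0, "outcome": 0},
    {"local_staff": 0, "co_funding": 1, "outcome": 0},
    {"local_staff": 1, "co_funding": 1, "outcome": 1},
    {"local_staff": 0, "co_funding": 0, "outcome": 1},
    {"local_staff": 1, "co_funding": 0, "outcome": 1},
]


def evaluate_model(cases, required_attributes, outcome_key="outcome"):
    """Predict the outcome when all required attributes are present,
    then count true/false positives and negatives."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for case in cases:
        predicted = all(case[a] == 1 for a in required_attributes)
        actual = case[outcome_key] == 1
        if predicted and actual:
            counts["TP"] += 1
        elif predicted and not actual:
            counts["FP"] += 1
        elif not predicted and actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    counts["accuracy"] = (counts["TP"] + counts["TN"]) / len(cases)
    return counts


# Two candidate models, e.g. one proposed by participants, one by an algorithm.
model_a = evaluate_model(cases, ["local_staff", "co_funding"])
model_b = evaluate_model(cases, ["local_staff"])
print(model_a)
print(model_b)
```

In this invented example the two models have the same overall accuracy but make different kinds of errors (one has no false positives, the other fewer false negatives), which is the sort of difference that within-case analysis could then explore.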
It is likely that some post-survey data cleansing will be needed. Some attributes will correlate highly with each other, in which case it would make sense to remove one or the other, especially if the Comments suggest they are the same attribute or if one attribute is clearly easier to understand than the other.
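One way to spot candidate duplicates is to compute the correlation between every pair of attribute columns and flag the pairs above a chosen threshold. The sketch below does this with a plain Pearson correlation on 0/1 values. The attribute names, the values and the 0.7 threshold are assumptions made for illustration; the decision to drop an attribute should still rest on reading the Comments.

```python
from itertools import combinations
from math import sqrt

# Hypothetical attribute columns: one 0/1 value per case, same case order.
attributes = {
    "Led by local staff":        [1, 0, 1, 0, 1, 0, 1, 0],
    "Locally managed":           [1, 0, 1, 0, 1, 0, 1, 1],
    "Had government co-funding": [0, 1, 1, 0, 0, 1, 1, 0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def redundant_pairs(attributes, threshold=0.7):
    """List attribute pairs whose absolute correlation meets the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(attributes.items(), 2):
        r = pearson(col_a, col_b)
        if r is not None and abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


flagged = redundant_pairs(attributes)
print(flagged)
```

Using the absolute value means that strongly negative correlations are flagged too, since an attribute and its mirror image (for example “urban” and “rural”) carry the same information.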
If you would like some help in developing and using a participatory predictive modeling data set, email firstname.lastname@example.org
There are a number of online pile sorting websites, where people can take part in a range of types of pile sorting exercises, with the data then aggregated and exported for analysis. Although intended for use in improving the structure of web sites, they can be used for the same purposes as the online survey above. Most offer free non-premium services. See for example OptimalSort. See here for a list of these services.
The same data set (cases x attributes & outcomes) can also be analysed using social network analysis visualization software. The network structure of the matrix can be visualized in three forms:
(a) A two-mode network, showing how cases are variously connected via their shared attributes.
(b) A one-mode network, showing how cases are variously connected to each other, where the strength of these linkages is defined by the number of attributes they both share.
(c) A one-mode network, showing how attributes are variously connected to each other, where the strength of these linkages is defined by the number of cases they jointly apply to.
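The three network forms listed above can all be derived from the same cases x attributes data. The following Python sketch shows one way to build the edge lists, assuming each case is represented by the set of attributes it has. The project and attribute names are hypothetical, and the resulting edge lists would still need to be exported in whatever format the chosen network visualization software accepts.

```python
from itertools import combinations

# Hypothetical data: the attributes present for each case.
case_attributes = {
    "Project 1": {"Local staff", "Co-funding"},
    "Project 2": {"Co-funding", "Urban"},
    "Project 3": {"Local staff", "Co-funding", "Urban"},
    "Project 4": {"Urban"},
}


def two_mode_edges(case_attributes):
    """(a) Case-to-attribute links of the two-mode network."""
    return sorted((c, a) for c, attrs in case_attributes.items() for a in attrs)


def case_projection(case_attributes):
    """(b) Case-to-case links weighted by the number of shared attributes."""
    edges = {}
    for c1, c2 in combinations(sorted(case_attributes), 2):
        shared = len(case_attributes[c1] & case_attributes[c2])
        if shared:
            edges[(c1, c2)] = shared
    return edges


def attribute_projection(case_attributes):
    """(c) Attribute-to-attribute links weighted by the number of cases
    to which both attributes apply."""
    edges = {}
    for attrs in case_attributes.values():
        for a1, a2 in combinations(sorted(attrs), 2):
            edges[(a1, a2)] = edges.get((a1, a2), 0) + 1
    return edges


print(two_mode_edges(case_attributes))
print(case_projection(case_attributes))
print(attribute_projection(case_attributes))
```

Pairs with no shared attributes (or no shared cases) are left out of the projections, so they simply appear as unconnected nodes when the network is drawn.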