Dichotomising data

If you Google “dichotomising data” you will find lots of warnings that this is basically a bad idea! Why so? Because if you do, you will lose information. All those fine details of differences between observations will be lost.
But what if you are dealing with something like responses to an attitude survey? Typically these have five-point scales ranging from disagree to neutral to agree, or the like. Quite a few of the fine differences in ratings on this scale may well be nothing more than “noise”, i.e. variations unconnected with the phenomenon you are trying to measure. They are more likely to reflect differences in respondents' “response styles”, or something more random still.


Aggregation or “binning” of observations into two classes (higher and lower) can be done in different ways. You could simply find the median value and split the observations at that point. Or you could look for a “natural” gap in the frequency distribution and make the split there. Or you may have a prior theoretical reason for splitting the range of observations at some other specific point.
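To make the three options concrete, here is a minimal Python sketch. The function names, the sample scores and the "theoretical" cut-off of 14 are all made up for illustration, and the sketch assumes that a value equal to the cut-off goes into the lower class, which is a choice you may want to make differently.

```python
from statistics import median


def split_at(values, cutoff):
    """Return 1 for values above the cutoff and 0 otherwise.

    Assumption: values equal to the cutoff fall in the lower class.
    """
    return [1 if v > cutoff else 0 for v in values]


def median_cutoff(values):
    """Option 1: split at the median value."""
    return median(values)


def largest_gap_cutoff(values):
    """Option 2: split at the midpoint of the widest gap between
    adjacent distinct values (one simple reading of a "natural" gap)."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values to find a gap")
    gaps = [(b - a, (a + b) / 2) for a, b in zip(distinct, distinct[1:])]
    widest = max(gaps, key=lambda gap: gap[0])
    return widest[1]


# Hypothetical scores, for illustration only.
scores = [10, 12, 13, 15, 16, 30, 32]

by_median = split_at(scores, median_cutoff(scores))
by_gap = split_at(scores, largest_gap_cutoff(scores))
by_theory = split_at(scores, 14)  # Option 3: a cut-off chosen on prior grounds
```

With these scores the median split and the gap split put the boundary in different places (at 15 and at 23), which is a reminder that the choice of method can change which observations end up in the “higher” class.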

In this blog posting on the Rick on the Road website, I have explained how to do this in another, more useful way: by treating cut-off points as predictors, some of which perform better than others. On the same page you will also have access to an Excel Online spreadsheet where you can put in your own data and generate dichotomised/binned data that you can then use within EvalC3.
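The details are in the linked posting and spreadsheet, not here, so the following is only a rough sketch of the general idea and not a copy of that method. It assumes you have a binary outcome for each case and that a cut-off “performs better” when the resulting high/low split agrees with that outcome more often. Both the accuracy measure and the sample data are assumptions made for illustration.

```python
def score_cutoffs(values, outcomes):
    """Score every candidate cut-off by how well it predicts the outcome.

    Candidates are the midpoints between adjacent distinct values.
    The score is simple accuracy: the share of cases where
    (value > cutoff) matches the 0/1 outcome. This measure is an
    assumption; other performance measures could be used instead.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    results = []
    for cutoff in candidates:
        predicted = [1 if v > cutoff else 0 for v in values]
        correct = sum(p == o for p, o in zip(predicted, outcomes))
        results.append((cutoff, correct / len(values)))
    return results


def best_cutoff(values, outcomes):
    """Return the (cutoff, accuracy) pair with the highest accuracy."""
    return max(score_cutoffs(values, outcomes), key=lambda r: r[1])


# Hypothetical five-point ratings and a hypothetical 0/1 outcome per case.
ratings = [1, 2, 2, 3, 4, 4, 5, 5]
outcomes = [0, 0, 1, 0, 1, 1, 1, 1]

cutoff, accuracy = best_cutoff(ratings, outcomes)
```

Here the candidate cut-offs are 1.5, 2.5, 3.5 and 4.5, and the split at 3.5 agrees with the outcome in seven of the eight cases, more than any other candidate.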