What’s an acceptable value for R-squared?

How do I tell whether a model is good enough if R-squared is all I have to go on? What’s a good value for R-squared? How large does R-squared need to be for the model to be valid? I understand that many analysts believe a model is not useful unless R-squared is at least somewhat greater than 50%. Is this the standard to follow?

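Generally, it is better to look at adjusted R-squared, which corrects for sample size and the number of parameters estimated. It is not strictly unbiased, but it is a less biased estimate of the population value than plain R-squared, which can only go up as predictors are added.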
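To make the adjustment concrete, here is a minimal sketch of the usual adjusted R-squared formula for a linear model with an intercept: 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name and the example numbers are my own, chosen only for illustration.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a linear model fitted with an intercept.

    n_predictors counts the explanatory variables only, not the intercept.
    """
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / residual_df


# Hypothetical example: R-squared of 0.50 from 30 observations and 5 predictors.
print(round(adjusted_r_squared(0.50, 30, 5), 3))  # prints 0.396
```

With few observations and several predictors, the penalty is noticeable: a raw 0.50 drops to about 0.40. With thousands of observations the two values are nearly identical.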

In marketing science, particularly in the study of customer behavior, no hard cut-off has ever become the accepted norm for the value of R-squared (or adjusted R-squared, for that matter). In marketing we are making predictions about human behavior, not the workings of a physical system, so getting any level of insight where there was none before is a big win.

Thus I would advise that in this case R2 should be “large”, meaning large enough in the eye of the experimenter, and what counts as large varies a great deal with the type of experiment being conducted. For example, in social science experiments I would be delighted to get a statistically significant R2 value as low as 0.20. But in natural science and engineering experiments, I wouldn’t be happy with R2 values lower than 0.90.
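One common way to judge whether an R2 is "statistically significant" in ordinary linear regression is the overall F-test, whose statistic can be computed from R2 itself: F = (R2 / p) / ((1 - R2) / (n - p - 1)). The sketch below assumes a linear model with an intercept, and the function name and sample figures are hypothetical.

```python
def overall_f_statistic(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """F statistic for the test that all slope coefficients are zero.

    Assumes an ordinary least squares model with an intercept.
    Degrees of freedom are (n_predictors, n_obs - n_predictors - 1).
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    if df_model <= 0 or df_resid <= 0:
        raise ValueError("degrees of freedom must be positive")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# Hypothetical example: R2 of 0.20 from 100 observations and 3 predictors.
print(round(overall_f_statistic(0.20, 100, 3), 2))  # prints 8.0
```

An F of 8 on 3 and 96 degrees of freedom is comfortably above the 5% critical value of roughly 2.7, so an R2 of only 0.20 can still be clearly significant with a sample of this size. A p-value can be read from the F distribution with those degrees of freedom, for example with scipy.stats.f.sf.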

As a general rule in experimental design, the more the researcher knows about the science, the better controlled the experiment can be, and so the expectation for R2 rises. Human behavior is poorly understood, so R2 values that physicists and engineers would dismiss outright are acceptable. I won’t even get into the topic of over-fitting here.

A good rule of thumb is to consider the context of the experiment. If our projects involve customer response, then statistically significant R2 values around 0.50 can be good enough to give us a basis for improvement. But if we are looking at cycle time through a process, we would want a higher threshold, say 0.70. Still, as you can tell, these numbers are arbitrary. As with every statistical approach, you need to have a fairly good idea of the underlying phenomena that you are trying to model.

The most important thing to remember is that we want our data to point us in the right direction for making improvements. What R2 will do for us varies depending on the context of the experiment. I wouldn’t put too much stock in a single metric like R2 if there are other metrics available that we can use to validate the model.
