Do Improved Social Signals Cause Improved Rankings?
http://bit.ly/R4ygCs
Posted by willcritchlow
Everyone in search is by now aware that certain social signals are well-correlated with rankings.
In each major study published on the subject, the authors point out that correlation does not imply causation (see, for example, SEOmoz and Searchmetrics). Dr. Pete even wrote a whole post on the subject.
I wanted to see whether it was actually plausible for these correlations to arise without social signals being a direct ranking factor. So I built some Excel models to test whether I could reproduce the observed correlations without assuming that social signals are a ranking factor.
The punchline: it's possible there is no causation
I have a suspicion that this could be the most misinterpreted post I have ever written, so I thought I'd start with prominent "Cliff notes" to be explicit about what I am saying and, more importantly, what I am not saying.
I am saying
You can tweet any of the following without misrepresenting me:
- Social signals *may* be correlated with better rankings but not cause them
- Facebook Likes and rankings could achieve high correlation without Likes being a ranking factor
I am not saying
If you tweet any of the following attributed to me, I will write "does not follow instructions" on your forehead in magic marker:
- Likes don't matter [where did I say that?]
- Likes aren't a ranking factor [I don't show any evidence either way]
- Links are dead [what?]
- Correlation studies are a bad idea [I agree with Rand that we could actually use more studies]
What is this based on?
I have built a simplified Excel model of how pages accrue Likes over time. Without assuming that Likes are a ranking factor, I show that we could nevertheless see a strong correlation between Likes and ranking position.
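To make the idea concrete, here is a minimal Python sketch of that kind of model. It is not the author's Excel model; the click-through share for each position, the number of searches and the chance that a visitor Likes a page are all hypothetical values chosen for illustration. Rankings are fixed up front and Likes never feed back into them, yet higher positions still collect more Likes because more people see them.

```python
import random

# Hypothetical share of searches that click each of positions 1-10.
# Illustrative assumptions only, not measured click-through rates.
CTR_BY_POSITION = [0.30, 0.15, 0.10, 0.08, 0.06, 0.05, 0.04, 0.03, 0.03, 0.02]
LIKE_PROBABILITY = 0.02    # hypothetical chance that a visitor Likes the page
SEARCHES_PER_SERP = 5_000  # hypothetical number of searches for each query


def simulate_serp(rng: random.Random) -> list[int]:
    """Return the Likes earned by each result, indexed by position - 1.

    The ranking is fixed before any Likes exist and is never updated,
    so Likes cannot act as a ranking factor in this model.
    """
    outcomes = list(range(len(CTR_BY_POSITION))) + [None]  # None = no click
    weights = CTR_BY_POSITION + [1 - sum(CTR_BY_POSITION)]
    likes = [0] * len(CTR_BY_POSITION)
    for clicked in rng.choices(outcomes, weights=weights, k=SEARCHES_PER_SERP):
        # A visitor who lands on a page Likes it with a fixed probability.
        if clicked is not None and rng.random() < LIKE_PROBABILITY:
            likes[clicked] += 1
    return likes


def mean_likes_by_position(n_serps: int = 100, seed: int = 42) -> list[float]:
    """Average Likes per position across many simulated results pages."""
    rng = random.Random(seed)
    totals = [0] * len(CTR_BY_POSITION)
    for _ in range(n_serps):
        for position, count in enumerate(simulate_serp(rng)):
            totals[position] += count
    return [total / n_serps for total in totals]


if __name__ == "__main__":
    for position, mean in enumerate(mean_likes_by_position(), start=1):
        print(f"position {position:2d}: {mean:5.1f} Likes on average")
```

In this toy model, the average Like count falls steadily from position 1 down to position 10, which is exactly the pattern a correlation study would pick up, even though Likes play no part in deciding the order.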
Why focus on Likes?
The modelling works equally well with any of the social signals. I simply chose Likes to make the example more concrete; you could build exactly the same correlation model with Tweets, Facebook Shares, Google +1s, or any other signal where accruing more social shares makes it even more likely that you will accrue more in the future.
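The "rich get richer" property described above can be sketched as a simple urn-style process. This Python snippet is a hypothetical illustration, not the author's spreadsheet: every page starts out identical, and each new share goes to a page with probability proportional to its existing shares plus one (the "+1" is an assumption that lets a page with no shares get its first one).

```python
import random


def accrue_shares(n_pages: int, n_shares: int, seed: int = 0) -> list[int]:
    """Hand out shares one at a time with a rich-get-richer rule.

    Each new share goes to page i with probability proportional to
    (shares already on page i + 1). The "+1" is an assumed smoothing term.
    All pages are otherwise identical, so any gap between them comes
    purely from early luck compounding over time.
    """
    rng = random.Random(seed)
    shares = [0] * n_pages
    for _ in range(n_shares):
        weights = [s + 1 for s in shares]
        winner = rng.choices(range(n_pages), weights=weights)[0]
        shares[winner] += 1
    return shares


if __name__ == "__main__":
    for seed in range(3):
        result = accrue_shares(n_pages=10, n_shares=1_000, seed=seed)
        print(sorted(result, reverse=True))
```

Running it with different seeds typically gives very uneven totals, with a different page out in front each time: nothing about the pages differs, but an early lead makes further shares more likely.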
Starting at the beginning
Every time we see a correlation study, I see evidence that some people haven't fully taken on board the subtleties of correlation versus causation. This is unsurprising: the mathematics behind the calculations in these studies is typically undergraduate level (with some of the more advanced analysis verging on graduate level), and most people's intuition lets them down horribly when confronted with probability and statistics. (Don't believe me? Check out the Monty Hall Problem.)
So let's start from the beginning:
What are these studies looking for?
When we say correlation in this context, you can think of it as similarity: we are looking for evidence that two things tend to happen together (and tend to be absent together).
In the context of these studies, we are typically looking to see if "ranking well" happens together with "strong social signals."
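One common way to turn "happens together" into a number for ranked data is Spearman's rank correlation. The Python sketch below computes it for a single hypothetical results page; the Like counts are invented for illustration, and the short formula used here is only valid when there are no tied values. Because position 1 is the best ranking, a negative coefficient means more Likes go with better positions.

```python
def ranks(values: list[float]) -> list[int]:
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


positions = [1, 2, 3, 4, 5]
likes = [120, 80, 95, 30, 10]  # hypothetical Like counts for each position
print(round(spearman(positions, likes), 3))  # -0.9
```

A value of -0.9 says that Likes and good rankings go together very closely on this page; on its own, it says nothing about which (if either) causes the other.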
Now, the mathematical part comes in when we try to define "happens together with" properly. The human brain is a remarkably powerful pattern-matching device. For example, how many sportsmen and women have a pre-game routine involving a specific pair of lucky socks because of a sequence of events something like this:
- Wore a new pair of socks today. Kicked ass.
- Wore the same pair of socks as last week. Kicked ass.
- New socks in the wash. Grabbed a different pair. Got whupped.
- Socks successfully cleaned and dried. Kicked ass again.
Pretty compelling evidence for those socks, huh?
From that point onwards, the athlete refuses to surrender the lucky socks. Any future losses are attributed to other factors ("I did everything I could - I even wore my lucky socks").
Michael Jordan apparently started wearing longer shorts to cover his UNC "lucky shorts"
But let's look at this a little more closely and skeptically. Are there any other explanations for this sequence of events? Imagine that the athlete in question is good, winning roughly 75% of his or her games on average. Imagine also that the socks are, in fact, not magic and have no impact on the result (shocking, I know). The probability that the single loss in a run of 4 games coincides with the single game played in a different pair of socks (win, win, loss, win) is then: 0.75 x 0.75 x 0.25 x 0.75 ≈ 0.11
In other words, roughly one in ten pairs of socks would randomly look this lucky.
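The sock arithmetic can be checked both exactly and by brute force. This Python sketch assumes, as in the example, a 75% win rate and socks with no effect at all; the simulation plays many four-game runs and counts how often the only loss lands on the game played in different socks.

```python
import random

P_WIN = 0.75  # assumed win rate for a good athlete, as in the example


def exact_probability() -> float:
    """Win, win, loss, win: the one loss falls on the one game in other socks."""
    return P_WIN * P_WIN * (1 - P_WIN) * P_WIN


def simulated_probability(trials: int = 100_000, seed: int = 7) -> float:
    """Fraction of simulated 4-game runs that match the 'lucky socks' story."""
    rng = random.Random(seed)
    target = [True, True, False, True]  # game 3 is the one in different socks
    hits = 0
    for _ in range(trials):
        # Each game is won independently of the socks with probability P_WIN.
        results = [rng.random() < P_WIN for _ in range(4)]
        if results == target:
            hits += 1
    return hits / trials


print(f"exact:     {exact_probability():.4f}")  # 0.1055
print(f"simulated: {simulated_probability():.4f}")
```

The simulated figure should land very close to the exact 0.1055, i.e. roughly one run in ten looks like evidence for magic socks by pure chance.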
Faced with those odds, most of us would probably chalk it up to chance (but keep wearing our lucky socks just in case).
Add to this the fact that we can't help but always be on the lookout for these patterns (it's just how our brains are wired), and it's unsurprising that there is always some pattern to be seen somewhere.
Given all of this, we apply pretty high standards of proof before stating that there is a correlation [i.e. that two things tend to happen (or not) together]. This standard is expressed as a "confidence," which is similar to the everyday meaning of the word but is measured in probabilities. We express our confidence in terms of "the probability that we would see a correlation at least this strong even if there were no underlying correlation": the smaller that probability, the more confident we are. Statisticians typically work at 95% or 99% confidence levels (though note that at 95% confidence, pure chance will still fool us about one time in 20 when there is no real relationship).
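The quoted definition can be estimated directly with a permutation test, sketched below in Python. This is an illustration under stated assumptions, not necessarily how SEOmoz or Searchmetrics computed their figures: it measures correlation with Spearman's rho (using the simple no-ties formula), shuffles the Like counts many times to simulate "no underlying correlation," and reports how often a shuffle produces a correlation at least as strong as the real one. At the 95% level, a result is usually called significant when this probability falls below 0.05.

```python
import random


def ranks(values: list[float]) -> list[int]:
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def permutation_p_value(x: list[float], y: list[float],
                        n_permutations: int = 2_000, seed: int = 0) -> float:
    """Estimate how often a correlation at least this strong appears by chance.

    Shuffling y destroys any real relationship with x, so the shuffled
    correlations show what "no underlying correlation" looks like.
    """
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    at_least_as_strong = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance so float rounding can't hide an equally strong shuffle.
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            at_least_as_strong += 1
    # The +1 counts the observed ordering as one of the possible orderings.
    return (at_least_as_strong + 1) / (n_permutations + 1)


if __name__ == "__main__":
    positions = list(range(1, 11))
    likes = [410, 380, 150, 200, 90, 120, 60, 75, 30, 20]  # hypothetical counts
    print(f"rho = {spearman(positions, likes):.2f}")
    print(f"p   = {permutation_p_value(positions, likes):.4f}")
```

Note that a small p-value only tells us the correlation is unlikely to be a fluke; it is silent on which way, if any, the causation runs.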
The ranking factor studies undertaken by SEOmoz and others have shown a non-zero correlation with high confidence. In other words, there is a correlation between certain social signals and higher rankings. I don't think anyone is seriously disputing that at this point.
Correlation is not causation
This tricky phrase gets wheeled out with every study. What does it mean?
It means that the mathematical techniques we have applied to become confident that there is a relationship between these two variables say nothing about whether one causes the other.
It's easy to think of correlations that are not causative. More ice creams are sold in months when more sun lotion is sold. Sun lotion sales don't cause ice cream sales and ice cream sales don't cause sun
