Thanks to pre-built calculators and a couple of basic formulas, you don’t need a statistician to calculate the winner of a split test. Whether you are looking to estimate how long a test will take or if a test has reached significance, there are only a couple of easily accessible numbers you need to know.
Statistical significance is the point at which you can confidently say that there is a clear winner and loser in your split test.
The p-value is the probability of seeing a difference at least as large as the one you measured if the variations actually performed the same. A low p-value means that there is a smaller chance of a false positive. Many tests are called at a p-value below 0.1, which is commonly described as being 90% confident in your results; a stricter threshold of 0.05 (95% confidence) is also widely used.
The power of a test is its ability to detect a difference between two variations when one really exists. The higher the power, the more sure you can be that you did not miss a real difference when the two variations appear to perform the same.
The confidence interval is the range of results that you would expect to see after you publish the change you make. Just because a variation wins by 10% in the test doesn’t mean that the uplift will stay at exactly 10% as more data is collected.
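To make the confidence interval a little more concrete, here is a rough Python sketch that computes a normal-approximation interval for the absolute difference in conversion rate between two variations. The function name, its parameters, and the 95% default are my own assumptions for illustration; your testing tool may use a different method.

```python
from math import sqrt
from statistics import NormalDist


def difference_confidence_interval(conv_a, visitors_a, conv_b, visitors_b,
                                   confidence=0.95):
    """Approximate interval for (rate B - rate A) using the normal approximation."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    diff = rate_b - rate_a
    # Standard error of the difference between two independent proportions.
    se = sqrt(rate_a * (1 - rate_a) / visitors_a
              + rate_b * (1 - rate_b) / visitors_b)
    # Critical z value for the requested confidence level (two-sided).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z * se, diff + z * se


# Hypothetical numbers: 100 of 2,000 visitors convert on A, 120 of 2,000 on B.
low, high = difference_confidence_interval(100, 2000, 120, 2000)
print(f"Difference in conversion rate is between {low:.4f} and {high:.4f}")
```

If the interval includes zero, the data is still consistent with there being no real difference between the variations.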
How To Estimate How Long Your Test Will Take
Before setting up a split test, you should always check to see how long the test will take. If the test is expected to take more than a couple of months to reach significance, then there is a good chance that it isn’t worth running. You can use a sample size calculator to get a rough idea of how long your test will take. A good sample size calculator will ask for traffic volume, current conversion rate, number of variations, and the minimum uplift you want to detect. I recommend checking out this one from Adobe.
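If you are curious what a sample size calculator is doing behind the scenes, the sketch below uses a standard approximation for comparing two conversion rates. The function names, the 5% significance threshold, and the 80% power are assumptions I picked for illustration, so expect the numbers to differ somewhat from any specific calculator.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, min_relative_uplift,
                              alpha=0.05, power=0.8):
    """Approximate visitors needed in each variation (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimated_days(daily_visitors, variations, baseline_rate,
                   min_relative_uplift, alpha=0.05, power=0.8):
    """Days needed if daily traffic is split evenly across all variations."""
    per_variation = sample_size_per_variation(
        baseline_rate, min_relative_uplift, alpha, power)
    return ceil(per_variation * variations / daily_visitors)


# Hypothetical site: 1,000 visitors a day, 5% conversion rate,
# 2 variations, and a 20% minimum relative uplift worth detecting.
print(estimated_days(1000, 2, 0.05, 0.20))
```

Notice how quickly the estimate grows when you lower the minimum uplift: halving the uplift you want to detect roughly quadruples the traffic you need.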
How To Know When Your Test Is Done
Your test is done when it has reached statistical significance. Significance is typically checked with a t-test, but there is a shorthand way to calculate it as well. Whether you use the more robust calculation or the one you can do by hand, first make sure that your test has run for at least the minimum duration of a good test.
Minimum Test Duration
When it comes to running a split test, it is always tempting to “peek” and see which variation is winning. In general, the only reason you should look at your test early is to make sure that the data looks reasonable. If the results are changing significantly more than what seems reasonable, you may want to take a look at your test and make sure everything is set up correctly.
Now that we’ve established that peeking is bad, here are two numbers to look for before you even consider checking for significance.
Your test should run for a bare minimum of one week, and a two-week minimum is not a bad idea. The reason for running at least a week is that the behavior of your users may change depending on the time of day and the day of the week. Maybe your new messaging resonates with the people who view your website during the workweek, but not with those who see it on the weekend. Or maybe the people coming from your Friday newsletter don’t like the new imagery you chose. You won’t know this unless you run your test for a full week.
Many CRO experts recommend two weeks because it smooths out an unusual day, like a holiday or an unexpected traffic spike from a hit social media post. You don’t always have to let your tests run for two whole weeks, but at least make sure they run for a full 7 days.
To safely call a split test, it is best to wait until each variation has at least 100 conversions. If your variations have a low number of conversions, the results are subject to statistical noise. For example, if you are running a test and don’t look at the number of conversions, you might quickly call it when a variation is winning by 50%. But if version A has 6 conversions and version B has 9, then the actual difference is only 3 conversions. It is hard to say that version B is definitely better than version A with such a small difference.
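Both minimums are easy to turn into a quick pre-check. This is only a sketch: the function name and parameters are made up for illustration, and the defaults simply mirror the 7-day and 100-conversion rules of thumb above.

```python
def ready_to_check_significance(days_running, conversions_per_variation,
                                min_days=7, min_conversions=100):
    """Return True only if the test meets both minimums described above."""
    ran_long_enough = days_running >= min_days
    enough_conversions = all(
        conversions >= min_conversions
        for conversions in conversions_per_variation
    )
    return ran_long_enough and enough_conversions


# The 6 vs. 9 conversions example fails the check even after two weeks.
print(ready_to_check_significance(14, [6, 9]))      # False
print(ready_to_check_significance(7, [100, 120]))   # True
```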
The Calculator-Free Method
If you want a quick way to check for statistical significance, there is a simple math formula you can use.
$D^2 > N$
A = Number of Conversions for Variation A
B = Number of Conversions for Variation B
N = Total Number of Conversions [ A + B ]
D = Half the Difference Between Variations [ ( B – A) / 2 ]
Here are two examples of using this equation to see how it works.
Variation A: 100 Conversions (A)
Variation B: 120 Conversions (B)
N = 100 + 120 = 220
D = (120 – 100) / 2 = 10
$D^2 = 10^2 = 100$
100 < 220
Since $D^2$ is not greater than N, the results of this test would not be considered significant.
Variation A: 1,000 Conversions (A)
Variation B: 1,200 Conversions (B)
N = 1000 + 1200 = 2200
D = (1200 – 1000) / 2 = 100
$D^2 = 100^2 = 10,000$
10,000 > 2200
Since $D^2$ is greater than N, the results of this test would be considered significant.
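If you would rather not do even this much math by hand, the rule fits in a few lines of Python. The function name is my own; the logic is just the formula above, with D as half the difference in conversions and N as the total.

```python
def quick_significance_check(conversions_a, conversions_b):
    """Rule-of-thumb check: significant when D squared is greater than N."""
    n = conversions_a + conversions_b        # total conversions
    d = (conversions_b - conversions_a) / 2  # half the difference
    return d ** 2 > n


print(quick_significance_check(100, 120))    # False: 100 is not greater than 220
print(quick_significance_check(1000, 1200))  # True: 10,000 is greater than 2,200
```

Keep in mind this is a shortcut, so treat a result that lands close to the threshold with some caution.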
If you are looking for a more robust and statistically sound way to calculate statistical significance, the t-test is the way to go. The t-test is used by statisticians to determine if one variation is different from another within a certain degree of confidence. The math involved in performing a t-test isn’t something you would want to do on paper. Instead, use an online calculator like the one found at abtestguide.com.
When using the calculator, make sure that you select the p-value threshold (often shown as a confidence level) that matches how confident you want to be. You may also want to select a two-sided analysis. A one-sided calculation only checks whether variation B scores higher than variation A. If you also want to know whether variation B converts significantly less than variation A, then you will need to use the two-sided analysis.
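For a feel of what such a calculator does, here is a minimal Python sketch. Note that it uses a two-proportion z-test rather than a t-test; with the large sample sizes typical of split tests the two give nearly identical answers. I am not claiming this is how any particular calculator works, and the function name and parameters are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b,
                           two_sided=True):
    """p-value for the difference in conversion rate between A and B.

    Assumes at least one conversion and one non-conversion overall,
    otherwise the standard error below would be zero.
    """
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate: the shared conversion rate if A and B were truly identical.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    if two_sided:
        # Checks for a difference in either direction.
        return 2 * (1 - NormalDist().cdf(abs(z)))
    # One-sided: only checks whether B converts better than A.
    return 1 - NormalDist().cdf(z)


# Hypothetical numbers: 100 of 2,000 visitors convert on A, 120 of 2,000 on B.
p = two_proportion_p_value(100, 2000, 120, 2000)
print(f"two-sided p-value: {p:.3f}")
```

Compare the returned p-value against the threshold you chose before the test started. Switching to `two_sided=False` halves the p-value when B is ahead, which is why the choice should be made up front rather than after seeing the results.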
How To Calculate Uplift
Calculating uplift is probably the most exciting part of a split test. This is where you see how much of an impact your new variation will have. The trick to calculating uplift is to note the relative change, not the absolute change. Here is how to calculate the relative change between your variations.
Relative Change = ( B – A) / A
A = Conversion Rate of Variation A
B = Conversion Rate of Variation B
The reason you want to use the relative change instead of the absolute change is that it will give you a better picture of how big of an effect your winning variation will have. If a test you ran improved your conversion rate from 10% to 20%, the absolute change would be 10 percentage points, but the relative change would be 100% [ ( 20 – 10) / 10 ]. This means that you would expect 100% more conversions on the new variation.
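The same calculation as a small Python helper, in case you want to drop it into a spreadsheet export or a reporting script. The function name is hypothetical; the formula is the one given above.

```python
def relative_uplift(rate_a, rate_b):
    """Relative change in conversion rate from variation A to variation B."""
    return (rate_b - rate_a) / rate_a


# Going from a 10% to a 20% conversion rate is a 100% relative uplift.
uplift = relative_uplift(0.10, 0.20)
print(f"{uplift:.0%}")
```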
While the statistics behind statistical significance might seem overwhelming at first, with online calculators and some simplified equations, the process is actually quite simple. So whether you paid attention in that college statistics class or not, you should be able to handle all of the calculations that you need to run your own split tests.