We are currently testing the trust symbols in our shop with a GA experiment (all trust symbols vs. no trust symbols).
Our last test ran for 16 days and declared a winner even though it had only a 0.26 % better conversion rate than the loser. I am not comfortable accepting this result because of the low overall number of transactions in our test group.
Our test settings were:
Objective for this experiment: Transactions
Set a confidence threshold: 95%
Distribute traffic evenly across all variants: off
Sessions | Transactions | Conversion Rate | Compare to Original | Probability of Outperforming Original
50,156 | 768 | 1.53% | 0% | 0.0%
11,874 | 151 | 1.27% | -16.95% | 1.7%
I know that I could set the confidence threshold to 99.5% to get a more reliable result. That would push the required numbers higher, and I guess it would make me more comfortable accepting the result.
But is there a reason to set "Distribute traffic evenly across all variants" to on? Could it give us a better result? I know this would cost us some sales, but if the result were more reliable, we would be willing to pay that price.
Note: I would think that distributing traffic evenly across all variants would increase the chance of reaching a statistically significant result for the variant that does not perform as well as the better one. So it is kind of confusing to me that GA allows adjusting traffic dynamically based on variant performance and then still declares a winner even though the result is not statistically significant.
The answer to this is largely going to come down to opinion without more data. It's worth looking into p-value calculations if you want an unbiased, scientific answer.
I would say that with such a low conversion rate, you'll want to run the experiment at a larger scale to gauge the differences. The current result rests on too few transactions in the variant group to be convincing.
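If you want to do the p-value check yourself, here is a rough stdlib-only sketch using the numbers from your table. Note this is a frequentist two-proportion z-test, not GA's own Bayesian "probability of outperforming" model, so treat it as a cross-check rather than a reproduction of GA's math. It also estimates, under the usual normal-approximation formula, how many sessions per variant you would need to detect the observed difference with 80% power at a 95% confidence level:

```python
# Sketch: cross-checking the experiment with a two-proportion z-test.
# Assumes the session/transaction counts from the report above.
from math import erfc, sqrt
from statistics import NormalDist

# Original / variant figures from the report
n1, x1 = 50_156, 768   # original: sessions, transactions
n2, x2 = 11_874, 151   # variant:  sessions, transactions

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_two_sided = erfc(abs(z) / sqrt(2))  # two-sided p-value

print(f"z = {z:.2f}, p = {p_two_sided:.3f}")

# Rough sample size per variant needed to detect the observed
# difference with alpha = 0.05 (two-sided) and 80% power:
nd = NormalDist()
z_alpha, z_beta = nd.inv_cdf(0.975), nd.inv_cdf(0.80)
p_bar = (p1 + p2) / 2
n_per_arm = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
             / (p1 - p2) ** 2)
print(f"~{n_per_arm:,.0f} sessions per variant")
```

The sample-size estimate is the more useful number here: with only ~12k sessions in the variant, you are well short of what a 50/50 split would want for a difference this small, which supports running the test longer rather than trusting the early call.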