by Alexis
Last Updated October 09, 2019 21:19 PM

**Caveat: This question may be a tad rambly, and I welcome comments with specific directions for me to improve it.**

During a too-brief exchange with the worthy @NickCox, I got to thinking about transformation, back-transformation, and inference.

It seems to me pretty apparent that frequentist inference (confidence intervals, hypothesis tests) on a transformed variable $f(x)$ is not inference about the untransformed variable $x$, even when back-transforming inferential quantities, because, generally, $\sigma^{2}(f(x)) \ne f(\sigma^{2}(x))$, unless $f(x) = x$, and both CIs and hypothesis tests rely upon an estimate of the variance. To quote from my answer here:

> Basing CIs on transformed variables + back-transformation produces intervals without the nominal coverage probabilities, so back-transformed confidence about an estimate based on $f(x)$ is not confidence on an estimate based on $x$.
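One way to probe the coverage claim is a small simulation. The sketch below is only an illustration under stated assumptions, none of which come from the exchange above: $x$ is lognormal, $f = \log$, the interval on the log scale uses a normal approximation, and the sample size, parameters, and seed are arbitrary. It counts how often the exponentiated interval covers $\exp(E[\log x])$ and how often it covers $E[x]$.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def backtransformed_ci(sample, level=0.95):
    """Normal-approximation CI for the mean of log(x), exponentiated back."""
    logs = [math.log(v) for v in sample]
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(logs) / math.sqrt(len(logs))
    centre = mean(logs)
    return math.exp(centre - half_width), math.exp(centre + half_width)


def coverage(n=50, mu=0.0, sigma=1.0, reps=2000, seed=1):
    """Share of simulated intervals that cover each candidate target."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    geometric_target = math.exp(mu)                  # exp(E[log x])
    arithmetic_target = math.exp(mu + sigma**2 / 2)  # E[x] for a lognormal
    hits_geometric = 0
    hits_arithmetic = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        lower, upper = backtransformed_ci(sample)
        hits_geometric += lower <= geometric_target <= upper
        hits_arithmetic += lower <= arithmetic_target <= upper
    return hits_geometric / reps, hits_arithmetic / reps


cover_geometric, cover_arithmetic = coverage()
print(f"covers exp(E[log x]): {cover_geometric:.3f}")
print(f"covers E[x]:          {cover_arithmetic:.3f}")
```

Under these assumptions the two coverage rates should differ sharply, which suggests that part of the question is which quantity on the original scale the back-transformed interval is taken to be about.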

Likewise, basing inferences about untransformed variables on hypothesis tests of transformed variables means that any of the following can be true, for example when making inferences about $x$ across some grouping variable $y$:

- $x$ differs significantly across $y$, but $f(x)$ does not differ significantly across $y$.
- $x$ differs significantly across $y$, and $f(x)$ differs significantly across $y$.
- $x$ does not differ significantly across $y$, and $f(x)$ does not differ significantly across $y$.
- $x$ does not differ significantly across $y$, but $f(x)$ differs significantly across $y$.

It is also very easy to imagine examples setting this point down sharply. For example, if $y_{i} = x_{i}$, then Pearson's $r=1.0$ for $y$ and $x$, but Pearson's $r=0.0$ for $y$ and $x^{2}$ whenever the values of $x$ are distributed symmetrically about 0.
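A minimal numeric check of this example, assuming five made-up values of $x$ placed symmetrically about 0 (the data and the helper `pearson_r` are illustrative, not from any library):

```python
import math


def pearson_r(a, b):
    """Pearson correlation computed directly from its definition."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(ss_a * ss_b)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical values, symmetric about 0
y = list(x)                        # y_i = x_i
x_squared = [v ** 2 for v in x]    # the transformed variable f(x) = x^2

r_y_x = pearson_r(y, x)
r_y_x2 = pearson_r(y, x_squared)
print(r_y_x, r_y_x2)
```

The linear association between $y$ and $x$ is perfect, yet the same $y$ shows no linear association with $x^{2}$, because the positive and negative halves of $x$ cancel in the covariance.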

On the other hand, tricks like Oehlert's Delta method can provide a 'back-transformation' that approximates the correct variance of $x$ as an alternative to simply calculating it directly, or calculating it as $f^{-1}(\sigma^{2}(x))$.
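For reference, the first-order form of the delta method can be written in the notation used above. This is a sketch under the usual assumptions that $f$ is differentiable at the mean of $x$, written here as $\mu$ (a symbol introduced only for this illustration), and that $f'(\mu) \ne 0$:

$$\sigma^{2}(f(x)) \approx \left[f'(\mu)\right]^{2}\sigma^{2}(x)$$

For instance, with $f = \log$ we have $f'(\mu) = 1/\mu$, so

$$\sigma^{2}(\log x) \approx \frac{\sigma^{2}(x)}{\mu^{2}}$$

The approximation comes from a first-order Taylor expansion of $f$ about $\mu$, so it tends to be better when $x$ varies little relative to the curvature of $f$.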

The good Nick Cox, however, points out that to "estimate on a link scale and report on the original scale is central to generalized linear models," and that (if I understood correctly) inference on the geometric mean entails such back-transformation in the form $\exp\left(\frac{1}{n}\sum_{i=1}^{n} \log (x_{i})\right)$.
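As a small illustration of the geometric-mean case, the sketch below computes the mean on the log scale and exponentiates both the estimate and the interval endpoints. The data are hypothetical, and the interval uses a normal quantile purely to keep the example within the standard library; a $t$ quantile would be the more usual choice for a sample this small.

```python
import math
from statistics import NormalDist, geometric_mean, mean, stdev

x = [1.2, 0.8, 3.5, 2.1, 0.6, 5.0, 1.7, 2.9]  # hypothetical positive data

logs = [math.log(v) for v in x]
log_mean = mean(logs)

# Point estimate: exp of the mean of the logs is the geometric mean.
gm = math.exp(log_mean)

# Interval built on the log scale, then back-transformed endpoint by endpoint.
z = NormalDist().inv_cdf(0.975)               # normal approximation (assumption)
se = stdev(logs) / math.sqrt(len(logs))
gm_lower = math.exp(log_mean - z * se)
gm_upper = math.exp(log_mean + z * se)

print(f"geometric mean:  {gm:.3f}  (library value {geometric_mean(x):.3f})")
print(f"interval:        ({gm_lower:.3f}, {gm_upper:.3f})")
print(f"arithmetic mean: {mean(x):.3f}")
```

The back-transformed interval is not symmetric about the geometric mean, and it is an interval for the geometric mean rather than for the arithmetic mean of $x$.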

**When is it OK to base inferences about $\boldsymbol{x}$ on back-transformations of estimates and inferences on $\boldsymbol{f(x)}$, and when is it not?**

**Second caveat: I am not calling Nick Cox out to defend any position with this question, and am genuinely interested in understanding when performing inference on $\boldsymbol{f(x)}$ but drawing conclusions about $\boldsymbol{x}$ based on back-transformation makes sense and does not make sense.**
