Is age categorical or quantitative or both?

by Anthony Nash   Last Updated June 15, 2019 20:19 PM

First off, sorry if this is a simple question. I've been asked to get stuck into some clinical epidemiology, and the internet is my only support group because I am not a student working under the supervision of an expert.

I am fitting a Cox regression with drug (categorical) as the exposure, and I am adjusting for age among other covariates (gender, income, etc.). I am not sure whether I should treat age as a categorical or a quantitative covariate.

From my basic understanding, age is usually treated as a quantitative variable, whereas a categorical variable would be something like "Are you a smoker?" (yes/no) or "What is your favourite colour?".
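To make the distinction concrete, here is a minimal sketch of how R encodes each choice, using five made-up ages (not from the post's data). A numeric `age` enters the design matrix as a single column, so the model estimates one slope per year; `factor(age)` expands into one indicator column per age level except the lowest, which becomes the reference. `coxph()` builds its design matrix the same way but drops the intercept column, because the baseline hazard absorbs it.

```r
# Five hypothetical patient ages (illustrative values only)
age <- c(19, 20, 20, 21, 42)

# Quantitative: one column for age, so a single slope is estimated
X_num <- model.matrix(~ age)
print(X_num)

# Categorical: one indicator column per age level except the reference (19)
X_fac <- model.matrix(~ factor(age))
print(X_fac)
```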

First, estimating the effect of the drug alone in R:

coxph(Surv(time, status) ~ drugName, data = coxDF)

                       coef exp(coef)  se(coef)      z        p
drugNamecodeine_based  0.150117  1.161970  0.022199  6.762 1.36e-11
drugNameporpranolol    0.237963  1.268662  0.023608 10.080  < 2e-16
drugNameparacetamol    0.202408  1.224347  0.021519  9.406  < 2e-16
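One detail worth keeping in mind when reading the table above: each `exp(coef)` is a hazard ratio relative to the reference level of `drugName`, which does not get its own row (by default R uses the first factor level, which for character data is the alphabetically first). The post does not say which drug that is, so the sketch below uses simulated data with a hypothetical level name `baseline_drug`; `relevel()` changes which level serves as the reference.

```r
library(survival)

set.seed(1)
n <- 400
# Hypothetical toy data; "baseline_drug" stands in for the unknown reference level
toy <- data.frame(
  drugName = factor(sample(c("baseline_drug", "codeine_based", "paracetamol"),
                           n, replace = TRUE)),
  time     = rexp(n, rate = 0.1),
  status   = rbinom(n, 1, 0.7)
)

# Levels sort alphabetically, so "baseline_drug" is the reference and gets no row
fit_a <- coxph(Surv(time, status) ~ drugName, data = toy)
print(names(coef(fit_a)))

# Make paracetamol the reference: the other two drugs are now compared with it
toy$drugName <- relevel(toy$drugName, ref = "paracetamol")
fit_b <- coxph(Surv(time, status) ~ drugName, data = toy)
print(names(coef(fit_b)))
```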

Then, controlling for a patient's age by treating age as quantitative:

coxph(Surv(time, status) ~ drugName + adjustedAge, data = coxDF)



                        coef  exp(coef)   se(coef)       z        p
drugNamecodeine_based  0.1393446  1.1495202  0.0222008   6.277 3.46e-10
drugNameporpranolol    0.1849390  1.2031450  0.0236432   7.822 5.20e-15
drugNameparacetamol    0.1939401  1.2140235  0.0215203   9.012  < 2e-16
adjustedAge           -0.0225542  0.9776982  0.0005415 -41.650  < 2e-16
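Treated as quantitative, age contributes a single coefficient, so the model assumes the log-hazard changes by the same amount for every additional year. Using the `adjustedAge` estimate copied from the output above, the implied hazard ratios for a 1-year and a 10-year age difference can be computed directly:

```r
# Age coefficient copied from the adjusted model output (log-hazard per year of age)
b_age <- -0.0225542

hr_1yr  <- exp(b_age)       # hazard ratio for a patient one year older
hr_10yr <- exp(10 * b_age)  # hazard ratio for a patient ten years older
print(round(c(hr_1yr, hr_10yr), 4))
```

The second value is about 0.798, i.e. roughly a 20% lower hazard per decade of age under the linear assumption; the categorical model drops that assumption and estimates each year separately.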

Treating age as a quantitative variable makes some minor adjustments to the drug coefficients, which makes sense. However, if I instead control for age as a categorical variable, not only do I get a breakdown of the risk associated with each age (in years), but the adjustment to each drug also differs from the quantitative-age model.

coxph(Surv(time, status) ~ drugName + as.factor(adjustedAge), data = coxDF)

                         coef exp(coef) se(coef)       z        p
drugNamecodeine_based     0.14077   1.15116  0.02222   6.334 2.38e-10
drugNameporpranolol       0.18819   1.20706  0.02367   7.949 1.88e-15
drugNameparacetamol       0.19453   1.21474  0.02155   9.028  < 2e-16
as.factor(adjustedAge)19  0.04519   1.04622  0.07329   0.617 0.537554
as.factor(adjustedAge)20  0.08761   1.09157  0.06905   1.269 0.204510
as.factor(adjustedAge)21  0.02948   1.02992  0.07030   0.419 0.674937
....
....
as.factor(adjustedAge)38 -0.48611   0.61501  0.06367  -7.635 2.26e-14
as.factor(adjustedAge)39 -0.54026   0.58260  0.06362  -8.492  < 2e-16
as.factor(adjustedAge)40 -0.48973   0.61279  0.06416  -7.633 2.29e-14
as.factor(adjustedAge)41 -0.46459   0.62839  0.06286  -7.391 1.46e-13
as.factor(adjustedAge)42 -0.57269   0.56401  0.06272  -9.131  < 2e-16
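On why the drug coefficients move (the first question below), a simulation can illustrate one mechanism. This is a sketch on made-up data, assuming (a) drug use becomes more likely with age and (b) the true age effect is a step at age 37 rather than a straight line. Under those assumptions, a linear age term removes only part of the confounding by age, whereas `factor(age)` can match any shape, so the two estimates of the drug log-hazard ratio (true value 0.2 here) will generally differ. In the real output the shifts are small (for example 0.1849 versus 0.1882 for propranolol), which may suggest the linear term already captures most of the age confounding.

```r
library(survival)

set.seed(42)
n <- 5000
sim <- data.frame(age = sample(18:42, n, replace = TRUE))
# Assumption: older patients are more likely to receive the drug (confounding by age)
sim$drug <- rbinom(n, 1, plogis(-3 + 0.1 * sim$age))
# Assumption: true log-hazard drops in a step at age 37 (not a straight line);
# the true drug log-hazard ratio is 0.2
lp <- 0.2 * sim$drug - 0.5 * (sim$age >= 37)
event_time  <- rexp(n, rate = 0.05 * exp(lp))
censor_time <- rexp(n, rate = 0.02)
sim$time   <- pmin(event_time, censor_time)
sim$status <- as.integer(event_time <= censor_time)

fit_lin <- coxph(Surv(time, status) ~ drug + age, data = sim)
fit_fac <- coxph(Surv(time, status) ~ drug + factor(age), data = sim)

# Drug log-hazard ratio under each way of adjusting for age
drug_coefs <- c(linear = unname(coef(fit_lin)["drug"]),
                factor = unname(coef(fit_fac)["drug"]))
print(drug_coefs)
```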

I really like that I can see the estimates for each age. However, I am not sure whether treating age this way is correct with respect to the results for the drugs.

It is also worth noting that each patient is observed only once, so their age is not time-dependent.

My main questions in summary:

1) It seems great that I can now see the risk associated with each year of age; however, why should the exp(coef) of each drug change depending on whether age is treated as quantitative or categorical?

2) If I want to control for age when measuring the effect a particular drug has on a patient's time-to-event outcome, should age be quantitative or categorical?

3) If I become more interested in the risk associated with a patient's age (rather than which drugs they take), surely I need to treat age as a categorical variable? Would I then fit a separate model with just ~age?
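For question 2, one common approach is to let the data indicate how flexible the age term needs to be. Because a straight line in age can be written as a combination of the age indicators, the linear model is nested in the categorical one, and a natural spline sits in between; nested Cox models can be compared with a likelihood ratio test via `anova()`, or with `AIC()`. The sketch below uses simulated data with a hypothetical bend at age 30, and `ns(age, df = 4)` is an assumed middle-ground choice, not something taken from the post.

```r
library(survival)
library(splines)

set.seed(7)
n <- 3000
sim <- data.frame(age  = sample(18:42, n, replace = TRUE),
                  drug = rbinom(n, 1, 0.4))
# Assumption: log-hazard is flat until age 30, then declines (hypothetical data)
lp <- 0.2 * sim$drug - 0.04 * pmax(sim$age - 30, 0)
event_time  <- rexp(n, rate = 0.05 * exp(lp))
censor_time <- rexp(n, rate = 0.02)
sim$time   <- pmin(event_time, censor_time)
sim$status <- as.integer(event_time <= censor_time)

# Increasingly flexible age terms: straight line, natural spline, one level per year
fit_lin <- coxph(Surv(time, status) ~ drug + age, data = sim)
fit_spl <- coxph(Surv(time, status) ~ drug + ns(age, df = 4), data = sim)
fit_fac <- coxph(Surv(time, status) ~ drug + factor(age), data = sim)

# Sequential likelihood ratio tests of the nested models
lrt <- anova(fit_lin, fit_spl, fit_fac)
print(lrt)

# AIC trades fit against the number of age parameters
aic_tab <- AIC(fit_lin, fit_spl, fit_fac)
print(aic_tab)
```

A small p-value for the spline-versus-linear comparison would suggest the straight-line assumption is too restrictive, while the categorical model spends one degree of freedom per year of age, which can be costly when some ages have few events. For question 3, dropping the drug term, e.g. `coxph(Surv(time, status) ~ ns(age, df = 4), data = sim)`, gives age effects that are not adjusted for drug.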

Many thanks for your help.


