First off, sorry if this is a simple question. I've been asked to get stuck in with some clinical epidemiology. The internet is my only support group as I am not a student under the supervision of an expert.
I am performing Cox regression over a list of drugs (categorical), adjusting for age among other covariates (gender, income, etc.). I am not sure whether I should treat age as a categorical or a quantitative covariate.
From my basic understanding, age is usually treated as a quantitative variable, whereas a categorical variable would be something like "Are you a smoker?" (yes/no) or "What is your favourite colour?".
Firstly, investigating the impact of a drug in R:
    coxph(Surv(time, status) ~ drugName, data = coxDF)

                              coef exp(coef) se(coef)      z        p
    drugNamecodeine_based 0.150117  1.161970 0.022199  6.762 1.36e-11
    drugNameporpranolol   0.237963  1.268662 0.023608 10.080  < 2e-16
    drugNameparacetamol   0.202408  1.224347 0.021519  9.406  < 2e-16
And then, controlling for a patient's age by treating age as quantitative:
    coxph(Surv(time, status) ~ drugName + adjustedAge, data = coxDF)

                                coef  exp(coef)  se(coef)       z        p
    drugNamecodeine_based  0.1393446  1.1495202 0.0222008   6.277 3.46e-10
    drugNameporpranolol    0.1849390  1.2031450 0.0236432   7.822 5.20e-15
    drugNameparacetamol    0.1939401  1.2140235 0.0215203   9.012  < 2e-16
    adjustedAge           -0.0225542  0.9776982 0.0005415 -41.650  < 2e-16
Treating age as a quantitative variable makes some minor adjustments to the drug coefficients. This makes sense. However, if I then control for age as a categorical variable, I not only get a breakdown of the risk associated with each age (in years), but the adjustment to each drug is also different.
    coxph(Surv(time, status) ~ drugName + as.factor(adjustedAge), data = coxDF)

                                 coef exp(coef) se(coef)      z        p
    drugNamecodeine_based     0.14077   1.15116  0.02222  6.334 2.38e-10
    drugNameporpranolol       0.18819   1.20706  0.02367  7.949 1.88e-15
    drugNameparacetamol       0.19453   1.21474  0.02155  9.028  < 2e-16
    as.factor(adjustedAge)19  0.04519   1.04622  0.07329  0.617 0.537554
    as.factor(adjustedAge)20  0.08761   1.09157  0.06905  1.269 0.204510
    as.factor(adjustedAge)21  0.02948   1.02992  0.07030  0.419 0.674937
    ....
    ....
    as.factor(adjustedAge)38 -0.48611   0.61501  0.06367 -7.635 2.26e-14
    as.factor(adjustedAge)39 -0.54026   0.58260  0.06362 -8.492  < 2e-16
    as.factor(adjustedAge)40 -0.48973   0.61279  0.06416 -7.633 2.29e-14
    as.factor(adjustedAge)41 -0.46459   0.62839  0.06286 -7.391 1.46e-13
    as.factor(adjustedAge)42 -0.57269   0.56401  0.06272 -9.131  < 2e-16
I really like that I can see the estimates per age. However, I am not sure whether treating the age variable this way gives correct results for the drugs.
It is also worth noting that each patient is observed only once, so their age is not time-dependent.
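To make the comparison reproducible without the real data, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: `simDF` is a hypothetical stand-in for `coxDF`, the effect sizes and the reference level `"none"` are invented, and only the column names mirror the coefficient names in the output above. It fits both age codings, puts the drug hazard ratios side by side, and then runs a likelihood ratio test, which is one possible way to check whether the per-year terms add anything over a single linear age term (the linear model is nested in the categorical one).

```r
library(survival)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

n <- 2000

# Hypothetical stand-in for coxDF: every value below is simulated.
simDF <- data.frame(
  drugName = factor(
    sample(c("none", "codeine_based", "paracetamol"), n, replace = TRUE),
    levels = c("none", "codeine_based", "paracetamol")  # "none" is an invented reference level
  ),
  adjustedAge = sample(18:42, n, replace = TRUE)
)

# Assumed data-generating model: log-hazard linear in age plus a drug effect.
lp <- -0.02 * simDF$adjustedAge +
  0.15 * (simDF$drugName == "codeine_based") +
  0.20 * (simDF$drugName == "paracetamol")

eventTime  <- rexp(n, rate = 0.1 * exp(lp))  # exponential event times
censorTime <- rexp(n, rate = 0.03)           # independent censoring
simDF$time   <- pmin(eventTime, censorTime)
simDF$status <- as.integer(eventTime <= censorTime)

# Age as a single quantitative term vs. one coefficient per year of age.
fitLinear <- coxph(Surv(time, status) ~ drugName + adjustedAge, data = simDF)
fitFactor <- coxph(Surv(time, status) ~ drugName + as.factor(adjustedAge), data = simDF)

# Drug hazard ratios under each age coding, side by side.
drugRows <- grep("^drugName", names(coef(fitLinear)), value = TRUE)
hrCompare <- cbind(
  linearAge = exp(coef(fitLinear)[drugRows]),
  factorAge = exp(coef(fitFactor)[drugRows])
)
print(round(hrCompare, 3))

# Likelihood ratio test of the nested models.
print(anova(fitLinear, fitFactor))
```

Swapping `simDF` for the real data frame would run the same comparison on the actual cohort. The simulated numbers themselves say nothing about the real drugs.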
My main questions in summary:
1) It seems great that I can now see the risk associated with each year of age. However, why does the exp(coef) of each drug change depending on whether age is treated as quantitative or categorical?
2) If I want to control for age when measuring the impact a particular drug has on a patient's time to outcome, should age be quantitative or categorical?
3) If I become more interested in the risk associated with the age of a patient (rather than what drugs they have), surely I need to treat age as a categorical variable? Would I perform a separate calculation with just ~age?
Many thanks for your help.