Classification and regression tree for characterising smoking patterns among adults: evidence from global adult tobacco survey, Bangladesh
Jahangirnagar University, Savar, Dept. of Statistics, Bangladesh
Publication date: 2018-03-01
Tob. Induc. Dis. 2018;16(Suppl 1):A296

Tobacco consumption is a preventable public health problem. Many tobacco-related studies have relied on logistic regression, and most have analysed categorical variables with dichotomous outcomes. By comparison, classification and regression tree (CART), a data mining technique, has not been widely applied in tobacco-related research, although it offers substantial benefits over other methods. This study therefore examines smoking patterns among adults using the CART method and compares the findings with those of more traditional techniques.

The dataset comprised a nationally representative sample of 9,629 respondents drawn from the Global Adult Tobacco Survey, Bangladesh. CART was chosen for its suitability over alternatives such as binary logistic regression, multinomial logistic regression, chi-squared automatic interaction detector (CHAID), and quick, unbiased, efficient statistical tree (QUEST).

CART was used to characterize cigarette smoking behaviour among adults aged 15 years and above in Bangladesh. The algorithm builds a tree model that classifies "average number of cigarettes smoked per day" using a set of attributes as predictors. CART was found to be easier to understand than other data mining techniques. A logistic regression model requires a parametric assumption about the dependent variable, and this assumption is often restrictive when the data are a mixture of categorical and continuous variables. CART is therefore appropriate because it: (i) is purely non-parametric and independent of distributional assumptions; (ii) can handle both continuous and categorical data; (iii) can use skewed or multi-modal data without requiring the independent variables to be normally distributed; (iv) can handle missing data; (v) is a relatively automatic "machine learning" method; (vi) needs less input for analysis; and (vii) is visual in character, with results that are simple to interpret even for non-statisticians.
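To illustrate how a CART-style tree can split on both continuous and categorical predictors, the following is a minimal Python sketch of a single split search using Gini impurity. It is not the analysis performed in this study: the records, the feature names (`age`, `residence`), the `light`/`heavy` labels, and the functions `gini` and `best_split` are all hypothetical, and missing values are simply skipped for the feature being tested, which is a simplification of the surrogate splits that CART implementations typically use.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Return (weighted_gini, rule) for the best binary split on one feature.

    Numeric features are split as `feature <= value`; categorical features
    as `feature == value`. Rows with a missing (None) value for the feature
    are ignored for this feature only (a simplification, see lead-in).
    """
    pairs = [(r[feature], y) for r, y in zip(rows, labels) if r[feature] is not None]
    values = sorted({v for v, _ in pairs})
    numeric = all(isinstance(v, (int, float)) for v in values)

    best_score, best_rule = gini(labels), None
    for v in values:
        if numeric:
            left = [y for x, y in pairs if x <= v]
            right = [y for x, y in pairs if x > v]
            rule = f"{feature} <= {v}"
        else:
            left = [y for x, y in pairs if x == v]
            right = [y for x, y in pairs if x != v]
            rule = f"{feature} == {v}"
        if not left or not right:
            continue  # a split must send records to both children
        n = len(left) + len(right)
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score, best_rule = score, rule
    return best_score, best_rule


# Hypothetical toy records, invented purely for illustration.
rows = [
    {"age": 22, "residence": "urban"},
    {"age": 25, "residence": "rural"},
    {"age": 31, "residence": "urban"},
    {"age": 44, "residence": "rural"},
    {"age": 52, "residence": "rural"},
    {"age": 60, "residence": None},  # missing categorical value
]
labels = ["light", "light", "light", "heavy", "heavy", "heavy"]

for feature in ("age", "residence"):
    print(feature, best_split(rows, labels, feature))
```

A full tree would apply this search recursively to each child node and stop according to a pruning or minimum-node-size rule; the sketch shows only the root-level choice between a continuous and a categorical predictor.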

Among the techniques used so far to characterize smoking patterns among adults, CART performed best on all aspects considered and is suggested for future research.