INTRODUCTION

Cardiovascular disease (CVD) is the leading cause of death in the US and globally, producing a significant health and economic burden1,2. Combustible cigarette smoking is a well-established independent risk factor for CVD2-5. Leveraging such evidence, coupled with robust regulatory policies and enforcement, has resulted in a steady decline in combustible cigarette use across different population subgroups in the US6,7.

Despite the decrease in the rates of smoking, the popularity of non-cigarette tobacco products has increased in the past few decades8-10. Between 2000 and 2015, smokeless tobacco use among US adults increased by 23%8. In 2020, 2.3% of US adults reported past 30-day smokeless tobacco use, while 1.6% of youth reported smokeless tobacco use in 20226,11. Despite a reduction in cigar use in some subgroups, use has increased 68% among adult women12. Additionally, cigar use has decreased among older adults but increased from 12.0% in 2002 to 12.7% in 2008 among those aged 18–25 years13. Lastly, the use of e-cigarettes has become increasingly popular, with approximately 5.1% of US adults reporting past 30-day use of e-cigarettes in 202014. Despite the significant increase in the use of non-traditional tobacco products, important knowledge gaps on their health effects remain, and several studies have reported mixed results on the association of these non-traditional tobacco products and CVD risk14,15.

The use of longitudinal data such as the Population Assessment of Tobacco and Health (PATH) has been instrumental in studying the potential health effects of newer tobacco products such as e-cigarettes15,16. The PATH study is, however, limited by self-reported, non-adjudicated outcomes that could result in misclassification, short follow-up period, and the low prevalence of non-cigarette tobacco product use17. Given the relatively low prevalence of non-traditional tobacco products in individual prospective cohort studies, the synthesis of various datasets can lead to the construction of high-powered and phenotypically diverse databases of unparalleled size. Therefore, prioritizing data synthesis from multiple existing cohorts can offset the financial, technical, and time constraints related to developing new well-powered studies, which supported the rise of large consortia like the Cross-Cohort Collaboration (CCC)18.

The CCC was instituted to develop the infrastructure, policies, and design procedures for harmonization and eventual data sharing for the purpose of studying chronic disease epidemiology. The objective of the tobacco working group arm of the CCC is to provide additional insight into the cardiovascular health implications of non-cigarette tobacco product use with an emphasis on subclinical and clinical CVD.

The 2016 Tobacco Deeming rule extended the regulatory authority of the US Food and Drug Administration (FDA) to include the manufacturing, marketing, and distribution of non-cigarette tobacco products, including e-cigarettes, pipe tobacco, cigars, hookah/waterpipe tobacco, and e-liquids19. The CCC-Tobacco, which is partly supported by the Tobacco Centers of Regulatory Science (TCORS) program, funded by the Center for Tobacco Products of the FDA, seeks to inform the regulatory efforts of the agency directed towards non-traditional tobacco products. The CCC-Tobacco received ethical approval from the Johns Hopkins institutional review board. This article describes the design and methodology for creating and harmonizing the CCC-Tobacco dataset and presents the distribution of baseline sociodemographic characteristics and tobacco exposure in CCC-Tobacco.

METHODS

Cohorts that comprise the CCC-Tobacco

Twenty-three prospective observational cohort studies in the US and Brazil with baseline and follow-up data on tobacco use have currently provided de-identified individual-level data to the CCC-Tobacco. These include eleven landmark studies which were originally designed to study CVD epidemiology (i.e. traditional cardiovascular cohorts): Atherosclerosis Risk in Communities (ARIC) Study, Coronary Artery Risk Development in Young Adults (CARDIA) Study, Cardiovascular Health Study (CHS), Dallas Heart Study (DHS), Framingham Heart Study (FHS; original, offspring, and third-generation cohorts), Hispanic Community Health Study/Study of Latinos (HCHS/SOL), Jackson Heart Study (JHS), Multi-Ethnic Study of Atherosclerosis (MESA), Multiple Risk Factor Intervention Trial (MRFIT), the Reasons for Geographic and Racial Differences in Stroke Study (REGARDS), and the Strong Heart Study (SHS). The other cohorts included in the CCC-Tobacco (non-cardiovascular specific cohorts) are: Baltimore Longitudinal Study of Aging (BLSA); Chronic Renal Insufficiency Cohort (CRIC); the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil); Genetics of Lipid Lowering Drugs and Diet Network (GOLDN); the Health, Aging and Body Composition Study (Health ABC); the Osteoporotic Fractures in Men Study (MrOS); Rancho Bernardo Study (RBS) of Healthy Aging; Study of Osteoporotic Fractures (SOF); the Study of Women’s Health Across the Nation (SWAN); and Women’s Health Initiative (WHI). Characteristics of participating cohorts and their geographical distribution are presented in Table 1 and Figure 1, respectively. Additional details, including study-specific rationale, design, funding, and protocols, and appropriate links to background reading, are given in Supplementary file Table 1. Additionally, the contribution of each participating cohort to the whole CCC-Tobacco dataset is given in Supplementary file Figure 1.

Table 1

Characteristics of the twenty-three participating cohorts of the Cross-Cohort Collaboration-Tobacco dataset

| Participating cohorts | Cohort population and description | Enrollment years |
|---|---|---|
| Traditional cardiovascular cohorts | | |
| Atherosclerosis Risk in Communities Study (ARIC) | 15792 in 4 US communities aged 45–64 years | 1987 |
| Coronary Artery Risk Development in Young Adults (CARDIA) | 5115 at 4 US field centers aged 18–30 years | 1985–86 |
| Cardiovascular Health Study (CHS) | 5888 adults aged ≥65 years in 4 US communities | 1989–1999 |
| Dallas Heart Study (DHS) | 6101 from multi-ethnic cohort of Dallas County | 2000 |
| Framingham Heart Study (FHS) | 5209 adult population of Framingham Massachusetts aged 30–62 years (original cohort) | 1948 |
| | Offspring cohort: 5124 adult children of the original cohort and their spouses aged 30–74 years | 1971 |
| | FHS 3rd Gen: 4095 men and women aged >19 years with ≥one parent in the offspring study | 2002 |
| Hispanic Community Health Study/Study of Latinos (HCHS-SOL) | 16000 persons of Hispanic/Latino origin from 4 US field centers aged 18–74 years | 2006 |
| Jackson Heart Study (JHS) | 5306 community-based African Americans from 3 counties in Jackson MS aged 35–84 years | 2000–2004 |
| Multi-Ethnic Study of Atherosclerosis (MESA) | More than 6000 multi-ethnic men and women from 6 communities in the US aged 45–84 years | 2000–2002 |
| The Multiple Risk Factor Intervention Trial (MRFIT) | 12866 men aged 35–57 years enrolled in coronary heart disease intervention trial | 1972 |
| Reasons for Geographic and Racial Differences in Stroke (REGARDS) | 30239 employed men and women aged ≥45 years | 2003 |
| Strong Heart Study (SHS) | 4500 American Indian tribal members aged 35–74 years | 1988 |
| Non-cardiovascular specific cohorts | | |
| Baltimore Longitudinal Study of Aging (BLSA) | >3000 men and women aged >20 years | 1958 |
| Chronic Renal Insufficiency Cohort Study (CRIC) | 3939 with chronic kidney disease (1560 older adults during third phase) | 2001–2013 (I & II); 2013–2015 (III) |
| Brazilian Longitudinal Study of Adult Health (ELSA-Brasil) | 15000 active and retired civil servants from teaching and research institutions aged 35–74 years | 2008 |
| Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) | 1200 white family members from 2 genetically homogeneous US centers aged >18 years | 2004–2006 |
| Health Aging and Body Composition Study (Health ABC) | 3075 community-dwelling in Memphis TN or Pittsburgh PA and aged 70–79 years | 1997 |
| The Osteoporotic Fractures in Men Study (MrOS) | 6000 senior men ≥65 years from 6 US communities | 2000 |
| Rancho Bernardo Study (RBS) of Healthy Aging | 6339 Community based cohort of all residents of Rancho Bernardo | 1972–1974 |
| The Study of Osteoporotic Fractures (SOF) | 10366 older women aged ≥65 years | 1986 |
| Study of Women’s Health Across the Nation (SWAN) | 3302 women in longitudinal study of women’s health in 7 US research centers | 1996–1997 |
| Women’s Health Initiative (WHI) | 161808 postmenopausal women aged 50–79 years | 1993 |
| Total population (N) | 322782 | 1948–2015 |

Figure 1

Geographical distribution of the participating cohorts’ investigation sites in the Cross-Cohort Collaboration-Tobacco dataset

https://www.tobaccoinduceddiseases.org/f/fulltexts/166517/TID-21-89-g001_min.jpg

Most of the studies began recruiting participants between 1948 and 2008. Four of the cardiovascular studies (ARIC, CARDIA, DHS, MESA) specifically recruited participants from different racial groups, and three were designed to primarily study specific racial or ethnic groups (Hispanic/Latino participants in HCHS/SOL, Black participants in JHS, and Indigenous participants in SHS). The WHI is one of the largest women’s health projects ever launched in the US, having enrolled more than 161000 women aged 50–79 years at 40 clinical centers. The main areas of research were CVD, cancer, and osteoporotic fractures in postmenopausal women.

All the cohorts have extensive data on participants’ baseline sociodemographic characteristics, and gather data on participant tobacco use behaviors, although this varies in scope and detail. Many cohorts that comprise the CCC-Tobacco have collected detailed information on participants’ health and behavior for as long as fifty years of follow-up. Twenty-one cohorts (except ELSA-Brasil and SOF) ascertain CVD including myocardial infarction, stroke, atrial fibrillation, and heart failure, and several cohorts report measures of subclinical cardiovascular injury including measures of inflammation, coronary artery calcium (CAC), carotid plaque, carotid intima-media thickness (cIMT), pulse-wave velocity, and ankle-brachial index.

Participants

Cohort participants previously provided informed consent for in-person, telephone, and/or email contact and for the abstraction of medical records. The institutional review board at each research center approved the study protocol for each cohort. The twenty-three cohorts in the consortium provided data from approximately 322000 participants. All forty-eight continental US states are represented among CCC-Tobacco participants, including rural, suburban, and urban communities (Figure 1). In all, the cohorts included in the CCC-Tobacco have been or are being conducted across approximately forty field/clinical centers. One cohort with extensive geographical reach, the REGARDS, operates via telephone and in-home exams only.

CCC-Tobacco variable domains

We requested and obtained individual-level de-identified data from all participating studies based on the following variable list. Baseline characteristics included sociodemographic variables such as age, sex, race/ethnicity, study site, education status, and income level. Past medical history, family history, and anthropometric variables including body mass index (BMI) were also requested. Measured cardiometabolic parameters including systolic blood pressure (SBP), diastolic blood pressure (DBP), total cholesterol (TC), high-density lipoprotein (HDL) cholesterol, low density lipoprotein (LDL) cholesterol, lipoprotein a [Lp(a)], and triglycerides data were requested. Data on the use of lipid-lowering therapy, anti-hypertensive therapy, anti-hyperglycemic medications, and anti-platelet medications were also collected.

Furthermore, self-reported health behaviors such as physical activity, diet, and the use of traditional and non-traditional tobacco products were requested from all the cohorts. Comorbidities were defined as follows. Obesity was defined as BMI ≥30 kg/m². Hypertension was defined as SBP ≥140 mmHg, DBP ≥90 mmHg, or use of antihypertensive medications. Diabetes was defined as a fasting blood glucose level ≥126 mg/dL, previous diagnosis of diabetes (treated or untreated), or use of antidiabetic medications. Dyslipidemia was defined as the presence of any one of the following: 1) TC >240 mg/dL; 2) triglycerides >200 mg/dL; 3) HDL-C <50 mg/dL (female) or <40 mg/dL (male); 4) LDL-C >160 mg/dL; or 5) the use of lipid-lowering therapies. Hyperlipidemia was defined as any one of: 1) TC >240 mg/dL; 2) triglycerides >200 mg/dL; or 3) LDL-C >160 mg/dL.
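The comorbidity definitions above can be expressed as simple classification rules. The following Python sketch is illustrative only: the `Participant` record, its field names, and the choice to treat a missing measurement as "criterion not met" are assumptions made for this example and are not taken from the CCC-Tobacco codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    """Hypothetical harmonized record; field names are assumptions."""
    sex: str                                 # "female" or "male"
    bmi: Optional[float] = None              # kg/m^2
    sbp: Optional[float] = None              # mmHg
    dbp: Optional[float] = None              # mmHg
    fasting_glucose: Optional[float] = None  # mg/dL
    tc: Optional[float] = None               # total cholesterol, mg/dL
    tg: Optional[float] = None               # triglycerides, mg/dL
    hdl: Optional[float] = None              # HDL-C, mg/dL
    ldl: Optional[float] = None              # LDL-C, mg/dL
    on_bp_med: bool = False
    prior_diabetes_dx: bool = False
    on_diabetes_med: bool = False
    on_lipid_med: bool = False


def _ge(value: Optional[float], cutoff: float) -> bool:
    # Assumption: a missing value never satisfies a criterion.
    return value is not None and value >= cutoff


def _gt(value: Optional[float], cutoff: float) -> bool:
    return value is not None and value > cutoff


def _lt(value: Optional[float], cutoff: float) -> bool:
    return value is not None and value < cutoff


def has_obesity(p: Participant) -> bool:
    return _ge(p.bmi, 30)


def has_hypertension(p: Participant) -> bool:
    return _ge(p.sbp, 140) or _ge(p.dbp, 90) or p.on_bp_med


def has_diabetes(p: Participant) -> bool:
    return _ge(p.fasting_glucose, 126) or p.prior_diabetes_dx or p.on_diabetes_med


def has_hyperlipidemia(p: Participant) -> bool:
    return _gt(p.tc, 240) or _gt(p.tg, 200) or _gt(p.ldl, 160)


def has_dyslipidemia(p: Participant) -> bool:
    # Sex-specific HDL-C cutoff: <50 mg/dL (female), <40 mg/dL (male).
    low_hdl = _lt(p.hdl, 50 if p.sex == "female" else 40)
    return has_hyperlipidemia(p) or low_hdl or p.on_lipid_med
```

Under these rules, hyperlipidemia is a subset of dyslipidemia: a participant with low HDL-C or on lipid-lowering therapy, but with TC, triglycerides, and LDL-C below the cutoffs, is classified as having dyslipidemia but not hyperlipidemia.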

Participating studies provided baseline and longitudinal data over multiple study visits on the use of cigarettes, cigars, pipes, smokeless tobacco, and e-cigarettes, as well as secondhand smoke exposure. Data on the intensity and duration of exposure including tobacco-product use-years and usage per day were also collected when available. Additionally, data on the patterns and changes in tobacco use over time such as poly-product use, product switching, and quitting were collected.

Biomarkers of subclinical cardiovascular injury based on three domains – subclinical inflammation, thrombosis, and atherosclerosis – were collected. Inflammatory biomarkers included high-sensitivity C-reactive protein (hsCRP) and interleukin-6. Thrombosis biomarkers included fibrinogen and D-dimer. Measures of atherosclerosis included CAC, carotid plaque, cIMT test readings, pulse-wave velocity, and ankle-brachial index. The most recent data on cardiovascular outcomes were requested from each participating study. The outcomes included cardiovascular events (myocardial infarction, stroke, atrial fibrillation, heart failure) and mortality (coronary, cardiovascular, and all-cause). Furthermore, harmonized time-to-event variables will be constructed for the purpose of future survival analysis.

Data acquisition and transfer

The data acquisition process began with establishing contact with the designated representative of each cohort, who then advised on the preferred mode of data transfer for the cohort. For most of the studies, we then submitted a study proposal, which was peer reviewed and either approved by the cohort administrators or returned with a request for changes. Upon approval of the proposal, data use agreements were completed and signed. Subsequently, data variable lists were sent to each study contact person. For studies like the FHS, data were obtained from the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). Finally, the datasets of several cohorts, including the RBS and Health ABC, were downloaded directly from the study website. Upon transfer, datasets were stored in a secure encrypted cloud space (SafeDesktop) at the Johns Hopkins University School of Medicine. The process of data acquisition is summarized in Figure 2.

Figure 2

Steps in the data acquisition process for the Cross-Cohort Collaboration-Tobacco dataset

https://www.tobaccoinduceddiseases.org/f/fulltexts/166517/TID-21-89-g002_min.jpg

Data harmonization

Data management and harmonization were conducted centrally at the Johns Hopkins University School of Medicine. Upon receipt of the datasets, data were checked for missing variables and other inconsistencies, after which the data providers for the respective study were queried. A variable was harmonized only if it had been provided by more than one study. Our harmonization techniques were informed by Maelstrom, a McGill University-based group at the forefront of innovative methodological approaches to harmonization. Maelstrom published the first harmonization guidelines and pioneered tools to facilitate documentation, harmonization, and integration20. Additionally, we iteratively learned from the data harmonization methods used for the Trans-Omics for Precision Medicine (TOPMed) project21, an NHLBI-funded effort to couple whole-genome sequencing (WGS) and other Omics data (e.g. DNA methylation signature, RNA expression, and metabolite profiles) with molecular, behavioral, imaging, environmental, and clinical data. We also leveraged some of the techniques applied in the Lifetime Risk Pooling Project (LRPP)22, which combines 20 US community cohorts in a life course study, and the International Collaboration for a Life Course Approach to Women’s Reproductive Health and Chronic Disease Events (InterLACE)23, which harmonized 20 cohorts across ten countries. Figure 3 provides a simplified schematic framework of the current data harmonization process.

Figure 3

Data harmonization plan for the Cross-Cohort Collaboration-Tobacco dataset

https://www.tobaccoinduceddiseases.org/f/fulltexts/166517/TID-21-89-g003_min.jpg
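As an illustration of two steps in the harmonization process described above (selecting variables supplied by more than one study, and recoding a cohort-specific variable onto a common scale), a minimal Python sketch follows. The cohort labels, raw codes, and function names are hypothetical and do not reflect the actual codebooks of any participating study.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Set

# Hypothetical cohort-specific codings for cigarette smoking status.
SMOKING_CODEBOOK: Dict[str, Dict[Hashable, str]] = {
    "COHORT_A": {0: "never", 1: "former", 2: "current"},
    "COHORT_B": {"N": "never", "Q": "former", "Y": "current"},
}


def variables_to_harmonize(provided: Dict[str, Set[str]]) -> List[str]:
    """Return the variables supplied by more than one cohort, sorted by name."""
    counts: Dict[str, int] = defaultdict(int)
    for variables in provided.values():
        for name in variables:
            counts[name] += 1
    return sorted(name for name, n in counts.items() if n > 1)


def harmonize_smoking(cohort: str, raw_value: Hashable) -> Optional[str]:
    """Map a raw code to never/former/current; None flags an unmapped value."""
    return SMOKING_CODEBOOK.get(cohort, {}).get(raw_value)
```

Returning `None` for an unmapped code, rather than guessing a category, keeps unexpected values visible so that they can be queried with the data provider.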

Statistical analysis

The association between smoking and CVD will be analyzed using survival analysis (Cox proportional hazards model). To study the association of tobacco use transitions with CVD outcomes, our team has pioneered an approach that divides each participant’s experience into ‘person-trials’ reflecting tobacco use exposures accruing between each study visit. We have used this approach in one of our peer-reviewed publications16. This technique uses a variation of latent class mixed models (LCMM).
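To make the person-trial idea concrete, the sketch below splits one participant's follow-up into intervals between consecutive study visits and records the tobacco use status at each end of the interval. It is a simplified illustration under stated assumptions, not the authors' implementation: the `Visit` and `PersonTrial` structures, the rule that an interval is closed at the event time, and the omission of events occurring after the last visit are all choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    year: float   # time since baseline, in years
    status: str   # e.g. "never", "former", "current"


@dataclass
class PersonTrial:
    participant_id: str
    start: float
    end: float
    status_at_start: str
    status_at_end: str
    event: bool   # True if the outcome occurred within this interval


def person_trials(participant_id: str,
                  visits: List[Visit],
                  event_year: Optional[float] = None) -> List[PersonTrial]:
    """Split follow-up into one person-trial per pair of consecutive visits."""
    ordered = sorted(visits, key=lambda v: v.year)
    trials: List[PersonTrial] = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if event_year is not None and event_year <= prev.year:
            break  # the event preceded this interval; no further trials
        had_event = event_year is not None and prev.year < event_year <= nxt.year
        end = event_year if had_event else nxt.year
        trials.append(PersonTrial(participant_id, prev.year, end,
                                  prev.status, nxt.status, had_event))
        if had_event:
            break  # follow-up stops at the first event
    return trials
```

For example, a participant seen at years 0, 3, and 6 who reports current, current, and former smoking, and who has an event at year 5, contributes two person-trials: one event-free interval from year 0 to 3, and one interval from year 3 to 5 that ends with the event.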

Preliminary results

The CCC-Tobacco includes approximately 322000 participants from 23 predominantly NHLBI-funded prospective cohort studies. The baseline characteristics of the study participants are presented in Table 2. The mean age ± SD at baseline examination for the combined cohort is 59.7 ± 11.8 years and about three-quarters of the participants are women (76%). The CARDIA and FHS Offspring studies have relatively younger participants, with mean ages of 29.9 ± 3.6 and 36.8 ± 9.9 years, respectively, and the oldest cohorts are CHS and Health ABC, with mean ages of approximately 73 years at baseline. The overall population is predominantly White (73.1%); the rest of the cohort is 15.6% African American, 6.4% Hispanic/Latino, 1.8% Asian, and 2.8% American Indian or Alaska Native. Almost all the participants enrolled in the FHS are White, and MESA is a racially and ethnically diverse group with 38.5% White, 27.8% African American, 12% Chinese American, and 22% Hispanic/Latino participants. About one-fifth of the entire cohort (22.8%) completed high school education while 64.9% have at least some college education, with considerable variation across the cohorts.

Table 2

Baseline characteristics across the thirteen traditional cardiovascular cohorts in the Cross-Cohort Collaboration-Tobacco dataset (Part 1)

| Characteristics | ARIC | CARDIA | CHS | DHS | FHS original | FHS offspring | FHS 3rd generation | HCHS-SOL | JHS | MESA | MRFIT | REGARDS | SHS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample size | 15776 (4.89) | 4341 (1.34) | 5882 (1.61) | 2415 (0.75) | 3885 (1.20) | 4838 (1.50) | 4061 (1.26) | 16322 (5.06) | 5218 (1.62) | 6792 (2.10) | 12866 (3.99) | 30067 (9.31) | 3389 (1.05) |
| Age (years) | 54.2 ± 5.76 | 29.9 ± 3.6 | 72.7 ± 5.6 | 43.4 ± 10.5 | 55.5 ± 8.4 | 36.8 ± 9.89 | 40.1 ± 8.8 | 45.8 ± 13.9 | 54.8 ± 12.8 | 62.1 ± 10.2 | 46.2 ± 5.9 | 64.8 ± 9.42 | 56.5 ± 8.16 |
| Female | 8710 (55.1) | 2393 (54.9) | 3390 (57.6) | 1388 (57.4) | 2296 (51.6) | 2509 (51.6) | 2168 (53.3) | 9790 (59.9) | 3367 (63.4) | 3601 (52.5) | 0 (0.0) | 16567 (55.1) | 1982 (58.4) |
| Race/ethnicity | | | | | | | | | | | | | |
| White | 11478 (72.7) | 2224 (51.3) | 4922 (84.0) | 777 (32.1) | 3885 (100) | 4838 (100) | 4061 (100) | 0 | 0 | 2615 (38.5) | 11559 (89.8) | 17614 (58.5) | 0 |
| African American | 4258 (27.0) | 2117 (48.7) | 921 (15.7) | 1248 (51.6) | 0 | 0 | 0 | 0 | 5128 (100) | 1879 (27.8) | 931 (7.2) | 12453 (41.4) | 0 |
| Asian | 34 (0.2) | 0 | 4 (0.07) | 0 | 0 | 0 | 0 | 0 | 0 | 802 (11.8) | 0 | 0 | 0 |
| Hispanic/Latino | 0 | 0 | 0 | 344 (14.2) | 0 | 0 | 0 | 16322 (100) | 0 | 1496 (21.9) | 0 | 0 | 0 |
| American Indian or Alaskan | 14 (0.1) | 0 | 15 (0.3) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3389 (100) |
| Other | 0 | 0 | 0 | 46 (1.9) | 0 | 0 | 0 | 0 | 0 | 0 | 376 (2.9) | 0 | 0 |
| Education | | | | | | | | | | | | | |
| High school | 3767 (23.9) | 197 (4.5) | 1730 (29.5) | 384 (15.9) | 1620 (41.0) | 304 (8.9) | 22 (0.65) | 6189 (38.0) | 973 (18.4) | 1225 (18.0) | 2083 (16.2) | 3772 (12.5) | 1438 (42.4) |
| High school completed | 6412 (40.7) | 2160 (49.8) | 1583 (27.0) | 717 (29.7) | 1204 (30.5) | 1719 (50.3) | 465 (13.7) | 4169 (25.6) | 1065 (20.1) | 1236 (18.2) | 2685 (20.9) | 7775 (25.8) | 955 (28.2) |
| College degree | 5586 (35.4) | 1983 (45.7) | 2552 (43.5) | 1313 (54.3) | 1123 (28.4) | 1396 (40.8) | 2897 (85.6) | 5927 (36.4) | 3248 (61.4) | 4330 (63.8) | 8035 (62.7) | 18496 (61.5) | 993 (29.3) |
| Alcohol use | 8768 (55.8) | 3606 (83.3) | 2924 (49.9) | 1707 (70.8) | 2892 (71.4) | 4166 (86.6) | 3367 (82.9) | 7733 (47.4) | 2419 (45.8) | 3749 (68.4) | 11897 (92.5) | 10999 (37.3) | 0 (0) |
| Health status | | | | | | | | | | | | | |
| BMI (kg/m²) | 27.7 ± 5.37 | 26.1 ± 5.9 | 26.6 ± 4.7 | 9.5 ± 7.0 | 25.8 ± 4.1 | 25.2 ± 4.31 | 26.9 ± 5.6 | 29.7 ± 6.0 | 54.8 ± 12.8 | 28.3 ± 5.47 | 27.7 ± 3.4 | 29.3 ± 6.2 | 30.4 ± 6.0 |
| Hypertension | 5506 (34.9) | 195 (4.5) | 3886 (66.1) | 810 (33.6) | 1877 (48.4) | 951 (19.7) | 673 (16.6) | 4446 (27.2) | 3078 (59.1) | 3285 (48.4) | 11788 (91.6) | 17782 (59.2) | 1270 (37.6) |
| Systolic BP (mmHg) | 121 ± 18 | 108 ± 11 | 137 ± 22 | 124 ± 17 | 138 ± 23 | 122 ± 16 | 117 ± 14 | 122 ± 18 | 127 ± 17 | 126 ± 21 | 148 ± 15 | 128 ± 17 | 126 ± 19 |
| Diastolic BP (mmHg) | 74 ± 11 | 69 ± 10 | 71 ± 12 | 79 ± 10 | 85 ± 11 | 79 ± 11 | 75 ± 10 | 73 ± 11 | 76 ± 9 | 72 ± 10 | 99 ± 7 | 76 ± 10 | 76 ± 10 |
| BP medication | 4004 (29.9) | 70 (1.6) | 2787 (47.5) | 498 (20.6) | 377 (9.3) | 160 (3.3) | 343 (8.5) | 2645 (16.5) | 2454 (52.4) | 2536 (37.2) | 2488 (19.4) | 15490 (53.6) | 376 (18.4) |
| Diabetes | 1561 (9.9) | 83 (1.92) | 925 (16.0) | 220 (9.2) | 171 (4.2) | 91 (1.9) | 123 (3.0) | 3235 (20.1) | 1242 (23.7) | 859 (12.6) | 711 (5.6) | 6378 (22.0) | 1387 (41.2) |
| Dyslipidemia | 8986 (57.8) | 1379 (32.5) | 2955 (50.7) | 1086 (50.5) | 2355 (60.1) | 2221 (46.5) | 1557 (38.3) | 9740 (60.2) | 2828 (57.6) | 3787 (55.7) | 10391 (80.8) | 19105 (65.4) | 2118 (63.5) |
| HPL | 5771 (37.1) | 411 (9.7) | 1947 (33.4) | 378 (17.6) | 2346 (59.8) | 1135 (23.8) | 671 (16.5) | 4492 (27.8) | 1124 (23.3) | 1559 (22.9) | 8970 (69.7) | 6802 (23.6) | 882 (26.5) |
| HTG | 4269 (27.4) | 330 (7.8) | 1816 (31.2) | 496 (23.1) | 652 (38.8) | 684 (14.3) | 846 (20.8) | 5070 (31.4) | 776 (16.1) | 1988 (29.3) | 6903 (53.67) | 8004 (27.8) | 1112 (33.4) |
| HPL medication | 448 (2.9) | 11 (0.2) | 132 (2.2) | 151 (6.2) | 46 (1.1) | 27 (0.6) | 273 (6.7) | 1981 (12.4) | 721 (13.7) | 1100 (16.1) | 159 (1.2) | 9977 (33.5) | 12 (0.59) |
| LDL-C (mg/dL) | 137.6 ± 39.3 | 108.5 ± 32.0 | 129.8 ± 35.6 | 106.8 ± 34.9 | - | 128.6 ± 37.2 | 111.7 ± 31.4 | 122.6 ± 36.6 | 126.6 ± 36.6 | 117.2 ± 31.4 | 160.0 ± 36.0 | 113.9 ± 34.8 | 110.3 ± 31.9 |
| Total cholesterol (mg/dL) | 214.9 ± 42.0 | 178.1 ± 34.3 | 211.2 ± 39.2 | 181.0 ± 38.4 | 252.9 ± 48.3 | 200.2 ± 40.0 | 188.8 ± 35.5 | 199.2 ± 44.1 | 199.3 ± 40.1 | 194.1 ± 35.7 | 240.4 ± 36.8 | 192.0 ± 40.1 | 195.4 ± 39.6 |
| HDL-C (mg/dL) | 214.9 ± 42.0 | 53.3 ± 14.1 | 54.1 ± 15.7 | 50.4 ± 14.7 | - | 51.8 ± 16.2 | 54.3 ± 16.1 | 49.2 ± 13.0 | 51.8 ± 14.6 | 50.9 ± 14.8 | 42.0 ± 11.7 | 51.7 ± 16.1 | 46.1 ± 13.8 |
| Triglycerides (mg/dL) | 131.8 (79–157) | 80.8 (46–94) | 139.6 (86–161) | 122.6 (67–145) | 152 (98–178) | 100 (56–120) | 116 (65–138) | 139.7 (80–166) | 106.4 (65–126) | 132 (78–161) | 194 (113–228) | 132 (81–158) | 150 (82–172) |

Data are presented as frequency and percentage n (%), mean ± standard deviation, or mean (range). ARIC: Atherosclerosis Risk in Communities Study. CARDIA: Coronary Artery Risk Development in Young Adults Study. CHS: Cardiovascular Health Study. DHS: Dallas Heart Study. FHS: Framingham Heart Study. HCHS/SOL: Hispanic Community Health Study/Study of Latinos. JHS: Jackson Heart Study. MESA: Multi-Ethnic Study of Atherosclerosis. MRFIT: Multiple Risk Factor Intervention Trial. REGARDS: the Reasons for Geographic and Racial Differences in Stroke Study. SHS: the Strong Heart Study. BMI: body mass index. BP: blood pressure. HPL: hyperlipidemia. HTG: hypertriglyceridemia. LDL-C: low density lipoprotein cholesterol. HDL-C: high density lipoprotein cholesterol.

Table 2

Baseline characteristics across the ten non-cardiovascular cohorts in the Cross-Cohort Collaboration-Tobacco dataset (Part 2)

| Characteristics | BLSA | CRIC | ELSA-Brasil | GOLDN | Health ABC | MrOS | RBS | SOF | SWAN | WHI | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample size | 1788 (0.55) | 4917 (1.52) | 15104 (4.68) | 958 (0.30) | 3070 (0.95) | 5993 (1.86) | 2475 (0.77) | 9673 (3.00) | 3270 (1.01) | 159682 (49.47) | 322782 (100) |
| Age (years) | 65.7 ± 15.1 | 59.1 ± 10.7 | 52.0 ± 9.0 | 48.2 ± 16.4 | 73.6 ± 2.87 | 73.6 ± 5.87 | 70.1 ± 11.0 | 71.6 ± 5.22 | 45.8 ± 2.6 | 63.2 ± 7.2 | 59.7 ± 11.8 |
| Female | 924 (51.6) | 2121 (43.1) | 8218 (54.41) | 506 (52.8) | 1582 (51.5) | 0 (0.00) | 1382 (55.8) | 9673 (100) | 3270 (100.0) | 159682 (100) | 245405 (76.0) |
| Race/ethnicity | | | | | | | | | | | |
| White | 1268 (74.2) | 2010 (42.4) | - | 942 (98.3) | 1792 (58.3) | 5384 (89.8) | 2448 (100) | 9640 (100) | 1546 (47.2) | 135962 (85.1) | 224957 (73.1) |
| African American | 405 (23.7) | 2130 (44.9) | - | 0 | 1278 (41.6) | 244 (4.0) | 0 | 0 | 914 (27.9) | 14019 (8.8) | 48015 (15.6) |
| Asian | 30 (1.7) | 0 | - | 2 (0.2) | 0 | 192 (3.2) | 0 | 0 | 529 (16.1) | 4002 (2.5) | 5595 (1.8) |
| Hispanic/Latino | 0 | 601 (12.6) | - | 10 (1.0) | 0 | 98 (1.6) | 0 | 0 | 281 (8.5) | 525 (0.3) | 19677 (6.4) |
| American Indian or Alaskan | 5 (0.2) | 0 | - | 0 | 0 | 68 (1.1) | 0 | 0 | 0 | 5174 (3.2) | 8665 (2.8) |
| Other | 0 | 0 | - | 4 (0.4) | 0 | 7 (0.1) | 0 | 0 | 0 | 0 | 433 (0.1) |
| Education | | | | | | | | | | | |
| High school | 17 (0.9) | 1016 (20.6) | 1921 (12.7) | - | 774 (25.2) | 393 (6.5) | 149 (6.1) | 2211 (22.9) | 233 (7.1) | 8463 (5.3) | 38774 (12.1) |
| High school completed | 148 (8.3) | 916 (18.6) | 5233 (34.6) | - | 997 (32.5) | 1036 (17.2) | 613 (25.2) | 3797 (39.3) | 573 (17.6) | 27291 (17.2) | 72670 (22.8) |
| College degree | 1612 (90.7) | 2983 (60.6) | 7950 (52.6) | - | 1292 (42.1) | 4564 (76.1) | 1671 (68.6) | 3636 (37.7) | 2433 (75.1) | 122751 (77.4) | 206665 (64.9) |
| Alcohol use | 1472 (82.5) | 3090 (62.8) | 7244 (47.9) | 479 (50.0) | 3067 (100) | 3865 (64.6) | 2188 (90.5) | 6759 (69.9) | 1335 (47.4) | - | 93518 (59.5) |
| Health status | | | | | | | | | | | |
| BMI (kg/m²) | 27.0 ± 4.9 | 32.2 ± 7.6 | - | 28.2 ± 5.7 | 27.3 ± 4.8 | 27.3 ± 3.8 | 24.8 ± 3.6 | 26.4 ± 4.4 | 28.2 ± 7.2 | 27.9 ± 5.9 | 28.1 ± 5.8 |
| Hypertension | 692 (39.5) | 4275 (86.9) | 5584 (37.0) | 241 (25.2) | 2112 (68.8) | 4205 (70.9) | 1471 (59.6) | 6052 (62.59) | 780 (23.9) | 53578 (33.8) | 134537 (41.9) |
| Systolic BP (mmHg) | 118 ± 16 | 128 ± 22 | 124 ± 17 | 115 ± 17 | 136 ± 21 | 139 ± 19 | 139 ± 22 | 142 ± 19 | 118 ± 17 | 127 ± 18 | 128 ± 19 |
| Diastolic BP (mmHg) | 66 ± 9 | 71 ± 13 | 74 ± 10 | 68 ± 9 | 71 ± 12 | - | 76 ± 9 | 76 ± 9 | 75 ± 10 | 75 ± 9 | 76 ± 11 |
| BP medication | 632 (35.6) | 4188 (97.0) | 4411 (29.2) | 198 (20.6) | 0 (0.0) | 3053 (50.9) | 776 (37.6) | 2644 (30.3) | 463 (14.2) | 19459 (12.2) | 70268 (22.4) |
| Diabetes | 268 (15.0) | 2464 (50.5) | 3009 (19.9) | 73 (7.6) | 1012 (32.9) | 881 (15.7) | 257 (10.4) | 681 (7.0) | 126 (4.1) | 9442 (5.9) | 4130 (9.2) |
| Dyslipidemia | 838 (50.7) | 3920 (85.1) | 8157 (54.0) | 551 (57.6) | 1385 (45.6) | 3309 (58.5) | 1072 (43.6) | 551 (73.1) | 1406 (43.3) | 15831 (9.9) | 23113 (52.2) |
| HPL | 232 (14.8) | 1086 (27.7) | 5233 (34.7) | 236 (24.7) | 847 (27.9) | 1475 (26.7) | 909 (37.1) | 450 (59.7) | 573 (17.6) | - | 13017 (29.5) |
| HTG | 226 (14.4) | 1479 (37.7) | 4635 (30.7) | 302 (31.6) | 918 (30.2) | 2001 (36.2) | 562 (22.9) | 368 (48.8) | 572 (18.6) | - | 43951 (29.8) |
| HPL medication | 528 (58.8) | 3006 (61.6) | 1978 (13.1) | 144 (15.1) | 437 (14.3) | 1540 (25.7) | 16 (0.8) | - | 34 (1.1) | 15831 (9.9) | 35531 (12.5) |
| LDL-C (mg/dL) | 109.6 ± 32.2 | 102.7 ± 35.5 | - | 121.1 ± 30.9 | 121.5 ± 34.6 | 114.1 ± 30.9 | 134.5 ± 36.8 | 152.0 ± 36.1 | 116.0 ± 30.9 | - | 124.3 ± 38.3 |
| Total cholesterol (mg/dL) | 189.8 ± 36.4 | 183.7 ± 45.5 | - | 189.7 ± 38.6 | 202.7 ± 38.5 | 193.2 ± 34.2 | 219.3 ± 40.4 | 239.1 ± 40.1 | 194.5 ± 34.8 | - | 202.2 ± 46.9 |
| HDL-C (mg/dL) | 59.6 ± 17.0 | 47.5 ± 15.4 | - | 46.9 ± 13.1 | 54.0 ± 17.0 | 48.9 ± 14.6 | 61.7 ± 18.7 | 53.1 ± 14.8 | 55.9 ± 14.5 | - | 50.6 ± 15.6 |
| Triglycerides (mg/dL) | 102 (66–122) | 157 (89–186) | - | 136 (73–171) | 138 (88–163) | 151 (91–179) | 119 (69–145) | 172 (106–207) | 113 (67–131) | - | 137 (79–163) |

Data are presented as frequency and percentage n (%), mean ± standard deviation, or mean (range). BLSA: Baltimore Longitudinal Study of Aging. CRIC: Chronic Renal Insufficiency Cohort. ELSA-Brasil: the Brazilian Longitudinal Study of Adult Health. GOLDN: Genetics of Lipid Lowering Drugs and Diet Network. Health ABC: the Health Aging and Body Composition Study. MrOS: the Osteoporotic Fractures in Men Study. RBS: Rancho Bernardo Study of Healthy Aging. SOF: Study of Osteoporotic Fractures. SWAN: the Study of Women’s Health Across the Nation. WHI: Women's Health Initiative. BMI: body mass index. BP: blood pressure. HPL: hyperlipidemia. HTG: hypertriglyceridemia. LDL-C: low density lipoprotein cholesterol. HDL-C: high density lipoprotein cholesterol.

With respect to comorbidities, 29.5% reported having a history of hyperlipidemia and 9.2% diabetes mellitus. Mean SBP and DBP are 127 ± 19 and 75 ± 11 mmHg in the overall population. Self-reported use of blood pressure medication and lipid-lowering medication are 22.4% and 12.5%, respectively.

Smoking status of participants in each of the 23 cohorts is categorized into never, former, and current, for both combustible cigarettes and non-cigarette tobacco products including cigars, pipes, smokeless tobacco, and e-cigarettes (Table 3). Overall, 46330 (14.3%) participants reported current use of combustible cigarettes and 117424 (36.4%) reported former use. The prevalence of current cigarette smoking is highest in MRFIT (63.6%) and lowest in MrOS (3.4%). Baseline characteristics of the participants based on their combustible cigarette smoking status are shown in Table 4. The mean age of individuals who reported current smoking is 53.4 ± 12.2 years, compared to 59.8 ± 12.4 years for those who never smoked and 62.1 ± 9.9 years for those who formerly smoked. The proportion of women is highest among individuals who never smoked (82.6%), followed by those who formerly smoked (75.0%), and those who currently smoke (55.9%). The prevalence of alcohol use is higher among participants who currently smoke than among those who never smoked (74.2% vs 50.4%). Similarly, the prevalence of hypertension (44.1% vs 39.9%) and hyperlipidemia (39.4% vs 27.4%) is higher in participants who currently smoke than in those who never smoked. The prevalence of diabetes is comparable, approximately 11% in both groups. Further detail on smoking status by race and ethnicity is provided in Supplementary file Table 2.

Table 3

Distribution (%) of traditional and non-traditional tobacco products across all the cohorts in the Cross-Cohort Collaboration-Tobacco dataset

Participating cohortsTraditional cigarette status
Cigar
Pipe
Smokeless tobacco
E-Cigarette*
F/U visits
NeverFormerCurrentNeverFormerCurrentNeverFormerCurrentNeverFormerCurrentNeverFormerCurrent
Cardiovascular specific cohorts
ARIC41.632.126.293.44.81.890.08.21.791.25.33.37
CARDIA57.214.128.696.03.40.697.81.90.296.72.30.892.93.93.27
CHS46.541.512.010
DHS56.417.226.393.82.63.597.12.40.396.32.11.52
FHS original44.09.846.18
FHS offspring34.919.845.294.90.74.495.60.73.69
FHS 3rd generation57.127.315.598.40.51.199.20.7094.63.91.52
HCHS-SOL60.719.819.491.96.91.22
JHS85.51.213.397.41.51.198.61.00.397.01.21.62
MESA50.336.613.190.67.41.991.67.70.698.11.30.499.30.40.26
SHS29.033.037.96
Non cardiovascular specific cohorts
BLSA605.634.190.97.21.988.111.50.37
CRIC38.646.914.578.019.03.084.112.83.018
ELSA-Brasil56.930.013.14
GOLDN70.721.97.41
Health ABC54.834.810.410
MRFIT14.421.863.610
MrOS37.559.03.410
RBS45.232.322.412
REGARDS45.240.114.688.49.42.198.022
SOF60.429.6107
SWANN/AN/AN/A15
WHI51.042.06.99
Total estimated prevalence (N)1600001170004700043000240010004000020005005800040001400370001000600

ARIC: Atherosclerosis Risk in Communities Study. CARDIA: Coronary Artery Risk Development in Young Adults Study. CHS: Cardiovascular Health Study. DHS: Dallas Heart Study. FHS: Framingham Heart Study. HCHS/SOL: Hispanic Community Health Study/Study of Latinos. JHS: Jackson Heart Study. MESA: Multi-Ethnic Study of Atherosclerosis. MRFIT: Multiple Risk Factor Intervention Trial. REGARDS: the Reasons for Geographic and Racial Differences in Stroke Study. SHS: the Strong Heart Study. BLSA: Baltimore Longitudinal Study of Aging. CRIC: Chronic Renal Insufficiency Cohort. ELSA-Brasil: the Brazilian Longitudinal Study of Adult Health. GOLDN: Genetics of Lipid Lowering Drugs and Diet Network. Health ABC: the Health Aging and Body Composition Study. MrOS: the Osteoporotic Fractures in Men Study. RBS: Rancho Bernardo Study of Healthy Aging. SOF: Study of Osteoporotic Fractures. SWAN: the Study of Women’s Health Across the Nation. WHI: Women's Health Initiative.

* E-cigarette measures are only in follow-up visits of the respective cohorts.

Table 4

Baseline characteristics across combustible cigarette smoking status in the Cross-Cohort Collaboration-Tobacco dataset

| Characteristics | Never smoker | Former smoker | Current smoker | Total |
|---|---|---|---|---|
| Sample size | 159028 (49.3) | 117424 (36.4) | 46330 (14.3) | 322782 (100) |
| Age (years) | 59.8 ± 12.4 | 62.1 ± 9.9 | 53.4 ± 12.2 | 59.7 ± 11.8 |
| Female | 131395 (82.6) | 88108 (75.0) | 25902 (55.9) | 245405 (76.0) |
| Race/ethnicity | | | | |
| White | 105907 (70.5) | 89552 (79.4) | 29498 (66.5) | 224957 (73.1) |
| African American | 24576 (16.4) | 14548 (12.8) | 8981 (20.3) | 48015 (15.62) |
| Asian | 4080 (2.7) | 1274 (1.1) | 241 (0.5) | 5595 (1.80) |
| Hispanic/Latino | 11743 (7.82) | 4348 (3.85) | 3586 (8.09) | 19677 (6.40) |
| American Indian or Alaskan | 3824 (2.5) | 3026 (2.7) | 1815 (4.1) | 8665 (2.8) |
| Other | 105 (0.07) | 138 (0.1) | 190 (0.4) | 433 (0.1) |
| Education | | | | |
| High school | 18214 (11.6) | 11781 (10.1) | 8779 (19.4) | 38774 (12.2) |
| High school completed | 36059 (23.0) | 23702 (20.0) | 12909 (28.6) | 72670 (22.8) |
| College degree | 102490 (65.4) | 80717 (69.50) | 23458 (51.9) | 206665 (64.97) |
| Alcohol use | 37867 (50.40) | 30788 (63.4) | 24863 (74.2) | 93518 (59.5) |
| Health status | | | | |
| BMI (kg/m²) | 28.2 ± 5.9 | 28.3 ± 5.8 | 27.1 ± 5.4 | 28.1 ± 5.8 |
| Hypertension | 63281 (39.9) | 50857 (43.5) | 20399 (44.1) | 134537 (41.80) |
| Systolic BP (mmHg) | 127.4 ± 18.9 | 128.3 ± 18.6 | 127.9 ± 20.2 | 127.8 ± 19.0 |
| Diastolic BP (mmHg) | 75 ± 10 | 76 ± 11 | 78 ± 13 | 76 ± 11 |
| BP medication | 34274 (22.1) | 27026 (23.6) | 8968 (20.2) | 70268 (22.4) |
| Diabetes | 16734 (10.6) | 13450 (11.5) | 4975 (10.9) | 35159 (10.9) |
| Dyslipidemia | 46987 (59.9) | 35906 (66.7) | 22472 (64.9) | 105365 (63.2) |
| HPL | 19247 (27.4) | 15014 (32.6) | 13147 (39.4) | 47402 (31.7) |
| HTG | 17590 (25.4) | 15114 (33.0) | 11247 (34.9) | 43951 (29.9) |
| HPL medication | 17823 (11.7) | 16679 (14.67) | 4029 (9.1) | 38531 (12.4) |
| LDL-C (mg/dL) | 121.8 ± 36.7 | 123.1 ± 38.2 | 131.0 ± 40.9 | 124.3 ± 38.3 |
| Total cholesterol (mg/dL) | 200.8 ± 42.9 | 202.3 ± 43.1 | 210.2 ± 46.6 | 206.2 ± 46.9 |
| HDL-C (mg/dL) | 52.4 ± 15.3 | 50.2 ± 15.7 | 47.6 ± 15.5 | 50.6 ± 15.6 |
| Triglycerides (mg/dL) | 126 (74–151) | 143 (83–171) | 149 (84–178) | 137 (79–163) |

*Data are presented as frequency and percentage n (%), mean ± standard deviation, or mean (range). BMI: body mass index. BP: blood pressure. HPL: hyperlipidemia. HTG: hypertriglyceridemia. LDL-C: low density lipoprotein cholesterol. HDL-C: high density lipoprotein cholesterol.

For the non-cigarette tobacco products, the prevalence of current use of cigar, pipe, and smokeless tobacco in the overall population is 2.1% (991), 1.2% (523), and 2.2% (1375), respectively. Data on e-cigarette use are available for FHS 3rd generation, MESA, CARDIA, REGARDS, and HCHS/SOL, with 191, 31, 219, 331, and 932 users (current and former), respectively. Table 5 shows the prevalence of non-cigarette tobacco product use status stratified by cigarette smoking status. The prevalence of cigar, pipe, and smokeless tobacco use is 2.5%, 1.2%, and 2.0%, respectively, among participants who currently smoke combustible cigarettes. The prevalence among participants who formerly smoked cigarettes is 4.5% for cigar, 4.5% for pipe, and 7.3% for smokeless tobacco use. Among individuals who had never smoked cigarettes, the prevalence of each of the non-cigarette tobacco products is <2%.

Table 5

Non-traditional tobacco use status across combustible cigarette smoking status in the Cross-Cohort Collaboration-Tobacco dataset

| Non-traditional tobacco use | Never smoker n (%) | Former smoker n (%) | Current smoker n (%) | Total n (%) |
|---|---|---|---|---|
| Cigar | | | | |
| Never | 22175 (96.2) | 10797 (85.8) | 10665 (92.6) | 43637 (92.7) |
| Former | 469 (2.0) | 1455 (11.6) | 521 (4.5) | 2445 (5.2) |
| Current | 389 (1.7) | 319 (2.5) | 283 (2.5) | 991 (2.1) |
| Pipe | | | | |
| Never | 20607 (97.2) | 9341 (86.6) | 10170 (93.3) | 40118 (93.6) |
| Former | 398 (1.9) | 1238 (11.5) | 596 (4.5) | 2232 (5.2) |
| Current | 190 (0.9) | 202 (1.9) | 131 (1.2) | 523 (1.2) |
| Smokeless | | | | |
| Never | 28526 (94.8) | 18560 (86.9) | 10992 (90.7) | 58078 (91.4) |
| Former | 978 (3.2) | 2217 (10.3) | 881 (7.3) | 4076 (6.4) |
| Current | 557 (1.8) | 576 (2.7) | 242 (2.0) | 1375 (2.2) |
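
The percentages in Table 5 appear to be column percentages, that is, the share of each product-use status within a cigarette smoking group. The following is a minimal sketch of that arithmetic, not the authors' analysis code. It assumes the denominator for each product is the number of participants with non-missing data on that product; the names `TABLE5_COUNTS`, `column_percentages`, and `overall_prevalence` are illustrative only. Recomputed values may differ slightly from the published ones because of rounding or missing data.

```python
# Counts transcribed from Table 5. For each product, each use status maps to
# counts among (never, former, current) cigarette smokers.
SMOKING_STATUS = ("never", "former", "current")

TABLE5_COUNTS = {
    "cigar": {
        "never": (22175, 10797, 10665),
        "former": (469, 1455, 521),
        "current": (389, 319, 283),
    },
    "pipe": {
        "never": (20607, 9341, 10170),
        "former": (398, 1238, 596),
        "current": (190, 202, 131),
    },
    "smokeless": {
        "never": (28526, 18560, 10992),
        "former": (978, 2217, 881),
        "current": (557, 576, 242),
    },
}


def column_percentages(product_counts):
    """Percent of each use status within each cigarette smoking group.

    Assumption: the denominator is the column total for this product,
    i.e. participants with non-missing data on the product.
    """
    column_totals = [sum(col) for col in zip(*product_counts.values())]
    return {
        use: {
            status: round(100 * n / total, 1)
            for status, n, total in zip(SMOKING_STATUS, counts, column_totals)
        }
        for use, counts in product_counts.items()
    }


def overall_prevalence(product_counts, use="current"):
    """Percent with the given use status across all smoking groups combined."""
    grand_total = sum(sum(counts) for counts in product_counts.values())
    return round(100 * sum(product_counts[use]) / grand_total, 1)


if __name__ == "__main__":
    for product, counts in TABLE5_COUNTS.items():
        print(product, overall_prevalence(counts), column_percentages(counts)["current"])
```

Under these assumptions, `overall_prevalence` returns 2.1, 1.2, and 2.2 for cigar, pipe, and smokeless tobacco, in line with the overall current-use prevalences reported in the text.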

Inflammatory markers, a priority area for CCC-Tobacco, were evaluated at baseline and during follow-up. The number of measurements of each inflammatory marker is given in Supplementary file Table 3.

DISCUSSION

The CCC is a research initiative that pools data from several existing prospective cohort studies in the US and Brazil to create a large and diverse dataset with the statistical power to address questions that would be unanswerable or underpowered in a single cohort. The CCC’s core focus is on harmonizing data collected from the various studies to ensure consistency and reliability of the findings. The CCC-Tobacco dataset will enable the examination of the association of traditional and non-traditional tobacco product use with subclinical and clinical CVD in adults, with a particular focus on understudied minority groups. Moreover, because of the large sample size, the dataset will make it possible, for the first time, to study the differential impact of smoking as well as the health effects of non-traditional tobacco products in different population subgroups.

The CCC-Tobacco is significant for several reasons. Despite the rise in usage of non-traditional tobacco products such as cigars, pipes, e-cigarettes, and smokeless tobacco, well-powered studies on their long-term impact on cardiovascular health in a well-characterized population are limited. Furthermore, to the best of our knowledge, no prior study has systematically explored the relationship between cigars, pipes, and smokeless tobacco and multiple domains of subclinical markers of CVD, the extent to which cardiovascular outcomes are caused by these non-cigarette tobacco products and mediated by these subclinical markers, or how these relationships may vary among different subgroups. The CCC-Tobacco data will enable us to identify new biomarkers of cardiovascular harm associated with combustible cigarette use and the extent to which these biomarkers mediate cardiovascular risk. Additionally, using the CCC-Tobacco dataset, which has extensive data on non-cigarette tobacco products, we will be able to link the use of non-cigarette tobacco products to already established markers of cardiovascular harm, including markers of subclinical inflammation (high-sensitivity C-reactive protein and interleukin-6), and novel markers such as CAC24-29.

The FDA considers the study of the health effects of alternative tobacco products using longitudinal data a top research priority30. Our work will help elucidate the health effects of these non-cigarette tobacco products with respect to the hypothesized risk continuum31-33. Therefore, our work with the CCC-Tobacco could prove vital to the regulatory authority of the FDA and to other policy initiatives and recommendations regarding non-cigarette tobacco products, in a way that is deemed appropriate for the protection of public health. Addressing CVD, a major contributor to morbidity and mortality, is paramount to improving public health. The approach and descriptive findings presented here demonstrate the unique strength of the CCC-Tobacco to provide crucial information that can inform public health strategies and policies regarding non-cigarette tobacco product regulation.

Challenges and limitations

While this article seeks to provide insight into the logistical process of data acquisition and harmonization, as well as into the characteristics of the pooled dataset, we also discuss the challenges in our work. Challenges encountered during the early phases include those associated with establishing contact with study personnel and keeping study collaborators engaged. Additionally, the lack of an existing streamlined process for transferring data and completing mandatory data-use contracts led to a largely unpredictable workflow, resulting in delays. On a few occasions, following the approval process, datasets were delivered to the processing site in inaccessible formats. The CCC-Tobacco database also has some limitations. First, the observational study design leads to the potential for residual confounding and limits the ability to establish causal relations. Second, the age distributions limit the ability to generalize smoking patterns and associations to children and young adults. Third, despite the large sample size, the number of individuals using non-traditional tobacco products was still quite modest. Lastly, the studies did not routinely collect data on individuals who were sexual or gender minorities or who used a variety of illicit drugs.

Future perspectives

We envision that the experience and challenges reported in establishing the CCC-Tobacco will serve as a learning opportunity for other cross-cohort work and provide a potential framework for future cross-cohort collaboration and data sharing between NHLBI studies. Our dataset will potentially serve as a rich epidemiological resource for other working groups in the CCC and for the tobacco research community at large. Our approach also provides considerable room for expansion of the current dataset. CCC-Tobacco can be easily expanded to include other risk factors and cohorts, including advanced biomarkers and omics measures, and results can be compared with other consortia like the Emerging Risk Factors Collaboration34,35.

Furthermore, we plan to continue to harmonize all tobacco use data at each additional visit beyond the baseline study visit of each cohort. This will provide unprecedented longitudinal tobacco use data and allow us to expand our analysis into the study of tobacco use transitions (product switching and changes in use intensity) and their relative association with subclinical and clinical CVD. Lastly, the cohorts that are starting to collect data on new tobacco products such as e-cigarettes at follow-up (MESA, FHS 3rd generation, CARDIA, REGARDS, and HCHS/SOL) will expand our knowledge regarding the health effects of these products.

CONCLUSIONS

The CCC-Tobacco dataset, with its large sample size, long-term follow-up, diverse study population, and multiple subclinical features and clinical CVD events, aims to expand our knowledge regarding traditional and non-traditional tobacco products and their association with subclinical and clinical CVD. We aim to identify novel biomarkers of cardiovascular harm associated with combustible cigarette and non-cigarette tobacco product use36. The large sample size of women and other underrepresented groups allows for research in these historically understudied groups. Future iterations of this project, by providing data on long-term tobacco use and tobacco product use transitions, could provide important information on how changes in tobacco use patterns influence markers of subclinical cardiovascular injury and CVD risk. The findings from the CCC-Tobacco will therefore provide the FDA with new and pertinent knowledge that would inform regulation of non-cigarette tobacco products. Ultimately, our aim is to obtain new information regarding the cardiovascular impact of non-traditional tobacco products and to deliver actionable results to the tobacco regulatory science community.