## INTRODUCTION

Evidence linking both active tobacco use and passive second-hand smoke (SHS) exposure to lung cancer is well established, with tobacco exposure accounting for 80–95% of lung cancers [1,2]. However, there is also considerable population-level evidence that additional factors, including diet, environmental and occupational exposures, and genetic variation, contribute directly to lung cancer risk, and that some of these factors may modify the effect of tobacco on lung cancer-related outcomes [1]. Further, there are unanswered questions related to tobacco’s role in differing patterns of lung cancer characteristics by gender, race/ethnicity, and socioeconomic status and social class [3–5].

The incidence of lung cancer has been declining in the US since the early 1990s, due to a decline in tobacco use. However, lung cancer incidence remains significant, contributing 13.2% of new cancers in 2017, and representing the most common cancer after prostate cancer for men and the most common after breast cancer for women [6]. Strong disparities by race persist, with age-adjusted incidence rates for Black men exceeding those for White men [5], while the gender gap (associated with the historically lower rates of smoking among women) continues to narrow over time [7].

Additionally, with a 5-year survival rate of only 18.1%, lung cancer accounted for 25.9% of cancer deaths in 2017, making it the leading cause of cancer death [6]. Lung cancer survival varies by histological type (small cell; the non-small cell types of squamous cell, adenocarcinoma and large cell; and other rarer types), as well as by histological grade and stage at diagnosis [8]. Although all lung cancer types are associated with tobacco exposure, the strongest associations historically have been seen with squamous and small cell lung cancer [9,10]. Given that the median age of lung cancer diagnosis is 72 years, most long-term smokers in the US have spent much of their smoking history using non-filtered cigarettes, and are most at risk for cancers of the upper airways, including squamous cell. However, increases in adenocarcinoma rates have been observed recently; possible explanations include the relatively recent introduction of ‘light’ and filtered cigarettes, allowing for deeper inhalation and increased exposure in peripheral airways, as well as increased levels of the tobacco-specific nitrosamine, NNK, in current cigarette manufacturing [1]. Among histological types, small cell lung cancer has a considerably worse prognosis than non-small cell types [11].

Gender differences in lung cancer incidence have predominantly followed historical trends in active tobacco use, with rising and subsequently declining incidence in men following tobacco use patterns, and rates in women rising later and persisting until the first declines were observed after 2000 [6]. Conversely, lung cancer rates among female non-smokers have been consistently higher than among male non-smokers, and African-American women non-smokers are at greater risk than their White counterparts [1]. Lung cancer rates by race differ for men and women, with Black-White differences for men greater than those seen in women [12]. However, racial disparities in stage-specific survival are similar across gender. Social class disparities in lung cancer incidence, even after controlling for active smoking status, are well documented but not fully understood [1].

Differences in the distribution of lung cancer histological types have varied over time by gender and race as well, possibly due to differences in smoking behaviors, including differences in the proportion of cases attributable to active versus passive exposures. Higher rates of squamous cell lung carcinomas in males are observed in older people, while higher rates of adenocarcinoma in women are seen among younger cohorts. Higher proportions of squamous cell lung cancers have been reported previously for Black compared to White cases [7,12,13].

One challenge for studying the role of tobacco exposure in lung cancer at the population level is that surveillance data, drawn from state-level cancer registries, do not contain complete information on individual patient smoking history, and cannot capture SHS exposures. Historically, tobacco use was prevalent among all socioeconomic groups in US society, and exposure was ubiquitous across most public settings. Given the high prevalence of tobacco use, many non-smokers were exposed in private settings as well, such as homes and automobiles. In recent decades, however, tobacco control policies have expanded clean air requirements across many public venues, and decreasing rates of tobacco use in the general population have led to widespread adoption of smoke-free homes by non-smokers and some smokers as well. The sociodemographics of tobacco use have shifted, with addiction increasingly concentrated among low-income, ethnic minority, and chronically ill sections of the population. Paradoxically, as population-level rates of active use and SHS exposures have fallen overall, disparities in tobacco exposures have likely increased, with some groups, such as those living in high tobacco use communities, experiencing highly concentrated exposures. One example is the high proportion of low-income families residing in multi-unit housing.

Other social determinants of health that may play additive or synergistic roles in lung cancer disparities include community-level exposures to unhealthy food environments, occupational and residential hazards, air pollution and exposure to fine particulate matter, especially in urban communities [1]. Thus, it is important to continue to investigate the impact of tobacco use and exposure at the community level, to identify which community and individual characteristics are most associated with tobacco-related diseases, including lung cancer-related outcomes.

The increasingly sophisticated linkage of community-level data from many sources to individual-level data such as cancer registry records, through the process of geocoding, or assigning a geographical reference to residential address information, allows the analysis of neighborhood and area-level influences on population health outcomes [14]. Although the US Census and related data remain foundational resources, data have expanded to include multiple government and commercial sources. Survey data from smaller representative samples, including the use of consumer expenditure-based geodemographics, are increasingly used to statistically model small area estimates to provide coverages for areas not surveyed. Area-level measures can serve as best estimates of unmeasured individual behaviors, but also provide insight into community-level contextual characteristics (group behaviors and norms, green space, crime rates, social disadvantage) that affect all community residents, regardless of individual resources. For example, high volumes of tobacco use in a community not only suggest the likelihood that an individual smokes, but also describe the neighborhood’s tobacco use culture overall and SHS exposure in both public and private areas within that community [15].

Recent work identified not only significant associations at the county level between small area estimates of current smoking and the incidence of all histological types of lung cancer, but also variation across gender and histological type in the county-level effects of poverty and socioeconomic status [13]. However, given the large intra-county variation in both tobacco-related behaviors and social characteristics and resources, investigations at smaller geographical levels may shed light on the characteristics of neighborhoods with an excess lung cancer burden, and identify promising relationships to be studied in more resource-intensive efforts such as primary data collection with longitudinal cohorts. These methods may also serve as a reliable tool for cancer control planning, by allowing the identification and prioritization of individuals and communities at risk for lung cancer disparities. In addition, because population-level research to date has primarily focused on describing patterns of lung cancer incidence and mortality, it is important to explore spatial patterns of lung cancer disease characteristics, and identify potential pathways leading to excess mortality.

Analyses presented here are part of a larger project funded by the National Cancer Institute to investigate the utility of commercially available geodemographics, such as estimates of population-level spending behaviors, for analyzing spatial patterns in cancer. Our two research questions examine: 1) how do adverse cancer characteristics vary by individual and area-level measures, including social class? and 2) is tobacco product consumption estimated at the area level a useful tool for predicting adverse outcomes in lung cancer?

We used data on lung cancer cases reported in the State of Maryland during 2000, combined with Census block group level estimates of social class and consumer spending on tobacco products, to model patterns of lung cancer histological type, aggressive histological grade, and late-stage diagnosis. Maryland is well suited to serve as a single state example of small area disparities in lung cancer burden. It is a geographically, racially and socioeconomically diverse state, with adult tobacco use rates of 13.7% and age-adjusted lung cancer incidence rates of 55.4 cases per 100,000 (compared to 15.5% and 55.8 nationally) [6,16,17].

## METHODS

After obtaining Institutional Review Board (IRB) approval and negotiating a data sharing agreement, we requested from the Maryland Cancer Registry (MCR) all cases of lung cancer reported in 2000. Cases not permitted to be shared for research purposes include those reported by certain care systems (i.e. the Veterans Administration) and Maryland residents with cancers identified and reported back from other states without data-sharing agreements permitting research use.

Individual-level MCR variables examined for completeness and retained for analysis included race, gender, age and residential address at diagnosis, ‘Surveillance, Epidemiology and End Results’ (SEER) stage and tumor histological grade at diagnosis, and histological type of lung cancer. Because we were interested in examining race-based differences geographically, and the number of non-White/non-Black cases was relatively small, we removed cases with race other than White or Black, as well as those missing key covariates (age, gender, race).

Histology codes (ICD-O-3) were used to identify type of lung and bronchus cancer. Six categories were assigned: small cell, large cell, squamous, adenocarcinoma, other specified type, and unspecified malignant neoplasms. Cases with histology codes suggesting non-carcinoma or metastasis from another primary site were omitted.

Geocoding was used to match cases by residential address to latitude-longitude point locations. Standard geocoding processes involved iterative address cleaning and use of multiple basemaps. Addresses for cases not matched by software were manually reviewed. For cases with a legitimate Maryland residential address, a previously developed imputation algorithm was used to assign cases to a point location within their zip code, based on Census age-, race- and gender-specific population distribution patterns [18]. Imputation is commonly needed for cases in rural areas whose mailbox or rural route address does not identify a geocodable point location. Imputation allows for full use of data across geographical areas, and avoids biasing useable data (and therefore results) towards urban cases.
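The imputation step can be illustrated with a minimal sketch. It is not the published algorithm [18]: it assumes, for simplicity, that a case is assigned to a block group rather than to a point, that the candidate block groups within the zip code are known, and that each carries a Census count of residents in the case's age, race and gender stratum. The function name, block group identifiers and counts are all hypothetical.

```python
import random

def impute_block_group(candidates, rng):
    """Pick one block group within a zip code, weighted by population.

    candidates: list of (block_group_id, population) pairs, where population
    is the (hypothetical) Census count of residents in the case's
    age/race/gender stratum.
    """
    # Block groups with no matching residents cannot receive the case.
    eligible = [(bg, pop) for bg, pop in candidates if pop > 0]
    if not eligible:
        raise ValueError("no block group in this zip code has matching population")
    ids = [bg for bg, _ in eligible]
    weights = [pop for _, pop in eligible]
    # Weighted random draw: probability proportional to stratum population.
    return rng.choices(ids, weights=weights, k=1)[0]

# Hypothetical zip code with three block groups; BG-B has no matching residents.
zip_candidates = [("BG-A", 120), ("BG-B", 0), ("BG-C", 40)]
rng = random.Random(2000)  # fixed seed so the illustration is repeatable
draws = [impute_block_group(zip_candidates, rng) for _ in range(1000)]
share_a = draws.count("BG-A") / len(draws)
```

Weighting by stratum-specific population means that, over many imputed cases, assignments follow where people like the case actually live, rather than defaulting to a zip code centroid.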

### Area-level covariate data

##### Table 2

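Distribution of block group-level characteristics by race, for lung cancer cases reported in 2000 to the Maryland Cancer Registry (n=3223)

| Block group characteristic | All cases (n=3223), n | All cases, % | White (n=2503), n | White, % | Black (n=720), n | Black, % |
|---|---|---|---|---|---|---|
| **High school graduation rate (%)** | | | | | | |
| 32–69 | 547 | 17 | 298 | 12 | 249 | 35 |
| 70–79 | 621 | 19 | 474 | 19 | 147 | 20 |
| 80–89 | 1084 | 34 | 888 | 35 | 196 | 27 |
| 90–100 | 971 | 30 | 843 | 34 | 128 | 18 |
| **Employment (%)** | | | | | | |
| 52–84 | 156 | 5 | 38 | 1 | 118 | 16 |
| 85–89 | 220 | 7 | 107 | 4 | 113 | 16 |
| 90–94 | 647 | 20 | 464 | 19 | 183 | 26 |
| 95–100 | 2200 | 68 | 1894 | 76 | 306 | 42 |
| **White collar employment (%)** | | | | | | |
| 0–49 | 602 | 19 | 410 | 16 | 192 | 27 |
| 50–64 | 1057 | 33 | 795 | 32 | 262 | 36 |
| 65–74 | 772 | 24 | 618 | 25 | 154 | 21 |
| 75–100 | 792 | 24 | 680 | 27 | 112 | 16 |
| **Per capita income (in $1000)** | | | | | | |
| 3–14 | 348 | 11 | 124 | 5 | 224 | 31 |
| 15–19 | 648 | 20 | 465 | 19 | 183 | 25 |
| 20–29 | 1464 | 45 | 1227 | 49 | 237 | 33 |
| 30–107 | 763 | 24 | 687 | 27 | 76 | 11 |
| **Average annual tobacco spending ($ per person)\*** | | | | | | |
| 27–299 | 800 | 25 | 490 | 20 | 310 | 43 |
| 300–399 | 1000 | 31 | 711 | 28 | 289 | 40 |
| 400–499 | 1181 | 37 | 1068 | 43 | 113 | 16 |
| 500–597 | 242 | 7 | 234 | 9 | 8 | 1 |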

* Adjusted for proportion of block group residents age 18 years and older.

Differences in distribution between White and Black cases, all significant at p<0.001, based on chi-squared test.
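As an illustration of the test named in the footnote, the sketch below computes a Pearson chi-squared statistic for one characteristic. The counts are our reading of the high school graduation rows of Table 2 (White and Black columns). The helper name `chi_squared` is hypothetical, and the comparison uses the standard critical value of 16.266 for 3 degrees of freedom at the 0.001 level rather than an exact p-value.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Rows: graduation-rate categories 32-69, 70-79, 80-89, 90-100.
# Columns: White cases, Black cases (counts as read from Table 2).
hs_graduation = [[298, 249], [474, 147], [888, 196], [843, 128]]
stat, dof = chi_squared(hs_graduation)

CRITICAL_0_001_DF3 = 16.266  # chi-squared critical value, df=3, alpha=0.001
significant = stat > CRITICAL_0_001_DF3
```

The same function applies unchanged to the other four characteristics, each of which also has four categories and therefore 3 degrees of freedom.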

##### Table 3

Multi-level random effects logistic regression model of individual and area-level factors associated with squamous or small cell histological type, among lung cancer cases reported in 2000 to the Maryland Cancer Registry (n=3223)

Fixed effects are odds ratios (95% CI); random effects are estimates (p-value).

| | All cases (n=3223): Full model | All cases: Final model | Men (n=1758): Full model | Men: Final model | Women (n=1465): Full model | Women: Final model |
|---|---|---|---|---|---|---|
| **Fixed effects** | | | | | | |
| Intercept | 0.51 (0.41–0.62) | 0.52 (0.46–0.58) | 0.59 (0.47–0.74) | 0.63 (0.57–0.69) | 0.51 (0.37–0.69) | 0.48 (0.42–0.56) |
| *Individual level* | | | | | | |
| Age at diagnosis | 1.01 (1.00–1.01) | 1.01 (1.00–1.01) | 1.01 (1.00–1.02) | 1.01 (1.00–1.02) | 1.00 (0.99–1.01) | |
| White race | 1.02 (0.84–1.25) | | 1.07 (0.83–1.40) | | 0.95 (0.67–1.39) | |
| Male gender | 1.21 (1.04–1.40) | 1.21 (1.04–1.40) | | | | |
| *Census block-group level* | | | | | | |
| Social class | 0.87 (0.79–0.95) | 0.87 (0.80–0.94) | 0.87 (0.77–0.97) | 0.88 (0.79–0.98) | 0.86 (0.73–0.99) | 0.82 (0.72–0.93) |
| Tobacco spending | 1.15 (1.06–1.25) | 1.16 (1.07–1.25) | 1.18 (1.05–1.32) | 1.20 (1.08–1.33) | 1.12 (0.97–1.29) | |
| **Random effects** | | | | | | |
| Block-group level variance | 7.28×10⁻⁹ (<0.001) | 7.53×10⁻⁹ (<0.001) | 3.26×10⁻⁵ (<0.001) | 3.33×10⁻⁶ (<0.001) | 0.424 (0.02) | 0.45 (0.003) |
| Residual ICC | 2.02×10⁻⁹ (1.00) | 2.29×10⁻⁹ (1.00) | 9.90×10⁻⁶ (0.49) | 1.00×10⁻⁵ (0.99) | 1.14×10⁻¹ (0.04) | 1.20×10⁻¹ (0.03) |

[i] Outcome: cases with a reported histological type of squamous or small cell lung cancer, compared to those with all other types (large cell, adenocarcinoma, other specified type, or unspecified). Age in years, centered at the median age of 68. Census block group social class index standardized by subtracting the median (272) and dividing by the standard deviation (37.7). The reference value of 272 represents, as an example, a block group where 97% of persons are employed, 63% of employed persons hold white collar jobs, per capita income is $3200, and 80% of adults are high school graduates. Census block group estimated tobacco spending is the average per person, weighted by the proportion of adult residents in the block group, in units of $10, centered at the median ($380) and divided by the standard deviation ($95.30). Significance of the variance of the block group-level random intercept was calculated with the Wald χ² test. Significance of the residual intra-class (within block group) correlation was based on the likelihood ratio χ² test.
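The centering and scaling described in this footnote can be sketched as follows. The medians and standard deviations are those quoted in the footnote; the function names and the two example block group values are hypothetical, and tobacco spending is handled here in dollars, which we assume matches the quoted median and standard deviation.

```python
def standardize(value, median, sd):
    """Center a block group measure at the median and scale by the SD."""
    return (value - median) / sd

def odds_ratio_for_change(or_per_sd, n_sd):
    """Odds ratio implied by a change of n_sd standard deviations.

    Follows from the model being linear on the log-odds scale:
    exp(n_sd * log(or_per_sd)) == or_per_sd ** n_sd.
    """
    return or_per_sd ** n_sd

# Values quoted in the Table 3 footnote.
SOCIAL_CLASS_MEDIAN, SOCIAL_CLASS_SD = 272, 37.7
TOBACCO_MEDIAN, TOBACCO_SD = 380.0, 95.30

# Hypothetical block groups sitting one SD above each median.
z_class = standardize(309.7, SOCIAL_CLASS_MEDIAN, SOCIAL_CLASS_SD)
z_tobacco = standardize(475.30, TOBACCO_MEDIAN, TOBACCO_SD)

# Final combined model, social class OR = 0.87 per SD: effect of a 2-SD difference.
or_two_sd = odds_ratio_for_change(0.87, 2)
```

Because both area-level measures are expressed per standard deviation, their odds ratios in Tables 3 to 5 can be compared directly with one another.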

##### Table 4

Multi-level random effects logistic regression model of individual and area-level factors associated with aggressive histological grade, among lung cancer cases with reported grade, reported in 2000 to the Maryland Cancer Registry (n=1735)

Fixed effects are odds ratios (95% CI); random effects are estimates (p-value).

| | All cases (n=1735): Full model | All cases: Final model | Men (n=966): Full model | Men: Final model | Women (n=769): Full model | Women: Final model |
|---|---|---|---|---|---|---|
| **Fixed effects** | | | | | | |
| Intercept | 2.34 (1.72–3.19) | 2.21 (1.96–2.51) | 3.15 (2.00–4.95) | 2.59 (0.96–0.99) | 2.22 (1.45–3.39) | 2.00 (1.71–2.33) |
| *Individual level* | | | | | | |
| Age at diagnosis | 0.99 (0.98–0.99) | 0.99 (0.98–0.99) | 0.98 (0.96–0.99) | 0.98 (0.96–0.99) | 1.00 (0.99–1.01) | |
| White race | 0.84 (0.61–1.14) | | 0.78 (0.49–1.28) | | 0.88 (0.56–1.39) | |
| Male gender | 1.17 (0.95–1.45) | | | | | |
| *Census block-group level* | | | | | | |
| Social class | 0.92 (0.81–1.05) | 0.89 (0.79–1.00) | 1.00 (0.82–1.23) | | 0.82 (0.68–0.99) | 0.80 (0.68–0.96) |
| Tobacco spending | 1.24 (1.09–1.42) | 1.21 (1.07–1.36) | 1.26 (1.02–1.54) | 1.21 (1.02–1.44) | 1.25 (1.04–1.51) | 1.23 (1.04–1.49) |
| **Random effects** | | | | | | |
| Block-group level variance | 2.14×10⁻¹ (<0.001) | 2.05×10⁻¹ (<0.001) | 8.45×10⁻¹ (0.02) | 8.64×10⁻¹ (0.003) | 2.44×10⁻² (0.003) | 2.92×10⁻³ (<0.001) |
| Residual ICC | 6.12×10⁻² (0.11) | 5.87×10⁻² (0.12) | 2.15×10⁻¹ (0.53) | 2.08×10⁻¹ (0.04) | 7.36×10⁻³ (0.47) | 8.86×10⁻⁴ (0.50) |

[i] Among cases with a histological grade of tumor reported, those with grades 3 or 4, compared to those with grades 1 or 2. Further explanations as in Table 3 footnote.

##### Table 5

Multi-level random effects logistic regression model of individual and area-level factors associated with later stage at diagnosis, among lung cancer cases with reported stage, reported in 2000 to the Maryland Cancer Registry (n=2743)

Fixed effects are odds ratios (95% CI); random effects are estimates (p-value).

| | All cases (n=2743): Full model | All cases: Final model | Men (n=1512): Full model | Men: Final model | Women (n=1231): Full model | Women: Final model |
|---|---|---|---|---|---|---|
| **Fixed effects** | | | | | | |
| Intercept | 2.57 (1.98–3.35) | 2.13 (1.83–2.49) | 3.65 (2.58–5.17) | 2.91 (2.42–3.50) | 2.31 (1.58–3.39) | 2.05 (1.68–2.51) |
| *Individual level* | | | | | | |
| Age at diagnosis | 0.99 (0.98–1.00) | 0.99 (0.98–0.99) | 0.99 (0.97–0.99) | 0.99 (0.97–0.99) | 0.99 (0.98–1.01) | |
| White race | 0.84 (0.64–1.07) | | 0.75 (0.53–1.06) | | 0.96 (0.64–1.42) | |
| Male gender | 1.28 (1.07–1.52) | 1.28 (1.08–1.53) | | | | |
| Aggressive histological grade | 1.33 (1.10–1.59) | 1.34 (1.11–1.61) | 1.25 (0.97–1.61) | 1.25 (0.97–1.60) | 1.43 (1.08–1.90) | 1.47 (1.10–1.95) |
| *Census block-group level* | | | | | | |
| Social class | 0.89 (0.79–1.01) | 0.87 (0.77–0.97) | 1.00 (0.86–1.16) | | 0.87 (0.73–1.04) | 0.76 (0.64–0.91) |
| Tobacco spending | 0.92 (0.83–1.03) | 0.89 (0.81–0.99) | 1.02 (0.88–1.18) | | 0.86 (0.73–1.01) | 0.81 (0.70–0.95) |
| Tobacco × social class | | 0.91 (0.83–1.00) | | | | 0.81 (0.70–0.93) |
| **Random effects** | | | | | | |
| Block-group level variance | 0.166 (<0.001) | 0.146 (<0.001) | 0.163 (0.02) | 0.142 (0.005) | 0.373 (0.04) | 0.341 (<0.001) |
| Residual ICC | 0.481 (0.10) | 0.424 (0.13) | 0.473 (0.23) | 0.414 (0.26) | 0.102 (0.14) | 0.0940 (0.15) |

[i] Stage at diagnosis of 2–7 compared to stage 1, among cases with reported stage. Histological grade of tumor reported as grades 3 or 4, compared to those with grades 1 or 2, or ungraded. Further explanations as in Table 3 footnote.
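The three binary outcomes defined in the footnotes to Tables 3 to 5 can be sketched as simple coding rules. This is an illustration under assumptions: the field values (type labels as lowercase strings, grade and stage as integers, `None` for unreported) and all function names are hypothetical and do not reflect the registry's actual coding.

```python
from typing import Optional

# Histological types treated as the outcome in Table 3.
TOBACCO_ASSOCIATED_TYPES = {"squamous", "small cell"}

def squamous_or_small_cell(histological_type: str) -> int:
    """1 for squamous or small cell, 0 for all other types (Table 3 outcome)."""
    return int(histological_type in TOBACCO_ASSOCIATED_TYPES)

def aggressive_grade(grade: Optional[int]) -> Optional[int]:
    """Table 4 outcome: grades 3-4 vs 1-2; ungraded cases are excluded (None)."""
    if grade is None:
        return None
    return int(grade in (3, 4))

def aggressive_grade_covariate(grade: Optional[int]) -> int:
    """Table 5 covariate: grades 3-4 vs grades 1-2 or ungraded."""
    return int(grade in (3, 4))

def late_stage(stage: Optional[int]) -> Optional[int]:
    """Table 5 outcome: stage 2-7 vs stage 1; unstaged cases are excluded (None)."""
    if stage is None:
        return None
    return int(2 <= stage <= 7)

# Hypothetical case: adenocarcinoma, ungraded, stage 4.
example = (
    squamous_or_small_cell("adenocarcinoma"),
    aggressive_grade(None),
    aggressive_grade_covariate(None),
    late_stage(4),
)
```

Keeping the grade outcome and the grade covariate as separate functions reflects the footnotes: ungraded cases drop out of the Table 4 analysis but remain in Table 5 in the reference category.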

In Table 3, results of final models predicting histological types of squamous or small cell lung cancers suggest some shared predictors for both men and women, but also some differences. The combined gender final model includes significantly increased risk for these histological types with individual case characteristics of older age (OR=1.01, 95% CI: 1.00–1.01) and male gender (OR=1.21, 95% CI: 1.04–1.40), but no statistically significant association by race. In addition, cases living in block groups with higher social class are less likely to have these histological types (OR=0.87, 95% CI: 0.80–0.94), while higher levels of tobacco product spending are associated with greater likelihood of these lung cancer types (OR=1.16, 95% CI: 1.07–1.25). In gender-specific models, for male cases, there is increased likelihood of squamous or small cell lung cancer with increasing age (OR=1.01, 95% CI: 1.00–1.02) and area-level tobacco spending (OR=1.20, 95% CI: 1.08–1.33) and reduced likelihood with social class (OR=0.88, 95% CI: 0.79–0.98). For women, the final model includes only the effect of social class, which is associated with reduced likelihood of these histological types (OR=0.82, 95% CI: 0.72–0.93).
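For readers who wish to work on the log-odds scale, a reported odds ratio and confidence interval can be converted back to an approximate coefficient and standard error. The sketch below assumes the intervals are Wald-type, that is, symmetric on the log scale with a multiplier of 1.96, which the text does not state. Because the published values are rounded to two decimals, the results are approximate, and the function name is hypothetical.

```python
import math

def log_odds_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Approximate log-odds coefficient and standard error from an OR and 95% CI.

    Assumes a Wald interval: exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

# Tobacco spending in the combined-gender final model (Table 3).
beta, se = log_odds_from_or(1.16, 1.07, 1.25)
wald_z = beta / se  # approximate Wald statistic implied by the rounded values
```

An implied Wald statistic well above 1.96 agrees with the interval excluding 1.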

Table 4 reports models examining relationships between individual and area-level characteristics and aggressive histological grade, defined as poorly differentiated or undifferentiated histology, compared to well differentiated or moderately differentiated, among cases with histological grade reported. For both genders, the most parsimonious model includes a significant protective effect of older age (OR=0.99, 95% CI: 0.98–0.99), and area-level social class (OR=0.89, 95% CI: 0.79–1.00), as well as a significant increased risk for aggressive grade with higher block group levels of tobacco product spending (OR=1.21, 95% CI: 1.07–1.36). When examined separately for men and women, different patterns are observed. For men, older age is protective (OR=0.98, 95% CI: 0.96–0.99). No significant association is seen with area-level social class, but increased risk with area-level tobacco spending remains in the final model (OR=1.21, 95% CI: 1.02–1.44). For women, however, age is not significantly related to aggressive grade. Both the protective effect of social class (OR=0.80, 95% CI: 0.68–0.96), and increased risk with area-level tobacco spending (OR=1.23, 95% CI: 1.04–1.49), are statistically significant in the final model for women. As in the prior model, there are no significant effects of race in these models.

Table 5 presents results of models predicting later stage at diagnosis, defined as a SEER summary stage of 2 to 7 (regional to distant) compared to stage 1, localized disease, among cases with a stage reported. In all models, as anticipated, more aggressive tumor histology is positively associated with extent of disease at time of diagnosis (combined-gender final model: OR=1.34, 95% CI: 1.11–1.61). In the combined-gender final model, age is protective (OR=0.99, 95% CI: 0.98–0.99) and male gender is associated with increased risk for later stage (OR=1.28, 95% CI: 1.08–1.53). Unlike in models for aggressive grade or histological type, both area-level measures are protective for later stage diagnosis (social class OR=0.87, 95% CI: 0.77–0.97; tobacco spending OR=0.89, 95% CI: 0.81–0.99). In addition, an interaction term is statistically significant, indicating that the effects of higher social class and higher levels of tobacco spending are multiplicative, rather than simply additive, with the highest protective effect for cases living in block groups with higher levels of both social class and tobacco spending (OR=0.91, 95% CI: 0.83–1.00).
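The interaction can be made concrete with a short calculation using the combined-gender final model estimates from Table 5 (social class 0.87, tobacco spending 0.89, interaction 0.91). This is a sketch of how such terms combine in a logistic model, not output from the fitted model. Both area-level measures are in standard deviation units, and the function name is hypothetical.

```python
import math

def combined_odds_ratio(z_class, z_tobacco,
                        or_class=0.87, or_tobacco=0.89, or_interaction=0.91):
    """Odds ratio for later-stage diagnosis relative to a median block group.

    z_class, z_tobacco: standardized social class and tobacco spending.
    Terms add on the log-odds scale; the interaction multiplies the product.
    """
    log_or = (z_class * math.log(or_class)
              + z_tobacco * math.log(or_tobacco)
              + z_class * z_tobacco * math.log(or_interaction))
    return math.exp(log_or)

class_only = combined_odds_ratio(1, 0)    # one SD higher social class only
tobacco_only = combined_odds_ratio(0, 1)  # one SD higher tobacco spending only
both_high = combined_odds_ratio(1, 1)     # one SD higher on both measures
```

With both measures one standard deviation above the median, the combined odds ratio is lower than the product of the two main effects alone, which is what a protective interaction term implies.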

The most parsimonious final model for men contains only two terms: a statistically significant protective effect of age, with older cases being less likely to have later stage at diagnosis (OR=0.99, 95% CI: 0.97–0.99), and an increased risk with aggressive histological grade (OR=1.25, 95% CI: 0.97–1.60), which shows a trend towards significance at p<0.10. For women, the final model does not include significant differences by age, but does include the significant risk associated with aggressive grade (OR=1.47, 95% CI: 1.10–1.95). In addition, for women, the protective effects of social class (OR=0.76, 95% CI: 0.64–0.91) and tobacco spending (OR=0.81, 95% CI: 0.70–0.95), and the statistically significant protective interaction term (OR=0.81, 95% CI: 0.70–0.93), suggest that the combined-gender model masked differences between men and women for these effects. No differences by race were seen in the models for later stage diagnosis.

Model fit diagnostics show that block group level variance was significant for all models, supporting the use of the random intercept term. Furthermore, for almost all models, the residual intra-class correlation was not statistically significant, suggesting that the single random intercept term at the block group level was adequate. Semivariograms of residuals showed no spatial dependence.
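The text does not state how the residual intra-class correlation was computed. One common definition for a random-intercept logistic model is the latent-variable ICC, in which the individual-level residual variance is fixed at π²/3. The sketch below assumes that definition; applied to the block group variances for women in Table 3, it reproduces the reported ICC values, but this agreement should not be read as confirmation of the authors' method.

```python
import math

# Variance of the standard logistic distribution (level-1 residual variance).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3

def latent_icc(block_group_variance):
    """Latent-variable ICC for a random-intercept logistic model."""
    return block_group_variance / (block_group_variance + LOGISTIC_RESIDUAL_VARIANCE)

# Block group variances for women in Table 3 (full and final models).
icc_women_full = latent_icc(0.424)
icc_women_final = latent_icc(0.45)
```

Under this definition, a block group variance near zero, as in the combined and male models of Table 3, implies an ICC near zero, meaning that cases within a block group are barely more alike than cases from different block groups.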

## DISCUSSION

Findings from these analyses suggest that there are significant associations between neighborhood-level resources and adverse lung cancer characteristics, and that these may partially contribute to disparities in lung cancer mortality. Furthermore, these neighborhood-level characteristics may account for some portion of the previously observed racial disparities in lung cancer.

At the individual level, younger patients are burdened by more aggressive disease and later stage diagnosis, confirming patterns seen in most cancers [26–29]. Older men are more likely to be diagnosed with squamous and small cell cancers, consistent with patterns associated with earlier generations of smokers’ use of non-filtered tobacco products [1]. Overall, patterns differ for men and women. Men had greater likelihood of being diagnosed with squamous or small cell lung cancers, and although they were no more likely to have aggressive disease histology, they were more likely to have their disease diagnosed at a more advanced stage. These analyses did not find differences between Black and White cases, at aggregate or gender-specific levels. There were no significant main effects for race in either full or final models, and no race-specific interactions with social class or tobacco use. This suggests that, for these three outcomes, area-level social resource differences may have a stronger relationship to observed disparities than individual case race.

For all three outcomes examined, social class, measured at the neighborhood level, is protective. Therefore, cases living in block groups with lower levels of social resources are more likely to experience tobacco-associated lung cancer histologies, have more aggressive grades of tumor, and be diagnosed at later stages. Because these social class effects are significant in models that adjust for tobacco consumption and individual case race, these findings suggest that there are additional exposures in low-resource communities that confer excess risk for lung cancer burden, such as the combined effects of environmental hazards, unhealthy behaviors related to diet, inactivity and alcohol, limited health resources, and possibly also broader psychosocial stressors [30].

Neighborhood-level estimates of tobacco consumption were generally associated with worse lung cancer characteristics, including tobacco-related histology types and more aggressive tumors. One exception to this pattern is the model for later stage diagnosis, where a protective effect of tobacco consumption is seen for women only. The additional significant interaction term indicates that for women, living in a high social class, high tobacco consumption neighborhood is most strongly associated with early diagnosis. One possible explanation draws on the consistent evidence that women consume more health care than men, for both prevention and treatment [31,32]. In 2000, women of higher social class who smoked or were exposed to smoking in their homes may have been early targets for (and adopters of) lung cancer screening, and also more likely to seek care if they experienced symptoms of lung cancer.

### Limitations and future work

Our data were limited to a single year, 2000, selected to harmonize with both Census and Consumer Expenditure Survey estimates. These older data have strengths for examining relationships based on estimates of tobacco spending, as they predate uptake of electronic nicotine delivery systems (ENDS) [33], and likely reflect primarily combustible tobacco product expenditures. Similarly, these data also predate uptake of lung cancer screening at the population level [34], which will continue to modify patterns of stage-related disparities. The findings and relationships identified here can serve as a benchmark, with more recent and multi-year data used to compare time trends.

We had sufficient cases to identify sub-group differences by gender; however, there were insufficient numbers, across all block groups in the state, to model each race and gender pattern separately. Expanding the analyses by both number of years and states included could allow for fuller exploration of gender patterns by race. Despite sparse data, we chose to use a random effects model, modeling a single random intercept to allow block group rates to vary [35]. However, more data with denser distributions of cases within block groups could be used to further examine interactions between individual and community predictors, or add county-level effects to the models.

These relationships between disease characteristics and neighborhood-level social class and tobacco product consumption are ecological. Although we can speculate that area-level consumption is a marker of both active and passive exposures, we would need individual case smoking behaviors to separate these two effects. The data are also cross-sectional; that is, they describe the community where cases resided at the time of their cancer diagnosis. This likely makes our observed relationships more conservative than if we had information on tobacco consumption behaviors during prior, more etiologically relevant, time periods for lung cancer development. Increasingly, data vendors are developing both historical coverages for area-level data and validated methodologies for establishing residential histories efficiently for large populations, which opens opportunities for investigating longitudinal effects [36,37].

## CONCLUSIONS

Overall, we found that area-level estimates of both social resources and tobacco consumption were useful metrics for identifying communities with excess risk for adverse lung cancer characteristics. The relationships seen between disparities in lung cancer disease characteristics and both social class and tobacco consumption merit exploration in additional research. In addition, these exploratory findings demonstrate the need to continue to monitor the community-level social determinants of health, in addition to individual behaviors such as tobacco use, for lung cancer control planning as well as clinical care.