Weighting health-related estimates in the GCAT cohort and the general population of Catalonia


For this analysis, 69.3% of the GCAT participants (13,434 individuals) met the inclusion criteria and were compared with age-matched SIDIAP, ESCA, IDESCAT or INE subjects.

Representativeness assessment in the GCAT cohort

To determine the extent of the healthy volunteer bias in the GCAT cohort, we compared a set of variables available for the general population, divided into four domains.

Sociodemographic characteristics

The comparison of sociodemographic variables is detailed for all individuals in Table 1, and in Supplementary Table S3 for sex and age-stratified analysis. The analysis shows the following results:

Table 1 Comparison of sociodemographic characteristics between the general population of Catalonia and the GCAT cohort before and after applying raked weights.

Age and gender GCAT participants are younger with a higher female-to-male ratio.

Residence Cohort participants reside in more urban and less deprived areas, with a marked overrepresentation in the extreme quintile of the distribution (least deprived).

Educational attainment As expected, a higher proportion of GCAT subjects have a higher education level.

Employment status Employment rates are higher among GCAT participants, as well as the proportion of retired individuals, whilst the proportion of individuals classified as unemployed, with incapacity or undertaking home duties is lower.

Civil status The proportion of single and widowed individuals is lower than the general population, whereas that of married and separated/divorced individuals is higher.

Household size Household sizes of 2 and more than 4 are less prevalent among GCAT participants.

Lifestyle habits

Smoking habit and alcohol consumption were compared. See detailed results for all individuals in Table 2, and in Supplementary Table S4 for sex and age-stratified analysis. The analysis shows the following results:

Table 2 Comparison of lifestyle habits characteristics between the general population of Catalonia and the GCAT cohort before and after applying raked weights.

Smoking habits The proportion of current smokers is lower among GCAT participants compared to the general population, while the proportion of ex-smokers is higher.

Alcohol consumption In the GCAT cohort, alcohol consumption exhibits a U-shaped distribution, with higher proportions of both high- and low-risk drinkers compared to the general population. The disparity is more pronounced when comparing GCAT to SIDIAP (medical records) than to ESCA (health survey).

Health-related factors

Five health-related variables (Table 3 and Supplementary Table S5 for sex and age-stratified analysis), 20 common chronic conditions and the 20 most common cancer types were analysed.

Table 3 Comparison of health-related factors between the general population of Catalonia and the GCAT cohort before and after applying raked weights.

Mortality Mortality rates are lower in the GCAT cohort compared with the general population across all sexes and age ranges considered (Fig. 1a).

Fig. 1

Comparison of health-related factors between the GCAT Cohort and the general Catalan population by sex. (a) Mortality rate by sex. In the x-axis the age range and in the y-axis the mortality rate during the follow-up period. (b) Age at first cancer, including all malignant cancer types (ICD-10 codes C00–C99). In the x-axis the age range and in the y-axis the cancer incidence. (c) Disease prevalence. In the y-axis, the selected diseases (ICD-10 code) and in the x-axis the lifetime prevalence of each disease. (d) Cancer prevalence. In the y-axis, the different cancers (ICD-10 code) and in the x-axis the lifetime prevalence of each one. *Stands for p < 0.05, **stands for p < 0.01, ***stands for p < 0.001.

BMI The GCAT cohort has a higher proportion of individuals who are overweight or obese compared to the general population.

Self-perceived health status The proportion of individuals in the cohort who perceive their health status as “good” is significantly higher, while that of those describing their health as “regular”, “bad”, or “very bad” is lower. However, the proportion of individuals rating their health as “very good” is similar to that of the general population.

Healthcare services use Healthcare service usage by the targeted population may affect prevalence estimates. A comparison of diagnosis-associated primary care visits reveals that younger GCAT participants, born between 1961 and 1970, have more primary care visits than the general population. In contrast, older individuals have a similar number of primary care visits to that of the general population.

Elixhauser comorbidity index GCAT cohort participants exhibit a U-shaped distribution in comorbidity scores, with lower overall scores indicating fewer comorbidities, but with more pronounced differences at the extremes (0 and 4+).

Lifetime prevalence Differences were observed in 15 out of 20 common chronic diseases when comparing the GCAT cohort to the general population. Prevalence was lower for type 2 diabetes (E11), alcohol-related disorders (F10), nicotine dependence (F17), essential hypertension (I10), angina pectoris (I20), chronic ischemic heart disease (I25), atrial fibrillation and flutter (I48), heart failure (I50), cerebral infarction (I63) and COPD (J44). Conversely, a higher prevalence of migraine (G43) and vasomotor and allergic rhinitis (J30) was observed among GCAT participants. Some diseases exhibited significant gender bias: prevalence of overweight and obesity (E66) and disorders of lipid metabolism (E78) was lower among females compared with the general population, and prevalence of asthma (J45) was higher among men. No significant differences were observed for major depressive disorder (F32), anxiety (F41), atherosclerosis (I70), psoriasis (L40) and osteoporosis (M81) (Fig. 1c, Supplementary Table S5).

Cancer The overall lifetime prevalence of any cancer was lower among GCAT participants, whose risk of having any type of cancer was half that of the general population. The lower prevalence was notable for secondary neoplasms (C78, C79), bronchus and lung cancer (C34), ovarian cancer (C56), uterine cancer (C54), female breast cancer (C50), bladder cancer (C67) in men and colon cancer (C18) in women, as was the absence of any liver cancer (C22) case in men. The only exception was non-melanoma skin cancer (C44), which shows a higher prevalence among GCAT participants (Fig. 1d, Supplementary Table S5). Regarding cancer incidence, the rate for any cancer is lower in the GCAT cohort compared to the general population, except for older age groups (> 60 years), where no significant difference is observed (Fig. 1b, Supplementary Table S5).

Medication use

Overall, medication use in the GCAT cohort is similar to or slightly higher than in the general population, with notable exceptions: the GCAT cohort uses cardiovascular medications (C codes), diabetes medications (A10), and antidiarrheals, intestinal anti-inflammatory/anti-infective agents (A07) less frequently (Fig. 2a, Supplementary Table S6). Additionally, while the mean number of prescriptions per person is generally similar or slightly lower in the GCAT cohort, there are notable sex-specific differences: GCAT women use psychoanaleptics (N06), thyroid therapy (H03), and nasal preparations (R01) more frequently, whereas GCAT men use agents acting on the renin-angiotensin system (C09), anti-inflammatory and antirheumatic products (M01), and antihistamines for systemic use (R06) more frequently compared to the general population (Fig. 2b, Supplementary Table S6).

Fig. 2

Comparison of medication use between GCAT Cohort and general Catalan population. (a) Drug use prevalence by sex. In the y-axis, the different groups of ATC codes and in the x-axis the prevalence of use for each group of drugs. (b) Number of mean prescriptions per person. In the y-axis, the different groups of ATC codes and in the x-axis the mean of prescriptions per person. *Stands for p < 0.05, **stands for p < 0.01, ***stands for p < 0.001.

In the sensitivity analysis, we did not observe overall differences in the compared variables between GCAT included individuals and excluded individuals not linked to EHR (Supplementary Table S7).
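The figure captions above flag group differences with stars at p < 0.05, p < 0.01 and p < 0.001. The text does not state which statistical test produced these p-values, so the following is only a sketch under an assumption: it uses a pooled two-proportion z-test, one common way to compare a cohort proportion with a population proportion. The function names and the counts are hypothetical and are not taken from the study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    x1, x2: number of individuals with the characteristic in each group
    n1, n2: group sizes
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def significance_stars(p_value):
    """Map a p-value to the star notation used in the figure captions."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Made-up counts: 150 of 1000 cohort members vs 220 of 1000 reference subjects.
z, p = two_proportion_z_test(150, 1000, 220, 1000)
print(f"z = {z:.2f}, p = {p:.2g}, stars = {significance_stars(p)!r}")
```

With large reference datasets even small differences can reach the three-star threshold, so the size of the difference matters as much as the stars.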

Raked weighting

After assessing and determining the presence of a healthy volunteer bias, we computed raked weights to improve the cohort representativeness. Most of the key bias indicators belong to the sociodemographic domain (sex, birthday, rurality, educational attainment, employment status, civil status, household size), one to the lifestyle domain (smoking), and two to the health-related domain (self-perceived health status, number of primary care visits).
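Raking (iterative proportional fitting) rescales participant weights one variable at a time until the weighted sample margins match known population margins. The sketch below illustrates the idea on toy data with two variables. The records, the target proportions and the names `rake` and `weighted_share` are all hypothetical, and this is not the software or configuration used to compute the GCAT weights.

```python
# Illustrative raking (iterative proportional fitting) on toy data.
# All records and target margins below are made up for the example.

def rake(records, targets, max_iter=100, tol=1e-9):
    """Return one weight per record so that weighted margins match targets.

    records: list of dicts mapping variable name -> category
    targets: dict mapping variable name -> {category: population proportion}
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, margin in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {cat: 0.0 for cat in margin}
            for rec, w in zip(records, weights):
                current[rec[var]] += w
            # Scale each record so this variable's margin matches its target.
            for i, rec in enumerate(records):
                factor = margin[rec[var]] * total / current[rec[var]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        # Stop once a full pass over all variables barely changes the weights.
        if max_shift < tol:
            break
    return weights


def weighted_share(records, weights, var, cat):
    """Weighted proportion of records whose `var` equals `cat`."""
    hit = sum(w for rec, w in zip(records, weights) if rec[var] == cat)
    return hit / sum(weights)


# Toy sample: women and the highly educated are overrepresented.
sample = (
    [{"sex": "F", "edu": "high"}] * 40
    + [{"sex": "F", "edu": "low"}] * 20
    + [{"sex": "M", "edu": "high"}] * 25
    + [{"sex": "M", "edu": "low"}] * 15
)
population = {
    "sex": {"F": 0.51, "M": 0.49},
    "edu": {"high": 0.35, "low": 0.65},
}

w = rake(sample, population)
print(round(weighted_share(sample, w, "sex", "F"), 3))
print(round(weighted_share(sample, w, "edu", "high"), 3))
```

Raking only needs the marginal distribution of each variable in the reference population, not the full cross-tabulation, which is convenient when margins come from different sources.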

After applying raked weights, the GCAT profile aligns with the general population on the variables used for weighting, such as age, gender, and education, indicating that the sample has been effectively adjusted to reflect the broader population. Beyond matching these specific variables, the raked weights also improve the estimates of other variables not directly used in the weighting process (deprivation index, employment status, alcohol use), suggesting that the weighted GCAT profile provides more representative and generalizable results for most variables (Tables 1, 2 and 3).

For the lifetime prevalence of the 20 selected diseases (Supplementary Table S8), the estimates improve, becoming similar to those of the general population for 19 of them; however, the weighted prevalence of overweight and obesity is overestimated with the selected weights (Fig. 3). Estimates for certain diseases, such as asthma, psoriasis and osteoporosis, remain similar, demonstrating robust values from the cohort, while previously underestimated conditions (such as type 2 diabetes, disorders of lipoprotein metabolism, essential hypertension, and other chronic obstructive pulmonary diseases) now have increased estimates that align more closely with the general population. In contrast, conditions related to toxic habits, such as alcohol-related disorders and nicotine dependence, show increased but still underestimated estimates compared to public data. For mental health conditions, survey weights correct initial overestimates observed in the cohort, particularly for major depression, other anxiety disorders, and migraine. Additionally, for less frequent (< 5%) cardiovascular diseases (I codes), weights also improve estimates for angina pectoris, ischemic heart disease, atrial fibrillation and flutter, heart failure, cerebral infarction, and atherosclerosis.
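The weighted prevalence of a disease is the sum of the weights of affected participants divided by the sum of all weights. The sketch below contrasts a crude and a weighted estimate. The disease indicators, the weights and the function name `prevalence` are hypothetical and chosen only to show how up-weighting underrepresented participants can raise a previously underestimated prevalence.

```python
def prevalence(has_disease, weights=None):
    """Crude prevalence if weights is None, otherwise weighted prevalence."""
    if weights is None:
        weights = [1.0] * len(has_disease)
    cases = sum(w for d, w in zip(has_disease, weights) if d)
    return cases / sum(weights)


# Ten made-up participants: 1 = ever diagnosed, 0 = never diagnosed.
has_disease = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
# Made-up weights: affected participants here belong to groups that are
# underrepresented in the sample, so they receive weights above 1.
weights = [2.0, 0.5, 0.5, 0.5, 2.0, 1.5, 0.5, 0.5, 1.0, 1.0]

crude = prevalence(has_disease)
weighted = prevalence(has_disease, weights)
print(f"crude = {crude:.2f}, weighted = {weighted:.2f}")
```

The same calculation can lower an estimate when affected participants belong to overrepresented groups, which is the direction the text describes for the mental health conditions.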

Fig. 3

Bar plot of the lifetime disease prevalence between GCAT and the general Catalan population, before (yellow) and after weighting (brown) in a selection of diseases. In the x-axis the disease prevalence and in the y-axis the different diseases (ICD-10 code and description).
