The Influence of Trinucleotide Repeats in the Androgen Receptor Gene and Testosterone Level on Circulating Proteins in Male Participants: Proteomics Analysis Using the UK Biobank Data
Takayoshi Sasako, Yann Ilboudo, Yiheng Chen, Kevin Y.H. Liang, Satoshi Yoshiji, J. Brent Richards

Abstract
Figure 1
Testosterone and its metabolites, or androgens, play pivotal roles in various tissues, with a ligand-activated nuclear receptor, the androgen receptor (AR), as the key transducer protein ^(1)^. The human AR gene is located on chromosome X, and its first exon contains a CAG trinucleotide repeat encoding a polyglutamine chain and a GGC trinucleotide repeat encoding a polyglycine chain. A long CAG repeat and possibly a long GGC repeat are considered to suppress AR activity as a transcription factor via conformational alteration ^(1), (2)^, and it has also been suggested that they could reduce the AR protein level via suppressed gene expression or protein instability ^(3), (4)^. However, it remains to be fully elucidated how AR-mediated androgen signaling or circulating androgens affect circulating protein levels.
Recently, we quantified CAG and GGC trinucleotide repeat lengths in the AR gene in nearly 200,000 European-ancestry male participants in the UK Biobank ^(5)^, in which individuals aged 40 to 69 years were enrolled between 2006 and 2010 in the United Kingdom ^(6)^. We revealed that longer CAG and GGC repeat lengths induce androgen resistance and elevate circulating total testosterone level. We also showed that total testosterone is associated with various androgen-related traits and diseases, such as fat mass, glycated hemoglobin (HbA1c), and type 2 diabetes, whereas the trinucleotide repeat lengths are associated with some of them, such as bone mineral density, male-pattern baldness, and potentially prostate cancer ^(5)^.
These results prompted us to explore circulating proteins associated with CAG or GGC trinucleotide repeat length, which are expected to serve as biomarkers to link androgen resistance and androgen-related outcomes. For that purpose, we used the Pharma Proteomics Project (PPP) data, which quantified nearly 3,000 proteins in plasma samples of over 50,000 participants in the UK Biobank using the Olink proteomics assay ^(7)^.
In this study, we identified European-ancestry male participants in the UK Biobank based on sex (data field 31; field numbers are given in parentheses hereafter) and genetic ethnic grouping (22006) ^(5)^, restricting to those with Olink proteomics data available from batches 1-6 and excluding batch 0 (pilot) and batch 7 (COVID-19). Collected baseline data include age (21022), genetic principal components (PCs; 22009), UK Biobank assessment center (54), total testosterone (30850) and sex hormone-binding globulin (SHBG; 30830), both of which were quantified by chemiluminescent immunoassay ^(5)^, fasting time (74), and body mass index (BMI; 21001). Olink proteomics data at baseline were downloaded in February 2024. Glioma pathogenesis-related protein 1 (GLIPR1), which had low data quality ^(7)^, and SHBG, which was also measured by immunoassay as stated above, were excluded, leaving 2,921 proteins for analysis. CAG and GGC trinucleotide repeat lengths in the AR gene were quantified from the whole-exome sequence (WES) CRAM files using ExpansionHunter version 5.0.0 ^(8)^, as we previously reported ^(5)^. The effects of CAG and GGC repeat lengths and total testosterone, each standardized by its mean and standard deviation ^(5)^, on circulating proteins, which were inverse-rank normal transformed ^(9)^, were estimated by linear regression analyses. To examine the difference in betas, the Z score was calculated and converted into a P value. The Bonferroni method was used for multiple-comparison correction, and a corrected P < 0.05 was considered statistically significant. All analyses were conducted and all plots were created using R version 4.4.0 in RStudio.
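As a rough illustration of the transformations and regression described above, the following Python sketch (the study itself used R) standardizes an exposure, applies a rank-based inverse normal transform to a protein level, and fits a single-predictor regression. The function names, the Blom offset c = 3/8, the toy values, the normal approximation for the P value, and the omission of covariates are all assumptions made for illustration; the text does not specify these details.

```python
import math
from statistics import NormalDist, mean, stdev


def standardize(values):
    """Center on the mean and scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def inverse_rank_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform.

    The offset c = 3/8 (Blom) is an assumption; the text does not say
    which offset was used. Tied values share the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


def simple_ols(x, y):
    """Slope, standard error, and two-sided P value for y ~ x.

    Covariates are omitted here for brevity (an assumption of this sketch),
    and the P value uses a normal approximation to the t statistic.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = my - beta * mx
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


# Hypothetical toy values, not study data.
testosterone = [10.2, 14.8, 12.1, 18.5, 9.7, 16.3, 13.4, 11.0]
protein = [0.31, 0.12, 0.25, 0.02, 0.40, 0.15, 0.09, 0.28]

beta, se, p = simple_ols(standardize(testosterone), inverse_rank_normal(protein))
print(f"beta={beta:.3f}, se={se:.3f}, p={p:.3g}")
```

Because both the exposure and the outcome are on a standard-deviation scale after these steps, the slope can be read as the change in transformed protein level per one standard deviation of the exposure, which is how the betas reported in this article are expressed.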
CAG and GGC trinucleotide repeat lengths in the AR gene quantified from WES data, circulating total testosterone level, proteomics data, and other covariates were available in 14,353 male participants (Table 1). CAG repeat length was associated with none of the proteins, whereas GGC repeat length was weakly associated with kallikrein-related peptidase 3 (KLK3; beta, −0.04; corrected P value, 2 × 10^−3^; same format below), also known as prostate-specific antigen (PSA), an established tumor marker ^(10)^. Moreover, total testosterone was significantly associated positively with 136 proteins, such as insulin-like 3 (INSL3; beta, 0.24; P, 3 × 10^−113^) and prokineticin 1 (PROK1; beta, 0.23; P, 1 × 10^−99^) as well as KLK3 (beta, 0.11; P, 5 × 10^−24^), and negatively with 615 proteins, such as glucagon (GCG; beta, −0.23; P, 2 × 10^−101^), leptin (LEP; beta, −0.22; P, 2 × 10^−102^), and fatty acid binding protein 4 (FABP4; beta, −0.22; P, 5 × 10^−98^) (Figure 1, Supplementary Table 1).
Given that testosterone affects body composition, especially fat mass ^(5)^, we also performed the linear regression analysis with BMI as an additional covariate. After adjustment for BMI, the association between total testosterone and LEP (beta, −0.08; P, 2 × 10^−25^) or FABP4 (beta, −0.10; P, 2 × 10^−30^) was diminished, whereas the other associations stated above were not affected. Indeed, the effect of adjustment for BMI was the largest in LEP, followed by FABP4, among all the proteins examined (Supplementary Table 1).
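The methods state that differences in betas were examined by converting a Z score into a P value, with Bonferroni correction. One common form of that calculation is sketched below in Python, applied to the LEP betas before and after BMI adjustment. The formula (which treats the two estimates as independent), the standard errors, and the number of tests are assumptions for illustration; the article does not report them here, and estimates from the same participants are in fact correlated.

```python
import math


def z_test_beta_difference(beta1, se1, beta2, se2):
    """Z score and two-sided P value for beta1 - beta2.

    Assumes independent estimates, which is only an approximation when
    both betas come from the same participants.
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def bonferroni(p, n_tests):
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


# Betas for LEP are taken from the text; the standard errors are hypothetical.
beta_unadjusted, se_unadjusted = -0.22, 0.008
beta_bmi_adjusted, se_bmi_adjusted = -0.08, 0.008

z, p = z_test_beta_difference(
    beta_unadjusted, se_unadjusted, beta_bmi_adjusted, se_bmi_adjusted
)

# Using the 2,921 analyzed proteins as the number of tests is an assumption.
p_corrected = bonferroni(p, 2921)
print(f"z={z:.2f}, p={p:.3g}, corrected p={p_corrected:.3g}")
```

The cap at 1 in the correction is why a clearly non-significant association can be reported with a corrected P of exactly 1.0.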
In this study, almost no circulating proteins were found to be specifically associated with trinucleotide repeat lengths in the AR gene and thus able to serve as potential markers of androgen resistance, although GGC repeat length was weakly and negatively associated with KLK3 (PSA). Using WES data of the UK Biobank, we recently reported that GGC repeat length is negatively associated with the risk of prostate cancer ^(5)^, and a previous small study reported a similar negative association despite a lack of statistical significance ^(11)^. It would be intriguing to examine whether taking GGC repeat length into account could help detect early stages of prostate cancer.
LEP is an adipokine associated positively with obesity ^(7), (12)^ and negatively with androgens ^(13)^. Our study replicates the latter association and also shows that it is partially attenuated by adjustment for BMI, possibly reflecting the negative effect of testosterone on fat mass ^(5)^. Moreover, the pro-inflammatory property of LEP is associated with diseases, and in this context, its balance with adiponectin (AdipoQ), an anti-inflammatory adipokine, and with soluble leptin receptor (LEPR), which binds to and antagonizes LEP, is also important ^(14)^. This study shows that testosterone is not associated with AdipoQ (beta, 0.02; P, 1.0) but is significantly and positively associated with LEPR (beta, 0.05; P, 2 × 10^−2^) (Supplementary Table 1), supporting the anti-inflammatory property of androgens ^(13)^.
Moreover, the negative association between testosterone and GCG even after adjustment for BMI shown in this study is consistent with that between total testosterone and HbA1c or type 2 diabetes ^(5)^. Androgens are known to amplify the insulinotropic action of GLP1 (glucagon-like peptide 1; not measured in the PPP) in pancreatic beta cells ^(15)^, but their roles in pancreatic alpha cells remain to be clarified, and the precise mechanisms underlying the association between testosterone and GCG should be elucidated in future studies. The positive associations between testosterone and proteins derived mainly from male reproductive tissues, such as INSL3 and PROK1, even after adjustment for BMI, could be attributed to the up-regulation of these proteins by androgens alone or in combination with insulin ^(16), (17)^.
One of the limitations of our study is that it remains to be addressed whether the identified associations are causal. It should also be clarified whether the KLK3 level measured by the Olink proteomics assay and the PSA level measured in clinical practice are correlated with each other. Moreover, only the cardiometabolic, inflammation, neurology, and oncology protein panels were used in the PPP ^(7)^, and other proteins were not measured.
Nevertheless, our study indicates that GGC trinucleotide repeat length in the AR gene is negatively associated with circulating KLK3 (PSA) level. However, circulating testosterone level is the major determinant of various circulating proteins in male participants, such as those involved in male reproduction (INSL3 and PROK1), body composition (LEP and FABP4), and metabolism (GCG). Further research is expected to reveal how androgen signaling could regulate circulating proteins and subsequently androgen-related outcomes.
Article Information
This article is based on a study that received the Medical Research Encouragement Prize of The Japan Medical Association in 2023.
Conflicts of Interest
T.S. has received an endowment unrelated to this research from Eli Lilly; and personal fees unrelated to this research from Boehringer Ingelheim, Daiichi Sankyo, Eli Lilly, Kowa, Novo Nordisk, Ono, and Sumitomo. Y.C. is an employee, and J.B.R. is the founder and CEO, of 5 Prime Sciences, which provides research services for biotech, pharma, and venture capital companies for projects unrelated to this research. J.B.R. has served as an adviser to GlaxoSmithKline and Deerfield Capital. J.B.R.’s institution has received investigator-initiated grant funding from Eli Lilly, GlaxoSmithKline, and Biogen for projects unrelated to this research. The other authors have nothing to disclose.
Sources of Funding
The J.B.R. Research Group is supported by the Canadian Institutes of Health Research (CIHR: 365825, 409511, 100558, 169303), the McGill Interdisciplinary Initiative in Infection and Immunity (MI4), the Lady Davis Institute of the Jewish General Hospital, the Jewish General Hospital Foundation, the Canadian Foundation for Innovation, the National Institute for Health Foundation, Genome Québec, the Public Health Agency of Canada, McGill University, Cancer Research UK (grant No. C18281/A29019), and the Fonds de Recherche Québec Santé (FRQS). J.B.R. is supported by an FRQS Mérite Clinical Research Scholarship. Support from Calcul Québec and Compute Canada is acknowledged. TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the National Institute for Health Research-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ National Health Service Foundation Trust in partnership with King’s College London. T.S. is supported by the Medical Research Encouragement Prize of The Japan Medical Association and the Fund for the Promotion of Joint International Research (Fostering Joint International Research; 23KK0301) by the Japan Society for the Promotion of Science (JSPS). Y.C. has been supported by an FRQS doctoral training fellowship. S.Y. has been supported by the JSPS Overseas Research Fellowship. The aforementioned funding agencies had no role in the design, implementation, or interpretation of this study.
Acknowledgement
We thank Dr. Tianyuan Lu (The University of Wisconsin-Madison) for giving us advice on statistical analysis.
Author Contributions
T.S. designed this study, acquired data, performed analyses, and wrote the manuscript. Y.I. supported data acquisition and analyses. K.Y.H.L. and Y.C. supported data acquisition. S.Y. supported data analyses. J.B.R. designed this study, and reviewed and edited the manuscript. All authors contributed to the interpretation of results, critically revised the manuscript, and approved the final version. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. T.S. is the guarantor of this work and, as such, has full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Approval by Institutional Review Board (IRB)
Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382) per the Declaration of Helsinki. All participants provided informed consent at recruitment, and data of those who withdrew consent were excluded.
Data and Code Availability
The data that support the findings of this study are available from the UK Biobank but restrictions apply to the availability of these data, which were used under license for the present study and therefore are not publicly available. Data are, however, available from the authors on reasonable request and with permission from the UK Biobank Research Committee. Computational scripts used to conduct the present study are available from the corresponding authors upon reasonable request.
Supplement
Supplementary Table 1
References
- 1. Davey RA, Grossmann M. Androgen receptor structure, function and biology: from bench to bedside. Clin Biochem Rev. 2016;37(1):3-15. PMID: 27057074; PMCID: PMC4810760.
- 2. Lundin KB, Giwercman A, Dizeyi N, et al. Functional in vitro characterisation of the androgen receptor GGN polymorphism. Mol Cell Endocrinol. 2007;264(1-2):184-7. doi: 10.1016/j.mce.2006.11.008. PMID: 17197074.
- 3. Choong CS, Kemppainen JA, Zhou ZX, et al. Reduced androgen receptor gene expression with first exon CAG repeat expansion. Mol Endocrinol. 1996;10(12):1527-35. doi: 10.1210/mend.10.12.8961263. PMID: 8961263.
- 4. Ding D, Xu L, Menon M, et al. Effect of GGC (glycine) repeat length polymorphism in the human androgen receptor on androgen action. Prostate. 2005;62(2):133-9. doi: 10.1002/pros.20128. PMID: 15389799.
- 5. Sasako T, Ilboudo Y, Liang KYH, et al. The influence of trinucleotide repeats in the androgen receptor gene on androgen-related traits and diseases. J Clin Endocrinol Metab. 2024;109(12):3234-44. doi: 10.1210/clinem/dgae302. PMID: 38701087; PMCID: PMC11570371.
- 6. Bycroft C, Freeman C, Petkova D, et al. The UK biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203-9. doi: 10.1038/s41586-018-0579-z. PMID: 30305743; PMCID: PMC6786975.
- 7. Sun BB, Chiou J, Traylor M, et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature. 2023;622(7982):329-38. doi: 10.1038/s41586-023-06592-6. PMID: 37794186; PMCID: PMC10567551.
- 8. Dolzhenko E, van Vugt JJFA, Shaw RJ, et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. 2017;27(11):1895-903. doi: 10.1101/gr.225672.117. PMID: 28887402; PMCID: PMC5668946.
