Unveiling unique protein and phosphorylation signatures in lung adenocarcinomas with and without ALK, EGFR, and KRAS genetic alterations

Fanni Bugyi; Mirjam Balbisi; Simon Sugár; Lóránd Váncza; Eszter Regős; Ilona Kovalszky; Ibolya Laczó; Tünde Harkó; Gábor Kecskeméti; Zoltán Szabó; Judit Moldvay; László Drahos; Lilla Turiák

PMC · DOI:10.1002/1878-0261.70091·July 7, 2025

Unveiling unique protein and phosphorylation signatures in lung adenocarcinomas with and without ALK, EGFR, and KRAS genetic alterations

Fanni Bugyi, Mirjam Balbisi, Simon Sugár, Lóránd Váncza, Eszter Regős, Ilona Kovalszky, Ibolya Laczó, Tünde Harkó, Gábor Kecskeméti, Zoltán Szabó, Judit Moldvay, László Drahos, Lilla Turiák

PDF

Open Access

TL;DR

This study identifies unique protein and phosphorylation patterns in lung cancer tumors with specific genetic mutations, which could help improve diagnosis and treatment strategies.

Contribution

The study reveals distinct proteomic and phosphoproteomic signatures associated with EGFR, KRAS, and EML4–ALK mutations in lung adenocarcinoma.

Findings

01

EGFR-mutated tumors showed unique protein expression patterns and downregulated pathways related to membrane and vesicle transport.

02

Altered activities of 10 kinases, including CDK2, were identified in EML4–ALK samples.

03

Proteomic differences included proteins like cathepsin L and STING, and phosphoproteins linked to RNA splicing like SRSF1.

Abstract

Genetic alterations in key oncogenes have been frequently identified in lung adenocarcinoma (LUAD), including genes encoding epidermal growth factor receptor (EGFR), Kirsten rat sarcoma viral oncogene homolog (KRAS), and anaplastic lymphoma kinase (ALK). In this pilot study, we aimed to characterize the differences in enriched biological pathways and phosphorylation events between LUAD tumors harboring EGFR, KRAS, or echinoderm microtubule‐associated protein‐like 4 (EML4)–ALK oncogenic alterations and triple wild‐type LUAD tumors (WT, without EML4–ALK, KRAS, or EGFR alterations) by mass spectrometry (MS)‐based quantitative proteomics and phosphoproteomics. We analyzed tumor regions of 82 formalin‐fixed paraffin‐embedded (FFPE) tissue sections with 6, 23, 31, and 22 samples from the EML4–ALK, EGFR, KRAS, and WT sample groups, respectively. A total of 1377 to 2189 proteins and 73 to 1781…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes20

EGFR KRAS ALK EML4 STING1 SRSF1 SRSF2 SRSF7 CDK2 MAPK Alk Egfr Eml4 Kras Ctsl Sting1 Srsf1 Srsf2 Cdk2 Srsf7

Proteins10

Chemicals2

paraffin formalin

Diseases3

lung adenocarcinoma tumor LUAD

Figures10

Click any figure to enlarge with its caption.

Fig. 1 — (Phospho)proteomic analysis of lung adenocarcinoma (LUAD) tumors. (A) Schematic workflow of applied methodologies: Small regions of formalin‐fixed paraffin‐embedded (FFPE) tissue sections from LUAD tumor samples were selected and subjected to on‐surface tryptic digestion. The resulting peptides were enriched for phosphopeptides using TiO2 spin tips, separating phosphopeptides from non‐phosphorylated peptides, both of which were then analyzed separately by mass spectrometry. (B) Distributions of the samples analyzed in the cohort based on the genetic alterations and subtypes. In the case of WT samples, NA refers to not applicable, while for KRAS samples, NA refers to not available.

Fig. 2 — Proteomic analysis of the EML4–ALK, EGFR, KRAS, and triple wild‐type (WT) sample groups. (A) Uniform manifold approximation and projection (UMAP) representation of the 82 samples based on the 2021 proteins. Different colors mark the four sample groups (blue for EML4–ALK, yellow for EGFR, green for KRAS, and magenta for wild‐type). (B) Heatmap with hierarchical clustering of the 166 proteins significantly altered (false discovery rate < 0.05, fold change > 2) between the groups (log2 protein LFQ intensity values are Z‐scored).

Fig. 3 — Gene set enrichment analysis (GSEA) dot plot of significantly enriched Gene Ontology Biological Process (GOBP) terms in EGFR‐WT comparison, where the x‐axis represents the normalized enrichment score (NES). The size of the dots corresponds to the number of genes associated with the term, and the color represents the NES.

Fig. 4 — Euler plot of the 36 potential alteration‐specific proteins. Proteins highlighted in black were significantly under‐expressed, while those in red were significantly overexpressed within the specific sample group. The proteins positioned at the bottom of the figure exhibit distinct expression patterns across multiple groups. A blue line indicates significant differences in expression between the connected groups, with overexpression occurring in the sample groups highlighted in red.

Fig. 5 — Phosphoproteomic analysis of the EML4–ALK, EGFR, KRAS, and WT sample groups. (A) UMAP representation of the 82 samples based on the 384 phosphosites quantified using maxquant and considered for statistical analysis. Different colors mark the four sample groups (blue for EML4–ALK, yellow for EGFR, green for KRAS, and magenta for wild‐type). (B) Heatmap with hierarchical clustering of the 183 phosphosites significantly altered (false discovery rate < 0.05, fold change > 2) between the groups (log2 phosphosite intensity values are Z‐scored).

Fig. 6 — Euler plot of the 52 potential alteration‐specific phosphosites. Phosphosites highlighted in black showed significantly decreased phosphorylation, while those in red showed significantly increased phosphorylation within the specific sample group. The phosphosites positioned at the bottom of the figure exhibit distinct phosphorylation patterns across multiple groups. A blue line indicates significant differences in phosphorylation between the connected groups, with increased phosphorylation occurring in the sample groups highlighted in red.

Fig. 7 — Visualization of functional enrichment analysis in string for Gene Ontology Biological Process (GOBP) terms of 44 phosphoproteins with potential alteration‐specific phosphosites. The size of the dots corresponds to the number of genes associated with the term, and the color represents the false discovery rate (FDR).

Fig. 8 — Result of the kinase–substrate enrichment analysis (KSEA). Kinase scores based on KSEA of 202 kinase–substrate pairs (number of substrates ≥ 3). Blue bars for suppressed and red bars for elevated kinase activities (P < 0.05 based on z‐score transformation). Black bars for kinases without significantly altered activities.

Fig. 9 — Immunohistochemistry staining of phospho‐p44/42 MAPK (Erk1/2) (Thr202/Tyr204) antibody to assess MAPK3/MAPK1 activity in LUAD tumors. Scale bar: 50 μm. Brown color represents positive staining.

Fig. 10 — Immunohistochemistry staining of phospho‐p38 MAPK (Thr180/Tyr182) antibody to assess MAPK11 activity in LUAD tumors. Scale bar: 50 μm. Brown color represents positive staining.

Tables2

Table 1. Summary of the LUAD samples investigated.

	EML4–ALK	EGFR	KRAS	WT
Number of samples	6	23	31	22
Age	63 (53–72)	66 (39–79)	63 (52–76)	66 (41–76)
Sex
Female	2	17	19	13
Male	4	6	12	9
Grade
2	4	16	21	11
2–3	0	1	1	1
3	2	3	7	9
Stage
I	3	8	8	5
II	2	10	11	7
III	0	5	10	9
IV	1	0	2	0
Smoke
Never	3	7	0	5
Former	1	6	8	6
Current	0	6	19	10

Table 2. List of potentially alteration‐specific phosphosites (significantly dysregulated in all three comparisons of a specific genetic alteration or the WT samples based on the phosphoproteomic analysis) supplemented with information from PhosphoSitePlus.

Site	Alteration specificity	Direction of change	Downstream effects on biological processes
TMPO;T74	EML4–ALK	DOWN	–
AHSG;S138	EML4–ALK	DOWN	Carcinogenesis, induced
SPTBN1;S2165	EML4–ALK	DOWN	–
NAB2;S171	EML4–ALK	DOWN	–
SNIP1;S35	EML4–ALK	DOWN	–
SLTM;S553	EML4–ALK	DOWN	–
SRRM2;S2694	EML4–ALK	DOWN	–
FARP1;S427	EML4–ALK	DOWN	–
LMO7;S1185	EML4–ALK	DOWN	–
LMNB1;S23	EML4–ALK	UP	–
PML;S518	EML4–ALK	UP	Apoptosis, inhibited; carcinogenesis, induced and inhibited; cell growth, induced; cell motility, induced; signaling pathway regulation; transcription, induced
SNTB2;S95	EML4–ALK	UP	–
ZNF326;S181	EML4–ALK	UP	–
KHSRP;S181	EML4–ALK	UP	–
PBXIP1;S43	EML4–ALK	UP	–
RBBP8;S327	EML4–ALK	UP	Cell cycle regulation
LMNB1;T20	EML4–ALK	UP	–
LMNB2;T34	EML4–ALK, WT	EML4–ALK > EGFR, KRAS > WT	–
SRRM2;T1880	EGFR	DOWN	–
NUMA1;S1757	EGFR	DOWN	–
PRKRA;S18	EGFR	UP	–
RPL23A;S43	EGFR	UP	–
MICALL2;S649	EGFR	UP	–
SRRM2;S1101	EGFR	UP	–
H1‐3;T4	EGFR	UP	–
H1‐3;T10	EGFR	UP	–
PLEC;S720	EGFR	DOWN	–
RSRC2;S32	EGFR	DOWN	–
BIN2;S431	EGFR	DOWN	–
FOXK1;S416	EGFR, KRAS	EGFR > EML4–ALK, WT > KRAS	–
SRRM1;S450	KRAS	DOWN	–
MYH11;S1954	KRAS	UP	–
FILIP1L;S551	KRAS	UP	–
VIRMA;S1579	KRAS	UP	–
PRKAR1A;S83	KRAS	DOWN	Carcinogenesis, induced; cell cycle regulation
SRSF1;S94	KRAS	DOWN	–
CCNL2;S369	KRAS	DOWN	–
SRRM2;S2702	KRAS	DOWN	–
CHD4;S885	KRAS	DOWN	–
MATR3;S598	WT	UP	–
PDLIM2;S134	WT	DOWN	–
PML;S8	WT	UP	Apoptosis, altered
TRA2A;S260	WT	UP	–
YBX1;S174	WT	UP	Transcription, altered
MAP1S;S759	WT	UP	–
SMARCC2;S302	WT	UP	–
THRAP3;S320	WT	UP	–
SRRM1;S463	WT	UP	–
SRRM1;S431	WT	UP	–
SRRM1;S429	WT	UP	–
HDGFL2;S137	WT	DOWN	–
LMNA;T19	WT	DOWN	Cell cycle regulation

Funding2

—Magyar Tudományos Akadémia10.13039/501100003825
—Nemzeti Kutatási, Fejlesztési és Innovaciós Alap10.13039/501100012550

Keywords

cancer researchFFPE tissuegenetic alterationslung adenocarcinomamass spectrometryphosphoproteomics

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsRNA modifications and cancer · Lung Cancer Treatments and Mutations · Ferroptosis and cancer prognosis

Full text

Introduction

1

Lung cancer (LC) is the leading cause of cancer‐related mortality, accounting for 11.4% of new cancer incidences in 2020 based on the Global Cancer Statistics [1, 2]. LC is histologically classified into small cell lung cancer (SCLC, 15% of all LC) and non‐small cell lung cancer (NSCLC, 85% of all LC). The main subtypes of NSCLC are adenocarcinoma, squamous cell carcinoma, and large cell carcinoma. Lung adenocarcinoma (LUAD) accounts for approximately 40% of all LC cases [3].

Genetic alterations in key oncogenes have been frequently identified in LUAD, including genes encoding epidermal growth factor receptor (EGFR), Kirsten rat sarcoma viral oncogene homolog (KRAS), and anaplastic lymphoma kinase (ALK) [4, 5]. These genetic variations are specific molecular targets for therapeutic intervention and provide prognostic value [6, 7].

Mutations in EGFR occur in 30–40% of NSCLC patients and have a higher prevalence in patients with adenocarcinoma, women, and those who have never smoked [8, 9, 10]. The frequency of EGFR mutation is different among early‐ and late‐stage LUADs [11, 12]. The most common EGFR mutations are exon 19 deletion and L858R exon 21 point mutation of the tyrosine kinase domain, accounting for more than 85% of cases [13]. ALK rearrangements have been found in 2–7% of NSCLC patients; they are more common among younger people and nonsmokers [8, 9, 10]. ALK rearrangements are an adverse prognostic factor in surgically resected LUAD patients, while EGFR mutations are associated with a better prognosis [10]. Genetic mutations in KRAS have been found in 25–35% of NSCLC patients; they occur more often among males, and they are significantly more frequent in smokers or former smokers compared to nonsmokers [4, 8, 9]. The most common mutations in KRAS in LUAD are p.G12C, p.G12V, and p.G12D [14]. Proteins encoded by ALK and EGFR genes are both tyrosine kinase receptors, while KRAS, a small GTPase, is a downstream effector of EGFR. They activate the RAF/MEK/MAPK and PI3K/AKT signaling pathways, leading to cell growth and replication [14, 15].

Mass spectrometry (MS)‐based proteomics and phosphoproteomics have a key role in many fields of cancer research, from biomarker discovery and the characterization of disease mechanisms to drug resistance and personalized medicine [16, 17, 18, 19]. Phosphoproteomics is a powerful approach for the characterization of genetic alterations across different cancer types, including LC [20, 21, 22, 23]. Proteomic and phosphoproteomic studies investigating NSCLCs harboring genetic alterations typically focus on identifying dynamic changes upon mutations of specific driver genes, predicting treatment response, and revealing the role of phosphotyrosine signaling in disease pathology [20, 21, 22, 23]. However, the disrupted biological pathways and altered signaling between LUADs with different mutated or translocated driver genes have not been directly/extensively investigated. Therefore, uncovering common and distinct biological processes and phosphorylation events among oncogenic EGFR‐mutant, KRAS mutant, and EML4–ALK‐rearranged LUADs through proteomics and phosphoproteomics analyses could provide significant insights for further studies developing diagnostic approaches or therapeutic strategies for NSCLC.

In this exploratory study, we investigated the altered biological pathways and phosphorylation events primarily on serine and threonine residues in LUAD tumors harboring EGFR (n = 23), KRAS (n = 31), or EML4–ALK (n = 6) oncogenic alterations and triple wild‐type (n = 22) LUAD tumors by MS‐based quantitative proteomics and phosphoproteomics analyses of 82 formalin‐fixed paraffin‐embedded (FFPE) tissue sections following on‐surface tryptic digestion of selected tissue regions based on a previously established methodology [24]. The triple wild‐type term refers to tumors without alteration of EML4–ALK, EGFR, or KRAS genes. To the best of our knowledge, this is the first study focusing on altered biological processes and phosphorylation events in LUADs harboring EML4–ALK, EGFR, or KRAS genetic alterations, using small amounts of clinical samples.

Materials and methods

2

Materials and tissue samples

2.1

All chemicals used were HPLC‐MS grade. Acetonitrile (ACN), water, formic acid (FA), ammonium bicarbonate (ABC), acetic acid, glycerol, phosphorous acid, and trisodium citrate were purchased from Merck (Darmstadt, Germany). Trifluoroacetic acid (TFA), ammonia, dithiothreitol, iodoacetamide, and citric acid were obtained from Thermo Scientific (Unicam, Budapest, Hungary). Methanol, ethanol, and xylene were purchased from VWR International (Debrecen, Hungary). Trypsin Gold and Trypsin/Lys‐C Mix were obtained from Promega (Bio‐Science Hungary, Budapest, Hungary), and RapiGest surfactant was obtained from Waters (Budapest, Hungary). Phospho‐CDK2 (Thr160) antibody (#2561), phospho‐p44/42 MAPK (Erk1/2) (Thr202/Tyr204) (D13.14.4E) XP^®^ Rabbit mAb (#4370) antibody, and phospho‐p38 MAPK (Thr180/Tyr182) (D3F9) XP^®^ Rabbit mAb (#4511) were purchased from Cell Signaling Technology (Kvalitex Kft, Budapest, Hungary).

FFPE lung tissue sections from patients with LUAD were retrospectively collected between January 2008 and December 2020. The study was conducted in accordance with the Declaration of Helsinki. The work was approved by the Medical Research Council of Hungary (TUKEB permit number: IV/2567‐4/2020/EKU). Patient consent was waived with the approval of the Medical Research Council since the study was conducted retrospectively on FFPE‐archived tissue biopsies. FFPE tissue sections were obtained from the departmental archive of the National Korányi Institute of Pulmonology: EGFR‐mutated (n = 23), KRAS‐mutated (n = 31), EML4–ALK variant (n = 6), and triple wild‐type (n = 22). All tissue sections were 10 μm thick. Patient material was selected based on the following criteria: surgically resected LUAD specimens from the past several years, and the average age between the groups (63–66) was balanced. Cases included were classified according to the latest WHO histological classification of lung cancer. For the clinical diagnosis of EGFR and KRAS mutations, tumor regions with the highest tumor cell content were macrodissected from paraffin‐embedded tissue blocks, and DNA was extracted using the MasterPure DNA Purification Kit. KRAS mutations were identified through microcapillary‐based restriction fragment length analysis followed by Sanger sequencing, while EGFR mutations were detected by polymerase chain reaction (PCR) amplification of exons 18, 19, 20, and 21, followed by bidirectional Sanger sequencing [25]. Additionally, ALK rearrangements were routinely assessed in samples using fluorescence in situ hybridization (FISH) [26]. The low number of EML4–ALK‐rearranged samples reflects the rarity of this alteration. Summarized information about the samples is presented in Table 1, and detailed information can be found in Table S1.

On‐surface digestion

2.2

FFPE tissue sections were handled as previously described [24, 27]. The slides were baked at 60 °C for 2 h to prevent tissue detachment, and then, they were sequentially incubated in solvents with different polarities for deparaffinization: xylene (2 × 5 min), ethanol (2 × 3 min), ethanol–water mixtures (90% ethanol: 3 min, 70% ethanol: 3 min), 10 mm ABC solution (5 min), and water (1 min). Next, heat‐induced antigen retrieval was carried out to disrupt cross‐linking induced by formalin fixation: The tissue sections were boiled at 80–85 °C in citrate buffer (94.6 mm sodium citrate, 20.8 mm citric acid, pH = 6) for 30 min. The digestion was carried out on specific tissue regions characterized by board‐certified pathologists based on the H&E‐stained parallel sections available. In the unstained sections, the selected regions were outlined with a blade to control the digestion area. The proportion of tumor cells relative to other cell types in these regions is presented in Table S1. The exact steps of digestion have been described previously [24]. Briefly, on the tissue surface (ca. 20 mm^2^, using 3–3 μL solutions), the proteins were reduced with RapiGest and dithiothreitol and alkylated with iodoacetamide, and then digested in multiple cycles with LysC‐trypsin mixture (ca. 1 : 25 ratio) and trypsin (ca. 1 : 5 ratio). The extraction of the protein digest was carried out with a 10% acetic acid aqueous solution. The peptide digests from the exact same regions of two parallel tissue sections of the same individual were pooled and treated as one thereafter. Then, the extract was dried and stored at −20 °C until further use.

Purification and phosphopeptide enrichment

2.3

Thermo Pierce C_18_ spin columns (Kvalitex) were used for desalting and clean‐up. After each step, the cartridges were centrifuged at 2500 ** g ** for 2 min. The cartridge was conditioned with 2 × 200 μL 50 : 50 v/v% methanol : H_2_O and washed with 2 × 200 μL 0.5% TFA in 5 : 95 v/v% ACN : H_2_O, then equilibrated with 2 × 200 μL of 0.1% TFA in water. Samples were applied and reapplied once in 50 μL 0.1% TFA in water. The cartridge was washed with 2 × 100 μL of 0.1% TFA, and peptides were eluted with 2 × 50 μL of 0.1% TFA in 30 : 70 v/v% H_2_O : ACN. After the purification, the samples were dried down and stored at −20 °C until further use.

For the phosphopeptide enrichment, Pierce TiO_2_ SpinTips (Kvalitex) were used, as previously described [28]. After washing and equilibrating the column, the sample was loaded in 50 mm citric acid buffer and then washed. The flow‐through and the wash fractions were collected for proteomic analysis, and the phosphopeptides were eluted. The proteomic and phosphoproteomic fractions were lyophilized and stored at −20 °C until further use.

The proteomic samples were further cleaned using C_18_ spin columns with the same method described above, while the purification of phosphoproteomic samples was performed on self‐packed (with 10 mg Oasis HLB resin) centrifugal SPE tips [29]. After the elution, the proteomic samples were dried down in the vacuum evaporator, and the phosphoproteomic samples were lyophilized. Both fractions were stored at −20 °C until further use.

Nano‐UHPLC‐MS(MS) analysis—total proteome analysis

2.4

Every proteomic sample was reconstituted in 10 μL 0.1% FA in 2 : 98 v/v% ACN : H_2_O, out of which 5 μL was injected. Samples were analyzed using an Orbitrap Exploris™ 240 Mass Spectrometer instrument (Thermo Fisher Scientific, Waltham, MA, USA) equipped with a nanospray ion source coupled to a Waters ACQUITY UPLC M‐Class LC system (Waters, Milford, MA, USA). Peptides were separated on an Acquity M‐Class BEH130 C_18_ analytical column (1.7 μm, 75 μm × 250 mm) using gradient elution (isocratic hold at 3% for 1 min, then elevating B solvent content to 25% in 81 min, and to 40% in 3 min) following 3 min trapping on an M‐Class Symmetry C_18_ (100 Å, 5 μm, 180 μm × 20 mm) trap column. Solvent A consisted of 0.1% FA in water, and solvent B was 0.1% FA in ACN.

For MS analysis, data‐dependent analysis (DDA) measurements were performed. The number of dependent scans was set at 20, with a dynamic MS/MS exclusion of the same precursor ion for 40 s if it occurred more than 2 times within 30 s. Preferred charge states were set between +2 and +6. MS spectra were acquired in the 360–2200 m/z range, while the scan range was selected automatically for the MS/MS spectra.

Nano‐UHPLC‐MS(MS) analysis—phosphoproteomic analysis

2.5

Every phosphoproteomic sample was reconstituted in 8 μL 0.1% FA in 2 : 98 v/v% ACN : H_2_O, out of which 6 μL was injected. The samples were analyzed on a nanoElute 2 nano‐HPLC coupled to a timsTOF HT (Bruker Daltonics, Billerica, MA, USA) equipped with a CaptiveSpray 1 source. Phosphopeptides were separated on an experimental monolithic C_18_ column (150 mm*30 μm ID, 0.46 μL void volume, Bruker Daltonics) heated at 50 °C using gradient elution (constant 0.50 μL·min^−1^ flow; elevating B solvent content from 5% to 17% in 7 min, and 35% in 3 min, followed immediately by a 5 min plateau at 95%). Trapping was performed on an Acclaim PepMap trap column (5 μm C_18_‐coated particles, 300 μm × 5 mm, ThermoFisher Scientific P/N 160454). Solvent A consisted of 0.1% FA in water, and solvent B was 0.1% FA in ACN.

MS settings were the following: m/z range = 100–1700 Th, ion mobility range = 0.84–1.3 1/K0; transfer time = 60 μs, pre‐pulse storage time = 12 μs, ion polarity = Positive, scan mode = MS/MS (Pasef); TIMS parameters: ramp time = 180 ms, accumulation time = 180 ms; PASEF parameters: ms/ms scans = 4, total cycle time = 0.93068 s, charge range = 0–5, intensity threshold for scheduling = 1500, scheduling target intensity = 15 000, exclusion release time = 0.4 min, reconsider precursor switch = on, current/previous intensity ratio = 4, exclusion window mass width = 0.015 m/z, exclusion window v·s·cm^−2^ width = 0.015 V·s·cm^−2^.

Immunohistochemistry

2.6

Formalin‐fixed and paraffin‐embedded human lung tissue sections (2 μm thick) were deparaffinized and rehydrated. Heat‐induced antigen retrieval was performed using BOND Epitope Retrieval Solution 2 (AR9640, Leica Biosystems, Newcastle, UK). Endogenous peroxidase blocking was carried out using 10% H_2_O_2_ in methanol for 20 min at room temperature. Sections were washed in phosphate‐buffered saline containing 0.1% v/v Tween‐20 (PBST; containing 137 mm of NaCl, 2.7 mm of KCl, 10 mm of Na_2_HPO_4_, and 1.8 mm of KH_2_PO_4_, pH 7.5) followed by blocking the non‐specific protein binding sites with 10% bovine serum albumin in PBST (BSA, A6003, Merck KGaA, Darmstadt, Germany) for 1 h at room temperature. The sections were incubated with phospho‐p44/42 MAPK (1 : 1000), phospho‐p38 MAPK (1 : 500), and phospho‐CDK2 (1 : 100) antibodies or with the antibody diluent for the no primary antibody controls overnight at 4 °C in 1 : 2000 concentration. The negative control staining was prepared by omitting the primary antibody. The sections were washed and incubated with the HISTO‐Labeling System for 40 min at room temperature (30011.R500, Department of Immunology and Biotechnology, Pécs, Hungary). Diaminobenzidine 3 : 100 (ImmPACT DAB, SK‐4105, Vector Laboratories, Burlingame, CA, USA) was used to visualize the reaction, followed by counterstaining with hematoxylin. The sections were covered using BIOMOUNT (BMT‐500, BIOGNOST D.O.O, Zagreb, Croatia) and scanned with the Pannoramic P1000 scanner. Isotype controls were used as negative controls.

Data analysis

2.7

Software used: byonic 5.0.20 [30] (https://proteinmetrics.com, accessed on 6 June 2024), maxquant 2.4.0.0 and 2.4.2.0 [31] (https://maxquant.org, accessed on 6 June 2024), r 4.3.2 [32] (https://www.r‐project.org/, accessed on 6 June 2024), rstudio 2023.06.1 [33] (https://posit.co/, accessed on 6 June 2024), string 12.0 [34] (https://string‐db.org/, accessed on 6 June 2024), and ksea app 1.0 [35, 36] (https://casecpb.shinyapps.io/ksea/, accessed on 6 June 2024). The exact parameters used for all the software are summarized in Table S2.

Protein identification was performed by byonic on the Homo sapiens database (82 427 sequences, downloaded from https://www.uniprot.org/ in July 2023). Protein quantitation on the proteomic dataset was performed using maxquant v2.4.0.0 on a focused database, made from merging byonic search results from all proteomic MS/MS analyses. The number of quantified proteins varied between 1377 and 2189 across the samples. After quantitation, the data were log2 transformed and filtered based on the number of missing values in each of the four sample groups: only proteins found in at least 50% of all samples in at least one sample group (KRAS, EGFR, WT) were kept for further analysis. Regarding the EML4–ALK sample group, proteins had to be found in four out of six samples. Missing values were then imputed in a group‐wise manner according to the following: If the protein in question was found in less than 1/2 of all samples in the group, it was imputed with the sample 5‐percentile, while if it was found in at least 1/2 of all samples in the group, it was imputed according to the kNN algorithm (vim package [37], k = 10, similarity based on Euclidean distances).

Phosphosite quantitation on the phosphoproteomic dataset was performed using maxquant v2.4.2.0 on a focused database made from merging byonic search results from all phosphoproteomic MS/MS analyses. The number of quantified phosphosites varied between 73 and 1781 across the samples. After quantitation and log2 transformation, phosphosites were filtered based on the localization probability (≥ 0.75) and the number of missing values similar to the proteomic dataset. The missing values were then imputed using the site‐ and condition‐specific imputation if the quantitation rate within that condition was ≥ 0.5 (scImpute function in phosr package [38]). The remaining sites were imputed using tail‐based imputation (tImpute function in phosr package). The imputed dataset was normalized for sample medians using medianScaling function (scale = F, phosr package).

The following statistical analysis was carried out on both datasets: Normality was tested using Shapiro–Wilk tests within each sample group, and variances between the groups compared were tested using Levene tests. For multiple group comparisons, analysis of variance (ANOVA), Welch‐ANOVA, and Kruskal–Wallis tests were performed based on the outcome of the normality and equality of variances tests. For two‐group comparisons on proteins/phosphosites significant in the multiple group comparison, Student's t‐tests, Welch t‐tests, and Wilcoxon rank sum tests were performed based on the outcome of the normality and equality of variances tests. False discovery rates were controlled separately for every protein/phosphosite using the Benjamini–Hochberg method at 5%. Plots were made using the ggplot2 [39] and ggpattern packages. The uniform manifold approximation and projection (UMAP) was performed using the umap function from the m3c package [40], and hierarchical clustering was performed using the heatmap.3 function [41] (using Ward's clustering method ‘ward.D2’ from the hclust function). For the UpSet plot, plot_upset function was used from msnset.utils package [42].

Gene set enrichment analysis (GSEA) was performed using the gseGO function in the clusterprofiler r package [43] for Gene Ontology Biological Process (GOBP) terms. The genes were preranked based on effect sizes (Cohen's d). Enriched terms were visualized using the dotplot and the gseaplot2 functions in the clusterprofiler package.

Functional enrichment of phosphoproteins was performed in string for Gene Ontology (GO) terms. The minimum required interaction score was set to medium confidence (0.400). For the visualization, the following parameters were used: Number of terms shown was set to 10, and group terms by similarity ≥ 0.8.

Kinases were predicted for the 183 substrates significantly dysregulated. Substrate scores for kinases were generated using the PhosphoSitePlus [44] database and the ‘kinaseSubstrateScore’ function in phosr package based on the features of phosphorylation profiles and sequence motifs. Kinase was predicted for a given substrate using the ‘kinaseSubstratePred’ function in phosr package, a machine‐learning approach utilizing adaptive sampling. Kinase network was constructed using ‘plotKinaseNetwork’ function in phosr package.

Relative kinase activities were inferred using KSEA App. The analyses were performed separately for the six comparisons. PhosphoSitePlus and networKIN [45] were the selected kinase–substrate datasets.

Results

3

Proteomic and phosphoproteomic analyses were performed following on‐surface digestion of small regions of FFPE tissue sections from individuals diagnosed with LUAD. The tryptic digest of each sample extracted from the tissue surface of defined regions of two parallel sections was combined and subjected to phosphopeptide enrichment using TiO_2_ spin tips. Phosphopeptides containing phosphorylation on serine or threonine residues were predominantly enriched. The phosphopeptides were then separated from the non‐phosphorylated peptides, and the two fractions were analyzed in separate mass spectrometry runs (Fig. 1A). The samples were grouped into four sample groups based on the type and presence of a given genetic alteration (EML4–ALK‐rearranged, EGFR‐mutated, KRAS‐mutated, and triple wild‐type), and statistical analysis was performed accordingly. The sample groups contained 6, 23, 31, and 22 samples, respectively (Fig. 1B). The low number of EML4–ALK‐rearranged samples represents the frequency of the alteration.

(Phospho)proteomic analysis of lung adenocarcinoma (LUAD) tumors. (A) Schematic workflow of applied methodologies: Small regions of formalin‐fixed paraffin‐embedded (FFPE) tissue sections from LUAD tumor samples were selected and subjected to on‐surface tryptic digestion. The resulting peptides were enriched for phosphopeptides using TiO2 spin tips, separating phosphopeptides from non‐phosphorylated peptides, both of which were then analyzed separately by mass spectrometry. (B) Distributions of the samples analyzed in the cohort based on the genetic alterations and subtypes. In the case of WT samples, NA refers to not applicable, while for KRAS samples, NA refers to not available.

Patient and sample information are summarized in Table 1; detailed information regarding the samples is available in Table S1. The results of the proteomic and phosphoproteomic analyses are discussed separately.

Proteomics—identification of proteins and biological processes specific to samples with different genetic alterations through total proteomics

3.1

During the total proteomic analysis, 3672 proteins were quantified by maxquant using label‐free quantitation. The number of quantified proteins per sample ranged from 1377 to 2189. Following initial filtering based on the number of missing values, 2021 proteins were subjected to statistical analysis. The methods and parameters used for data analysis are summarized in Table S2.

Uniform manifold approximation and projection (UMAP) was performed based on the 2021 proteins. Most of the EGFR, KRAS, and WT samples clustered separately, while EML4–ALK‐rearranged samples grouped together with the other subtypes (Fig. 2A). This could reflect the molecular heterogeneity within the EML4–ALK‐rearranged group, where variations in protein expression might influence clustering behavior. Proteomic data were further investigated to identify proteins differentially expressed between the sample groups. Altogether, in at least one of the six two‐group comparisons (EML4–ALK‐EGFR, EML4–ALK‐KRAS, EML4–ALK‐WT, EGFR‐KRAS, EGFR‐WT, KRAS‐WT), the expression of 504 proteins was significantly altered (FDR < 0.05), out of which 166 proteins had a fold‐change of over 2. The results of the two‐group comparison tests (test type, P‐value, fold‐change) for the 504 proteins are summarized in Table S3. Hierarchical clustering based on the 166 proteins revealed distinct molecular profiles among samples with different genetic alterations (Fig. 2B). Moreover, EGFR‐mutated samples clustered separately from the other three groups. However, it is important to note that the clustering was not influenced by other clinical parameters such as tumor area, mutation subtype, smoking history, stage, grade, or sex. Furthermore, samples with 5–10% tumor content, including those from EGFR_12, EGFR_2, KRAS_18, and KRAS_26, clustered together with other samples harboring the same mutation.

Proteomic analysis of the EML4–ALK, EGFR, KRAS, and triple wild‐type (WT) sample groups. (A) Uniform manifold approximation and projection (UMAP) representation of the 82 samples based on the 2021 proteins. Different colors mark the four sample groups (blue for EML4–ALK, yellow for EGFR, green for KRAS, and magenta for wild‐type). (B) Heatmap with hierarchical clustering of the 166 proteins significantly altered (false discovery rate < 0.05, fold change > 2) between the groups (log2 protein LFQ intensity values are Z‐scored).

Several proteins showed exceptionally high fold‐changes (above 8‐fold); for example, 15‐hydroxyprostaglandin dehydrogenase (PGDH) was significantly overexpressed, while cingulin was significantly under‐expressed in EML4–ALK compared to the other three sample groups. Cysteine‐rich protein 1 (CRIP1) and protein S100‐P were significantly overexpressed in KRAS and WT groups compared to EML4–ALK and EGFR groups.

Multiple proteins related to EGFR, EML4–ALK, or KRAS signaling pathways showed significant differences in their expression between the sample groups (FDR < 0.05). Receptor tyrosine protein kinase erbB‐2 (ERBB2) [46] was significantly altered in EGFR and KRAS groups compared to the WT group; SHC‐transforming protein 1 (SHC1) [47] was significantly altered in EML4–ALK and KRAS groups compared to the EGFR group. Additional proteins that are indirectly regulated by these pathways were also significantly altered between the groups; for example, caspase‐7 [48] was under‐expressed in EML4–ALK, and metalloproteinase‐9 [49] in KRAS compared to the other sample groups.

Gene set enrichment analysis was performed in R for Gene Ontology (GO) Biological Process (BP) terms, based on the 2021 proteins. The complete lists of enriched terms are provided in Table S4. Most of the enriched terms were related to vesicle organization, transport, homeostasis, metabolic, and translation biological processes. Of the 79 enriched GOBP terms, only a few showed potential genetic alteration specificity. For example, membrane organization, vesicle organization, and vesicle‐mediated transport terms were significantly suppressed in EGFR‐mutated samples compared to the other sample groups. A dot plot for the enriched terms in the EGFR‐WT comparison is presented in Fig. 3. Dot plots for the other comparisons are provided in Figs S1–S5, and GSEA plots of significantly enriched GOBP terms in the comparisons of the EGFR sample group are presented in Figs S6–S8.

Gene set enrichment analysis (GSEA) dot plot of significantly enriched Gene Ontology Biological Process (GOBP) terms in EGFR‐WT comparison, where the x‐axis represents the normalized enrichment score (NES). The size of the dots corresponds to the number of genes associated with the term, and the color represents the NES.

Altogether, 36 proteins were found to be differentially expressed (FDR < 0.05 and fold‐change above 2) in all three comparisons of the sample groups, identifying them as potentially genetic alteration‐specific (Fig. 4). For example, procathepsin L (CATL1) was significantly under‐expressed in EML4–ALK‐rearranged samples compared to EGFR‐mutated, KRAS‐mutated, and WT samples. Besides the 16, 13, 2, and 5 proteins altered specifically in EGFR, EML4–ALK, KRAS, and WT groups, an additional seven proteins exhibited distinct expression differences across multiple groups. For example, pulmonary surfactant‐associated protein B (PSPB) was found to be significantly overexpressed in EML4–ALK and KRAS groups compared to EGFR and WT groups (Fig. 4).

Euler plot of the 36 potential alteration‐specific proteins. Proteins highlighted in black were significantly under‐expressed, while those in red were significantly overexpressed within the specific sample group. The proteins positioned at the bottom of the figure exhibit distinct expression patterns across multiple groups. A blue line indicates significant differences in expression between the connected groups, with overexpression occurring in the sample groups highlighted in red.

Phosphoproteomics—identification of unique phosphosites and inference of kinases specific to samples with different genetic alterations

3.2

Altogether, 3708 phosphosites were quantified by maxquant, and the number of quantified phosphosites per sample ranged from 73 to 1781. Out of the 3708 phosphosites, 384 sites were selected for further statistical analysis based on the number of missing values. These 384 phosphosites were derived from 254 unique phosphorylated proteins. The methods and parameters used for data analysis and statistical analysis are summarized in Table S2.

Dimensionality reduction was performed using UMAP based on the 384 phosphosites (Fig. 5A). Most of the EML4–ALK, EGFR, KRAS, and WT samples formed separate clusters. Moreover, most of the samples with EGFR Exon 18 and Exon 19 mutations clustered distinctly from those with EGFR Exon 21 mutations. Phosphoproteomic data were further investigated to identify phosphosites differentially phosphorylated between the four sample groups. Altogether, in at least one of the six two‐group comparisons, the phosphorylation of 211 sites was significantly altered (FDR < 0.05), out of which 183 had a fold‐change over 2. The results of the two‐group comparison tests (test type, P‐value, fold‐change) for the 211 phosphosites are summarized in Table S5. Hierarchical clustering based on the 183 phosphosites revealed some interesting differences between the samples with different genetic alterations (Fig. 5B). Most of the EML4–ALK and KRAS samples formed a distinct cluster, separate from the other samples, while most of the EGFR samples clustered separately from the WT and the other samples. However, samples with EGFR Exon 21 mutation clustered together with other genetic alterations or WT samples. Approximately one‐third of the samples clustered together regardless of their genetic alteration. This is because the highest numbers of proteins and phosphosites were identified in these samples. The cases with 5–10% tumor content (EGFR_12, KRAS_18, and KRAS_26) did not exhibit any distinct clustering behavior in the analyses, as they were part of the broader clusters formed by their respective genetic alterations. These samples did not have fewer identified phosphosites compared to those with higher tumor content, and the overall clustering was influenced more by the mutation status and the number of identified components rather than tumor content. Notably, one EGFR sample, EGFR_2 (with 10% tumor content), did not cluster with the other EGFR samples but rather with samples that had a higher number of identified phosphosites. Similarly, two EML4–ALK samples (with 816 and 1055 identified phosphosites) clustered apart from the others, which had lower phosphosite numbers (86, 87, 120, and 236).

Phosphoproteomic analysis of the EML4–ALK, EGFR, KRAS, and WT sample groups. (A) UMAP representation of the 82 samples based on the 384 phosphosites quantified using maxquant and considered for statistical analysis. Different colors mark the four sample groups (blue for EML4–ALK, yellow for EGFR, green for KRAS, and magenta for wild‐type). (B) Heatmap with hierarchical clustering of the 183 phosphosites significantly altered (false discovery rate < 0.05, fold change > 2) between the groups (log2 phosphosite intensity values are Z‐scored).

Note that two phosphorylation sites in EGFR (S946 and T648) have been quantified in the phosphoproteomic dataset; however, due to the huge number of missing values, they had been filtered out during the initial filtering. Additionally, the EGFR protein has also been quantified in the proteomic dataset. However, the protein itself did not show significant differences during the statistical tests.

To assess whether the phosphorylation or the protein expression was altered, we compared the fold‐change of each significant phosphosite to the fold‐change of the corresponding protein determined from the proteomic dataset separately for all six comparisons (Figs S9–S14). We observed a phosphorylation site (KRR1;S3) in the EGFR‐KRAS comparison where the change in phosphorylation was 2.0‐fold while the corresponding protein change was 1.7‐fold. Additionally, for the PML;S518 site in the EML4–ALK‐EGFR comparison, both the site and the protein exhibited approximately a 2‐fold increase, suggesting that in these cases, the change in phosphorylation was likely driven by an increase in protein abundance. However, in all the other cases, changes in protein abundance could be excluded as the primary cause of the change in the level of phosphorylation.

To uncover upstream kinases responsible for the changes in phosphorylation of the significantly altered phosphosites, potential kinase–substrate pairs and global relationships between kinases were identified using the phosr package (Figs S15 and S16). The combined score of substrates and the predicted kinase–substrate scores are presented in Table S6.

Out of the 183 phosphosites significantly altered in at least one of the six comparisons, 52 sites were specifically differentially phosphorylated in one sample group (Table 2). Altogether, 18, 12, 10, and 14 phosphosites were altered in all three comparisons of the EML4–ALK, EGFR, KRAS, and WT sample groups (Fig. 6). Additionally, 17 phosphosites showed distinct phosphorylation differences across multiple groups. For example, a higher level of phosphorylation of secretory carrier‐associated membrane protein 2 (SCAMP2) at S319 was found in EML4–ALK and KRAS groups compared to EGFR and WT groups.

Euler plot of the 52 potential alteration‐specific phosphosites. Phosphosites highlighted in black showed significantly decreased phosphorylation, while those in red showed significantly increased phosphorylation within the specific sample group. The phosphosites positioned at the bottom of the figure exhibit distinct phosphorylation patterns across multiple groups. A blue line indicates significant differences in phosphorylation between the connected groups, with increased phosphorylation occurring in the sample groups highlighted in red.

The phosphosites potentially specific to a given genetic alteration were derived from 44 phosphorylated proteins. None of these phosphoproteins showed significant alteration at the proteome level (FDR < 0.05 and fold‐change above 2). Serine/arginine repetitive matrix protein 1 (SRRM1), SRRM2, lamin‐B1, protein PML, and histone H1.3 had multiple phosphosites significantly altered in at least one sample group. Interestingly, the phosphorylation of SRRM2 at T1880 was found to be decreased in EGFR, while phosphorylation at S1101 was increased in EGFR, and phosphorylation at S2702 was decreased in KRAS compared to the other three sample groups.

A protein–protein interaction (PPI) network of the 44 phosphoproteins was constructed in STRING for GO terms. Functional enrichment analysis revealed that many of these proteins were most strongly associated with the following GOBP terms: cellular localization, nuclear migration, nucleus organization, regulation of RNA splicing, and RNA splicing (Fig. 7). The complete lists of enriched terms are presented in Table S7; the visualization of Gene Ontology Molecular Function (GOMF) and Gene Ontology Cellular Component (GOCC) terms is presented in Figs S17 and S18.

Visualization of functional enrichment analysis in string for Gene Ontology Biological Process (GOBP) terms of 44 phosphoproteins with potential alteration‐specific phosphosites. The size of the dots corresponds to the number of genes associated with the term, and the color represents the false discovery rate (FDR).

Kinase–substrate enrichment analysis

3.3

Kinase activities were inferred based on substrate phosphorylation levels by kinase–substrate enrichment analysis (KSEA) [36]. The activity of a given kinase is deduced by comparing the quantitative values of the identified substrates of the given kinase and the quantitative values of all substrates identified in the sample. Two‐group comparisons were performed between the four sample groups; altogether 202 kinase–substrate pairs were analyzed in the case of all six comparisons. KSEA revealed 81 kinases with at least one identified substrate (Table S8). Cyclin‐dependent kinase 1 (CDK1) had the largest number of substrates (m), 19. Altogether, 10 kinases showed significantly altered activities (m ≥ 3, P < 0.05) in at least one of the six comparisons (Fig. 8). The activity of cyclin‐dependent kinase 2 (CDK2) was significantly elevated in EML4–ALK samples compared to all other sample groups. The activity of casein kinase 2 alpha 1 (CSNK2A1) was significantly elevated in KRAS compared to the other groups. Additionally, the activity of mitogen‐activated protein kinase 1 (MAPK1) was elevated in both EML4–ALK and KRAS‐mutated samples compared to the WT, while the activities of mitogen‐activated protein kinase 7 (MAPK7) and mitogen‐activated protein kinase 11 (MAPK11) were suppressed in both the EGFR‐mutated and KRAS‐mutated samples relative to the WT samples.

Result of the kinase–substrate enrichment analysis (KSEA). Kinase scores based on KSEA of 202 kinase–substrate pairs (number of substrates ≥ 3). Blue bars for suppressed and red bars for elevated kinase activities (P < 0.05 based on z‐score transformation). Black bars for kinases without significantly altered activities.

Based on the results of KSEA and antibody availability, we performed immunohistochemistry (IHC) on serial tissue sections (2 μm thick) using the phospho‐p44/42 MAPK (Erk1/2) (Thr202/Tyr204) antibody to assess MAPK3/MAPK1 activity, the phospho‐CDK2 (Thr160) antibody for CDK2 activity, and the phospho‐p38 MAPK (Thr180/Tyr182) antibody for MAPK11 activity. Representative images of the stained tumor regions are presented in Figs 9 and 10 and Figs S19 and S20. Phospho‐p44/42 staining showed higher positivity in EML4–ALK‐rearranged and KRAS‐mutated samples. These results are consistent with our MS‐based phosphoproteomic findings, where MAPK1 activity was also higher in the EML4–ALK and KRAS samples compared to WT and in KRAS compared to EGFR. Additionally, MAPK3 activity was elevated in KRAS samples relative to both WT and EGFR, further supporting the observed staining patterns. Phospho‐p38 MAPK staining showed higher positivity in KRAS‐mutated samples, variable positivity among the samples within the EML4–ALK and WT sample groups, and negativity in the EGFR‐mutated samples. These results partly align with the MS analysis, where MAPK11 activity was suppressed in EGFR‐mutated samples compared to WT. However, elevated activity was observed in WT samples compared to KRAS‐mutated samples. In the case of phospho‐CDK2, we did not observe positive staining in most of the samples.

Immunohistochemistry staining of phospho‐p44/42 MAPK (Erk1/2) (Thr202/Tyr204) antibody to assess MAPK3/MAPK1 activity in LUAD tumors. Scale bar: 50 μm. Brown color represents positive staining.

Immunohistochemistry staining of phospho‐p38 MAPK (Thr180/Tyr182) antibody to assess MAPK11 activity in LUAD tumors. Scale bar: 50 μm. Brown color represents positive staining.

Discussion

4

In the present pilot study, lung adenocarcinomas harboring different genetic alterations were investigated by MS‐based proteomic and phosphoproteomic analysis. On‐surface digestion was performed on small, selected regions of FFPE tissue sections derived from patients with LUAD, followed by phosphopeptide enrichment. Differentially expressed proteins and their biological roles are discussed in detail and compared with existing literature. Phosphoproteins related to RNA splicing are also discussed. Finally, the results of the KSEA are interpreted.

Potential genetic alteration‐specific proteins in LUAD

4.1

Proteins significantly altered between samples with a specific genetic alteration can serve as potential markers for further development in diagnostic or therapeutic strategies. These proteins and their relationship with LUAD harboring genetic alterations are summarized in this section.

We identified lower levels of cathepsin L1 and elevated levels of 15‐hydroxyprostaglandin dehydrogenase (15‐PGDH) and stimulator of interferon genes protein (STING1) in EML4–ALK‐rearranged samples compared to the other three groups. Cathepsin L has a diverse role in cancer progression, promoting tumor invasion and metastatic activity. It has been identified as a downstream target of KRAS; moreover, the under‐expression of cathepsin L was detected in KRAS‐mutated, EGFR‐mutated, as well as in LUAD tumors without these mutations [50]. Additionally, the gene encoding cathepsin has been identified as a potential molecular marker for identifying KRAS mutations [51]. While the under‐expression of cathepsin L has been observed in KRAS and EGFR‐mutated tumors, we found it to be under‐expressed in EML4–ALK‐rearranged samples in our dataset, suggesting that its role may vary between different genetic alterations. This highlights the complexity of cathepsin L regulation in lung cancer, which requires further investigation to fully understand its contribution to tumor progression.

Although the elevated levels of 15‐PGDH and STING1 have not been previously associated with ALK alterations, both proteins have been implicated in other genetic mutations in lung cancer. 15‐PGDH was previously found to be under‐expressed in NSCLC tumors compared with matched normal samples; furthermore, inhibition of the EGFR pathway significantly elevated its expression levels [52]. Elevated expression of STING1 has been correlated with improved survival, early‐stage disease, and EGFR or KRAS mutations [53].

Moreover, our results showed that lysosome‐associated membrane glycoprotein 1 (LAMP1) was significantly under‐expressed in EGFR samples compared to the other genetic alterations or the WT samples. Previous studies have reported distinct subcellular localization patterns of LAMP1 depending on EGFR mutation status, with a more diffuse cytoplasmic distribution observed in EGFR‐mutant samples and a perinuclear localization in wild‐type cases [54]. Although this study did not directly compare expression levels, the altered localization was suggested as a potential diagnostic feature and a possible predictor of EGFR tyrosine kinase inhibitor response [55].

Additionally, nucleoside diphosphate kinase A was found to be under‐expressed in EGFR samples in the present study. However, although a direct link between the expressions of this protein and EGFR mutation has not been conclusively established, it has already been identified as a promising prognostic marker for NSCLC: Lower nucleoside diphosphate kinase A expression was strongly associated with increased risk of NSCLC, advanced TNM staging, poorer tumor differentiation, positive lymph node metastasis, lung adenocarcinoma, and a lower 5‐year overall survival rate in NSCLC patients [56].

Genetic alteration‐specific phosphorylation changes are enriched in RNA splicing‐related proteins

4.2

Functional enrichment analysis of 44 phosphoproteins with phosphosites potentially specific to a genetic alteration revealed an association with RNA splicing. Aberrant splicing is frequently observed in cancer, as changes in signaling affect the expression of regulatory splicing factors and the post‐translational modifications of these proteins [57]. Elevated spliceosome expression has been previously identified in human LUAD samples with EGFR mutations compared to non‐mutant EGFR samples [23]. In another study, where the effects of oncogenic KRAS on alternative splicing were investigated using human lung cell lines, alteration in phosphorylation of SRSF1, SRSF2, and SRSF7 proteins occurred in KRAS ^G12V^ and KRAS ^Q61H^ cells compared to KRAS wild‐type cells [58]. Our results showed that many SRSF phosphosites were altered. For example, a higher level of phosphorylation at SRSF1 S96 was observed in EML4–ALK and WT samples compared to KRAS‐mutated samples. SRSF1 S94 was identified as a potentially KRAS‐specific phosphosite, showing a lower level of phosphorylation. Additionally, SRRM2 is essential for organizing nuclear speckle formation, which is important for alternative splicing [59]. We identified several phosphosites of SRRM2 that showed potential alteration specificity, for example, S2694 for EML4–ALK, T1880, S1101 for EGFR, and S2702 for KRAS.

Kinase–substrate enrichment analysis

4.3

KSEA identified that the activities of 10 kinases were significantly altered between the four groups. Mitogen‐activated protein kinase 1 (MAPK1) showed suppressed activities in WT samples compared to samples with genetic alterations on EML4–ALK and KRAS and elevated activity in KRAS‐mutated samples compared to EGFR‐mutated samples. MAPK3 showed very similar activities compared to MAPK1, but it was not significantly altered in the comparison of EML4–ALK and WT. Additionally, the activities of MAPK7 and MAPK11 were suppressed in the WT samples compared to EGFR‐mutated and KRAS‐mutated samples. The MAPK pathway can be activated following the activation of RTKs, like EGFR [60]. Disturbance of the MAPK signaling in cancer is linked to uncontrolled cell proliferation and resistance to chemotherapy and targeted therapies [60]. Overexpression of the members of the MAPK pathway is known oncogenes in a large variety of tumors; they might be potential biomarkers for predicting the progression and prognosis of patients with NSCLC [61, 62]. Sousa et al. [22] found an increase in the predicted activities of kinases in the MAPK pathway for KRAS ^G12C^‐mutated samples.

In our results, cyclin‐dependent kinase 2 (CDK2) activity was elevated in EML4–ALK compared to the other sample groups. CDKs are serine/threonine kinases and major regulators of cell cycle progression, and dysregulation of CDK‐dependent pathways is often observed in NSCLC. Multiple CDK inhibitors, approved by the Food and Drug Administration (FDA), have garnered great interest in lung cancer, especially with the recent development of highly specific CDK4/6 inhibitors such as Palbociclib, Ribociclib, and Abemaciclib, which have shown promise in clinical trials for advanced metastatic breast cancer and are now being explored for lung cancer treatment [63]. Furthermore, CDK inhibition can bring improvements to the treatment of certain drug‐resistant EML4–ALK cells [64].

Limitations

4.4

Our exploratory study has considerable limitations. A relatively low number of samples was analyzed, and the starting sample amount was very low, imitating the size of small tissue biopsies. Tumor‐adjacent normal tissue areas were not investigated in the present cohort; only wild‐type, EML4–ALK‐rearranged, KRAS‐mutated, and EGFR‐mutated LUAD samples were compared. The validation of phosphosites potentially specific to a genetic alteration was not performed; however, immunohistochemical characterization of phospho‐p44/42 MAPK (Erk1/2) (Thr202/Tyr204), phospho‐CDK2 (Thr160), and phospho‐p38 MAPK (Thr180/Tyr182) was performed using parallel tissue sections. Additionally, phosphotyrosine sites were not enriched and hence were not detected in the analysis despite their importance in cell signaling. Moreover, since the cancerous cells were not sorted for analysis and the tissues are very heterogeneous, the identified differences can also be attributed to tumor heterogeneity.

Conclusions

5

In the present pilot study, we conducted an integrated proteomic and phosphoproteomic analysis on small regions of FFPE tissue sections to uncover biological processes and phosphorylation events specific to LUAD with and without EML4–ALK rearrangement, EGFR mutation, or KRAS mutation. Proteomic analysis revealed significant alterations in approximately one‐quarter of the investigated proteins, with EGFR‐mutated samples exhibiting a distinct protein expression pattern. Several proteins, such as cathepsin L and lysosome‐associated membrane glycoprotein 1, were identified as potentially specific to one of the genetic alterations. These proteins have previously been linked to EGFR or KRAS mutations and are known as prognostic markers for NSCLC and other cancers. Phosphoproteomic analysis uncovered significant differences in phosphosites among groups. Phosphoproteins with potential genetic alteration‐specific phosphosites were enriched in RNA splicing‐related processes. Kinase–substrate enrichment analysis indicated altered activities of 10 kinases, including MAPKs and CDKs. The results of this pilot study emphasize the need for further validation of the phosphosites and proteins potentially distinguishing the investigated LUAD types, as well as in‐depth investigations of these genetic alterations to enhance diagnostic and therapeutic strategies for LUAD.

Conflict of interest

The authors declare no conflict of interest.

Author contributions

FB and LT conceptualized and designed the study; ER, IL, TH, and JM collected the tissue samples and the patient data; FB and MB performed the sample preparation; GK and ZS performed the proteomic measurements; FB and SS analyzed the data; LV and IK performed the immunohistochemistry experiments; FB wrote the original draft; LT and LD provided funding, resources, and supervised the work. All authors were involved in correcting the initial draft and approving the manuscript draft.

Supporting information

Fig. S1. GSEA dot plot of significantly enriched GOBP terms in EML4–ALK‐EGFR comparison, where the x‐axis represents the Normalized Enrichment Score (NES). Fig. S2. GSEA dot plot of significantly enriched GOBP terms in EML4–ALK‐KRAS comparison, where the x‐axis represents the Normalized Enrichment Score (NES). Fig. S3. GSEA dot plot of significantly enriched GOBP terms in EML4–ALK‐WT comparison, where the x‐axis represents the Normalized Enrichment Score (NES). Fig. S4. GSEA dot plot of significantly enriched GOBP terms in EGFR‐KRAS comparison, where the x‐axis represents the Normalized Enrichment Score (NES). Fig. S5. GSEA dot plot of significantly enriched GOBP terms in KRAS‐WT comparison, where the x‐axis represents the Normalized Enrichment Score (NES). Fig. S6. GSEA plots of significantly enriched GOBP terms in EML4–ALK‐EGFR comparison. Fig. S7. GSEA plots of significantly enriched GOBP terms in EGFR‐KRAS comparison. Fig. S8. GSEA plots of significantly enriched GOBP terms in EGFR‐WT comparison. Fig. S9. Correlation of significantly altered phosphorylation sites and corresponding protein expression changes in the EML4–ALK‐EGFR comparison. Fig. S10. Correlation of significantly altered phosphorylation sites and corresponding protein expression changes in the EML4–ALK‐KRAS comparison. Fig. S11. Correlation of significantly altered phosphorylation sites and corresponding protein expression changes in the EML4–ALK‐WT comparison. Fig. S12. Correlation of significantly altered phosphorylation sites and corresponding protein expression changes in the EGFR‐KRAS comparison. Fig. S13. Correlation of significantly altered phosphorylation sites and corresponding protein expression changes in the EGFR‐WT comparison. Fig. S14. Correlation of significantly altered phosphorylation sites and corresponding protein expression changes in the KRAS‐WT comparison. Fig. S15. Prediction of kinase‐substrate interactions based on the 183 altered phosphosites. Fig. S16. Interaction network of the inferred kinases based on the 183 altered phosphosites. Fig. S17. Visualization of functional enrichment analysis in STRING for GOCC terms of 44 phosphoproteins with potential alteration‐specific phosphosites. Fig. S18. Visualization of functional enrichment analysis in STRING for GOMF terms of 44 phosphoproteins with potential alteration‐specific phosphosites. Fig. S19. Immunohistochemistry staining of phospho‐CDK2 (Thr160) antibody to assess CDK2 activity in EML4–ALK‐rearranged LUAD tumor. Fig. S20. Isotype control for immunohistochemistry analysis.

Table S1. Detailed information about the samples analyzed.

Table S2. Summary of the parameters used for all the software.

Table S3. List of 504 proteins along with the calculated P‐values and fold‐change values for the six different two‐group comparisons.

Table S4. The list of the enriched GOBP terms from the gene set enrichment analysis.

Table S5. List of 211 phosphosites along with the calculated P‐values and fold‐change values for the six different two‐group comparisons.

Table S6. The combined score of substrates and the predicted kinase‐substrate scores.

Table S7. The list of the enriched GO terms from the functional enrichment analysis of the 44 phosphoproteins with potential alteration‐specific phosphosites.

Table S8. Enriched kinases between the four sample groups based on KSEA of 202 kinase‐substrate pairs.

Bibliography65

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Siegel RL , Miller KD , Fuchs HE , Jemal A . Cancer statistics, 2021. CA Cancer J Clin. 2021;71:7–33. 10.3322/caac.21654 33433946 · doi ↗ · pubmed ↗
2Siegel RL , Miller KD , Wagle NS , Jemal A . Cancer statistics, 2023. CA Cancer J Clin. 2023;73:17–48. 10.3322/caac.21763 36633525 · doi ↗ · pubmed ↗
3Inamura K . Lung cancer: understanding its molecular pathology and the 2015 WHO classification. Front Oncol. 2017;7:193. 10.3389/fonc.2017.00193 28894699 PMC 5581350 · doi ↗ · pubmed ↗
4Kumar A , Kumar A . Non‐small‐cell lung cancer‐associated gene mutations and inhibitors. Adv Cancer Biol Metastasis. 2022;6:100076. 10.1016/j.adcanc.2022.100076 · doi ↗
5Passaro A , Attili I , Rappa A , Vacirca D , Ranghiero A , Fumagalli C , et al. Genomic characterization of concurrent alterations in non‐small cell lung cancer (NSCLC) harboring actionable mutations. Cancer. 2021;13:2172.10.3390/cancers 13092172 PMC 812417133946519 · doi ↗ · pubmed ↗
6Pikor LA , Ramnarine VR , Lam S , Lam WL . Genetic alterations defining NSCLC subtypes and their therapeutic implications. Lung Cancer. 2013;82:179–189. 10.1016/j.lungcan.2013.07.025 24011633 · doi ↗ · pubmed ↗
7Xiao Y , Liu P , Wei J , Zhang X , Guo J , Lin Y . Recent progress in targeted therapy for non‐small cell lung cancer. Front Pharmacol. 2023;14:1125547. 10.3389/fphar.2023.1125547 36909198 PMC 9994183 · doi ↗ · pubmed ↗
8Colombino M , Paliogiannis P , Cossu A , Santeufemia DA , Pazzola A , Fadda GM , et al. EGFR, KRAS, BRAF, ALK, and c MET genetic alterations in 1440 Sardinian patients with lung adenocarcinoma. BMC Pulm Med. 2019;19:209. 10.1186/s 12890-019-0964-x 31711449 PMC 6849322 · doi ↗ · pubmed ↗