Co-expression modules construction by WGCNA and identify potential hub genes and regulation pathways of postpartum depression

Department of Pharmacy, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, 430030 Wuhan, Hubei, China, Department of Hematology, Wuhan Children’s Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University and Technology, 430015 Wuhan, Hubei, China, Institute of Maternal and Child Health, Wuhan Children’s Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University and Technology, 430015 Wuhan, Hubei, China, Department of Radiation and Medical Oncology, Zhongnan Hospital, Wuhan University, 430071 Wuhan, Hubei, China


Abstract
Purpose: The purpose of our present study was to, for the first time, identify key genes associated with postpartum depression (PPD) and discovery the potential molecular mechanisms of this condition. Methods: First, microarray expression profiles GSE45603 dataset were acquired from the Gene Expression Omnibus (GEO) in National Center for Biotechnology Information (NCBI). The weighted gene co-expression network analysis (WGCNA) was performed to identify the top three modules from differentially expressed genes (DEGs).
Furthermore, cross-validated differential gene expression analysis of the top three modules and DEGs was used to identify the hub genes. Gene set enrichment analysis (GSEA) was conducted to identify the potential functions of the hub genes. We conducted a Receiver Operator Characteristic (ROC) curve to verify the diagnostic efficiencies of the hub genes. Lastly, GSE44132 dataset was used to search the association between the methylation profiles of the hub genes and susceptibility to PPD. Results: Altogether, 8979 genes were identified as DEGs for WGCNA analysis. The turquoise, yellow, and green functional modules were the most significant modules related to PPD development after WGCNA analysis. The enrichment analysis results of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway demonstrated that hub genes in the three modules were mainly enriched in the neurotrophin signaling pathway, chemokine signaling pathway, Fcγ receptor-mediated phagocytosis, and Mitogen-activated protein kinase (MAPK) signaling pathway. Eight genes (HNRNPA2B1, IL10, RAD51, UBA52, NHP2, RPL13A, FBL, SPI1) were identified as "real" hub genes from cross-validation data of the three modules and DEGs, and possessed diagnostic value in PPD. The GSEA suggested that "OLFACTORY_TRANSDUCTION", "BUTANOATE_METABOLISM", "MELANOMA", "AMINOACYL_TRNA_BIOSYNTHESIS", and "LY-SINE_DEGRADATION" were all crucial in the development of PPD. Highly significant differentially methylated

Introduction
Postpartum depression (PPD) is one of the most prominent mood disorders affecting 15% of parturient women [1]. Its clinical symptoms typically occur within two weeks after delivery, with patients reporting persistent and severe moodiness, including negative emotions (e.g., anxiety, sadness, hopelessness and/or worthlessness), low energy, and social withdrawal [2]. PPD may result in both short-term and long-term negative effects on neonatal and infant development, cognition, emotion, and behavior. Previous studies have shown that maternal depression is a risk factor for higher rates of premature and low-birth-weight babies, infant malnutrition and stunting, and infant diarrhoeal, which can lead to mothers' reluctance to care for their children and impair normal mother-child relationships. And it's becoming a major public health problem [3].
PPD is the result of the combined influence of multiple factors. Previous studies have reported hypotheses regarding the physiopathologic mechanisms of PPD, such as the abnormal changes of hormones, neurotransmitters, inflammatory factors [4], among others. The molecular pathways implicated in the PPD was involved in genetic inheritance [5], HPA axis [6], serotonin transporter [7], and GABA receptor [8]. However, there is no exact evidence to support any of the above conclusions, and the physiopathologic mechanisms of PPD remain largely obscure.
Despite the severity of PPD on adverse infant and maternal outcomes, early diagnosis would help establish mechanisms for the prevention and early cure of PPD. Therefore, it is necessary to identify reliable diagnostic biomarkers of PPD. In practical work, it is difficult to acquire brain tissue for clinical testing. Peripheral blood mononuclear cells (PBMCs) share mRNA expression patterns in brain tissues [9]. In both PBMCs and the prefrontal cortex, the expression levels of many genes and biological processes are similar [10]. On the one hand, mounting evidence has indicated that gene expression changes in PBMCs may intervene molecular alterations in the region of brain [11]. On the other hand, gene expression of peripheral lymphocytes may be affected by the central nervous system (CNS) via neurotransmitters, cytokines, and hormones [12]. Therefore, PBMCs can serve as an effective tissue to search the gene expression in PPD.
With the development of high-throughput expression microarray technology, gene expression profiling has become a powerful tool for in-depth study of the pathogenesis and diagnostic biomarkers of mental disorders at the genomic level [13]. For example, microarrays have been used to detect differences in gene expression characteristics between healthy controls and PPD patients [14]. Many microarray-based studies have focused only on screening for differentially expressed genes (DEGs) without identifying their links. Genes with similar expression patterns may be functionally related. Weighted gene co-expression Network analysis (WGCNA) algorithm can divide genes into different modules according to the similarity of gene co-expression in multiple samples. This produces a set of genes that function similarly, allowing selection of diseaserelated biomarkers and pathways. In the present study, we, for the first time, used WGCNA to construct a coexpression network of genes and determined key modules and hub genes associated with PPD.
significant DEGs were visualized (p < 0.05). The function of KEGG was to address genomes and biological pathways related to diseases (p < 0.01). To improve the accuracy of the analysis and identify the potential function of the hub gene, GSEA analysis was carried out (GSEA version 4.0.3, http://software.broadinstitute.org/gsea/index.jsp). pvalue < 0.05 was used as truncation standard. The top 15 pathways of yellow module gene in KEGG were selected.

PPI network and module analysis
The results from the STRING (version 11.5, https: //www.string-db.org/) [17] were analyzed and structured using Cytoscape software (Cytoscape version 3.9.0, https: //cytoscape.org/). The MCODE plugin was employed to identify modules of the PPI network. Two modules were selected (score >8). We found there were 8979 DEGs in GSE45603 after analyse by GEO2R. Then, we set several "strict" threshold (log 2 FC <-0.5 and log 2 FC >0.5 were used as a cutoff, with the p value < 0.05) for identify "real" DEGs which were meet our requirements. 673 "real" DEGs were imported into the STRING to obtain their interaction, which were imported into Cytoscape software. CytoHubba was used to calculate gene value [18], was used to screen hub genes based on the degree algorithm, and the top 30 genes were defined as hub genes. We entered the 30 hub genes into NetworkAnalyst, a visual analytic platform for exhibition of the PPI networks [19]. In PPI network, genes with combination score ≥0.9 and connectivity ≥10 were also defined as hub genes. The top 30 hub genes with the highest ranking was presented in Table 1.

Construction of weighted gene co-expression networks
There were 8979 DEGs after analysis by GEO2R. WGCNA (Version 1.6.0, https://git.bioconductor.org/packa ges/GmicR) was performed using 8979 DEGs [20]. The adjacency matrix was transformed into topological overlap matrix (TOM) by one-step method, and the minimum value was 30. Hierarchical clustering was used to generate hierarchical clustering tree [21,22]. The co-expressed gene modules were generated by hierarchical clustering tree with different colors, and the module structure was displayed by topological overlapping matrix.

Identification of clinically significant modules and hub genes in key modules
Correlations between modular characteristic genes (MEs) and clinical features were calculated to identify relevant modules. Gene significance (GS) was measured as the log10 transformation of the p-value (GS = lg p) of the linear regression between gene expression and clinical information. Modular significance (MS) was the average GS of all genes in a module. Overall, of all the selected modules, the module with the absolute first rank in MS was considered to be relevant for clinical features. In our study, the genes in key modules were imported into Cytoscape and screened  out according to degree. Key modules and two genes were overlapped using Venn diagrams.

Distributions of Hub Genes and Methylation Analysis
The distribution of all DEGs in GSE45603 was identified, and the functional similarity between proteins was evaluated by GOSemSim package (version 2.18.1, The raw data were analyzed using "minfi" package (version 1.38.0, http://www.bioconductor.org/packages/release/bi oc/html/minfi.html) in R and were converted into a score, referred to as a beta value.

Screening of diagnostic biomarkers
Receiver operating characteristic (ROC) analysis combines sensitivity and specificity to comprehensively evaluate diagnostic accuracy or discriminant effect. The "pROC" package (version 1.68.1, http://www.bioconduct or.org/packages/release/bioc/html/ROC.html) [24] was applied to evaluate the diagnostic value of hub genes and to screen out the diagnostic biomarkers of PPD.

Statistical Analysis
All statistical analysis were measured as mean ± standard deviation. R software (version 3.5.2, https://www. r-project.org/) was utilized to measure the data.

Identification of DEGs
There was a total of 21 PBMCs samples in this study, which included 5 healthy control and 16 PPD samples. There were 8979 DEGs. Then, the cutoff criteria were adjusted p value < 0.05, log FC >0.5, or log FC <-0.5. A total of 673 DEGs, including 163 upregulated genes and 510 downregulated genes, were explored after analyzing GSE45603. The expression level is displayed in Fig. 1A,B.

GO Function and KEGG pathway enrichment analysis
GO function annotation and KEGG pathway enrichment analysis were conducted to obtain more comprehensive knowledge of the selected DEGs. As for BP, the upregulated DEGs were mainly implicated in the regulation of translation, rRNA processing, regulation of immune response, and SRP-dependent cotranslational protein targeting the membrane. Mitochondrion, T cell receptor complex, ribosome, mitochondrial inner membrane, and cytosolic large ribosomal subunits were involved in the CC. Changes in MF were mainly enriched in the structural constituents of ribosomes, methyltransferase activity, poly (A) RNA binding, protein binding, and mRNA binding (Supplementary Fig. 2A). The downregulated DEGs were mainly responsible for reciprocal meiotic recombination, antigen processing, and presentation of exogenous peptide antigen via MHC class I, TAP-independent, positive regulation of long-term synaptic potentiation, positive regulation of macroautophagy, and negative regulation of gene expression in BP. Cytoplasmic, cytosolic, phagocytic vesicle membrane, NADPH oxidase complex, and focal adhesion in CC. Protein binding, four-way junction DNA binding, superoxide-generating NADPH oxidase activator activity, superoxide-generating NADPH oxidase activity, and enzyme binding in MF (Supplementary Fig. 2B).   The enrichment analysis results of KEGG pathway demonstrated that upregulated DEGs were mainly involved in ribosome enrichment, biosynthesis of amino acids, oxidative phosphorylation, ribosome biogenesis in eukaryotes, and hematopoietic cell lineage (Supplementary Fig. 2C). The downregulated DEGs were mainly enriched in phagosomes, osteoclast differentiation, antigen processing and presentation, glucagon signaling pathway, and Forkhead box O3 (FoxO) signaling pathway (Supplementary Fig.  2D).

Module Screening from the PPI network
The PPI network consisted of 571 nodes and 2213 edges, and module A (score = 17.263, including 20 nodes and 164 edges; Supplementary Fig. 3A), and B (score = 8.5, including 37 nodes and 135 edges; Supplementary Fig. 3B) with score >8 was identified from the PPI network. To establish DEGs from interactive PPI networks, these genes were imported into the STRING tool, the data from STRING was imported into Cytoscape (Supplementary Fig. 4A), and 30 hub genes were identified from the PPI network by cytoHubba (Supplementary Fig. 4B) and visualized by NetworkAnalyzer (Supplementary Fig. 4C).

Weighted co-expression network construction and key modules selection
Firstly, we checked the data quality in GSE45603. All samples were taken for subsequent analysis (Supplementary Fig. 5). DEGs containing 8979 genes was analyzed using WGCNA, and modules containing highly related genes were identified. Based on the approximate scale-free topology criterion, soft threshold power was 14 (scale-free R2 = 0.86) was optimized (Supplementary Fig. 6A and Supplementary Fig. 6B). There were 12 modules ( Fig. 2A), 1921 genes in turquoise module, 704 genes in yellow module, and 447 genes in green module. The 2736 genes that could not be included in any module were placed into the gray module and identified as non-co-expressed.

Correlation between modules and identification of key modules
The 400 genes were selected to draw a network heat map (Fig. 2B). Our results so far showed that the turquoise, yellow, green, and blue modules were independently verified and showed a high degree of independence between modules and the relative independence of the genes expressed in each module. The 11 modules were divided into two main clusters: one consists of two modules (turquoise, black), while the other consists of nine modules (magenta, brown, pink, blue, greenish-yellow, green, red, purple and yellow modules; Fig. 2C). Finally, heat maps were drawn according to the connectivity of characteristic genes to visualize the results (Fig. 2D).

Identification and distribution of hub genes
The turquoise, yellow, and green modules were visualized by Cytoscape, and the top 30 genes were screened out by sorting node degree candidate genes for further analysis ( Fig. 3A-C). Cross-validation of the data from these three modules and DEGs revealed 3 genes (HNRNPA2B1, IL10, and RAD51) in both DEGs and the green module, 4 genes (UBA52, NHP2, RPL13A, FBL) in both DEGs  Fig. 7A). These 8 genes were observed based on the average functional similarity (Supplementary Fig. 7B).

Expression of the hub genes after methylation analysis
Using GSE44132 dataset, we investigated the relationship between methylation levels of 8 genes and PPD susceptibility, and explored the possible mechanism of PPD occurrence. Data from 36 whole blood samples (11 PPD samples and 25 healthy controls) passed quality control indicators and were analyzed. In the GSE45603 dataset, HN-RNPA2B1 was down-regulated, and RPL13A and UBA52 were significantly up-regulated in the PPD group, compared with the control group (Fig. 5A). It's worth noting that, highly differentiated methylation sites were found in HNRNPA2B1, RPL13A, and UBA52 genes: cg19062098 (p-value = 0.036), cg18208268 (p-value = 0.039), and cg25699533 (p = 0.012) (Fig. 5B).

GSEA analysis of hub genes
Because HNRNPA2B1, RPL13A, and UBA52 had highly significantly differentiated methylated positions, we selected the three genes for follow-up studies. To identify the potential functions of these hub genes, GSEA was conducted to identify enriched biological processes in the samples. Geneset enrichment analysis (GSEA) for gene sets related to HNRNPA2B1, RPL13A and UBA52 expressions. Our results found that, compared with healthy control, HNRNPA2B1 was down-regulated and RPL13A and UBA52 were up-regulated in the PPD samples, respectively (Fig. 5A), thus we considered that lowly expression of HNRNPA2B1 and highly expressions of RPL13A and UBA52 were involved in which pathways. Results suggested that HNRNPA2B1 was lowly expressed in the gene set of "OLFACTORY_TRANSDUCTION", "BUTANOATE_METABOLISM", and "MELANOMA" (Fig. 6A). RPL13A and UBA52 were highly expressed in the gene sets of "AMINOACYL_TRNA_BIOSYNTHESIS", "LYSINE_DEGRADATION" and "BU-TANOATE_METABOLISM" (Fig. 6B,C).

Discussion
PPD is a major public health concern, affecting 10 to 15 percent of mothers worldwide [25]. Investigations are being performed to develop a non-invasive and quantitative clinical test to support PPD diagnosis, nevertheless, no specific or sensitive biomarkers in whole blood are currently available for the diagnosis of PPD [26,27]. Thus, we performed several bioinformatic methods to identify the exact pathological mechanisms and effective diagnostic biomarkers of PPD. In the first part of our study, we determined the hub genes and pathways associated with PPD within whole blood samples. A total of strictly screened 673 DEGs (|LogFC| >0.5 was used as a cutoff, with the p value < 0.05 regarded as statistically significant, 163 upregulated genes and 510 downregulated genes) and 30 hub genes were explored from the GSE45603 dataset. Next, we applied a new approach, WGCNA, to validate gene co-expression modules related to the molecular mechanisms underlying PPD. GEO2R identified 8979 genes which were analyzed to construct a co-expression network and generate a highly correlated co-expression gene group. There were three highly correlated modules (turquoise, yellow, and green modules) obtained from the WGCNA analysis. We regarded the top three hub genes (HnRNPA2B1, FBL, and SPI1), both in the PPI network and co-expression network, as "real" hub genes, which may function as important regulators in the pathogenesis of PPD.
Recently, it has been widely accepted that endocrine [28], immune response [29], and genetic and epigenetic factors [30], are all involved in the pathogenesis of PPD. Our study found that genes in the three modules identified by WGCNA (turquoise, green, and yellow modules) were mainly enriched in innate immune response, signal transduction, intracellular signal transduction, inflammatory response, and viral processes. The green module was enriched in the regulation of transcription (DNAtemplated), gene expression, response to ionizing radiation, and RNA processing.
According to the cross-validation of data from the three modules and DEGs, eight genes acted as high degree genes. Among these eight genes, we were interested in the HnRNPA2B1, Fibrillarin (FBL), and Spi-1 protooncogene (SPI1). HnRNPA2B1, an RNA-binding protein and one member of the heterogeneous nuclear ribonucleoprotein family, initiates and amplifies the innate immune response. HnRNPA2B1 dimerization is required for nucleocytoplasmic translocation and initiation of IFN-α/β expression [31]. Furthermore, hnRNPA2B1 is also involved in the pathogenesis of several devastating neurodegenerative diseases. Mutations in prion-like domains of hnRNPA2B1 play a causal role in multisystem proteinopathy (MSP) and may be involved in the pathogenesis of amyotrophic lateral sclerosis (ALS) [32]. Fibrillarin (FBL), a molecular marker of transcriptionally active RNA polymerase I, catalyzes the 2'-O-methylation of ribosomal RNAs. FBL plays an important role in ribosome biogenesis, particularly in the methylation of ribosomal RNAs and rDNA histones [33,34]. Recent work has indicated that FBL may be an important component of ribosome size, life duration in multicellular organisms, and ribosomal RNA production, all of which are correlated with age in healthy humans [35,36]. Spi-1 proto-oncogene (SPI1), the E26 transformation-specific (ETS) family transcription factor PU.1, serves as a master regulator of myeloid and lymphoid development and is primarily expressed in monocytes/macrophages, neutrophils, mast cells, B cells, and early erythroblasts [37,38]. The function of SPI1 is closely related to microglial viability and phagocytic capability. The targets of SPI1 demonstrate a significant relationship with the pathways defined as "Endocytosis", "Fcγ receptor-mediated phagocytosis", "Chemokine signaling pathway", and "MAPK signaling pathway" [38,39]. UBA52, a ubiquitin-ribosomal fusion gene and located in chromosome 19, is a major source of ubiquitin protein for covalent modification of proteinaceous substrates recycled by ubiquitin-proteasome system (UPS) [40]. UBA 52 encodes a fusion protein comprising ubiquitin at the N-terminus and RPL40 at the C-terminus [41]. Upon translation, ubiquitin and RPL40 are immediately cleaved from the translated product [42]. RAD51 has the function of discovering and invading homologous DNA sequences, and can be repaired accurately and timely [43]. NHP2 gene is a member of the H/ACA snoRNPs (small nucleolar ribonucleoproteins) gene family. snoRNPs are involved in various aspects of rRNA processing and modification [44]. RPL13A encodes a member of the ribosomal protein L13P family and is a component of the 60S subunit. The protein encoded as a component of the IFN-γ activated translational inhibitor (gait) complex also plays a role in inhibiting inflammatory genes [45]. However, the functions of RAD51, NHP2, and RPL13A in PPD did not study in detail. Interestingly, the pathway enrichment analysis indicated that genes in the three modules were enriched in "neurotrophin signaling pathway" and "chemokine signaling pathway", which partly coincided with SPI1 targets. The neurotrophin signaling pathway includes several neurotrophic factors, largely a family of secreted proteins that benefit in the growth, survival, and differentiation of neurons. Due to its beneficial and therapeutic effects on the neuronal physiology, researchers have intensively studied neurotrophic factors for decades [43]. Recently, neurotrophic factors have been considered as having significant links with the pathophysiology of neurological and neuropsychiatric disorders. Emerging evidence has demonstrated that brain-derived neurotrophic factor (BDNF), a well-known neurotrophic factor, is involved in PPD and its treatments. Clinical studies have observed a link between PPD symptoms and single nucleotide polymorphisms (SNPs) in BDNF [44]. Postpartum female mice induced by chronic unpredictable stress exhibited low expression of BDNF mRNA in the mPFC region [45]. In the serum of patients with PPD at admission and during development, BDNF levels were lower than healthy individuals. Fluoxetine, a classic treatment for women with PPD, may mediate its antidepressant effect in PPD by upregulating BDNF expression [46]. Second, to protect the fetus against pathogens and receive the semi-allogenic markers as the fetus, the immune system of the mother would go through a tremendous adaptation during pregnancy [47]. Inflammatory cytokine levels may obviously increase in the periphery during cervical ripening [48]. Furthermore, since the mother's immune system recovers to the previous nonpregnant state after delivery, subsequently, the change in inflammatory cytokine levels may affect mood [49]. Brewster et al. [50] reported that there was a close relationship between chemokines and PPD, which is manifested by increased levels of pro-inflammatory cytokines, such as IL-18 and C-X-C motif chemokine 1 (CXCL1). PPD was positively correlated with multiple genes in the immune response of an immobilization stress-induced mouse PPD model. The neurotrophin signaling pathway and chemokine signaling pathway may be deeply involved in the mechanisms of PPD; however, it remains a major challenge that required continued exploration.
There were two previous studies to find biomarkers of PPD based on GSE45603 dataset. Lauren et al. [51] reported that the objective of this study was to independently replicate their previously published prediction model of postpartum depression and women without a history of psychiatric disorders, and to further investigate the DNA methylation status of postpartum depression biomarkers associated with changes in pregnancy hormone levels and the timing of major hormonal changes. Maria et al. [52] focused on whether abnormal patterns of circulating levels of cytokines and chemokines may offer a suitable biomarker for disease development and/or therapeutic response. The objective of the present study and the previous two study were both searching for biomarkers of PPD. However, there were several differences among these studies. Our present study only observed the postpartum mental state and healthy controls. Lauren et al. collected different gestation process samples and used their own model to identify biomarker, their key point was changes in hormone levels. The emphasis of Maria' study was chemokines levels. Our present study did not classify genes, we put all DEGs into WGCNA and obtained hub genes, and the functions of genes were involved in various aspects of cell.
There were several limitations to our present study. First, in order to comprehensively determine dysfunctions related to PPD, both brain tissue and blood samples would need to be integrated. However, our present study did not perform an analysis of brain tissue. Second, only one microarray was included in the computerized analysis, and likely leading to one-sided results and a high falsepositive rate. Third, our present study only performed data mining and data analysis and did not conduct any experiments to validate the results. Finally, there were only 5 healthy control samples and 16 PPD samples selected for analysis; thus, it is necessary to increase the sample size to validate the accuracy of the results in a follow-up study.

Conclusions
Based on our knowledge, we for the first time used the system biology-based WGCNA method to predict several potential biological pathways and diagnostic biomarkers involved in PPD. WGCNA and co-expression network analysis identified key biological processes and signaling pathways, especially the neurotrophin signaling pathway, chemokine signaling pathway, Fcγ receptor-mediated phagocytosis, and the MAPK signaling pathway, which may contribute to the elucidation of the pathogenesis and progression of PPD. The potential diagnostic biomarkers included HNRNPA2B1, IL10, RAD51, UBA52, NHP2, RPL13A, and FBL. Finally, in-depth molecular biological experiments are required to determine the exact functions of the biological pathways and diagnostic biomarkers of PPD.

Author contributions
GW designed this study. GW and XH provided administrative support. DZ and CW collected the data, writed the manuscript. LJ, DA, YY, TJ, and YC analyzed the data. All the authors approved the final version of manuscript.

Ethics approval and consent to participate
Not applicable.

Acknowledgment
Not applicable.

Funding
This study was supported by grants from the Wuhan Clinical Medical Research Project (WX20C15).

Conflict of interest
The authors declare no conflict of interest.