Scientific Journals

News and Features

AAPS Meetings and Education

Products and Services

AAPS Member Services

AAPS Press Room

Marketing Opportunities

Affiliated Organizations

Join AAPS

View PDF Version

View Small Version

View Full Version

View Tables & Figures

Abstract

Introduction

Materials and Methods

Scientific Journals: AAPS PharmSci

Graul RC and Sadee W Evolutionary Relationships Among G Protein-Coupled Receptors Using a Clustered Database Approach AAPS PharmSci 2001; 3 (2) article 12 (https://www.pharmsci.org/scientificjournals/pharmsci/journal/01_12.html).

Evolutionary Relationships Among G Protein-Coupled Receptors Using a Clustered Database Approach

Submitted: October 10, 2000; Accepted: April 16, 2001; Published: May 4, 2001

Richard C. Graul¹ and Wolfgang Sadee²

¹Incyte Genomics, Palo Alto, CA 94304

²Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA 94143

Correspondence to:
Wolfgang Sadee
Telephone: 415-476-1947
Facsimile: 415-476-0464
E-mail: sadee@cgl.ucsf.edu

Keywords:
GPCR
Clustered Database
Peptides
Genetics

Abstract

Guanine nucleotide-binding protein-coupled receptors (GPCRs) comprise large and diverse gene families in fungi, plants, and the animal kingdom. GPCRs appear to share a common structure with 7 transmembrane segments, but sequence similarity is minimal among the most distant GPCRs. To reevaluate the question of evolutionary relationships among the disparate GPCR families, this study takes advantage of the dramatically increased number of cloned GPCRs. Sequences were selected from the National Center for Biotechnology Information (NCBI) nonredundant peptide database using iterative BLAST (Basic Local Alignment Search Tool) searches to yield a database of ~1700 GPCRs and unrelated membrane proteins as controls, divided into 34 distinct clusters. For each cluster, separate position-specific matrices were established to optimize sequence comparisons among GPCRs. This approach resulted in significant alignments between distant GPCR families, including receptors for the biogenic amine/peptide, VIP/secretin, cAMP, STE3/MAP3 fungal pheromones, latrophilin, developmental receptors frizzled and smoothened, as well as the more distant metabotrobic glutamate receptors, the STE2/MAM2 fungal pheromone receptors, and GPR1, a fungal glucose receptor. On the other hand, alignment scores between these recognized GPCR clades with p40 (putative GPCR) and pm1 (putative GPCR), as well as bacteriorhodopsins, failed to support a finding of homology. This study provides a refined view of GPCR ancestry and serves as a reference database with hyperlinks to other sources. Moreover, it may facilitate database annotation and the assignment of orphan receptors to GPCR families.

Introduction

Guanine nucleotide-binding protein-coupled receptors (GPCRs) represent a large class of membrane proteins with diverse functions. Also termed serpentine receptors, GPCRs are polytopic membrane proteins that share a common structure with 7 transmembrane segments (7-TMSs)¹ . These can be identified by hydropathy analysis and are predicted to be a-helical structures, usually consisting of 20 to 24 amino acids each. Online structural representations for the human µ opioid receptor, for example, exist as a two-dimensional (2D) schematic (https://www.gpcr.org/7tm/seq/vis/OPRM_HUMAN/OPRM_HUMAN.html ). For more online information about GPCRs, see the GPCR database, GPCRDB (https://www.gpcr.org/7tm/ .

The literature contains references to several distinct GPCR families, all sharing a 7-TMS topology. Kolakowski² and Horn et al³ have classified these GPCRs into several major families that share only minimal sequence similarity, as follows: family A: rhodopsin, olfactory, biogenic amine, peptide receptors; family B: vasoactive intestinal peptide (VIP), calcitonin, glucagon, secretin receptors; family C: metabotropic glutamate, Ca²⁺ -sensing, g-aminobutyric acid (GABA), apical vomeronasal receptors; family D: fungal pheromone P- and a-factor (STE2/MAM2); family E: fungal pheromone A- and M-factor (STE3/MAP3) receptors; family F: cyclic adenosine monophosphate (cAMP) receptors of Dictyostelium . In addition, a number of new putative GPCR families have been discovered with varying degrees of similarity to the established families, including frizzled, smoothened, basal vomeronasal receptors, and bride of sevenless (BOSS) of Drosophila and mammals, latrophilin, several plant GPCRs, another yeast GPCR, GPR1, as well as other mammalian sequences, p40 and pm1. Lastly, the complete sequencing of the Caenorhabditis elegans genome yielded ~1100 putative GPCR genes, including numerous chemoreceptors^4-9 .

GPCRs recognize a variety of ligands and stimuli (eg, light, ions, biogenic amines, nucleosides, lipids, amino acids, and peptides). Signal transduction is accomplished by coupling via guanine nucleotide-binding proteins (G proteins) to various secondary pathways involving ion channels, adenylyl cyclases, and phospholipases. Furthermore, GPCRs may also couple to other proteins - for example, those containing PDZ domains⁹ . We have recently shown that opioid receptors interact with calmodulin at the same domain required for G protein coupling¹⁰ . Hence, GPCRs may have many more protein signaling partners than we currently realize. Finally, experimental evidence has yet to be provided for many of the newly cloned sequences presumed to be GPCRs that they indeed couple via G proteins. Thus, the GPCR families are extremely diverse in function and primary structure, but adhere to a common topology of 7-TMSs. This feature and certain conserved motifs that do not pervade all families serve to classify a newly identified sequence as a possible GPCR.

Despite compelling similarity in their overall structures, the lack of statistically significant sequence similarity among several GPCR families raises the question whether all GPCRs arose through common ancestry. Thus, the VIP/secretin receptors and the metabotropic glutamate receptors are seemingly unrelated to other peptide and biogenic amine receptors. Furthermore, until recently no significant sequence identities had been established between mammalian GPCRs and the fungal pheromone receptors.

Several other families share the prevalent 7-TMS architecture, most notably bacteriorhodopsin, photoreceptors-proton pumps of archebacteria, for which direct molecular structure information is available. While the 7-TMS topology of the bacteriorhodopsins and GPCRs is similar¹ , the relative spatial arrangements of the TMSs differ in some details. Moreover, the bacteriorhodopsin ligand, retinal, is covalently attached to a Lys residue in the seventh TMS at a location identical to that for the retinal attachment site in rhodopsin, a true GPCR. Yet, any evolutionary relationship between bacteriorhodopsin and GPCRs is unproven because bacteriorhodopsin and GPCR share minimal sequence similarity¹ .

The seeming lack of reported significant sequence similarities - despite similar architecture and function - may be explained by 1) structural convergence of unrelated families, 2) genetic drift (divergence), 3) difficulties in analyzing polytopic membrane proteins, 4) difficulties in analyzing large protein families, and 5) limitations inherent in common sequence analysis methods.

Sequence analysis of polytopic membrane proteins poses particular challenges. The restricted amino acid composition of hydrophobic TMSs, the periodicity of amphipathic a-helices, and the overall serpentine structure combine to impede distinguishing homology from homoplasy^1, ^11-13 . Further, many polytopic membrane protein families are large, composed of hundreds, and even thousands of sequences, in the nonredundant peptide database at the National Center for Biotechnology Information (NCBI https://www.ncbi.nlm.nih.gov )¹⁴ . This makes it difficult to sift relevant alignments from the vast amount of data produced by a single Basic Local Alignment Search Tool (BLAST https://www.ncbi.nlm.nih.gov/BLAST/ ) run. Specifically, the border between sequence similarities caused by chance or convergence rather than true homology is blurred.

Another difficulty in evaluating alignments between polytopic membrane proteins arises from the different physical environments and constraints to which the TMS domains and loops/tails of GPCRs are subjected. The BLAST heuristic rapidly identifies database sequences similar to a given query sequence; however, a major simplification of BLAST is the use of a single substitution matrix irrespective of residue position in a sequence^14,15 . This could introduce errors in estimating the similarity among protein sequences on the basis of a single substitution frequency for each amino acid pair. Position Specific Iterated-BLAST (PSI-BLAST https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-2.html ) overcomes this limitation by using a substitution vector for each residue position of a query sequence¹⁶ . Therefore, we employed PSI-BLAST (in addition to BLAST) in the present study to evaluate sequence similarities between loop/tail and TMS on a residue-by-residue basis. A similar approach recently reported by Josefsson¹⁷ has yielded significant alignments between many GPCRs, suggesting only 3 distinct clades (superfamilies) - the huge rhodopsin-like family including many neurotransmitter and hormone receptors and 2 smaller clades of the BOSS-metabotropic glutamate receptors (MGR), and the STE2 fungal pheromone receptors. Our studies differ in some details from the approach taken by Josefsson¹⁷ but generally support Josefsson's finding of homology among the many distant GPCRs contained in the rhodopsin-like superfamily. Moreover, our study finds significant sequence similarity between the rhodopsin-like, BOSS-MGR-like, and STE2-like GPCRs, thus supporting the hypothesis that all GPCRs arose from a common ancestor.

To find distant relationships, we took advantage of the rapidly growing database of cloned sequences. If evolutionary relationships do exist, one would expect sequences to emerge that provide convincing alignments between distant families. In a first step, we used iterative BLAST (Iterative Neighborhood Cluster Analysis, INCA https://itsa.ucsf.edu/~gram/home/inca/ ) ¹⁸ analyses to extract ~1700 putative GPCR and other membrane protein sequences from the public databases and establish a separate database for our analyses (Table 1 ). Another key aspect of our approach is the use of clusters of highly related GPCRs to generate PSI-BLAST-constructed position-specific score matrices (PPSMs) for each cluster individually¹⁶ . This avoids generating problematic PPSMs that can arise from including spurious sequence alignments¹⁹ . Such alignment models are useful in detecting distant evolutionary relationships¹⁶ . Further, our study implements these PPSMs as a database using the IMPALA (Integrating Matrix Profiles And Local Alignments) software package²⁰ , also available from NCBI (ftp://ftp.ncbi.nlm.nih.gov/Home.html ), along with the standard BLAST distribution. Finally, our study compares the IMPALA results with those using the corresponding Pfam (https://pfam.wustl.edu/ ) models for HMMER (https://hmmer.wustl.edu/ ). Pfam is a large collection of hand-curated multiple sequence alignments and hidden Markov models covering many common protein domains. Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. Commonly used PSI-BLAST searches generate a PPSM that originates from a chosen seed sequence and therefore varies with each alignment model. Both IMPALA and HMMER allow the user to search a database of predetermined alignment models with a query sequence and thus to draw inferences regarding the query sequence's structure or function.

The results of this study confirm the heterogeneity of the GPCR families but also provide statistically significant alignments between distant families with questionable ancestry. The relevance of these alignments is further tested by asking whether equivalent domains of two sequences are aligned and what the key conserved residues are. We also demonstrate that we can build PPSMs in a semi-automated fashion that are comparable in sensitivity to the corresponding Pfam models. Lastly, this database of GPCRs serves as a starting point for addressing the evolution of GPCR structure and function in detail.

Materials and Methods

Experimental approach

We first established a separate GPCR database to develop appropriate alignment models for each of the GPCR and unrelated control membrane protein families. The database was established with the use of selected seed sequences and the iterative BLAST program, INCA (https://itsa.ucsf.edu/~gram/home/inca/ ). Using a single linkage clustering algorithm, INCA exhaustively retrieves all related sequences better than a selected E-value from the entire NCBI (https://www.ncbi.nlm.nih.gov/ ) nonredundant database. This yields nonoverlapping clusters, each containing closely related sequences. Subsequently, each cluster is further subdivided into subclusters with high similarity, again using a single linkage clustering algorithm, but with increased stringency. Both the clusters and subclusters are used to calculate PPSMs (checkpoint files, alignment models). This establishes a separate PPSM for each subfamily of closely related receptors. Each of these PPSMs is used to query the database using a single iteration of PSI-BLAST (https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-2.html ). This strictly avoids contamination of the substitution matrix by excluding the possibility that any unrelated sequences spuriously contribute to the matrix. For comparison to the PSI-BLAST results, we also used gapped BLAST (even though the BLOSUM62 substitution matrix is less appropriate for membrane proteins) to evaluate threshold alignments. Further, we developed a database of PPSMs using the IMPALA package and compared results for these PPSMs with the results from relevant hidden Markov (HMMER https://hmmer.wustl.edu/ )²¹ Pfam (https://pfam.wustl.edu/ ) models²² . Built from multiple alignments, profile HMMs capture position-specific information regarding amino acid frequencies, as well as those of insertions and deletions, whereas PPSMs are built from multiple pairwise alignments, discarding information regarding insertions or deletions with respect to a "master" query sequence. To establish benchmark criteria for significant alignments, we include in the database a number of clusters containing unrelated polytopic membrane proteins (ie, those containing multiple TMSs). Last, alignments among distinct clusters are further evaluated by viewing the location of the alignments and the presence of any conserved motives or signature residues. No attempt was made to remove the highly variable N- and C-termini; rather, alignments in these regions served as further criteria to establish homology or evolutionary paths of added protein modules.

This approach differs from that taken by Josefsson¹⁷ , who adopted the following process. First, seed GPCR sequences were truncated at their N- and C-termini. Second, selected seed sequences were used to scan the NCBI nonredundant databases directly using only PSI-BLAST, with one or more iterations, rather than performing PSI-BLAST on a defined GPCR database. Third, only suspected GPCR sequences were analyzed; this makes it difficult to assess threshold alignment values for establishing homology - arguably the most difficult step in the interpretation. Thus, our current results - obtained with yet more sequences that have become available since the Josefsson study¹⁷ - serve to further evaluate and refine the links between distant GPCR families.

Sequence database and cluster analysis

The database used in this study is a subset of the NCBI nonredundant peptide database consisting of approximately 460 000 sequences. Using INCA, we extracted GPCRs and unrelated control sequences by selecting seed sequences, thus establishing clusters of closely related proteins. Starting with 34 seed query sequences, INCA identified 1720 sequences in 34 distinct clusters in ~500 BLAST searches. INCA parameters were adjusted so that the resultant clusters do not merge (ie, a sequence meets the alignment criteria for only 1 cluster in the database searched). The seed sequences selected were true or putative GPCRs (28 sequences), bacteriorhodopsins (1 sequence), fungal opsins (1 sequence), and other polytopic membrane proteins (4 sequences), thought to be unrelated to GPCRs (er21, patc, pet1, and psn1). The resultant clusters are divided between experimentals and controls, denoted with an asterisk (Table 1 ).

In some instances, we selected a manageable number of representative sequences from each of the groups to be analyzed. Thus, sequences may be limited to certain databases or individual species to reduce the number analyzed. The 5h1a cluster was seeded using a human serotonin receptor²³ . To limit the size of this cluster, the INCA search was restricted to human sequences in the SwissProt database, resulting in 168 sequences. This cluster represents what may be considered the main family of GPCRs (rhodopsin, olfactory, biogenic amine, peptide, etc.). Because the C elegans genome is fully sequenced, we established several clusters of putative C elegans GPCRs. Thus, the 5ht cluster was created by starting with a serotonin receptor from nematode²⁴ and restricting the INCA search to C elegans in the NCBI nonredundant database, resulting in 64 sequences. In other cases we identified several clusters unique to nematode, without the need for restricting the INCA search to that organism - for example, mgr, a cluster of mgr-like proteins³⁵ ; odr and odr10, chemoreceptors of nematode³⁸ ; and sra1, srab6, srd1, sre1, srg1, and sro1, several serpentine receptors of nematode⁴⁷ (Table 1 ).

Selection of control sequences of unrelated proteins

A number of 7-TMS proteins appear to be unrelated to GPCRs, both by function and primary sequence, for example, the KDEL/HDEL family that serves as shuttle vector for protein translocation between the Golgi apparatus and the endoplasmic reticulum²⁸ . Furthermore, there is no evidence that the GPCRs may be related to other classes of polytopic membrane proteins with a distinct TMS number. To provide for a negative control against which our GPCR alignments can be compared, we selected the sequences of 4 distinct groups of polytopic membrane proteins thought to be unrelated to GPCRs (Table 1 ). These are KDEL/HDEL with 7-TMS (er21 cluster, 21 sequences), H+/dipeptide transporters with ~12 TMSs (pet1, 76 sequences), the regulatory membrane protein family patched (patc, 52 sequences; patched is a ~12-TMS membrane protein in the signaling pathway between sonic hedgehog and smoothened - the latter considered a true GPCR [30]), and the presenilins with ~9 TMSs (psn1, 43 sequences; presenilins are thought to be involved in the etiology of Alzheimer's disease [45]). The best score of any of these sequences aligned with any GPCR served as an arbitrary cutoff point for considering the possible significance of an alignment among GPCRs themselves. Bacteriorhodopsins served as a test case because of their structural and functional similarity to true GPCRs. Last, fungal opsins from S cerevisiae (YRO2) are included because of significant alignments with the bacteriorhodopsins^18, ⁵⁶ .

Database subclustering

Sequence comparisons of clusters were performed using the gapped BLAST program, blastpgp, version 2.1.2. Comparisons of sequences within the same cluster were used to compute subclusters using a single linkage algorithm. Sequentially, every sequence is used as a blastpgp query against a subset database containing every sequence in its own cluster. Using the "l" option, it was possible to restrict database searches to the subset specified by the list of gi codes in a given cluster. We used the "z" flag to establish the effective database size as 106. This allows for direct comparison of E-values resulting from different subsets of the database. This also permits sequences to be added to, or removed from, our analysis as needed without affecting our statistical measure. The actual total number of letters in the database was 800 367. We did not filter the query sequence for low complexity. Sequences with E-values better than 10-40 are merged into the same subcluster. This serves to establish a set of highly related sequences for developing individual PPSMs. Clusters have the suffix 0, while subclusters are numbered 1 to n (Table 2 ).

Alignments between protein sequences in different clusters

Each sequence in each cluster was compared with each sequence or model from all other clusters, using gapped BLAST, PSI-BLAST, IMPALA, and HMMER. For further evaluation, we considered only the best pairwise alignment between any 2 clusters. Alignments between clusters of membrane proteins thought to be unrelated to each other served as controls to establish a threshold of possible significance. Any alignment with an E-value better than those involving any of the controls was considered potentially significant within our database. The number of controls was not strictly matched with the number of experimentals; however, many of the experimentals have known evolutionary relationships and may thus serve as positive controls. We did not filter the query sequence for low complexity; however, we manually removed several low complexity, hydrophilic alignments.

Gapped BLAST analysis

Sequentially, every sequence in 1 cluster is used as a blastpgp query against a subset database containing every sequence in every other cluster. The highest scoring pairwise comparisons are tabulated in Table 3 . These can be viewed by clicking on the respective cell in the table.

PSI-BLAST analysis

For PSI-BLAST analysis, we first ran blastpgp as before. We then generated PPSMs for each of the clusters and subclusters. The optimal scoring sequence is selected from each cluster/subcluster, and PSI-BLAST is iteratively run against that cluster/subcluster subset database until convergence occurs. The resultant PPSM represents that cluster/subcluster. We used each PPSM as a blastpgp query against a subset database containing every sequence in every other cluster. The highest scoring pairwise comparisons are tabulated in Table 4 . Due to a limitation of the blastpgp program, only clusters and subclusters with more than one sequence are used to generate PPSMs.

IMPALA analysis

The PPSMs generated during the PSI-BLAST analysis were assembled into an IMPALA database. Sequentially, every sequence in one cluster is used as an IMPALA (version 1.1) query against a database containing the PPSMs derived from the other cluster. The highest scoring pairwise comparisons are tabulated in Table 5 . An advantage of IMPALA is that one may search a database of alignment models with a query sequence in a single search, whereas in PSI-BLAST, one would have to perform searches for each of the models. Also, IMPALA uses an improved scoring technique that considers the composition of the query sequence.

Pfam/HMMER analysis

As a PPSM is a simplified HMM, we wanted to see how our semi-automatically generated PPSMs would compare with the hand-curated counterparts contained in Pfam. We selected HMMER Pfam models representative of the clusters under analysis. Sequences from 1 cluster are used as an hmmsearch (version 2.1.1) query against each Pfam model. We set the database size to 2000 sequences. The highest scoring pairwise comparisons for standard Pfam models are tabulated in Table 6 .

TMS annotation

Selected alignments are annotated with TMSs to aid alignment evaluation. When available, the SwissProt annotation is used, accessible through the Entrez (https://www.ncbi.nlm.nih.gov/Entrez/ ) GenPept display at the NCBI. Otherwise, TMSs are predicted using the hidden Markov model topology prediction Web application, hmmtop (https://www.enzim.hu/hmmtop/ ) (default settings)⁵⁷ . Generally, we consider the SwissProt annotations to be more reliable, as they have been curated. Occasionally, hmmtop over- or underpredicts TMSs. Also, hmmtop identifies signal peptides as TMSs. Signal peptides may be present at the N-terminus, particularly when there is a large extracellular domain present. Finally, not all analyzed sequences are full length. Our dataset includes fragment proteins. We eliminated 24 putative GPCRs with more than 9 predicted TMSs from C elegans because we suspected that they may be chimeric proteins.

Error versus coverage plots

We compared the performance of BLAST, IMPALA, and HMMER/Pfam. We used the 5h1a and vipr clusters as well as the corresponding PPSMs and Pfam models to analyze our database sequences. We considered the 1285 sequences from the following 19 clusters as GPCRs, or true positives: 5h1a, 5ht, car1, friz, latr, oa1, odr, odr10, olf1, pe22, sra1, srb6, srd1, sre1, srg1, sro1, ste3, vipr, and vn1. We considered the 192 sequences from the following 4 clusters as non-GPCRs, or false positives: er21, patc, pet1, and psn1. Each of these 1477 sequences was used as a query against both the 5h1a and vipr clusters. We collected selected Expect values (E-values). For BLAST, the top sequence hit was used. For IMPALA, the top scoring PPSM hit was used. For HMMER/Pfam, a single profile HMM hit was used. Error per query sequence (ie, identification of a negative control alignment as a false positive) is a measure of selectivity. Coverage of true positives is a measure of sensitivity. We plotted error versus coverage⁵⁸ , shown in Figure 1 .

Evolutionary dendrograms

Data from BLAST (Table 3 ), PSI-BLAST (Table 4 ), and IMPALA (Table 5 ) were used to compute cluster dendrograms. For those clusters connected by significant E-values, we considered the correlation between E-value and evolutionary distance and computed a "distance" matrix from the E-values. This matrix was read into the Fitch program of the PHYLIP (https://evolution.genetics.washington.edu/phylip.html ) package⁵⁹ . We set all branches to uniform length, so as not to imply that the dendrograms represent actual evolutionary distance. We used Drawtree, also from the Phyllip package, to generate the graphical dendrograms, shown in Figure 2 .

Results

Details of the 34 seed sequences for iterative BLAST (INCA) are provided in Table 1 . Minimal and maximal E-values were selected to provide nonoverlapping sequence clusters of highly related sequences. These were further divided into subclusters as described (Table 2 ). The most abundant residues at each position for clusters and subclusters can be viewed by clicking on the respective cluster name. Subsequently, a gapped BLAST analysis was performed with each sequence in each cluster against all other sequences. While the default substitution matrix (BLOSUM62) may not be optimal for comparing membrane proteins, it achieves an estimate of relatedness among sequence clusters that can serve for further analysis and for comparison with the results obtained with the PSI-BLAST approach discussed below.

The results of the gapped BLAST analysis are listed in Table 3 , providing the scores of the best alignment between any sequences in 2 different clusters. Significant E-values were determined as follows. The best E-value involving a negative control provided a first approximation of significance, 8e-5, 5h1a vs. patc . The level of significance was made more stringent for inexplicable alignments involving noncorresponding TMSs among putative GPCRs. For the BLAST result, this alignment was 5h1a vs. friz and resulted in the value of 1e-5. Therefore, we considered BLAST alignments with E-values < 1e-5 as indicating probable homology (colored green) and those with E-values = 8e-5 as possible homology (colored yellow). (Note that in the NCBI nonredundant database, these terms would be defined differently due to the differing sizes of the datasets.) To view the actual alignment, click on the highlighted cells in Table 3 . Overall, the rather sparse distribution of significant alignments among GPCR families provides a striking picture of how diverse these sequences are, despite similar inferred architecture and function. These BLAST results do suggest a relationship not previously recognized, that the fungal pheromone receptor ste2 may be related to the mammalian mgr-like pheromone receptors (mgr1 vs. ste2 ).

In evaluating sequence alignments between potentially related GPCR proteins (highlighted in green if E-value < 1e-5), we also considered the location of the alignment in the 7-TMS structure. Thus, the alignments have been annotated to show the estimated location of the TMSs (click on the cells in Table 3 ). Alignments between corresponding TMSs (eg, TMS1 versus TMS1) provide additional support for a finding of homology, whereas alignments of noncorresponding TMSs would weaken the argument.

The clusters and subclusters then served to establish PPSMs for each cluster. Thus, the PPSM was strictly confined to a set of clearly related proteins, without possibility of introducing bias, because of the acquisition of questionable sequences into the substitution matrix. A single iteration of PSI-BLAST was performed using each PPSM to query a database of sequences, and the best scores between each cluster are listed in Table 4 . The results differ in some details with those obtained by gapped BLAST (Table 3 ), but they agree with each other overall.

For the PSI-BLAST result, the best E-value involving a negative control was 0.001, vipr vs. pet1.0 . The level of significance was made more stringent for inexplicable alignments involving noncorresponding TMSs among putative GPCRs. For the PSI-BLAST result, this alignment was l vipr vs. sre1.0 and resulted in the value of 2e-5. For PSI-BLAST, we considered alignments with E-values < 2e-5 as probable homology (colored green) and those with E-values = 0.001 as possible homology (colored yellow). To view the actual alignment, click on the highlighted cells in Table 4 .

Scores between E-values 10-3 and 10-4 obtained with gapped BLAST for several alignments between negative controls and GPCRs deteriorated to above 10-3 upon using PSI-BLAST (eg, er21 vs. 5ht, from 2e-4 to 0.022). This is consistent with the notion that these sequences are probably unrelated and that the use of the restricted PPSM is more discriminate than gapped BLAST in the same setting. The best score for an alignment between the negative controls and GPCRs (vipr vs. pet1, 0.001, glucagon receptor vs. oligopeptide transporter) can be analyzed further by comparing the reverse score for pet1 vs. vipr, which gave an E-value of only 0.94 (Table 4 ). This discrepancy arises because the PPSMs are based on different clusters. If these clusters were related, the E-values would not be expected to differ substantially, as the PPSMs would tend to converge. This result underscores the relevance of choosing an appropriate substitution matrix.

Table 4 contains numerous significant alignments that suggest common ancestry between certain clusters. This is further buttressed by the fact that all optimal alignments between clusters are among corresponding positions in the 7-TMS structure. For further evaluation of these alignments, we have provided hyperlinks in Table 2 to the conserved residues within the various clusters and subclusters, annotated as to their location in the 7-TMS structure. Using our database, one can thus address the question of whether alignments between sequences in different clusters occur in the most highly conserved regions.

Our results confirm those of Josefsson¹⁷ and extend the results of that study by providing additional links between gene families thought to be unrelated. Specifically, the VIP/secretin receptor family - and latrophilin receptors - yielded significant alignments with the cAMP receptors (car1 vs. vipr , car1 vs. latr ) and the main GPCR rhodopsin family A (vipr vs. 5h1a , latr vs. 5h1a ) (E-values 10-5 to 10-10). The frizzled proteins showed similarity to VIP (friz vs. vipr ), cAMP (car1 vs. friz ), and latrophilin receptors (friz vs. latr ). Similarly the ste3 fungal pheromone receptors scored well against VIP (ste3 vs. vipr ), cAMP (car1 vs. ste3 ), and 5-ht receptors (5h1a vs. ste3 , ste3 vs. 5ht ).

The metabotropic glutamate receptors, including GABA-B and raig receptors, and boss yielded good alignments with each other. The mgr1 group members possess large N-termini, apparently related to the periplasmic binding proteins of gram-negative bacteria⁶⁰ ). Even though the N-termini showed significant sequence similarities, further similarities extended into the 7-TMS structure. This suggests that these receptors are related not just because they share a common N-terminus, but also because of a homologous 7-TMS structure. In comparison to other GPCR families, 2 alignments between the mgr1 cluster and clusters in the rhodopsin-like superfamily reached values indicative of probable homology. The best E-values were attained in comparing mgr1 vs. odr (8e-7) and mgr1 vs. 5h1a (2e-5). The second alignment involves a pheromone receptor from the mgr1 cluster with 8 assigned TMSs. It appears likely that the first assigned TMS may be either a signal peptide or part of the large extracellular N-terminus typical of this receptor family; therefore, the alignments shown are between TMSs 1-7 and 2-8, respectively, considered to be corresponding TMSs. These alignments provide evidence of probable homology among the 2 large GPCR superfamilies, rhodopsin-like and mgr-like.

Although ste2 and mgr1 scored well in the BLAST analysis, they scored poorly in the PSI-BLAST analysis. Similarly, the ste2 pheromone receptors scored poorly against the rhodopsin-like receptors. The putative GPCR p40 also failed to show any sequence similarity to known GPCRs. A separate PSI-BLAST analysis of p40 on the NCBI nonredundant database revealed significant similarities to a cluster of sequences containing, for example, the nisin biosynthesis protein nisC. This protein is thought to be involved in the synthesis of the lantibiotic nisin⁶¹ . P40 was classified as a putative GPCR on the basis of its hydropathy profile and conserved cysteine residues⁴⁰ . As the nisC protein is not thought to be a GPCR, but rather is involved in the biosynthesis or translocation of nisin, this casts doubt on the notion that p40 is a GPCR. The putative GPCR pm1 also failed to show any sequence similarity to GPCRs (data not shown). A separate PSI-BLAST analysis of pm1 on the NCBI nonredundant database did not reveal significant similarities to any other sequences (data not shown).

Applying the principle of transitive closure to the PSI-BLAST results in Table 4 indicative of probable homology, we conclude that our analyses support common ancestry of the following clusters: Group 1: 5h1a, 5ht, car1, friz, latr, oa1, odr, odr10, olf1, pe22, sra1, srb6, srd1, sre1, srg1, sro1, ste3, vipr, vn1, and boss, gabab, mgr, mgr1, raig; Group 2: ste2; Group 3: gpr1; Group 4: bacr, yro2; Group 5: p40; Group 6: pm1.

Using IMPALA, we generated a database of PPSMs. Each sequence was compared to the PPSM database, and the best scores between each cluster are listed in Table 5 . Although IMPALA uses an improved alignment strategy relative to the currently available versions of PSI-BLAST²⁰ , the results generally agree with those obtained by PSI-BLAST (Table 4 ). For the IMPALA results, the best E-value involving a negative control was 0.016, mgr1 vs. patc.1 . The level of significance was made more stringent for inexplicable alignments involving noncorresponding TMSs among putative GPCRs. For the IMPALA results, this alignment was odr10 vs. 5ht.2 and resulted in the value of 0.006. Therefore, we considered IMPALA alignments with E-values < 0.006 as probable homology (colored green) and those with E-values = 0.016 as possible homology (colored yellow). These IMPALA results do suggest a relationship not previously recognized, that the yeast glucose receptor gpr1 may be related to the odorant receptor of C elegans (gpr1 vs. odr10 ).

For the Pfam/HMMER result, the best E-value involving a negative control was 0.044, psn1 vs. 7tm_3 , further lowering the threshold value for alignments among apparently unrelated sequences. The level of significance was made more stringent for inexplicable alignments involving noncorresponding TMSs. For the Pfam/HMMER result, this alignment was 5h1a vs. Bac_rhodopsin and resulted in the value of 0.031. For Pfam/HMMER, we considered alignments with E-values < 0.031 as probable homology (colored green) and those with E-values = 0.044 as possible homology (colored yellow). To view the actual alignment, click on the highlighted cells in Table 6 . A separate analysis using fragment Pfam models, allowing the query sequence to match only a part of the alignment model, revealed fewer findings of probable homology than those using the standard models (data not shown). The fact that most of the sequences under our analysis are full-length sequences may account for this. The 7tm_1 model corresponds with the clusters 5h1a, 5ht, olf1, and pe22. The 7tm_2 model corresponds with the latr and vipr clusters. The 7tm_3 model corresponds with the mgr1 and gabab cluster. The 7tm_4 model corresponds with the odr10 cluster. The 7tm_5 model corresponds with the odr cluster. The STE3 receptors remain unlinked with the rhodopsin-like GPCRs. Likewise, the STE2 receptor is not linked to other GPCRs. Finally, the Sra serpentine receptors of C elegans also appear as an isolated family.

Generally, we were able to demonstrate more distant relationships using our PPSMs with PSI-BLAST or IMPALA than using the standard Pfam models with HMMER. A possible explanation for this is that a single Pfam model may correspond to several clusters/subclusters that are represented by a set of PPSMs. In this way, our PPSMs may be able to capture more of the variation associated with a clade. A source of variability in the Pfam/HMMER analysis stems from the fact that the standard Pfam 5.4 release consists of both hmmls (global) and hmmfs (local) models. The hmmls models are the default for hmmbuild and specify a global alignment with respect to the model, but a (multiply) local alignment with respect to the sequence. The hmmfs models are built using the -f flag and specify a multi-hit local alignment. The PSI-BLAST and IMPALA PPSMs specify single-hit local alignments.

We evaluated the performance differences between the various methods. We considered all possible E-value cutoffs and plot error versus coverage for BLAST, IMPALA, and HMMER (Figure 1 ). Two distinct HMMER analyses were performed, those using models built with the default parameters, specifying a global alignment with respect to the model (hmmls) and those built with the -f flag specifying a local alignment with respect to the model (hmmfs). In some cases, our PPSMs outperformed the HMMER Pfam models (Figure 1A ), while in other cases, the HMMER Pfam models outperformed our PPSMs (Figure 1B ). Generally, BLAST was the most sensitive method over the range of E-values. The hmmfs (local) Pfam models performed worse than the hmmls (global) models, probably because allowing a sequence to match part of a model increases the background noise. Our results stand in contrast to those of Park et al⁶² , who found that profiles clearly outperform pairwise methods. Our improved BLAST results may be because we use only the top-hit to classify a sequence.

BLAST (Table 3 ), PSI-BLAST (Table 4 ), and IMPALA (Table 5 ) results were used to generate a cluster dendrogram on the basis of E-values. Shown in Figure 2 , the GPCRs analyzed in this study appear to cluster into 4 large branches. The first branch consists of the main family of GPCRs (rhodopsin, biogenic amine, peptide receptor, etc.). A second branch consists of vipr-like GPCRs. The third branch consists of mgr-like receptors. The fourth branch appears to be a radiation confined to nematode. The ste2, ste3, and gpr1 clusters from yeast may form the roots of the dendrogram, connecting all the branches, but a finding of direct homology among the ste2, ste3, and gpr1 clades remains to be verified.

Conclusion

In understanding the origins of GPCRs, we may examine their occurrence in yeast, the simplest organism in which they are known to occur. The currently known yeast GPCRs consist of 2 pheromone receptors (ste2 and ste3) and a glucose receptor (gpr1) that show no significant similarities with one another. The ste3 receptor shows significant sequence similarity to the main family of GPCRs (rhodopsin, biogenic amine, peptide receptor, etc.). According to our BLAST analysis, the ste2 receptor shows low sequence similarity to the mgr-like receptors in higher organisms. A separate PSI-BLAST analysis of ste2 (STE2_SACKL) on the NCBI nonredundant database also finds links to many mgr-like mammalian pheromone receptors, thus reinforcing the theory that ste2 is related to the mgr-like GPCRs (data not shown). Finally, the gpr1 receptor appears to share some sequence similarity with the odorant receptors of nematode, according to our PSI-BLAST and IMPALA analyses.

As yeast expresses 2 disparate pheromone receptors, the mammalian vomeronasal organ similarly expresses 2 disparate pheromone GPCR families, apical (mgr1-like) and basal (vn1) receptors. Both ste3 and vn1 are rhodopsin-like, whereas the ste2 family and apical vomeronasal receptors appear mgr-like. Thus it appears possible that GPCRs are composed of 2 superfamilies, present in eukaryotes, arising from divergent yeast pheromone receptors. Perhaps as the number of available ste2 and gpr1 sequences grows (currently only 4 and 2 sequences, respectively), it will be possible to demonstrate homology between the yeast receptors, ste2, ste3, and gpr1 with a greater degree of confidence.

Our analyses support the hypothesis of a common origin for the many disparate families of GPCRs. The GPCRs analyzed in this study fall into 2 broad superfamilies: 1) rhodopsin-like: 5h1a, 5ht, car1, friz, latr, oa1, odr, odr10, olf1, pe22, sra1, srb6, srd1, sre1, srg1, sro1, ste3, vipr, and vn1; and 2) mgr-like: boss, gabab, mgr, mgr1, raig, and ste2. Our analyses provide evidence for a finding of probable or possible homology between these 2 broad superfamilies. The yeast glucose receptor, gpr1, appears distinct from these 2 superfamilies, although it does show low but significant similarity to 1 cluster of the rhodopsin-like superfamily. Its exact relationship to other GPCRs is unclear. It is possible that it represents a primordial GPCR or that it evolved from either ste2 or ste3 or that it represents an entirely separate branch of GPCRs. The common ancestry of bacr and yro2 has already been addressed¹⁸ . Our analyses do not support the hypothesis that bacteriorhodopsins and GPCRs share a common ancestor. Lastly, the putative GPCRs p40 and pm1 show no similarity to any of the sequences analyzed and, in fact, may not represent true GPCRs at all. The prediction that p40 is not a GPCR is supported by a recent report that p40 serves as a peripheral membrane protein related to the lantibiotic synthetase component C, rather than a GPCR⁶³ .

These results confirm and extend those of Josefsson¹⁷ . Our studies support Josefsson's finding of a large superfamily (rhodopsin-like) consisting of the following families: family A: rhodopsin, olfactory, biogenic amine, peptide receptors; family B: VIP, calcitonin, glucagon, secretin receptors; family E: fungal pheromone A- and M-factor (STE3/MAP3) receptors; family F: cAMP receptors; Arabidopsis thaliana receptors; frizzled and smoothened receptors; basal vomeronasal receptors; and ocular albinism receptors. Our studies also support Josefsson's finding of a smaller second superfamily (mgr-like) consisting of the following families: family C: metabotropic glutamate, Ca2+-sensing, GABA, apical vomeronasal receptors; and BOSS. In contrast to the results of Josefsson, our studies suggest that STE2 (family D) is not a distinct third superfamily, but is rather a distant member of the superfamily of MGR-like GPCRs. Our studies also examined known or putative GPCR families in addition to those in the Josefsson study. The retinoic acid induced gene (raig) belongs to the MGR-like superfamily.The fungal glucose receptor, gpr1, a known GPCR, represents a distinct superfamily of GPCRs. Our results predict that the putative GPCRs p40 and pm1 are not actual GPCRs. Moreover, our study finds significant sequence similarity between the rhodopsin-like, MGR-like, and gpr1 GPCRs, thus supporting the hypothesis that all GPCRs arose from a common ancestor.

The BLAST and PSI-BLAST/IMPALA methods complement each other, as we were able to demonstrate relationships with one where we could not with the other. Overall, BLAST demonstrated better performance over a range of E-values than did the profile-based methods. This result is in contrast to that of Park et al⁶² , who found that profile methods performed better. This difference may be accounted for by the fact that Park et al considered all BLAST hits, while we considered only the top BLAST hit for any query. We found that BLAST performance decreased substantially by considering other than the top-hit (data not shown). While it seems counterintuitive that BLAST outperforms profile-based methods, an explanation may be as follows. Profile-based methods (as well as the BLAST all-hits method) tend to capture the average, while the BLAST top-hit method is able to capture the outlier. Thus, divergence and chance work to the advantage of top-hit methods. These results demonstrate the necessity of using both profile- and pairwise-based methods to avoid false conclusions about evolutionary relationships.

In this study, we constructed a database of PPSMs. The performance of our PPSMs was comparable with the corresponding Pfam models when analyzing distant relationships. On the other hand, our library of PSI-BLAST profiles can also be used in database annotation, for example, to assign an orphan receptor by highest sequence similarity to one of the many families of GPCRs. This is a particularly important issue, as the completion of the sequencing of the human genome will generate an abundance of orphan receptors without known functions or ligands. Within a narrowly defined family of GPCRs, we expect to find common sequence motifs that are reflected in our scoring matrices. Orphan receptors with similar functions would be expected to share these motifs, which would be weighted more heavily in assigning an orphan receptor to a subcluster. Therefore, sequence alignments of orphan GPCRs with the use of our clustered database and cluster-specific PPSMs could help identify the relevant subfamilies with characteristic conserved residues and point the way to potential ligands and physiological functions. The approach taken in this study is particularly useful in analyzing large gene families and distant evolutionary relationships.

Acknowledgements

We acknowledge Incyte Genomics, Palo Alto, CA, for the use of their computers. This study was supported in part by grant GM43102 from the National Institutes of Health.

Glossary

BLAST - basic local alignment search tool

Entrez - sequence server

GPCR - G protein-coupled receptor

HMM - hidden Markov model

HMMER - HMM software package

IMPALA - integrating matrix profiles and local alignments; profile software package

INCA - iterative neighborhood cluster analysis; iterated BLAST

NCBI - National Center for Biotechnology Information

PDZ - protein interaction domain

Pfam - protein family; hand-curated HMMER models

PPSM - PSI-BLAST-constructed position-specific score matrix

PSSM - position-specific score matrix

PSI-BLAST - position specific iterated BLAST

SwissProt - annotated protein sequence database

TMS - transmembrane segment

References

1. Riek RP, Handschumacher MD, Sung SS, et al. Evolutionary conservation of both the hydrophilic and hydrophobic nature of transmembrane residues. J Theor Biol. 1995;172(3):245-258. [PUBMED]

2. Kolakowski LF Jr. GPCRDb: a G-protein-coupled receptor database. Receptors Channels. 1994;2:1-7. [PUBMED]

3. Horn F, Weare J, Beukers MW, et al. GPCRDB: an information system for G protein coupled receptors. Nucleic Acids Res. 1998;26:275-279. [PUBMED]

4. Bargmann CI. Olfactory receptors, vomeronasal receptors, and the organization of olfactory information. Cell. 1997;90(4):585-587. [PUBMED]

5. Slusarski DC, Corces VG, Moon RT. Interaction of Wnt and a frizzled homologue triggers G-protein-linked phosphatidylinositol signalling. Nature. 1997;390(6658):410-413. [PUBMED]

6. Barnes MR, Duckworth DM, Beeley LJ. Frizzled proteins constitute a novel family of G protein-coupled receptors, most closely related to the secretin family. Trends Pharmacol Sci. 1998;19(10):399-400. [PUBMED]

7. Robertson HM. Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss. Genome Res. 1998;8(5):449-463. [PUBMED]

8. Sugita S, Ichtchenko K, Khvotchev M, Sudhof TC. Alpha-latrotoxin receptor CIRL/latrophilin 1 (CL1) defines an unusual family of ubiquitous G-protein-linked receptors: G-protein coupling not required for triggering exocytosis. J Biol Chem. 1998;273(49):32715-32724. [PUBMED]

9. Bockaert J, Pin JP. Molecular tinkering of G protein-coupled receptors: an evolutionary success. EMBO J. 1998;18(7):1723-1729.

10. Wang D, Sadee W, Quillan JM. Calmodulin binding to G protein-coupling domain of opioid receptors. J Biol Chem. 1999;274:22081-22088. [PUBMED]

11. Rees DC, DeAntonio L, Eisenberg D. Hydrophobic organization of membrane proteins. Science. 1989;245(4917):510-513. [PUBMED]

12. Persson B, Argos P. Prediction of transmembrane segments in proteins utilising multiple sequence alignments. J Mol Biol. 1994;237(2):182-192. [PUBMED]

13. Persson B, Argos P. Prediction of membrane protein topology utilizing multiple sequence alignments. J Protein Chem. 1997;16(5):453-457. [PUBMED]

14. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool.J Mol Biol. 1990;215(3):403-410. [PUBMED]

15. Madden TL, Tatusov RL, Zhang J. Applications of network BLAST server. Methods Enzymol. 1996;266:131-141. [PUBMED]

16. Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389-3402. [PUBMED]

17. Josefsson LG. Evidence for kinship between diverse G-protein coupled receptors. Gene. 1999;239(2):333-340. [PUBMED]

18. Graul RC, Sadée W. Evolutionary relationships among proteins probed by an iterative neighborhood cluster analysis (INCA): alignment of bacteriorhodopsins with the yeast sequence YRO2. Pharm Res. 1997;14(11):1533-1541. [PUBMED]

19. Eddy SR. Multiple alignments and sequence searches. Trends Guide to Bioinformatics. Elsevier Science, Trends Supplement: 15-18.

20. Schaffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF. IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics. 1999;15(12):1000-1011. [PUBMED]

21. Durbin R, Eddy S, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press; 1998.

22. Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer EL. The Pfam protein families database. Nucleic Acids Res. 2000;28(1):263-266. [PUBMED]

23. Kobilka BK, Frielle T, Collins S, et al. An intronless gene encoding a potential member of the family of receptors coupled to guanine nucleotide regulatory proteins. Nature. 1987;329(6134):75-79. [PUBMED]

24. Olde B, McCombie WR. Molecular cloning and functional expression of a serotonin receptor from Caenorhabditis elegans. J Mol Neurosci. 1997;8(1):53-62. [PUBMED]

25. Dunn RJ, Hackett NR, Huang KS, et al. Studies on the light-transducing pigment bacteriorhodopsin. Cold Spring Harb Symp Quant Biol. 1983;48(Pt 2):853-862.

26. Hart AC, Kramer H, Van Vactor DLJ, Paidhungat M, Zipursky SL. Induction of cell fate in the Drosophila retina: the bride of sevenless protein is predicted to contain a large extracellular domain and seven transmembrane segments. Genes Dev. 1990;4(11):1835-1847. [PUBMED]

27. Klein PS, Sun TJ, Saxe CL 3rd, Kimmel AR, Johnson RL, Devreotes PN. A chemoattractant receptor controls development in Dictyostelium discoideum. Science. 1988;241(4872):1467-1472. [PUBMED]

28. Lewis MJ, Pelham HR. A human homologue of the yeast HDEL receptor. Nature. 1990;348(6297):162-163. [PUBMED]

29. Vinson CR, Conover S, Adler PN. A Drosophila tissue polarity locus encodes a protein containing seven potential transmembrane domains. Nature. 1989;338(6212):263-264. [PUBMED]

30. Alcedo J, Ayzenzon M, Von Ohlen T, Noll M, Hooper JE. The Drosophila smoothened gene encodes a seven-pass membrane protein, a putative receptor for the hedgehog signal. Cell. 1996;86(2):221-232. [PUBMED]

31. Clark JA, Mezey E, Lam AS, Bonner TI. Distribution of the GABAB receptor subunit gb2 in rat CNS. Brain Res. 2000;860(1-2):41-52. [PUBMED]

32. Yun CW, Tamaki H, Nakayama R, Yamamoto K, Kumagai H. G-protein coupled receptor from yeast Saccharomyces cerevisiae. Biochem Biophys Res Commun. 1997;240(2):287-292. [PUBMED]

33. Kraakman L, Lemaire K, Ma P, et al. A Saccharomyces cerevisiae G-protein coupled receptor, Gpr1, is specifically required for glucose activation of the cAMP pathway during the transition to growth on glucose. Mol Microbiol. 1999;32(5):1002-1012. [PUBMED]

34. White GR, Varley JM, Heighway J. Isolation and characterization of a human homologue of the latrophilin gene from a region of 1p31.1 implicated in breast cancer. Oncogene. 1998;17(26):3513-3519. [PUBMED]

35. Abe T, Tanemoto M, Nishio T, Hebert SC (unpublished). Metabotropic glutamate-like sequence in C. elegans.

36. Desai MA, Burnett JP, Mayne NG, Schoepp DD. Cloning and expression of a human metabotropic glutamate receptor 1 alpha: enhanced coupling on co-transfection with a glutamate transporter. Mol Pharmacol. 1995;48(4):648-657. [PUBMED]

37. Bassi MT, Schiaffino MV, Renieri A, et al. Cloning of the gene for ocular albinism type 1 from the distal short arm of the X chromosome. Nature Gen. 1995;10(1):13-19. [PUBMED]

38. Sengupta P, Chou JH, Bargmann CI. odr-10 encodes a seven transmembrane domain olfactory receptor required for responses to the odorant diacetyl. Cell. 1996;84(6):899-909.

39. Buck L, Axel R. A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell. 1991;65(1):175-187. [PUBMED]

40. Mayer H, Salzer U, Breuss J, Ziegler S, Marchler-Bauer A, Prohaska R. Isolation, molecular characterization, and tissue-specific expression of a novel putative G protein-coupled receptor. Biochim Biophys Acta. 1998;1395(3):301-308. [PUBMED]

41. Murphy PM, Malech HL. Nucleotide sequence of a cDNA encoding a protein with primary structural similarity to G-protein coupled receptors. Nucleic Acids Res. 1990;18(7):1896. [PUBMED]

42. Hooper JE, Scott MP. The Drosophila patched gene encodes a putative membrane protein required for segmental patterning. Cell. 1989;59(4):751-765. [PUBMED]

43. Regan JW, Bailey TJ, Pepperl DJ, et al. Cloning of a novel human prostaglandin receptor with characteristics of the pharmacologically defined EP2 subtype. Mol Pharmacol. 1994;46(2):213-220. [PUBMED]

44. Liang R, Fei YJ, Prasad PD, et al. Human intestinal H+/peptide cotransporter: cloning, functional expression, and chromosomal localization. J Biol Chem. 1995;270(12):6456-6463. [PUBMED]

45. Sherrington R, Rogaev EI, Liang Y, et al. Cloning of a gene bearing missense mutations in early-onset familial Alzheimer's disease. Nature. 1995;375(6534):754-760. [PUBMED]

46. Cheng Y, Lotan R. Molecular cloning and characterization of a novel retinoic acid-inducible gene that encodes a putative G protein-coupled receptor. J Biol Chem. 1998;273(52):35008-35015. [PUBMED]

47. Troemel ER, Chou JH, Dwyer ND, Colbert HA, Bargmann CI. Divergent seven transmembrane receptors are candidate chemosensory receptors in C. elegans. Cell. 1995;83(2): 207-218. [PUBMED]

48. Burkholder AC, Hartwell LH. The yeast alpha-factor receptor: structural properties deduced from the sequence of the STE2 gene. Nucleic Acids Res. 1985;13(23):8463-8475. [PUBMED]

49. Kitamura K, Shimoda C. The Schizosaccharomyces pombe MAM2 gene encodes a putative pheromone receptor which has a significant homology with the Saccharomyces cerevisiae STE2 protein. EMBO J. 1991;10(12):3743-3751.

50. Hagen DC, McCaffrey G, Sprague GF Jr. Evidence the yeast STE3 gene encodes a receptor for the peptide pheromone a factor: gene sequence and implications for the structure of the presumed receptor. Proc Natl Acad Sci U S A. 1986;83(5):1418-1422. [PUBMED]

51. Tanaka K, Davey J, Imai Y, Yamamoto M. Schizosaccharomyces pombe map3+ encodes the putative M-factor receptor.Mol Cell Biol. 1993;13(1):80-88. [PUBMED]

52. Sreedharan SP, Patel DR, Huang JX, Goetzl EJ. Cloning and functional expression of a human neuroendocrine vasoactive intestinal peptide receptor. Biochem Biophys Res Commun. 1993;193(2):546-553. [PUBMED]

53. Dulac C, Axel R. A novel family of genes encoding putative pheromone receptors in mammals. Cell. 1995; 83(2):195-206. [PUBMED]

54. Feldmann H, Aigle M, Aljinovic G, et al. Complete DNA sequence of yeast chromosome II. EMBO J. 1994;13(24):5795-5809.

55. Aljinovic G, Pohl TM. Sequence and analysis of 24 kb on chromosome II of Saccharomyces cerevisiae. Yeast. 1995;11(5):475-479. [PUBMED]

56. Bieszke JA, Braun EL, Bean LE, Kang S, Natvig DO, Borkovich KA. The nop-1 gene of Neurospora crassa encodes a seven transmembrane helix retinal-binding protein homologous to archaeal rhodopsins. Proc Natl Acad Sci U S A. 1999;96(14):8034-8039. [PUBMED]

57. Tusnady GE, Simon I. Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol. 1998;283(2):489-506. [PUBMED]

58. Brenner SE, Chothia C, Hubbard TJP. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Natl Acad Sci U S A. 1998;95:6073-6078. [PUBMED]

59. Retief JD. Phylogenetic analysis using PHYLIP. Methods Mol Biol. 2000;132:243-258. [PUBMED]

60. Felder CB, Graul RC, Lee AY, Merkle H-P, Sadee W. The venus flytrap of periplasmic binding proteins: an ancient protein module present in multiple drug receptors. AAPS PharmSci. 1999;1(2) Available from: https://www.aapspharmaceutica.com/scientificjournals/pharmsci/journal/venus/index.html

61. Engelke G, Gutowski-Eckel Z, Hammelmann M, Entian KD. Biosynthesis of the lantibiotic nisin: genomic organization and membrane localization of the NisB protein. Appl Environ Microbiol. 1992;58(11):3730-3743. [PUBMED]

62. Park J, Karplus K, Barrett C, et al. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol. 1998;284:1201-1210. [PUBMED]

63. Bauer H, Mayer H, Marchler-Bauer A, Salzer U, Prohaska R. Characterization of p40/GPR69A as a peripheral membrane protein related to the lantibiotic synthetase component C. Biochem Biophys Res Commun. 2000;275(1):69-74. [PUBMED]