Bioinformatics Search Computing Demo
The latest live demonstration prototype of Bio Search Computing (Bio-SeCo) is freely available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/
Web search tools have become ubiquitous, with both generic and domain-specific search services providing users with rapid and selective access to data from potentially huge repositories. However, individual search tools are often ineffective for use in applications in which the answer to a request involves combining results from more than one search engine. In particular, Web search services typically seek individual documents that meet the criteria specified in a request, whereas in practice information relevant to a requirement may be spread over several resources.
Search computing provides a platform for expressing requests over multiple search services, such that the results of the integrated requests take account of the rankings of individual search results.
In the Life Sciences, many resources provide vertical search capabilities, in that they are focused on a single topic. Several Life Science services provide ranked data as results, where the ranking may reflect a property of an algorithm (e.g. a similarity score) or of an experimental result (e.g. an expression level). Furthermore, it is often essential to combine multiple vertical search services to create multi-topic searches, where the different topic searches either refine or augment previous results.
This demo explores the application of a search computing platform to a bioinformatics use case. The aim is to identify the extent to which the existing platform for multi-topic search provides useful facilities for representing and integrating bioinformatics search services, and for supporting complex biomedical question answering and biomedical knowledge discovery.
In the Life Sciences, numerous questions can be addressed only by comprehensively searching different topic data that are inherently ordered, or are associated with ranked confidence values. By using available Web services for searching bioinformatics data and taking advantage of the attributes they define for providing a ranking, search computing techniques can be applied to efficiently search for globally ranked answers to such complex questions.
This online prototype shows a case study of the use of a domain-independent search computing platform for describing well-known bioinformatics resources as search services, and for carrying out integrated analyses over such services. In particular, it makes explicit how ranked data from sequence comparisons and from gene expression results can be integrated in a way that takes account of their order. In so doing, we illustrate the use of ranking as a first-class citizen for data integration in the Life Sciences, and identify open issues for further investigation.
The query that will be answered is the following: "Which genes encode proteins, in different organisms, with high sequence similarity to a given protein X and are up/down co-regulated in the same given biological tissue/condition Y?". This multi-topic case study question can be decomposed into the following three single topic sub-queries:
- "Which proteins, in different organisms, have high sequence similarity to a given protein X?";
- "Which genes encode which proteins?";
- "Which genes are up/down co-regulated in the same given biological tissue/condition Y?".
Each of these sub-queries can be mapped to an available search service:
- NCBI-Blast (http://blast.ncbi.nlm.nih.gov/) or WU-BLAST (http://www.ebi.ac.uk/Tools/sss/wublast/), two implementations of BLAST, a well known sequence similarity search algorithm;
- a query service on our Genomic and Proteomic Data Warehouse (GPDW) (http://www.bioinformatics.deib.polimi.it/GPKB/), an integrative data warehouse of genomic and proteomic information;
- a search engine over Array Express Gene Expression Atlas (http://www.ebi.ac.uk/gxa/), a repository of gene expression data.
The user submits the ID of a protein (e.g. "O75462" or "P26367") and the type of that ID (e.g. "uniprot"), the type of differential gene expression regulation of interest (e.g. "up in" for up-regulated, "down in" for down-regulated, or "up/down in" for both), and the biological tissue/condition in which the gene expression regulation is evaluated (e.g. "brain" as biological tissue, or "carcinoma" or "tumor" as biological condition). In response, the system provides the list of proteins similar in sequence to the given protein, their similarity score (expectation), the list of genes that encode these proteins, and their up- or down-regulation in the given biological tissue/condition, together with its significance (p-value) based on the experimental gene expression data available in the Array Express Gene Expression Atlas.
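As a rough illustration of how the results of the three sub-queries might be combined, the Python sketch below joins toy result lists in memory. It does not call the real services. The identifiers (PROT_A, GENE_A, and so on), the function name combine, and the scoring rule (the sum of -log10 of the expectation value and -log10 of the p-value) are assumptions made for this example only, not the ranking actually used by Bio-SeCo.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    protein: str
    gene: str
    evalue: float      # sequence similarity expectation (lower is better)
    regulation: str    # "up" or "down"
    pvalue: float      # expression significance (lower is better)
    score: float       # combined score (higher is better, assumed formula)


def combine(blast_hits, protein_to_gene, expression, regulation="up/down"):
    """Join the three sub-query results and rank the combined answers.

    blast_hits:      list of (protein_id, evalue) pairs
    protein_to_gene: dict protein_id -> gene_id
    expression:      dict gene_id -> list of (direction, pvalue) pairs
    regulation:      "up", "down", or "up/down" for both
    """
    answers = []
    for protein, evalue in blast_hits:
        gene = protein_to_gene.get(protein)
        if gene is None:
            continue  # no coding gene known for this protein
        for direction, pvalue in expression.get(gene, []):
            if regulation != "up/down" and direction != regulation:
                continue
            # Assumed scoring rule; the floor avoids log10(0).
            score = (-math.log10(max(evalue, 1e-300))
                     - math.log10(max(pvalue, 1e-300)))
            answers.append(
                Answer(protein, gene, evalue, direction, pvalue, score))
    return sorted(answers, key=lambda a: a.score, reverse=True)


# Hypothetical toy data standing in for the three service results.
blast_hits = [("PROT_A", 1e-50), ("PROT_B", 1e-10), ("PROT_C", 1e-5)]
protein_to_gene = {"PROT_A": "GENE_A", "PROT_B": "GENE_B"}
expression = {
    "GENE_A": [("up", 1e-3)],
    "GENE_B": [("up", 1e-8), ("down", 1e-2)],
}

for answer in combine(blast_hits, protein_to_gene, expression, "up"):
    print(answer.protein, answer.gene, answer.regulation, round(answer.score, 1))
```

In this toy run, PROT_C is dropped because it has no coding gene in the mapping, and the remaining answers are ordered by the combined score rather than by either input ranking alone.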
The demonstration video of the first Bio Search Computing (Bio-SeCo) prototype, implemented to support answering complex multi-topic bioinformatics queries of the above type, is available below.
Explorative Search
Complex multi-topic questions, such as the one in the use case discussed above, are typical in the Life Sciences, where it is also often interesting to explore the available data that lead to the result of the complex query created to answer the multi-topic question. This can be done by constructing the complex query step by step, in an explorative way, starting from an initial sub-query and then expanding/refining it with subsequent sub-queries. To support such an explorative approach, we constructed a second Bio-SeCo prototype, which is publicly usable at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/
In order to illustrate the advanced features of this Bio-SeCo prototype, let us consider again the case study question "Which genes encode proteins, in different organisms, with high sequence similarity to a given protein X and have up/down co-regulated expression in the same given biological tissue/condition Y?"
Using Bio-SeCo, a user can first input the UniProt ID of a protein X (e.g. the UniProt ID P26367 of the human Paired box protein Pax-6 isoform a protein) and run a sequence alignment search, by using the NCBI Blast (or WU-BLAST) service registered in Bio-SeCo, to look for amino acid sequences similar to the protein X in a user selected protein database (e.g. UniProtKB/Swiss-Prot). Then, he/she can select the most similar proteins found (or some of them, e.g. only those of some selected organisms) and automatically retrieve the coding gene of each of them by using the GPDW Protein coding Gene query service, which has been registered in Bio-SeCo as well. Next, the user can search for biomedical features shared among the retrieved genes. For instance, by using the Array Express service registered in Bio-SeCo, he/she can explore if some of such genes have a significantly up/down co-regulated expression in the same biological tissue or condition Y (e.g. in tumor). At this point, after observing the obtained search results, the user can also decide to expand and refine further his/her search. For example, he/she can decide to explore if any of the genes found also have in common the known involvement in a biological function Z (e.g. the involvement in the regulation of apoptotic process). This can be done by using the GPDW Gene Biological Function Feature annotation service registered in Bio-SeCo.
This is just one example of how Bio-SeCo can be used to perform multi-topic explorative searches. The user can start the exploration from any service registered in Bio-SeCo and proceed according to the connection patterns defined among such services at registration time. The user can also stop the exploration at any step and, by taking advantage of the history window in Bio-SeCo, go backward, change the direction of the exploration by using a different registered service, or go forward again, repeating the same expansion/refinement previously performed, possibly with different input parameters. Furthermore, at each step the user can change the relative weight associated with each service/topic included in the expanded/refined search, so as to give more or less weight, in the globally ranked search results, to the outcomes from the different composed sources.
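The effect of changing the relative weights can be sketched as a weighted average of per-service scores. The Python example below is only an illustration under stated assumptions: the service names, the function name rank_combinations, the normalization of each score to the range 0 to 1, and the weighted-average formula are all hypothetical and are not taken from the Bio-SeCo implementation.

```python
def rank_combinations(combinations, weights):
    """Order combined results by a weighted average of per-service scores.

    combinations: list of dicts with an "id" and a "scores" dict mapping
                  service name -> score normalized to [0, 1] (assumption)
    weights:      dict mapping service name -> non-negative relative weight
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def global_score(combination):
        return sum(weight * combination["scores"][service]
                   for service, weight in weights.items()) / total

    return sorted(combinations, key=global_score, reverse=True)


# Hypothetical combined results with one normalized score per service.
combinations = [
    {"id": "c1", "scores": {"blast": 0.9, "expression": 0.2}},
    {"id": "c2", "scores": {"blast": 0.5, "expression": 0.8}},
]

# Equal weights favor c2; weighting sequence similarity more favors c1.
equal = rank_combinations(combinations, {"blast": 1, "expression": 1})
blast_heavy = rank_combinations(combinations, {"blast": 3, "expression": 1})
print([c["id"] for c in equal], [c["id"] for c in blast_heavy])
```

The same set of combined results is reordered when the weights change, which is the behavior described above for the globally ranked search results.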
More details about Bio-SeCo can be found in several related publications, in particular: Masseroli M, Picozzi M, Ghisalberti G, Ceri S. Explorative search of distributed bio-data to answer complex biomedical questions. BMC Bioinformatics 2014; 15(Suppl 1): S3.
The demonstration video of the latest Bio-SeCo prototype for explorative search, which supports answering complex multi-topic bioinformatics queries of the above type, is available below.
Bio-SeCo can be freely used at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/
Watch the demonstration video
Download Video: HD Quality (mov 1280x720 ~ 105Mb) High Resolution (m4v 950x540 ~ 81Mb) Mobile (m4v 480x272 ~ 42Mb)