Lessons and Challenges of Building Data Repositories

Description

Lessons and Challenges of Building Data Repositories. Ken Buetow, NCICB/NCI/NIH/DHHS. Experience with Diverse Communities: Human gene mapping community; Cancer Genome Anatomy Project (CGAP); Mouse Models of Human Cancer Consortium; Director’s Challenge consortium; SPORE community; Clinical trials community; Imaging community; Integrated Cancer Biology Program; Cancer Biomedical Informatics Grid (caBIG)

Presentation Transcript

Lessons and Challenges of Building Data Repositories : Ken Buetow, NCICB/NCI/NIH/DHHS

Experience with Diverse Communities : Human gene mapping community; Cancer Genome Anatomy Project (CGAP); Mouse Models of Human Cancer Consortium; Director’s Challenge consortium; SPORE community; Clinical trials community; Imaging community; Integrated Cancer Biology Program; Cancer Biomedical Informatics Grid (caBIG)

Lessons learned… : Know what problem you are attempting to solve. What is your goal? Who is your “customer”? What do they need, compared with what they want? How will they use a given feature/attribute? Maintain focus/discipline.

NCI biomedical informatics : Goal: A virtual web of interconnected data, individuals, and organizations that redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise.

Cancer Biomedical Informatics Grid (caBIG): the program… : Common, widely distributed infrastructure permits the cancer research community to focus on innovation. Shared vocabulary, data elements, and data models facilitate information exchange. Collection of interoperable applications developed to a common standard. Raw published cancer research data is available for mining and integration.

caBIG: the pilot… : Workspaces: Clinical Trials Management System; Integrated Cancer Research; Tissue Banks and Pathology; Vocabulary and Common Data Elements; Architecture. Strategic Working Groups: Data Sharing and Intellectual Capital; Training; caBIG Strategic Planning. Special Interest Groups: 23 groups focused on specific topics.

caBIG pilot - participation : Pilot: NCI-designated Cancer Centers. Members: 45 institutions with executed base agreements; developers; adopters; working group members. Statistics: over 450 active participants; 196 teleconferences; 10 face-to-face meetings. Volunteers: academic centers; industry. Partners. Affiliates.

Lessons learned… : Open is good! Data sharing; open source code; open access; “do no harm” licenses.

Lessons Learned… : Today’s tools are not likely to be tomorrow’s killer apps. Accessible, useful, user-friendly apps are critical to adoption; not always the best approach (Eisen’s cluster analysis). Design infrastructure that facilitates rapid exploration of new methods: open source; isolate data from applications; component architecture.

Components: software parts : Small parts are better for building flexible shapes; have a uniform interface medium; snap-together connectivity; internals can be made from widely varying technologies.

Boundaries and Interfaces : Focus on boundaries, interfaces, and how things fit together, not on the internal details of how they’re built: assume those will be diverse and changing.
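
The component idea on the two slides above can be illustrated with a minimal Python sketch. It is not taken from the presentation: the `GeneSource` interface, both component classes, and the sample records are hypothetical names chosen for illustration. The point is only that client code depends on the boundary, while the internals behind it differ.

```python
import csv
import io
from typing import Dict, Iterable, Optional, Protocol


class GeneSource(Protocol):
    """Hypothetical boundary: anything that can look up a gene by symbol."""

    def find_gene(self, symbol: str) -> Optional[Dict[str, str]]:
        ...


class InMemorySource:
    """Component whose internals are a plain dictionary."""

    def __init__(self, records: Iterable[Dict[str, str]]) -> None:
        self._by_symbol = {r["symbol"]: r for r in records}

    def find_gene(self, symbol: str) -> Optional[Dict[str, str]]:
        return self._by_symbol.get(symbol)


class CsvSource:
    """Component whose internals are CSV text with 'symbol' and 'chromosome' columns."""

    def __init__(self, text: str) -> None:
        self._rows = list(csv.DictReader(io.StringIO(text)))

    def find_gene(self, symbol: str) -> Optional[Dict[str, str]]:
        for row in self._rows:
            if row["symbol"] == symbol:
                return dict(row)
        return None


def describe(source: GeneSource, symbol: str) -> str:
    """Client code: uses only the interface, never the internals."""
    record = source.find_gene(symbol)
    if record is None:
        return f"{symbol}: not found"
    return f"{record['symbol']} on chromosome {record['chromosome']}"


if __name__ == "__main__":
    memory = InMemorySource([{"symbol": "TP53", "chromosome": "17"}])
    table = CsvSource("symbol,chromosome\nBRCA2,13\n")
    print(describe(memory, "TP53"))
    print(describe(table, "BRCA2"))
```

Either component can be swapped for the other without touching `describe`, which is the "snap-together" property the slides describe.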

Lessons Learned… : Standards versus standardization. Data standards: use established standards where they exist; modify/extend existing standards wherever possible; develop new standards “just in time”, based on practical experience of large-scale users; create new standards as necessary, “just enough”. Standards can NOT be proprietary.

caCORE – common ontologic representation environment : Metadata Infrastructure

Enterprise Vocabulary : NCI Metathesaurus (cross-maps standard vocabularies/ontologies, e.g. SNOMED, MedDRA, ICD): semantic integration, inter-vocabulary mapping; UMLS Metathesaurus extended with cancer-oriented vocabularies; 800,000 concepts, 2,000,000 terms and phrases; mappings among over 50 vocabularies. NCI Thesaurus: description logic-based; 18,000 “Concepts”; a Concept is the semantic unit; one or more terms describe a Concept (synonymy); semantic relationships between Concepts. [Diagram layers: biomedical objects, common data elements, controlled vocabulary]
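
The thesaurus structure described on this slide (a Concept as the semantic unit, several terms per Concept, and relationships between Concepts) might be sketched roughly as follows. This is an illustrative toy, not the EVS data model or API: the `Concept` and `Thesaurus` classes, the `X1`..`X3` codes, and the sample entries are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Concept:
    code: str                      # hypothetical identifier, not a real thesaurus code
    preferred_name: str
    synonyms: Set[str] = field(default_factory=set)
    parents: Set[str] = field(default_factory=set)  # codes of broader concepts


class Thesaurus:
    def __init__(self) -> None:
        self._concepts: Dict[str, Concept] = {}
        self._term_index: Dict[str, str] = {}  # lowercased term -> concept code

    def add(self, concept: Concept) -> None:
        self._concepts[concept.code] = concept
        # Every term, preferred or synonym, resolves to the same concept.
        for term in {concept.preferred_name, *concept.synonyms}:
            self._term_index[term.lower()] = concept.code

    def lookup(self, term: str) -> Optional[Concept]:
        code = self._term_index.get(term.lower())
        return self._concepts.get(code) if code is not None else None

    def is_a(self, code: str, ancestor: str) -> bool:
        """True if `ancestor` is `code` itself or reachable through parent links."""
        seen: Set[str] = set()
        stack = [code]
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current in seen or current not in self._concepts:
                continue
            seen.add(current)
            stack.extend(self._concepts[current].parents)
        return False


def build_demo() -> Thesaurus:
    thesaurus = Thesaurus()
    thesaurus.add(Concept("X1", "Neoplasm", {"Tumor", "Tumour"}))
    thesaurus.add(Concept("X2", "Malignant Neoplasm", {"Cancer"}, {"X1"}))
    thesaurus.add(Concept("X3", "Carcinoma", set(), {"X2"}))
    return thesaurus
```

Looking up "tumour" or "Tumor" returns the same `Concept`, which is the synonymy the slide refers to; `is_a` walks one kind of semantic relationship.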

Common Data Elements : Structured data reporting elements. Precisely defining the questions and answers: What question are you asking, exactly? What are the possible answers, and what do they mean?
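
One way to picture a common data element is as a question bundled with its permissible answers and their meanings. The sketch below assumes that reading; the `DataElement` class, the `tumor_laterality` element, and its value codes are hypothetical and are not drawn from caDSR.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DataElement:
    name: str
    question: str                           # what is being asked, exactly
    permissible_values: Mapping[str, str]   # allowed answer -> what it means

    def meaning_of(self, answer: str) -> str:
        """Return the meaning of a permissible answer, or raise ValueError."""
        try:
            return self.permissible_values[answer]
        except KeyError:
            allowed = ", ".join(sorted(self.permissible_values))
            raise ValueError(
                f"{answer!r} is not a permissible value for {self.name}; "
                f"expected one of: {allowed}"
            ) from None


# Hypothetical element used only for illustration.
LATERALITY = DataElement(
    name="tumor_laterality",
    question="On which side of the body is the primary tumor located?",
    permissible_values={"L": "Left", "R": "Right", "B": "Bilateral", "U": "Unknown"},
)
```

A free-text answer such as "left" is rejected rather than silently stored, so every recorded value has a defined meaning.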

Biomedical Information Objects : Data service infrastructure developed using OMG’s Model Driven Architecture approach. Object models expressed in UML represent actual biomedical research entities such as genes, sequences, chromosomes, cellular pathways, ontologies, clinical protocols, etc. The object models form the basis for uniform APIs (Java, SOAP, HTTP-XML, Perl) that provide an abstraction layer and interfaces for developers to access information without worrying about the back-end data stores.
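
The abstraction layer described here can be illustrated with a small Python sketch in which callers work with domain objects and never see the storage. It does not reproduce the caBIO API: the `Gene` object, the `GeneService` class, the table layout, and the sample rows are assumptions, and an in-memory SQLite database stands in for whatever back-end store is actually used.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    """Domain object; in a model-driven setup this would mirror a UML class."""
    symbol: str
    chromosome: str


class GeneService:
    """Hypothetical API facade: returns domain objects, hides the SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_gene(self, symbol: str) -> Optional[Gene]:
        row = self._conn.execute(
            "SELECT symbol, chromosome FROM gene WHERE symbol = ?", (symbol,)
        ).fetchone()
        return Gene(*row) if row else None

    def genes_on(self, chromosome: str) -> List[Gene]:
        rows = self._conn.execute(
            "SELECT symbol, chromosome FROM gene WHERE chromosome = ? ORDER BY symbol",
            (chromosome,),
        ).fetchall()
        return [Gene(*row) for row in rows]


def demo_service() -> GeneService:
    """Build a service over an in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("TP53", "17"), ("BRCA1", "17"), ("BRCA2", "13")],
    )
    return GeneService(conn)
```

Replacing SQLite with another store would change only `GeneService` internals; code that calls `get_gene` or `genes_on` would stay the same.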

Standards supporting infrastructure : Enterprise Vocabulary Services (EVS): browsers; APIs. cancer Bioinformatics Infrastructure Objects (caBIO): applications; APIs. cancer Data Standards Repository (caDSR): CDEs; case report forms; object models; ISO 11179 model. caCORE Software Development Toolkit.

caBIG Compatibility Matrix

Lessons Learned… : Quality measures are transforming; qualitative and quantitative; objective measures critical; should track with the data.

Lessons Learned… : The devil is in the details. Experimental inputs can be as critical as outputs. Laboratory information management systems (LIMS).

Lessons Learned… : You really are going to want to connect these results to other outcomes! Other data types; clinical outcomes.

Slide23 : etiology, treatment, prevention

caBIG pilot products : Tissue Bank and Pathology Tools Workspace: caTISSUE architecture and use cases; Federated Tissue Data Set White Paper; Data Sharing Federation Operational Guidelines (4th quarter 2004); caTIES beta release (1st quarter 2005); caTISSUE Lite prototype (2nd quarter 2005); caTISSUE prototype (2nd quarter 2005); External module connector prototype (2nd quarter 2005); De-identification reports tool operational (4th quarter 2005).

caBIG pilot products : Integrated Cancer Research. Gene Annotation: PIR (2nd quarter 2005); Cancer Molecular Pages (3rd quarter 2005); Function Express (3rd quarter 2005); GoMiner (3rd quarter 2005); HapMap (3rd quarter 2005); SEED (4th quarter 2005). Data Analysis and Statistical Tools: Distance-Weighted Discrimination (2nd quarter 2005); Magellan (2nd quarter 2005); VISDA (2nd quarter 2005); Gene Pattern (4th quarter 2005). Translational (Clinical Integration): TrAPSS (3rd quarter 2005). Informatics for Proteomics: LIMS (2nd quarter 2005); Q5 (3rd quarter 2005); RProteomics (4th quarter 2005). Microarray Repositories: caArray (4th quarter 2004); NCI-60 Data Sharing (2nd quarter 2005); Zebrafish Microarray Data Sharing (2nd quarter 2004). Pathways: Cytoscape/BioPAX/cPath (3rd quarter 2005); QPACA (3rd quarter 2005); Reactome (4th quarter 2005).

Interacting with caBIG : Track activities and progress on the caBIG Web site at http://caBIG.nci.nih.gov . Participate in caBIG open meetings to coordinate activities. Work toward making your applications and solutions caBIG compatible; current guidelines for caBIG compatibility are available on the caBIG Web site. Use caCORE infrastructure: use EVS, CDEs, and models where defined; register metadata in caDSR (http://ncicb.nci.nih.gov/core ). Download and get familiar with the tools and applications already available on the caBIG Web site. Submit tools and data infrastructure to caBIG repositories.
