Medical Data Sharing, Harmonization and Analytics

Vasileios C. Pezoulas
Themis P. Exarchos
Dimitrios I. Fotiadis

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2020 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-816507-2

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner
Acquisition Editor: Chris Katsaropoulos
Editorial Project Manager: Ali Afzal-Khan
Production Project Manager: Punithavathy Govindaradjane
Cover Designer: Miles Hitchen

Typeset by TNQ Technologies

Preface

This book provides a framework for comprehending the fundamental basis of data sharing, data harmonization, cloud computing, machine learning, and data analytics in the clinical domain. The book first describes the rationale behind medical data sharing, combined with the unmet needs in chronic diseases, along with popular frameworks and global initiatives in the field. Data protection laws are then discussed to frame the ethical, legal, and privacy issues involved in medical data sharing and to present potential solutions provided by global initiatives in the domain. The concept of data harmonization is described next. Cloud infrastructure and security protocols are discussed as a means to enable data sharing while ensuring the privacy of the data. Emphasis is given to data mining, deep learning, machine learning, and visual analytics tools and frameworks for analyzing the harmonized data. After presenting several case studies that combine the concepts of medical data sharing, harmonization, and analytics, the book concludes with future trends in medical data sharing. More specifically, per chapter:
• Chapter 1 is an introductory chapter which aims to familiarize the reader with the fundamental principles and concepts behind medical data sharing, data protection, data harmonization, cloud infrastructure, and data analytics toward the establishment of a federated cloud platform to deal with the clinical unmet needs in various diseases.
• Chapter 2 presents the most common types and sources of medical data along with data collection standards for each type. Various types of medical data, such as biosignals, medical images, omics, and laboratory tests, are extensively described along with the sources of such data, including patient registries, health sensors, electronic health records, genome registries, cohorts, clinical trials, and claims, among others. The evolution of big data in medicine is also discussed along with different cohort study designs.
• Chapter 3 presents the fundamental concept of data sharing as the backbone of any healthcare cloud platform toward the interlinking of medical data from different sources. Methods for enhancing the quality of medical data in terms of accuracy, relevance, and completeness, such as outlier detection and deduplication, are also presented along with methods for data standardization as a preliminary step for data harmonization. Emphasis is given to the description of existing data sharing frameworks and ongoing global initiatives, along with barriers that hamper the vision of data sharing, such as the misuse of the shared data, among others.
• Chapter 4 offers the basis for understanding the fundamental aspects of popular data protection regulations, such as the General Data Protection Regulation (GDPR) in Europe and the Health Insurance Portability and Accountability Act (HIPAA) in the United States. The legal and ethical issues posed during the sharing of medical data are identified along with the common characteristics of international data protection regulations, global initiatives, principles, and guidelines toward establishing the legal and ethical compliance of cloud computing platforms in healthcare.
• Chapter 5 aims to present the latest technological advances and methods for medical data harmonization, including lexical and semantic matching approaches toward the identification of lexically similar terms, as well as terms that share a common concept, between two heterogeneous datasets. The concept of ontologies is introduced as a reference model to describe the domain knowledge of a disease of interest. The importance of ontologies is further highlighted for semantic matching during the data harmonization process. Emphasis is given to global data harmonization initiatives and frameworks.
• Chapter 6 presents the current advances in the rapidly evolving field of cloud computing technology in healthcare along with the related challenges. Popular cloud computing vendors in healthcare are described along with cloud computing architectures, including the Infrastructure as a Service (IaaS), the Platform as a Service (PaaS), the Software as a Service (SaaS), and the Data as a Service (DaaS). Emphasis is given to international security protocols that ensure the legal and ethical compliance of a cloud computing platform, as well as to different types of data storage topologies in the cloud, such as the centralized, the distributed, and the decentralized (blockchain).
• Chapter 7 presents methods to effectively preprocess and analyze medical data to deal with the unmet needs in various diseases, including the development of robust patient stratification models, the identification of biomarkers and/or the validation of existing ones, and the selection of therapeutic treatments for effective disease monitoring. Methods for data preprocessing, including data discretization and feature selection, are presented along with supervised and unsupervised algorithms for classification and clustering. Emphasis is given to the distributed learning strategy for applying machine learning models across data in distributed databases. Popular machine learning frameworks are presented along with applications in the medical domain.
• Chapter 8 summarizes the fundamental cohort studies in the promising field of medical data harmonization across various medical domains, including cohort studies on aging, autoimmune diseases, cancer, phenotypes and epidemics, personality scores, and obesity.
• Chapter 9 summarizes the key points of the previous chapters and presents the latest trends in medical data sharing, harmonization, and analytics.

This book is intended for undergraduate and graduate students in the fields of medicine, computer science, computer engineering, data science, and biomedical engineering. The book may also be beneficial for professionals in those fields.

This work was carried out at the Unit of Medical Technology and Intelligent Information Systems (MEDLAB) at the University of Ioannina, whose research excellence in the field of biomedical engineering is internationally acknowledged.

We would like to thank our team in the Unit of Medical Technology and Intelligent Information Systems for their scientific and emotional support during the writing of this book, and the team of the HarmonicSS (harmonization and integrative analysis of regional, national and international cohorts on primary Sjögren’s syndrome toward improved stratification, treatment, and health policy-making) project, funded by the European Commission (grant agreement No. 731944) and by the Swiss State Secretariat for Education, Research and Innovation (SERI) under grant agreement 16.0210. We are also grateful to the editorial team for their valuable guidance throughout the publishing process. We also express our sincere gratitude to our families, who contributed to the final realization of this work through the continuous motivation and inspiration they provided to us.

Vasileios C. Pezoulas
Themis P. Exarchos
Dimitrios I. Fotiadis
University of Ioannina, Ioannina, Greece

Terminology list

Data sharing: The process of interlinking sensitive medical data from different medical databases worldwide, fulfilling all the necessary ethical and legal requirements for data protection.
Data curation: The computational process of enhancing the quality of clinical data through the identification of outliers, incompatible and inconsistent fields, missing values, etc. The data curation workflow also includes functionalities for data standardization and thus serves as a preharmonization step.
Data harmonization: The computational process of homogenizing medical databases with heterogeneous structure and value ranges under a common medical domain, usually through a reference schema (e.g., an ontology). Data harmonization can be accomplished using lexical and/or semantic matching in a semi-automated manner through the detection of lexically identical or similar terms, as well as terms that describe a common concept.
Ontology: A high-level data representation model in which the data are described in the form of entities and object properties, where the entities are defined as classes and subclasses and the object properties are defined as the relationships between them.
Semantic matching: The process of identifying terminologies that share a common conceptual basis (e.g., belonging to the same class or subclass) between two heterogeneous ontologies.
Lexical matching: The process of identifying lexically identical or similar terminologies between two heterogeneous ontologies, i.e., terminologies with common lexical blocks or synonyms, using string similarity measures.
Stringent harmonization: A simple case of data harmonization which involves the harmonization of heterogeneous medical data that have been collected under a specific data collection protocol.
Flexible harmonization: A challenging case of data harmonization which involves the harmonization of heterogeneous medical data that have been collected in the absence of a specific data collection protocol.
Reference model: A set of parameters that efficiently describe the domain knowledge of a medical condition or disease, including classes (e.g., laboratory tests, medical conditions, demographics, lifestyle, interventions) as well as additional subclasses and variables. This set of parameters is usually determined by a team of clinical experts in the related medical field.
Patient stratification: A clinical unmet need across several medical domains which involves the application of machine learning models for the identification of a subgroup of patients who are highly likely to develop a specific medical condition or disease.
Biomarker(s): A set of one or more prominent features with respect to a given target feature. This set of features is usually identified through a feature selection or a feature ranking method.
Precision medicine: “An emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person” (the definition has been provided according to the Precision Medicine Initiative).
Health impact assessment: A multidisciplinary process which involves the assessment of health policies in terms of evidence, where a health policy is defined as “the decisions, plans, and actions that are undertaken to achieve specific health care goals within a society.”
Cloud computing: “A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (the definition has been provided according to the National Institute of Standards and Technology).
Federated healthcare platform: A federated cloud computing environment in which multiple cloud computing systems/models interact under a common purpose in healthcare. It can support the provisioning and management of the cloud infrastructure in multiple cloud computing systems/models by standardizing the interactions between the individual cloud environments (federated cloud management).
Supervised learning: A machine learning approach which involves the application of a machine learning algorithm on a set of annotated clinical data for classification purposes (the target feature is predefined), e.g., for the development of prediction models.
Unsupervised learning: A machine learning approach which involves the application of a machine learning algorithm on a set of clinical data without annotation (the target feature is absent) for clustering purposes, e.g., for the detection of features with similar patterns within a clinical dataset.
Feature selection: The extraction of a specific subset of features which are highly correlated (highly dependent) with a target feature and less correlated (highly independent) with the rest of the features.
Feature ranking: The ranking of features within a set of clinical data according to their association with a given target feature.
Big data: Massively accumulated sets of daily generated medical data which are characterized by four dimensions, namely the volume, the velocity, the veracity, and the variety.
Batch processing: The strategy of processing big data as smaller partitions/subsets by fetching them into the memory in a sequential manner.
Online learning: The process of additively adjusting a continuous data model (e.g., a machine learning model) on upcoming data streams (or batches).
Deep learning: The process of identifying hidden patterns across large subsets of clinical data (big data) for classification purposes. This process is usually conducted by deep learning artificial neural networks, such as the convolutional neural networks and the recurrent neural networks.
Incremental learning: The process of incrementally adjusting a continuous data model (e.g., a machine learning model) on subsets of a large dataset.
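The lexical matching entry above pairs terms from two heterogeneous sources using string similarity measures and synonyms. The following is a minimal sketch, applied to variable names rather than full ontologies. It assumes Python's `difflib.SequenceMatcher` ratio as the similarity measure, a hand-written synonym dictionary, and an illustrative threshold of 0.8. These are assumptions for illustration, not the measure the book prescribes. The helper names (`normalize`, `similarity`, `lexical_match`) and the cohort variable names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(term: str) -> str:
    """Lower-case a term and treat underscores and hyphens as spaces."""
    return " ".join(term.lower().replace("_", " ").replace("-", " ").split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]: difflib's ratio (2*M/T) on normalized terms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def lexical_match(source, target, synonyms=None, threshold=0.8):
    """Map each source term to its most similar target term (hypothetical helper).

    A target listed as a synonym of the source term scores 1.0; otherwise the
    string-similarity score is used. Terms whose best score is below
    `threshold` are left unmatched for expert review (semi-automated step).
    """
    synonyms = synonyms or {}
    matches = {}
    for s in source:
        # Synonyms are keyed by the normalized source term (assumed format).
        syns = {normalize(x) for x in synonyms.get(normalize(s), [])}
        best, best_score = None, 0.0
        for t in target:
            score = 1.0 if normalize(t) in syns else similarity(s, t)
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            matches[s] = (best, round(best_score, 2))
    return matches


# Hypothetical variable names from two cohorts with different naming conventions.
cohort_a = ["Date_of_birth", "Hemoglobin", "CRP", "Smoking_status"]
cohort_b = ["date of birth", "haemoglobin", "c-reactive protein", "HCT"]
print(lexical_match(cohort_a, cohort_b, synonyms={"crp": ["c-reactive protein"]}))
# {'Date_of_birth': ('date of birth', 1.0), 'Hemoglobin': ('haemoglobin', 0.95), 'CRP': ('c-reactive protein', 1.0)}
```

In this sketch, the abbreviation `CRP` can only be matched through the synonym list, because its spelling shares almost nothing with "c-reactive protein". `Hemoglobin` and `haemoglobin` are matched by string similarity alone. `Smoking_status` falls below the threshold and stays unmatched rather than being forced onto an unrelated target, which is where semantic matching or expert curation would take over.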
List of abbreviations
APEC Asia-Pacific Economic Cooperation
API Application programming interface
AUC Area under the curve
BOLD Blood oxygen level dependent
GA4GH Global Alliance for Genomics and Health
CAGE Cap analysis gene expression
CAMP Cloud Application Management Protocol
CASB Cloud Access and Security Broker
CBC Complete blood count
CBPR Cross-Border Privacy Rules
CC Creative Commons
CCD Charge-coupled device
CCM Cloud Controls Matrix
CCSK Certificate of Cloud Security Knowledge
CCSP Certificate of Cloud Security Professional
C-CDA Consolidated Clinical Document Architecture
CDA Clinical Document Architecture
CDEs Common data elements
CDMI Cloud Data Management Interface
cDNA Complementary DNA
ChIP Chromatin immunoprecipitation
CJEU Court of Justice of the European Union
CNNs Convolutional neural networks
CoE Council of Europe
CPIP Cloud Portability and Interoperability Profile
CPU Central processing unit
CRF Case report form
CRP C-reactive protein
CSA Cloud Security Alliance
CSF Cerebrospinal fluid
CT Computerized tomography
DaaS Data as a Service
DAST Dynamic Application Security Testing
DataSHaPER DataSchema and Harmonization Platform for Epidemiological Research
DataSHIELD Data Aggregation Through Anonymous Summary-statistics from Harmonized Individual Level Databases
DBMS Database management system
DCC Data controller committee
DFA Direct fluorescent antibody
DICOM Digital imaging and communications in medicine
DIF Differential item functioning
DLP Data loss prevention
DLT Distributed ledger technology
DNA Deoxyribonucleic acid
DOC US Department of Commerce
DoS Denial of Service
DPAs Data Protection Authorities
DPIA Data protection impact assessment
DPO Data protection officer
DPPA Data Privacy and Protection Agreement
DSA Digital Signature Algorithm
DTI Diffusion Tensor Imaging
DWI Diffusion weighted imaging
eCRF Electronic case report form
ECC Elliptic curve cryptography
ECDSA Elliptic Curve Digital Signature Algorithm
ECG Electrocardiography
EMG Electromyography
ENISA European Network and Information Security Agency
EOG Electrooculography
ECoG Electrocorticography
ECS Electrical cortical stimulation
ECHR European Convention on Human Rights
EEG Electroencephalography
EHRs Electronic Health Records
ELISA Enzyme-linked immunosorbent assay
ENG Electronystagmography
EPI Echo-planar imaging
FBP Filtered back projection
FDA Food and Drug Administration
FDG Fluorodeoxyglucose
FFT Fast Fourier transform
FCBF Fast correlation-based filter
FDAAA Food and Drug Administration Amendments Act
FDAP Federal Act on Data Protection
FISMA Federal Information Security Management Act
fMRI Functional magnetic resonance imaging
fNIRS Functional near-infrared spectroscopy
FOAM Framework for Ontology Alignment and Matching
FOV Field of View
FTC Federal Trade Commission
GDPR General Data Protection Regulation
GI Gini Impurity
GGI Gain in Gini Index
GLFA Generalized linear factor analysis
GLM Generalized linear model
GLS Generalized least squares
GWAS Genome-wide association studies
GUI Graphical user interface
HCP Human Connectome Project
HCT Hematocrit