Embedded Computing for High Performance. Efficient Mapping of Computations Using Customization, Code Transformations and Compilation

Pages: 304
Release year: 2017
File size: 10.697 MB
Language: English

Embedded Computing for High Performance
Efficient Mapping of Computations Using Customization, Code Transformations and Compilation

João M.P. Cardoso
José Gabriel F. Coutinho
Pedro C. Diniz

Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

© 2017 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-804189-5

For information on all Morgan Kaufmann publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Jonathan Simpson
Acquisition Editor: Jonathan Simpson
Editorial Project Manager: Lindsay Lawrence
Production Project Manager: Punithavathy Govindaradjane
Cover Designer: Mark Rogers

Typeset by SPi Global, India

Dedication

We dedicate this book to:
our parents
our families
To Teresa, Rodrigo, Frederico, and Dinis.
To my grandmother Amélia.
To Rafael Nuno, who over the last years has endured so much more than he should have.

About the Authors

João M.P. Cardoso is a full professor at the Department of Informatics Engineering, Faculty of Engineering of the University of Porto, Porto, Portugal and a research member at INESC TEC. Before, he was with the IST/Technical Univ. of Lisbon (UTL) (2006–08), a senior researcher at INESC-ID (2001–09), and with the University of Algarve (1993–2006). In 2001/2002, he worked for PACT XPP Technologies, Inc., Munich, Germany. He received his PhD degree in electrical and computer engineering from IST/Technical University of Lisbon in 2001. He served as a Program Committee member, as General Co-Chair, and as Program Co-Chair in many international conferences. He has (co-)authored over 150 scientific publications on subjects related to compilers, embedded systems, and reconfigurable computing. In addition, he has been involved in several research projects. He is a senior member of IEEE, a member of IEEE Computer Society, and a senior member of ACM. His research interests include compilation techniques, domain-specific languages, reconfigurable computing, application-specific architectures, and high-performance computing with a particular emphasis in embedded computing.

José Gabriel F. Coutinho is an associate researcher working in the Custom Computing Research Group at Imperial College London.
He received his M. Eng. degree in computer engineering from Instituto Superior Técnico, Portugal in 1997. In 2000 and 2007 he received his MSc and PhD in computing science from Imperial College London, respectively. Since 2005, he has been involved in United Kingdom and EU research projects, including FP6 hArtes, FP7 REFLECT, FP7 HARNESS, and H2020 EXTRA. In addition, he has published over 50 research papers in peer-refereed journals and international conferences and has contributed to four book publications. His research interests include reconfigurable computing, HPC platforms, cloud computing platforms, high-level compilation techniques, programming models, and domain-specific languages.

Pedro C. Diniz received his MS in electrical and computer engineering from the Technical University in Lisbon, Portugal and his PhD in computer science from the University of California, Santa Barbara in 1997. Since 1997 he has been a research associate with the University of Southern California’s Information Sciences Institute (USC/ISI) and a research assistant professor of Computer Science at USC in Los Angeles, California. He has participated and/or led various research projects in the area of compilation for high-performance computing, mapping and synthesis for reconfigurable computing architectures, and more recently resilient computing. He has also been heavily involved in the scientific community having participated as part of the technical program committee of over 20 international conferences in the area of high-performance computing, reconfigurable and field-programmable computing.

Preface

Over the last decades, computer users have enjoyed the benefits of a seemingly unbounded availability of transistors on a die, with every new microprocessor design exhibiting performance figures that dwarfed previous generations. Computing platforms evolved from a single processor core to general-purpose multicores and specialized cores, such as graphics processing units (GPUs), delivering unprecedented performance thanks to the high degree of parallelism currently available.
More recently, energy efficiency has become a major concern, prompting systems to include custom computing engines in the form of field-programmable gate arrays (FPGAs) and other forms of reconfigurable computing devices.

All these computing platform trends are permeating the embedded computing domain, especially in high-performance embedded computing systems. Still, these advanced architectures expose an execution model that is far detached from the traditional sequential programming paradigm that programmers have been accustomed to when developing their extensive code base, and which they rely on when reasoning about program correctness. As a natural consequence of this gap between architectures and high-level programming languages, developers must understand the basic mapping between the application and the target computing architectures to fully exploit their capabilities. To help mitigate the complexity of this mapping and optimization problem, many high-level languages now include language extensions and compiler directives that allow applications to make more effective use of parallel architectures, for instance, to exploit multithreading on multiple cores. Given the stringent requirements of current embedded computing platforms in terms of latency, throughput, power and energy, developers need to further master this mapping process.

This book provides a comprehensive description of the basic mapping techniques and source code transformations for computations expressed in high-level imperative programming languages, such as C or MATLAB, to high-performance embedded architectures consisting of multiple CPUs, GPUs, and reconfigurable hardware (mainly FPGAs). It is therefore meant to help practitioners in the areas of electrical engineering, computer engineering, and computer science to effectively map computations to these architectures.
This book also covers existing compilers and their transformations outlining their use in many mapping techniques. These include the classical parallel-oriented transformations for loop constructs, but equally important data-oriented and data-mapping transformations that are key in the context of GPU-based systems. As such, this book is aimed to help computer engineers and computer scientists, as well as electrical engineers, who are faced with the hard task of mapping computations to high-performance embedded computing systems. Given the comprehensive set of source code and retargeting transformations described here, this book can be effectively used as a textbook for an advanced electrical, computer engineering, and computer science course focused on the development of high-performance embedded systems.

We are very conscious about the difficulty of presenting in a single book, and in a cohesive form, all the topics we consider important about the process of mapping computations to high-performance embedded computing platforms. However, we believe that the topics presented in this book should be mastered by the next generation of developers.

We hope you enjoy reading this book, and that it contributes to increasing your knowledge about developing efficient programs on high-performance embedded platforms, and that it serves as an inspiration to your projects.

João M.P. Cardoso
José Gabriel F. Coutinho
Pedro C. Diniz

Acknowledgments

We would like to acknowledge Walid Najjar, from the University of California Riverside, United States, for reading a previous version of Chapter 2 and for providing important feedback and suggestions for improving it.

We would like to acknowledge all the members of the SPeCS group [1] for their suggestions and discussions, namely, João Bispo, Tiago Carvalho, Pedro Pinto, Luís Reis, and Ricardo Nobre. We are also grateful to all of them for reviewing previous versions of this book’s chapters and for their valuable feedback that undoubtedly helped to improve the book.
Students of the PhD Program on Informatics Engineering (ProDEI) of the Faculty of Engineering of the University of Porto (FEUP) have also been a source of helpful feedback regarding some of the contents of this book, as earlier revisions from selected chapters were used as part of the class material for the High-Performance Embedded Computing (CEED) course.

In addition, we would also like to acknowledge the support given by the following companies. Xilinx Inc. [2] (United States) provided, through their University Program, FPGA-based development boards and software licenses including Vivado and Vivado HLS. ARM Ltd. [3] (United Kingdom) provided, through its ARM University Program, a sample ARM Lab-in-a-Box on Efficient Embedded Systems Design and Programming.

João M.P. Cardoso would like to acknowledge the support of the Department of Informatics Engineering of the Faculty of Engineering of the University of Porto, of INESC TEC, and the partial support provided by the following research projects: ANTAREX (H2020 FETHPC-1-2014, ref. 671623), CONTEXTWA (FCT PTDC/EEI-SCR/6945/2014), and TEC4Growth - RL1 SMILES (NORTE-01-0145-FEDER-000020). José Gabriel F. Coutinho would like to acknowledge the support of Wayne Luk, the Department of Computing at Imperial College London, United Kingdom, and the partial support of the EXTRA research project (H2020 FETHPC-1-2014, ref. 671653).

We would like to acknowledge Elsevier for giving us the opportunity to write this book. A warm acknowledgment and appreciation to Lindsay Lawrence, our Elsevier editor, for her belief in this project since the very beginning, as well as her direction which helped us finish this book project.

Last but not least, we would like to thank our families for their support and understanding for the countless hours we had to devote to this book.

[1] SPeCS (Special Purpose Computing Systems, Languages and Tools) Research Group: http://www.fe.up.pt/~specs/.
[2] Xilinx Inc., http://www.xilinx.com.
[3] ARM Ltd., http://www.arm.com/.
Abbreviations

ACPI: advanced configuration and power interface. A standard promoted by Intel, Microsoft, and Toshiba
ADC: analog-to-digital converter
AMD: advanced micro devices
AOP: aspect-oriented programming
API: application programming interface
ARM: advanced RISC machines
ASIP: application-specific instruction-set processor
AST: abstract syntax tree
AVX: advanced vector extensions
BRAM: block RAM
CD: computing device
CDFG: control/data flow graph
CFG: control flow graph
CG: call graph
CGRA: coarse-grained reconfigurable array
CISC: complex instruction set computer
CLB: configurable logic block
CMP: chip multiprocessor
COTS: commercial off-the-shelf
CPA: critical path analysis
CPU: central processing unit
CU: computing unit
DAC: digital-to-analog converter
DAG: directed acyclic graph
DDG: data dependence graph
DDR: double data rate
DFG: data flow graph
DFS: dynamic frequency scaling
DPM: dynamic power management
DRAM: dynamic random-access memory (RAM)
DSE: design space exploration
DSL: domain-specific language
DSP: digital signal processing
DVFS: dynamic voltage and frequency scaling
DVS: dynamic voltage scaling
EDA: electronic design automation
EDP: energy delay product
EEMBC: embedded microprocessor benchmark consortium
FMA: fused multiply-add
FPGA: field-programmable gate array
FPS: frames per second
FSM: finite state machine
GA: genetic algorithm
GCC: GNU compiler collection (originally named GNU C Compiler)
GPGPU: general-purpose graphics processing unit (also known as general-purpose computing on graphics processing unit)
GPIO: general-purpose input/output (IO)
GPU: graphics processing unit
HLS: high-level synthesis
HPC: high-performance computing
HPEC: high-performance embedded computing
HPF: high performance Fortran
ICC: Intel C/C++ compiler
IDE: integrated design environment
ILP: instruction-level parallelism or integer-linear programming
IO: input/output
IOB: input/output block
IR: intermediate representation
ISA: instruction set architecture
LDG: loop dependence graph
LLVM: low level virtual machine
LOC: lines of code
MIC: many integrated core
MPI: message passing interface
MPSoC: multiprocessor SoC (system-on-a-chip)
NFRs: nonfunctional requirements
NUMA: nonuniform memory access
OpenACC: open accelerators
OpenCL: open computing language
OpenMP: open multiprocessing
PC: personal computer
PCI: peripheral component interconnect
PCIe: peripheral component interconnect express
PE: processing element
QoE: quality of experience
QoS: quality of service
QPI: Intel QuickPath interconnect
RAM: random-access memory
RISC: reduced instruction set computer
ROM: read-only memory
RTOS: real-time operating system
SA: simulated annealing
