Data Mining for Business Analytics PDF
DATA MINING FOR BUSINESS ANALYTICS
Concepts, Techniques, and Applications in R

Galit Shmueli
Peter C. Bruce
Inbal Yahav
Nitin R. Patel
Kenneth C. Lichtendahl, Jr.

This edition first published 2018
© 2018 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl Jr. to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties; including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on-going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging-in-Publication Data applied for
Hardback: 9781118879368

Cover Design: Wiley
Cover Image: © Achim Mittler, Frankfurt am Main/Gettyimages

Set in 11.5/14.5pt BemboStd by Aptara Inc., New Delhi, India
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

The beginning of wisdom is this: Get wisdom, and whatever else you get, get insight.
– Proverbs 4:7

Contents

Foreword by Gareth James
Foreword by Ravi Bapna
Preface to the R Edition
Acknowledgments

PART I PRELIMINARIES

CHAPTER 1 Introduction
1.1 What Is Business Analytics?
1.2 What Is Data Mining?
1.3 Data Mining and Related Terms
1.4 Big Data
1.5 Data Science
1.6 Why Are There So Many Different Methods?
1.7 Terminology and Notation
1.8 Road Maps to This Book
    Order of Topics

CHAPTER 2 Overview of the Data Mining Process
2.1 Introduction
2.2 Core Ideas in Data Mining
    Classification
    Prediction
    Association Rules and Recommendation Systems
    Predictive Analytics
    Data Reduction and Dimension Reduction
    Data Exploration and Visualization
    Supervised and Unsupervised Learning
2.3 The Steps in Data Mining
2.4 Preliminary Steps
    Organization of Datasets
    Predicting Home Values in the West Roxbury Neighborhood
    Loading and Looking at the Data in R
    Sampling from a Database
    Oversampling Rare Events in Classification Tasks
    Preprocessing and Cleaning the Data
2.5 Predictive Power and Overfitting
    Overfitting
    Creation and Use of Data Partitions
2.6 Building a Predictive Model
    Modeling Process
2.7 Using R for Data Mining on a Local Machine
2.8 Automating Data Mining Solutions
    Data Mining Software: The State of the Market (by Herb Edelstein)
Problems

PART II DATA EXPLORATION AND DIMENSION REDUCTION

CHAPTER 3 Data Visualization
3.1 Uses of Data Visualization
    Base R or ggplot?
3.2 Data Examples
    Example 1: Boston Housing Data
    Example 2: Ridership on Amtrak Trains
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
    Distribution Plots: Boxplots and Histograms
    Heatmaps: Visualizing Correlations and Missing Values
3.4 Multidimensional Visualization
    Adding Variables: Color, Size, Shape, Multiple Panels, and Animation
    Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
    Reference: Trend Lines and Labels
    Scaling up to Large Datasets
    Multivariate Plot: Parallel Coordinates Plot
    Interactive Visualization
3.5 Specialized Visualizations
    Visualizing Networked Data
    Visualizing Hierarchical Data: Treemaps
    Visualizing Geographical Data: Map Charts
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal
    Prediction
    Classification
    Time Series Forecasting
    Unsupervised Learning
Problems

CHAPTER 4 Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality