
Data Analysis with Open Source Tools

Philipp K. Janert

Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo

Data Analysis with Open Source Tools
by Philipp K. Janert

Copyright © 2011 Philipp K. Janert. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc. 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) [email protected].

Editor: Mike Loukides
Production Editor: Sumita Mukherji
Copyeditor: Matt Darnell
Production Services: MPS Limited, a Macmillan Company, and Newgen North America, Inc.
Indexer: Fred Brown
Cover Designer: Karen Montgomery
Interior Designer: Edie Freedman and Ron Bilodeau
Illustrator: Philipp K. Janert

Printing History:
November 2010: First Edition.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Data Analysis with Open Source Tools, the image of a common kite, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-0-596-80235-6
[M] [2011-05-27]

Furious activity is no substitute for understanding.
- H. H. Williams

CONTENTS

PREFACE

1 INTRODUCTION
    Data Analysis
    What’s in This Book
    What’s with the Workshops?
    What’s with the Math?
    What You’ll Need
    What’s Missing

PART I  Graphics: Looking at Data

2 A SINGLE VARIABLE: SHAPE AND DISTRIBUTION
    Dot and Jitter Plots
    Histograms and Kernel Density Estimates
    The Cumulative Distribution Function
    Rank-Order Plots and Lift Charts
    Only When Appropriate: Summary Statistics and Box Plots
    Workshop: NumPy
    Further Reading

3 TWO VARIABLES: ESTABLISHING RELATIONSHIPS
    Scatter Plots
    Conquering Noise: Smoothing
    Logarithmic Plots
    Banking
    Linear Regression and All That
    Showing What’s Important
    Graphical Analysis and Presentation Graphics
    Workshop: matplotlib
    Further Reading

4 TIME AS A VARIABLE: TIME-SERIES ANALYSIS
    Examples
    The Task
    Smoothing
    Don’t Overlook the Obvious!
    The Correlation Function