Subgaussian Tail Bounds via Stability Arguments

Thomas Steinke∗   Jonathan Ullman†

January 16, 2017

arXiv:1701.03493v1 [cs.DM] 12 Jan 2017

Abstract

Sums of independent, bounded random variables concentrate around their expectation approximately as well as a Gaussian of the same variance. Well-known results of this form include the Bernstein, Hoeffding, and Chernoff inequalities and many others. We present an alternative proof of these tail bounds based on what we call a stability argument, which avoids bounding the moment generating function or higher-order moments of the distribution. Our stability argument is inspired by recent work on the generalization properties of differential privacy and their connection to adaptive data analysis (Bassily et al., STOC 2016).

∗ IBM, [email protected]
† [email protected]

1 Introduction

Tail bounds, also known as large deviation inequalities or concentration of measure theorems, are an essential tool in mathematics, computer science, and statistics. The most common type of tail bound (in theoretical computer science) shows that the sum of independent, bounded random variables has subgaussian tails — the best-known example being the following special case of Bernstein's Inequality.

Theorem 1.1 ([Ber24]). If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for every $i$, then
$$\forall \varepsilon \ge 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right] \le e^{-\Omega(\varepsilon^2 n)}.$$

Theorem 1.1 has many generalizations and extensions: tight constants [Hoe63], random variables with differing ranges [Hoe63], multiplicative error bounds [Che52], variables with low variance [Ber24], variables with only limited independence [SSS95], martingales [Azu67], matrix-valued random variables [AW02], and many more.

We present an alternative proof of Theorem 1.1, which differs significantly from all other proofs the authors are aware of. Our proof is derived from recent work on algorithmic stability for adaptive data analysis [BNS+16]. At a high level, our proof shows that Theorem 1.1 follows from an appropriate bound on the expectation of the maximum of $m$ independent realizations of $\sum_{i=1}^n X_i - \mu_i$. The required bound on the expected maximum follows from the existence of a stable randomized algorithm for selecting an approximate maximizer from among the $m$ realizations. Here stability means that the output distribution of the algorithm is insensitive to small changes in these values, which is formalized by differential privacy [DMNS06].

The alternative proof was (in hindsight) a key ingredient in obtaining tight bounds on the generalization properties of differential privacy [BNS+16], which can be viewed as a strong extension of Theorem 1.1. Thus we believe our approach both deepens our understanding of concentration inequalities and may find further applications.

1.1 Our Approach

Since reasoning directly about the distribution of $\sum_{i=1}^n X_i - \mu_i$ is difficult, proofs of Theorem 1.1 typically work by bounding the expectation of some random variable that serves as a "proxy" for the tail probability $\mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right]$. Our proof uses a very different proxy than previous proofs. (See Section 1.2 for a comparison to other proofs.) Specifically, to bound $\mathbb{P}[Y \ge y]$ we bound the quantity $\mathbb{E}\left[\max\{0, Y^1, Y^2, \dots, Y^m\}\right]$, where $Y^1,\dots,Y^m$ are independent copies of the original random variable $Y$.¹ Bounding this proxy implies a tail bound via the following lemma.

¹ Our proof also works with the proxy $\mathbb{E}\left[\max\{|Y^1|, |Y^2|, \dots, |Y^m|\}\right]$, which naturally yields two-sided tail bounds.

Lemma 1.2. Let $Y$ be a random variable and let $Y^1, Y^2, \dots, Y^m$ be independent copies of $Y$. Then
$$\mathbb{P}\left[Y \ge 2\,\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right]\right] \le \frac{\ln(2)}{m}.$$

Proof. Let $y = 2\,\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ and $\delta = \mathbb{P}[Y \ge y]$. By Markov's inequality,
$$\mathbb{P}\left[\max\{0, Y^1, Y^2, \dots, Y^m\} \ge y\right] \le \frac{1}{2}.$$
However, if $\delta > \ln(2)/m$, then
$$\mathbb{P}\left[\max\{0, Y^1, Y^2, \dots, Y^m\} \ge y\right] = 1 - \mathbb{P}\left[\forall j \in [m]\ Y^j < y\right] = 1 - \mathbb{P}[Y < y]^m = 1 - (1-\delta)^m > 1 - e^{-\delta m} > 1 - e^{-\ln(2)} = 1/2,$$
which is a contradiction. Thus $\delta \le \ln(2)/m$, as required. ∎
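Lemma 1.2 is easy to check empirically. The following is a small Monte Carlo sketch (our illustration, not part of the paper's argument; the choice of Bernoulli(1/2) variables, the parameters $n$, $m$, the seed, and the trial count are all arbitrary):

```python
# Monte Carlo illustration of Lemma 1.2 (a sketch, not part of the proof).
# Y = sum_i X_i - mu_i with X_i ~ Bernoulli(1/2), so Y = Binomial(n, 1/2) - n/2.
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 200, 50, 10_000

def sample_Y(size):
    # Each draw of Y is a centered sum of n independent Bernoulli(1/2)'s.
    return rng.binomial(n, 0.5, size=size) - n / 2

# Estimate the proxy E[max{0, Y^1, ..., Y^m}] from `trials` batches of m copies.
batches = sample_Y((trials, m))
proxy = np.maximum(batches.max(axis=1), 0.0).mean()

# Lemma 1.2: the tail at the threshold y = 2*proxy is at most ln(2)/m.
tail = (sample_Y(trials) >= 2 * proxy).mean()
print(f"proxy ~ {proxy:.2f}, empirical P[Y >= 2*proxy] ~ {tail:.5f}, "
      f"ln(2)/m = {np.log(2) / m:.5f}")
```

In this regime the empirical tail sits far below $\ln(2)/m$; the lemma uses only Markov's inequality, so it is loose but robust.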
Thus, the key technical step in our proof is to establish the following proxy bound.

Proposition 1.3. Let $X_1,\dots,X_n$ be independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$. Define $Y = \sum_{i=1}^n X_i - \mu_i$. Then for every $m \in \mathbb{N}$, if $Y^1,\dots,Y^m$ are independent copies of $Y$,
$$\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le O\left(\sqrt{n \cdot \ln m}\right).$$

Combining Proposition 1.3 with Lemma 1.2 and setting $m = e^{\Theta(\varepsilon^2 n)}$ yields Theorem 1.1. We remark that Proposition 1.3 and Theorem 1.1 are equivalent, because the proxy bound can be derived by integrating the tail bound of Theorem 1.1 combined with a union bound.

Our proof of Proposition 1.3 is essentially a special case of the work of [BNS+16] on algorithmic stability, which proves concentration inequalities in a more general "adaptive" setting.

Bounding the Proxy via Stability

Define $Y^{m+1} = 0$ so that
$$\max\left\{0, Y^1, \dots, Y^m\right\} = \max\left\{Y^1, \dots, Y^m, Y^{m+1}\right\} = Y^{\operatorname{argmax}_{j \in [m+1]} Y^j}.$$
The function $f(Y^1,\dots,Y^{m+1}) = \operatorname{argmax}_{j \in [m+1]} Y^j$ is extremely unstable in the sense that a small change in the inputs $Y^1,\dots,Y^{m+1}$ could change the output arbitrarily. In contrast, a constant function is perfectly stable, since it doesn't depend on its input at all. Our proof relies on the existence of an intermediate function that approximates the argmax function but provides some stability. We introduce a parameter $\eta > 0$ that parameterizes the degree of stability, define a procedure $S_\eta$, and analyze the quantity $\mathbb{E}\left[Y^{S_\eta(Y^1,\dots,Y^{m+1})}\right]$.

Smaller values of $\eta$ imply a higher degree of stability. Specifically, as $\eta \to 0$, $S_\eta$ becomes independent of its input, so $\mathbb{E}\left[Y^{S_\eta(Y^1,\dots,Y^{m+1})}\right] \to 0$, since $\mathbb{E}[Y] = 0$. On the other hand, as $\eta \to \infty$, the approximation becomes arbitrarily accurate and we have
$$\mathbb{E}\left[Y^{S_\eta(Y^1,\dots,Y^{m+1})}\right] \to \mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right].$$
All that remains is to quantify the tradeoff between stability and accuracy, and find a suitable value of $\eta > 0$ that allows us to relate $\mathbb{E}[Y]$ to $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$. The full proof is contained in Section 2.

1.2 Comparison to Previous Proofs

For context, we give some comparison between our approach and the previous proofs of concentration inequalities. Previous proofs use different proxies for the tail $\mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right]$. The most common proxy is the moment generating function $f(\lambda) := \mathbb{E}\left[e^{\lambda \sum_{i=1}^n X_i - \mu_i}\right]$. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$, then
$$\forall \lambda \in \mathbb{R} \qquad \mathbb{E}\left[e^{\lambda \sum_{i=1}^n X_i - \mu_i}\right] \le e^{O(\lambda^2 n)}. \tag{1}$$
By Markov's inequality,² (1) implies the tail bound
$$\forall \lambda > 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right] \le \frac{\mathbb{E}\left[e^{\lambda\left(\sum_{i=1}^n X_i - \mu_i\right)}\right]}{e^{\lambda \varepsilon n}} \le e^{O(\lambda^2 n) - \lambda \varepsilon n}. \tag{2}$$
Setting $\lambda = \Theta(\varepsilon)$ appropriately in (2) yields Theorem 1.1. The reverse is also true — integrating the tail bound of Theorem 1.1 yields the bound on the moment generating function (1). Thus Theorem 1.1 is equivalent to the proxy bound (1).³ The moment generating function is particularly easy to work with, as independence can be exploited to "factor" it: $\mathbb{E}\left[e^{\lambda \sum_{i=1}^n X_i - \mu_i}\right] = \prod_{i=1}^n \mathbb{E}\left[e^{\lambda (X_i - \mu_i)}\right]$. Thus proving (1) reduces to analysing a single $X_i - \mu_i$ random variable.

² Markov's inequality states that, for a non-negative random variable $X$ and a constant $x > 0$, $\mathbb{P}[X \ge x] \le \mathbb{E}[X]/x$.

³ More precisely, for any centered random variable $X$, the condition $\forall \lambda\ \mathbb{E}\left[e^{\lambda X}\right] \le e^{O(\lambda^2)}$ is equivalent to the condition $\forall x \ge 0\ \mathbb{P}[|X| \ge x] \le e^{-\Omega(x^2)}$ (as well as other equivalent characterizations). Such a random variable is called subgaussian [Riv12].
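For comparison with our proxy, the moment generating function route (1)–(2) can be carried out explicitly in the Bernoulli case. The sketch below is our own illustration: it assumes Hoeffding's standard single-variable estimate $\mathbb{E}\left[e^{\lambda(X_i - 1/2)}\right] = \cosh(\lambda/2) \le e^{\lambda^2/8}$ for $X_i \sim \text{Bernoulli}(1/2)$, and $\lambda = 4\varepsilon$ is the minimizer of the resulting exponent:

```python
# Sketch of the classical MGF/Chernoff route (1)-(2) for X_i ~ Bernoulli(1/2).
# By independence the MGF factors: E[exp(lam * sum_i (X_i - 1/2))]
#   = cosh(lam/2)^n <= exp(n * lam^2 / 8),
# so bound (2) reads exp(n*lam^2/8 - lam*eps*n), minimized at lam = 4*eps.
import numpy as np

def chernoff_bound(eps: float, n: int) -> float:
    lam = 4 * eps  # the optimal choice, lam = Theta(eps) as in the text
    return float(np.exp(n * (lam ** 2 / 8 - lam * eps)))  # = exp(-2*eps^2*n)

for eps in (0.05, 0.10, 0.20):
    print(f"eps={eps:.2f}: Chernoff bound = {chernoff_bound(eps, n=200):.3e}")
```

The resulting $e^{-2\varepsilon^2 n}$ is exactly Hoeffding's inequality for this case, a concrete instance of the $e^{-\Omega(\varepsilon^2 n)}$ of Theorem 1.1.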
The moments $m^{(k)} := \mathbb{E}\left[\left(\sum_{i=1}^n X_i - \mu_i\right)^k\right]$ are another common proxy. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$, then
$$\forall k \in \mathbb{N} \qquad \mathbb{E}\left[\left(\sum_{i=1}^n X_i - \mu_i\right)^{2k}\right] \le O(nk)^k. \tag{3}$$
Theorem 1.1 follows from (3) by applying Markov's inequality with an appropriate choice of $k = \Theta(\varepsilon^2 n)$. Again, the moment bound (3) can be obtained by integrating (a two-sided version of) the tail bound in Theorem 1.1, so moment bounds are equivalent to tail bounds. An immediate benefit of using moment bounds is that the independence assumption in Theorem 1.1 can be relaxed to limited independence [SSS95, BS15].

Kabanets and Impagliazzo [IK10] use as a proxy $\mathbb{E}\left[\prod_{i \in I} X_i\right]$, where $I \subseteq \{1,2,\dots,n\}$ is a random set of indices. Other proofs of Theorem 1.1 tend to resort to a direct analysis of the binomial distribution. We refer the reader to [Mul] for a compendium of proofs.

The Advantages of Different Proxies

The proxy $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ is a very natural quantity to bound. In many applications, tail bounds are combined with a union bound because the real quantity of interest is of the form $\max\{Y^1,\dots,Y^m\}$.⁴

⁴ Proposition 1.3 also holds if the random variables $Y^1,\dots,Y^m$ are not independent, like the union bound.

The advantage of our proxy $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ is that it is less sensitive to heavy tails than either moments or the moment generating function. Namely, suppose that, with probability $\delta$, the random variable $Y$ takes on a large value $\Delta$. Then $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ grows linearly with $\Delta \to \infty$ as $O(m\delta\Delta)$, whereas $\mathbb{E}\left[Y^{2k}\right]$ grows polynomially as $\delta\Delta^{2k}$ and $\mathbb{E}\left[e^{\lambda Y}\right]$ grows exponentially as $O\left(\delta e^{\lambda\Delta}\right)$. Note that $\mathbb{P}[Y \ge y]$ does not grow at all as $\Delta$ increases beyond $y$.

For Theorem 1.1 the tails are thin enough that all three proxies yield the same result. However, in the application of Bassily et al. [BNS+16] to adaptive data analysis, there is a $\delta$ tail event as described above. Thus, using the proxy $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ yields tighter bounds than what is achievable using moments. There may be other applications for which our proxy is more suitable than either moments or the moment generating function.

We also note that our proxy can be bounded in terms of the moment generating function: by Jensen's inequality and the bound $\max\left\{1, e^{\lambda Y^1}, \dots, e^{\lambda Y^m}\right\} \le 1 + e^{\lambda Y^1} + \dots + e^{\lambda Y^m}$, we can show that
$$\forall \lambda > 0 \qquad \mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le \frac{1}{\lambda} \log \mathbb{E}\left[e^{\lambda \cdot \max\{0, Y^1, \dots, Y^m\}}\right] \le \frac{1}{\lambda} \log\left(1 + m \cdot \mathbb{E}\left[e^{\lambda Y}\right]\right). \tag{4}$$
Thus Proposition 1.3 can be derived from the moment generating function bound (1) by setting $\lambda = \Theta\left(\sqrt{\ln(m+1)/n}\right)$ appropriately. Likewise, Proposition 1.3 can be derived from the moment bound (3) via
$$\forall k \in \mathbb{N} \qquad \mathbb{E}\left[\max\left\{|Y^1|, \dots, |Y^m|\right\}\right] \le \mathbb{E}\left[\max\left\{(Y^1)^{2k}, \dots, (Y^m)^{2k}\right\}\right]^{1/2k} \le \left(m \cdot \mathbb{E}\left[Y^{2k}\right]\right)^{1/2k}. \tag{5}$$
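To make the heavy-tail comparison above concrete, the following sketch (our illustration; $\delta$, $m$, $k$, $\lambda$, and the grid of $\Delta$ values are arbitrary choices) evaluates all three proxies exactly for the two-point variable $Y$ that equals $\Delta$ with probability $\delta$ and $0$ otherwise:

```python
# Sketch: how the three proxies react to a heavy tail, for the two-point
# variable Y = Delta with probability delta, and Y = 0 otherwise.
import numpy as np

delta, m, k, lam = 1e-4, 100, 5, 1.0

for Delta in (10, 50, 200):
    expected_max = Delta * (1 - (1 - delta) ** m)    # ~ m*delta*Delta: linear
    moment_2k = delta * Delta ** (2 * k)             # delta*Delta^(2k): polynomial
    mgf = (1 - delta) + delta * np.exp(lam * Delta)  # ~ delta*e^(lam*Delta): exponential
    print(f"Delta={Delta:4d}: E[max{{0,Y^1..Y^m}}] = {expected_max:.3e}, "
          f"E[Y^{2 * k}] = {moment_2k:.3e}, E[e^Y] = {mgf:.3e}")
```

As $\Delta$ grows, only the expected-maximum proxy stays on the linear scale described in the text; the moment and MGF proxies blow up polynomially and exponentially.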
2 Proof of the Proxy Bound

Let $X$ be a random $n \times m$ matrix with entries supported on $[0,1]$. Let $X_i^j$ denote the random variable corresponding to the entry in the $i$-th row and $j$-th column of $X$.⁵ As a convention, we will use lower-case letters $x$, $x_i$, and $x_i^j$ to denote realizations of the matrix, rows of the matrix, and entries of the matrix, respectively, and we will use upper-case letters to denote the corresponding random variables.

⁵ We choose the subscript-superscript notation $X_i^j$, rather than the more common double-subscript notation $X_{i,j}$, because the rows and columns of $X$ play markedly different roles that we want to emphasize.

Lemma 2.1 (Main Lemma). If $X$ is a random $n \times m$ matrix with entries supported on $[0,1]$ and independent rows, then
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max_{j \in [m]} \sum_{i=1}^n X_i^j\right] \le e^\eta \max_{j \in [m]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right] + \frac{2\ln(m)}{\eta}.$$

Before delving into the proof, some remarks are in order. Lemma 2.1 states that, under certain independence and boundedness conditions, we can "switch" the expectation and the maximum of a collection of random sums at the price of a multiplicative $e^\eta$ factor and an additive $2\ln(m)/\eta$ factor, for any choice of $\eta > 0$.

In our application, we will start with some independent bounded random variables $(X_1,\dots,X_n)$. Each column $j$ of $X$ will be an independent copy of these random variables, $(X_1^j,\dots,X_n^j)$. In this case the columns of $X$ are also independent. The lemma then says that the expected maximum of the column sums is not too much larger than the expectation of a single column sum, which implies our tail bounds.

Proof of Lemma 2.1. The proof relies on the existence of a randomized stable selection procedure $S_\eta : [0,1]^{n \times m} \to [m]$ defined by
$$\mathbb{P}\left[S_\eta(x) = j\right] = \frac{1}{C_\eta(x)} \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right), \quad \text{where} \quad C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right).$$
The stability parameter $\eta > 0$ can be chosen arbitrarily. The key to the proof is that $S_\eta$ balances two goals:

• (Stability) the distribution of $S_\eta(x)$ is not very sensitive to changing a single row of $x$ (i.e. the probability mass function can only change by a small multiplicative factor), and

• (Accuracy) in expectation, the chosen column $j = S_\eta(x)$ is close to the largest column of $x$ (i.e. $S_\eta(x) \approx \operatorname{argmax}_{j \in [m]} \sum_{i=1}^n x_i^j$).

For $x \in [0,1]^{n \times m}$, $\tilde{x} \in [0,1]^m$, and $i \in [n]$, define $(x_{-i}, \tilde{x}) \in [0,1]^{n \times m}$ to be $x$ with the $i$-th row replaced by $\tilde{x}$. That is,
$$(x_{-i}, \tilde{x})_{i'}^{j'} = \begin{cases} x_{i'}^{j'} & \text{if } i' \ne i \\ \tilde{x}^{j'} & \text{if } i' = i \end{cases}$$
for $i' \in [n]$ and $j' \in [m]$.

The next two claims establish the stability and accuracy properties of $S_\eta$. These two claims are special cases of well-known results in the differential privacy literature.

Claim 2.2 (Stability). For every $x \in [0,1]^{n \times m}$, $\tilde{x} \in [0,1]^m$, $i \in [n]$, and $j \in [m]$,
$$e^{-\eta}\, \mathbb{P}\left[S_\eta(x) = j\right] \le \mathbb{P}\left[S_\eta(x_{-i}, \tilde{x}) = j\right] \le e^{\eta}\, \mathbb{P}\left[S_\eta(x) = j\right].$$

Proof of Claim 2.2. Since
$$\left|\sum_{i'=1}^n x_{i'}^j - \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right| = \left|x_i^j - \tilde{x}^j\right| \le 1$$
for every $j \in [m]$, we have
$$e^{-\eta/2} \le \frac{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n x_{i'}^j\right)}{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right)} = \exp\left(\frac{\eta}{2} \cdot \left(\sum_{i'=1}^n x_{i'}^j - \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right)\right) = \exp\left(\frac{\eta}{2} \cdot \left(x_i^j - \tilde{x}^j\right)\right) \le e^{\eta/2}.$$
Consequently,
$$C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n x_{i'}^j\right) \le \sum_{j=1}^m e^{\eta/2} \cdot \exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right) = e^{\eta/2} \cdot C_\eta(x_{-i}, \tilde{x})$$
and, likewise, $C_\eta(x_{-i}, \tilde{x}) \le e^{\eta/2} \cdot C_\eta(x)$. The definition of $S_\eta$ implies
$$e^{-\eta/2} \cdot e^{-\eta/2} \le \frac{\mathbb{P}\left[S_\eta(x_{-i}, \tilde{x}) = j\right]}{\mathbb{P}\left[S_\eta(x) = j\right]} = \frac{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right)}{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n x_{i'}^j\right)} \cdot \frac{C_\eta(x)}{C_\eta(x_{-i}, \tilde{x})} \le e^{\eta/2} \cdot e^{\eta/2},$$
as required. ∎
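The procedure $S_\eta$ is an instance of what the differential privacy literature calls the exponential mechanism (our remark; the paper says only that the claims are special cases of known differential privacy results). Below is a minimal sketch of it (ours; the dimensions, seed, and $\eta$ are arbitrary) that computes the selection probabilities and numerically spot-checks the multiplicative stability guarantee of Claim 2.2 under a single-row replacement:

```python
# Sketch of the stable selection procedure S_eta, plus a spot-check of
# Claim 2.2: replacing one row changes each log-probability by at most eta.
import numpy as np

def selection_probs(x: np.ndarray, eta: float) -> np.ndarray:
    # P[S_eta(x) = j] is proportional to exp((eta/2) * sum_i x[i, j]).
    scores = (eta / 2) * x.sum(axis=0)
    scores -= scores.max()          # shift for numerical stability; cancels below
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(1)
n, m, eta = 50, 8, 0.5
x = rng.random((n, m))

x_neighbor = x.copy()
x_neighbor[0] = rng.random(m)       # replace a single row, as in Claim 2.2

p = selection_probs(x, eta)
q = selection_probs(x_neighbor, eta)
worst = np.max(np.abs(np.log(q / p)))
print(f"max |log-probability ratio| = {worst:.4f} (Claim 2.2 guarantees <= {eta})")
```

Sampling a column from these probabilities (e.g. `rng.choice(m, p=selection_probs(x, eta))`) realizes the randomized selection itself.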
Claim 2.3 (Accuracy). For every $x \in [0,1]^{n \times m}$,
$$\mathbb{E}_{S_\eta}\left[\sum_{i=1}^n x_i^{S_\eta(x)}\right] \ge \max_{j \in [m]} \sum_{i=1}^n x_i^j - \frac{2\ln(m)}{\eta}.$$

Observe that as $\eta \to \infty$, the accuracy becomes arbitrarily good, whereas the stability degrades.

Proof of Claim 2.3. Recall that, by construction, $S_\eta$ satisfies
$$\mathbb{P}\left[S_\eta(x) = j\right] = \frac{1}{C_\eta(x)} \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right), \quad \text{where} \quad C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right).$$
Rearranging, we have
$$\sum_{i=1}^n x_i^j = \frac{2}{\eta}\left(\ln C_\eta(x) + \ln \mathbb{P}\left[S_\eta(x) = j\right]\right).$$
Thus
$$\begin{aligned}
\mathbb{E}_{S_\eta}\left[\sum_{i=1}^n x_i^{S_\eta(x)}\right]
&= \sum_{j=1}^m \mathbb{P}\left[S_\eta(x) = j\right] \cdot \sum_{i=1}^n x_i^j
 = \sum_{j=1}^m \mathbb{P}\left[S_\eta(x) = j\right] \cdot \frac{2}{\eta}\left(\ln C_\eta(x) + \ln \mathbb{P}\left[S_\eta(x) = j\right]\right) \\
&= \frac{2}{\eta}\left(\ln C_\eta(x) + \sum_{j=1}^m \mathbb{P}\left[S_\eta(x) = j\right] \ln \mathbb{P}\left[S_\eta(x) = j\right]\right)
 = \frac{2}{\eta}\left(\ln C_\eta(x) - H\left[S_\eta(x)\right]\right),
\end{aligned}$$
where $H\left[S_\eta(x)\right]$ is the Shannon entropy of the random variable $S_\eta(x)$, measured in "nats" rather than bits, as we are using natural logarithms rather than binary logarithms. Since the random variable $S_\eta(x)$ is supported on a set of size $m$, the entropy satisfies $H\left[S_\eta(x)\right] \le \ln(m)$, as the entropy function is maximized by the uniform distribution [?, Theorem 2.6.4]. By definition we have
$$C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right) \ge \max_{j \in [m]} \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right) \quad \text{and hence} \quad \ln C_\eta(x) \ge \max_{j \in [m]} \frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j.$$
The claim now follows from these two inequalities:
$$\mathbb{E}_{S_\eta}\left[\sum_{i=1}^n x_i^{S_\eta(x)}\right] = \frac{2}{\eta}\left(\ln C_\eta(x) - H\left[S_\eta(x)\right]\right) \ge \frac{2}{\eta}\left(\max_{j \in [m]} \frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j - \ln(m)\right) = \max_{j \in [m]} \sum_{i=1}^n x_i^j - \frac{2\ln(m)}{\eta}. \qquad ∎$$

In order to complete the proof of the lemma, we have the following claim. This claim uses stability (Claim 2.2) to show that $\mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right]$ is not much larger than $\mathbb{E}_X\left[\sum_{i=1}^n X_i^j\right]$ for a fixed $j \in [m]$.

Claim 2.4. If $\mathbb{E}\left[\sum_{i=1}^n X_i^j\right] \le \mu$ for all $j \in [m]$, then
$$\mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right] \le e^\eta \mu.$$

Proof of Claim 2.4. Let $\tilde{X}$ be an independent and identical copy of $X$. Let $\tilde{X}_i \in [0,1]^m$ denote the $i$-th row of $\tilde{X}$. The key observation is that the pair $\left((X_{-i}, \tilde{X}_i),\, X_i^j\right)$ is identically distributed to the pair $\left(X,\, \tilde{X}_i^j\right)$ — this is where we use the independence assumption. Thus the pair $\left(S_\eta(X_{-i}, \tilde{X}_i),\, X_i^j\right)$ is identically distributed to the pair $\left(S_\eta(X),\, \tilde{X}_i^j\right)$, which means we can swap them below. By Claim 2.2,
$$\begin{aligned}
\mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right]
&= \mathbb{E}_X\left[\sum_{j=1}^m \sum_{i=1}^n \mathbb{P}_{S_\eta}\left[S_\eta(X) = j\right] X_i^j\right] \\
&\le \mathbb{E}_{X,\tilde{X}}\left[\sum_{j=1}^m \sum_{i=1}^n e^\eta\, \mathbb{P}_{S_\eta}\left[S_\eta(X_{-i}, \tilde{X}_i) = j\right] X_i^j\right] \\
&= \mathbb{E}_{X,\tilde{X}}\left[\sum_{j=1}^m \sum_{i=1}^n e^\eta\, \mathbb{P}_{S_\eta}\left[S_\eta(X) = j\right] \tilde{X}_i^j\right] \\
&\le \mathbb{E}_{X,\tilde{X}}\left[\sum_{j=1}^m e^\eta\, \mathbb{P}_{S_\eta}\left[S_\eta(X) = j\right] \mu\right] = e^\eta \mu. \qquad ∎
\end{aligned}$$

Combining Claim 2.3 with Claim 2.4 (setting $\mu = \max_{j \in [m]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right]$), we have
$$\mathbb{E}_X\left[\max_{j \in [m]} \sum_{i=1}^n X_i^j\right] - \frac{2\ln(m)}{\eta} \le \mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right] \le e^\eta \mu = e^\eta \max_{j \in [m]} \mathbb{E}_X\left[\sum_{i=1}^n X_i^j\right].$$
Rearranging yields the lemma. ∎
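Both sides of Lemma 2.1 are easy to estimate by simulation. A sketch (ours; the Bernoulli(1/2) entries, dimensions, $\eta$, seed, and trial count are arbitrary choices) for a matrix with i.i.d. entries, which in particular has independent rows:

```python
# Sketch: Monte Carlo comparison of the two sides of Lemma 2.1 for a random
# n x m matrix with i.i.d. Bernoulli(1/2) entries.
import numpy as np

rng = np.random.default_rng(2)
n, m, eta, trials = 100, 20, 0.5, 2_000

X = rng.binomial(1, 0.5, size=(trials, n, m))
col_sums = X.sum(axis=1)              # shape (trials, m): the m column sums

lhs = col_sums.max(axis=1).mean()     # estimates E[max_j sum_i X_i^j]
rhs = np.exp(eta) * col_sums.mean(axis=0).max() + 2 * np.log(m) / eta
print(f"E[max column sum] ~ {lhs:.2f} <= {rhs:.2f} "
      f"= e^eta * max_j E[column sum] + 2 ln(m)/eta")
```

Varying `eta` shows the tradeoff the proof quantifies: a small $\eta$ shrinks the multiplicative factor but inflates the additive $2\ln(m)/\eta$ term, and vice versa.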
Lemma 2.1 readily yields the bound claimed in the introduction.

Proposition 2.5 (Proposition 1.3). Let $X_1,\dots,X_n$ be independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$. Define $Y = \sum_{i=1}^n X_i - \mu_i$. Fix $m \in \mathbb{N}$ and let $Y^1,\dots,Y^m$ be independent copies of $Y$. Then
$$\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le 4\sqrt{n \cdot \ln(m+1)}.$$

Proof. Firstly, if $m \ge e^n - 1$, then the result holds trivially, as $\max\left\{0, Y^1, \dots, Y^m\right\} \le n$ with certainty. So we may assume $m < e^n - 1$.

Let $\mu = \sum_{i=1}^n \mu_i$. For each $i \in [n]$, let $X_i^1,\dots,X_i^m$ be independent copies of $X_i$, so that $Y^j = \sum_{i=1}^n X_i^j - \mu$ for all $j \in [m]$. Let $X_i^{m+1} = \mu_i$ be a constant "dummy random variable" for each $i$. Now we apply Lemma 2.1 to the random matrix $X \in [0,1]^{n \times (m+1)}$:
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max_{j \in [m+1]} \sum_{i=1}^n X_i^j\right] \le e^\eta \max_{j \in [m+1]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right] + \frac{2\ln(m+1)}{\eta}.$$
By construction, $\mathbb{E}\left[\sum_{i=1}^n X_i^j\right] = \sum_{i=1}^n \mu_i = \mu$ for all $j \in [m+1]$. Also $\sum_{i=1}^n X_i^j = Y^j + \mu$ for all $j \in [m]$ and $\sum_{i=1}^n X_i^{m+1} = 0 + \mu$. Substituting in these expressions yields
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max\left\{Y^1 + \mu, Y^2 + \mu, \dots, Y^m + \mu, 0 + \mu\right\}\right] \le e^\eta \mu + \frac{2\ln(m+1)}{\eta}.$$
Subtracting $\mu$ gives
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max\left\{Y^1, Y^2, \dots, Y^m, 0\right\}\right] \le \left(e^\eta - 1\right)\mu + \frac{2\ln(m+1)}{\eta}.$$
Finally, we use the (crude) bound $\mu \le n$ and the approximation $e^\eta - 1 \le 2\eta$ for $\eta < 1$ to obtain
$$\forall \eta \in (0,1) \qquad \mathbb{E}\left[\max\left\{Y^1, Y^2, \dots, Y^m, 0\right\}\right] \le 2\eta n + \frac{2\ln(m+1)}{\eta}.$$
Set $\eta = \sqrt{\ln(m+1)/n} < \sqrt{\ln(e^n)/n} = 1$ to complete the proof. ∎

Now we can prove the subgaussian tail bound using the proxy bound of Proposition 2.5.

Theorem 2.6 (Theorem 1.1). If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for every $i$, then
$$\forall \varepsilon \ge 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right] \le e^{1 - \varepsilon^2 n / 64}.$$

Needless to say, we have not optimized the constants in Theorem 2.6.

Proof. Fix $m \in \mathbb{N}$ to be determined later. Let $Y = \sum_{i=1}^n X_i - \mu_i$ and let $Y^1,\dots,Y^m$ be independent copies of $Y$. By Proposition 2.5, we have
$$\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le 4\sqrt{n \ln(m+1)}.$$
By Lemma 1.2, $\mathbb{P}\left[Y \ge 2\,\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right]\right] \le \ln(2)/m$. Now, set $m = \left\lfloor e^{\varepsilon^2 n / 64} - 1 \right\rfloor$, so that $8\sqrt{n \ln(m+1)} \le \varepsilon n$. Then
$$\mathbb{P}\left[Y \ge \varepsilon n\right] \le \mathbb{P}\left[Y \ge 8\sqrt{n \ln(m+1)}\right] \le \mathbb{P}\left[Y \ge 2\,\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right]\right] \le \frac{\ln(2)}{m} \le \frac{\ln(2)}{e^{\varepsilon^2 n / 64} - 2}.$$
Thus
$$\mathbb{P}\left[Y \ge \varepsilon n\right] \le \min\left\{1,\ \frac{\ln(2)}{e^{\varepsilon^2 n / 64} - 2}\right\} \le (2 + \ln 2) \cdot e^{-\varepsilon^2 n / 64} \le e^{1 - \varepsilon^2 n / 64}. \qquad ∎$$

3 Extensions

To demonstrate the versatility of our techniques, we show how they can be used to prove some well-known generalizations of Theorem 1.1.

3.1 Multiplicative Tail Bound

Theorem 3.1. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu = \mathbb{E}\left[\sum_{i=1}^n X_i\right]$, then
$$\forall \varepsilon \ge 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right] \le e \cdot \left(1 + \frac{\varepsilon}{4}\right)^{-\varepsilon\mu/8}.$$
Moreover, the above bound implies
$$\forall \varepsilon \in [0,10] \qquad \mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right] \le e^{1 - \varepsilon^2 \mu / 64}.$$

Proof. Let $\delta = \mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right]$ and $m = \lceil \ln(2)/\delta \rceil$. For each $i \in [n]$, let $X_i^1,\dots,X_i^m$ be independent copies of $X_i$ and let $X_i^{m+1} = \mathbb{E}[X_i]$ be a constant. Now we apply Lemma 2.1 to the random matrix $X \in [0,1]^{n \times (m+1)}$ to obtain
$$\mathbb{E}\left[\max_{j \in [m+1]} \sum_{i=1}^n X_i^j\right] \le e^\eta \mu + \frac{2\ln(m+1)}{\eta}. \tag{6}$$
On the other hand,
$$\begin{aligned}
\mathbb{E}\left[\max_{j \in [m+1]} \sum_{i=1}^n X_i^j\right]
&\ge \mu + \varepsilon\mu \cdot \mathbb{P}\left[\max_{j \in [m]} \sum_{i=1}^n X_i^j \ge (1+\varepsilon)\mu\right] \\
&= \mu + \varepsilon\mu\left(1 - (1-\delta)^m\right) \ge \mu + \varepsilon\mu\left(1 - e^{-\delta m}\right) \ge \mu\left(1 + \frac{\varepsilon}{2}\right).
\end{aligned}$$
Combining this with (6) and subtracting $\mu$ from both sides gives
$$\forall \eta > 0 \qquad \frac{\varepsilon}{2}\mu \le \left(e^\eta - 1\right)\mu + \frac{2\ln(m+1)}{\eta}.$$
Now we set $\eta = \ln(1 + \varepsilon/4)$ and rearrange to get $\ln(m+1) \ge \frac{\varepsilon\mu}{8} \ln\left(1 + \frac{\varepsilon}{4}\right)$. Since $m \le \ln(2)/\delta + 1$, we have
$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right] = \delta \le \min\left\{1,\ \frac{\ln 2}{m-1}\right\} \le \frac{2 + \ln 2}{m+1} \le e\left(1 + \frac{\varepsilon}{4}\right)^{-\varepsilon\mu/8}.$$
This gives the first half of the theorem. The second half follows from the fact that, if $0 \le \varepsilon \le 10$, then $1 + \varepsilon/4 \ge \exp(\varepsilon/8)$. ∎
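As with Theorem 2.6, the constants in Theorem 3.1 are not optimized; a quick simulation (ours; the Bernoulli(p) variables, parameters, and seed are arbitrary choices) shows the bound alongside the empirical multiplicative tail:

```python
# Sketch: empirical multiplicative tail versus the bound of Theorem 3.1,
# for a sum of n independent Bernoulli(p) random variables (so mu = n*p).
import numpy as np

rng = np.random.default_rng(3)
n, p, trials = 1_000, 0.05, 1_000_000
mu = n * p
S = rng.binomial(n, p, size=trials)   # each draw is one realization of sum_i X_i

for eps in (0.5, 1.0, 2.0):
    empirical = (S >= (1 + eps) * mu).mean()
    bound = np.e * (1 + eps / 4) ** (-eps * mu / 8)
    print(f"eps={eps}: empirical ~ {empirical:.2e}, Theorem 3.1 bound = {bound:.2e}")
```

The bound is valid but loose here, as expected from the crude constants; its strength is that, unlike Theorem 1.1, it scales with $\mu$ rather than $n$.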
We can also bound $\mathbb{P}\left[\sum_{i=1}^n X_i \le (1-\varepsilon)\mu\right]$. This requires changing Lemma 2.1 to the following.

Lemma 3.2. If $X$ is a random $n \times m$ matrix with entries supported on $[0,1]$ and independent rows, then
$$\forall \eta > 0 \qquad \mathbb{E}\left[\min_{j \in [m]} \sum_{i=1}^n X_i^j\right] \ge e^{-\eta} \min_{j \in [m]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right] - \frac{2\ln(m)}{\eta}.$$

The proof of Lemma 3.2 is almost identical to that of Lemma 2.1, except the maximum is replaced with the minimum, $\eta$ is replaced with $-\eta$, and some inequalities are reversed. Lemma 3.2 yields the following bound.

Theorem 3.3. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu = \mathbb{E}\left[\sum_{i=1}^n X_i\right]$, then
$$\forall \varepsilon \in [0,1] \qquad \mathbb{P}\left[\sum_{i=1}^n X_i \le (1-\varepsilon)\mu\right] \le e^{1 - \varepsilon^2 \mu / 32}.$$

3.2 McDiarmid's Inequality

Subgaussian tail bounds are not specific to summation. For independent random variables $X_1,\dots,X_n$, we can prove subgaussian tail bounds on $f(X_1,\dots,X_n)$ for any "low-sensitivity" function $f$. The sum — that is, $f(x_1,\dots,x_n) = \sum_{i=1}^n x_i$ — is only one example of a low-sensitivity function. The more general property that we require is below.

Definition 3.4. A function $f : \mathcal{X}^n \to \mathbb{R}$ is sensitivity-$\Delta$ if, for all $i \in [n]$ and all $x_1, x_2, \dots, x_n, x_i' \in \mathcal{X}$, we have
$$\left|f(x_1, x_2, \dots, x_n) - f(x_1, x_2, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n)\right| \le \Delta.$$

We can generalize Proposition 1.3 to low-sensitivity functions as follows.

Lemma 3.5. Let $X$ be a random $n \times m$ matrix with entries supported on $\mathcal{X}$ (not necessarily real numbers). Assume that the rows of $X$ are independent. Let $f : \mathcal{X}^n \to \mathbb{R}$ be sensitivity-$\Delta$. Then
$$\mathbb{E}\left[\max_{j \in [m]} \left(f\left(X_1^j, \dots, X_n^j\right) - \mathbb{E}\left[f\left(X_1^j, \dots, X_n^j\right)\right]\right)\right] \le 8\Delta\sqrt{n \ln m}.$$

The proof of Lemma 3.5 is similar to that of Theorem 7.2 of Bassily et al. [BNS+16]. Lemma 3.5 allows us to generalize Theorem 1.1 to McDiarmid's inequality:

Theorem 3.6 ([McD89]). Let $X_1,\dots,X_n$ be independent random variables and let $f : \mathcal{X}^n \to \mathbb{R}$ be sensitivity-$\Delta$. Then, for all $\varepsilon \ge 0$,
$$\mathbb{P}\left[f(X_1,\dots,X_n) - \mathbb{E}\left[f(X_1,\dots,X_n)\right] \ge \varepsilon n \Delta\right] \le e^{-\Omega(\varepsilon^2 n)}.$$
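As a concrete instance of Definition 3.4 (our example, not the paper's): the number of distinct values among $x_1,\dots,x_n$ changes by at most one when a single coordinate is replaced, so it is sensitivity-$1$ and Theorem 3.6 applies to it. A small sketch, with arbitrary parameters:

```python
# Sketch: a sensitivity-1 function in the sense of Definition 3.4 -- the
# number of distinct values -- and its empirical concentration (Theorem 3.6).
import numpy as np

rng = np.random.default_rng(4)
n, trials, eps = 200, 10_000, 0.1

def f(x: np.ndarray) -> int:
    # Replacing one coordinate removes at most one value and adds at most one,
    # so the distinct count changes by at most 1: sensitivity Delta = 1.
    return len(np.unique(x))

samples = np.array([f(rng.integers(0, n, size=n)) for _ in range(trials)])
tail = (samples - samples.mean() >= eps * n).mean()
print(f"mean ~ {samples.mean():.1f}; empirical P[f - E[f] >= eps*n*Delta] ~ {tail:.4f} "
      f"(Theorem 3.6 bound: exp(-Omega(eps^2 * n)))")
```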
