Subgaussian Tail Bounds via Stability Arguments

Thomas Steinke∗   Jonathan Ullman†

January 16, 2017

arXiv:1701.03493v1 [cs.DM] 12 Jan 2017

Abstract

Sums of independent, bounded random variables concentrate around their expectation approximately as well as a Gaussian of the same variance. Well-known results of this form include the Bernstein, Hoeffding, and Chernoff inequalities and many others. We present an alternative proof of these tail bounds based on what we call a stability argument, which avoids bounding the moment generating function or higher-order moments of the distribution. Our stability argument is inspired by recent work on the generalization properties of differential privacy and their connection to adaptive data analysis (Bassily et al., STOC 2016).

∗ IBM, [email protected]
† [email protected]

1 Introduction

Tail bounds, also known as large deviation inequalities or concentration of measure theorems, are an essential tool in mathematics, computer science, and statistics. The most common type of tail bound (in theoretical computer science) shows that the sum of independent, bounded random variables has subgaussian tails — the best-known example being the following special case of Bernstein's Inequality.

Theorem 1.1 ([Ber24]). If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for every $i$, then
$$\forall \varepsilon \ge 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right] \le e^{-\Omega(\varepsilon^2 n)}.$$

Theorem 1.1 has many generalizations and extensions: tight constants [Hoe63], random variables with differing ranges [Hoe63], multiplicative error bounds [Che52], variables with low variance [Ber24], variables with only limited independence [SSS95], martingales [Azu67], matrix-valued random variables [AW02], and many more.

We present an alternative proof of Theorem 1.1, which differs significantly from all other proofs the authors are aware of. Our proof is derived from recent work on algorithmic stability for adaptive data analysis [BNS+16]. At a high level, our proof shows that Theorem 1.1 follows from an appropriate bound on the expectation of the maximum of $m$ independent realizations of $\sum_{i=1}^n X_i - \mu_i$. The required bound on the expected maximum follows from the existence of a stable randomized algorithm for selecting an approximate maximizer from among the $m$ realizations. Here stability means that the output distribution of the algorithm is insensitive to small changes in these values, which is formalized by differential privacy [DMNS06].

The alternative proof was (in hindsight) a key ingredient in obtaining tight bounds on the generalization properties of differential privacy [BNS+16], which can be viewed as a strong extension of Theorem 1.1. Thus we believe our approach both deepens our understanding of concentration inequalities and may find further applications.

1.1 Our Approach

Since reasoning directly about the distribution of $\sum_{i=1}^n X_i - \mu_i$ is difficult, proofs of Theorem 1.1 typically work by bounding the expectation of some random variable that serves as a "proxy" for the tail probability $\mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right]$. Our proof uses a very different proxy than previous proofs. (See Section 1.2 for a comparison to other proofs.) Specifically, to bound $\mathbb{P}[Y \ge y]$ we bound the quantity $\mathbb{E}\left[\max\{0, Y^1, Y^2, \dots, Y^m\}\right]$, where $Y^1,\dots,Y^m$ are independent copies of the original random variable $Y$.¹ Bounding this proxy implies a tail bound via the following lemma.

¹ Our proof also works with the proxy $\mathbb{E}\left[\max\{|Y^1|, |Y^2|, \dots, |Y^m|\}\right]$, which naturally yields two-sided tail bounds.

Lemma 1.2. Let $Y$ be a random variable and let $Y^1, Y^2, \dots, Y^m$ be independent copies of $Y$. Then
$$\mathbb{P}\left[Y \ge 2\,\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right]\right] \le \frac{\ln(2)}{m}.$$

Proof. Let $y = 2\,\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ and $\delta = \mathbb{P}[Y \ge y]$. By Markov's inequality,
$$\mathbb{P}\left[\max\{0, Y^1, Y^2, \dots, Y^m\} \ge y\right] \le \frac{1}{2}.$$
However, if $\delta > \ln(2)/m$, then
$$\mathbb{P}\left[\max\{0, Y^1, Y^2, \dots, Y^m\} \ge y\right] = 1 - \mathbb{P}\left[\forall j \in [m]\ Y^j < y\right] = 1 - \mathbb{P}[Y < y]^m = 1 - (1-\delta)^m > 1 - e^{-\delta m} > 1 - e^{-\ln(2)} = 1/2,$$
which is a contradiction. Thus $\delta \le \ln(2)/m$, as required. ∎
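Lemma 1.2 is easy to check empirically. The following is a small Monte Carlo sketch (our illustration, not part of the paper's argument; the choice of Bernoulli(1/2) variables, the parameters $n$, $m$, the seed, and the trial count are all arbitrary):

```python
# Monte Carlo illustration of Lemma 1.2 (a sketch, not part of the proof).
# Y = sum_i X_i - mu_i with X_i ~ Bernoulli(1/2), so Y = Binomial(n, 1/2) - n/2.
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 200, 50, 10_000

def sample_Y(size):
    # Each draw of Y is a centered sum of n independent Bernoulli(1/2)'s.
    return rng.binomial(n, 0.5, size=size) - n / 2

# Estimate the proxy E[max{0, Y^1, ..., Y^m}] from `trials` batches of m copies.
batches = sample_Y((trials, m))
proxy = np.maximum(batches.max(axis=1), 0.0).mean()

# Lemma 1.2: the tail at the threshold y = 2*proxy is at most ln(2)/m.
tail = (sample_Y(trials) >= 2 * proxy).mean()
print(f"proxy ~ {proxy:.2f}, empirical P[Y >= 2*proxy] ~ {tail:.5f}, "
      f"ln(2)/m = {np.log(2) / m:.5f}")
```

In this regime the empirical tail sits far below $\ln(2)/m$; the lemma uses only Markov's inequality, so it is loose but robust.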
Thus, the key technical step in our proof is to establish the following proxy bound.

Proposition 1.3. Let $X_1,\dots,X_n$ be independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$. Define $Y = \sum_{i=1}^n X_i - \mu_i$. Then for every $m \in \mathbb{N}$, if $Y^1,\dots,Y^m$ are independent copies of $Y$,
$$\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le O\left(\sqrt{n \cdot \ln m}\right).$$

Combining Proposition 1.3 with Lemma 1.2 and setting $m = e^{\Theta(\varepsilon^2 n)}$ yields Theorem 1.1. We remark that Proposition 1.3 and Theorem 1.1 are equivalent, because the proxy bound can be derived by integrating the tail bound of Theorem 1.1 combined with a union bound.

Our proof of Proposition 1.3 is essentially a special case of the work of [BNS+16] on algorithmic stability, which proves concentration inequalities in a more general "adaptive" setting.

Bounding the Proxy via Stability

Define $Y^{m+1} = 0$ so that
$$\max\left\{0, Y^1, \dots, Y^m\right\} = \max\left\{Y^1, \dots, Y^m, Y^{m+1}\right\} = Y^{\operatorname{argmax}_{j \in [m+1]} Y^j}.$$
The function $f(Y^1,\dots,Y^{m+1}) = \operatorname{argmax}_{j \in [m+1]} Y^j$ is extremely unstable in the sense that a small change in the inputs $Y^1,\dots,Y^{m+1}$ could change the output arbitrarily. In contrast, a constant function is perfectly stable, since it doesn't depend on its input at all. Our proof relies on the existence of an intermediate function that approximates the argmax function but provides some stability. We introduce a parameter $\eta > 0$ that parameterizes the degree of stability, define a procedure $S_\eta$, and analyze the quantity $\mathbb{E}\left[Y^{S_\eta(Y^1,\dots,Y^{m+1})}\right]$.

Smaller values of $\eta$ imply a higher degree of stability. Specifically, as $\eta \to 0$, $S_\eta$ becomes independent of its input, so $\mathbb{E}\left[Y^{S_\eta(Y^1,\dots,Y^{m+1})}\right] \to 0$, since $\mathbb{E}[Y] = 0$. On the other hand, as $\eta \to \infty$, the approximation becomes arbitrarily accurate and we have
$$\mathbb{E}\left[Y^{S_\eta(Y^1,\dots,Y^{m+1})}\right] \to \mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right].$$
All that remains is to quantify the tradeoff between stability and accuracy, and find a suitable value of $\eta > 0$ that allows us to relate $\mathbb{E}[Y]$ to $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$. The full proof is contained in Section 2.

1.2 Comparison to Previous Proofs

For context, we give some comparison between our approach and the previous proofs of concentration inequalities. Previous proofs use different proxies for the tail $\mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right]$. The most common proxy is the moment generating function $f(\lambda) := \mathbb{E}\left[e^{\lambda \sum_{i=1}^n X_i - \mu_i}\right]$. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$, then
$$\forall \lambda \in \mathbb{R} \qquad \mathbb{E}\left[e^{\lambda \sum_{i=1}^n X_i - \mu_i}\right] \le e^{O(\lambda^2 n)}. \tag{1}$$
By Markov's inequality,² (1) implies the tail bound
$$\forall \lambda > 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right] \le \frac{\mathbb{E}\left[e^{\lambda\left(\sum_{i=1}^n X_i - \mu_i\right)}\right]}{e^{\lambda \varepsilon n}} \le e^{O(\lambda^2 n) - \lambda \varepsilon n}. \tag{2}$$
Setting $\lambda = \Theta(\varepsilon)$ appropriately in (2) yields Theorem 1.1. The reverse is also true — integrating the tail bound of Theorem 1.1 yields the bound on the moment generating function (1). Thus Theorem 1.1 is equivalent to the proxy bound (1).³ The moment generating function is particularly easy to work with, as independence can be exploited to "factor" it: $\mathbb{E}\left[e^{\lambda \sum_{i=1}^n X_i - \mu_i}\right] = \prod_{i=1}^n \mathbb{E}\left[e^{\lambda (X_i - \mu_i)}\right]$. Thus proving (1) reduces to analysing a single $X_i - \mu_i$ random variable.

² Markov's inequality states that, for a non-negative random variable $X$ and a constant $x > 0$, $\mathbb{P}[X \ge x] \le \mathbb{E}[X]/x$.

³ More precisely, for any centered random variable $X$, the condition $\forall \lambda\ \mathbb{E}\left[e^{\lambda X}\right] \le e^{O(\lambda^2)}$ is equivalent to the condition $\forall x \ge 0\ \mathbb{P}[|X| \ge x] \le e^{-\Omega(x^2)}$ (as well as other equivalent characterizations). Such a random variable is called subgaussian [Riv12].
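For comparison with our proxy, the moment generating function route (1)–(2) can be carried out explicitly in the Bernoulli case. The sketch below is our own illustration: it assumes Hoeffding's standard single-variable estimate $\mathbb{E}\left[e^{\lambda(X_i - 1/2)}\right] = \cosh(\lambda/2) \le e^{\lambda^2/8}$ for $X_i \sim \text{Bernoulli}(1/2)$, and $\lambda = 4\varepsilon$ is the minimizer of the resulting exponent:

```python
# Sketch of the classical MGF/Chernoff route (1)-(2) for X_i ~ Bernoulli(1/2).
# By independence the MGF factors: E[exp(lam * sum_i (X_i - 1/2))]
#   = cosh(lam/2)^n <= exp(n * lam^2 / 8),
# so bound (2) reads exp(n*lam^2/8 - lam*eps*n), minimized at lam = 4*eps.
import numpy as np

def chernoff_bound(eps: float, n: int) -> float:
    lam = 4 * eps  # the optimal choice, lam = Theta(eps) as in the text
    return float(np.exp(n * (lam ** 2 / 8 - lam * eps)))  # = exp(-2*eps^2*n)

for eps in (0.05, 0.10, 0.20):
    print(f"eps={eps:.2f}: Chernoff bound = {chernoff_bound(eps, n=200):.3e}")
```

The resulting $e^{-2\varepsilon^2 n}$ is exactly Hoeffding's inequality for this case, a concrete instance of the $e^{-\Omega(\varepsilon^2 n)}$ of Theorem 1.1.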
The moments $m^{(k)} := \mathbb{E}\left[\left(\sum_{i=1}^n X_i - \mu_i\right)^k\right]$ are another common proxy. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$, then
$$\forall k \in \mathbb{N} \qquad \mathbb{E}\left[\left(\sum_{i=1}^n X_i - \mu_i\right)^{2k}\right] \le O(nk)^k. \tag{3}$$
Theorem 1.1 follows from (3) by applying Markov's inequality with an appropriate choice of $k = \Theta(\varepsilon^2 n)$. Again, the moment bound (3) can be obtained by integrating (a two-sided version of) the tail bound in Theorem 1.1, so moment bounds are equivalent to tail bounds. An immediate benefit of using moment bounds is that the independence assumption in Theorem 1.1 can be relaxed to limited independence [SSS95, BS15].

Kabanets and Impagliazzo [IK10] use as a proxy $\mathbb{E}\left[\prod_{i \in I} X_i\right]$, where $I \subseteq \{1,2,\dots,n\}$ is a random set of indices. Other proofs of Theorem 1.1 tend to resort to a direct analysis of the binomial distribution. We refer the reader to [Mul] for a compendium of proofs.

The Advantages of Different Proxies

The proxy $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ is a very natural quantity to bound. In many applications, tail bounds are combined with a union bound because the real quantity of interest is of the form $\max\{Y^1,\dots,Y^m\}$.⁴

⁴ Proposition 1.3 also holds if the random variables $Y^1,\dots,Y^m$ are not independent, like the union bound.

The advantage of our proxy $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ is that it is less sensitive to heavy tails than either moments or the moment generating function. Namely, suppose that, with probability $\delta$, the random variable $Y$ takes on a large value $\Delta$. Then $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ grows linearly with $\Delta \to \infty$ as $O(m\delta\Delta)$, whereas $\mathbb{E}\left[Y^{2k}\right]$ grows polynomially as $\delta\Delta^{2k}$ and $\mathbb{E}\left[e^{\lambda Y}\right]$ grows exponentially as $O\left(\delta e^{\lambda\Delta}\right)$. Note that $\mathbb{P}[Y \ge y]$ does not grow at all as $\Delta$ increases beyond $y$.

For Theorem 1.1 the tails are thin enough that all three proxies yield the same result. However, in the application of Bassily et al. [BNS+16] to adaptive data analysis, there is a $\delta$ tail event as described above. Thus, using the proxy $\mathbb{E}\left[\max\{0, Y^1, \dots, Y^m\}\right]$ yields tighter bounds than what is achievable using moments. There may be other applications for which our proxy is more suitable than either moments or the moment generating function.

We also note that our proxy can be bounded in terms of the moment generating function: by Jensen's inequality and the bound $\max\left\{1, e^{\lambda Y^1}, \dots, e^{\lambda Y^m}\right\} \le 1 + e^{\lambda Y^1} + \dots + e^{\lambda Y^m}$, we can show that
$$\forall \lambda > 0 \qquad \mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le \frac{1}{\lambda} \log \mathbb{E}\left[e^{\lambda \cdot \max\{0, Y^1, \dots, Y^m\}}\right] \le \frac{1}{\lambda} \log\left(1 + m \cdot \mathbb{E}\left[e^{\lambda Y}\right]\right). \tag{4}$$
Thus Proposition 1.3 can be derived from the moment generating function bound (1) by setting $\lambda = \Theta\left(\sqrt{\ln(m+1)/n}\right)$ appropriately. Likewise, Proposition 1.3 can be derived from the moment bound (3) via
$$\forall k \in \mathbb{N} \qquad \mathbb{E}\left[\max\left\{|Y^1|, \dots, |Y^m|\right\}\right] \le \mathbb{E}\left[\max\left\{(Y^1)^{2k}, \dots, (Y^m)^{2k}\right\}\right]^{1/2k} \le \left(m \cdot \mathbb{E}\left[Y^{2k}\right]\right)^{1/2k}. \tag{5}$$
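To make the heavy-tail comparison above concrete, the following sketch (our illustration; $\delta$, $m$, $k$, $\lambda$, and the grid of $\Delta$ values are arbitrary choices) evaluates all three proxies exactly for the two-point variable $Y$ that equals $\Delta$ with probability $\delta$ and $0$ otherwise:

```python
# Sketch: how the three proxies react to a heavy tail, for the two-point
# variable Y = Delta with probability delta, and Y = 0 otherwise.
import numpy as np

delta, m, k, lam = 1e-4, 100, 5, 1.0

for Delta in (10, 50, 200):
    expected_max = Delta * (1 - (1 - delta) ** m)    # ~ m*delta*Delta: linear
    moment_2k = delta * Delta ** (2 * k)             # delta*Delta^(2k): polynomial
    mgf = (1 - delta) + delta * np.exp(lam * Delta)  # ~ delta*e^(lam*Delta): exponential
    print(f"Delta={Delta:4d}: E[max{{0,Y^1..Y^m}}] = {expected_max:.3e}, "
          f"E[Y^{2 * k}] = {moment_2k:.3e}, E[e^Y] = {mgf:.3e}")
```

As $\Delta$ grows, only the expected-maximum proxy stays on the linear scale described in the text; the moment and MGF proxies blow up polynomially and exponentially.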
2 Proof of the Proxy Bound

Let $X$ be a random $n \times m$ matrix with entries supported on $[0,1]$. Let $X_i^j$ denote the random variable corresponding to the entry in the $i$-th row and $j$-th column of $X$.⁵ As a convention, we will use lower-case letters $x$, $x_i$, and $x_i^j$ to denote realizations of the matrix, rows of the matrix, and entries of the matrix, respectively, and we will use upper-case letters to denote the corresponding random variables.

⁵ We choose the subscript-superscript notation $X_i^j$, rather than the more common double-subscript notation $X_{i,j}$, because the rows and columns of $X$ play markedly different roles that we want to emphasize.

Lemma 2.1 (Main Lemma). If $X$ is a random $n \times m$ matrix with entries supported on $[0,1]$ and independent rows, then
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max_{j \in [m]} \sum_{i=1}^n X_i^j\right] \le e^\eta \max_{j \in [m]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right] + \frac{2\ln(m)}{\eta}.$$

Before delving into the proof, some remarks are in order. Lemma 2.1 states that, under certain independence and boundedness conditions, we can "switch" the expectation and the maximum of a collection of random sums at the price of a multiplicative $e^\eta$ factor and an additive $2\ln(m)/\eta$ factor, for any choice of $\eta > 0$.

In our application, we will start with some independent bounded random variables $(X_1,\dots,X_n)$. Each column $j$ of $X$ will be an independent copy of these random variables, $(X_1^j,\dots,X_n^j)$. In this case the columns of $X$ are also independent. The lemma then says that the expected maximum of the column sums is not too much larger than the expectation of a single column sum, which implies our tail bounds.

Proof of Lemma 2.1. The proof relies on the existence of a randomized stable selection procedure $S_\eta : [0,1]^{n \times m} \to [m]$ defined by
$$\mathbb{P}\left[S_\eta(x) = j\right] = \frac{1}{C_\eta(x)} \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right), \quad \text{where} \quad C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right).$$
The stability parameter $\eta > 0$ can be chosen arbitrarily. The key to the proof is that $S_\eta$ balances two goals:

• (Stability) the distribution of $S_\eta(x)$ is not very sensitive to changing a single row of $x$ (i.e. the probability mass function can only change by a small multiplicative factor), and

• (Accuracy) in expectation, the chosen column $j = S_\eta(x)$ is close to the largest column of $x$ (i.e. $S_\eta(x) \approx \operatorname{argmax}_{j \in [m]} \sum_{i=1}^n x_i^j$).

For $x \in [0,1]^{n \times m}$, $\tilde{x} \in [0,1]^m$, and $i \in [n]$, define $(x_{-i}, \tilde{x}) \in [0,1]^{n \times m}$ to be $x$ with the $i$-th row replaced by $\tilde{x}$. That is,
$$(x_{-i}, \tilde{x})_{i'}^{j'} = \begin{cases} x_{i'}^{j'} & \text{if } i' \ne i \\ \tilde{x}^{j'} & \text{if } i' = i \end{cases}$$
for $i' \in [n]$ and $j' \in [m]$.

The next two claims establish the stability and accuracy properties of $S_\eta$. These two claims are special cases of well-known results in the differential privacy literature.

Claim 2.2 (Stability). For every $x \in [0,1]^{n \times m}$, $\tilde{x} \in [0,1]^m$, $i \in [n]$, and $j \in [m]$,
$$e^{-\eta}\, \mathbb{P}\left[S_\eta(x) = j\right] \le \mathbb{P}\left[S_\eta(x_{-i}, \tilde{x}) = j\right] \le e^{\eta}\, \mathbb{P}\left[S_\eta(x) = j\right].$$

Proof of Claim 2.2. Since
$$\left|\sum_{i'=1}^n x_{i'}^j - \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right| = \left|x_i^j - \tilde{x}^j\right| \le 1$$
for every $j \in [m]$, we have
$$e^{-\eta/2} \le \frac{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n x_{i'}^j\right)}{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right)} = \exp\left(\frac{\eta}{2} \cdot \left(\sum_{i'=1}^n x_{i'}^j - \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right)\right) = \exp\left(\frac{\eta}{2} \cdot \left(x_i^j - \tilde{x}^j\right)\right) \le e^{\eta/2}.$$
Consequently,
$$C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n x_{i'}^j\right) \le \sum_{j=1}^m e^{\eta/2} \cdot \exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right) = e^{\eta/2} \cdot C_\eta(x_{-i}, \tilde{x})$$
and, likewise, $C_\eta(x_{-i}, \tilde{x}) \le e^{\eta/2} \cdot C_\eta(x)$. The definition of $S_\eta$ implies
$$e^{-\eta/2} \cdot e^{-\eta/2} \le \frac{\mathbb{P}\left[S_\eta(x_{-i}, \tilde{x}) = j\right]}{\mathbb{P}\left[S_\eta(x) = j\right]} = \frac{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n (x_{-i}, \tilde{x})_{i'}^j\right)}{\exp\left(\frac{\eta}{2} \cdot \sum_{i'=1}^n x_{i'}^j\right)} \cdot \frac{C_\eta(x)}{C_\eta(x_{-i}, \tilde{x})} \le e^{\eta/2} \cdot e^{\eta/2},$$
as required. ∎
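The procedure $S_\eta$ is an instance of what the differential privacy literature calls the exponential mechanism (our remark; the paper says only that the claims are special cases of known differential privacy results). Below is a minimal sketch of it (ours; the dimensions, seed, and $\eta$ are arbitrary) that computes the selection probabilities and numerically spot-checks the multiplicative stability guarantee of Claim 2.2 under a single-row replacement:

```python
# Sketch of the stable selection procedure S_eta, plus a spot-check of
# Claim 2.2: replacing one row changes each log-probability by at most eta.
import numpy as np

def selection_probs(x: np.ndarray, eta: float) -> np.ndarray:
    # P[S_eta(x) = j] is proportional to exp((eta/2) * sum_i x[i, j]).
    scores = (eta / 2) * x.sum(axis=0)
    scores -= scores.max()          # shift for numerical stability; cancels below
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(1)
n, m, eta = 50, 8, 0.5
x = rng.random((n, m))

x_neighbor = x.copy()
x_neighbor[0] = rng.random(m)       # replace a single row, as in Claim 2.2

p = selection_probs(x, eta)
q = selection_probs(x_neighbor, eta)
worst = np.max(np.abs(np.log(q / p)))
print(f"max |log-probability ratio| = {worst:.4f} (Claim 2.2 guarantees <= {eta})")
```

Sampling a column from these probabilities (e.g. `rng.choice(m, p=selection_probs(x, eta))`) realizes the randomized selection itself.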
Claim 2.3 (Accuracy). For every $x \in [0,1]^{n \times m}$,
$$\mathbb{E}_{S_\eta}\left[\sum_{i=1}^n x_i^{S_\eta(x)}\right] \ge \max_{j \in [m]} \sum_{i=1}^n x_i^j - \frac{2\ln(m)}{\eta}.$$

Observe that as $\eta \to \infty$, the accuracy becomes arbitrarily good, whereas the stability degrades.

Proof of Claim 2.3. Recall that, by construction, $S_\eta$ satisfies
$$\mathbb{P}\left[S_\eta(x) = j\right] = \frac{1}{C_\eta(x)} \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right), \quad \text{where} \quad C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right).$$
Rearranging, we have
$$\sum_{i=1}^n x_i^j = \frac{2}{\eta}\left(\ln C_\eta(x) + \ln \mathbb{P}\left[S_\eta(x) = j\right]\right).$$
Thus
$$\begin{aligned}
\mathbb{E}_{S_\eta}\left[\sum_{i=1}^n x_i^{S_\eta(x)}\right]
&= \sum_{j=1}^m \mathbb{P}\left[S_\eta(x) = j\right] \cdot \sum_{i=1}^n x_i^j
 = \sum_{j=1}^m \mathbb{P}\left[S_\eta(x) = j\right] \cdot \frac{2}{\eta}\left(\ln C_\eta(x) + \ln \mathbb{P}\left[S_\eta(x) = j\right]\right) \\
&= \frac{2}{\eta}\left(\ln C_\eta(x) + \sum_{j=1}^m \mathbb{P}\left[S_\eta(x) = j\right] \ln \mathbb{P}\left[S_\eta(x) = j\right]\right)
 = \frac{2}{\eta}\left(\ln C_\eta(x) - H\left[S_\eta(x)\right]\right),
\end{aligned}$$
where $H\left[S_\eta(x)\right]$ is the Shannon entropy of the random variable $S_\eta(x)$, measured in "nats" rather than bits, as we are using natural logarithms rather than binary logarithms. Since the random variable $S_\eta(x)$ is supported on a set of size $m$, the entropy satisfies $H\left[S_\eta(x)\right] \le \ln(m)$, as the entropy function is maximized by the uniform distribution [?, Theorem 2.6.4]. By definition we have
$$C_\eta(x) = \sum_{j=1}^m \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right) \ge \max_{j \in [m]} \exp\left(\frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j\right) \quad \text{and hence} \quad \ln C_\eta(x) \ge \max_{j \in [m]} \frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j.$$
The claim now follows from these two inequalities:
$$\mathbb{E}_{S_\eta}\left[\sum_{i=1}^n x_i^{S_\eta(x)}\right] = \frac{2}{\eta}\left(\ln C_\eta(x) - H\left[S_\eta(x)\right]\right) \ge \frac{2}{\eta}\left(\max_{j \in [m]} \frac{\eta}{2} \cdot \sum_{i=1}^n x_i^j - \ln(m)\right) = \max_{j \in [m]} \sum_{i=1}^n x_i^j - \frac{2\ln(m)}{\eta}. \qquad ∎$$

In order to complete the proof of the lemma, we have the following claim. This claim uses stability (Claim 2.2) to show that $\mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right]$ is not much larger than $\mathbb{E}_X\left[\sum_{i=1}^n X_i^j\right]$ for a fixed $j \in [m]$.

Claim 2.4. If $\mathbb{E}\left[\sum_{i=1}^n X_i^j\right] \le \mu$ for all $j \in [m]$, then
$$\mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right] \le e^\eta \mu.$$

Proof of Claim 2.4. Let $\tilde{X}$ be an independent and identical copy of $X$. Let $\tilde{X}_i \in [0,1]^m$ denote the $i$-th row of $\tilde{X}$. The key observation is that the pair $\left((X_{-i}, \tilde{X}_i),\, X_i^j\right)$ is identically distributed to the pair $\left(X,\, \tilde{X}_i^j\right)$ — this is where we use the independence assumption. Thus the pair $\left(S_\eta(X_{-i}, \tilde{X}_i),\, X_i^j\right)$ is identically distributed to the pair $\left(S_\eta(X),\, \tilde{X}_i^j\right)$, which means we can swap them below. By Claim 2.2,
$$\begin{aligned}
\mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right]
&= \mathbb{E}_X\left[\sum_{j=1}^m \sum_{i=1}^n \mathbb{P}_{S_\eta}\left[S_\eta(X) = j\right] X_i^j\right] \\
&\le \mathbb{E}_{X,\tilde{X}}\left[\sum_{j=1}^m \sum_{i=1}^n e^\eta\, \mathbb{P}_{S_\eta}\left[S_\eta(X_{-i}, \tilde{X}_i) = j\right] X_i^j\right] \\
&= \mathbb{E}_{X,\tilde{X}}\left[\sum_{j=1}^m \sum_{i=1}^n e^\eta\, \mathbb{P}_{S_\eta}\left[S_\eta(X) = j\right] \tilde{X}_i^j\right] \\
&\le \mathbb{E}_{X,\tilde{X}}\left[\sum_{j=1}^m e^\eta\, \mathbb{P}_{S_\eta}\left[S_\eta(X) = j\right] \mu\right] = e^\eta \mu. \qquad ∎
\end{aligned}$$

Combining Claim 2.3 with Claim 2.4 (setting $\mu = \max_{j \in [m]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right]$), we have
$$\mathbb{E}_X\left[\max_{j \in [m]} \sum_{i=1}^n X_i^j\right] - \frac{2\ln(m)}{\eta} \le \mathbb{E}_{X,S_\eta}\left[\sum_{i=1}^n X_i^{S_\eta(X)}\right] \le e^\eta \mu = e^\eta \max_{j \in [m]} \mathbb{E}_X\left[\sum_{i=1}^n X_i^j\right].$$
Rearranging yields the lemma. ∎
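Both sides of Lemma 2.1 are easy to estimate by simulation. A sketch (ours; the Bernoulli(1/2) entries, dimensions, $\eta$, seed, and trial count are arbitrary choices) for a matrix with i.i.d. entries, which in particular has independent rows:

```python
# Sketch: Monte Carlo comparison of the two sides of Lemma 2.1 for a random
# n x m matrix with i.i.d. Bernoulli(1/2) entries.
import numpy as np

rng = np.random.default_rng(2)
n, m, eta, trials = 100, 20, 0.5, 2_000

X = rng.binomial(1, 0.5, size=(trials, n, m))
col_sums = X.sum(axis=1)              # shape (trials, m): the m column sums

lhs = col_sums.max(axis=1).mean()     # estimates E[max_j sum_i X_i^j]
rhs = np.exp(eta) * col_sums.mean(axis=0).max() + 2 * np.log(m) / eta
print(f"E[max column sum] ~ {lhs:.2f} <= {rhs:.2f} "
      f"= e^eta * max_j E[column sum] + 2 ln(m)/eta")
```

Varying `eta` shows the tradeoff the proof quantifies: a small $\eta$ shrinks the multiplicative factor but inflates the additive $2\ln(m)/\eta$ term, and vice versa.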
Lemma 2.1 readily yields the bound claimed in the introduction.

Proposition 2.5 (Proposition 1.3). Let $X_1,\dots,X_n$ be independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for each $i$. Define $Y = \sum_{i=1}^n X_i - \mu_i$. Fix $m \in \mathbb{N}$ and let $Y^1,\dots,Y^m$ be independent copies of $Y$. Then
$$\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le 4\sqrt{n \cdot \ln(m+1)}.$$

Proof. Firstly, if $m \ge e^n - 1$, then the result holds trivially, as $\max\left\{0, Y^1, \dots, Y^m\right\} \le n$ with certainty. So we may assume $m < e^n - 1$.

Let $\mu = \sum_{i=1}^n \mu_i$. For each $i \in [n]$, let $X_i^1,\dots,X_i^m$ be independent copies of $X_i$, so that $Y^j = \sum_{i=1}^n X_i^j - \mu$ for all $j \in [m]$. Let $X_i^{m+1} = \mu_i$ be a constant "dummy random variable" for each $i$. Now we apply Lemma 2.1 to the random matrix $X \in [0,1]^{n \times (m+1)}$:
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max_{j \in [m+1]} \sum_{i=1}^n X_i^j\right] \le e^\eta \max_{j \in [m+1]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right] + \frac{2\ln(m+1)}{\eta}.$$
By construction, $\mathbb{E}\left[\sum_{i=1}^n X_i^j\right] = \sum_{i=1}^n \mu_i = \mu$ for all $j \in [m+1]$. Also $\sum_{i=1}^n X_i^j = Y^j + \mu$ for all $j \in [m]$ and $\sum_{i=1}^n X_i^{m+1} = 0 + \mu$. Substituting in these expressions yields
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max\left\{Y^1 + \mu, Y^2 + \mu, \dots, Y^m + \mu, 0 + \mu\right\}\right] \le e^\eta \mu + \frac{2\ln(m+1)}{\eta}.$$
Subtracting $\mu$ gives
$$\forall \eta > 0 \qquad \mathbb{E}\left[\max\left\{Y^1, Y^2, \dots, Y^m, 0\right\}\right] \le \left(e^\eta - 1\right)\mu + \frac{2\ln(m+1)}{\eta}.$$
Finally, we use the (crude) bound $\mu \le n$ and the approximation $e^\eta - 1 \le 2\eta$ for $\eta < 1$ to obtain
$$\forall \eta \in (0,1) \qquad \mathbb{E}\left[\max\left\{Y^1, Y^2, \dots, Y^m, 0\right\}\right] \le 2\eta n + \frac{2\ln(m+1)}{\eta}.$$
Set $\eta = \sqrt{\ln(m+1)/n} < \sqrt{\ln(e^n)/n} = 1$ to complete the proof. ∎

Now we can prove the subgaussian tail bound using the proxy bound of Proposition 2.5.

Theorem 2.6 (Theorem 1.1). If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu_i = \mathbb{E}[X_i]$ for every $i$, then
$$\forall \varepsilon \ge 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i - \mu_i \ge \varepsilon n\right] \le e^{1 - \varepsilon^2 n / 64}.$$

Needless to say, we have not optimized the constants in Theorem 2.6.

Proof. Fix $m \in \mathbb{N}$ to be determined later. Let $Y = \sum_{i=1}^n X_i - \mu_i$ and let $Y^1,\dots,Y^m$ be independent copies of $Y$. By Proposition 2.5, we have
$$\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right] \le 4\sqrt{n \ln(m+1)}.$$
By Lemma 1.2, $\mathbb{P}\left[Y \ge 2\,\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right]\right] \le \ln(2)/m$. Now, set $m = \left\lfloor e^{\varepsilon^2 n / 64} - 1 \right\rfloor$, so that $8\sqrt{n \ln(m+1)} \le \varepsilon n$. Then
$$\mathbb{P}\left[Y \ge \varepsilon n\right] \le \mathbb{P}\left[Y \ge 8\sqrt{n \ln(m+1)}\right] \le \mathbb{P}\left[Y \ge 2\,\mathbb{E}\left[\max\left\{0, Y^1, \dots, Y^m\right\}\right]\right] \le \frac{\ln(2)}{m} \le \frac{\ln(2)}{e^{\varepsilon^2 n / 64} - 2}.$$
Thus
$$\mathbb{P}\left[Y \ge \varepsilon n\right] \le \min\left\{1,\ \frac{\ln(2)}{e^{\varepsilon^2 n / 64} - 2}\right\} \le (2 + \ln 2) \cdot e^{-\varepsilon^2 n / 64} \le e^{1 - \varepsilon^2 n / 64}. \qquad ∎$$

3 Extensions

To demonstrate the versatility of our techniques, we show how they can be used to prove some well-known generalizations of Theorem 1.1.

3.1 Multiplicative Tail Bound

Theorem 3.1. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu = \mathbb{E}\left[\sum_{i=1}^n X_i\right]$, then
$$\forall \varepsilon \ge 0 \qquad \mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right] \le e \cdot \left(1 + \frac{\varepsilon}{4}\right)^{-\varepsilon\mu/8}.$$
Moreover, the above bound implies
$$\forall \varepsilon \in [0,10] \qquad \mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right] \le e^{1 - \varepsilon^2 \mu / 64}.$$

Proof. Let $\delta = \mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right]$ and $m = \lceil \ln(2)/\delta \rceil$. For each $i \in [n]$, let $X_i^1,\dots,X_i^m$ be independent copies of $X_i$ and let $X_i^{m+1} = \mathbb{E}[X_i]$ be a constant. Now we apply Lemma 2.1 to the random matrix $X \in [0,1]^{n \times (m+1)}$ to obtain
$$\mathbb{E}\left[\max_{j \in [m+1]} \sum_{i=1}^n X_i^j\right] \le e^\eta \mu + \frac{2\ln(m+1)}{\eta}. \tag{6}$$
On the other hand,
$$\begin{aligned}
\mathbb{E}\left[\max_{j \in [m+1]} \sum_{i=1}^n X_i^j\right]
&\ge \mu + \varepsilon\mu \cdot \mathbb{P}\left[\max_{j \in [m]} \sum_{i=1}^n X_i^j \ge (1+\varepsilon)\mu\right] \\
&= \mu + \varepsilon\mu\left(1 - (1-\delta)^m\right) \ge \mu + \varepsilon\mu\left(1 - e^{-\delta m}\right) \ge \mu\left(1 + \frac{\varepsilon}{2}\right).
\end{aligned}$$
Combining this with (6) and subtracting $\mu$ from both sides gives
$$\forall \eta > 0 \qquad \frac{\varepsilon}{2}\mu \le \left(e^\eta - 1\right)\mu + \frac{2\ln(m+1)}{\eta}.$$
Now we set $\eta = \ln(1 + \varepsilon/4)$ and rearrange to get $\ln(m+1) \ge \frac{\varepsilon\mu}{8} \ln\left(1 + \frac{\varepsilon}{4}\right)$. Since $m \le \ln(2)/\delta + 1$, we have
$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge (1+\varepsilon)\mu\right] = \delta \le \min\left\{1,\ \frac{\ln 2}{m-1}\right\} \le \frac{2 + \ln 2}{m+1} \le e\left(1 + \frac{\varepsilon}{4}\right)^{-\varepsilon\mu/8}.$$
This gives the first half of the theorem. The second half follows from the fact that, if $0 \le \varepsilon \le 10$, then $1 + \varepsilon/4 \ge \exp(\varepsilon/8)$. ∎
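As with Theorem 2.6, the constants in Theorem 3.1 are not optimized; a quick simulation (ours; the Bernoulli(p) variables, parameters, and seed are arbitrary choices) shows the bound alongside the empirical multiplicative tail:

```python
# Sketch: empirical multiplicative tail versus the bound of Theorem 3.1,
# for a sum of n independent Bernoulli(p) random variables (so mu = n*p).
import numpy as np

rng = np.random.default_rng(3)
n, p, trials = 1_000, 0.05, 1_000_000
mu = n * p
S = rng.binomial(n, p, size=trials)   # each draw is one realization of sum_i X_i

for eps in (0.5, 1.0, 2.0):
    empirical = (S >= (1 + eps) * mu).mean()
    bound = np.e * (1 + eps / 4) ** (-eps * mu / 8)
    print(f"eps={eps}: empirical ~ {empirical:.2e}, Theorem 3.1 bound = {bound:.2e}")
```

The bound is valid but loose here, as expected from the crude constants; its strength is that, unlike Theorem 1.1, it scales with $\mu$ rather than $n$.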
We can also bound $\mathbb{P}\left[\sum_{i=1}^n X_i \le (1-\varepsilon)\mu\right]$. This requires changing Lemma 2.1 to the following.

Lemma 3.2. If $X$ is a random $n \times m$ matrix with entries supported on $[0,1]$ and independent rows, then
$$\forall \eta > 0 \qquad \mathbb{E}\left[\min_{j \in [m]} \sum_{i=1}^n X_i^j\right] \ge e^{-\eta} \min_{j \in [m]} \mathbb{E}\left[\sum_{i=1}^n X_i^j\right] - \frac{2\ln(m)}{\eta}.$$

The proof of Lemma 3.2 is almost identical to that of Lemma 2.1, except the maximum is replaced with the minimum, $\eta$ is replaced with $-\eta$, and some inequalities are reversed. Lemma 3.2 yields the following bound.

Theorem 3.3. If $X_1,\dots,X_n$ are independent random variables supported on $[0,1]$ and $\mu = \mathbb{E}\left[\sum_{i=1}^n X_i\right]$, then
$$\forall \varepsilon \in [0,1] \qquad \mathbb{P}\left[\sum_{i=1}^n X_i \le (1-\varepsilon)\mu\right] \le e^{1 - \varepsilon^2 \mu / 32}.$$

3.2 McDiarmid's Inequality

Subgaussian tail bounds are not specific to summation. For independent random variables $X_1,\dots,X_n$, we can prove subgaussian tail bounds on $f(X_1,\dots,X_n)$ for any "low-sensitivity" function $f$. The sum — that is, $f(x_1,\dots,x_n) = \sum_{i=1}^n x_i$ — is only one example of a low-sensitivity function. The more general property that we require is below.

Definition 3.4. A function $f : \mathcal{X}^n \to \mathbb{R}$ is sensitivity-$\Delta$ if, for all $i \in [n]$ and all $x_1, x_2, \dots, x_n, x_i' \in \mathcal{X}$, we have
$$\left|f(x_1, x_2, \dots, x_n) - f(x_1, x_2, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n)\right| \le \Delta.$$

We can generalize Proposition 1.3 to low-sensitivity functions as follows.

Lemma 3.5. Let $X$ be a random $n \times m$ matrix with entries supported on $\mathcal{X}$ (not necessarily real numbers). Assume that the rows of $X$ are independent. Let $f : \mathcal{X}^n \to \mathbb{R}$ be sensitivity-$\Delta$. Then
$$\mathbb{E}\left[\max_{j \in [m]} \left(f\left(X_1^j, \dots, X_n^j\right) - \mathbb{E}\left[f\left(X_1^j, \dots, X_n^j\right)\right]\right)\right] \le 8\Delta\sqrt{n \ln m}.$$

The proof of Lemma 3.5 is similar to that of Theorem 7.2 of Bassily et al. [BNS+16]. Lemma 3.5 allows us to generalize Theorem 1.1 to McDiarmid's inequality:

Theorem 3.6 ([McD89]). Let $X_1,\dots,X_n$ be independent random variables and let $f : \mathcal{X}^n \to \mathbb{R}$ be sensitivity-$\Delta$. Then, for all $\varepsilon \ge 0$,
$$\mathbb{P}\left[f(X_1,\dots,X_n) - \mathbb{E}\left[f(X_1,\dots,X_n)\right] \ge \varepsilon n \Delta\right] \le e^{-\Omega(\varepsilon^2 n)}.$$
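As a concrete instance of Definition 3.4 (our example, not the paper's): the number of distinct values among $x_1,\dots,x_n$ changes by at most one when a single coordinate is replaced, so it is sensitivity-$1$ and Theorem 3.6 applies to it. A small sketch, with arbitrary parameters:

```python
# Sketch: a sensitivity-1 function in the sense of Definition 3.4 -- the
# number of distinct values -- and its empirical concentration (Theorem 3.6).
import numpy as np

rng = np.random.default_rng(4)
n, trials, eps = 200, 10_000, 0.1

def f(x: np.ndarray) -> int:
    # Replacing one coordinate removes at most one value and adds at most one,
    # so the distinct count changes by at most 1: sensitivity Delta = 1.
    return len(np.unique(x))

samples = np.array([f(rng.integers(0, n, size=n)) for _ in range(trials)])
tail = (samples - samples.mean() >= eps * n).mean()
print(f"mean ~ {samples.mean():.1f}; empirical P[f - E[f] >= eps*n*Delta] ~ {tail:.4f} "
      f"(Theorem 3.6 bound: exp(-Omega(eps^2 * n)))")
```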
