
Stochastic ADMM for Nonsmooth Optimization

Hua Ouyang (Computational Science and Engineering, Georgia Institute of Technology), Niao He (Industrial and Systems Engineering, Georgia Institute of Technology), Alexander Gray (Computational Science and Engineering, Georgia Institute of Technology)

arXiv:1211.0632v2 [cs.LG] 22 Jan 2013

Abstract

We present a stochastic setting for optimization problems with nonsmooth convex separable objective functions over linear equality constraints. To solve such problems, we propose a stochastic Alternating Direction Method of Multipliers (ADMM) algorithm. Our algorithm applies to a more general class of nonsmooth convex functions that need not admit a closed-form solution when the augmented function is minimized directly. We also establish rates of convergence for our algorithm under various structural assumptions on the stochastic functions: O(1/√t) for convex functions and O(log t/t) for strongly convex functions. Compared to previous literature, we establish the convergence rate of the ADMM algorithm, for the first time, in terms of both the objective value and the feasibility violation.

A short version of this paper appears in the 5th NIPS Workshop on Optimization for Machine Learning, Lake Tahoe, Nevada, USA, 2012.

1 Introduction

The Alternating Direction Method of Multipliers (ADMM) [1, 2] is a very simple computational method for constrained optimization proposed in the 1970s. The theoretical aspects of ADMM were studied from the 1980s to the 1990s, and its global convergence was established in the literature [3, 4, 5]. As reviewed in the comprehensive paper [6], with its capacity for dealing with the objective functions separately and synchronously, this method turned out to be a natural fit for the field of large-scale, data-distributed machine learning and big-data optimization, and it has therefore received a significant amount of attention in the last few years. Intensive theoretical and practical advances followed. On the theoretical side, ADMM has recently been shown to have a rate of convergence of O(1/N) [7, 8, 9, 10], where N stands for the number of iterations. On the practical side, ADMM has been applied to a wide range of application domains, such as compressed sensing [11], image restoration [12], video processing, and matrix completion [13]. Besides that, many variations of this classical method have recently been developed, such as linearized [13, 14, 15], accelerated [13], and online [10] ADMM. However, most of these variants, including the classic one, implicitly assume full access to true data values, while in reality one can hardly ignore the existence of noise. A more natural way of handling this issue is to consider unbiased or even biased observations of the true data, which leads us to the stochastic setting.

1.1 Stochastic Setting for ADMM

In this work, we study a family of convex optimization problems in which the objective functions are separable and stochastic. In particular, we are interested in solving the following linear equality-constrained stochastic optimization problem:

  min_{x ∈ X, y ∈ Y}  E_ξ[θ₁(x, ξ)] + θ₂(y)   s.t.  Ax + By = b,        (1)

where x ∈ R^{d₁}, y ∈ R^{d₂}, A ∈ R^{m×d₁}, B ∈ R^{m×d₂}, b ∈ R^m, X is a convex compact set, and Y is a closed convex set. We are able to draw a sequence of independent and identically distributed (i.i.d.) observations of the random vector ξ, which obeys a fixed but unknown distribution P. One can see that when ξ is deterministic, we recover the traditional problem setting of ADMM [6]. Denote the expectation function θ₁(x) ≡ E_ξ[θ₁(x, ξ)]. In our most general setting, the real-valued functions θ₁(·) and θ₂(·) are convex but not necessarily continuously differentiable.
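For concreteness, one instance of template (1) is an ℓ₁-regularized stochastic least-squares problem split into two blocks; this worked example is ours and is not taken from the paper:

\[
\min_{x \in \mathcal{X},\; y \in \mathbb{R}^{d}} \;\; \mathbb{E}_{(a,c)}\!\left[\tfrac{1}{2}\,(a^{\top}x - c)^{2}\right] + \gamma \lVert y \rVert_{1}
\quad \text{s.t.} \quad x - y = 0,
\]

which matches (1) with ξ = (a, c), θ₁(x, ξ) = ½(aᵀx − c)², θ₂(y) = γ‖y‖₁, A = I, B = −I, b = 0, X a Euclidean ball (compact), and Y = R^d.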
Note that our stochastic setting of the problem is quite different from that of the Online ADMM proposed in [10]. In Online ADMM, one does not assume ξ to be i.i.d., nor the objective to be stochastic; instead, a deterministic quantity referred to as the regret is of interest:

  R_{[1:t]}(x) ≡ Σ_{k=1}^{t} [θ₁(x_k, ξ_k) + θ₂(y_k)] − inf_{Ax+By=b} Σ_{k=1}^{t} [θ₁(x, ξ_k) + θ₂(y)].

1.2 Our Contributions

In this work, we propose a stochastic setting for the ADMM problem and design the Stochastic ADMM algorithm. A key algorithmic feature of our Stochastic ADMM that distinguishes it from previous ADMM variants is the first-order approximation of θ₁ that we use to modify the augmented Lagrangian. This simple modification not only guarantees the convergence of our stochastic method, but also extends it to a more general class of convex objective functions for which minimizing the augmented θ₁ directly might not yield a closed-form solution. For example, with Stochastic ADMM we can derive closed-form updates for the nonsmooth hinge loss function (used in support vector machines), whereas with deterministic ADMM one has to call an SVM solver in every iteration [6], which is very time-consuming. One of our main contributions is that we develop the convergence rates of our algorithm under various structural assumptions. For convex θ₁(·), the rate is proved to be O(1/√t); for strongly convex θ₁(·), the rate is proved to be O(log t/t). To the best of our knowledge, this is the first time that convergence rates of ADMM are established for both the objective value and the feasibility violation. By contrast, recent research [8, 10] only shows the convergence of ADMM indirectly, in terms of the satisfaction of variational inequalities.

1.3 Notations

Throughout this paper, we denote the subgradients of θ₁ and θ₂ by θ₁′ and θ₂′. When they are differentiable, we use ∇θ₁ and ∇θ₂ to denote the gradients. We use the notation θ₁ both for the instance function value θ₁(x, ξ) and for its expectation θ₁(x). We denote by θ(u) ≡ θ₁(x) + θ₂(y) the sum of the stochastic and the deterministic functions. For simplicity and clarity, we use the following notations for stacked vectors or tuples:

  u ≡ (x; y),   w ≡ (x; y; λ),   w_k ≡ (x_k; y_k; λ_k),   W ≡ X × Y × R^m,        (2)

  ū_k ≡ ( (1/k) Σ_{i=0}^{k−1} x_i ; (1/k) Σ_{i=1}^{k} y_i ),   w̄_k ≡ ( (1/k) Σ_{i=1}^{k} x_i ; (1/k) Σ_{i=1}^{k} y_i ; (1/k) Σ_{i=1}^{k} λ_i ),   F(w) ≡ ( −A^Tλ ; −B^Tλ ; Ax + By − b ).

For a positive semidefinite matrix G ∈ R^{d₁×d₁}, we define the G-norm of a vector x as ‖x‖_G := ‖G^{1/2}x‖₂ = √(xᵀGx). We use ⟨·,·⟩ to denote the inner product in a finite-dimensional Euclidean space. When there is no ambiguity, we often use ‖·‖ to denote the Euclidean norm ‖·‖₂. We assume that the optimal solution of (1) exists and denote it by u* ≡ (x*ᵀ, y*ᵀ)ᵀ. The following quantities appear frequently in our convergence analysis:

  δ_k ≡ θ₁′(x_{k−1}, ξ_k) − θ₁′(x_{k−1}),   D_X ≡ sup_{x_a, x_b ∈ X} ‖x_a − x_b‖,   D_{y*,B} ≡ ‖B(y₀ − y*)‖₂.        (3)

1.4 Assumptions

Before presenting the algorithm and the convergence results, we list the assumptions that will be used in our statements.

Assumption 1. For all x ∈ X, E[‖θ₁′(x, ξ)‖²] ≤ M².

Assumption 2. For all x ∈ X, E[exp{‖θ₁′(x, ξ)‖²/M²}] ≤ exp{1}.

Assumption 3. For all x ∈ X, E[‖θ₁′(x, ξ) − θ₁′(x)‖²] ≤ σ².

2 Stochastic ADMM Algorithm

Directly solving problem (1) can be nontrivial even if ξ is deterministic and the equality constraint is as simple as x − y = 0. For example, using the augmented Lagrangian method, one has to minimize the augmented Lagrangian:

  min_{x∈X, y∈Y} L_β(x, y, λ) ≡ min_{x∈X, y∈Y} θ₁(x) + θ₂(y) − ⟨λ, Ax + By − b⟩ + (β/2)‖Ax + By − b‖²,

where β is a pre-defined penalty parameter. This problem is at least no easier than solving the original one. The (deterministic) ADMM (Alg. 1) solves this problem in a Gauss-Seidel manner: it minimizes L_β with respect to x and y alternately, each with the other fixed, followed by a penalty update of the Lagrange multiplier λ.

Algorithm 1 Deterministic ADMM
0. Initialize y₀ and λ₀ = 0.
for k = 0, 1, 2, ... do
  1. x_{k+1} ← argmin_{x∈X} θ₁(x) + (β/2)‖(Ax + By_k − b) − λ_k/β‖².
  2. y_{k+1} ← argmin_{y∈Y} θ₂(y) + (β/2)‖(Ax_{k+1} + By − b) − λ_k/β‖².
  3. λ_{k+1} ← λ_k − β(Ax_{k+1} + By_{k+1} − b).
end for
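To make the updates of Alg. 1 concrete, here is a minimal Python sketch (ours, not code from the paper), assuming the simple instance θ₁(x) = ½‖Dx − c‖², θ₂(y) = γ‖y‖₁, A = I, B = −I, b = 0, for which Lines 1 and 2 both have closed forms (a ridge-type linear system and soft-thresholding); all names and default parameters are illustrative.

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def deterministic_admm_lasso(D, c, gamma, beta=1.0, iters=200):
    """Sketch of Alg. 1 for min 0.5*||D x - c||^2 + gamma*||y||_1  s.t.  x - y = 0,
    i.e. theta1(x) = 0.5*||Dx - c||^2, theta2(y) = gamma*||y||_1, A = I, B = -I, b = 0."""
    d = D.shape[1]
    x = np.zeros(d); y = np.zeros(d); lam = np.zeros(d)
    H = D.T @ D + beta * np.eye(d)    # Line 1 is a ridge-type system; in practice factor it once
    Dtc = D.T @ c
    for _ in range(iters):
        x = np.linalg.solve(H, Dtc + beta * y + lam)       # Line 1
        y = soft_threshold(x - lam / beta, gamma / beta)   # Line 2
        lam = lam - beta * (x - y)                         # Line 3
    return x, y

For a general θ₁, Line 1 typically has no closed form; this is precisely the difficulty addressed by the linearized and stochastic variants discussed next.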
A variant deterministic algorithm named linearized ADMM replaces Line 1 of Alg. 1 by

  x_{k+1} ← argmin_{x∈X} { θ₁(x) + (β/2)‖(Ax + By_k − b) − λ_k/β‖² + (1/2)‖x − x_k‖²_G },        (4)

where G ∈ R^{d₁×d₁} is positive semidefinite. This variant can be regarded as a generalization of the original ADMM. When G = 0, it is the same as Alg. 1. When G = rI_{d₁} − βAᵀA, it is equivalent to the following linearized proximal point method:

  x_{k+1} ← argmin_{x∈X} { θ₁(x) + β(x − x_k)ᵀ[Aᵀ(Ax_k + By_k − b − λ_k/β)] + (r/2)‖x − x_k‖² }.

Note that the linearization is applied only to the quadratic function ‖(Ax + By_k − b) − λ_k/β‖², but not to θ₁. This approximation helps when Line 1 of Alg. 1 does not produce a closed-form solution because of the quadratic term, for example when θ₁(x) = ‖x‖₁ and A is not the identity.

As shown in Alg. 2, we propose a Stochastic Alternating Direction Method of Multipliers (Stochastic ADMM) algorithm. Our algorithm shares some features with the classical and the linearized ADMM. One can see that Lines 2 and 3 are essentially the same as before. However, there are two major differences in Line 1. First, we replace θ₁(x) with a first-order approximation of θ₁(x, ξ_{k+1}) at x_k: θ₁(x_k) + xᵀθ₁′(x_k, ξ_{k+1}). This approximation has the same flavor as the stochastic mirror descent [16] used for solving a one-variable stochastic convex problem. One important benefit of this approximation is that our algorithm can be applied to nonsmooth objective functions, beyond the smooth and separable least-squares loss used in the lasso. Second, similar to the linearized ADMM (4), we add an ℓ₂-norm prox-function ‖x − x_k‖², but scale it by a time-varying step size η_{k+1}. As we will see in Section 3, the choice of this step size is crucial for guaranteeing convergence.

Algorithm 2 Stochastic ADMM
0. Initialize x₀, y₀ and λ₀ = 0.
for k = 0, 1, 2, ... do
  1. x_{k+1} ← argmin_{x∈X} ⟨θ₁′(x_k, ξ_{k+1}), x⟩ + (β/2)‖(Ax + By_k − b) − λ_k/β‖² + ‖x − x_k‖²/(2η_{k+1}).
  2. y_{k+1} ← argmin_{y∈Y} θ₂(y) + (β/2)‖(Ax_{k+1} + By − b) − λ_k/β‖².
  3. λ_{k+1} ← λ_k − β(Ax_{k+1} + By_{k+1} − b).
end for
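The following Python sketch (ours, not the authors' code) instantiates Alg. 2 on the same toy split problem with A = I, B = −I, b = 0, X a Euclidean ball, and the step size η_{k+1} = D_X/(M√(2(k+1))) from Theorem 1 below; the function and parameter names are illustrative assumptions.

import numpy as np

def project_ball(v, radius):
    # Euclidean projection onto X = {x : ||x||_2 <= radius}; X is compact, as the paper assumes.
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (solves Line 2 for theta2(y) = gamma*||y||_1).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_admm(sample_subgrad, d, gamma=0.1, beta=1.0, radius=10.0, M=1.0, iters=1000, rng=None):
    """Sketch of Alg. 2 for min E_xi[theta1(x, xi)] + gamma*||y||_1  s.t.  x - y = 0
    (A = I, B = -I, b = 0).  sample_subgrad(x, rng) returns a noisy subgradient theta1'(x_k, xi_{k+1})."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.zeros(d); y = np.zeros(d); lam = np.zeros(d)
    xbar = np.zeros(d); ybar = np.zeros(d)
    D_X = 2.0 * radius                                # diameter of X
    for k in range(iters):
        eta = D_X / (M * np.sqrt(2.0 * (k + 1)))      # eta_{k+1} = D_X / (M*sqrt(2(k+1))), as in Theorem 1
        g = sample_subgrad(x, rng)                    # theta1'(x_k, xi_{k+1})
        xbar += (x - xbar) / (k + 1)                  # ubar_t averages x_0, ..., x_{t-1} (definition (2))
        # Line 1: with A = I the linearized subproblem is an isotropic quadratic, so the
        # constrained minimizer over the ball is the projection of the unconstrained one.
        x = (beta * y + lam + x / eta - g) / (beta + 1.0 / eta)
        x = project_ball(x, radius)
        y = soft_threshold(x - lam / beta, gamma / beta)   # Line 2
        lam = lam - beta * (x - y)                         # Line 3
        ybar += (y - ybar) / (k + 1)                  # ... and y_1, ..., y_t
    return xbar, ybar

As a usage example, sample_subgrad(x, rng) could return a * (a @ x - c) for a freshly drawn pair (a, c), matching the least-squares instance sketched in Section 1.1.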
3 Main Results of Convergence Rates

In this section, we show that our Stochastic ADMM given in Alg. 2 exhibits a rate of convergence of O(1/√t) in terms of both the objective value and the feasibility violation: E[θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂] = O(1/√t). We then extend the main result when more structural information about θ₁ is available.

Before we address the main theorem on convergence rates, we first present an upper bound on the variation of the Lagrangian function and its first-order approximation evaluated at the iterates.

Lemma 1. ∀w ∈ W, k ≥ 1, we have

  θ₁(x_k) + θ₂(y_{k+1}) − θ(u) + (w_{k+1} − w)ᵀF(w_{k+1})
  ≤ (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + (1/(2η_{k+1}))(‖x_k − x‖² − ‖x_{k+1} − x‖²)
   + (β/2)(‖Ax + By_k − b‖² − ‖Ax + By_{k+1} − b‖²)
   + ⟨δ_{k+1}, x − x_k⟩ + (1/(2β))(‖λ − λ_k‖₂² − ‖λ − λ_{k+1}‖₂²).        (5)

Utilizing this lemma, we are able to obtain our main result, stated below in two fashions: in terms of expectation and in terms of a probability guarantee.

Theorem 1. Let η_k = D_X/(M√(2k)), ∀k ≥ 1, and let ρ > 0.

(i) Under Assumption 1, we have ∀t ≥ 1,

  E[θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖] ≤ M₁(t) + M₂(t) ≡ (√2 D_X M)/√t + (βD²_{y*,B} + ρ²/β)/(2t).        (6)

(ii) Under Assumptions 1 and 2, we have for any Ω > 0,

  Prob{ θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂ > (1 + Ω/2 + 2√(2Ω)) M₁(t) + M₂(t) } ≤ 2 exp{−Ω}.        (7)

Remark 1. Adapting our proof techniques to the deterministic case, where no noise is present, we are able to obtain a similar result for deterministic ADMM:

  ∀ρ > 0, t ≥ 1:  θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂ ≤ (βD²_{y*,B})/(2t) + ρ²/(2βt).        (8)

While this recovers the same O(1/t) convergence rate as the existing literature [8, 9, 10], the above finding is a significant advance in the theoretical understanding of ADMM. For the first time, the convergence of ADMM is proved in terms of the objective value and the feasibility violation. By contrast, the existing literature [8, 9, 10] only shows the convergence of ADMM in terms of the satisfaction of variational inequalities, which is not a direct measure of how fast an algorithm reaches the optimal solution.

3.1 Extension: Strongly Convex θ₁

When the function θ₁(·) is strongly convex, the convergence rate of Stochastic ADMM can be improved to O(log t/t).

Theorem 2. When θ₁ is µ-strongly convex with respect to ‖·‖, taking η_k = 1/(kµ) in Alg. 2, under Assumption 1 we have, ∀ρ > 0, t ≥ 1,

  E[θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂] ≤ (M² log t)/(µt) + (µD_X²)/(2t) + (βD²_{y*,B})/(2t) + ρ²/(2βt).

3.2 Extension: Lipschitz Smooth θ₁

Since the bounds given in Theorem 1 are governed by the magnitude of the subgradients, they do not provide any intuition about the performance in low-noise scenarios. With a Lipschitz smooth function θ₁, we are able to obtain convergence rates in terms of the variation of the gradients, as stated in Assumption 3. Besides, under this assumption we are able to replace the unusual definition of ū_k in (2) with

  ū_k ≡ ( (1/k) Σ_{i=1}^{k} x_i ; (1/k) Σ_{i=1}^{k} y_i ).        (9)

Theorem 3. When θ₁(·) is L-Lipschitz smooth with respect to ‖·‖, taking η_k = 1/(L + σ√(2k)/D_X) in Alg. 2, under Assumption 3 we have, ∀ρ > 0, t ≥ 1,

  E[θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂] ≤ (√2 D_X σ)/√t + (LD_X²)/(2t) + (βD²_{y*,B})/(2t) + ρ²/(2βt).

4 Summary and Future Work

In this paper, we have proposed the stochastic setting for ADMM along with our Stochastic ADMM algorithm. Based on a first-order approximation of the stochastic function, our algorithm is applicable to a very broad class of problems, even with functions that admit no closed-form solution to the subproblem of minimizing the augmented θ₁. We have also established convergence rates under various structural assumptions on θ₁: O(1/√t) for convex functions and O(log t/t) for strongly convex functions. We are working on integrating Nesterov's optimal first-order methods [17] into our algorithm, which will help achieve optimal convergence rates. More interesting and challenging applications will be explored in future work.

5 Appendix

5.1 3-Point Relation

Before proving Lemma 1, we start with the following simple lemma, which is a very useful result obtained by using a Bregman divergence as the prox-function in proximal methods.

Lemma 2. Let l(x): X → R be a convex differentiable function with gradient g, and let the scalar s ≥ 0. For any vectors u and v, denote their Bregman divergence as D(u, v) ≡ ω(u) − ω(v) − ⟨∇ω(v), u − v⟩. If

  x* ≡ argmin_{x∈X} l(x) + sD(x, u),        (10)

then for all x ∈ X,

  ⟨g(x*), x* − x⟩ ≤ s[D(x, u) − D(x, x*) − D(x*, u)].

Proof. Invoking the optimality condition for (10), we have

  ⟨g(x*) + s∇D(x*, u), x − x*⟩ ≥ 0,  ∀x ∈ X,

which is equivalent to

  ⟨g(x*), x* − x⟩ ≤ s⟨∇D(x*, u), x − x*⟩ = s⟨∇ω(x*) − ∇ω(u), x − x*⟩ = s[D(x, u) − D(x, x*) − D(x*, u)].
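To connect Lemma 2 with how it is used in the next subsection (this annotation is ours, not text from the paper), take ω(x) = ½‖x‖², so that D(x, u) = ½‖x − u‖², and apply the lemma to the subproblem in Line 1 of Alg. 2 with u = x_k, x* = x_{k+1}, s = 1/η_{k+1}, and

\[
l(x) = \langle \theta_1'(x_k,\xi_{k+1}),\, x\rangle + \tfrac{\beta}{2}\,\bigl\lVert (Ax + By_k - b) - \lambda_k/\beta \bigr\rVert^2,
\qquad
g(x_{k+1}) = \theta_1'(x_k,\xi_{k+1}) + A^{\top}\!\bigl[\beta(Ax_{k+1}+By_k-b)-\lambda_k\bigr].
\]

The lemma then gives

\[
\bigl\langle g(x_{k+1}),\, x_{k+1}-x \bigr\rangle \;\le\; \tfrac{1}{2\eta_{k+1}}\Bigl( \lVert x_k - x\rVert^2 - \lVert x_{k+1} - x\rVert^2 - \lVert x_{k+1}-x_k\rVert^2 \Bigr),
\]

which is exactly inequality (12) invoked in the proof of Lemma 1 below.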
5.2 Proof of Lemma 1

Proof. Due to the convexity of θ₁ and using the definition of δ_{k+1}, we have

  θ₁(x_k) − θ₁(x) ≤ ⟨θ₁′(x_k), x_k − x⟩ = ⟨θ₁′(x_k, ξ_{k+1}), x_{k+1} − x⟩ + ⟨δ_{k+1}, x − x_k⟩ + ⟨θ₁′(x_k, ξ_{k+1}), x_k − x_{k+1}⟩.        (11)

Applying Lemma 2 to Line 1 of Alg. 2 and taking D(u, v) = (1/2)‖v − u‖², we have

  ⟨θ₁′(x_k, ξ_{k+1}) + Aᵀ[β(Ax_{k+1} + By_k − b) − λ_k], x_{k+1} − x⟩
  ≤ (1/(2η_{k+1}))(‖x_k − x‖² − ‖x_{k+1} − x‖² − ‖x_{k+1} − x_k‖²).        (12)

Combining (11) and (12), we have

  θ₁(x_k) − θ₁(x) + ⟨x_{k+1} − x, −Aᵀλ_{k+1}⟩
  ≤ ⟨θ₁′(x_k, ξ_{k+1}), x_{k+1} − x⟩ + ⟨δ_{k+1}, x − x_k⟩ + ⟨θ₁′(x_k, ξ_{k+1}), x_k − x_{k+1}⟩
   + ⟨x_{k+1} − x, Aᵀ[β(Ax_{k+1} + By_{k+1} − b) − λ_k]⟩   [by (11)]
  = ⟨θ₁′(x_k, ξ_{k+1}) + Aᵀ[β(Ax_{k+1} + By_k − b) − λ_k], x_{k+1} − x⟩ + ⟨δ_{k+1}, x − x_k⟩
   + ⟨x − x_{k+1}, βAᵀB(y_k − y_{k+1})⟩ + ⟨θ₁′(x_k, ξ_{k+1}), x_k − x_{k+1}⟩        (13)
  ≤ (1/(2η_{k+1}))(‖x_k − x‖² − ‖x_{k+1} − x‖² − ‖x_{k+1} − x_k‖²) + ⟨δ_{k+1}, x − x_k⟩
   + ⟨x − x_{k+1}, βAᵀB(y_k − y_{k+1})⟩ + ⟨θ₁′(x_k, ξ_{k+1}), x_k − x_{k+1}⟩.   [by (12)]

We handle the last two terms separately:

  ⟨x − x_{k+1}, βAᵀB(y_k − y_{k+1})⟩ = β⟨Ax − Ax_{k+1}, By_k − By_{k+1}⟩
  = (β/2)[(‖Ax + By_k − b‖² − ‖Ax + By_{k+1} − b‖²) + (‖Ax_{k+1} + By_{k+1} − b‖² − ‖Ax_{k+1} + By_k − b‖²)]
  ≤ (β/2)(‖Ax + By_k − b‖² − ‖Ax + By_{k+1} − b‖²) + (1/(2β))‖λ_k − λ_{k+1}‖²        (14)

and

  ⟨θ₁′(x_k, ξ_{k+1}), x_k − x_{k+1}⟩ ≤ (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + (1/(2η_{k+1}))‖x_k − x_{k+1}‖²,        (15)

where the last step is due to Young's inequality. Inserting (14) and (15) into (13), we have

  θ₁(x_k) − θ₁(x) + ⟨x_{k+1} − x, −Aᵀλ_{k+1}⟩
  ≤ (1/(2η_{k+1}))(‖x_k − x‖² − ‖x_{k+1} − x‖²) + (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + ⟨δ_{k+1}, x − x_k⟩
   + (β/2)(‖Ax + By_k − b‖² − ‖Ax + By_{k+1} − b‖²) + (1/(2β))‖λ_k − λ_{k+1}‖².        (16)

Due to the optimality condition of Line 2 in Alg. 2 and the convexity of θ₂, we have

  θ₂(y_{k+1}) − θ₂(y) + ⟨y_{k+1} − y, −Bᵀλ_{k+1}⟩ ≤ 0.        (17)

Using Line 3 in Alg. 2, we have

  ⟨λ_{k+1} − λ, Ax_{k+1} + By_{k+1} − b⟩ = (1/β)⟨λ_{k+1} − λ, λ_k − λ_{k+1}⟩
  = (1/(2β))(‖λ − λ_k‖² − ‖λ − λ_{k+1}‖² − ‖λ_k − λ_{k+1}‖²).        (18)

Taking the summation of inequalities (16), (17) and (18), we obtain the desired result.

5.3 Proof of Theorem 1

Proof. (i) Invoking the convexity of θ₁(·) and θ₂(·) and the monotonicity of the operator F(·), we have, ∀w ∈ W,

  θ(ū_t) − θ(u) + (w̄_t − w)ᵀF(w̄_t) ≤ (1/t) Σ_{k=1}^{t} [θ₁(x_{k−1}) + θ₂(y_k) − θ(u) + (w_k − w)ᵀF(w_k)]
  = (1/t) Σ_{k=0}^{t−1} [θ₁(x_k) + θ₂(y_{k+1}) − θ(u) + (w_{k+1} − w)ᵀF(w_{k+1})].        (19)

Applying Lemma 1 at the optimal solution (x, y) = (x*, y*), we can derive from (19) that, ∀λ,

  θ(ū_t) − θ(u*) + (x̄_t − x*)ᵀ(−Aᵀλ̄_t) + (ȳ_t − y*)ᵀ(−Bᵀλ̄_t) + (λ̄_t − λ)ᵀ(Ax̄_t + Bȳ_t − b)
  ≤ (1/t) Σ_{k=0}^{t−1} [ (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + (1/(2η_{k+1}))(‖x_k − x*‖² − ‖x_{k+1} − x*‖²) + ⟨δ_{k+1}, x* − x_k⟩ ]
   + (1/t)[ (β/2)‖Ax* + By₀ − b‖² + (1/(2β))‖λ − λ₀‖² ]   [by (5)]
  ≤ (1/t) Σ_{k=0}^{t−1} [ (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + ⟨δ_{k+1}, x* − x_k⟩ ]
   + (1/t)[ D_X²/(2η_t) + (β/2)D²_{y*,B} + (1/(2β))‖λ − λ₀‖₂² ],        (20)

where we used Ax* + By* = b in the last step. The above inequality holds for all λ ∈ R^m; hence it also holds inside the ball B₀ ≡ {λ : ‖λ‖₂ ≤ ρ}.
Combining this with the fact that the optimal solution must also be feasible, it follows that

  max_{λ∈B₀} { θ(ū_t) − θ(u*) + (x̄_t − x*)ᵀ(−Aᵀλ̄_t) + (ȳ_t − y*)ᵀ(−Bᵀλ̄_t) + (λ̄_t − λ)ᵀ(Ax̄_t + Bȳ_t − b) }
  = max_{λ∈B₀} { θ(ū_t) − θ(u*) + λ̄_tᵀ(Ax* + By* − b) − λᵀ(Ax̄_t + Bȳ_t − b) }
  = max_{λ∈B₀} { θ(ū_t) − θ(u*) − λᵀ(Ax̄_t + Bȳ_t − b) }
  = θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂.        (21)

Taking an expectation over (21) and using (20), we have

  E[θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂]
  ≤ E[ (1/t) Σ_{k=0}^{t−1} ( (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + ⟨δ_{k+1}, x* − x_k⟩ ) + (1/t)( D_X²/(2η_t) + (β/2)D²_{y*,B} ) ]
   + E[ max_{λ∈B₀} (1/(2βt))‖λ − λ₀‖₂² ]
  ≤ (1/t)( (M²/2) Σ_{k=1}^{t} η_k + D_X²/(2η_t) ) + (βD²_{y*,B})/(2t) + ρ²/(2βt) + (1/t) Σ_{k=0}^{t−1} E[⟨δ_{k+1}, x* − x_k⟩]
  = (1/t)( (M²/2) Σ_{k=1}^{t} η_k + D_X²/(2η_t) ) + (βD²_{y*,B})/(2t) + ρ²/(2βt)
  ≤ (√2 D_X M)/√t + (βD²_{y*,B})/(2t) + ρ²/(2βt).

In the second-to-last step we use the fact that x_k is independent of ξ_{k+1}; hence E_{ξ_{k+1}|ξ_{[1:k]}}[⟨δ_{k+1}, x* − x_k⟩] = ⟨E_{ξ_{k+1}|ξ_{[1:k]}}[δ_{k+1}], x* − x_k⟩ = 0.

(ii) From the steps in the proof of part (i), it follows that

  θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖
  ≤ (1/t) Σ_{k=0}^{t−1} (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + (1/t) Σ_{k=0}^{t−1} ⟨δ_{k+1}, x* − x_k⟩ + (1/t)( D_X²/(2η_t) + (β/2)D²_{y*,B} + ρ²/(2β) )        (22)
  ≡ A_t + B_t + C_t.

Note that the random variables A_t and B_t depend on ξ_{[t]}.

Claim 1. For Ω₁ > 0,

  Prob{ A_t ≥ (1 + Ω₁) (M²/(2t)) Σ_{k=1}^{t} η_k } ≤ exp{−Ω₁}.        (23)

Let α_k ≡ η_k / Σ_{j=1}^{t} η_j for k = 1, ..., t; then 0 ≤ α_k ≤ 1 and Σ_{k=1}^{t} α_k = 1. Using the fact that the ξ_k are drawn i.i.d. and applying Assumption 2, one has

  E[ exp{ Σ_{k=1}^{t} α_k ‖θ₁′(x_{k−1}, ξ_k)‖²/M² } ] = Π_{k=1}^{t} E[ exp{ α_k ‖θ₁′(x_{k−1}, ξ_k)‖²/M² } ]
  ≤ Π_{k=1}^{t} ( E[ exp{ ‖θ₁′(x_{k−1}, ξ_k)‖²/M² } ] )^{α_k}   (Jensen's inequality)
  ≤ Π_{k=1}^{t} (exp{1})^{α_k} = exp{ Σ_{k=1}^{t} α_k } = exp{1}.

Hence, by Markov's inequality,

  Prob{ A_t ≥ (1 + Ω₁) (M²/(2t)) Σ_{k=1}^{t} η_k } ≤ exp{−(1 + Ω₁)} E[ exp{ Σ_{k=1}^{t} α_k ‖θ₁′(x_{k−1}, ξ_k)‖²/M² } ] ≤ exp{−Ω₁}.

We have therefore proved Claim 1.

Claim 2. For Ω₂ > 0,

  Prob{ B_t > 2Ω₂ D_X M/√t } ≤ exp{ −Ω₂²/4 }.        (24)

In order to prove this claim, we adopt the following facts from Nemirovski's paper [16].

Lemma 3. Given that for all k = 1, ..., t, ζ_k is a deterministic function of ξ_{[k]} with E[ζ_k | ξ_{[k−1]}] = 0 and E[exp{ζ_k²/σ_k²} | ξ_{[k−1]}] ≤ exp{1}, we have:
  (a) for γ ≥ 0, E[exp{γζ_k} | ξ_{[k−1]}] ≤ exp{γ²σ_k²}, ∀k = 1, ..., t;
  (b) letting S_t = Σ_{k=1}^{t} ζ_k, Prob{ S_t > Ω √(Σ_{k=1}^{t} σ_k²) } ≤ exp{ −Ω²/4 }.

Using this result with ζ_k = ⟨δ_k, x* − x_{k−1}⟩, S_t = Σ_{k=1}^{t} ζ_k, and σ_k = 2D_X M, ∀k, we can verify that E[ζ_k | ξ_{[k−1]}] = 0 and

  E[ exp{ζ_k²/σ_k²} | ξ_{[k−1]} ] ≤ E[ exp{ D_X²‖δ_k‖²/σ_k² } | ξ_{[k−1]} ] ≤ exp{1},

since |ζ_k|² ≤ ‖x* − x_{k−1}‖² ‖δ_k‖² ≤ D_X² ( 2‖θ₁′(x_{k−1}, ξ_k)‖² + 2M² ).

Implementing the above results, it follows that

  Prob{ S_t > 2Ω₂ D_X M √t } ≤ exp{ −Ω₂²/4 }.

Since S_t = tB_t, we have Prob{ B_t > 2Ω₂ D_X M/√t } ≤ exp{ −Ω₂²/4 }, as desired.

Combining (22), (23) and (24), we obtain

  Prob{ Err_ρ(ū_t) > (1 + Ω₁) (M²/(2t)) Σ_{k=1}^{t} η_k + 2Ω₂ D_X M/√t + C_t } ≤ exp{−Ω₁} + exp{ −Ω₂²/4 },

where Err_ρ(ū_t) ≡ θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂. Substituting Ω₁ = Ω and Ω₂ = 2√Ω and plugging in η_k = D_X/(M√(2k)), we obtain (7) as desired.
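The final substitution above is stated without the intermediate arithmetic; the following reconstruction (ours, under the step size of Theorem 1) shows how the constants in (7) arise. With η_k = D_X/(M√(2k)),

\[
\frac{M^2}{2t}\sum_{k=1}^{t}\eta_k \;\le\; \frac{M D_X}{\sqrt{2t}} \;=\; \frac{M_1(t)}{2},
\qquad
\frac{D_X^2}{2\eta_t\, t} \;=\; \frac{M D_X\sqrt{2t}}{2t} \;=\; \frac{M_1(t)}{2},
\]

so C_t = M₁(t)/2 + M₂(t). Substituting Ω₁ = Ω and Ω₂ = 2√Ω, and using 2Ω₂ D_X M/√t = 4√Ω · M₁(t)/√2 = 2√(2Ω) M₁(t), the threshold inside the probability becomes

\[
(1+\Omega)\,\frac{M_1(t)}{2} \;+\; 2\sqrt{2\Omega}\, M_1(t) \;+\; \frac{M_1(t)}{2} + M_2(t)
\;=\; \Bigl(1 + \tfrac{\Omega}{2} + 2\sqrt{2\Omega}\Bigr) M_1(t) + M_2(t),
\]

which is the quantity appearing in (7).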
5.4 Proof of Theorem 2

Proof. By the strong convexity of θ₁ we have, ∀x,

  θ₁(x_k) − θ₁(x) ≤ ⟨θ₁′(x_k), x_k − x⟩ − (µ/2)‖x − x_k‖²
  = ⟨θ₁′(x_k, ξ_{k+1}), x_{k+1} − x⟩ + ⟨δ_{k+1}, x − x_k⟩ + ⟨θ₁′(x_k, ξ_{k+1}), x_k − x_{k+1}⟩ − (µ/2)‖x − x_k‖².

Following the same derivations as in Lemma 1 and Theorem 1(i), we have

  E[θ(ū_t) − θ(u*) + ρ‖Ax̄_t + Bȳ_t − b‖₂]
  ≤ E[ (1/t) Σ_{k=0}^{t−1} ( (η_{k+1}/2)‖θ₁′(x_k, ξ_{k+1})‖² + (1/(2η_{k+1}) − µ/2)‖x_k − x*‖² − (1/(2η_{k+1}))‖x_{k+1} − x*‖² ) ]
   + (βD²_{y*,B})/(2t) + E[ max_{λ∈B₀} (1/(2βt))‖λ − λ₀‖₂² ]
  ≤ (M²/(2t)) Σ_{k=1}^{t} 1/(µk) + (1/t) Σ_{k=0}^{t−1} E[ (µk/2)‖x_k − x*‖² − (µ(k+1)/2)‖x_{k+1} − x*‖² ] + (βD²_{y*,B})/(2t) + ρ²/(2βt)
  ≤ (M² log t)/(µt) + (µD_X²)/(2t) + (βD²_{y*,B})/(2t) + ρ²/(2βt).

5.5 Proof of Theorem 3

Proof. The Lipschitz smoothness of θ₁ implies that, ∀k ≥ 0,

  θ₁(x_{k+1}) ≤ θ₁(x_k) + ⟨∇θ₁(x_k), x_{k+1} − x_k⟩ + (L/2)‖x_{k+1} − x_k‖²
  = θ₁(x_k) + ⟨∇θ₁(x_k, ξ_{k+1}), x_{k+1} − x_k⟩ − ⟨δ_{k+1}, x_{k+1} − x_k⟩ + (L/2)‖x_{k+1} − x_k‖²   [by (3)].

It follows that, ∀x ∈ X,

  θ₁(x_{k+1}) − θ₁(x) + ⟨x_{k+1} − x, −Aᵀλ_{k+1}⟩
  ≤ θ₁(x_k) − θ₁(x) + ⟨∇θ₁(x_k, ξ_{k+1}), x_{k+1} − x_k⟩ − ⟨δ_{k+1}, x_{k+1} − x_k⟩ + (L/2)‖x_{k+1} − x_k‖² + ⟨x_{k+1} − x, −Aᵀλ_{k+1}⟩
  = θ₁(x_k) − θ₁(x) + ⟨∇θ₁(x_k, ξ_{k+1}), x − x_k⟩ − ⟨δ_{k+1}, x_{k+1} − x_k⟩ + (L/2)‖x_{k+1} − x_k‖²
   + [ ⟨∇θ₁(x_k, ξ_{k+1}), x_{k+1} − x⟩ + ⟨x_{k+1} − x, −Aᵀλ_{k+1}⟩ ]
  ≤ ⟨∇θ₁(x_k), x_k − x⟩ + ⟨∇θ₁(x_k, ξ_{k+1}), x − x_k⟩ − ⟨δ_{k+1}, x_{k+1} − x_k⟩ + (L/2)‖x_{k+1} − x_k‖²
   + [ ⟨∇θ₁(x_k, ξ_{k+1}), x_{k+1} − x⟩ + ⟨x_{k+1} − x, −Aᵀλ_{k+1}⟩ ]
  = ⟨δ_{k+1}, x − x_{k+1}⟩ + (L/2)‖x_{k+1} − x_k‖² + [ ⟨∇θ₁(x_k, ξ_{k+1}), x_{k+1} − x⟩ + ⟨x_{k+1} − x, −Aᵀλ_{k+1}⟩ ]
  = ⟨δ_{k+1}, x − x_{k+1}⟩ + (L/2)‖x_{k+1} − x_k‖² + ⟨x − x_{k+1}, βAᵀB(y_k − y_{k+1})⟩
   + ⟨∇θ₁(x_k, ξ_{k+1}) + Aᵀ[β(Ax_{k+1} + By_k − b) − λ_k], x_{k+1} − x⟩
  ≤ (1/(2η_{k+1}))(‖x_k − x‖² − ‖x_{k+1} − x‖²) − ((1/η_{k+1} − L)/2)‖x_{k+1} − x_k‖²
   + ⟨x − x_{k+1}, βAᵀB(y_k − y_{k+1})⟩ + ⟨δ_{k+1}, x − x_{k+1}⟩.   [by (12)]

The last inner product can be bounded as below using Young's inequality, given that η_{k+1} ≤ 1/L:

  ⟨δ_{k+1}, x − x_{k+1}⟩ = ⟨δ_{k+1}, x − x_k⟩ + ⟨δ_{k+1}, x_k − x_{k+1}⟩
  ≤ ⟨δ_{k+1}, x − x_k⟩ + (1/(2(1/η_{k+1} − L)))‖δ_{k+1}‖² + ((1/η_{k+1} − L)/2)‖x_k − x_{k+1}‖².

Combining this with inequalities (14), (17) and (18), we obtain a statement similar to that of Lemma 1:

  θ(u_{k+1}) − θ(u) + (w_{k+1} − w)ᵀF(w_{k+1})
  ≤ ‖δ_{k+1}‖²/(2(1/η_{k+1} − L)) + (1/(2η_{k+1}))(‖x_k − x‖² − ‖x_{k+1} − x‖²)
   + (β/2)(‖Ax + By_k − b‖² − ‖Ax + By_{k+1} − b‖²)
   + ⟨δ_{k+1}, x − x_k⟩ + (1/(2β))(‖λ − λ_k‖₂² − ‖λ − λ_{k+1}‖₂²).

The rest of the proof is essentially the same as that of Theorem 1(i), except that we use the new definition of ū_k in (9).

References

[1] R. Glowinski and A. Marroco, "Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires," Revue Française d'Automatique, Informatique, et Recherche Opérationnelle, vol. 9, no. 2, 1975.
[2] Daniel Gabay and Bertrand Mercier, "A dual algorithm for the solution of nonlinear variational problems via finite element approximation," Computers & Mathematics with Applications, vol. 2, no. 1, 1976.
[3] Daniel Gabay, "Applications of the method of multipliers to variational inequalities," in Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, M. Fortin and R. Glowinski, Eds. North-Holland: Amsterdam, 1983.
[4] Roland Glowinski and Patrick Le Tallec, Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics, Studies in Applied and Numerical Mathematics. SIAM, 1989.
[5] Jonathan Eckstein and Dimitri P. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Mathematical Programming, vol. 55, no. 1-3, pp. 293-318, 1992.
