
arXiv:1101.4122v1 [math.OC] 21 Jan 2011

AN ALGORITHM FOR SEMI-INFINITE POLYNOMIAL OPTIMIZATION

J.B. LASSERRE

Abstract. We consider the semi-infinite optimization problem:

    f* := min_{x∈X} { f(x) : g(x,y) ≤ 0, ∀ y ∈ Y_x },

where f, g are polynomials and X ⊂ R^n as well as Y_x ⊂ R^p, x ∈ X, are compact basic semi-algebraic sets. To approximate f* we proceed in two steps. First, we use the "joint+marginal" approach of the author [9] to approximate from above the function x ↦ Φ(x) = sup { g(x,y) : y ∈ Y_x } by a polynomial Φ_d ≥ Φ, of degree at most 2d, with the strong property that Φ_d converges to Φ for the L_1-norm as d → ∞ (and in particular, almost uniformly for some subsequence (d_ℓ), ℓ ∈ N). Therefore, to approximate f* one may wish to solve the polynomial optimization problem f_d^0 = min_{x∈X} { f(x) : Φ_d(x) ≤ 0 } via a (by now standard) hierarchy of semidefinite relaxations, for increasing values of d. In practice d is fixed, small, and one relaxes the constraint Φ_d ≤ 0 to Φ_d(x) ≤ ε with ε > 0, allowing ε to be adjusted dynamically. As d increases, the limit of the optimal value f_d^ε is bounded above by f* + ε.

Key words and phrases: Polynomial optimization; min-max optimization; robust optimization; semidefinite relaxations.

This work was performed during a visit in November 2010 at CRM (Centre de Recerca Matematica), a Mathematics center from UAB (Universidad Autonoma de Barcelona), and the author wishes to gratefully acknowledge financial support from CRM.

1. Introduction

Consider the semi-infinite optimization problem:

(1.1)    P:  f* := min_{x∈X} { f(x) : g(x,y) ≤ 0, ∀ y ∈ Y_x },

where X ⊂ R^n, Y_x ⊂ R^p for every x ∈ X, and f : R^n → R, g : R^n × R^p → R are some given functions.

Problem P is called a semi-infinite optimization problem because of the infinitely many constraints g(x,y) ≤ 0 for all y ∈ Y_x (for each fixed x ∈ X). It has many applications, particularly in robust control.

In full generality P is a very hard problem, and most methods aiming at computing (or at least approximating) f* use discretization to overcome the difficult semi-infinite constraint g(x,y) ≤ 0 for all y ∈ Y_x. Namely, in typical approaches where Y_x ≡ Y for all x ∈ X (i.e. no dependence on x), the set Y ⊂ R^p is discretized on a finite grid, and if the resulting nonlinear programming problems are solved to global optimality, then convergence to a global optimum of the semi-infinite problem occurs as the grid size vanishes (see e.g. the discussion and the many references in [10]). Alternatively, in [10] the authors provide lower bounds on f* by discretizing Y, and upper bounds via convex relaxations of the inner problem max_{y∈Y} { g(x,y) } ≤ 0. In [11] the authors also use a discretization scheme of Y, but now combined with a hierarchy of sum of squares convex relaxations for solving to global optimality.

Contribution. We restrict ourselves to problem P where:
• f, g are polynomials, and
• X ⊂ R^n and Y_x ⊂ R^p, x ∈ X, are compact basic semi-algebraic sets.

For instance, many problems of robust control can be put in this framework; see e.g. their description in [4]. In this context we provide a numerical scheme whose novelty with respect to previous works is to avoid discretization of the set Y_x. Instead we use the "joint+marginal" methodology for parametric polynomial optimization developed by the author in [9] to provide a sequence of polynomials (Φ_d) ⊂ R[x] (of degree 2d, d ∈ N) that approximate from above the function Φ(x) := max_y { g(x,y) : y ∈ Y_x }, with the strong property that Φ_d → Φ in the L_1-norm as d → ∞. (In particular, Φ_{d_ℓ} → Φ almost uniformly on X for some subsequence (d_ℓ), ℓ ∈ N.) Then, ideally, one could solve the nested sequence of polynomial optimization problems:

(1.2)    P_d:  f_d^* = min { f(x) : Φ_d(x) ≤ 0 },  d = 1, 2, ...

For fixed d, one may approximate (and often solve exactly) (1.2) by solving a hierarchy of semidefinite relaxations, as defined in [6].
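For intuition, the semi-infinite structure of (1.1) and the grid-discretization baseline discussed above can be illustrated on a one-dimensional toy instance. The instance below is entirely hypothetical (not one of the paper's examples), and the grid search is only the naive baseline that the paper's scheme is designed to avoid:

```python
import numpy as np

# Hypothetical toy instance of (1.1):
#   f(x) = (x - 2)^2 on X = [-2, 2],
#   g(x, y) = x - y <= 0 for all y in Y = [0, 1].
# Here Phi(x) = max_{y in Y} g(x, y) = x, so the semi-infinite
# constraint reduces to x <= 0 and f* = 4 (attained at x = 0).
f = lambda x: (x - 2.0) ** 2
g = lambda x, y: x - y

xs = np.linspace(-2.0, 2.0, 4001)   # grid over X
ys = np.linspace(0.0, 1.0, 101)     # discretization grid over Y

# Enforce g(x, y) <= 0 at every grid point of Y: the discretization
# approach is exact only in the limit of a vanishing grid size.
feasible = np.array([g(x, ys).max() <= 0.0 for x in xs])
f_star = f(xs[feasible]).min()
print(f_star)
```

On this instance the inner maximum is attained at y = 0, so even a coarse grid over Y recovers the constraint x ≤ 0 and the printed value is close to f* = 4; in general, however, the discretization error vanishes only as both grids are refined.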
However, as the size O(d^n) of these semidefinite relaxations increases very fast with d, in practice one rather lets d be fixed, small, and relaxes the constraint Φ_d(x) ≤ 0 to Φ_d(x) ≤ ε for some scalar ε > 0 that one may adjust dynamically during the algorithm. As d increases, the resulting optimal value f_d^ε is bounded above by f* + ε. The approach is illustrated on a sample of small problems taken from the literature.

2. Notation, definitions and preliminary results

Let R[x] (resp. R[x,y]) denote the ring of real polynomials in the variables x = (x_1, ..., x_n) (resp. x and y = (y_1, ..., y_p)), whereas Σ[x] (resp. Σ[x,y]) denotes its subset of sums of squares. Let R[x]_k ⊂ R[x] denote the vector space of real polynomials of degree at most k. For every α ∈ N^n the notation x^α stands for the monomial x_1^{α_1} ··· x_n^{α_n}, and for every d ∈ N, let N^n_d := { α ∈ N^n : Σ_j α_j ≤ d }, with cardinal s(d) = binom(n+d, n). Similarly, N^p_d := { β ∈ N^p : Σ_j β_j ≤ d }, with cardinal binom(p+d, p). A polynomial f ∈ R[x] is written

    x ↦ f(x) = Σ_{α∈N^n} f_α x^α,

and f can be identified with its vector of coefficients f = (f_α) in the canonical basis. For a real symmetric matrix A, the notation A ⪰ 0 stands for "A is positive semidefinite".

A real sequence z = (z_α), α ∈ N^n, has a representing measure if there exists some finite Borel measure µ on R^n such that

    z_α = ∫_{R^n} x^α dµ(x),  ∀ α ∈ N^n.

Given a real sequence z = (z_α), define the linear functional L_z : R[x] → R by:

    f ( = Σ_α f_α x^α ) ↦ L_z(f) = Σ_α f_α z_α,  f ∈ R[x].

Moment matrix. The moment matrix associated with a sequence z = (z_α), α ∈ N^n, is the real symmetric matrix M_d(z) with rows and columns indexed by N^n_d, and whose entry (α, β) is z_{α+β}, for every α, β ∈ N^n_d. If z has a representing measure µ then M_d(z) ⪰ 0 because

    ⟨f, M_d(z) f⟩ = ∫ f² dµ ≥ 0,  ∀ f ∈ R^{s(d)}.

Localizing matrix.
With z as above and g ∈ R[x] (with g(x) = Σ_γ g_γ x^γ), the localizing matrix associated with z and g is the real symmetric matrix M_d(g z) with rows and columns indexed by N^n_d, and whose entry (α, β) is Σ_γ g_γ z_{α+β+γ}, for every α, β ∈ N^n_d. If z has a representing measure µ whose support is contained in the set { x : g(x) ≥ 0 } then M_d(g z) ⪰ 0 because

    ⟨f, M_d(g z) f⟩ = ∫ f² g dµ ≥ 0,  ∀ f ∈ R^{s(d)}.

Definition 2.1 (Archimedean property). A set of polynomials q_j ∈ R[x], j = 0, ..., p (with q_0 = 1), satisfies the Archimedean property if the quadratic polynomial x ↦ M − ‖x‖² can be written in the form:

    M − ‖x‖² = Σ_{j=0}^p σ_j(x) q_j(x),

for some sums of squares polynomials (σ_j) ⊂ Σ[x].

Of course the Archimedean property implies that the set D := { x ∈ R^n : q_j(x) ≥ 0, j = 1, ..., p } is compact. For instance, it holds whenever the level set { x : q_k(x) ≥ 0 } is compact for some k ∈ {1, ..., p}, or if the q_j's are affine and D is compact (hence a polytope). On the other hand, if D is compact then M − ‖x‖² ≥ 0 for all x ∈ D and some M sufficiently large. So if one adds the redundant quadratic constraint x ↦ q_{p+1}(x) = M − ‖x‖² ≥ 0 in the definition of D, then the Archimedean property holds. Hence it is not a restrictive assumption.

Let D := { x ∈ R^n : q_j(x) ≥ 0, j = 1, ..., p }, and given a polynomial h ∈ R[x], consider the hierarchy of semidefinite programs:

(2.1)    ρ_ℓ = min_z  L_z(h)
              s.t.  M_ℓ(z), M_{ℓ−v_j}(q_j z) ⪰ 0,  j = 1, ..., p,

where z = (z_α), α ∈ N^n_{2ℓ}, and v_j = ⌈(deg q_j)/2⌉, j = 1, ..., p.

Theorem 2.2 ([6, 8]). Let a family of polynomials (q_j) ⊂ R[x] satisfy the Archimedean property. Then, as ℓ → ∞, ρ_ℓ ↑ h* = min_x { h(x) : x ∈ D }. Moreover, if z* is an optimal solution of (2.1) and

(2.2)    rank M_ℓ(z*) = rank M_{ℓ−v}(z*) (=: r)

(where v = max_j v_j), then ρ_ℓ = h* and one may extract r global minimizers x*_k ∈ D, k = 1, ..., r.

The size (resp. the number of variables) of the semidefinite program (2.1) grows as binom(n+ℓ, n) (resp.
as binom(n+2ℓ, n)) and so becomes rapidly prohibitive, especially in view of the present status of available semidefinite solvers. Therefore, even though practice reveals that convergence is fast and often finite, so far the above methodology is limited to small to medium size problems (typically, and depending on the degree of the polynomials appearing in the data, problems with up to n ∈ [10, 20] variables). However, for larger size problems with sparsity in the data and/or symmetries, ad hoc and tractable versions of (2.1) exist. See for instance the sparse version of (2.1) proposed in [12], whose convergence was proved in [7] when the sparsity pattern satisfies the so-called running intersection property. In [12] this technique was shown to be successful on a sample of nonconvex problems with up to 1000 variables.

3. Main result

Let B ⊂ R^n be a simple set like a box or an ellipsoid. Let p_s ∈ R[x], s = 1, ..., s_x, and h_j ∈ R[x,y], j = 1, ..., m, be given polynomials, and let X ⊂ R^n be the basic semi-algebraic set

    X := { x ∈ R^n : p_s(x) ≥ 0, s = 1, ..., s_x }.

Next, for every x ∈ R^n, let Y_x ⊂ R^p be the basic semi-algebraic set described by:

(3.1)    Y_x = { y ∈ R^p : h_j(x,y) ≥ 0, j = 1, ..., m },

and with B ⊇ X, let K ⊂ R^n × R^p be the set

(3.2)    K := { (x,y) ∈ R^{n+p} : x ∈ B; h_j(x,y) ≥ 0, j = 1, ..., m }.

Observe that problem P in (1.1) is equivalent to:

(3.3)    P:  f* = min_{x∈X} { f(x) : Φ(x) ≤ 0 },

(3.4)    where Φ(x) = max_y { g(x,y) : y ∈ Y_x },  x ∈ B.

Lemma 3.1. Let K ⊂ R^{n+p} in (3.2) be compact, and assume that for every x ∈ B ⊂ R^n the set Y_x defined in (3.1) is nonempty. Then Φ is upper semicontinuous (u.s.c.) on B. Moreover, if there is some compact set Y ⊂ R^p such that Y_x = Y for every x ∈ B, then Φ is continuous on B.

Proof. Let x_0 ∈ B be fixed, arbitrary, and let (x_k)_{k∈N} ⊂ B be a sequence that converges to x_0 and such that

    limsup_{x→x_0} Φ(x) = lim_{k→∞} Φ(x_k).

As K is compact, so is Y_x for every x ∈ B.
Therefore, as Y_x ≠ ∅ for all x ∈ B and g is continuous, there exists an optimal solution y*_k ∈ Y_{x_k} for every k. By compactness there exist a subsequence (k_ℓ) and y* ∈ R^p such that (x_{k_ℓ}, y*_{k_ℓ}) → (x_0, y*) ∈ K, as ℓ → ∞. Hence

    limsup_{x→x_0} Φ(x) = lim_{k→∞} Φ(x_k) = lim_{k→∞} g(x_k, y*_k) = lim_{ℓ→∞} g(x_{k_ℓ}, y*_{k_ℓ}) = g(x_0, y*) ≤ Φ(x_0),

which proves that Φ is u.s.c. at x_0. As x_0 ∈ B was arbitrary, Φ is u.s.c. on B.

Next, assume that there is some compact set Y ⊂ R^p such that Y_x = Y for every x ∈ B. Let x_0 ∈ B be fixed, arbitrary, with Φ(x_0) = g(x_0, y*_0) for some y*_0 ∈ Y. Let (x_n) ⊂ B, n ∈ N, be a sequence such that x_n → x_0 as n → ∞, and Φ(x_0) ≥ liminf_{x→x_0} Φ(x) = lim_{n→∞} Φ(x_n). Again, let y*_n ∈ Y be such that Φ(x_n) = g(x_n, y*_n), n ∈ N. By compactness, consider an arbitrary converging subsequence (n_ℓ) ⊂ N, i.e., such that (x_{n_ℓ}, y*_{n_ℓ}) → (x_0, y*) ∈ K as ℓ → ∞, for some y* ∈ Y. Suppose that Φ(x_0) (= g(x_0, y*_0)) > g(x_0, y*), say Φ(x_0) > g(x_0, y*) + δ for some δ > 0. By continuity of g, g(x_{n_ℓ}, y*_{n_ℓ}) < g(x_0, y*) + δ/2 for every ℓ > ℓ_1 (for some ℓ_1). But again, by continuity, |g(x_{n_ℓ}, y*_0) − g(x_0, y*_0)| < δ/3 whenever ℓ > ℓ_2 (for some ℓ_2). And so we obtain the contradiction

    Φ(x_{n_ℓ}) ≥ g(x_{n_ℓ}, y*_0) > Φ(x_0) − δ/3,
    Φ(x_{n_ℓ}) = g(x_{n_ℓ}, y*_{n_ℓ}) < Φ(x_0) − δ/2,

whenever ℓ > max[ℓ_1, ℓ_2]. Therefore, g(x_0, y*) = g(x_0, y*_0) and so,

    g(x_0, y*_0) = Φ(x_0) = g(x_0, y*) = lim_{ℓ→∞} Φ(x_{n_ℓ}) = liminf_{x→x_0} Φ(x) ≤ Φ(x_0),

which, combined with Φ being u.s.c., yields that Φ is continuous at x_0. □

We next explain how to
• approximate the function x ↦ Φ(x) on B by a polynomial, and
• evaluate (or at least approximate) Φ(x) for some given x ∈ B, to check whether Φ(x) ≤ 0.
Indeed, these are the two main ingredients of the algorithm that we present later.

3.1. Certificate of Φ(x) ≤ 0.
For every fixed x ∈ X, let g_x, h^x_j ∈ R[y] be the polynomials y ↦ g_x(y) := g(x,y) and y ↦ h^x_j(y) := h_j(x,y), j = 1, ..., m, and consider the hierarchy of semidefinite programs:

(3.5)    Q_ℓ(x):  ρ_ℓ(x) = max_z  L_z(g_x)
                        s.t.  M_ℓ(z), M_{ℓ−v_j}(h^x_j z) ⪰ 0,  j = 1, ..., m,

where z = (z_β), β ∈ N^p_{2ℓ}, and v_j = ⌈(deg h^x_j)/2⌉, j = 1, ..., m. Obviously one has ρ_ℓ(x) ≥ Φ(x) for every ℓ, and

Corollary 3.2. Let x ∈ X and assume that the polynomials (h^x_j) ⊂ R[y] satisfy the Archimedean property. Then:
(a) As ℓ → ∞, ρ_ℓ(x) ↓ Φ(x) = max { g(x,y) : y ∈ Y_x }. In particular, if ρ_ℓ(x) ≤ 0 for some ℓ, then Φ(x) ≤ 0.
(b) Moreover, if z* is an optimal solution of (3.5) that satisfies rank M_ℓ(z*) = rank M_{ℓ−v}(z*) (=: r) (where v := max_j v_j), then ρ_ℓ(x) = Φ(x) and there are r global maximizers y(k) ∈ Y_x, k = 1, ..., r.

Corollary 3.2 is a direct consequence of Theorem 2.2.

3.2. Approximating the function Φ. Recall that B ⊇ X is a simple set like e.g. a simplex, a box or an ellipsoid, and let µ be the finite Borel probability measure uniformly distributed on B. Therefore, the vector γ = (γ_α), α ∈ N^n, of moments of µ, i.e.,

    γ_α := ∫_B x^α dµ(x),  α ∈ N^n,

can be computed easily. For instance, in the sequel we assume that B = [−1, 1]^n = { x : θ_i(x) ≥ 0, i = 1, ..., n }, with θ_i ∈ R[x,y] being the polynomial (x,y) ↦ θ_i(x,y) := 1 − x_i², i = 1, ..., n.

Observe that the function Φ is defined in (3.4) via a parametric polynomial optimization problem (with x being the parameter vector). Therefore, following [9], let r_j = ⌈(deg h_j)/2⌉, j = 1, ..., m, and consider the hierarchy of semidefinite relaxations indexed by d ∈ N:

(3.6)    ρ_d = max_z  L_z(g)
              s.t.  M_d(z), M_{d−r_j}(h_j z) ⪰ 0,  j = 1, ..., m,
                    M_{d−1}(θ_i z) ⪰ 0,  i = 1, ..., n,
                    L_z(x^α) = γ_α,  α ∈ N^n_{2d},

where the sequence z is now indexed in N^{n+p}_{2d}, i.e., z = (z_{αβ}), (α,β) ∈ N^{n+p}_{2d}. Writing h_0 ≡ 1, the dual of the semidefinite program (3.6) reads

(3.7)    ρ*_d = min_{q, σ_j, ψ_i}  ∫_B q(x) dµ(x)
              s.t.  q(x) − g(x,y) = Σ_{j=0}^m σ_j(x,y) h_j(x,y) + Σ_{i=1}^n ψ_i(x,y) θ_i(x,y),
                    q ∈ R[x]_{2d},  σ_j, ψ_i ∈ Σ[x,y],
                    deg σ_j h_j ≤ 2d,  j = 0, ..., m,
                    deg ψ_i θ_i ≤ 2d,  i = 1, ..., n.

It turns out that any optimal solution of the semidefinite program (3.7) permits to approximate Φ in a strong sense.

Theorem 3.3 ([9]). Let K ⊂ R^{n+p} in (3.2) be compact. Assume that the polynomials h_j, θ_i ∈ R[x,y] satisfy the Archimedean property, and assume that for every x ∈ B the set Y_x defined in (3.1) is nonempty. Let Φ_d ∈ R[x]_{2d} be an optimal solution of (3.7). Then:
(a) Φ_d ≥ Φ and, as d → ∞,

(3.8)    ∫_B (Φ_d(x) − Φ(x)) dµ(x) = ∫_B |Φ_d(x) − Φ(x)| dµ(x) → 0,

that is, Φ_d → Φ for the L_1(B, µ)-norm¹.
(b) There is a subsequence (d_ℓ), ℓ ∈ N, such that Φ_{d_ℓ} → Φ, µ-almost uniformly² in B, as ℓ → ∞.

The proof of (a) can be found in [9], whereas (b) follows from (a) and [1, Theorem 2.5.3].

3.3. An algorithm. The idea behind the algorithm is to approximate P in (1.1) with the polynomial optimization problem P_d^ε:

(3.9)    P_d^ε:  f_d^ε = min_{x∈X} { f(x) : Φ_d(x) ≤ ε },  d = 1, 2, ...,

with d ∈ N, ε > 0 fixed, and Φ_d as in Theorem 3.3, for every d = 1, ....

Obviously, for ε = 0 one has f_d^0 ≥ f* for all d, because by definition Φ_d ≥ Φ for every d ∈ N. However, it may happen that P_d^0 has no solution. Next, if x* is an optimal solution of P and Φ(x*) < 0, it may also happen that Φ_d(x*) > 0 if d is not large enough. This is why one needs to relax the constraint Φ_d ≤ 0 to Φ_d ≤ ε for some ε > 0. However, in view of Theorem 3.3, one expects that f_d^ε ≈ f* provided that d is sufficiently large and ε sufficiently small. And indeed:

¹ L_1(B, µ) is the Banach space of µ-integrable functions on B, with norm ‖f‖ = ∫_B |f| dµ.
² If one fixes ε > 0 arbitrary, then there is some A ∈ B(B) such that µ(A) < ε and Φ_{d_ℓ} → Φ uniformly on B \ A, as ℓ → ∞.

Theorem 3.4. Assume that X is the closure of an open set.
Let ε ≥ 0 be fixed, arbitrary, with f_d^ε as in (3.9); let x_d^ε ∈ X be any optimal solution of (3.9) (including the case where ε = 0), and let f̃_d^ε := min { f_ℓ^ε : ℓ = 1, ..., d } = f(x_{ℓ(d)}^ε) for some ℓ(d) ∈ {1, ..., d}.
(a) If ε > 0, there exists d_ε ∈ N such that for every d ≥ d_ε, f(x_{ℓ(d)}^ε) < f* + ε.
(b) If there is an optimal solution x* ∈ X of (1.1) such that Φ(x*) < 0, then there exists d_0 ∈ N such that for every d ≥ d_0, f* ≤ f(x_{ℓ(d)}^0) < f* + ε.

Proof. (a) With ε > 0 fixed, arbitrary, let x*_ε ∈ X be such that Φ(x*_ε) ≤ 0 and f(x*_ε) < f* + ε/2. We may assume that x*_ε is not on the boundary of X. Let O¹_ε := { x ∈ int X : Φ(x) < ε/2 }, which is an open set because Φ is u.s.c. (by Lemma 3.1), and so µ(O¹_ε) > 0. Next, as f is continuous, there exists ρ_0 > 0 such that f < f* + ε whenever x ∈ O²_ε := { x ∈ int X : ‖x − x*_ε‖ < ρ_0 }. Observe that ρ := µ(O¹_ε ∩ O²_ε) > 0 because O¹_ε ∩ O²_ε is an open set (with x*_ε ∈ O¹_ε ∩ O²_ε). Next, by Theorem 3.3(b), there is a subsequence (d_ℓ), ℓ ∈ N, such that Φ_{d_ℓ} → Φ, µ-almost uniformly on B. Hence there are some Borel set A_ε ⊂ B and integer ℓ_ε ∈ N such that µ(A_ε) < ρ/2 and sup_{x∈X\A_ε} |Φ(x) − Φ_{d_ℓ}(x)| < ε/2 for all ℓ ≥ ℓ_ε. In particular, as µ(A_ε) < ρ/2 < µ(O¹_ε ∩ O²_ε), the set ∆_ε := (O¹_ε ∩ O²_ε) \ A_ε has positive µ-measure. Therefore, f(x) < f* + ε and Φ_{d_ℓ}(x) < ε whenever ℓ ≥ ℓ_ε and x ∈ ∆_ε, which in turn implies f_{d_ℓ}^ε < f* + ε, and consequently f̃_d^ε = f(x_{ℓ(d)}^ε) < f* + ε, the desired result.

(b) Let ε′ := −Φ(x*), and let O¹_{ε′} := { x ∈ int X : Φ(x) < −ε′/2 }, which is a nonempty open set because it contains x* and Φ is u.s.c. Let O²_{ε′} be as O²_ε in the proof of (a), but now with x*_{ε′} = x* ∈ X. Both O¹_{ε′} and O²_{ε′} are open and nonempty because they contain x*. The rest of the proof is like that of (a), but noticing that now, for every x ∈ ∆_{ε′}, one has Φ_{d_ℓ}(x) < −ε′/2 + ε′/2 = 0, and so x is feasible for (3.9) with ε = 0.
Next, by feasibility, f(x) ≥ f*, since the resulting feasible set in (3.9) is smaller than that of (1.2) because Φ_d ≥ Φ for all d. And so f* ≤ f(x) < f* + ε whenever x ∈ ∆_{ε′} and ℓ ≥ ℓ_{ε′}, from which (b) follows. □

Theorem 3.4 provides a rationale behind the algorithm that we present below. In solving (3.9) with d sufficiently large and ε small (or even ε = 0), f_d^ε would provide a good approximation of f*. But in principle, computing the global optimum f_d^ε is still a difficult problem. However, P_d^ε is a polynomial optimization problem. Therefore, by Theorem 2.2, if the polynomials (p_s) ⊂ R[x] that define X satisfy the Archimedean property (see Definition 2.1), we can approximate f_d^ε from below, as closely as desired, by a monotone sequence (f_{dt}^ε), t ∈ N, obtained by solving the hierarchy of semidefinite relaxations (2.1), which here read:

(3.10)    f_{dt}^ε = min_z  L_z(f)
               s.t.  M_t(z), M_{t−d}((ε − Φ_d) z) ⪰ 0,
                     M_{t−t_s}(p_s z) ⪰ 0,  s = 1, ..., s_x,

where t_s = ⌈(deg p_s)/2⌉, s = 1, ..., s_x.

Corollary 3.5. Assume that the polynomials (p_s) ⊂ R[x] satisfy the Archimedean property. Then f_{dt}^ε ↑ f_d^ε as t → ∞. Moreover, if z* is an optimal solution of (3.10) and

(3.11)    rank M_t(z*) = rank M_{t−t_0}(z*) (=: r)

(where t_0 := max[d, max_s t_s]), then f_{dt}^ε = f_d^ε and one may extract r global minimizers x*_d(k) ∈ X, k = 1, ..., r. That is, for every k = 1, ..., r, f(x*_d(k)) = f_d^ε and Φ_d(x*_d(k)) ≤ ε.

However, given a minimizer x*_d ∈ X, even though Φ_d(x*_d) ≤ ε on the one hand, on the other hand it may not satisfy Φ(x*_d) ≤ 0. (Recall that checking whether Φ(x*_d) ≤ 0 can be done by solving the hierarchy of relaxations Q_ℓ(x) in (3.5) with x := x*_d.) If this happens, then one solves (3.10) again for a smaller value of ε, etc., until one obtains some x*_d ∈ X with Φ(x*_d) ≤ 0.

Finally, and as already mentioned, if d is relatively large, the size of the semidefinite relaxations (3.10) to compute f_{dt}^ε becomes too large for practical implementation (as one must have t ≥ d).
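To make the interplay between d and ε in Theorem 3.4(a) concrete, here is a small numeric sketch on a hypothetical toy instance. The true SDP-produced approximations Φ_d of Theorem 3.3 are mimicked by an artificial family Φ_d(x) = Φ(x) + 2^{−d} (an upper bound whose gap vanishes as d grows), since computing the real Φ_d would require solving (3.7) with a semidefinite solver:

```python
import numpy as np

# Hypothetical toy instance: f(x) = (x-2)^2 on X = [-2, 2], Phi(x) = x,
# so f* = 4 at x = 0.  Phi_d(x) = x + 2**(-d) stands in for the output
# of (3.7): it satisfies Phi_d >= Phi with approximation gap 2**(-d).
f = lambda x: (x - 2.0) ** 2
xs = np.linspace(-2.0, 2.0, 4001)
f_star, eps = 4.0, 0.1

f_d_eps = None
for d in range(1, 11):
    feas = xs[xs + 2.0 ** (-d) <= eps]   # relaxed constraint Phi_d(x) <= eps
    f_d_eps = f(feas).min()
    if 2.0 ** (-d) < eps:
        # Theorem 3.4(a): once the gap drops below eps, f_d^eps < f* + eps.
        assert f_d_eps < f_star + eps
print(f_d_eps)
```

For small d the relaxed feasible set is overly restricted and f_d^ε can exceed f* + ε, while once 2^{−d} < ε the bound of Theorem 3.4(a) holds; this is the behavior that motivates adjusting ε when d must stay small.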
So in practice one lets d be fixed at a small value, typically the smallest possible one, i.e., d = 1 (Φ_d quadratic) or d = 2 (Φ_d quartic), and one updates ε as indicated above. The resulting algorithm reads:

Algorithm.
Input: ℓ, k* ∈ N, ε_0 > 0 (e.g. ε_0 := 10^{−1}), d ∈ N, x̃ := ⋆, f(⋆) = +∞.
Output: f(x*_d) with x*_d ∈ X and Φ(x*_d) ≤ 0.
Step 1: Set k = 1 and ε(k) = 1.
Step 2: While k ≤ k*, solve P_d^{ε(k)} in (3.9) → x*_k ∈ X.
Step 3: Solve Q_ℓ(x*_k) in (3.5) → ρ_ℓ(x*_k).
  If −ε_0 ≤ ρ_ℓ(x*_k) ≤ 0, set x*_d := x*_k and STOP.
  If ρ_ℓ(x*_k) < −ε_0, then:
    • if f(x̃) > f(x*_k), set x̃ := x*_k; if k = k*, set x*_d := x*_k;
    • set ε(k+1) := 2ε(k), k := k+1, and go to Step 2.
  If ρ_ℓ(x*_k) > 0, then:
    • if k < k*, set ε(k+1) := ε(k)/2, k := k+1, and go to Step 2;
    • if k = k*, set x*_d := x̃.

Observe that in Step 2 of the above algorithm, one assumes that by solving P_d^{ε(k)} one obtains x*_k ∈ X.

3.4. Numerical experiments. We have taken Examples 2, 7, 9, K, M, N, all from Bhattacharjee et al. [2, Appendix A], whose data are polynomials, except for problem L. For the latter problem, the non-polynomial function x ↦ min[0, x_1 − x_2] is semi-algebraic and can be generated by introducing an additional variable x_3, with the polynomial constraints:

    x_3² = (x_1 − x_2)²;  x_3 ≥ 0.

Indeed, 2 min[0, x_1 − x_2] = x_1 − x_2 − x_3.

Although these examples are quite small, they are still nontrivial (and even difficult) to solve, and we wanted to test the above methodology with a small relaxation order d. In fact we have even considered the smallest possible d, i.e., d = 1 (Φ_d quadratic). Results in Table 1 are quite good, since by using the semidefinite relaxation of minimal order d one obtains an optimal value f*_d quite close to f*, at the price of updating ε several times.

Next, for Problem L, if we now increase d to d = 2, we improve the optimal value, which becomes f*_d = 0.3849 with ε = 2.2.
However, for Problem M, increasing d does not improve the optimal value.
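The ε-update loop of the algorithm in Section 3.3 can be sketched as follows, on a hypothetical toy instance chosen so that every iteration is easy to follow. The two semidefinite solves are replaced by cheap grid-based stand-ins (a grid search in place of solving P_d^ε, and a grid evaluation of Φ in place of the certificate Q_ℓ(x)); only the control logic of Steps 1-3 is meant to be faithful:

```python
import numpy as np

# Hypothetical toy instance: f(x) = (x-2)^2 on X = [-2, 2], Phi(x) = x
# (so f* = 4 at x = 0), and a fixed over-approximation
# Phi_d(x) = x + 0.3 standing in for the polynomial Phi_d of Theorem 3.3.
f = lambda x: (x - 2.0) ** 2
phi = lambda x: x                    # exact Phi: grid stand-in for Q_l(x)
phi_d = lambda x: x + 0.3            # upper-bound stand-in for Phi_d
xs = np.linspace(-2.0, 2.0, 4001)

def solve_P(eps):
    """Stand-in for solving P_d^eps: min f(x) s.t. Phi_d(x) <= eps, x in X."""
    feas = xs[phi_d(xs) <= eps]
    return feas[np.argmin(f(feas))]

eps0, k_max = 0.1, 20                # eps_0 and k* of the algorithm
eps, x_best = 1.0, None              # Step 1: eps(1) = 1
for k in range(k_max):               # Step 2: solve P_d^eps(k)
    x_k = solve_P(eps)
    rho = phi(x_k)                   # Step 3: certificate value rho(x_k)
    if -eps0 <= rho <= 0.0:          # feasible and nearly tight: STOP
        x_best = x_k
        break
    elif rho < -eps0:                # too conservative: keep best, double eps
        if x_best is None or f(x_k) < f(x_best):
            x_best = x_k
        eps *= 2.0
    else:                            # rho > 0, i.e. infeasible: halve eps
        eps /= 2.0

print(f(x_best))                     # close to f* = 4
```

On this instance the loop halves ε twice (the returned points are infeasible for the true constraint) before stopping at a point x with Φ(x) slightly below zero, returning a value of f close to f* = 4; the doubling branch would trigger if an iterate were feasible but too conservative.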