loading

Logout succeed

Logout succeed. See you again!

ebook img

Multiclass Learnability and the ERM Principle PDF

pages74 Pages
release year2011
file size0.66 MB
languageEnglish

Preview Multiclass Learnability and the ERM Principle

Multiclass Learnability and the ERM Principle Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz TheHebrewUniversityofJerusalem COLT 2011 Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 1/24 Surprise: The fundamental theorem is not true in the multiclass case. Then how can we learn in the multiclass setting? The Fundamental Theorem of Learning Theory Uniform ERM works Learnability Convergence Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 2/24 Then how can we learn in the multiclass setting? The Fundamental Theorem of Learning Theory Uniform ERM works Learnability Convergence Surprise: The fundamental theorem is not true in the multiclass case. Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 2/24 The Fundamental Theorem of Learning Theory Uniform ERM works Learnability Convergence Surprise: The fundamental theorem is not true in the multiclass case. Then how can we learn in the multiclass setting? Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 2/24 Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24

See more

The list of books you might like