
Tree structured sparse coding on cubes

Arthur Szlam
City College of New York
[email protected]

arXiv:1301.3590v1 [cs.IT] 16 Jan 2013

Several recent works have discussed tree structured sparse coding [8, 10, 7, 3], where N data points in R^d, written as the d × N matrix X, are approximately decomposed into the product of matrices WZ. Here W is a d × K dictionary matrix, and Z is a K × N matrix of coefficients. In tree structured sparse coding, the rows of Z correspond to nodes on a tree, and the columns of Z are encouraged to be nonzero on only a few branches of the tree; or alternatively, the columns are constrained to lie on at most a specified number of branches of the tree.

When viewed from a geometric perspective, this kind of decomposition is a "wavelet analysis" of the data points in X [9, 6, 11, 1]. As each row in Z is associated to a column of W, the columns of W also take a tree structure. The decomposition corresponds to a multiscale clustering of the data, where the scale of the clustering is given by the depth in the tree, and cluster membership corresponds to activation of a row in Z. The root node rows of Z correspond to the whole data set, and the root node columns of W are a best fit linear representation of X. The set of rows of Z corresponding to each node specifies a cluster: a data point x is in that cluster if it has active responses in those rows. The set of columns of W corresponding to a node specifies a linear correction to the best fit subspace defined by the node's ancestors; the correction is valid on the corresponding cluster.

Here we discuss the analogous construction on the binary cube {-1, 1}^d. Linear best fit is replaced by best fit subcubes.

1 The construction on the cube

1.1 Setup

We are given N data points in B^d = {-1, 1}^d, written as the d × N binary matrix X. Our goal is to decompose X as a tree of subcubes and "subcube corrections". A q-dimensional subcube C = C_{c, I^r} of B^d is determined by a point c in B^d, along with a set of d - q restricted indices I^r = {r_1, ..., r_{d-q}}. The cube C_{c, I^r} consists of the points b in B^d such that b_{r_i} = c_{r_i} for all r_i in I^r, that is

$$C_{c, I^r} = \{\, b \in B^d \ \text{s.t.}\ b_{r_i} = c_{r_i} \ \forall\, r_i \in I^r \,\}.$$

The unrestricted indices I^u = {1, ..., d} \ I^r can take on either value.

1.2 The construction

Here I will describe a simple version of the construction where each node of the tree fixes the same number q of additional coordinates, so that a node at depth i corresponds to a (d - iq)-dimensional subcube, and a hard binary clustering is used at each stage. Suppose our tree has depth l. Then the construction consists of

1. a tree structured clustering of X into sets X_{ij} at depth (scale) i in {1, ..., l} such that

$$\bigcup_j X_{ij} = X,$$

2. and cluster representatives (that is, (d - iq)-dimensional subcubes) C_{c_{ij}, I^r_{ij}} such that the restricted sets have the property that if ij is an ancestor of i'j',

$$I^r_{ij} \cap I^r_{i'j'} = I^r_{ij}, \qquad \text{and} \qquad c_{ij}(s) = c_{i'j'}(s) \ \text{for all}\ s \in I^r_{ij}.$$

Here each c_{ij} is a vector in B^d; the complete set of c_{ij} roughly corresponds to W from before. However, note that each c_{ij} has precisely iq entries that actually matter; and moreover, because of the nested equalities, the leaf nodes carry all the information on the branch. This is not to say that the tree structure is not important or not used; it is, as the leaf nodes have to share coordinates. However, once the full construction is specified, the leaf representatives are all that is necessary to code a data point.

1.3 Algorithms

We can build the partitions and representatives starting from the root and descending down the tree as follows: first, find the best fit (d - q)-dimensional subcube for the whole data set. This is given by a coordinate-wise mode; the free coordinates are the ones with the largest average discrepancy from their modes. Remove the q fixed coordinates from consideration. Cluster the reduced ((d - q)-dimensional) data using K-means with K = 2; on each cluster find the best fit (d - q) - q dimensional cube. Continue to the leaves.
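As a concrete illustration of this greedy descent, the following Python/NumPy sketch builds the tree of subcubes. It is a minimal sketch under my own assumptions rather than the author's implementation: the data are stored with one point per row (an N × d array with entries in {-1, +1}, transposed relative to the d × N convention above), the 2-means step uses a plain random initialization (the experiments in Section 2 instead use randomized farthest insertion [2]), and the names fix_coordinates, two_means, and build_tree are invented for this sketch.

```python
import numpy as np


def fix_coordinates(X, free, q):
    """Best fit subcube step: among the currently free coordinates of the
    point set X (rows in {-1,+1}), fix the q whose coordinate-wise mode has
    the smallest average discrepancy; the rest stay free."""
    modes = np.where(X[:, free].mean(axis=0) >= 0, 1, -1)
    discrepancy = (X[:, free] != modes).mean(axis=0)
    order = np.argsort(discrepancy)  # most nearly constant coordinates first
    return free[order[:q]], modes[order[:q]], free[order[q:]]


def two_means(X, cols, rng, n_iter=10):
    """Hard 2-means on the given columns (plain Lloyd iterations with random
    initialization; the paper's experiments use farthest insertion [2])."""
    Y = X[:, cols].astype(float)
    centers = Y[rng.choice(len(Y), size=2, replace=False)]
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = Y[labels == k].mean(axis=0)
    return labels


def build_tree(X, q, depth, free=None, rng=None):
    """Greedy top-down construction: each node fixes q more coordinates on its
    cluster, then the cluster is split in two on the remaining coordinates.
    Each node records which coordinates it fixes and to what values; the union
    of these along a root-to-leaf path is the leaf's restricted set and c."""
    rng = np.random.default_rng(0) if rng is None else rng
    free = np.arange(X.shape[1]) if free is None else free
    fixed, values, still_free = fix_coordinates(X, free, q)
    node = {"fixed": fixed, "values": values, "children": []}
    if depth > 1 and len(still_free) >= q and len(X) > 1:
        labels = two_means(X, still_free, rng)
        for k in range(2):
            if (labels == k).any():
                node["children"].append(
                    build_tree(X[labels == k], q, depth - 1, still_free, rng))
    return node
```

Under these assumptions, the setting of Section 2 would correspond roughly to build_tree(X, q=80, depth=9) on a 60000 × 784 array.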
1.3.1 Refinement

The terms C_{c_{ij}, I^r_{ij}} and X_{ij} can be updated with a Lloyd-type alternation. With all of the X_{ij} fixed, loop through each C_{c_{ij}, I^r_{ij}} from the root of the tree, finding the best subcubes at each scale for the current partition. Now update the partition so that each x is sent to its best fit leaf cube.

1.3.2 Adaptive q, l, etc.

In [1], one of the important points is that many of the model parameters, including q, l, and the number of clusters, could be determined in a principled way. While it is possible that some of their analysis may carry over to this setting, this has not yet been done. However, instead of fixing q, we can fix a percentage of the energy to be kept at each level, and choose the number of free coordinates accordingly.

2 Experiments

We binarize the MNIST training data by thresholding to obtain X. Here d = 28^2 = 784 and N = 60000. We replace 70% of the entries in X with noise sampled uniformly from {-1, 1}, and train a tree structured cube dictionary with q = 80 and depth l = 9. The subdivision scheme used to generate the multiscale clustering is 2-means initialized via randomized farthest insertion [2]; this means we can cycle spin over the dictionaries [5] to get many different reconstructions to average over. In this experiment the reconstruction was performed 50 times for the noise realization. The results are visualized in Figure 1.

Figure 1: Results of denoising using the tree structured coding. The top left image is the first 64 binarized MNIST digits after replacing 70% of the data matrix with uniform noise. The top right image is recovered using a binary tree of depth l = 9 and q = 90, and 100 cycle spins; hence the non-binary output, as the final result is the average of the random clustering initializations (of course with the same noise realization). The bottom left image is recovered using robust PCA [4], for comparison. The bottom right is the true binary data.
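To illustrate how the trained tree is used in the reconstruction above, here is a companion sketch to the build_tree example from Section 1.3; again this is my own hedged reading, not the author's code. It assumes each point is coded by the leaf cube with the fewest disagreements on its restricted coordinates, that those restricted coordinates are overwritten with the leaf's values while the free coordinates keep their observed values, and that averaging over several randomly initialized trees plays the role of the cycle spinning over dictionaries; the names leaf_cubes and denoise are invented.

```python
import numpy as np


def leaf_cubes(node, fixed=(), values=()):
    """Collect, for every leaf, the restricted coordinates and their values
    accumulated along the root-to-leaf path (the leaf representative)."""
    fixed = np.concatenate([np.asarray(fixed, dtype=int), node["fixed"]])
    values = np.concatenate([np.asarray(values, dtype=int), node["values"]])
    if not node["children"]:
        return [(fixed, values)]
    leaves = []
    for child in node["children"]:
        leaves.extend(leaf_cubes(child, fixed, values))
    return leaves


def denoise(X_noisy, trees):
    """Send each point to its best fit leaf cube of each tree, overwrite the
    restricted coordinates with the leaf's values, and average over trees."""
    out = np.zeros(X_noisy.shape, dtype=float)
    for tree in trees:
        leaves = leaf_cubes(tree)
        rec = X_noisy.astype(float)
        for x, r in zip(X_noisy, rec):
            errors = [(x[f] != v).sum() for f, v in leaves]
            f, v = leaves[int(np.argmin(errors))]
            r[f] = v  # free coordinates keep their observed values
        out += rec
    return out / len(trees)


# One noise realization, many random tree initializations (the "cycle spin"):
# trees = [build_tree(X_noisy, 80, 9, rng=np.random.default_rng(s)) for s in range(50)]
# X_hat = denoise(X_noisy, trees)
```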

References

[1] W. Allard, G. Chen, and M. Maggioni. Multiscale geometric methods for data sets II: Geometric multi-resolution analysis. To appear in Applied and Computational Harmonic Analysis.

[2] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, pages 1027-1035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.

[3] Richard G. Baraniuk, Volkan Cevher, Marco F. Duarte, and Chinmay Hegde. Model-based compressive sensing. Dec 2009.

[4] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3):11, 2011.

[5] R. R. Coifman and D. L. Donoho. Translation-invariant de-noising. Technical report, Department of Statistics, 1995.

[6] G. David and S. Semmes. Singular integrals and rectifiable sets in R^n: au-delà des graphes lipschitziens. Astérisque, 193:1-145, 1991.

[7] Laurent Jacob, Guillaume Obozinski, and Jean-Philippe Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 433-440, New York, NY, USA, 2009. ACM.

[8] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In International Conference on Machine Learning (ICML), 2010.

[9] P. W. Jones. Rectifiable sets and the traveling salesman problem. Invent. Math., 102(1):1-15, 1990.

[10] Seyoung Kim and Eric P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, pages 543-550, 2010.

[11] Gilad Lerman. Quantifying curvelike structures of measures by using L2 Jones quantities. Comm. Pure Appl. Math., 56(9):1294-1365, 2003.
