
Tree structured sparse coding on cubes

Arthur Szlam
City College of New York
[email protected]

arXiv:1301.3590v1 [cs.IT] 16 Jan 2013

Several recent works have discussed tree structured sparse coding [8, 10, 7, 3], where N data points in R^d, written as the d × N matrix X, are approximately decomposed into the product of matrices WZ. Here W is a d × K dictionary matrix, and Z is a K × N matrix of coefficients. In tree structured sparse coding, the rows of Z correspond to nodes on a tree, and the columns of Z are encouraged to be nonzero on only a few branches of the tree; or alternatively, the columns are constrained to lie on at most a specified number of branches of the tree.

When viewed from a geometric perspective, this kind of decomposition is a "wavelet analysis" of the data points in X [9, 6, 11, 1]. As each row in Z is associated to a column of W, the columns of W also take a tree structure. The decomposition corresponds to a multiscale clustering of the data, where the scale of the clustering is given by the depth in the tree, and cluster membership corresponds to activation of a row in Z. The root node rows of Z correspond to the whole data set, and the root node columns of W are a best fit linear representation of X. The set of rows of Z corresponding to each node specifies a cluster: a data point x is in that cluster if it has active responses in those rows. The set of columns of W corresponding to a node specifies a linear correction to the best fit subspace defined by the node's ancestors; the correction is valid on the corresponding cluster.

Here we discuss the analogous construction on the binary cube {-1, 1}^d. Linear best fit is replaced by best fit subcubes.

1 The construction on the cube

1.1 Setup

We are given N data points in B^d = {-1, 1}^d, written as the d × N binary matrix X. Our goal is to decompose X as a tree of subcubes and "subcube corrections". A q-dimensional subcube C = C_{c, I^r} of B^d is determined by a point c in B^d, along with a set of d - q restricted indices I^r = {r_1, ..., r_{d-q}}. The cube C_{c, I^r} consists of the points b in B^d such that b_{r_i} = c_{r_i} for all r_i in I^r, that is

$$C_{c, I^r} = \{\, b \in B^d \ \text{s.t.}\ b_{r_i} = c_{r_i} \ \forall\, r_i \in I^r \,\}.$$

The unrestricted indices I^u = {1, ..., d} \ I^r can take on either value.

1.2 The construction

Here I will describe a simple version of the construction where each node of the tree fixes the same number q of additional coordinates, so that a node at depth i corresponds to a (d - iq)-dimensional subcube, and a hard binary clustering is used at each stage. Suppose our tree has depth l. Then the construction consists of

1. a tree structured clustering of X into sets X_{ij} at depth (scale) i in {1, ..., l} such that

$$\bigcup_j X_{ij} = X,$$

2. and cluster representatives (that is, (d - iq)-dimensional subcubes) C_{c_{ij}, I^r_{ij}} such that the restricted sets have the property that if ij is an ancestor of i'j',

$$I^r_{ij} \cap I^r_{i'j'} = I^r_{ij}, \qquad \text{and} \qquad c_{ij}(s) = c_{i'j'}(s) \ \text{for all}\ s \in I^r_{ij}.$$

Here each c_{ij} is a vector in B^d; the complete set of c_{ij} roughly corresponds to W from before. However, note that each c_{ij} has precisely iq entries that actually matter; and moreover, because of the nested equalities, the leaf nodes carry all the information on the branch. This is not to say that the tree structure is not important or not used; it is, as the leaf nodes have to share coordinates. However, once the full construction is specified, the leaf representatives are all that is necessary to code a data point.

1.3 Algorithms

We can build the partitions and representatives starting from the root and descending down the tree as follows: first, find the best fit (d - q)-dimensional subcube for the whole data set. This is given by a coordinate-wise mode; the free coordinates are the ones with the largest average discrepancy from their modes. Remove the q fixed coordinates from consideration. Cluster the reduced ((d - q)-dimensional) data using K-means with K = 2; on each cluster find the best fit (d - q) - q dimensional cube. Continue to the leaves.
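As a concrete illustration of this greedy descent, the following Python/NumPy sketch builds the tree of subcubes. It is a minimal sketch under my own assumptions rather than the author's implementation: the data are stored with one point per row (an N × d array with entries in {-1, +1}, transposed relative to the d × N convention above), the 2-means step uses a plain random initialization (the experiments in Section 2 instead use randomized farthest insertion [2]), and the names fix_coordinates, two_means, and build_tree are invented for this sketch.

```python
import numpy as np


def fix_coordinates(X, free, q):
    """Best fit subcube step: among the currently free coordinates of the
    point set X (rows in {-1,+1}), fix the q whose coordinate-wise mode has
    the smallest average discrepancy; the rest stay free."""
    modes = np.where(X[:, free].mean(axis=0) >= 0, 1, -1)
    discrepancy = (X[:, free] != modes).mean(axis=0)
    order = np.argsort(discrepancy)  # most nearly constant coordinates first
    return free[order[:q]], modes[order[:q]], free[order[q:]]


def two_means(X, cols, rng, n_iter=10):
    """Hard 2-means on the given columns (plain Lloyd iterations with random
    initialization; the paper's experiments use farthest insertion [2])."""
    Y = X[:, cols].astype(float)
    centers = Y[rng.choice(len(Y), size=2, replace=False)]
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = Y[labels == k].mean(axis=0)
    return labels


def build_tree(X, q, depth, free=None, rng=None):
    """Greedy top-down construction: each node fixes q more coordinates on its
    cluster, then the cluster is split in two on the remaining coordinates.
    Each node records which coordinates it fixes and to what values; the union
    of these along a root-to-leaf path is the leaf's restricted set and c."""
    rng = np.random.default_rng(0) if rng is None else rng
    free = np.arange(X.shape[1]) if free is None else free
    fixed, values, still_free = fix_coordinates(X, free, q)
    node = {"fixed": fixed, "values": values, "children": []}
    if depth > 1 and len(still_free) >= q and len(X) > 1:
        labels = two_means(X, still_free, rng)
        for k in range(2):
            if (labels == k).any():
                node["children"].append(
                    build_tree(X[labels == k], q, depth - 1, still_free, rng))
    return node
```

Under these assumptions, the setting of Section 2 would correspond roughly to build_tree(X, q=80, depth=9) on a 60000 × 784 array.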
1.3.1 Refinement

The terms C_{c_{ij}, I^r_{ij}} and X_{ij} can be updated with a Lloyd-type alternation. With all of the X_{ij} fixed, loop through each C_{c_{ij}, I^r_{ij}} from the root of the tree, finding the best subcubes at each scale for the current partition. Now update the partition so that each x is sent to its best fit leaf cube.

1.3.2 Adaptive q, l, etc.

In [1], one of the important points is that many of the model parameters, including q, l, and the number of clusters, could be determined in a principled way. While it is possible that some of their analysis may carry over to this setting, this has not yet been done. However, instead of fixing q, we can fix a percentage of the energy to be kept at each level, and choose the number of free coordinates accordingly.

2 Experiments

We binarize the MNIST training data by thresholding to obtain X. Here d = 28^2 = 784 and N = 60000. We replace 70% of the entries in X with noise sampled uniformly from {-1, 1}, and train a tree structured cube dictionary with q = 80 and depth l = 9. The subdivision scheme used to generate the multiscale clustering is 2-means initialized via randomized farthest insertion [2]; this means we can cycle spin over the dictionaries [5] to get many different reconstructions to average over. In this experiment the reconstruction was performed 50 times for the noise realization. The results are visualized in Figure 1.

Figure 1: Results of denoising using the tree structured coding. The top left image is the first 64 binarized MNIST digits after replacing 70% of the data matrix with uniform noise. The top right image is recovered using a binary tree of depth l = 9 and q = 90, and 100 cycle spins; hence the non-binary output, as the final result is the average of the random clustering initializations (of course with the same noise realization). The bottom left image is recovered using robust PCA [4], for comparison. The bottom right is the true binary data.
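To illustrate how the trained tree is used in the reconstruction above, here is a companion sketch to the build_tree example from Section 1.3; again this is my own hedged reading, not the author's code. It assumes each point is coded by the leaf cube with the fewest disagreements on its restricted coordinates, that those restricted coordinates are overwritten with the leaf's values while the free coordinates keep their observed values, and that averaging over several randomly initialized trees plays the role of the cycle spinning over dictionaries; the names leaf_cubes and denoise are invented.

```python
import numpy as np


def leaf_cubes(node, fixed=(), values=()):
    """Collect, for every leaf, the restricted coordinates and their values
    accumulated along the root-to-leaf path (the leaf representative)."""
    fixed = np.concatenate([np.asarray(fixed, dtype=int), node["fixed"]])
    values = np.concatenate([np.asarray(values, dtype=int), node["values"]])
    if not node["children"]:
        return [(fixed, values)]
    leaves = []
    for child in node["children"]:
        leaves.extend(leaf_cubes(child, fixed, values))
    return leaves


def denoise(X_noisy, trees):
    """Send each point to its best fit leaf cube of each tree, overwrite the
    restricted coordinates with the leaf's values, and average over trees."""
    out = np.zeros(X_noisy.shape, dtype=float)
    for tree in trees:
        leaves = leaf_cubes(tree)
        rec = X_noisy.astype(float)
        for x, r in zip(X_noisy, rec):
            errors = [(x[f] != v).sum() for f, v in leaves]
            f, v = leaves[int(np.argmin(errors))]
            r[f] = v  # free coordinates keep their observed values
        out += rec
    return out / len(trees)


# One noise realization, many random tree initializations (the "cycle spin"):
# trees = [build_tree(X_noisy, 80, 9, rng=np.random.default_rng(s)) for s in range(50)]
# X_hat = denoise(X_noisy, trees)
```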

References

[1] W. Allard, G. Chen, and M. Maggioni. Multiscale geometric methods for data sets II: Geometric multi-resolution analysis. To appear in Applied and Computational Harmonic Analysis.

[2] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, pages 1027-1035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.

[3] Richard G. Baraniuk, Volkan Cevher, Marco F. Duarte, and Chinmay Hegde. Model-based compressive sensing. Dec 2009.

[4] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3):11, 2011.

[5] R. R. Coifman and D. L. Donoho. Translation-invariant de-noising. Technical report, Department of Statistics, 1995.

[6] G. David and S. Semmes. Singular integrals and rectifiable sets in R^n: au-delà des graphes lipschitziens. Astérisque, 193:1-145, 1991.

[7] Laurent Jacob, Guillaume Obozinski, and Jean-Philippe Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 433-440, New York, NY, USA, 2009. ACM.

[8] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In International Conference on Machine Learning (ICML), 2010.

[9] P. W. Jones. Rectifiable sets and the traveling salesman problem. Invent. Math., 102(1):1-15, 1990.

[10] Seyoung Kim and Eric P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, pages 543-550, 2010.

[11] Gilad Lerman. Quantifying curvelike structures of measures by using L2 Jones quantities. Comm. Pure Appl. Math., 56(9):1294-1365, 2003.
