Unsupervised Learning in Neuromemristive Systems

Cory Merkel and Dhireesha Kudithipudi
Department of Computer Engineering, Rochester Institute of Technology
Rochester, New York 14623-5603
Email: {cem1103,dxkeec}@rit.edu

Abstract—Neuromemristive systems (NMSs) currently represent the most promising platform to achieve energy-efficient, neuro-inspired computation. However, since the research field is less than a decade old, there are still countless algorithms and design paradigms to be explored within these systems. One particular domain that remains to be fully investigated within NMSs is unsupervised learning. In this work, we explore the design of an NMS for unsupervised clustering, which is a critical element of several machine learning algorithms. Using a simple memristor crossbar architecture and learning rule, we are able to achieve performance which is on par with MATLAB's k-means clustering.

I. INTRODUCTION

The present research explores multiple aspects of unsupervised learning in a class of neurologically-inspired computer architectures referred to as neuromemristive systems (NMSs). Although they are closely related, NMSs differ from neuromorphic systems, pioneered by Mead in the late 1980s [1], in two respects. First, they are designed using a mixture of CMOS and memristor technologies, which affords levels of connectivity and plasticity that are not achievable in neuromorphic systems. The second distinction, which is more subtle but equally important, is that NMSs focus on abstraction, rather than mimicry, of neurological processes. This abstraction generally improves the efficiency of the resulting hardware implementations.

NMS research and development took off rapidly in the late 2000s, coinciding with a growing interest in two-terminal memristors, which will be described briefly in Section II for unfamiliar readers. The utility of these systems has been demonstrated in various application domains, especially image processing and analysis; see [2] for a review. Learning in these systems has primarily been supervised. Although there are some examples of unsupervised learning in spike-based systems [3], it is relatively unexplored in non-spiking NMSs. One example where unsupervised learning in non-spiking networks has been demonstrated is [4], where principal component analysis (PCA) is used to reduce the dimensionality of images. However, the authors do not discuss any circuit for implementing the PCA algorithm in hardware.

In this research, we propose a non-spiking NMS design for unsupervised clustering. The NMS is tested on samples from the MNIST database of handwritten digits. To the best of our knowledge, this is the first work that explores general aspects of clustering in these systems. Given its ability to reduce data dimensionality and aid in classification, we believe that clustering is a key primitive for future NMSs. Furthermore, we believe this work will advance the state of unsupervised learning in NMSs and help others do the same.

II. BRIEF DESCRIPTION OF MEMRISTORS

Technically, a memristor can be defined as any two-terminal device with a state-dependent Ohm's law [5]. More concretely, a memristor is a thin film (I) sandwiched between a top (M) and a bottom (M) electrode. The stack is referred to as a metal-insulator-metal (MIM) structure because the film material is nominally insulating. That is, in its stoichiometric crystalline form it will have a large band gap and not enough free carriers to conduct. The film is made conductive by introducing defects into the crystalline structure, either through fabrication, applying an electric field, or both. Defects may be interstitial metallic ions which are oxidized at one electrode and then drift to the other, where they are reduced. Defects may also be vacancies, such as oxygen vacancies in a TiO2 film. In addition, defects may be changes in polarization, such as those in ferroelectric films, or even just changes in crystallinity, as in phase change memory. In some films, the defect profile can be gradually adjusted by applying electric fields for short durations, yielding incremental modifications to the film's overall conductance. In other films, only two conductance states can be reached. Moreover, there is usually a minimum amount of energy required to effect a change in the film's defect profile. This often translates to a threshold voltage which must be applied across the film to change its conductance.

Given the constant evolution of memristor technology, it makes little sense to design an NMS around any specific memristor device parameters. Instead, we assume devices will have these general characteristics: (1) a large minimum resistance value (e.g., in kΩs), (2) a large OFF/ON resistance ratio (at least 10^3), (3) high endurance (the ability to switch many times before failing), (4) high retention (non-volatility), and (5) incremental conductance states that can be reached by applying bipolar voltage pulses above a particular threshold voltage. All of these properties have been demonstrated in various devices; see [6] for a review.
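For illustration, these five assumed characteristics can be captured in a minimal behavioral sketch. The model below is ours, not one from the literature, and the parameter values (1 kΩ minimum resistance, 10^3 OFF/ON ratio, 1 V write threshold) are hypothetical, chosen only to satisfy properties (1)-(5):

```python
class Memristor:
    """Minimal behavioral sketch of a bipolar memristor satisfying the five
    assumed device properties. All parameter values are hypothetical."""

    def __init__(self, g_min=1e-6, g_max=1e-3, v_th=1.0, dg=1e-5):
        self.g_min = g_min  # OFF conductance -> R_OFF = 1 MOhm
        self.g_max = g_max  # ON conductance  -> R_ON = 1 kOhm (ratio 10^3)
        self.v_th = v_th    # write threshold voltage (V)
        self.dg = dg        # incremental conductance change per pulse (S)
        self.g = g_min      # current, non-volatile conductance state

    def current(self, v):
        """State-dependent Ohm's law: i = G(state) * v."""
        return self.g * v

    def apply_pulse(self, v):
        """Bipolar pulses beyond the threshold nudge the conductance
        incrementally; sub-threshold voltages leave the state unchanged."""
        if v > self.v_th:
            self.g = min(self.g + self.dg, self.g_max)
        elif v < -self.v_th:
            self.g = max(self.g - self.dg, self.g_min)
```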
III. CLUSTERING ALGORITHM DESIGN

Clustering algorithms uncover structure in a set of m unlabeled input vectors {u^(p)} by identifying M groups, or clusters, of vectors that are similar in some way. In one common approach, each cluster is represented by its centroid, so the clustering algorithm reduces to finding each of the M centroids. This can be achieved through a simple competitive learning algorithm: initialize M vectors w_i by assigning them to randomly-chosen input vectors. These will be referred to as weight vectors. Then, for each input vector, move the closest weight vector a little closer. After several iterations, the algorithm should converge with the weight vectors lying at (or close to) the centroids.

Of course, there are several parameters which must be defined, including a distance metric for measuring closeness. The most obvious choice is the ℓ2-norm. However, computing this is expensive in terms of hardware because it requires units for calculating squares and square roots. In addition, as we will discuss later, it is easy to use a high-density memristor circuit called a crossbar to compute dot products between input and weight vectors. Therefore, it is preferable to use a dot product as a distance metric. For example, if all of the vectors are normalized (||u^(p)|| = ||w_i|| = 1), then w_{i*} · u^(p) > w_i · u^(p) for all w_i ≠ w_{i*}, where w_{i*} is the closest weight vector to u^(p). However, the constraint that ||u^(p)|| = ||w_i|| = 1 creates a large overhead, because every input vector has to be normalized and every weight vector has to be re-normalized each time it is updated.

We propose the following solution: map each input vector to a vertex of a hypercube centered about the origin, u^(p) ∈ {−1,1}^N, where N is the dimensionality of the input space. Now, w_i · u^(p) will yield a scalar value d*_{i,p} between −N and +N. Moreover, this scalar value can be linearly transformed to a distance d_{i,p} which is the ℓ1-norm, or Manhattan distance, between the weight vector and the input:

    d_{i,p} \equiv N - d^*_{i,p} = \sum_{j=1}^{N} | w_{i,j} - u_j^{(p)} |.   (1)

Using this distance metric, we never need to re-normalize the weight vectors. Furthermore, mapping input vectors to hypercube vertices can usually be accomplished by thresholding. For example, grayscale images can be mapped by assigning −1 to pixel values from 0 to 127 and +1 to pixel values from 128 to 255.

Algorithm 1 summarizes the procedure:

Algorithm 1 Proposed clustering algorithm.
 1: Map inputs to hypercube vertices.
 2: Initialize weight vectors to random input vectors.
 3: for epoch = 1 : N_train do
 4:   for p = 1 : m do
 5:     d*_{i,p} = w_i · u^(p)   ∀ i = 1, 2, ..., M
 6:     x_i = 1 if d*_{i,p} = max_k(d*_{k,p}), and x_i = 0 otherwise   ∀ i = 1, 2, ..., M
 7:     ∆w_{i,j} = α x_i u_j^(p)   ∀ i = 1, 2, ..., M, ∀ j = 1, 2, ..., N
 8:   end for
 9: end for

The first two lines are initialization steps. Within the double for loop, x_i is set to 1 when weight vector i is the closest to the current input (called the winner) and to 0 otherwise. Then, the components of the winning weight vector are moved closer to the current input vector using a Hebbian update rule. The pre-factor α, which is called the learning rate, determines how far a weight vector moves each time it wins. Notice that this algorithm is completely unsupervised: there are no labeled input vectors.
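As a behavioral reference point for the hardware described next, Algorithm 1 transcribes directly into a few lines of NumPy. This is an illustrative sketch only, not the circuit model; the function name, seeding, and the hypothetical `images` array in the usage comment are ours:

```python
import numpy as np

def cluster(U, M, n_epochs, alpha, seed=0):
    """Direct transcription of Algorithm 1.

    U: m x N array of inputs already mapped to hypercube vertices ({-1, +1}).
    Returns the M x N array of weight vectors (approximate cluster centroids)."""
    m, N = U.shape
    rng = np.random.default_rng(seed)
    # Line 2: initialize weight vectors to randomly chosen input vectors.
    W = U[rng.choice(m, size=M, replace=False)].astype(float)
    for epoch in range(n_epochs):            # line 3
        for p in range(m):                   # line 4
            d_star = W @ U[p]                # line 5: dot-product similarity
            x = np.zeros(M)
            x[np.argmax(d_star)] = 1.0       # line 6: winner-takes-all
            W += alpha * np.outer(x, U[p])   # line 7: Hebbian update (winner only)
    return W

# Example usage with the thresholding described above, assuming a hypothetical
# array `images` of grayscale images with pixel values 0-255:
# U = np.where(images.reshape(len(images), -1) >= 128, 1.0, -1.0)
# W = cluster(U, M=10, n_epochs=500, alpha=0.005)
```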
IV. NMS HARDWARE DESIGN

The unsupervised clustering algorithm discussed in Section III can be implemented efficiently in an NMS by representing weight vectors as memristor conductances. A block diagram of the proposed design is shown in Figure 1. The inputs, which are represented as positive and negative currents, are fed through M crossbar circuits. Together with a non-inverting summing amplifier (represented as a circle), each crossbar computes the distance between the current input and the weight vector represented by its memristors' conductances.

Fig. 1. Block diagram of proposed NMS for unsupervised clustering. [Figure: inputs u_1, ..., u_N feed memristor crossbars with distance calculation, followed by a winner-takes-all (WTA) stage producing x_1, ..., x_M and a weight-update block.]

The configuration of the crossbar and summing amplifier is shown in Figure 2. Memristors in the top row inhibit, or contribute a negative component to, the output, while memristors in the bottom row excite, or contribute a positive component to, the output. Therefore, each crossbar column represents one component of one weight vector w_i, which can be positive or negative. If we assume that the op amp has a high open-loop gain and the wire resistances are small, then

    v_{d^*_{i,p}} = \sum_{j=1}^{N} i_{u_j^{(p)}} R \left. \frac{G_1 - G_2}{G_1 + G_2} \right|_{i,j},   (2)

where G_1 and G_2 are the top and bottom memristors in each column, respectively. The output of the circuit is a voltage representation of the distance between the current input and the weight vector represented by the crossbar.

Fig. 2. Crossbar and summing amplifier circuit for computing the distance between the input and a weight vector. [Figure: input currents i_{u_1^(p)}, ..., i_{u_N^(p)} enter the crossbar for w_i; write voltages v_{w_{i,1}}, ..., v_{w_{i,N}} are gated by train_en; a summing amplifier with feedback resistance R outputs v_{d*_{i,p}}.]
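Numerically, equation (2) amounts to a dot product between the input currents and a signed, normalized weight derived from each column's conductance pair. The sketch below follows our reconstruction of the fraction in (2); the function names and the feedback resistance value are hypothetical:

```python
import numpy as np

def crossbar_distance_voltage(i_u, G1, G2, R=1e3):
    """Sketch of equation (2): output voltage of the crossbar for weight
    vector w_i given input p.

    i_u: length-N array of input currents i_{u_j^(p)}.
    G1, G2: length-N arrays of top and bottom column conductances.
    R: summing-amplifier feedback resistance (value hypothetical)."""
    # Each conductance pair realizes one signed weight component in [-1, 1];
    # the sign convention follows equation (2) as reconstructed here.
    w_eff = (G1 - G2) / (G1 + G2)
    return np.sum(i_u * R * w_eff)
```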
The weight vectors are modified by connecting the columns to write voltages v_{w_{i,j}} using a training enable signal, train_en. The write voltages are determined by the value of ∆w_{i,j} in line 7 of Algorithm 1. Specifically, if ∆w_{i,j} is negative, then v_{w_{i,j}} will be a negative voltage below the memristor's write threshold, and if ∆w_{i,j} is positive, then v_{w_{i,j}} will be a positive voltage above the memristor's write threshold. Otherwise, the write voltage is zero.

So far, we have only discussed the memristor crossbar and distance calculation parts of Figure 1 (line 5 in Algorithm 1). The winner-takes-all circuit (line 6 in Algorithm 1) can be implemented in a number of ways; in this work, we used the current-mode design described in [7]. Finally, the weight update (line 7 in Algorithm 1) can be computed using simple combinational logic circuits.

V. CLUSTERING MNIST IMAGES

One exciting application of the proposed hardware is automatically identifying clusters in sets of images. We took 1000 images (m = 1000) from the MNIST handwritten digit dataset and clustered them using a behavioral model of the NMS described in the last section. Each image was originally 20×20 grayscale pixels (N = 400). The images were mapped to hypercube vertices using the thresholding approach discussed earlier. In addition, we used 10 clusters (M = 10), 500 training epochs (N_train = 500), and α = 0.005. The results are shown in Figure 3, where we have plotted the weight vectors representing the centroid of each cluster.

Fig. 3. 10 cluster centroids found in a set of 1000 MNIST images using the proposed NMS.

Figure 4 shows the cost versus the training epoch, where the cost is defined as

    J = \sum_{p=1}^{m} \min_i d_{i,p}.   (3)

We see that the cost function for the proposed NMS approaches that of MATLAB's built-in k-means clustering after 500 epochs.

Fig. 4. Cost function versus epoch while clustering MNIST images using the proposed NMS. [Figure: J (×10^5) versus epoch for MATLAB's k-means and the proposed design, over 500 epochs.]
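For reference, the cost in equation (3) can be evaluated directly from the weight matrix by using the identity d_{i,p} = N − w_i · u^(p) from equation (1). A minimal sketch (ours, matching the arrays used in the earlier transcription of Algorithm 1):

```python
import numpy as np

def clustering_cost(U, W):
    """Sketch of equation (3): sum, over all m inputs, of the l1 distance to
    the nearest weight vector. U is m x N in {-1, +1}; W is M x N."""
    m, N = U.shape
    D = N - U @ W.T               # m x M matrix of distances d_{i,p} via eq. (1)
    return float(D.min(axis=1).sum())
```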
VI. CONCLUSIONS

The goal of this work was to explore both algorithmic and hardware design aspects of unsupervised learning in NMSs. To that end, we proposed a clustering algorithm that maps inputs to vertices of a hypercube and then iteratively finds cluster centroids using a Hebbian learning rule. We argue (although we have not proven) that the proposed algorithm can be implemented more efficiently in an NMS than algorithms that use either the ℓ2-norm or cosine similarity as a distance function. The algorithm was implemented in a custom NMS design that leverages crossbar circuits to compute the distance between inputs and weight vectors. To test our design, we clustered 1000 MNIST images and found the results to be consistent with MATLAB's k-means clustering implementation.

REFERENCES

[1] C. Mead, "Neuromorphic electronic systems," Proceedings of the IEEE, vol. 78, no. 10, pp. 1629–1636, 1990.
[2] D. Kudithipudi, C. Merkel, M. Soltiz, G. S. Rose, and R. Pino, "Design of neuromorphic architectures with memristors," in Network Science and Cybersecurity, R. Pino, Ed. Springer, 2014, pp. 93–103.
[3] S. Yu, B. Gao, Z. Fang, H. Yu, J. Kang, and H.-S. P. Wong, "A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation," Advanced Materials, vol. 25, no. 12, pp. 1774–1779, Mar. 2013.
[4] S. Choi, P. Sheridan, and W. D. Lu, "Data clustering using memristor networks," Scientific Reports, vol. 5, p. 10492, Jan. 2015.
[5] L. Chua, "Resistance switching memories are memristors," Applied Physics A, vol. 102, no. 4, pp. 765–783, Jan. 2011.
[6] D. Kuzum, S. Yu, and H.-S. P. Wong, "Synaptic electronics: materials, devices and applications," Nanotechnology, vol. 24, no. 38, p. 382001, Sep. 2013.
[7] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, "Winner-take-all networks of O(N) complexity," in Advances in Neural Information Processing Systems, 1988, pp. 703–711.