
Latent Relation Representations for Universal Schemas

Sebastian Riedel
Department of Computer Science, University College London
[email protected]

Limin Yao, Andrew McCallum
Department of Computer Science, University of Massachusetts at Amherst
{lmyao,mccallum}@cs.umass.edu

1 Introduction

Supervised relation extraction uses a pre-defined schema of relation types (such as born-in or employed-by). This approach requires labeling textual relations, a time-consuming and difficult process. This has led to significant interest in distantly-supervised learning. Here one aligns existing database records with the sentences in which these records have been "rendered", and from this labeling one can train a machine learning system as before [1, 2]. However, this method relies on the availability of a large database that has the desired schema.

The need for pre-existing databases can be avoided by not having any fixed schema. This is the approach taken by OpenIE [3]. Here surface patterns between mentions of concepts serve as relations. This approach requires no supervision and has tremendous flexibility, but lacks the ability to generalize. For example, OpenIE may find FERGUSON–historian-at–HARVARD but does not know FERGUSON–is-a-professor-at–HARVARD.

One way to gain generalization is to cluster textual surface forms that have similar meaning [4, 5, 6, 7]. While the clusters discovered by all these methods usually contain semantically related items, closer inspection invariably shows that they do not provide reliable implicature. For example, a cluster may include historian-at, professor-at, scientist-at, and worked-at. However, scientist-at does not necessarily imply professor-at, and worked-at certainly does not imply scientist-at. In fact, we contend that any relational schema would inherently be brittle and ill-defined, having ambiguities, problematic boundary cases, and incompleteness.

In response to this problem, we present a new approach: implicature with universal schemas. Here we embrace the diversity and ambiguity of original inputs. This is accomplished by defining our schema to be the union of all source schemas: original input forms, e.g. variants of surface patterns similarly to OpenIE, as well as relations in the schemas of pre-existing structured databases. But unlike OpenIE, we learn asymmetric implicature among relations and entity types. This allows us to probabilistically "fill in" inferred unobserved entity-entity relations in this union. For example, after observing FERGUSON–historian-at–HARVARD, our system infers that FERGUSON–professor-at–HARVARD, but not vice versa.

At the heart of our approach is the hypothesis that we should concentrate on predicting source data, a relatively well-defined task that can be evaluated and optimized, as opposed to modeling semantic equivalence, which we believe will always be elusive.

To reason with a universal schema, we learn latent feature representations of relations, tuples, and entities. These act, through dot products, as natural parameters of a log-linear model for the probability that a given relation holds for a given tuple. We show experimentally that this approach significantly outperforms a comparable baseline without latent features, as well as the current state-of-the-art distant supervision method.

2 Model

We use $R$ to denote the set of relations we seek to predict (such as works-written in Freebase, or the X–heads–Y pattern), and $T$ to denote the set of input tuples. For simplicity we assume each relation to be binary. Given a relation $r \in R$ and a tuple $t \in T$, the pair $\langle r, t \rangle$ is a fact, or relation instance. The input to our model is a set $O$ of observed facts, and the observed facts for a given tuple $t$ are $O_t := \{\langle r, t \rangle \in O\}$.
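To make this input representation concrete, the following minimal Python sketch (our illustration, not the authors' code; the relation names are hypothetical placeholders) builds the observed fact set $O$ and groups it by tuple to obtain $O_t$:

```python
from collections import defaultdict

# O: observed facts <r, t>, where r may be a surface pattern or a relation
# from a structured source, and t is an entity tuple. Relation names here
# are hypothetical placeholders, not actual Freebase identifiers.
observed_facts = {
    ("historian-at", ("FERGUSON", "HARVARD")),         # surface pattern
    ("employment/employer", ("FERGUSON", "HARVARD")),  # structured relation
}

# O_t: observed facts grouped by tuple, i.e. the observed cells of the
# tuple-by-relation matrix that the models below will complete.
facts_by_tuple = defaultdict(set)
for relation, entity_tuple in observed_facts:
    facts_by_tuple[entity_tuple].add(relation)

print(facts_by_tuple[("FERGUSON", "HARVARD")])
```

Each tuple thus corresponds to one sparsely observed row of a tuple-by-relation matrix, which sets up the completion problem described next.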
Our goal is a model that can estimate, for a given relation $r$ (such as X–historian-at–Y) and a given tuple $t$ (such as $\langle$FERGUSON, HARVARD$\rangle$), a score $c_{r,t}$ for the fact $\langle r, t \rangle$. This matrix completion problem is related to collaborative filtering. We can think of each tuple as a customer, and each relation as a product. Our goal is to predict how the tuple rates the relation (rating 0 = false, rating 1 = true), based on the observed ratings in $O$. We interpret $c_{r,t}$ as the probability $p(y_{r,t} = 1)$, where $y_{r,t}$ is a binary random variable that is true iff $\langle r, t \rangle$ holds. To this end we introduce a series of exponential family models inspired by generalized PCA [8], a probabilistic generalization of Principal Component Analysis. These models estimate the confidence in $\langle r, t \rangle$ using a natural parameter $\theta_{r,t}$ and the logistic function:

$$c_{r,t} := p(y_{r,t} = 1 \mid \theta_{r,t}) := \frac{1}{1 + \exp(-\theta_{r,t})}.$$

We follow [9] and use a ranking-based objective function to estimate the parameters of our models.

Latent Feature Model. One way to define $\theta_{r,t}$ is through a latent feature model F. We measure compatibility between relation $r$ and tuple $t$ as a dot product of two latent feature representations of size $K^F$: $a_r$ for relation $r$, and $v_t$ for tuple $t$. This gives

$$\theta^F_{r,t} := \sum_{k=1}^{K^F} a_{r,k} v_{t,k}$$

and corresponds to the original generalized PCA that learns a low-rank factorization of $\Theta = (\theta_{r,t})$.

Neighborhood Model. We can interpolate the confidence for a given tuple and relation based on the trueness of other, similar relations for the same tuple. In collaborative filtering this is referred to as a neighborhood-based approach [10]. We implement a neighborhood model N via a set of weights $w_{r,r'}$, where each corresponds to a directed association strength between relations $r$ and $r'$. Summing these up gives

$$\theta^N_{r,t} := \sum_{r' \in O_t \setminus \{r\}} w_{r,r'}.$$

Notice that the neighborhood model amounts to a collection of local log-linear classifiers, one for each relation $r$ with weights $w_r$.

Entity Model. Relations have selectional preferences: they allow only certain types in their argument slots. To capture this observation, we learn a latent entity representation from data. For each entity $e$ we introduce a latent feature vector $t_e \in \mathbb{R}^{K^E}$. In addition, for each relation $r$ and argument slot $i$ we introduce a feature vector $d_{r,i}$. Measuring the compatibility of an entity tuple and a relation amounts to summing up the compatibilities between each argument slot representation and the corresponding entity representation:

$$\theta^E_{r,t} := \sum_{i=1}^{\operatorname{arity}(r)} \sum_{k=1}^{K^E} d_{r,i,k} \, t_{t_i,k}.$$

Combined Models. In practice all the above models capture important aspects of the data. Hence we also use various combinations, such as $\theta^{N,F,E}_{r,t} := \theta^N_{r,t} + \theta^F_{r,t} + \theta^E_{r,t}$.
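The scoring components above can be summarized in a short sketch. This is a minimal, hypothetical NumPy rendering with illustrative sizes and random parameters standing in for learned ones; it is not the authors' implementation, and parameter estimation (the ranking-based objective of [9]) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (the paper's data has roughly 4k relations and 400k tuples).
n_relations, n_tuples, n_entities = 1000, 5000, 3000
K_F, K_E = 50, 50  # latent dimensions of the F and E models

A = rng.normal(scale=0.1, size=(n_relations, K_F))           # a_r
V = rng.normal(scale=0.1, size=(n_tuples, K_F))              # v_t
W = rng.normal(scale=0.01, size=(n_relations, n_relations))  # w_{r,r'} (dense here; sparse in practice)
T = rng.normal(scale=0.1, size=(n_entities, K_E))            # t_e
D = rng.normal(scale=0.1, size=(n_relations, 2, K_E))        # d_{r,i}, binary relations

def theta_F(r, t):
    # Latent feature model: dot product of relation and tuple vectors.
    return A[r] @ V[t]

def theta_N(r, observed_relations):
    # Neighborhood model: sum of directed weights w_{r,r'} over the other
    # relations r' observed for the same tuple.
    return sum(W[r, rp] for rp in observed_relations if rp != r)

def theta_E(r, entities):
    # Entity model: each argument slot vector dotted with its entity vector.
    return sum(D[r, i] @ T[e] for i, e in enumerate(entities))

def confidence(r, t, observed_relations, entities):
    # Combined N,F,E model under the logistic link: c_{r,t} = p(y_{r,t} = 1).
    theta = theta_N(r, observed_relations) + theta_F(r, t) + theta_E(r, entities)
    return 1.0 / (1.0 + np.exp(-theta))

# Example: score relation 3 for tuple 7, whose entities are 11 and 42 and
# whose other observed relations are {5, 9}.
print(confidence(3, 7, {5, 9}, (11, 42)))
```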
3 Experiments

Does reasoning jointly across a universal schema help to improve over more isolated approaches? In the following we seek to answer this question empirically.

Data. Our experimental setup is roughly equivalent to previous work [2], and hence we omit details. To summarize, we consider each pair $\langle t_1, t_2 \rangle$ of Freebase entities that appear together in a corpus. Its set of observed facts $O_t$ corresponds to: extracted surface patterns (in our case lexicalized dependency paths) between mentions of $t_1$ and $t_2$, and the relations of $t_1$ and $t_2$ in Freebase. We divide all our tuples into approximately 200k training tuples and 200k test tuples. The total number of relations (patterns and from Freebase) is approximately 4k.

Predicting Freebase and Surface Pattern Relations. For evaluation we use two collections of relations: Freebase relations and surface patterns. In either case we compare the competing systems with respect to their ranked results for each relation in the collection.

Our first baseline is MI09, a distantly supervised classifier based on the work of [1]. We also compare against YA11, a version of MI09 that uses preprocessed pattern cluster features according to [7]. The third baseline is SU12, the state-of-the-art Multi-Instance Multi-Label system by [11]. The remaining systems are our neighborhood model (N), the factorized model (F), their combination (NF), and the combined model with a latent entity representation (NFE).

The results in terms of mean average precision (with respect to pooled results from each system) are in the table below:

Relation         #     MI09   YA11   SU12   N      F      NF     NFE
Total Freebase   334   0.48   0.52   0.57   0.52   0.66   0.67   0.69
Total Pattern    329   -      -      -      0.28   0.56   0.50   0.46

For Freebase relations, we can see that adding pattern cluster features (and hence incorporating more data) helps YA11 to improve over MI09. Likewise, we see that the factorized model F improves over N, again learning from unlabeled data. This improvement is bigger than the corresponding change between MI09 and YA11, possibly indicating that our latent representations are optimized directly towards improving prediction performance. Our best model, the combination of N, F and E, outperforms all other models in terms of total MAP, indicating the power of selectional preferences learned from data.

MI09, YA11 and SU12 are designed to predict structured relations, and so we omit them from the results on surface patterns. Looking at our models for predicting tuples of surface patterns, we again see that learning a latent representation (the F, NF and NFE models) from additional data helps substantially over the non-latent N model.

All our models are fast to train. The slowest model trains in just 30 minutes. By contrast, training the topic model in YA11 alone takes 4 hours. Training SU12 takes two hours (on less data). Also notice that our models not only learn to predict Freebase relations, but also approximately 4k surface pattern relations.
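For reference, the metric reported above can be sketched in a few lines: mean average precision over per-relation ranked lists. This is our illustrative sketch, not the authors' evaluation code, and the pooling of judged results across systems is not shown:

```python
def average_precision(ranked, relevant):
    # Average precision of one ranked result list against a set of true facts.
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / max(len(relevant), 1)

def mean_average_precision(per_relation_results):
    # Mean over (ranked list, true set) pairs, one pair per relation.
    return sum(average_precision(r, g) for r, g in per_relation_results) / len(per_relation_results)

# Example: one relation, three true facts among five ranked predictions.
print(mean_average_precision([(["a", "b", "c", "d", "e"], {"a", "c", "e"})]))
```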
4 Conclusion

We represent relations using universal schemas. Such schemas contain surface patterns as relations, as well as relations from structured sources. We can predict missing tuples for surface pattern relations and structured schema relations. We show this experimentally by contrasting a series of popular weakly supervised models with our collaborative filtering models, which learn latent feature representations across surface patterns and structured relations. Moreover, our models are computationally efficient, requiring less time than comparable methods while learning more relations.

Reasoning with universal schemas is not merely a tool for information extraction. It can also serve as a framework for various data integration tasks, for example, schema matching. In future work we also plan to integrate universal entity types and attributes into the model.

References

[1] Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL '09), pages 1003-1011. Association for Computational Linguistics, 2009.

[2] Sebastian Riedel, Limin Yao, and Andrew McCallum. Modeling relations and their mentions without labeled text. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD '10), 2010.

[3] Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. Open information extraction from the web. Communications of the ACM, 51(12):68-74, 2008.

[4] Dekang Lin and Patrick Pantel. DIRT: discovery of inference rules from text. In Knowledge Discovery and Data Mining, pages 323-328, 2001.

[5] Patrick Pantel, Rahul Bhagat, Bonaventura Coppola, Timothy Chklovski, and Eduard Hovy. ISP: Learning inferential selectional preferences. In Proceedings of NAACL HLT, 2007.

[6] Alexander Yates and Oren Etzioni. Unsupervised methods for determining object and relation synonyms on the web. Journal of Artificial Intelligence Research, 34:255-296, 2009.

[7] Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. Structured relation discovery using generative models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11), July 2011.

[8] Michael Collins, Sanjoy Dasgupta, and Robert E. Schapire. A generalization of principal component analysis to the exponential family. In Proceedings of NIPS, 2001.

[9] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of UAI, 2009.

[10] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pages 426-434, New York, NY, USA, 2008. ACM.

[11] Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. Multi-instance multi-label learning for relation extraction. In Proceedings of EMNLP-CoNLL, 2012.
