In this study, researchers exploit rich, naturally-occurring structures on Wikipedia for various NLP tasks.
Author: Mingda Chen.

Table of Links

Abstract
Acknowledgements
1 INTRODUCTION: 1.1 Overview; 1.2 Contributions
2 BACKGROUND: 2.1 Self-Supervised Language Pretraining; 2.2 Naturally-Occurring Data Structures; 2.3 Sentence Variational Autoencoder; 2.4 Summary
3 IMPROVING SELF-SUPERVISION FOR LANGUAGE PRETRAINING: 3.1 Improving Language Representation Learning via Sentence Ordering Prediction; 3.2 Improving In-Context Few-Shot Learning via Self-Supervised Training; 3.3 Summary
4 LEARNING SEMANTIC KNOWLEDGE FROM WIKIPEDIA: 4.1 Learning Entity Representations from Hyperlinks; 4.2 Learning Discourse-Aware Sentence Representations from Document Structures; 4.3 Learning Concept Hierarchies from Document Categories; 4.4 Summary
5 DISENTANGLING LATENT REPRESENTATIONS FOR INTERPRETABILITY AND CONTROLLABILITY: 5.1 Disentangling Semantics and Syntax in Sentence Representations; 5.2 Controllable Paraphrase Generation with a Syntactic Exemplar; 5.3 Summary
6 TAILORING TEXTUAL RESOURCES FOR EVALUATION TASKS: 6.1 Long-Form Data-to-Text Generation; 6.2 Long-Form Text Summarization; 6.3 Story Generation with Constraints; 6.4 Summary
7 CONCLUSION
APPENDIX A - APPENDIX TO CHAPTER 3
APPENDIX B - APPENDIX TO CHAPTER 6
BIBLIOGRAPHY

4.1 Learning Entity Representations from Hyperlinks

4.1.1 Introduction

Entity representations play a key role in numerous important problems, including language modeling, dialogue generation, entity linking, and story generation. One successful line of work on learning entity representations has been learning static embeddings: that is, assigning a unique vector to each entity in the training data. While these embeddings are useful in many applications, they have the obvious drawback of not accommodating unknown entities. Motivated by the recent success of contextualized word representations (CWRs) from pretrained models, we propose to encode the mention context or the entity description to dynamically represent an entity. In addition, we perform an in-depth comparison of ELMo- and BERT-based embeddings and find that they show different characteristics on different tasks.
We analyze each layer of the CWRs and make the following observations:

• Dynamically encoded entity representations show a strong improvement on the entity disambiguation task compared to prior work using static entity embeddings.
• BERT-based entity representations require further supervised training to perform well on downstream tasks, while ELMo-based representations are more capable in zero-shot settings.
• In general, higher layers of ELMo- and BERT-based CWRs are more transferable to entity-related tasks.

To further improve contextualized and descriptive entity representations, we leverage the natural hyperlink annotations in Wikipedia. We identify effective objectives for incorporating the contextual information in hyperlinks and improve ELMo-based CWRs on a variety of entity-related tasks.

4.1.2 Related Work

The training objectives considered in this work build on prior work that involves reasoning over entities. We give a brief overview of the relevant literature.

Entity linking is a fundamental task in information extraction with a wealth of literature. The goal of the task is to map a mention in context to the corresponding entity in a database. A natural approach is to learn entity representations that enable this mapping. Recent work has focused on learning a fixed embedding for each entity using Wikipedia hyperlinks. Gupta et al. additionally train context and description embeddings jointly, but this mainly aims to improve the quality of the fixed entity embeddings rather than using the context and description embeddings directly; we find that their context and description encoders perform poorly on EntEval tasks. A closely related concurrent work jointly encodes a mention in context and an entity description from Wikipedia to perform zero-shot entity linking. In contrast, here we seek to pretrain general-purpose entity representations that can function well whether or not entity descriptions or mention contexts are given.
Other entity-related tasks include entity typing and coreference resolution.

4.1.3 Method

We are interested in two approaches: contextualized entity representations and descriptive entity representations, both of which encode entities as fixed-length vectors. A contextualized entity representation encodes an entity based on the context in which it appears, regardless of whether the entity has been seen before. The motivation is that we want an entity encoder that does not depend on entries in a knowledge base, but is capable of inferring knowledge about an entity from the context in which it appears. In contrast, descriptive entity representations do rely on entries in Wikipedia: we use a model-specific function f to obtain a fixed-length vector representation from the entity's textual description.

Encoders for Descriptive Entity Representations. We encode an entity description by treating it as a sentence and using the average of the hidden states from ELMo as the description representation. With BERT, we use the output of the [CLS] token as the description representation.

Hyperlink-Based Training. An entity mentioned in a Wikipedia article is often linked to its Wikipedia page, which provides a useful description of the mentioned entity. The same Wikipedia page may correspond to many different entity mentions; likewise, the same entity mention may refer to different Wikipedia pages depending on its context. For instance, as shown in Fig. 4.1, based on the context, "France" is linked to the Wikipedia page of "France national football team" rather than that of the country. The specific entity in the knowledge base can be inferred from the context. In such cases, we believe Wikipedia provides valuable information complementary to current pretrained CWRs such as BERT and ELMo.
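The ELMo-style descriptive encoder described above (average the per-token hidden states of the description into one vector) can be sketched as follows. This is a minimal illustration, not the actual implementation: the `toy_encode` function is a hypothetical stand-in that assigns a fixed random vector per token, where a real model would supply contextual hidden states.

```python
import numpy as np

def descriptive_representation(tokens, encode):
    """Descriptive entity representation: treat the entity
    description as a sentence, encode it, and mean-pool the
    per-token hidden states into one fixed-length vector."""
    hidden = encode(tokens)        # shape: (num_tokens, dim)
    return hidden.mean(axis=0)     # shape: (dim,)

# Stand-in "encoder": a fixed random vector per token (a real
# encoder such as ELMo would return contextual hidden states).
_rng = np.random.default_rng(0)
_vecs = {}
def toy_encode(tokens, dim=4):
    return np.stack([_vecs.setdefault(t, _rng.standard_normal(dim))
                     for t in tokens])

rep = descriptive_representation(
    "national football team of France".split(), toy_encode)
assert rep.shape == (4,)
```

Whatever the underlying encoder, the output is a single fixed-length vector per entity description, which is what the EntEval tasks consume.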
To incorporate this information during training, we automatically construct a hyperlink-enriched dataset from Wikipedia; prior work has used similar resources. The dataset consists of sentences with contextualized entity mentions and their corresponding descriptions, obtained via hyperlinked Wikipedia pages. When processing descriptions, we keep at most the first 100 word tokens as the description of a Wikipedia page; similar truncation has been done in prior work. For context sentences, we remove those without hyperlinks from the training data and duplicate those with multiple hyperlinks. We also remove context sentences for which we cannot find matching Wikipedia descriptions. These processing steps result in a training set of approximately 85 million instances and over 3 million unique entities. As in the original ELMo, each log loss is approximated with negative sampling. We write EntELMo to denote the model trained with this hyperlink-based objective. When using EntELMo for contextualized and descriptive entity representations, we use it analogously to ELMo.

To evaluate contextualized and descriptive entity representations, we propose EntEval, a benchmark comprising a wide range of entity-related tasks. Specifically, EntEval consists of the tasks below:

• Entity typing (ET) assigns types to an entity given only the context of the entity mention. ET is context-sensitive, making it an effective probe of the contextual knowledge encoded in pretrained representations.
• Given two entities and their associated contexts, coreference arc prediction (CAP) seeks to determine whether they refer to the same entity. Solving this task may require knowledge of the entities.
• Entity factuality prediction (EFP) involves determining the correctness of statements regarding entities.
• Contextualized entity relationship prediction (CERP) determines the connection between two entities appearing in the same context.
• Given two entities with their descriptions from Wikipedia, entity similarity and relatedness (ESR) is to determine their similarity or relatedness.
• As Freebase is another popular resource for common knowledge, we propose an entity relationship typing (ERT) task, which probes the encoded knowledge by classifying the types of relations between pairs of entities.
• Named entity disambiguation (NED) is the task of linking a named-entity mention to its corresponding instance in a knowledge base such as Wikipedia.

4.1.4 Experiments

Setup. As a baseline for hyperlink-based training, we train EntELMo on our constructed dataset with only a bidirectional language model loss. Due to limited computational resources, both variants of EntELMo are trained for one epoch with smaller dimensions than ELMo: we set the hidden dimension of each directional LSTM layer to 600 and project it to 300 dimensions, so the resulting vectors from each layer (the two directions concatenated) are 600-dimensional. We use a negative sampling size of 1024 for each positive word token. For bag-of-words reconstruction, we randomly sample at most 50 word tokens as positive samples from the target word tokens. Other hyperparameters are the same as for ELMo. EntELMo is implemented based on the official ELMo implementation.

We evaluate the transferability of ELMo, EntELMo, and BERT by using trainable mixing weights for each layer. For ELMo and EntELMo, we follow the recommendation of Peters et al. to first pass the mixing weights through a softmax layer and then multiply the weighted-sum representation by a scalar. For BERT, we find it better to use unnormalized mixing weights. In addition, we investigate per-layer performance for both models in Section 4.1.5. Code and data are available at https://github.com/ZeweiChu/EntEval.

Results. Table 4.1 shows the performance of our models on the EntEval tasks.
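The layer-mixing scheme used in the setup above (softmax-normalized per-layer weights followed by a learned scalar, following Peters et al.) can be sketched as below. This is an illustrative sketch rather than the actual implementation; the layer activations are stand-ins.

```python
import numpy as np

def scalar_mix(layers, weights, gamma):
    """ELMo-style learned layer mixing: softmax-normalize one
    weight per layer, take the weighted sum of the layer
    activations, then scale the result by a scalar gamma.
    (For BERT, the text finds unnormalized weights work
    better, i.e. the softmax step would be skipped.)"""
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()                      # softmax over layers
    mixed = sum(wi * layer for wi, layer in zip(w, layers))
    return gamma * mixed

# Three stand-in layer activations of shape (3 tokens, 4 dims),
# filled with the constants 0.0, 1.0, and 2.0.
layers = [np.full((3, 4), float(i)) for i in range(3)]
out = scalar_mix(layers, weights=np.zeros(3), gamma=2.0)
# Zero weights -> uniform mix -> per-position mean (0+1+2)/3 = 1.0,
# scaled by gamma = 2.0.
assert np.allclose(out, 2.0)
```

In training, `weights` and `gamma` would be trainable parameters optimized on the downstream task while the pretrained layers stay fixed.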
Our findings are detailed below:

• Pretrained CWRs perform best on EntEval overall, indicating that they capture knowledge about entities from contextual mentions and entity descriptions.
• BERT performs poorly on the entity similarity and relatedness task. Since this task is zero-shot, this validates the recommended practice of fine-tuning BERT on downstream tasks: the embedding of the [CLS] token does not necessarily capture the semantics of the entity on its own.
• BERT-large is better than BERT-base on average, showing large improvements on ERT and NED. To perform well at ERT, a model must either glean particular relationships from pairs of lengthy entity descriptions or leverage knowledge about the entities from pretraining. Relatedly, performance on NED is expected to increase both with the ability to extract knowledge from descriptions and with increased knowledge from pretraining. The large model appears to handle both capabilities better than the base model.
• EntELMo improves over the EntELMo baseline on some tasks but suffers on others. Hyperlink-based training helps on CERP, EFP, ET, and NED. Since the hyperlink loss is closely related to the NED problem, it is unsurprising that NED performance improves. Overall, we believe that hyperlink-based training benefits contextualized entity representations but not descriptive entity representations. This pattern may be due to the difficulty of using descriptive entity representations to reconstruct the contexts in which entities appear.

4.1.5 Analysis

Is the descriptive entity representation necessary? A natural question is whether the entity description is needed at all, since for humans the entity names alone carry sufficient information for many tasks. To answer this question, we experiment with encoding entity names with the descriptive entity encoder for the ERT and NED tasks.
The results in Table 4.2 show that encoding the entity names by themselves already captures a great deal of knowledge about entities, especially for CoNLL-YAGO. However, in tasks like ERT, the entity descriptions are crucial, as the names do not reveal enough information to categorize the relationships.

Table 4.3 reports the performance of different descriptive entity representations on the CoNLL-YAGO task. The three models all use ELMo as the context encoder. "ELMo" encodes the entity name with ELMo as the descriptive encoder, while both Gupta et al. and Deep ED use their trained static entity embeddings. As Gupta et al. and Deep ED have embedding sizes different from ELMo's, we add an extra linear layer after them to map to the same dimension. These two models are designed for entity linking, which gives them a potential advantage; even so, ELMo outperforms them both by a wide margin.

Per-Layer Analysis. We evaluate each ELMo and EntELMo layer (the character CNN layer and the two bidirectional LSTM layers), as well as each BERT layer, on the EntEval tasks. Fig. 4.2 reveals that for the ELMo models, the first and second LSTM layers capture most of the entity knowledge from contexts and descriptions. The BERT layers show more diversity: lower layers perform better on ESR, while for the other tasks higher layers are more effective.

This paper is available on arxiv under CC 4.0 license.

Our implementation is available at https://github.com/mingdachen/bilm-tf. We note that the numbers reported here are not strictly comparable to the ones in their original papers, since we keep all of the top 30 candidates from Crosswiki while prior work employs different pruning heuristics.
