Word Networks: WordNet (Part 1): Index

Natural Language Processing with Python
— Analyzing Text with the Natural Language Toolkit

Steven Bird, Ewan Klein, and Edward Loper

A fragment of the WordNet concept hierarchy, from Chapter 2, Section 5 ("2.5 WordNet") of the book above

※ Alternatively, see the translated edition

Figure 2-8. Fragment of WordNet concept hierarchy: Nodes correspond to synsets; edges indicate the hypernym/hyponym relation, i.e., the relation between superordinate and subordinate concepts.

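The WordNet project began in 1985 at Princeton University's Cognitive Science Laboratory. It is an English lexical database, built and maintained under the direction of psychology professor George Armitage Miller. Because it records many kinds of semantic relations between words, it differs from a dictionary in the usual sense. What is WordNet? Perhaps it is best to start with what its creators say: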

What is WordNet?

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the creators of WordNet and do not necessarily reflect the views of any funding agency or Princeton University. When writing a paper or producing a software application, tool, or interface based on WordNet, it is necessary to properly cite the source. Citation figures are critical to WordNet funding.

About WordNet

WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.

WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions. First, WordNet interlinks not just word forms—strings of letters—but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated. Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.

Structure

The main relation among words in WordNet is synonymy, as between the words shut and close or car and automobile. Synonyms–words that denote the same concept and are interchangeable in many contexts–are grouped into unordered sets (synsets). Each of WordNet’s 117 000 synsets is linked to other synsets by means of a small number of “conceptual relations.” Additionally, a synset contains a brief definition (“gloss”) and, in most cases, one or more short sentences illustrating the use of the synset members. Word forms with several distinct meanings are represented in as many distinct synsets. Thus, each form-meaning pair in WordNet is unique.
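The structure described above can be pictured with a small hand-built model. This is only a sketch under stated assumptions, not WordNet's data or any library's API: the `Synset` class, the identifiers such as `close.v.01`, and the glosses are all made up for illustration. It shows that a word form with several meanings appears in several synsets, so each (form, synset) pair is unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A toy synset: an unordered set of synonymous word forms plus a gloss."""
    name: str          # hypothetical identifier, e.g. "car.n.01"
    lemmas: frozenset  # word forms that share this one meaning
    gloss: str         # brief definition (illustrative, not WordNet's text)


# Hand-written sample entries (assumptions, not real WordNet records).
SYNSETS = [
    Synset("car.n.01", frozenset({"car", "automobile"}), "a motor vehicle"),
    Synset("close.v.01", frozenset({"close", "shut"}), "block an opening"),
    Synset("close.a.01", frozenset({"close", "near"}), "at a short distance"),
]


def synsets_for(word):
    """Return every synset that contains the given word form."""
    return [s for s in SYNSETS if word in s.lemmas]


def form_meaning_pairs():
    """List every (word form, synset name) pair in the toy database."""
    return [(lemma, s.name) for s in SYNSETS for lemma in sorted(s.lemmas)]


if __name__ == "__main__":
    print([s.name for s in synsets_for("close")])
    print(form_meaning_pairs())
```

Looking up "close" returns two synsets because the form has two meanings in this toy table, while "automobile" returns only one.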

Relations

The most frequently encoded relation among synsets is the super-subordinate relation (also called hyperonymy, hyponymy or ISA relation). It links more general synsets like {furniture, piece_of_furniture} to increasingly specific ones like {bed} and {bunkbed}. Thus, WordNet states that the category furniture includes bed, which in turn includes bunkbed; conversely, concepts like bed and bunkbed make up the category furniture. All noun hierarchies ultimately go up to the root node {entity}. The hyponymy relation is transitive: if an armchair is a kind of chair, and if a chair is a kind of furniture, then an armchair is a kind of furniture. WordNet distinguishes among Types (common nouns) and Instances (specific persons, countries and geographic entities). Thus, armchair is a type of chair, Barack Obama is an instance of a president. Instances are always leaf (terminal) nodes in their hierarchies.
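The transitivity of the hyponymy relation can be sketched with a toy table. The assumptions here are loud ones: the `HYPERNYM` dictionary is hand-written, every synset is given exactly one direct hypernym, and the intermediate levels between furniture and entity are left out. Real WordNet data is richer than this.

```python
# Toy hypernym table: each synset name maps to its single direct hypernym.
# This is an illustrative simplification, not WordNet's actual hierarchy.
HYPERNYM = {
    "bunkbed": "bed",
    "bed": "furniture",
    "armchair": "chair",
    "chair": "furniture",
    "furniture": "entity",
}


def hypernym_path(synset):
    """Walk upward from a synset to the root and return the whole path."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def is_a(specific, general):
    """True if `general` can be reached from `specific` via hypernym links."""
    return general in hypernym_path(specific)[1:]


if __name__ == "__main__":
    print(hypernym_path("armchair"))
    print(is_a("armchair", "furniture"), is_a("furniture", "armchair"))
```

Because `is_a` follows the chain of links rather than a single link, an armchair counts as furniture even though the table only states armchair-chair and chair-furniture directly.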

Meronymy, the part-whole relation, holds between synsets like {chair} and {back, backrest}, {seat} and {leg}. Parts are inherited from their superordinates: if a chair has legs, then an armchair has legs as well. Parts are not inherited “upward” as they may be characteristic only of specific kinds of things rather than the class as a whole: chairs and kinds of chairs have legs, but not all kinds of furniture have legs.
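The one-way inheritance of parts can be sketched as a lookup that collects parts while climbing the hypernym chain. The two tables below are hand-written assumptions for illustration, not WordNet data; parts flow down to armchair but never up to furniture.

```python
# Toy tables (assumptions for illustration only).
HYPERNYM = {"armchair": "chair", "chair": "furniture"}
PARTS = {"chair": {"back", "seat", "leg"}}  # parts recorded directly on a synset


def inherited_parts(synset):
    """Collect the parts of a synset and of every superordinate above it."""
    parts = set()
    node = synset
    while node is not None:
        parts |= PARTS.get(node, set())
        node = HYPERNYM.get(node)  # None once the top of the toy table is reached
    return parts


if __name__ == "__main__":
    print(sorted(inherited_parts("armchair")))
    print(sorted(inherited_parts("furniture")))
```

The lookup only ever moves from a synset toward its superordinates, which is why asking about furniture returns no parts even though chairs have legs.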

Verb synsets are arranged into hierarchies as well; verbs towards the bottom of the trees (troponyms) express increasingly specific manners characterizing an event, as in {communicate}-{talk}-{whisper}. The specific manner expressed depends on the semantic field; volume (as in the example above) is just one dimension along which verbs can be elaborated. Others are speed (move-jog-run) or intensity of emotion (like-love-idolize). Verbs describing events that necessarily and unidirectionally entail one another are linked: {buy}-{pay}, {succeed}-{try}, {show}-{see}, etc.
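The two verb relations in this paragraph can be told apart with a small sketch. The chains and pairs are copied from the examples in the text; the function names `is_troponym_of` and `entails` are hypothetical helpers, not part of any WordNet interface.

```python
# Troponym chains from the text, ordered from general to specific manner.
TROPONYM_CHAINS = [
    ["communicate", "talk", "whisper"],
    ["move", "jog", "run"],
    ["like", "love", "idolize"],
]

# Entailment pairs from the text, stored as directed (verb, entailed verb).
ENTAILS = {("buy", "pay"), ("succeed", "try"), ("show", "see")}


def is_troponym_of(specific, general):
    """True if `specific` sits below `general` in the same toy chain."""
    for chain in TROPONYM_CHAINS:
        if specific in chain and general in chain:
            return chain.index(specific) > chain.index(general)
    return False


def entails(verb, other):
    """True if `verb` entails `other`; the relation is one-directional."""
    return (verb, other) in ENTAILS


if __name__ == "__main__":
    print(is_troponym_of("whisper", "communicate"))
    print(entails("buy", "pay"), entails("pay", "buy"))
```

Storing entailment as ordered pairs keeps the direction explicit: buying entails paying, but paying does not entail buying.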

Adjectives are organized in terms of antonymy. Pairs of “direct” antonyms like wet-dry and young-old reflect the strong semantic contrast of their members. Each of these polar adjectives in turn is linked to a number of “semantically similar” ones: dry is linked to parched, arid, desiccated and bone-dry and wet to soggy, waterlogged, etc. Semantically similar adjectives are “indirect antonyms” of the central member of the opposite pole. Relational adjectives (“pertainyms”) point to the nouns they are derived from (criminal-crime).

There are only a few adverbs in WordNet (hardly, mostly, really, etc.) as the majority of English adverbs are straightforwardly derived from adjectives via morphological affixation (surprisingly, strangely, etc.).
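How an indirect antonym is found can be sketched in two hops: from an adjective to the central member it is similar to, then across the direct antonym link. Both tables are hand-written from the examples in the text, and the helper names are hypothetical.

```python
# Direct antonym pairs between central ("polar") adjectives.
DIRECT_ANTONYM = {"wet": "dry", "dry": "wet", "young": "old", "old": "young"}

# Adjectives that are semantically similar to a central member.
SIMILAR_TO = {
    "dry": {"parched", "arid", "desiccated", "bone-dry"},
    "wet": {"soggy", "waterlogged"},
}


def central_member(adjective):
    """Return the central adjective this one clusters around, if any."""
    for head, similar in SIMILAR_TO.items():
        if adjective in similar:
            return head
    return None


def indirect_antonym(adjective):
    """Return the central member of the opposite pole, or None."""
    head = central_member(adjective)
    if head is None:
        return None  # a central member itself, or an unknown adjective
    return DIRECT_ANTONYM.get(head)


if __name__ == "__main__":
    print(indirect_antonym("arid"))
    print(indirect_antonym("soggy"))
```

Central members such as dry return None here because their antonym is direct rather than indirect.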

Cross-POS relations

The majority of WordNet’s relations connect words from the same part of speech (POS). Thus, WordNet really consists of four sub-nets, one each for nouns, verbs, adjectives and adverbs, with few cross-POS pointers. Cross-POS relations include the “morphosemantic” links that hold among semantically similar words sharing a stem with the same meaning: observe (verb), observant (adjective), observation, observatory (nouns). In many of the noun-verb pairs the semantic role of the noun with respect to the verb has been specified: {sleeper, sleeping_car} is the LOCATION for {sleep} and {painter} is the AGENT of {paint}, while {painting, picture} is its RESULT.
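The role-labelled noun-verb links can be sketched as a list of triples. The triples come from the examples in the text; the table layout and the helper names `nouns_for` and `role_of` are assumptions made for illustration.

```python
# (noun, verb, semantic role of the noun with respect to the verb)
MORPHOSEMANTIC_LINKS = [
    ("sleeper", "sleep", "LOCATION"),
    ("sleeping_car", "sleep", "LOCATION"),
    ("painter", "paint", "AGENT"),
    ("painting", "paint", "RESULT"),
    ("picture", "paint", "RESULT"),
]


def nouns_for(verb, role):
    """Return the nouns that fill `role` for `verb`, in alphabetical order."""
    return sorted(n for n, v, r in MORPHOSEMANTIC_LINKS if v == verb and r == role)


def role_of(noun, verb):
    """Return the role label linking `noun` to `verb`, or None if unlinked."""
    for n, v, r in MORPHOSEMANTIC_LINKS:
        if n == noun and v == verb:
            return r
    return None


if __name__ == "__main__":
    print(nouns_for("paint", "RESULT"))
    print(role_of("painter", "paint"))
```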

More Information

Fellbaum, Christiane (2005). WordNet and wordnets. In: Brown, Keith et al. (eds.), Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, 665-670

 

To learn more, read the collection of five introductory WordNet papers written by George A. Miller and his co-authors:

Introduction to WordNet: An On-line Lexical Database …

wordnetcode.princeton.edu/5papers.pdf
by GA Miller

Introduction to WordNet: An On-line Lexical Database. George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. (Revised August 1993)

 

【Paper 1: Abstract】

Introduction to WordNet: An On-line Lexical Database

George A. Miller, Richard Beckwith, Christiane Fellbaum,
Derek Gross, and Katherine Miller

(Revised August 1993)

WordNet is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, and adjectives are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.

 

【Paper 2: Abstract】

Nouns in WordNet: A Lexical Inheritance System

George A. Miller

(Revised August 1993)

Definitions of common nouns typically give a superordinate term plus distinguishing features; that information provides the basis for organizing noun files in WordNet. The superordinate relation (hyponymy) generates a hierarchical semantic organization that is duplicated in the noun files by the use of labeled pointers between sets of synonyms (synsets). The hierarchy is limited in depth, seldom exceeding more than a dozen levels. Distinguishing features are entered in such a way as to create a lexical inheritance system, a system in which each word inherits the distinguishing features of all its superordinates. Three types of distinguishing features are discussed: attributes (modification), parts (meronymy), and functions (predication), but only meronymy is presently implemented in the noun files. Antonymy is also found between nouns, but it is not a fundamental organizing principle for nouns. Coverage is partitioned into twenty-five topical files, each of which deals with a different primitive semantic component.

 

【Paper 3: Abstract】

Adjectives in WordNet

Christiane Fellbaum, Derek Gross, and Katherine Miller

(Revised August 1993)

WordNet divides adjectives into two major classes: descriptive and relational. Descriptive adjectives ascribe to their head nouns values of (typically) bipolar attributes and consequently are organized in terms of binary oppositions (antonymy) and similarity of meaning (synonymy). Descriptive adjectives that do not have direct antonyms are said to have indirect antonyms by virtue of their semantic similarity to adjectives that do have direct antonyms. WordNet contains pointers between descriptive adjectives expressing a value of an attribute and the noun by which that attribute is lexicalized. Reference-modifying adjectives have special syntactic properties that distinguish them from other descriptive adjectives. Relational adjectives are assumed to be stylistic variants of modifying nouns and so are cross-referenced to the noun files. Chromatic color adjectives are regarded as a special case.

 

【Paper 4: Abstract】

English Verbs as a Semantic Net

Christiane Fellbaum

This paper describes the semantic network of English verbs in WordNet. The semantic relations used to build networks of nouns and adjectives cannot be applied without modification, but have to be adapted to fit the semantics of verbs, which differ substantially from those of the other lexical categories. The nature of these relations is discussed, as is their distribution throughout different semantic groups of verbs, which determines certain idiosyncratic patterns of lexicalization. In addition, four variants of lexical entailment are distinguished, which interact in systematic ways with the semantic relations. Finally, the lexical properties of the different verb groups are outlined.

 

【Paper 5】

Design and Implementation of the WordNet Lexical Database and Searching Software

Richard Beckwith, George A. Miller, and Randee Tengi

 

 

── Self-directed learning often begins with lookups, abstracts, reading lists, and indexes ──