How to Make a Dictionary
Course by Prof. Dafydd Gibbon
All information about the How to Make a Dictionary course can be found on Prof. Dafydd Gibbon's website.
Goals of the class:
"The goal of the class is to promote the understanding of how dictionaries are constructed, what kinds of dictionaries there are, what information dictionaries contain, and what evidence about the language dictionaries are based on." [1]
General description of the class:
"The class introduces a systematic approach to lexicography, and covers word forms, word meanings (lexical semantics), inflectional morphology, word formation (derivational and compositional morphology), and practical dictionary construction on the basis of text corpus analysis."[1]
Programme
- Introduction, Text Theory.
- Defining "definition".
- The architecture of a dictionary.
- Dictionaries as databases:
- Types of lexical information:
- Word forms: orthography, phonology
- Word structure: morphology
- Word meanings: lexical semantics
- Syntax
- Computational Lexicography
Introduction, Text Theory
Lecture 1
17th October 2006
Overview:
Organisational information and introduction to the course. Instructions on how to make an electronic portfolio. Introduction of terms such as portfolio, blog, website, hypertext, text and its properties, dictionary and its diversity, and text linguistics. Presentation of text theory and its applications.
What is a portfolio?
- A required reading on this topic is A Note on a Learner Portfolio.[2]
- "In education, portfolio refers to a personal collection of information describing and documenting a person's achievements and learning. There is a variety of portfolios ranging from learning logs to extended collections of achievement evidence. Portfolios are used for many different purposes such as accreditation of prior experience, job search, continuing professional development, certification of competences."[3]
- "Learning Logs are a unique personalised learning resource for children. In the learning Logs, the children record their responses to learning challenges set by their teachers. Each log is a unique record of the child's thinking and learning."[4]
- "An electronic portfolio, also known as an e-portfolio or digital portfolio, is a cohesive, powerful, and well-designed collection of electronic documents that demonstrate your skills, education, professional development, and the benefits you offer to a target reader. A recent development has been the open source portfolio movement on e-portfolios.
An e-portfolio can be seen as a type of learning record that provides actual evidence of achievement. Learning records are closely related to the Learning Plan, an emerging tool that is being used to manage learning by individuals, teams, communities of interest and organizations.
The recent explosion of knowledge, information and learning technologies has led to the development of digital portfolios or electronic portfolios, commonly referred as ePortfolios."[5]
- "A Learning Plan is a document (possibly an interactive or on-line document) that is used to plan learning, usually over an extended period of time."[6]
Why is a portfolio important?
- assessment of learning outcomes
- basic input to later learning stages and to preparation for specific tasks, e.g. a final exam
A portfolio is important because it provides a systematic overview of what a student covers in class. It helps the student study and prepare for the exam. It is also of great use to the teacher, who can see which parts of the material need more explanation, what was not understood in class and what causes problems. A portfolio is therefore helpful for both the student and the teacher.
What should a portfolio contain, and how are these components defined?
- Table of contents - contains all the topics covered in the course.
- Introduction - what do I expect to learn in the course? How can I use the knowledge I gain?
- Learner's Diary - contains the topic of a lecture, the date of the lecture, the content of the class, evaluation of the lecture and glossary entries.
- Exercises and homework.
- Glossary - technical terms introduced in the class. The glossary is designed according to the dictionary making principles (terms in alphabetical order, definition with examples, link to the class in which the term was introduced).
- Evaluation - overview of a single class and assessment.
Why should the portfolio be on a website?
- easier access and interaction than via paper/email
- means of becoming familiar with everyday use of electronic media
- a form of "Applied Text Linguistics"
- a source of material/tasks for the class [7].
What is a website?
A hypertext document published on the web with:
- embedded document objects
- linked document objects
and therefore a text...
"A website is an HTML document that is accessible on the Web." [8]
"HTML Document - a document written in HyperText Markup Language." [8]
"HTML - an acronym for HyperText Markup Language, HTML is the language used to tag various parts of a Web document so browsing software will know how to display that document's links, text, graphics and attached media." [8]
A markup language combines text and extra information about the text. The extra information, for example about the text's structure or presentation, is expressed using markup, which is intermingled with the primary text. The best-known markup language in modern use is HTML (HyperText Markup Language), one of the foundations of the World Wide Web. Historically, markup was (and is) used in the publishing industry in the communication of printed work between authors, editors, and printers.
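Because markup is intermingled with the primary text, software can separate the two again. A minimal Python sketch using the standard-library html.parser module; the sample HTML string and the helper names (TextExtractor, split_markup) are invented for illustration:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the primary text and, separately, the names of the opening tags."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # the primary text, in document order
        self.tags = []        # the markup: opening tag names, in document order

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_markup(html):
    """Return (primary text, list of tag names) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.text_parts), parser.tags


sample = '<p>A <b>dictionary</b> is a <a href="glossary.html">text</a>.</p>'
text, tags = split_markup(sample)
print(text)  # A dictionary is a text.
print(tags)  # ['p', 'b', 'a']
```

The browser uses the tags to decide how to display the text and its links; stripping them leaves the plain text behind.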
How do you make a website?
- run your own web server (e.g. with the Apache server on a DSL line) and save your HTML files on it
- use the university website - and upload your HTML files
- use another web service provider - and upload your HTML files
- use blogging software - and make a weblog (blog) [7].
What is a hypertext?
A hypertext is a text connected electronically with other texts.
"Hypertext - this term describes the system that allows documents to be cross-linked in such a way that the reader can explore related documents by clicking on a highlighted word or symbol."[8]
A hypertext document is a text
- either with conventional hierarchical parts
- or as a complex network of parts
A recursive definition of a hypertext:
- A hypertext is a text.
- A hypertext is a text connected with another hypertext.
- Nothing else is a hypertext.
For example:
- Any document on the World Wide Web
- electronic dictionary
- blog
- e-commerce site
- Google (and of course this slide, since it is linked...)
- A help document for a computer application [7] - When you click on the "Help" or "?" menu item of an application, you generally get a Windows help text with linked subtexts.
What is a text?
The term "text" has multiple meanings depending on the context of its use:
- In language, text is a broad term for something that contains words to express something.
- In linguistics a text is a communicative act, fulfilling the seven constitutive and the three regulative principles of textuality. Both speech and written language, or language in other media can be seen as a text within linguistics.
- In literary theory a text is the object being studied, whether it be a novel, a poem, a film, an advertisement, or anything else with a semiotic component. The broad use of the term derives from the rise of semiotics in the 1960s and was solidified by the later cultural studies of the 1980s, which brought a corresponding broadening of what it was one could talk about when talking about literature; see also discourse.
- In mobile phone communication, a text (or text message) is a short digital message between devices, typically using SMS (short message service). The act of sending such a message is commonly referred to as texting.
- In computing, text refers to character data, or to one of the segments of a program in memory.
- In academics, text is often used as a short form for textbook.
- In hip hop, text is a thriving form of post-modern poetry popular on the Internet. It is also referred to as scrypt. [9]
TEXT THEORY - Theory of Appearance, Structure and Meaning of a text.
Text theory deals with text structure (formulation), text manifestation (appearance) and the meaning of a text.
Which properties does a text have?
- APPEARANCE - Media (paper, electronic media)
- MEANING - Semantics, Pragmatics
- FORMULATION - Text Structure
How do these properties relate
- to the mind? - The formulation of a text is created in the mind, which gives it meaning (sense, semantic interpretation) as well as appearance (style, media interpretation). The structure of a text serves to express meaning; it is shaped by the processes of our mind.
- to the world? - The appearance of a text is what we can see (read) or hear (spoken text). The medium is produced: we have a meaning and we produce an utterance. The meaning is what we can deduce from the message (the medium), i.e. what we extract from it. In other words, appearance and meaning are found in the Shared World: it is what the creator of a text "shares" (sends out) with others. The Shared World = the Shared Knowledge, i.e. what people know about the world.
Properties of Text: Text Theory [7]
What is Text Linguistics?
"Text linguistics is a branch of linguistics that deals with texts as communication systems. Its original aims lay in uncovering and describing text grammars. The application of text linguistics has, however, evolved from this approach to a point in which text is viewed in much broader terms that go beyond a mere extension of traditional grammar towards an entire text. Text linguistics takes into account the form of a text, but also its setting, i.e. the way in which it is situated in an interactional, communicative context. Both the author of a (written or spoken) text as well as its addressee are taken into consideration in their respective (social and/or institutional) roles in the specific communicative context. In general it is an application of at the much broader level of text, rather than just a sentence or word."[10]
What is Applied Text Linguistics? - e.g. website building.
A required reading for all linguists: Gibbon, D. 2004. What a Linguist Needs to Know about Word Processing.
Consequences for a linguistic theory of text?
Text Theory applications
Examples of texts and documents:
- Books: novels, technical handbooks, dictionaries, ...
- Periodicals: newspapers, scientific journals, ...
- The web.
What is a dictionary?
Dictionaries are texts, documents with a structure, meaning and form.
- Structure
How is the dictionary, as a book, structured?
- metadata of the dictionary (cover, title page, author, introduction, index, abbreviations used for reasons of economy)
- content
- cover - gives the title, the name of the author and protects the content of the dictionary
- title page - gives the title, the name of the author, the year of publishing, the name of the publisher, etc.
- index
- introductions
- abbreviations - explains abbreviations used in the dictionary
- content - entries with explanations. The kind of explanation depends on the type of dictionary. E.g. different kinds of dictionaries may give the explanations of unknown words, origins of words, other forms of a given word, etc.
- Meaning
What is the "meaning" (= content) of a dictionary?
- Form
What kinds of "appearance" can a dictionary have?
- Semasiological dictionary (reader's dictionary, decoding dictionary) - looking for the meaning.
- Onomasiological dictionary - Thesaurus (writer's dictionary, encoding dictionary) - looking for the form.
What does the table of contents of a dictionary look like and what are the parts of the book intended for?
The "meaning" of a dictionary is INFORMATION.
The meaning of a dictionary depends on the kind of dictionary. E.g. The New Kosciuszko Foundation Dictionary English-Polish, Polish-English contains 140,000 head-words, 400,000 meanings and 100,000 idioms and fixed phrases. The dictionary is addressed to native speakers of Polish, but may be successfully used by native speakers of English. [11]
What is the difference between a semasiological dictionary and an onomasiological dictionary?
In an onomasiological dictionary you know the meaning, you are looking for the form, i.e. with a thesaurus, not with an ordinary dictionary, which is semasiological: you know the form and are looking for the meaning.
Remember: with a SEMAsiological dictionary you have the form and are looking for the SEMAntics, i.e. the meaning.
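The two lookup directions can be pictured as a data structure and its inverse. A minimal Python sketch, assuming a hypothetical toy dictionary whose meanings are short paraphrases invented for illustration: the semasiological dictionary maps form to meaning, and inverting it gives an onomasiological (thesaurus-like) index from meaning to forms.

```python
from collections import defaultdict

# Hypothetical toy entries: form -> meaning (paraphrased, for illustration only).
SEMASIOLOGICAL = {
    "baby": "very young child",
    "infant": "very young child",
    "blue": "colour of the clear sky",
}


def build_onomasiological(entries):
    """Invert a form->meaning dictionary into a meaning->forms index."""
    index = defaultdict(list)
    for form, meaning in sorted(entries.items()):
        index[meaning].append(form)
    return dict(index)


THESAURUS = build_onomasiological(SEMASIOLOGICAL)

# Reader (decoding): I have the form and want the meaning.
print(SEMASIOLOGICAL["baby"])         # very young child
# Writer (encoding): I have the meaning and want the form(s).
print(THESAURUS["very young child"])  # ['baby', 'infant']
```

One meaning can lead to several forms, which is why the onomasiological direction naturally returns a list of candidates rather than a single word.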
What other kinds of dictionaries are there?
- monolingual dictionaries
- bilingual dictionaries
- multilingual dictionaries
- crossword dictionaries
- thesaurus
- synonym dictionary
- etymological dictionaries
- technical terms
- phrasal verbs
- learner's dictionaries
- pronunciation
- Slang/jargon
- web-based
- proverb
- illustrated dictionary
- glossary
- idioms
- collocation dictionary
- name dictionary
- vocabulary book - made by students
Homework
What are dictionaries, lexicons, encyclopedias..."language"?
Dictionaries are about words, whereas encyclopaedias are about topics, i.e. whole subjects.
A dictionary is a list of words with their definitions, a list of characters with their glyphs, or a list of words with corresponding words in other languages. In a few languages, words can appear in many different forms, but only the lemma form appears as the main word or headword in most dictionaries. Many dictionaries also provide pronunciation information; grammatical information; word derivations, histories, or etymologies; illustrations; usage guidance; and examples in phrases or sentences.
Dictionaries are most commonly found in the form of a book. Some dictionaries are also found in electronic portable handheld devices.
An encyclopedia, or encyclopaedia, is a comprehensive written compendium that contains information on all branches of knowledge or a particular branch of knowledge.
A lexicon is usually a list of words together with additional word-specific information, i.e., a dictionary. Lexicon is a word of Greek origin meaning vocabulary. When linguists study the lexicon, they study such things as what words are, how the vocabulary in a language is structured, how people use and store words, how they learn words, the history and evolution of words, types of relationships between words as well as how words were created.
The term is also sometimes used in the title of an encyclopedic dictionary or an encyclopedia, especially for 19th century works and those written in German (lexikon).
In linguistics, lexicon has a slightly more specialized definition, as it includes the lexemes used to actualize words. Lexemes are formed according to morpho-syntactic rules and express sememes. In this sense, a lexicon organizes the mental vocabulary in a speaker's mind: First, it organizes the vocabulary of a language according to certain principles (for instance, all verbs of motion may be linked in a lexical network) and second, it contains a generative device producing (new) simple and complex words according to certain lexical rules. For example, the suffix '-able' can be added to transitive verbs only such that we get 'read-able' but not '*cry-able'. (Though exceptions exist to this rule: one can certainly imagine a 'sleepable mattress' or the expression, 'Sure, that's workable.')
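The "generative device" idea can be sketched as a lexical rule that checks a property of its base before applying. A minimal Python sketch, assuming a hypothetical mini-lexicon in which each verb carries a transitivity flag; the exceptions mentioned above ('sleepable', 'workable') would need a separate exception list, which this sketch leaves out.

```python
# Hypothetical mini-lexicon: verb -> is it transitive?
VERBS = {"read": True, "wash": True, "cry": False, "sleep": False}


def derive_able(verb):
    """Apply the '-able' rule: only transitive verbs take the suffix.

    Returns the derived adjective, or None when the rule blocks the
    derivation (the starred forms, e.g. *cry-able). Unknown verbs are
    also blocked.
    """
    if VERBS.get(verb):
        return verb + "-able"
    return None


print(derive_able("read"))  # read-able
print(derive_able("cry"))   # None
```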
Furthermore an individual's lexical knowledge (or lexical concept) is that person's knowledge of vocabulary.
A lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of words that are different forms of "the same word". For example, English run, runs, ran and running are forms of the same lexeme. A related concept is the lemma (or citation form), which is a particular form of a lexeme that is chosen by convention to represent a canonical form of a lexeme. Lemmas are used in dictionaries as the headwords, and other forms of a lexeme are often listed later in the entry if they are unusual in some way.
A lexeme belongs to a particular syntactic category, has a particular meaning (semantic value), and in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. For example, the lexeme for run has a present third person singular form runs, a present non-third-singular form run, a past form ran, and a present participle running. The use of the forms of a lexeme is governed by rules of grammar; in the case of English verbs such as run, these include subject-verb agreement and compound tense rules, which determine which form of a verb can be used in a given sentence.
A lexicon consists of lexemes.
In many formal theories of language, lexemes have subcategorization frames to account for the number and types of complements they occur within sentences and other syntactic structures.
The notion of a lexeme is very central to morphology, and thus, many other notions can be defined in terms of it. For example, the difference between inflection and derivation can be stated in terms of lexemes:
- Inflectional rules relate a lexeme to its forms.
- Derivational rules relate a lexeme to another lexeme.
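The difference can be made concrete with a small data model. A minimal Python sketch, in which the Lexeme class and both rule functions are hypothetical names: the inflectional rule only picks a form from the lexeme's own paradigm (the RUN paradigm listed above), while the derivational rule builds a new lexeme with a different category and its own paradigm. The spelling of the derived stem is simplified.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    lemma: str     # citation form used as the headword
    category: str  # syntactic category
    forms: dict = field(default_factory=dict)  # inflectional paradigm: slot -> form


# The paradigm of RUN as given in the text.
RUN = Lexeme("run", "verb", {
    "present_3sg": "runs",
    "present_non3sg": "run",
    "past": "ran",
    "present_participle": "running",
})


def inflect(lexeme, slot):
    """Inflectional rule: relate a lexeme to one of its own forms."""
    return lexeme.forms[slot]


def derive_agent_noun(verb):
    """Derivational rule: relate a verb lexeme to a NEW noun lexeme.

    Simplification: the stem is the present participle minus '-ing',
    so the doubled consonant of 'running' carries over to 'runner'.
    """
    stem = verb.forms["present_participle"][: -len("ing")]
    noun = stem + "er"
    return Lexeme(noun, "noun", {"singular": noun, "plural": noun + "s"})


print(inflect(RUN, "past"))           # ran
runner = derive_agent_noun(RUN)
print(runner.lemma, runner.category)  # runner noun
print(inflect(runner, "plural"))      # runners
```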
A language is a system of signals, including voice sounds, gestures or written symbols which encodes and decodes information.
Human spoken and written languages are systems of symbols (sometimes known as lexemes) and the grammars (rules) by which the symbols are manipulated. "Language" is also used to refer to common properties of languages.
Language learning is normal in human childhood. Most human languages use patterns of sound or gesture for symbols which enable communication with others around them. There are thousands of human languages, and these seem to share certain properties, even though many shared properties have exceptions.
How would you find the "best" English dictionary?
I would ask my English teacher and friends, and I would go to a bookshop to ask about the best and most popular dictionaries. I would look at the number of head-words and meanings the dictionaries offer, and at their appearance.
Task: Set up a questionnaire of questions about dictionaries, and ask 3 people to respond to it before next week.
The questionnaire is designed for students of English at Universität Bielefeld
- How old are you? ........................
- How long have you been studying English? ..........................
- Do you use dictionaries? Yes. [ ] No. [ ]
- What kind of dictionaries do you use and how often? (Tick the proper box.)
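Type of dictionary | All the time | Very often | Often | Seldom | Not at all
English-English | [ ] | [ ] | [ ] | [ ] | [ ]
English-German/German-English | [ ] | [ ] | [ ] | [ ] | [ ]
Synonym dictionary | [ ] | [ ] | [ ] | [ ] | [ ]
Technical dictionary | [ ] | [ ] | [ ] | [ ] | [ ]
Pronunciation dictionary | [ ] | [ ] | [ ] | [ ] | [ ]
Slang/Jargon | [ ] | [ ] | [ ] | [ ] | [ ]
Proverb dictionary | [ ] | [ ] | [ ] | [ ] | [ ]
Illustrated dictionary | [ ] | [ ] | [ ] | [ ] | [ ]
Pocket dictionary English-German | [ ] | [ ] | [ ] | [ ] | [ ]
Pocket dictionary German-English | [ ] | [ ] | [ ] | [ ] | [ ]
Pocket dictionary English-English | [ ] | [ ] | [ ] | [ ] | [ ]
Other types: .................................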
- What is the name of the dictionary you use regularly?............................................
- If you were to choose between two excellent dictionaries, would you take the one that offers a digital version of a dictionary on a CD or the one that is bigger?
the dictionary with a CD [ ] the bigger dictionary [ ]
Justify your answer: I would choose.................................., because ..................................... .
- Do you use electronic dictionaries on CDs?
Yes. [ ] No. [ ]
- Do you use on-line dictionaries?
Yes. [ ] No. [ ]
- Which of the on-line dictionaries would you recommend?.................................................
- Do you prefer using digital dictionaries (on-line, CDs) to looking words up in a paper dictionary?
Yes. [ ] No. [ ]
- Are you planning to buy a paper dictionary in the future?
Yes. [ ] No. [ ] No. I use only on-line dictionaries. [ ]
- How many dictionaries do you have at home? Paper: .... Digital: ...
Thank you!
Results of the questionnaire for students of English at Universität Bielefeld
Jolanta Bachan, 23-10-2006
Three students took part in the questionnaire:
x - one person ticked the box
X - more than one person ticked the box
- The age of people who did the questionnaire: 22, 23, 26.
- They had studied English for 11, 13 and 12 years, respectively.
- Do you use dictionaries? Yes. [ X ] No. [ ]
- What kind of dictionaries do you use and how often? (Tick the proper box.)
Types of Dictionaries All the time Very often Often Seldom Not at all English-English x X English-German/German-English X x Synonym dictionary x X Technical dictionary X x Pronunciation dictionary x; x x Slang/Jargon x X Proverb dictionary x X Illustrated dictionary X Pocket dictionary English-German x X Pocket dictionary German-English x X Pocket dictionary English-English x X
Other types: on-line dictionaries
- What is the name of the dictionary which you use regularly? OELD, Leo, PONS, Oxford Advanced Learner's Dictionary
- If you were to choose between two excellent dictionaries, would you take the one that offers a digital version of a dictionary on a CD or the one that is bigger?
a) the dictionary with a CD [ x ] b) the bigger dictionary [ X ]
- Do you use electronic dictionaries on CDs?
Yes. [ X ] No. [ x ]
- Do you use on-line dictionaries?
Yes. [ X ] No. [ x ]
- Which of the on-line dictionaries would you recommend?
leo.org, http://dict.tu-chemnitz.de/
- Do you prefer using digital dictionaries (on-line, CDs) to looking words up in a paper dictionary?
Yes. [ x ] No. [ X ]
- Are you planning to buy a paper dictionary in the future?
Yes. [ X ] No. [ x ] No. I use only on-line dictionaries. [ ]
- How many dictionaries do you have at home?
Paper: 7, 4, 4 Digital: 1, 1, 2 Favourite on-line dictionaries: Leo, -, 2
Justify your answer: I would choose a) or b) because: a) I already have enough printed dictionaries; b) I prefer books, and they have more entries.
References
[1] Gibbon, D. 16th October 2005. How to Make a Dictionary. http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/HTMD/ 21/10/06.
[2] Gibbon, D. 2004. A note on Learner Portfolio. http://wwwhomes.uni-bielefeld.de/~gibbon/Docs/portfoliodescription.pdf 21/10/06.
[3] Wikipedia, the free encyclopedia. 11th September 2006. Portfolio. http://en.wikipedia.org/wiki/Portfolio 20/10/06.
[4] Wikipedia, the free encyclopedia. 23rd June 2006. Learning log. http://en.wikipedia.org/wiki/Learning_logs 20/10/06.
[5] Wikipedia, the free encyclopedia. 20th October 2006. Electronic portfolio. http://en.wikipedia.org/wiki/EPortfolio 20/10/06.
[6] Wikipedia, the free encyclopedia. 27th August 2006. Learning plan. http://en.wikipedia.org/wiki/Learning_Plan 21/10/06
[7] Gibbon, D. 2006 How to Make a Dictionary: Organisation http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/HTMD/htmd... 17/10/06.
[8] A Beginner's Web Glossary http://www.case.edu/help/webglossary.html 21/10/06.
[9] Wikipedia, the free encyclopedia. 18th September 2006. Text. http://en.wikipedia.org/wiki/Text 21/10/06.
[10] Wikipedia, the free encyclopedia. 10th September 2006. Text linguistics. http://en.wikipedia.org/wiki/Text_linguistics 22/10/06.
[11] Fisiak, J.(Ed.) 2003. The New Kosciuszko Foundation Dictionary. Kraków: Towarzystwo Autorów i Wydawców Prac Naukowych UNIVERSITAS.
On defining "definition"
Lecture 2
24th October 2006
Overview:
In the class we talked about types of information found in dictionaries; we also listed different kinds of dictionaries. The term "definition" and the structure of the definition was explained. Furthermore, examples of different kinds of definitions were shown.
THE "MEANING" OF A DICTIONARY: INFORMATION ABOUT WORDS
Dictionary Information
- Metadata:
- catalogue information about the production of the dictionary, intended for dictionary identification
- Types of lexical information in dictionary entries:
- FORM (cf. appearance), e.g. spelling, pronunciation
- STRUCTURE (cf. formulation), e.g. construction of words, place of words in larger constructions (e.g. sentences)
- CONTENT (cf. meaning):
- definition
- relations with other words (e.g. synonyms, antonyms - opposite and complementary)
- examples [1]
Model of types of lexical information [1]
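Treating a dictionary as a database, each entry can be stored as a record whose fields fall under FORM, STRUCTURE and CONTENT. A minimal Python sketch, assuming a hypothetical LexicalEntry record; the definition of "baby" is the DCE 1987 one quoted below, the pronunciation is the standard British transcription, and the example sentence is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One dictionary entry, grouped by the three types of lexical information."""
    # FORM (cf. appearance)
    spelling: str
    pronunciation: str
    # STRUCTURE (cf. formulation)
    part_of_speech: str
    # CONTENT (cf. meaning)
    definition: str
    synonyms: list = field(default_factory=list)
    examples: list = field(default_factory=list)

    def by_type(self):
        """Return the entry's fields grouped as FORM / STRUCTURE / CONTENT."""
        return {
            "FORM": {"spelling": self.spelling, "pronunciation": self.pronunciation},
            "STRUCTURE": {"part_of_speech": self.part_of_speech},
            "CONTENT": {
                "definition": self.definition,
                "synonyms": self.synonyms,
                "examples": self.examples,
            },
        }


baby = LexicalEntry(
    spelling="baby",
    pronunciation="/ˈbeɪbi/",
    part_of_speech="noun",
    definition="a very young child, especially one who has not yet learned to speak or walk",
    synonyms=["infant"],
    examples=["The baby is asleep."],  # invented example sentence
)
print(sorted(baby.by_type()))  # ['CONTENT', 'FORM', 'STRUCTURE']
```

Metadata (title, publisher, year) describes the whole dictionary rather than a single entry, so it would live in a separate record.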
Basic definition types
- Good definitions:
- Standard dictionary definition: X is a Y kind of Z.
- Contextual definition (putting a word into context)
- Recursive definition (to define an infinite set)
- Real definition:
- ostensive definitions (a definition by showing)
- models (illustration of a kind of reality) (e.g. the text/information model)
- Bad (but sometimes unavoidable) definitions:
- Circular definition [1]
What is the difference between:
- definition
- explanation - may be more informal (in encyclopaedia)
- scientific explanation
- didactic explanation
- more like description in a dictionary
Standard dictionary definitions
- X is a Y kind of Z
- Definitio per genus proximum et differentia specifica
- definition by nearest kind and specific differences
- Examples (DCE 1987):
- babble: to say or talk quickly and foolishly or in a way that is hard to understand
- baby: a very young child, especially one who has not yet learned to speak or walk
- bad: not good; unpleasant, unwanted, or unacceptable
- blue: of the colour of the clear sky or of the deep sea on a fine day [1]
COMPONENTS OF DEFINITIONS
definiendum - the word to be defined (X)
definiens - the words that define: genus proximum (Z) and differentia specifica (Y)
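The components can be stored separately and recombined into the standard pattern. A minimal Python sketch, assuming a hypothetical StandardDefinition record; the split of the DCE definition of "baby" into genus "child" and differentia "very young" is an illustrative reading, not taken from the dictionary itself.

```python
from dataclasses import dataclass


@dataclass
class StandardDefinition:
    definiendum: str  # X: the word to be defined
    genus: str        # Z: genus proximum, the nearest kind
    differentia: str  # Y: differentia specifica, what sets X apart within Z

    def definiens(self):
        """The defining part: genus proximum plus differentia specifica."""
        return (self.genus, self.differentia)

    def as_sentence(self):
        """Render the pattern 'X is a Y kind of Z'."""
        return f"{self.definiendum} is a {self.differentia} kind of {self.genus}"


baby = StandardDefinition(definiendum="baby", genus="child", differentia="very young")
print(baby.as_sentence())  # baby is a very young kind of child
```

In real dictionary entries the differentia may also follow the genus, as in "printer: a machine connected to a computer that prints onto paper using ink".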
Quiz / Tasks
- What are the main kinds of information in a dictionary?
- metadata
- lexical information
- Give examples of
- FORM information
- STRUCTURE information
- CONTENT information
- What is the main kind of information which dictionary users are generally interested in?
The main kinds of information found in a dictionary are the ones users most commonly look for: a typical dictionary user consults a semasiological dictionary, in which one can find information about orthography and definitions of meaning.
- Find dictionary definitions of 5 different words of different parts of speech, and
- give examples of genus and differentia specifica
- give examples of other kinds of definition (see Homework)
Parts of Speech
- nouns
- adjectives
- pronouns
- articles
- verbs
- adverbs
- conjunctions
- prepositions
- interjections
Genus proximum hierarchy (tree structure) = taxonomy - a hierarchy based on the relation of implication
Taxonomy (from the Greek verb tassein = "to classify" and nomos = "law, science", cf. "economy") was once only the science of classifying living organisms (alpha taxonomy), but later the word was applied in a wider sense, and may also refer to either a classification of things, or the principles underlying the classification. Almost anything (animate objects, inanimate objects, places, and events) may be classified according to some taxonomic scheme.
Taxonomies, which are composed of taxonomic units known as taxa (singular taxon), are frequently hierarchical in structure, commonly displaying parent-child relationships.[2]
Taxonomies are used in many contexts:
- traditional lexicography:
- cross-references in standard definitions
- thesaurus construction
Elements of definitions
- genera proxima
- definition by enumeration of hyponyms
- definition by negation of co-hyponyms
"In linguistics, a hyponym is a word or phrase whose semantic range is included within that of another word. For example, scarlet, vermilion, carmine, and crimson are all hyponyms of red (their hypernym).
Hyponyms are a set of related words whose meaning are specific instances of a more general word (so, for example, red, white, blue, etc., are hyponyms of color). Hyponymy is thus the relationship between a general term such as polygon and specific instances of it, such as triangle." [2]
Hypernym - a word that is more generic or broad than another given word.
Co-hyponyms - hyponyms of the same hypernym.
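A taxonomy can be stored as a table of hypernym links and then queried for hypernyms, hyponyms and co-hyponyms. A minimal Python sketch, assuming a hypothetical mini-taxonomy built from the examples above (each word has exactly one hypernym, so the hierarchy is a tree):

```python
# Hypothetical mini-taxonomy: word -> its hypernym (genus proximum).
HYPERNYM = {
    "scarlet": "red", "crimson": "red", "vermilion": "red", "carmine": "red",
    "red": "colour", "white": "colour", "blue": "colour",
    "triangle": "polygon",
}


def hypernym_chain(word):
    """Follow the hypernym links upwards to the top of the hierarchy."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """All words whose direct hypernym is `word` (basis for definition by enumeration)."""
    return sorted(w for w, h in HYPERNYM.items() if h == word)


def co_hyponyms(word):
    """Other hyponyms of the same hypernym (basis for definition by negation)."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


print(hypernym_chain("scarlet"))  # ['red', 'colour']
print(hyponyms("red"))            # ['carmine', 'crimson', 'scarlet', 'vermilion']
print(co_hyponyms("red"))         # ['blue', 'white']
```

The chain from "scarlet" upwards is exactly the sequence of genera proxima a cross-referencing dictionary would follow.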
Recursive definitions
- The technique of defining an infinite set of entities, such as
- the set of possible sentences in a language,
- the set of possible words in a language,
- the set of natural numbers
- First, define the atomic, finite case or cases, for instance for a morphological stem, then induce the infinite set, then exclude everything else. [1]
- Task - define:
- ancestor - either a parent or a parent of an ancestor
- natural number
- 1 is a natural number (base condition)
- If n is a natural number, then n+1 is a natural number (recursive/inductive condition)
- Nothing else is a natural number (exclusion condition)
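Both task definitions translate directly into recursive functions. A minimal Python sketch; the family data in PARENTS is hypothetical. Each function has a base case, a recursive case and a final "nothing else" case, mirroring the three conditions.

```python
def is_natural(n):
    """Recursive definition of the natural numbers, starting from 1 as above."""
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        return False          # exclusion condition: nothing else is a natural number
    if n == 1:
        return True           # base condition: 1 is a natural number
    return is_natural(n - 1)  # inductive condition, read backwards: n is natural if n-1 is


# Hypothetical family data: person -> list of parents.
PARENTS = {"Carol": ["Bob"], "Bob": ["Alice"], "Alice": []}


def is_ancestor(a, b):
    """a is an ancestor of b if a is a parent of b, or a parent of an ancestor of b."""
    parents = PARENTS.get(b, [])
    if a in parents:
        return True  # base condition: a parent is an ancestor
    # recursive condition; anything not reached this way is not an ancestor
    return any(is_ancestor(a, p) for p in parents)


print(is_natural(5), is_natural(0))                                  # True False
print(is_ancestor("Alice", "Carol"), is_ancestor("Carol", "Alice"))  # True False
```

Python's default recursion limit makes is_natural suitable only for small numbers; it illustrates the shape of the definition rather than an efficient test.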
Example of a recursive definition:
- base (grounding) condition: a stem is a root, e.g. fun
- recursive/inductive (completeness) condition: a stem is a stem with an affix, e.g.
fun+y = funny
un+funny = unfunny
unfunny+ness = unfunniness
- exclusion (soundness) condition: nothing else is a stem. [1]
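The three conditions can be checked mechanically over morpheme structures. A minimal Python sketch, assuming a hypothetical representation in which a root is a string and a complex stem is a pair (prefix, stem) or (stem, suffix); the tiny morpheme inventories are taken from the example, and spelling changes such as fun+y = funny are left out.

```python
ROOTS = {"fun"}
PREFIXES = {"un"}
SUFFIXES = {"y", "ness"}


def is_stem(x):
    """Recursive definition of 'stem' over morpheme structures."""
    if isinstance(x, str):
        return x in ROOTS                        # base condition: a stem is a root
    if isinstance(x, tuple) and len(x) == 2:
        left, right = x
        if left in PREFIXES and is_stem(right):  # recursive condition: prefix + stem
            return True
        if right in SUFFIXES and is_stem(left):  # recursive condition: stem + suffix
            return True
    return False                                 # exclusion condition: nothing else


funny = ("fun", "y")
unfunny = ("un", funny)
unfunniness = (unfunny, "ness")
print(is_stem(unfunniness))     # True
print(is_stem(("ness", "fun"))) # False
```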
Genetic definition - describes the process or method by which a thing is formed
Models (and metaphors)
Models are ostensive definitions, in that they are intended to help us understand something with reference to reality, except that the pointer to a segment of reality is replaced by
- an iconic representation
- of a segment of reality
- which is simplified, stylised, idealised,
- and has artefactual properties not shared by reality.[1]
Models - illustration of a kind of reality - stylised, simplified representation of reality
An icon - a sign which bears similarity to what is meant, resembles its meaning.
Models (and metaphors)
- Check:
- pictures used in dictionaries
- action games
- Barbie dolls
- Kate Moss, ...
- photos, films, recordings
- computer programmes and virtual reality ...
- Metaphors are verbal models, except that their relation to reality is in general much more subjective. [1]
Homework
- Define
- definition - A definition is a form of words which states the meaning of a term. This may either be the meaning which it bears in general use (a descriptive definition), or that which the speaker intends to impose upon it for the purpose of his or her discourse (a stipulative definition). The term to be defined is known as the definiendum (Latin: that which is to be defined). The form of words which defines it is known as the definiens (Latin: that which is doing the defining). [2]
- explanation - An explanation is a statement which points to causes, context, and consequences of some object, process, state of affairs, etc., together with rules or laws that link these to the object. Some of these elements of the explanation may be implicit.
Explanations can only be given by those with understanding of the object which is explained.
In scientific research, explanation is one of three purposes of research (the other two being exploration and description). Explanation is the discovery and reporting of relationships among different aspects of the studied phenomenon.
- See the QUIZ, and find dictionary definitions of 5 different words of different parts of speech, and
- give examples of genus (bold) and differentia specifica (italic)
- onto - preposition - into a position on, e.g. "Gennaro tossed his newspaper onto the table."
- me - pronoun - the person speaking; the objective form of I
- you - pronoun - the person spoken to
- transparent - adjective - (of a substance) allowing light through so that objects can be clearly seen through it
- bake - verb - to cook inside an oven
- printer - noun - a machine connected to a computer that prints onto paper using ink [3]
- give examples of other kinds of definition
- standard definition
- recursive definition
- contextual definition
- ostensive definition
- model
- definition by negation of co-hyponyms
- definition by enumeration of hyponyms
- genetic definition
References
[1] Gibbon, D. 2006. How to Make a Dictionary: On defining "definition" http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/HTMD/htmd... 30/10/2006
[2] Wikipedia, the free encyclopedia. wikipedia.org
[3] Cambridge Dictionaries Online. Cambridge University Press 2006. http://dictionary.cambridge.org/ 30/10/2006
The architecture of a dictionary
Lecture 3
31st October 2006
Overview:
The class concentrated mainly on explaining what was not clear from the previous class. We finished the topic On defining "definition" and explained the terms recursive definition and model. All the misunderstandings were clarified. It was a great class! At the end of the class we started talking about the architecture of a dictionary.
Object language - a language in which people talk about objects, their properties, and relations between them.
Metalanguage - a language used to talk about language.
- everyday metalanguage - a language used by ordinary people to talk about language in everyday conversation
- scientific metalanguage - a language used by linguists to talk about language
Long-term homework
- Give detailed examples, from at least 3 different kinds of dictionary, of:
- metadata
-
- Title: The New Kosciuszko Foundation Dictionary
- Editor-in-Chief: Jacek Fisiak
- Year of publishing: 2003
- Publisher: Towarzystwo Autorów i Wydawców Prac Naukowych UNIVERSITAS, Kraków
- Introduction: The New Kosciuszko Foundation Dictionary English-Polish, Polish-English contains 140,000 head-words, 400,000 meanings and 100,000 idioms and fixed phrases. The dictionary is addressed to native speakers of Polish, but may be successfully used by native speakers of English.
- Abbreviations:
a. adjective
Bibl. Bible
bibl. library [1]
-
- Title: Kieszonkowy słownik NIEMIECKO-POLSKI POLSKO-NIEMIECKI/Taschenwörterbuch DEUTSCH-POLNISCH POLNISCH-DEUTSCH
- Author: Stanisław Schimtzek, Jan Czochralski
- Year of publishing: 1988
- ISBN: 83-214-0349-2
- Publisher: Państwowe Wydawnictwo WIEDZA POWSZECHNA, Warszawa
- Content: Przedmowa - Vorwort (Preface), Skróty i znaki objaśniające - Abkürzungen und erläuternde Zeichen (Abbreviations and explanatory symbols), Deutsch-Polnisch (German-Polish), Deutsches Alphabet (German alphabet) [2]
-
- Title: CAMBRIDGE Dictionaries Online
- © Cambridge University Press 2006
- Resources:
- Activities
- Top 40 words
- Data for Language Researchers
- Word of the day
- About the corpus [3]
- types of lexical information for 3 different kinds of lexical entry
-
- Form: devilkin (spelling) 'devlkIn (pronunciation)
- Structure: n. ; devil + kin ( kin - an archaic diminutive suffix)
- Content: lit. l. przen. diabełek, diable (little devil, imp) [1]
-
- Form: Becher (spelling) (pronunciation - stress on the underlined vowel)
- Structure: m.
- Content: kubek; kielich, puchar (mug; goblet, cup) [2]
-
- Form: tippen (spelling) (pronunciation - stress on the underlined vowel)
-
- Structure: vi (intransitive verb); tip + p + en
- Content: lekko dotykać (an etw. czegoś) (to touch something lightly)
-
- Structure: vt (transitive verb); tip + p + en
- Content: (maschineschreiben) pisać na maszynie (to type) [2]
- ---> hierarchical structure:
-
- Form: car (spelling) /kar/ (pronunciation)
- Structure: noun [C]; no internal morphological structure
- Content:
- definition: a road vehicle with an engine, usually four wheels, and seating for between one and five people
- examples:
a car accident
She goes to work by car. ---> contextual definition
A car is also one part of a train: a dining/passenger/freight car [3]
- Create definitions by nearest kind and specific differences for:
- hip-hop - a type of popular music in which the words are spoken rather than sung
- love - an intense feeling of affection for somebody
- lasagna - a type of dish made of thin, wide sheets of pasta arranged in layers with cheese, meat and vegetables
- Describe in detail what is reality and what is artefact in
- 3 of the models discussed in the section on models
- pictures used in dictionaries - Pictures represent people, animals and objects existing in the real world (reality). A picture is drawn on paper, simplifies the real appearance and represents a single exemplar of the set of things that bear the same name (artefacts).
- Kate Moss - The public image of Kate Moss represents an idealised, simplified model of the reality of other women (reality). The artefactual aspect is that she is not other women but herself: different, a drug user, etc. (artefacts).
As Claudia Schiffer once said: "Don't think that when I wake up in the morning I look like Claudia Schiffer!" [4]
- model trains - Model trains resemble real trains (reality). They are small, are made of different materials and are used for entertainment, not as a means of transport (artefacts).
- in the text model given in the Text Theory introduction
- The text model represents real texts, both spoken and written (reality). The artefactual aspect is that the model describes any text: it does not show the text itself, its meaning, appearance and structure, but reduces actual texts to a universal model.
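The Form / Structure / Content split used in the lexical entries above maps naturally onto a small data structure. This is a minimal sketch assuming Python dataclasses; the class names `LexicalEntry` and `Reading` are hypothetical, and the example reuses the two readings of tippen to show the simple hierarchical structure (one form, several readings).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reading:
    """One reading (sense) of an entry: its Structure and its Content."""
    structure: str  # grammatical information, e.g. "vi" or "vt"
    content: str    # definition or translation


@dataclass
class LexicalEntry:
    """The Form (spelling, pronunciation) shared by one or more readings."""
    spelling: str
    pronunciation: Optional[str] = None  # SAMPA; None if the source gives none
    readings: list[Reading] = field(default_factory=list)

    def render(self) -> str:
        """Render as a simple hierarchy: the form first, then numbered readings."""
        head = self.spelling
        if self.pronunciation:
            head += f" /{self.pronunciation}/"
        lines = [head]
        for number, reading in enumerate(self.readings, start=1):
            lines.append(f"  {number}. {reading.structure}: {reading.content}")
        return "\n".join(lines)


tippen = LexicalEntry(
    spelling="tippen",
    readings=[
        Reading("vi", "lekko dotykać (an etw. czegoś)"),
        Reading("vt", "pisać na maszynie"),
    ],
)
print(tippen.render())
```

An entry with a single reading, such as car, is then just the degenerate case of the same hierarchy.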
[1] Fisiak, J.(Ed.) 2003. The New Kosciuszko Foundation Dictionary. Kraków: Towarzystwo Autorów i Wydawców Prac Naukowych UNIVERSITAS.
[2] Schimtzek, S. & Czochralski, J. 1988. Kieszonkowy słownik NIEMIECKO-POLSKI POLSKO-NIEMIECKI. Warszawa: Państwowe Wydawnictwo WIEDZA POWSZECHNA.
[3] Cambridge Dictionaries Online. Cambridge University Press 2006. http://dictionary.cambridge.org/ 12/11/2006.
[4] Gibbon, D. Personal communication.
The architecture of a dictionary
Lecture 4
7th November 2006
Overview:
In class we talked about the architecture of a dictionary. We explained such terms as megastructure, macrostructure, mesostructure and microstructure. The lecture was very informative and clear.
Parts of a dictionary:
Megastructure
The megastructure of a dictionary is the entire structure of the dictionary, including
- the front matter
- abbreviations and explanations of grammar
- the body of the dictionary
- the back matter [1]
QUIZ: give examples of the kinds of information contained in each of these structure types.
- the front matter - metadata: title, author, publisher, ISBN
- abbreviations and explanations of grammar:
- bezokolicznik inf. infinitive;
- VERBS
Aspect
The majority of Polish verbs have two aspects, the imperfective for conveying the frequency of an action or describing a process, and the perfective for emphasis on a single action or a result. It follows that the perfective can only be used in the past and future, while the imperfective can also be used in the present tense. [2]
- the body of the dictionary - contains the information we are looking for; it is the core of the dictionary, e.g.
loudness 'laUdn@s
n.
U - głośność.
- t. przen. krzykliwość (kolorów, ubioru, zachowania). [2]
- the back matter - may contain only a reference to the publisher, or advertisements.
Macrostructure
The macrostructure of a dictionary refers to the body of a dictionary. It is the organisation of the lexical entries in the body of a dictionary into
- a list of lexical entries ( ---> semasiological dictionary)
- tree structure - general terms at the top, increasingly specific terms below them, and the most specific items at the bottom ( ---> thesaurus)
- networks
---> collocation - dogs bark
---> synonyms.
WordNet - a dictionary organised as a network.
Types of macrostructure:
- semasiological
- onomasiological
QUIZ: Are semasiological macrostructures more like lists, trees, or networks?
Semasiological macrostructures are more like lists.
Semasiological dictionary ---> relates forms to meanings
Bilingual dictionary ---> we look for an equivalent in another language, but can often find only an approximation (of the meaning)
Bilingual onomasiological dictionary - build one and make a fortune! :-)
QUIZ: megastructure, macrostructure
What is the Megastructure of a lexicon? Give examples.
Megastructure is the entire structure of a dictionary. It is composed of front matter, abbreviations and explanations of grammar, the body of the dictionary and the back matter.
What is the Macrostructure of a lexicon? Give examples.
Macrostructure refers to the body of a dictionary. It is the organisation of lexical entries. Lexical entries may be organised into a list, a tree or a network.
What is a Semasiological dictionary? Give examples.
A semasiological dictionary relates form to meaning. E.g. English-English dictionary, Polish-English dictionary.
What is an Onomasiological dictionary? Give examples.
In an onomasiological dictionary you know the meaning, but you are looking for the form. E.g. a thesaurus.
Microstructure
The microstructure of a dictionary is the consistent organisation of lexical information within lexical entries in the dictionary. [1]
QUIZ:
How many types of lexical information can you find?
5 main types - spelling, pronunciation, part of speech, definition, examples - contextual definition
plus ---> picture - model, symbol - stylised picture (?), synonyms, antonyms, translation into other languages, etymological information
Is the microstructure of a semasiological dictionary typically a list, a tree or a network?
The microstructure of a semasiological dictionary is a list.
What kind of structure do the combined macrostructure and microstructure of a semasiological dictionary have?
The combined macrostructure and microstructure of a semasiological dictionary is a table. It is like a spreadsheet with a list of entries in the first column and types of lexical information in the following columns.
Columns - list of words, spelling, pronunciation, part of speech, etc.
Rows - an entry with lexical information
And an onomasiological dictionary?
An onomasiological dictionary has a simpler structure (fewer columns) - lists of synonyms arranged in columns
OR
like a thesaurus, it has a tree structure.
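The difference between the two kinds of dictionary can be sketched as two indexes built from the same table: a semasiological index maps forms to meanings, an onomasiological index maps meanings to forms. This is a minimal illustration under the assumption that meanings can be represented by concept labels; the labels (`ROAD_VEHICLE`, `RAILWAY_CARRIAGE`), the rows and the function names are invented, with the two senses of car taken from the Cambridge entry above.

```python
from collections import defaultdict

# One row per (form, part of speech, meaning) pair, like a spreadsheet.
# The concept labels are invented for illustration.
ROWS = [
    ("car", "noun", "ROAD_VEHICLE"),
    ("automobile", "noun", "ROAD_VEHICLE"),
    ("car", "noun", "RAILWAY_CARRIAGE"),
    ("carriage", "noun", "RAILWAY_CARRIAGE"),
]


def semasiological(rows):
    """Form -> meanings: the lookup you need when you know the word."""
    index = defaultdict(list)
    for form, _pos, meaning in rows:
        index[form].append(meaning)
    return dict(sorted(index.items()))


def onomasiological(rows):
    """Meaning -> forms: the lookup you need when you know the concept."""
    index = defaultdict(list)
    for form, _pos, meaning in rows:
        index[meaning].append(form)
    return dict(sorted(index.items()))


print(semasiological(ROWS))
print(onomasiological(ROWS))
```

Both indexes are derived from the same rows, which is one way of seeing why the two dictionary types can contain the same information in different structures.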
QUIZ: microstructure
What is the microstructure of a dictionary?
Microstructure is the way in which the lexical information is organised in the lexical entries.
What kind of lexical information is contained in a dictionary's microstructure?
Microstructure - properties of linguistic units such as words:
- MEANING:
----> Pragmatics - the use of words in action
----> Semantics - the truth or falsity of sentences: words contribute to whether the whole sentence is true or false.
- STRUCTURE:
----> Syntax - the way in which words fit into sentences (texts, phrases)
----> Word formation
- APPEARANCE: Pronunciation and Orthography
Describe the two dimensions of types of lexical information.
How do you define "definition"? Give examples
A definition is a form of words which states the meaning of a term.
laptop - noun [C] - a computer which is small enough to be carried around easily and is designed for use outside an office.
---> contextual definition: A laptop would be really useful for when I'm working on the train. [3]
Mesostructure
The mesostructure of a dictionary is the set of relations between lexical entries and other entities such as other parts of a dictionary or a text corpus. [1]
QUIZ:
How do lexical entries relate to each other?
cross-references to synonyms, antonyms.
How do lexical entries relate to the mini-grammar in the megastructure?
n. - this is a noun - reference to more general information of the dictionary, such as sketch grammar at the beginning of the dictionary.
How do lexical entries relate to text corpora?
Sometimes lexical entries refer to original examples of text. (The author of a dictionary does not invent the examples that explain the meaning of an entry, but takes them from texts in other books.)
Lexicon mesostructure
Overview:
- Linguistically motivated class hierarchy of DATCAT (DataCategory) subvectors e.g. modality, grammar, object semantics
- Linguistic description references, e.g. use of abbreviations for parts of speech, characterisations of spelling
- Cross-references between related entries, e.g. cohyponyms (synonyms, antonyms, ...)
- Corpus references (concordance) [1]
A concordance is a dictionary whose microstructure only shows examples of a particular word in a corpus. [4]
QUIZ: mesostructure
What is the mesostructure of a dictionary?
The mesostructure of a dictionary is the set of relations between lexical entries and other entities such as other parts of a dictionary or a text corpus. [1]
Give examples for mesostructural elements concerning
- Types of information with reference to the sign model
The three-component sign model is the same as the text model, the word model, etc., with STRUCTURE, CONTENT, APPEARANCE... [4]
- Linguistic description references - Orthography, Pronunciation, Part of Speech, Gender, Morphology, Inflection class, Definition
- Cross-references between related entries - cohyponyms (synonyms, antonyms)
- Corpus references -
---> examples invented for the purpose of a dictionary
---> examples refer to real texts
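Cross-references are the part of the mesostructure that is easiest to check mechanically: every reference should point to an entry that actually exists. A minimal sketch, assuming entries stored as a Python dictionary with hypothetical `syn` and `ant` fields; `dangling_references` is an invented helper, not part of any dictionary software.

```python
# Hypothetical entries; "syn" and "ant" hold cross-references to other entries.
ENTRIES = {
    "big": {"pos": "adjective", "syn": ["large"], "ant": ["small"]},
    "large": {"pos": "adjective", "syn": ["big"], "ant": ["small"]},
    "small": {"pos": "adjective", "syn": ["little"], "ant": ["big", "large"]},
}


def dangling_references(entries):
    """Return (source, relation, target) triples whose target has no entry."""
    missing = []
    for headword, info in entries.items():
        for relation in ("syn", "ant"):
            for target in info.get(relation, []):
                if target not in entries:
                    missing.append((headword, relation, target))
    return missing


print(dangling_references(ENTRIES))  # "little" is referenced but has no entry
```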
Homework
- Work out optimal answers to the quizzes
- Take one of your dictionaries, and describe in as much detail as possible its
- megastructure (Detailed information about the front matter of the dictionary, as well as the abbreviations and explanation of grammar, can be found in the part on metadata.)
- macrostructure - organisation of content
In the semasiological dictionary The New Kosciuszko Foundation Dictionary, entries are listed in alphabetical order. If an entry has more than one meaning, there is an embedded list.
- microstructure - An entry consists of the headword, information about pronunciation, a grammatical indicator, one or more stylistic indicators (e.g. sl. = slang, arch. = archaic), one or more Polish translations with semantic indicators, and phrases and idioms. Not all the elements appear in every entry.
- loudness - spelling ---> Headword
- 'laUdn@s - pronunciation
- n. - part of speech ---> grammatical indicator
- U - uncountable
- głośność.
- t. przen. krzykliwość (kolorów, ubioru, zachowania). (przen. - metaphorical) [2]
Note that the examples provide contextual definitions, and that if you have two readings in one lexical entry, you have a simple hierarchical structure. [4]
- mesostructure
References to information about English pronunciation, abbreviations and explanations of grammar, and references to other words in the dictionary to avoid repeating the same definition.
References
[1] Gibbon, D. 2006. How to Make a Dictionary: The architecture of a dictionary http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/HTMD/htmd... 13/11/2006.
[2] Fisiak, J.(Ed.) 2003. The New Kosciuszko Foundation Dictionary. Kraków: Towarzystwo Autorów i Wydawców Prac Naukowych UNIVERSITAS.
[3] Cambridge Dictionaries Online. Cambridge University Press 2006. http://dictionary.cambridge.org/ 13/11/2006.
[4] Gibbon, D. 2006. Personal communication
Lexical databases
Lecture 5
14th November 2006
Overview:
In today's lecture we were shown practical ways of making a dictionary as a table. Prof. Gibbon talked about surface and deep structure of a dictionary.
Different kinds of dictionaries
- a book
- an online dictionary
- lexical database
Surface Structure - the appearance, or rendering, of a dictionary (micro- and macrostructure).
Onomasiological and semasiological dictionaries contain the same kind of information, but the structure is different.
Deep Structure - underlying structure.
- a table
- rows are lexical entries, with a specific microstructure
- columns are single types of lexical information
Each row has the same length!
Ways of dealing with ambiguous words - words that have more than one meaning:
- the item is repeated
- a subtable is created
How to make a table?
- in OpenOffice - a table object, with deep structure
The artefact of such a table is the concept of a page: you cannot see all the columns.
- a spreadsheet - there is no artefact of the limit of a page.
----> You can sort the columns.
- a table in HTML - the source/HTML code of the following table is a definition of the surface structure.
headword | part of speech | definition |
love | noun | a feeling of deep affection |
poodle | noun | a dog with a haircut |
green | adjective | a colour which is found in leaves in spring and summer |
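The separation between deep structure (the table) and surface structure (its rendering) can be sketched in a few lines: the same rows can be sorted like a spreadsheet and then rendered as HTML. This is a minimal sketch using Python's standard `html.escape`; the function name `to_html` and the header labels are assumptions, and the check that every row has the same length mirrors the rule stated above.

```python
from html import escape

# Deep structure: one row per lexical entry, one column per type of lexical information.
COLUMNS = ("headword", "part of speech", "definition")
ROWS = [
    ("love", "noun", "a feeling of deep affection"),
    ("poodle", "noun", "a dog with a haircut"),
    ("green", "adjective", "a colour which is found in leaves in spring and summer"),
]


def to_html(columns, rows):
    """Render the deep-structure table as one possible surface structure (HTML)."""
    for row in rows:
        # Every row must have the same length as the header row.
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} has {len(row)} cells, expected {len(columns)}")
    header = "".join(f"<th>{escape(name)}</th>" for name in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


# Sorting the rows first is like sorting a spreadsheet by its first column.
print(to_html(COLUMNS, sorted(ROWS)))
```

The rows themselves never change; only the rendering step decides what the reader sees, which is why the same data could equally be printed as a book page.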
polysemy, lexical ambiguity - the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings. [1]
A homonym is a word that has the same pronunciation and spelling as another word, but a different meaning. Example: The word stalk, meaning either part of a plant or to follow (someone) around.
A homograph is a word that has the same spelling as another word, but a different meaning. Example: The spelling to cleave may denote to adhere to or to divide or split.
A homophone is a word that has the same pronunciation as another word, but a different meaning and possibly a different spelling. Example: All of to, too, and two, or there, their, and they're.
These can cause ambiguity in reading text or in hearing speech. [2]
References
[1] Online French, Italian and Spanish Dictionary http://www.wordreference.com/ 20/11/06.
[2] Wikipedia, the free encyclopedia http://www.wikipedia.org/ 20/11/06.
Paradigmatic relations are classificatory relations, i.e. they define sets of items on the basis of similarities and differences.
Syntagmatic relations are combinatorial relations which define larger units on the basis of their component parts.
For example, in grammar the class "noun" is based on paradigmatic relations of similarity of various kinds between nouns. The relation "subject", by contrast, is syntagmatic: it denotes the relation between a noun (or noun phrase) and the verb, which together make up a larger unit such as a sentence.
In phonology, the terms "consonant", "vowel" etc. express paradigmatic relations of similarity and difference on the basis of distinctive features.
In phonology, the terms "onset", "nucleus", "coda" express syntagmatic relations between parts of a syllable.
Lexical data and their structure
Lecture 6
21st November 2006
Overview:
Today's lecture was conducted by Dr. Trippel. At the beginning of the class we had a revision of the structure of lexical data. In the second part of the class we talked about different kinds of lexicons and problematic issues connected with ambiguous words and spelling. At the very end we were shown the ways of creating lexicons.
Microstructure - spelling, transcription; order of DatCat.
Mesostructure - interrelation of lexicon entries, cross-references; relations to external information.
Macrostructure - ordering of lexicon entries.
Ega - a language spoken by fewer than 1000 people in Africa.
How to preserve the language:
- record stories
- there is no alphabet, so a writing system based on IPA allophones can be invented, but...
- how should the entries be ordered according to the IPA alphabet?
@ - the same character with different pronunciations:
- h@me
- intern@t c@fe
- @home
How to sort words containing this character?
- put the words in places where the user may expect to find them and then add cross-references.
- put the sign (or other different signs) at the beginning or at the end of the alphabet.
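The second strategy can be expressed as a custom collation key, so that words containing @ sort either before a or after z. A minimal sketch, assuming plain lower-case Latin letters; `collation_key` and its `at_first` flag are hypothetical names, and spaces and other symbols are simply placed before all letters. A real dictionary would more likely use a full collation standard with language-specific tailoring.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def collation_key(word: str, at_first: bool = False) -> list[int]:
    """Map each character to a rank; '@' goes before 'a' or after 'z'."""
    key = []
    for ch in word.lower():
        if ch == "@":
            key.append(-1 if at_first else len(ALPHABET))
        elif ch in ALPHABET:
            key.append(ALPHABET.index(ch))
        else:
            key.append(-2)  # spaces and punctuation sort before everything else
    return key


WORDS = ["home", "h@me", "hat", "@home", "intern@t c@fe", "internet"]
print(sorted(WORDS, key=collation_key))                              # @ after z
print(sorted(WORDS, key=lambda w: collation_key(w, at_first=True)))  # @ before a
```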
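Ireland ---> Irish Gaelic - prefixes are used to mark inflection, so the first letter of a word form is not a reliable guide to where to look it up.
German - inflection, e.g. das Haus - the base form is the headword.
Latin - inflection; verbs are listed under the 1st person singular present active indicative (e.g. amo), not under the infinitive.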
Corpus:
- collection of language material
- texts (books, newspapers, emails)
- speech (recordings)
- with additional information
- POS marked up
- transcription, annotation
- lemma - de-grammaticalised form of a word; it is a theoretical concept - a name for the class of constructs
- with a specific structure
- interlinear glossing
- special marking
Types of lexicons:
- semasiological dictionary
- onomasiological dictionary
- Termbase - a dictionary of a specialised language; a technical dictionary sorted by domain, e.g. cancer, algebra
- word frequency lexicon - the most frequent words at the beginning of a dictionary
- rhyming lexicon - sorted according to the pronunciation of the word endings
- picture lexicons - by prototype
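Two of these orderings are easy to derive from raw material: a frequency lexicon counts word forms in a text, and a rhyming lexicon sorts words by their endings, which amounts to sorting by the reversed string. A minimal sketch; note the assumption that sorting reversed spellings only approximates rhyme, since a real rhyming lexicon would sort reversed pronunciations (for example SAMPA transcriptions). The function names are hypothetical.

```python
import re
from collections import Counter


def frequency_lexicon(text):
    """Word forms ordered from most to least frequent (ties in alphabetical order)."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def rhyming_lexicon(words):
    """Sort words by their reversed spelling so that shared endings cluster together."""
    return sorted(set(words), key=lambda word: word[::-1])


print(frequency_lexicon("the cat sat on the mat and the dog sat"))
print(rhyming_lexicon(["night", "cat", "light", "mat", "dog"]))
```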
Problematic issues:
ambiguity
- synonyms
- polysemy
- homonyms
Solutions
- ambiguity - enumeration
- search word: "arbitrary" definition, eg. Latin dictionary
Methods of creating lexicons
- introspection - look inside - a trained linguist reflects on what is correct
- well - communicative function, a gap filler
- ouch!
- swear words
- taboo words
- vocabulary of lower social classes
- Create a lexicon based on a questionnaire
- Picture dictionary
---> pointing with a tongue (deaf community), pointing with a finger is rude!
- Reflect evidence - include all the words that appeared in the corpus
- Concordance - to find a word in context
- flat tabular lexicon - simple translation (as in a spreadsheet)
- generalisation - ?
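A concordance of the keyword-in-context (KWIC) kind can be produced with a regular expression search. A minimal sketch, assuming plain text input and whole-word, case-insensitive matching; the function name `concordance` and the `width` parameter are illustrative choices.

```python
import re


def concordance(text, keyword, width=20):
    """Keyword-in-context lines: each occurrence with `width` characters on either side."""
    lines = []
    pattern = rf"\b{re.escape(keyword)}\b"  # whole words only
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


sample = "She goes to work by car. A car is also one part of a train. Carriages are cars too."
for line in concordance(sample, "car", width=10):
    print(line)
```

Because of the word boundaries, "Carriages" and "cars" are not listed; a lemma-based concordance would need a morphological analysis first.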
Problems:
Types of Lexical Information: Pronunciation
Lecture 7
28th November 2006
Overview:
The class was about the representation of sounds in a dictionary. We defined the difference between phonetic and phonemic transcription. Prof. Gibbon enjoyed conducting the class, since he is a first-class phonetician and loves phonetics. He calls phonetics "phon" /fVn/. :-)
Surface structure
Two levels:
- linguistic description - metalanguage
- units of language - object language
Surface structure of
- DICTIONARIES
- metalanguage - the typography and layout of a book, hypertext, ...
- WORDS in dictionaries
- object language:
- spelling
- pronunciation [1]
Typography is the art and technique of setting written subject matter in type using a combination of fonts, font size, line length, leading (line spacing) and letter spacing.
Typography is performed by typesetters, compositors, typographers, graphic artists, art directors, and clerical workers. Until the Digital Age typography was a specialized occupation. Digitization opened up typography to new generations of visual designers and lay users.[2]
Metalanguage - a language used to talk about language.
An English dictionary - written in metalanguage about English object language.
computer (It., Ger.) - computer (English) - the use of the English metalanguage to talk about other languages.
Object language - a language in which people talk about objects, their properties, and relations between them.
eddy /'edi/ - surface structures of a word
- eddy - visual surface structure
- /'edi/ - representation of pronunciation
Pronunciation ---> phonology - the study of sounds from the point of view of a dictionary.
Rendering structures:
- Pronunciation rules (acoustic modality)
- Spelling (visual modality)
- Sound-spelling rules (inter-modality conversion) [1]
Representation of sounds
Representations of sounds in dictionaries:
- prosodic hierarchy:
- phonemes - signs, code sounds:
- function: "smallest word-distinguishing segments"
- internal structure: "configurations of distinctive phonetic features"
- external structure (see syllables)
- rendering: "contextual variants", "allophones"
- syllables - unit of pronunciation:
- function: "word distinguishing phoneme configurations"
- internal structure: "configurations of sequential features (consonantal, vocalic; voiced, unvoiced; ...) and simultaneous features (tone, accent)"
- external structure (word)
- rendering: a function of the rendering of phonemes [1]
Phonemes
There are several ways of defining phonemes, depending on which of the four sign components is focussed:
- The minimal word-distinguishing sound segment (based on the contrastive function of phonemes)
- The smallest unit of a syllable (based on external sound structure)
- Consists of distinctive features (based on the internal sound structure)
- Consists of a set of allophones (based on the rendering of phonemes) [1]
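The first definition, based on contrastive function, suggests a simple procedure: find minimal pairs, i.e. words whose transcriptions differ in exactly one segment. A minimal sketch over a toy SAMPA word list (the words and the helper `minimal_pairs` are invented for illustration); each transcription is a list of segments, so that affricates such as /tS/ count as one segment.

```python
from itertools import combinations

# Toy lexicon in SAMPA, one list of segments per word.
TRANSCRIPTIONS = {
    "pin": ["p", "I", "n"],
    "bin": ["b", "I", "n"],
    "pen": ["p", "e", "n"],
    "ship": ["S", "I", "p"],
    "chip": ["tS", "I", "p"],
}


def minimal_pairs(lexicon):
    """Pairs of words whose transcriptions differ in exactly one segment position."""
    pairs = []
    for (word1, trans1), (word2, trans2) in combinations(sorted(lexicon.items()), 2):
        if len(trans1) == len(trans2):
            differences = [(a, b) for a, b in zip(trans1, trans2) if a != b]
            if len(differences) == 1:
                pairs.append((word1, word2, differences[0]))
    return pairs


for pair in minimal_pairs(TRANSCRIPTIONS):
    print(pair)  # each contrasting segment pair is evidence for two phonemes
```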
Syllable structure:
- CV structure - Japanese
- CVC
- CCV
- CCCVVCCC - strange /streIndZ/ (8 positions in one syllable)
How many potential syllables does English permit? - 11982!!!
The directed acyclic graphs (DAGs) consist of nodes connected by edges. The nodes are artefactual and do not correspond to actual properties of speech, but are simply anchors for the edges. The edges are labelled with relevant linguistic units such as phonemes; the nodes may also be labelled for convenience of reference. A syllable is defined formally as a path from the starting node to another node via an edge, and from there to the next node via another edge, and so on until a final node is reached.
The network defines syntagmatic relations between the components of syllables.
FSA (finite-state automaton) - In automata terminology, the nodes represent states of the automaton, and the edges represent transitions between states. [3]
More about the structure of the English syllable can be found in the paper Phonotactics of English monosyllables, by Prof. Gibbon.
transition network or state diagram - each transition from one circle/node/state to the next describes the permitted position of one phoneme.
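The transition-network view can be sketched as a finite-state automaton over C/V skeletons such as those listed above. This toy automaton is a simplification, not Prof. Gibbon's model: it works on C and V symbols rather than phonemes, and the limits (up to 3 onset consonants, a nucleus of 1 or 2 V slots, up to 4 coda consonants) are assumed for illustration. The state names are hypothetical.

```python
# Transitions of a toy syllable automaton: (current state, symbol) -> next state.
TRANSITIONS = {
    ("start", "C"): "onset1", ("onset1", "C"): "onset2", ("onset2", "C"): "onset3",
    ("start", "V"): "nucleus1", ("onset1", "V"): "nucleus1",
    ("onset2", "V"): "nucleus1", ("onset3", "V"): "nucleus1",
    ("nucleus1", "V"): "nucleus2",
    ("nucleus1", "C"): "coda1", ("nucleus2", "C"): "coda1",
    ("coda1", "C"): "coda2", ("coda2", "C"): "coda3", ("coda3", "C"): "coda4",
}
# A syllable must contain a nucleus, so every state from the nucleus onward is final.
FINAL_STATES = {"nucleus1", "nucleus2", "coda1", "coda2", "coda3", "coda4"}


def accepts(skeleton: str) -> bool:
    """Follow one edge per symbol; accept if the path ends in a final state."""
    state = "start"
    for symbol in skeleton:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no edge for this symbol: not a possible syllable
            return False
    return state in FINAL_STATES


for skeleton in ["CV", "CVC", "CCV", "CCCVVCCC", "CCCCV"]:
    print(skeleton, accepts(skeleton))
```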
Syllable - onset & rhyme: peak (nucleus), coda
Phonemic transcription - sounds represented with minimal detail, i.e. only their distinctive (phonemic) properties.
Phonetic transcription - all possible information about pronunciation included, all the physical details.
Spelling-to-Sound rules
ghoti /fIS/ - tough + women + nation
Graphemes - character combination corresponding to a phoneme
Task:
- make a list of 5 spelling rules
- /Z/ - spelt "si" in words like vision, invasion, but not after a consonant as in tension, nor where there is a double "s", as in mission.
- /j/ - spelt "y" when it occurs at the beginning of a syllable, as in yellow, yeast.
- /I@/ - spelt:
- "eer" - beer, deer
- "ear" - ear, clear
- "ere" - here, merely, but not were, where, there
- /t/ - spelt "t" or "tt" in words like tiny, better, and also "th" in a few words: thyme, Thomas, Thompson, Thames, Theresa, Anthony.
- /O/ - spelt:
- "au" - audience, fraud
- "aw" - law, awe
- in words ending "-ought" - ought, bought
- "a" in some words, especially before "l" - water, all, ball, almost
- make a list of 5 main spelling problems
- Pronunciation does not correspond to spelling.
- "Silent characters" - letters that are not pronounced, eg. who, handsome.
- The same pronunciation of different combinations of letters , see above.
- Different pronunciations of the same combination of letters, eg. read (present and past tense), ending "-ed".
- Irregular spelling, e.g. talk, broad, there, buffet, duvet, crepe, people, key, should
Homework: English and German
Pronunciation:
List
- the consonants of German which do not occur in English
- /C/ --- sicher --- "zIC6
- /ts/ --- Zahl --- tsa:l
- /pf/ --- Pfahl --- pfa:l
- /x/ --- Buch --- bu:x
- the consonants of English which do not occur in German
- /T/ --- thin
- /D/ --- this
- /w/ --- wasp --- wQsp
- the vowels of German which do not occur in English
- /Y/ --- hübsch --- hYpS
- /9/ --- plötzlich --- "pl9tslIC
- /e:/ --- Beet --- be:t
- /y:/ --- süß --- zy:s
- /2:/ --- blöd --- bl2:t
- the vowels of English which do not occur in German
- /V/ --- but --- bVt
Spelling:
List
- the characters of German which do not occur in English
- ä
- ü
- ö
- ß
- the characters of English which do not occur in German
- ï --- naïve
- 5 English graphemes containing more than one character
In a phonological orthography, a grapheme corresponds to one phoneme.
- sh - /S/ - ship
- th - /D/ - this
- au - /O/ - Laura
- ie - /i:/ - piece, priest, hygiene
- ch - /k/ - school, character
- 5 German graphemes containing more than one character
- ss - /s/ - Tasse
- sch - /S/ - waschen
- tt - /t/ - bitte
- ie - /i:/ - lieben
- ch - /C/ - ich
The homework was done using information about English and German SAMPA alphabets. [4]
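Graphemes with more than one character mean that a spelling has to be segmented before sound-spelling rules can apply. A common approach is greedy longest match: at each position, take the longest grapheme listed in the table. A minimal sketch for the German examples above, assuming a tiny hypothetical grapheme table in SAMPA; mapping e to /@/ only holds in unstressed syllables like the ones in these words, and stress is ignored.

```python
# Hypothetical grapheme-to-phoneme table (SAMPA) covering only the example words.
GERMAN_GRAPHEMES = {
    "sch": "S", "ch": "C", "ie": "i:", "ss": "s", "tt": "t",
    "a": "a", "b": "b", "e": "@", "i": "I", "l": "l",
    "n": "n", "s": "s", "t": "t", "w": "v",
}


def segment(word, table):
    """Split a word into graphemes, always preferring the longest match."""
    longest = max(len(grapheme) for grapheme in table)
    result, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                result.append(chunk)
                i += size
                break
        else:  # no grapheme of any length matched at position i
            raise ValueError(f"no grapheme matches {word[i:]!r}")
    return result


def transcribe(word, table):
    """Concatenate the phoneme for each grapheme of the segmented word."""
    return "".join(table[grapheme] for grapheme in segment(word, table))


for word in ["tasse", "waschen", "bitte", "lieben", "ich"]:
    print(word, segment(word, GERMAN_GRAPHEMES), transcribe(word, GERMAN_GRAPHEMES))
```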
References
[1] Gibbon, D. 2006. Types of Lexical Information: Pronunciation http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/HTMD/htmd..., accessed 2/12/06
[2] Wikipedia, the free encyclopedia http://wikipedia.org/ accessed 2/12/06.
[3] Gibbon, D. 2004. Phonotactics of English monosyllables. http://wwwhomes.uni-bielefeld.de/~gibbon/Docs/PhonologyMaterials/engsyll..., accessed 2/12/06.
[4] SAMPA, computer readable phonetic alphabet. 25th October, 2005. http://www.phon.ucl.ac.uk/home/sampa/index.html, accessed 2/12/2006.
Types of lexical information: morphology
(inflection and word formation)
Lecture 8
5th December 2006
Overview:
The lecture was about morphology. We were told that morphology is the most important discipline, because it allows new words to be created. We were shown practical applications of word formation in designing names for new products. It was a very good and entertaining lecture.
Why word formation?
- New concepts require new words
- Sometimes new words are invented on the spot
Who needs it? Why?
- Scientists
- Engineers
- Product branding companies
- Poets
- Everybody else
Branches of morphology
- inflection (syntagmatic relations)
- word formation (paradigmatic relations)
- derivation
- compounding
Morphology sketch
- Inflection:
- Function (external structure):
- marks the relation of words to their contexts
- no change in the basic meaning of words
- Form (internal structure):
- affix (prefix, suffix, infix), superfix, stem vowel change
- Word formation:
- Function (external structure):
- creation of new words / parts of speech / meanings
- in principle infinite extendability of the lexicon
- Form (internal structure):
- Root/morpheme creation (blending, abbreviation, ...)
- Derivation: 1 stem + affix (prefix, suffix, infix), superfix, vowel change
- Compounding: 2 stems, perhaps with interfix or inflection-like affix [1]
Blending - creating a new word by fusing parts of two existing words, e.g. brunch (breakfast + lunch)
Morphemes are the smallest meaningful parts of words.
There are 2 main morpheme types:
- lexical morpheme (content morpheme, root):
- open set: girl, boy, car, box, spoon, grass, sky
- grammatical morpheme (structural morpheme):
- closed set - You cannot invent new grammatical morphemes
- free: prepositions, conjunctions, auxiliary verbs
- bound: affixes such as prefixes and suffixes (in word formation and inflection) [1]
Morphemes are realised in different contexts by
- allomorphs
- i.e. variant pronunciations
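A standard example of allomorphy is the English regular plural suffix, which has three allomorphs conditioned by the final segment of the stem: /Iz/ after sibilants, /s/ after other voiceless consonants, /z/ elsewhere. A minimal sketch in SAMPA; the segment sets cover the common cases only, irregular plurals are ignored, and the function names are hypothetical.

```python
from __future__ import annotations

SIBILANTS = {"s", "z", "S", "Z", "tS", "dZ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "T"}


def plural_allomorph(final_segment: str) -> str:
    """Choose the allomorph of the plural morpheme from the stem's last segment."""
    if final_segment in SIBILANTS:
        return "Iz"
    if final_segment in VOICELESS_NON_SIBILANTS:
        return "s"
    return "z"  # voiced consonants and vowels


def pluralise(segments: list[str]) -> str:
    """Append the appropriate plural allomorph to a stem given as SAMPA segments."""
    return "".join(segments) + plural_allomorph(segments[-1])


for stem in (["k", "{", "t"], ["d", "Q", "g"], ["b", "V", "s"], ["b", "i:"]):
    print(pluralise(stem))
```

The same three-way pattern is found with the 3rd person singular -s of verbs.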
How are words built?
- Inflection
- Function (external structure): marks the syntagmatic relation of words to their contexts
- syntactic contexts (agreement in person, number, case):
- subject-verb (English)
- subject-verb; determiner-adjective-noun; preposition-nominal (German)
- situational contexts:
- Verbs: temporal relations, spatial relations
- Nominals: quantity and definiteness relations
- Form (internal structure): stem + affix
- prefix
- suffix
- circumfix
- infix
- superfix
- Root / morpheme creation:
- Function (external structure): creates new POS and meanings
- Form (internal structure): parts of 2 or more existing stems (e.g. brunch, chortle, galumph)
- Derivation:
- Function (external structure): creates new POS and meanings from 1 existing stem
- Form (internal structure): 1 stem + affix
- prefix
- suffix
- circumfix
- infix
- superfix
- Compounding:
- Function (external structure): creates meanings (maybe new POS)
- Form (internal structure): from at least 2 existing stems
- lamp-post
- whisky-soda
- red-head
English words consist of a stem and, optionally, an inflection
- a stem has lexical meaning, e.g. table, chair, cabbage, happiness, wonderful, blog
- an inflection has grammatical meaning
- relates a word to its syntactic context ---> subject-verb agreement (person, case, number)
- relates a word to its semantic context ---> tense/time, quantity, speaker-addressee, ...
Inflections of English words are suffixes (or stem vowel changes) marking:
- person
- number
- case
Homework
- Define
- morpheme
Morpheme is the smallest meaningful part of a word.
- lexical morpheme - free morpheme, content morpheme, root; there is an open set of lexical morphemes.
- grammatical morpheme - structural morpheme; there is a closed set of grammatical morphemes:
- free morphemes: prepositions, conjunctions, auxiliary verbs
- bound morphemes: affixes such as prefixes and suffixes (in word formation and inflection)
- stem
- Simple (i.e. roots, lexical morphemes)
- Complex, i.e. at least one of the following:
- Derivations - a stem and a derivational affix, e.g. red+ish = reddish, beauty + ful = beautiful
- Compounds - a stem plus another stem, e.g. armchair, whisky-soda, red-head
- Both (synthetic compounds) - a derivation plus a stem, e.g. bus-driver, steam-roller [2]
- derived stem - stem with an affix (derivation)
---> either a root (zero derivation) or a derived stem with an affix
- compound stem - a derived stem or a word (stem + affix) + a derived stem or a word (stem + affix) OR a compound stem + compound stem
- What is the difference between inflection and derivation?
The process of derivation changes the meaning and/or word class of the base.
Inflection does not change the word class of the base, but creates syntactically correct forms of words.
- What is the difference between derivation and compounding?
- Collect 5 longish words and
- divide them into morphemes
- straightforwardness = straight + forward + -ness
- irresistibility = ir- + resist + -ible + -ity
- fraternisation = frater + -n- + -ise + -ation
- livingroom = live + -ing + room
- acyclically = a- + cycle + -ical + -ly
- show the construction of each word from its stems as tree diagrams
- straightforwardness ---> straight + forwardness; forwardness ---> forward + -ness
- irresistibility ---> irresistible + -ity; irresistible ---> irresist + -ible; irresist ---> ir- + resist
- fraternisation ---> fraternise + -ation; fraternise ---> fratern + -ise; fratern ---> frater + -n-
- livingroom ---> living + room; living ---> live + -ing
- acyclically ---> acyclical + -ly; acyclical ---> a- + cyclical; cyclical ---> cycle + -ical
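The tree diagrams can also be stored as data. The following is a minimal Python sketch. It assumes that each binary construction step is a pair (left, right) and that each morpheme is a plain string. The names `TREES`, `leaves` and `bracket` are hypothetical helpers chosen for illustration. The sketch recovers the flat segmentation of each word and prints the tree in labelled-bracket notation.

```python
from typing import List, Tuple, Union

# A word-formation tree is either a single morpheme (a string)
# or one binary construction step: a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# The analyses from the tree diagrams, encoded as nested pairs.
TREES = {
    "straightforwardness": ("straight", ("forward", "-ness")),
    "irresistibility": ((("ir-", "resist"), "-ible"), "-ity"),
    "fraternisation": ((("frater", "-n-"), "-ise"), "-ation"),
    "livingroom": (("live", "-ing"), "room"),
    "acyclically": (("a-", ("cycle", "-ical")), "-ly"),
}


def leaves(tree: Tree) -> List[str]:
    """Return the morphemes of a tree from left to right (the flat segmentation)."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def bracket(tree: Tree) -> str:
    """Render a tree in bracket notation, one bracket pair per construction step."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{bracket(left)} {bracket(right)}]"


for word, tree in TREES.items():
    print(f"{word} = {' + '.join(leaves(tree))}")
    print(f"    {bracket(tree)}")
```

The sketch does not model spelling changes at morpheme boundaries, such as resist + -ible giving "resistible" or cycle + -ical giving "cyclical". Those changes would need a separate set of rules.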
In derivation, bound morphemes (affixes) are added to the base.
In compounding, two or more roots are put together to create a new word.
Panini was an ancient Indian grammarian from Gandhara (traditionally dated 520-460 BC, but estimates range from the 7th to the 5th century BC). He is most famous for his Sanskrit grammar, particularly for his formulation of the 3,959 rules of Sanskrit morphology in the grammar known as Astadhyayi (meaning "eight chapters"). It is the earliest known grammar of Sanskrit, and the earliest known work on descriptive linguistics, generative linguistics, and perhaps linguistics as a whole. Panini's comprehensive and scientific theory of grammar is conventionally taken to mark the end of the period of Vedic Sanskrit, by definition introducing Classical Sanskrit. [2]
Sanskrit, as defined by Panini, had evolved out of the earlier "Vedic" form, and scholars often distinguish Vedic Sanskrit and Classical or "Paninian" Sanskrit as separate dialects. However, they are extremely similar in many ways and differ mostly in a few points of phonology, vocabulary, and grammar. Classical Sanskrit can therefore be considered a seamless evolution of the earlier Vedic language. Vedic Sanskrit is the language of the Vedas, a large collection of hymns, incantations, and religio-philosophical discussions which form the earliest religious texts in India and the basis for much of the Hindu religion. Modern linguists consider the metrical hymns of the Rigveda Samhita to be the earliest, composed by many authors over centuries of oral tradition. The end of the Vedic period is marked by the composition of the Upanishads, which form the concluding part of the Vedic corpus in the traditional compilations. The current hypothesis is that the Vedic form of Sanskrit survived until the middle of the first millennium BC. It is around this time that Sanskrit began the transition from a first language to a second language of religion and learning, marking the beginning of the Classical period.
The Sanskrit language is a classical language of India, a liturgical language of Hinduism, Buddhism, and Jainism, and one of the 22 official languages of India.
It has a position in the cultures of South and Southeast Asia similar to that of Latin and Greek in Europe, and is a central part of Hindu tradition and philosophy. It appears in pre-Classical form as Vedic Sanskrit (the language of the Vedas), with the language of the Rigveda being the oldest and most archaic stage preserved. This fact, together with comparative studies in historical linguistics, shows that Sanskrit is one of the earliest attested members of the Indo-European language family.
Today, Sanskrit is used as a ceremonial language in Hindu religious rituals in the form of hymns and mantras. Its vast literary tradition, including the Hindu scriptures and philosophical writings, is also studied. The corpus of Sanskrit literature encompasses a rich tradition of poetry and literature, as well as scientific, technical, philosophical and religious texts.
The Sanskrit discussed here is Classical Sanskrit as laid out in the grammar of Panini, around 500 BC.
In Sanskrit grammar a tatpurusa compound is a dependent determinative compound, i.e. a compound XY meaning a type of Y which is related to X in a way corresponding to one of the grammatical cases of X.
There are many tatpurusas (one for each of the noun cases, and a few others besides); in a tatpurusa, one component is related to another. For example, "doghouse" is a dative compound, a house for a dog. It would be called a caturthi-tatpurusa (caturthi refers to the fourth case - that is, the dative). The most frequent kind is the genitive tatpurusa. [2]
A dvandva (copulative or coordinative compound) refers to two or more objects that could be connected in sense by the conjunction 'and'. Dvandvas are common in some languages, such as Sanskrit (where the term originates), Chinese and Japanese, but less common in English, and the term itself is not often found in English dictionaries. Examples: matara-pitara (Sanskrit for 'mother and father'), shanchuan and yamakawa (Chinese and Japanese respectively for 'mountains and rivers'), and singer-songwriter in English. [2]
A bahuvrihi, or bahuvrihi compound, is a particular kind of nominal compound that refers to something that is not specified by any of its parts by themselves (i.e., it is headless or exocentric, its core semantic value being subsumed by an elliptical or 'external' semantic value so that the compound is not a hyponym of the head), especially a compound that refers to a possessor of an object specified: a bahuvrihi compound XY tends to mean someone or something which has a Y, and that Y has the characteristic X. For instance, a sabertooth (smil-odon) is neither a saber nor a tooth: it is an extinct feline with saber-like fangs. English bahuvrihis often describe people by referring to specific properties: flatfoot, half-wit, highbrow, lowlife, redhead, tenderfoot, longlegs, and white-collar. Many of these are colloquial, pejorative, or both. [2]
References
[1] Gibbon, D. 2006. Types of Lexical Information: Morphology http://www.spectrum.uni-bielefeld.de/~gibbon/Classes/htmd07-v01-wordform..., accessed 27/01/07
[2] Gibbon, D. 2006. Morphology - word construction. http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/Introduct..., accessed 3/12/06.
[3] Wikipedia, the free encyclopedia http://wikipedia.org/ accessed 2/12/06.
Toolbox
Lecture 9
12th December 2006
Overview:
In today's class we had a guest, Sascha Griffiths. He presented Toolbox, a kind of lexical database.
Toolbox was developed by SIL International (Summer Institute of Linguistics) for fieldwork purposes. Storing data in a Toolbox lexical database makes it possible to generate dictionaries from it.
Toolbox microstructure consists of:
- Lexeme - e.g. abr
- Part of Speech - e.g. prep
- Gloss - e.g. on
- Definition - e.g. on top of
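Toolbox stores its lexicon as a plain-text file in which each field of an entry begins with a backslash marker. The Python sketch below assumes the MDF-style markers \lx (lexeme), \ps (part of speech), \ge (gloss) and \de (definition), and encodes the example entry from class in that format. The marker set of a real Toolbox project may differ, and the function name `parse_records` is hypothetical. The sketch turns each record into a dictionary, a first step towards generating a dictionary from the database.

```python
from typing import Dict, List

# The example entry from class, written in Toolbox "standard format":
# one field per line, each introduced by a backslash marker; \lx starts a record.
SAMPLE = r"""
\lx abr
\ps prep
\ge on
\de on top of
"""

# Assumed MDF-style field markers, mapped to the microstructure labels used in class.
FIELDS = {"lx": "Lexeme", "ps": "Part of Speech", "ge": "Gloss", "de": "Definition"}


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split standard-format text into records, one dict per lexeme (lx) entry."""
    records: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and lines without a field marker
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            records.append({})  # a lexeme marker opens a new entry
        if records and marker in FIELDS:
            records[-1][FIELDS[marker]] = value.strip()
    return records


for entry in parse_records(SAMPLE):
    print(f"{entry['Lexeme']} ({entry['Part of Speech']}) '{entry['Gloss']}': {entry['Definition']}")
```

This simple reader ignores unknown markers and does not handle field values that continue over several lines. A full Toolbox file would need both.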
SIL International (Summer Institute of Linguistics) is a worldwide non-profit evangelical Christian organization whose main purpose is to study, develop and document lesser-known languages in order to expand linguistic knowledge, promote literacy and aid minority language development.
SIL International is the sister organization of Wycliffe Bible Translators, an agency dedicated to translating the Bible into minority languages. SIL makes its research on the world's languages available through the Ethnologue, a database of the world's languages. It has more than 6,000 members from over 50 countries. [1]
References
[1] Wikipedia, the free encyclopedia http://wikipedia.org/ accessed 28/01/07.
Types of lexical information: grammar
(parts of speech categories & subcategories)
Lecture 10
9th January 2007
Overview:
In class we talked about syntax. We described the properties of the main parts of speech in detail. The lecture was interesting and encouraged students to take an active part in it.
Syntax
- structure of sentences
- syntactic categories: parts of speech, subcategories, phrasal categories
- word syntax - morphology
- syntax for texts
the - definite article
a - indefinite article - the hearer does not know what the speaker is talking about. "a" implies singular.
"the" and "a" are grammatical words. They describe relations between the speaker and the hearer.
of - preposition, describes many kinds of relation, e.g. the relation of belonging.
after - preposition, describes time relation
Noun category: Determiners
- articles - definite (the), indefinite (a)
- possessives - in the first position of the nominal expression (my mother, your father)
- demonstratives - proximal (this - these), distal (that - those)
- Quantifiers
- cardinal numbers: one, two
- existential: some, several, few, many
- dual: both, either
- universal: each, every, all [1]
Noun category: Adjectives
- Adjective type
- scalar - many degrees (small ... big)
- polar - YES/NO (married/unmarried)
"Susan is very unmarried." - a polar adjective takes on a special meaning when combined with an adverb of degree.
- appraisive - express attitude of the speaker (good, great, fantastic)
- ordinal (first, second)
- Special feature of scalar adjectives - "adverbs" of degree (very, highly, extremely, incredibly) [1]
Noun categories: nouns
- Proper nouns - names: personal, place, product, ...
- Common nouns:
- Countable nouns: knife, fork, spoon
- Mass nouns (uncountable nouns): bread, butter, jam, ... [1]
Noun categories: pronouns
- cardinal numbers: one, two
- existential: some, several, few, many
- dual: both, either
- universal: each, every, all
Verb categories: verbs
- Main verbs:
- finite forms: person (1st, 2nd, 3rd), number (singular, plural), tense (present, past)
- non-finite forms: infinitive, participle (present, past)
- Periphrastic verbs (auxiliary verb + non-finite main verb):
- modal: can, may, will, shall, ought, ...
- aspectual: be+prespart (continuous), have+pastpart (perfect)
- passive: be+pastpart [1]
The car might (modal) have (perfect) been (continuous) being (passive) repaired (main verb).
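The example sentence follows a fixed order (modal, perfect, continuous, passive, main verb), and each auxiliary fixes the form of the verb that follows it: have + past participle, be + present participle, be + past participle. The Python sketch below builds such verb groups. It is a simplification that assumes a modal is always present, so no tense marking is needed, and that the main verb is regular (-ing, -ed). The names `verb_group` and `form` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hand-listed forms of the two auxiliaries used in the chain.
AUX_FORMS: Dict[str, Dict[str, str]] = {
    "have": {"base": "have", "ing": "having", "en": "had"},
    "be": {"base": "be", "ing": "being", "en": "been"},
}


def form(verb: str, slot: str) -> str:
    """Return the base form, present participle ("ing") or past participle ("en").

    Main verbs are assumed to be regular, e.g. repair -> repairing / repaired.
    """
    if verb in AUX_FORMS:
        return AUX_FORMS[verb][slot]
    if slot == "ing":
        return verb + "ing"
    if slot == "en":
        return verb + "ed"
    return verb


def verb_group(main: str, modal: str, perfect: bool = False,
               continuous: bool = False, passive: bool = False) -> str:
    """Build modal + (perfect) + (continuous) + (passive) + main verb."""
    # Each auxiliary is paired with the form it imposes on the NEXT verb.
    chain: List[Tuple[str, Optional[str]]] = []
    if perfect:
        chain.append(("have", "en"))   # have + past participle
    if continuous:
        chain.append(("be", "ing"))    # be + present participle
    if passive:
        chain.append(("be", "en"))     # be + past participle
    chain.append((main, None))

    words = [modal]
    required: Optional[str] = "base"   # a modal is followed by the base form
    for verb, imposed in chain:
        words.append(form(verb, required))
        required = imposed
    return " ".join(words)


print(verb_group("repair", "might", perfect=True, continuous=True, passive=True))
# might have been being repaired
```

Any subset of the auxiliaries can be switched off. For example, `verb_group("repair", "will", continuous=True)` gives "will be repairing". This shows that the order of the auxiliaries stays fixed even when some of them are missing.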
Verb categories: adverbs
- Deictic, e.g. here, there; now, then
- Time (when), e.g. soon, immediately; yesterday, ...
- Place and direction (where), e.g. upwards, into, towards
- Manner, e.g. slowly, quickly; cleverly, stupidly; nicely, nastily; well
- Degree - better dealt with in connection with adjectives [1]
Glue categories: conjunctions
- Co-ordinating conjunctions: and, but
- Subordinating conjunctions:
- conjunction-like relative pronouns - make sentences (clauses) into adjective-like noun modifiers
- other subordinating conjunctions - basically make sentences (clauses) into adverb-like verb modifiers [1]
Glue categories: interjections
- Interjections link parts of dialogues together ("Hi!", "er..")
- They may also be expressions of subjective reactions ("Ouch!", "Wow!") [1]
- Signs are structured in terms of their position in a size hierarchy; the positions in the hierarchy are sometimes referred to as ranks.
- The main ranks (there are subdivisions) are:
- dialogue
- monologue/text - turn in a dialogue
- sentence
- word - stem, affixes --> nouns, verbs
- morpheme - phonemes, syllables
- phoneme - distinctive features [1]
- Signs at each of these ranks have
- structure (internal and external)
- semiotic relations (functions and realisations)
Language structure is determined by the following kinds of constitutive relations:
- structural relations:
- syntagmatic relations:
- "glue"
- combinatory relations which create larger signs (and their realisations and interpretations) from smaller signs (and their realisations and interpretations)
- paradigmatic relations:
- "choice"
- classificatory relations of similarity and difference between signs.
- semiotic relations:
- realisation: the visual appearance or acoustic representation of signs (other senses may also be involved).
- interpretation: the assignment of meaning to a sign.
Syntagmatic relations are combinatorial relations which define larger units on the basis of their component parts.
Syntagmatic relations - linguistic "glue": combinatory relations which create larger signs (and their realisations and interpretations) from smaller signs (and their realisations and interpretations).
Syntagmatic relations are very often hierarchical.[1]
Examples:
Phonology:
Consonants (Cs) and vowels (Vs) are glued together as the periphery and core of syllables.
Morphology:
lexical morphemes and affixes are glued together into stems.
stems are glued together into compound stems.
stems and inflections are glued together into words.
Syntax:
nouns and verbs are glued together as the subjects and verbs of sentences.
Paradigmatic relations are classificatory relations, i.e. they define sets of items on the basis of similarities and differences.
Paradigmatic relations - classificatory relations of similarity and difference between signs.
Similarity and difference of
- internal structure
- external structure
- meaning
- appearance
For example, in grammar the class "noun" is based on paradigmatic relations of similarity of various kinds between nouns. However, the relation "subject" denotes the relation between a noun (or noun phrase) and the verb, making up a larger unit such as a sentence.
In phonology, the terms "consonant", "vowel" etc. express paradigmatic relations of similarity and difference on the basis of distinctive features.
In phonology, the terms "onset", "nucleus", "coda" express syntagmatic relations between parts of a syllable.
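The two kinds of relation can be made concrete with a toy Python sketch. It uses spelling as a stand-in for phonemes, an assumption made only for illustration, and treats a, e, i, o, u as the vowels. The paradigmatic step classifies each letter as C or V. The syntagmatic step splits a one-syllable word into onset, nucleus and coda. The function names are hypothetical.

```python
from typing import Tuple

VOWELS = set("aeiou")  # assumed orthographic vowels; a real analysis would use phonemes


def cv_pattern(word: str) -> str:
    """Paradigmatic classification: label each letter as consonant (C) or vowel (V)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())


def split_syllable(word: str) -> Tuple[str, str, str]:
    """Syntagmatic analysis of a single syllable into (onset, nucleus, coda).

    The nucleus is taken to be the first run of vowels; everything before it
    is the onset and everything after it is the coda.
    """
    pattern = cv_pattern(word)
    start = pattern.find("V")
    if start == -1:
        raise ValueError(f"no vowel found in {word!r}")
    end = start
    while end < len(word) and pattern[end] == "V":
        end += 1
    return word[:start], word[start:end], word[end:]


print(cv_pattern("strength"))      # CCCVCCCC
print(split_syllable("strength"))  # ('str', 'e', 'ngth')
```

The C/V labels group letters by similarity, which is a paradigmatic classification. The onset/nucleus/coda split describes how the parts combine within one syllable, which is a syntagmatic relation.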
References
[1] Gibbon, D. 2007. Types of Lexical Information: grammar (parts of speech categories & subcategories), http://www.spectrum.uni-bielefeld.de/~gibbon/Classes/htmd08-v01-grammar.... accessed 27/01/07
Types of lexical information: semantics
Lecture 11
16th January 2007
Overview:
In class, Prof. Gibbon revised what we had covered on types of dictionaries (and their microstructure) and on definitions. We were also introduced to new types of definition, namely syntagmatic and paradigmatic definitions. We also talked about relations between words.
SEMANTICS: THE STUDY OF MEANING
Meanings are expressed by definitions.
Main types of definition
- Componential definition
- splits the meaning of a lexical item into components
- e.g. standard dictionary definition by genus proximum and differentia specifica
- Syntagmatic definition
- contextual definition
- definition by text examples
- Paradigmatic definition
- word fields (e.g. in a thesaurus, synonym dictionary)
- semantic relations: hyponyms, hyperonyms; co-hyponyms: synonyms, antonyms [1]
Contextual definitions - definition by illustrating the meaning in context
Onomasiological dictionary ---> paradigmatic definition - based on similarity and difference.
Ostensive definition - by showing a model.
Task
- Define
- definition - A definition is a form of words which states the meaning of a term. This may either be the meaning which it bears in general use (a descriptive definition), or that which the speaker intends to impose upon it for the purpose of his or her discourse (a stipulative definition). The term to be defined is known as the definiendum (Latin: that which is to be defined). The form of words which defines it is known as the definiens (Latin: that which is doing the defining). [2]
Semantic relations
- taxonomy (generalisation-specialisation relation, paradigmatic relations)
- hyperonym - general term, e.g. dog, pet
- hyponym - specific term, special term, e.g. poodle
- synonym
- antonym:
- opposite
- complementary
- inverse
- co-hyponym
- meronomy (part-whole relation, syntagmatic relations) [1]
Taxonomy - a hierarchy, classification
Meronomy - a different kind of hierarchy - How to build up larger units from smaller units, e.g. car <--- wheel
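Taxonomy relations can be stored as a simple hyperonym table from which hyponyms and co-hyponyms are derived. In the Python sketch below, the word list (animal, dog, cat, poodle, terrier) is a hypothetical mini-taxonomy that extends the class example. Each word is assumed to have a single hyperonym. The meronomy (car, wheel, engine) is kept in a separate table, because part-whole is a different relation from kind-of.

```python
from typing import Dict, List

# Hypothetical mini-taxonomy: each word points to its (single) hyperonym.
HYPERONYM: Dict[str, str] = {
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "terrier": "dog",
}

# Meronomy kept separately: each part points to the whole it belongs to.
PART_OF: Dict[str, str] = {"wheel": "car", "engine": "car"}


def hyponyms(word: str) -> List[str]:
    """Words whose hyperonym is `word` (the more specific terms)."""
    return sorted(w for w, h in HYPERONYM.items() if h == word)


def co_hyponyms(word: str) -> List[str]:
    """Other words that share the same hyperonym as `word`."""
    parent = HYPERONYM.get(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


def hyperonym_chain(word: str) -> List[str]:
    """Walk up the taxonomy from `word` to the most general term."""
    chain = []
    while word in HYPERONYM:
        word = HYPERONYM[word]
        chain.append(word)
    return chain


def parts(whole: str) -> List[str]:
    """Meronyms: the parts listed for a given whole."""
    return sorted(p for p, w in PART_OF.items() if w == whole)


print(hyponyms("dog"))            # ['poodle', 'terrier']
print(co_hyponyms("poodle"))      # ['terrier']
print(hyperonym_chain("poodle"))  # ['dog', 'animal']
print(parts("car"))               # ['engine', 'wheel']
```

Allowing several hyperonyms per word (dog as both animal and pet) would require a word-to-list mapping instead of a single value.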
References
[1] Gibbon, D. 2007. Types of Lexical Information: semantics, http://www.spectrum.uni-bielefeld.de/~gibbon/Classes/htmd09-v01-semantic... accessed 28/01/07
[2] Wikipedia, the free encyclopedia http://wikipedia.org/ accessed 28/01/07.
Computational Lexicography
Lecture 12
23rd January 2007
Overview:
Today's lecture was very interesting. Prof. Gibbon showed us applications of the knowledge we get in this class. We were familiarised with linguistic work. A KWIC concordance was presented to us.
Criteria for Good Lexicography
- Quantity:
- Completeness of coverage:
- extensional coverage: number of entries
- intensional coverage: number of types of lexical information
- Quality:
- Correctness of information:
- Types of lexical information
- Consistency of structure:
- Macrostructure
- Microstructure
- Mesostructure [1]
Quiz
- What is a KWIC concordance?
A KWIC (KeyWord In Context) concordance is a special kind of preliminary, corpus-based dictionary:
- each word in a text corpus is paired with its contexts of occurrence in this corpus. [1] Google can be seen as a special form of KWIC concordance.
- Which are the two main components of lexicon construction based on empirical data?
Information retrieval and Linguistic analysis.
- Which layers of abstraction are involved in corpus acquisition?
Layer 1: Primary data (audio/video recordings).
Layer 2: Secondary data (transcription, annotation, metadata).
- Which layers of abstraction are involved in lexicon construction? Describe them.
Layer 1: Corpus lexicon (wordlist, concordance).
Layer 2: Lexicon matrix (entries x data categories, no generalisations).
Layer 3: Lexicon with selected generalisations (procedurally optimised: semasiological, onomasiological).
- Which layer do standard dictionary types typically belong to?
Layer 3: Lexicon with selected generalisations (procedurally optimised: semasiological, onomasiological).
- What are the 6 main steps in KWIC concordance construction?
- Corpus creation/collation - get the corpus, e.g. texts.
- Tokenisation - normalising text, e.g. change upper case letters into lower case letters, remove punctuation marks (end of the sentence vs. abbreviation), deal with numbers.
- Keywordlist extraction - create a list of words that occur in the text.
- Context collation - pick the context unit, e.g. the keyword in a context of three words on the left side and three words on the right side.
- Keyword search - look for the key word in context.
- Output formatting - make the output look nice and understandable to the user.
- In which programming languages could the concordance software be implemented?
Perl, Unix shell script, Python or the LaTeX formatting language.
- What are the problems with the demonstration software which need to be removed in a later realistic project?
The program will have to allow flexible handling of contexts and filenames, treat more than one text, and have a modular structure/organisation.
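The six steps of KWIC concordance construction in the quiz can be illustrated with a short sketch in Python, one of the languages named there. This is not the demonstration software from class. The tiny corpus, the tokeniser (lowercasing and keeping only runs of letters, digits and apostrophes), the window of three words on each side and all function names are assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Tuple

Context = Tuple[str, str]  # (left context, right context)

# Step 1: corpus creation/collation (a tiny hypothetical corpus).
CORPUS = "The dictionary is a database. A dictionary contains words, and the corpus contains texts."


def tokenise(text: str) -> List[str]:
    """Step 2: lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_list(tokens: List[str]) -> List[str]:
    """Step 3: the sorted list of distinct words (types) in the corpus."""
    return sorted(set(tokens))


def collate_contexts(tokens: List[str], width: int = 3) -> Dict[str, List[Context]]:
    """Step 4: pair each occurrence of each word with `width` words on either side."""
    contexts: Dict[str, List[Context]] = {}
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        contexts.setdefault(token, []).append((left, right))
    return contexts


def search(contexts: Dict[str, List[Context]], keyword: str) -> List[Context]:
    """Step 5: look up a keyword, normalised in the same way as the corpus."""
    return contexts.get(keyword.lower(), [])


def format_kwic(keyword: str, hits: List[Context], column: int = 25) -> str:
    """Step 6: right-align the left context so the keywords form one column."""
    return "\n".join(f"{left:>{column}}  {keyword.upper()}  {right}" for left, right in hits)


tokens = tokenise(CORPUS)
contexts = collate_contexts(tokens)
print(format_kwic("dictionary", search(contexts, "dictionary")))
```

This tokeniser simply discards all punctuation, so it cannot tell a sentence-final full stop from an abbreviation. A realistic version would also read several texts from files and make the context width configurable, as the quiz answer above notes.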
The Status of Dictionaries
The dictionary is
- one of the three main components of language documentation:
- corpus of recordings and texts
- dictionary
- sketch grammar
- the central component of any linguistic description
- the most useful linguistic product for use by the speech community, or non-linguists in general [1]
Homework - please check the quiz!
References
[1] Gibbon, D. 2007. Computational Lexicography, http://www.spectrum.uni-bielefeld.de/~gibbon/Classes/htmd10-computationa... accessed 28/01/07
Jolanta Bachan,
28th January 2007