About the Authors
Markus Dickinson is Assistant Professor at the Department of Linguistics, Indiana University and currently the director of the Computational Linguistics program. His research focuses on improving linguistic annotation for natural language processing technology and automatically analyzing the language of second language learners.
Chris Brew is a Senior Research Scientist with the Educational Testing Service in Princeton, where he is currently the scientific lead for the c-rater project on automated short answer grading. He has been active in Natural Language Processing for over 20 years, first in the UK, then as Associate Professor of Linguistics and Computer Science at The Ohio State University, where he co-directed the Speech and Language Technologies Laboratory, as well as the Computational Linguistics Program.
Detmar Meurers is Professor of Computational Linguistics and head of the Theoretical Computational Linguistics group at the University of Tübingen. He has a longstanding commitment to teaching Computational Linguistics and Linguistics in a way that combines current technology and research issues with the fundamentals of the field. His research emphasizes the role of linguistic insight and linguistic models in Computational Linguistics. His most recent research adds a focus on theory and applications related to second language acquisition.
What This Book Is About
Overview for Instructors
Acknowledgments
1 Prologue: Encoding Language on Computers
1.1 Where do we start?
1.1.1 Encoding language
1.2 Writing systems used for human languages
1.2.1 Alphabetic systems
1.2.2 Syllabic systems
1.2.3 Logographic writing systems
1.2.4 Systems with unusual realization
1.2.5 Relation to language
1.3 Encoding written language
1.3.1 Storing information on a computer
1.3.2 Using bytes to store characters
1.4 Encoding spoken language
1.4.1 The nature of speech
1.4.2 Articulatory properties
1.4.3 Acoustic properties
1.4.4 Measuring speech
Under the Hood 1: Reading a spectrogram
1.4.5 Relating written and spoken language
Under the Hood 2: Language modeling for automatic speech recognition
2 Writers' Aids
2.1 Introduction
2.2 Kinds of spelling errors
2.2.1 Nonword errors
2.2.2 Real-word errors
2.3 Spell checkers
2.3.1 Nonword error detection
2.3.2 Isolated-word spelling correction
Under the Hood 3: Dynamic programming
2.4 Word correction in context
2.4.1 What is grammar?
Under the Hood 4: Complexity of languages
2.4.2 Techniques for correcting words in context
Under the Hood 5: Spell checking for web queries
2.5 Style checkers
3 Language Tutoring Systems
3.1 Learning a language
3.2 Computer-assisted language learning
3.3 Why make CALL tools aware of language?
3.4 What is involved in adding linguistic analysis?
3.4.1 Tokenization
3.4.2 Part-of-speech tagging
3.4.3 Beyond words
3.5 An example ICALL system: TAGARELA
3.6 Modeling the learner
4 Searching
4.1 Introduction
4.2 Searching through structured data
4.3 Searching through unstructured data
4.3.1 Information need
4.3.2 Evaluating search results
4.3.3 Example: Searching the web
4.3.4 How search engines work
Under the Hood 6: A brief tour of HTML
4.4 Searching semi-structured data with regular expressions
4.4.1 Syntax of regular expressions
4.4.2 Grep: An example of using regular expressions
Under the Hood 7: Finite-state automata
4.5 Searching text corpora
4.5.1 Why corpora?
4.5.2 Annotated language corpora
Under the Hood 8: Searching for linguistic patterns on the web
5 Classifying Documents: From Junk Mail Detection to Sentiment Classification
5.1 Automatic document classification
5.2 How computers "learn"
5.2.1 Supervised learning
5.2.2 Unsupervised learning
5.3 Features and evidence
5.4 Application: Spam filtering
5.4.1 Base rates
5.4.2 Payoffs
5.4.3 Back to documents
5.5 Some types of document classifiers
5.5.1 The Naive Bayes classifier
Under the Hood 9: Naive Bayes
5.5.2 The perceptron
5.5.3 Which classifier to use
5.6 From classification algorithms to context of use
6 Dialog Systems
6.1 Computers that "converse"?
6.2 Why dialogs happen
6.3 Automating dialog
6.3.1 Getting started
6.3.2 Establishing a goal
6.3.3 Accepting the user's goal
6.3.4 The caller plays her role
6.3.5 Giving the answer
6.3.6 Negotiating the end of the conversation
6.4 Conventions and framing expectations
6.4.1 Some framing expectations for games and sports
6.4.2 The framing expectations for dialogs
6.5 Properties of dialog
6.5.1 Dialog moves
6.5.2 Speech acts
6.5.3 Conversational maxims
6.6 Dialog systems and their tasks
6.7 Eliza
Under the Hood 10: How Eliza works
6.8 Spoken dialogs
6.9 How to evaluate a dialog system
6.10 Why is dialog important?
7 Machine Translation Systems
7.1 Computers that "translate"?
7.2 Applications of translation
7.2.1 Translation needs
7.2.2 What is machine translation really for?
7.3 Translating Shakespeare
7.4 The translation triangle
7.5 Translation and meaning
7.6 Words and meanings
7.6.1 Words and other languages
7.6.2 Synonyms and translation equivalents
7.7 Word alignment
7.8 IBM Model 1
Under the Hood 11: The noisy channel model
Under the Hood 12: Phrase-based statistical translation
7.9 Commercial automatic translation
7.9.1 Translating weather reports
7.9.2 Translation in the European Union
7.9.3 Prospects for translators
8 Epilogue: Impact of Language Technology
References
Concept Index
Language and Computers introduces students to the fundamentals of how computers are used to represent, process, and organize textual and spoken information. Concepts are grounded in real-world examples familiar to students' experiences of using language and computers in everyday life.
* A real-world introduction to the fundamentals of how computers process language, written specifically for the undergraduate audience, introducing key concepts from computational linguistics.
* Offers a comprehensive explanation of the problems computers face in handling natural language.
* Covers a broad spectrum of language-related applications and issues, including major computer applications involving natural language and the social and ethical implications of these new developments.
* Focuses on real-world examples with which students can identify, using these to explore the technology and how it works.
* Features "under-the-hood" sections that give greater detail on selected advanced topics, rendering the book appropriate for more advanced courses, or for independent study by the motivated reader.