Job Discussion Forums

That corpus thingy.
Justin Trullinger



Joined: 28 Jan 2005
Posts: 3110
Location: Seoul, South Korea and Myanmar for a bit

PostPosted: Tue Jan 20, 2009 11:56 pm    Post subject: That corpus thingy. Reply with quote

Who uses language corpora in their teaching, studies, or writing?


I have mixed feelings: they're fun, and I feel they can be a really useful tool for studying language. I like the way they move the focus from some arbitrary concept of "correct language" to actually looking at how a language is used. But they don't mean what many people think they mean, and everything depends on what they're made of.

Favourite corpus use: An aviation English corpus was developed based on European air traffic, which is really cool because instead of academics speculating about what second language English speaking aviators might say, it lets us see what they do say.

Least favourite: The "Real English Guarantee" apparently based on some kind of a corpus study, that I've seen on some Cambridge grammars. What a heap of BS!

So what do you do with corpora? Love? Hate?


Best,
Justin


PS- No need to post to tell me I'm a nerd. I know.
fluffyhamster



Joined: 13 Mar 2005
Posts: 3292
Location: UK > China > Japan > UK again

PostPosted: Wed Jan 21, 2009 2:24 am    Post subject: Reply with quote

The main corpus that Cambridge has compiled (the CIC) is certainly impressive in size, and the CANCODE subcorpus has produced some interesting and potentially valuable work; then, one only has to compare, say, a Murphy with an Azar to gain some appreciation of exactly what they might mean by 'Real English'. But I agree that the label should be used with caution, and it is easy to see how it could be abused either way - 'corpus-informed' grammars may in the final analysis not actually bite the authenticity bullet, whilst 'corpus-driven' stuff can potentially overload the learners and make the wood/forest that bit harder to see. ('But at least there are trees!' shout the corpus-inspired nutters. But wait, actually 'corpus-inspired' is a great term that captures the middle ground well - use corpora to verify theories, notice other, perhaps new, things, etc. Certainly, I swear by modern learner dictionaries for the wealth of more or less frequent words and phrases that have been unearthed and entered into them, as well as their genuine examples of usage and grammar; these often provide seeds or fertilizer for lessons...).

The second chapter ('Corpus linguistics and language teaching') of Seidlhofer's Controversies in Applied Linguistics contains interesting debate on what 'Real English' is, and "its" possible uses and misuses.
http://books.google.co.uk/books?id=d7BXWsQ52mQC&pg=PP1&dq=seidlhofer#PPA77,M1
Marcoregano



Joined: 19 May 2003
Posts: 872
Location: Hong Kong

PostPosted: Wed Jan 21, 2009 3:41 am    Post subject: Reply with quote

fluffyhamster wrote:
The second chapter ('Corpus linguistics and language teaching') of Seidlhofer's Controversies in Applied Linguistics contains interesting debate on what 'Real English' is, and "its" possible uses and misuses.


I'm a bit out of touch with this stuff but enjoyed getting into it when I did my MEd a few years ago. It tends to set language traditionalists against liberals, on the one extreme those who say that the core of any language should remain rigid and on the other those who accept quite radical changes.

I'm currently reading (for pleasure!) a book about the evolution of English dictionaries and some of the characters involved (The Surgeon of Crowthorne by Simon Winchester). Initially, dictionaries were seen as custodians of the language, including only what was 'right and proper' and ignoring slang and recently evolved items of the language. However, in the 19th century, when work began on the New English Dictionary (later the Oxford English Dictionary), there was a complete turnaround, whereby 'misuse' of the language and its words came to be seen as equally relevant for inclusion - and this attitude has prevailed, to the annoyance of the extreme traditionalists.
fluffyhamster



Joined: 13 Mar 2005
Posts: 3292
Location: UK > China > Japan > UK again

PostPosted: Wed Jan 21, 2009 5:12 am    Post subject: Reply with quote

Winchester's a good writer for sure, Marco! The first book I got of his was The River at the Center of the World, and I've been meaning to dig out and finish his The Meaning of Everything: The Story of the OED; I also saw the one that you mentioned lying around at a friend's place a while ago (I'll ask to borrow it!). (Actually, I've just looked Winchester up on Google Book Search, hadn't realized he'd written quite so many books!).
Cohen



Joined: 30 Dec 2008
Posts: 91
Location: Hong Kong

PostPosted: Wed Jan 21, 2009 1:24 pm    Post subject: Reply with quote

I trained in corpus linguistics as part and parcel of my studies both as an undergraduate and postgraduate student of linguistics, and I have to say that I am a big fan of both the study and use of corpora, especially parallel (i.e., bilingual) corpora. I allowed myself to become convinced of the benefits of such forms of applied linguistics/computational linguistics through seeing just how useful they can be for students and teachers alike. Indeed, I no longer even answer questions (or, as I now see them, non-questions) such as 'Is this grammatical?' or 'Is this ungrammatical?'. The only answer I can give is itself a question: 'Is it in the corpus?' If not, then I am not interested. Yes, Chomsky's (in)famous creation of 'Colo[u]rless green ideas sleep furiously' is grammatical, at least in the sense of morpho-syntax (subject-verb agreement/concord, phrase-internal word order, phrase order, etc.), but it is meaningless gibberish and, as noted by Wittgenstein in his Notes on Logic, 'The proper theory of judgement must make it impossible to judge nonsense'. And this is the problem with non-corpus-based linguistics: it is not empirical enough, and it relies too much on 'grammaticality judgement tasks', usually with utterances devoid of any context, and certainly not as part of discourse.

So, if you ask someone whether 'I'll give you perhaps' is grammatical they will typically reject it out of hand. After all, 'give' is a ditransitive verb and this sentence (or non-sentence) does not have a direct object, only the subject and indirect object. But, if we look at a large corpus we find things such as a father asking his son whether he is going to do the washing up, to which the boy says 'Perhaps', to which the father trenchantly remarks, 'I'll give you perhaps'.

One of the best uses of corpora though is to inform textbooks and other learning materials. The insights this technique can offer are often very startling. For a layman's introduction to this aspect of corpus-based studies I would recommend 'Corpus Linguistics: Investigating Language Structure and Use', by three giants of corpus studies, Biber, Conrad, and Reppen (1998, CUP). For work with a greater philosophical bent (and anti-generative angle) I would strongly recommend Geoffrey Sampson's 'Empirical Linguistics' (2001, Continuum). He is a professor of natural language computing (heavy stuff, indeed) and was a pioneer in the field of corpus linguistics.

The best freely available corpus in my opinion is, of course, the BNC (the British National Corpus):

http://www.natcorp.ox.ac.uk/
johnslat



Joined: 21 Jan 2003
Posts: 13859
Location: Santa Fe, New Mexico, USA

PostPosted: Wed Jan 21, 2009 2:30 pm    Post subject: Corpus delicti Reply with quote

The decision of whether or not to use a corpus in instruction would depend, I'd say, on what the ends of the instruction are.
For much of my career, I was preparing students to take TOEFL exams, and it seems to me that with such an end in mind, the use of a corpus would not be indicated.
If, however, the end is to teach English as it's most commonly used, especially in speaking, then I can see how using a corpus could be helpful.
Or am I missing something here?
Regards,
John
Justin Trullinger



Joined: 28 Jan 2005
Posts: 3110
Location: Seoul, South Korea and Myanmar for a bit

PostPosted: Wed Jan 21, 2009 3:06 pm    Post subject: Reply with quote

Quote:
The decision of whether or not to use a corpus in instruction would depend, I'd say, on what the ends of the instruction are.
For much of my career, I was preparing students to take TOEFL exams, and it seems to me that with such an end in mind, the use of a corpus would not be indicated.
If, however, the end is to teach English as it's most commonly used, especially in speaking, then I can see how using a corpus could be helpful.
Or am I missing something here?


I wouldn't say you're missing anything exactly, but would add that it depends on the corpus you're considering using.

One of the corpus areas available where I study (Aston) is academic writing, and yes, I might (sometimes do) use an academic writing corpus in TOEFL teaching.

One of the difficulties of corpus use is the hype. A lot of people (and publishers and advertisements) are hyping the idea that a corpus is the secret to getting at "real English," the way it is "really used." And maybe some are, but it depends entirely on what goes into the corpus.

As mentioned above, corpus samples on academic writing are lovely for teaching...academic English and academic writing. It wouldn't be fair to say that it reflects the way people really use English though.

I've seen a corpus study based on speeches in the parliament of the UK. Is it surprising that the two houses come up with dramatically different word frequencies in some cases? And neither one would be "real English."



Something you see a lot with Collins Cobuild, the various university programs, and a lot of corpus work is the idea that "the bigger the better." So far, I disagree with this: I've seen more useful work done with smaller, specialized samples. (See aviation example in earlier post.)

It seems to me that a small specialized corpus can really provide useful insight into how English is used in a specific situation, field, or endeavour. But the big corpora are all wrapped up in arguing about who has the bigger one, or the "real-er" one.


Best,
Justin
Justin Trullinger



Joined: 28 Jan 2005
Posts: 3110
Location: Seoul, South Korea and Myanmar for a bit

PostPosted: Wed Jan 21, 2009 3:08 pm    Post subject: Reply with quote

PS- Maybe another interesting question is HOW to use a corpus?

Do you use it in planning? Syllabus design? In class? Do you teach students to use it?
fluffyhamster



Joined: 13 Mar 2005
Posts: 3292
Location: UK > China > Japan > UK again

PostPosted: Wed Jan 21, 2009 4:43 pm    Post subject: Reply with quote

Well, having bigger and bigger 'monitor' corpora should at least in theory make it easier to date when new entries have not only entered but also become an established part of the language, but I rather suspect the design aspects of the original corpus get harder to manage and things get a little unbalanced generally (certainly, larger and larger corpora probably won't tell us much more about most of the "closed" sets of words).

(Cohen:) There are some interesting threads on Sampson (do a search) over on the Teacher Discussion Applied Linguistics forum, such as the following: http://forums.eslcafe.com/teacher/viewtopic.php?p=37225#37225 ; there are also two Language Log links at the end of that linked thread's post that touch on the validity of the whole "Colorless" argument.

As for Sampson's books, his The Language Instinct Debate is probably of wider and more general interest, and the more readable for it.

Personally, I didn't find the Biber et al as enjoyable as Kennedy's An Introduction to Corpus Linguistics or "even" Hunston's Corpora in Applied Linguistics, but then, the Biber's a somewhat more technical "read". Anyway, it'd be interesting to know what you guys think of McEnery et al's Corpus-based Language Studies; An advanced resource book (Routledge 2006), because Biber for one gushes about it on the paperback edition's rear cover.

Regarding how to use corpora, and for what, well, those are subjects tackled in the above sorts of books, and even those who have read some might not have quite the resources or time to really make more use of corpora than checking the odd collocation or whatever on sites such as the BNC one that Cohen mentioned (or indeed by simply Googling); then, as I said above, many dictionaries nowadays are based on corpora and usually provide enough information to be getting on with in the meantime. But one of the "main" uses of corpora seems unfortunately to be educating knee-jerk prescriptivists (e.g. showing that 'less children' is, it seems, more frequent/"natural" than the "demanded" 'fewer children'); that is, corpora certainly can be used to "understand" the perfectly comprehensible in spontaneous usage.
fluffyhamster



Joined: 13 Mar 2005
Posts: 3292
Location: UK > China > Japan > UK again

PostPosted: Thu Jan 22, 2009 3:18 am    Post subject: Reply with quote

BTW Justin, have you seen Steven Cushing's Fatal Words: Communication Clashes and Aircraft Crashes?

Quote:
Synopsis

On March 27, 1977, 583 people died when KLM and Pan Am 747s collided on a crowded, foggy runway in Tenerife, the Canary Islands. The cause, a miscommunication between the pilot and the air traffic controller. The pilot radioed, "We are now at takeoff," meaning that the plane was lifting off, but the tower controller misunderstood and thought the plane was waiting on the runway.

In Fatal Words, Steven Cushing explains how miscommunication has led to dozens of aircraft disasters, and he proposes innovative solutions for preventing them. He examines ambiguities in language when aviation jargon and colloquial English are mixed, when a word is used that has different meanings, and when different words are used that sound alike. To remedy these problems, Cushing proposes a visual communication system and a computerized voice mechanism to help clear up confusing language.

Fatal Words is an accessible explanation of some of the most notorious aircraft tragedies of our time, and it will appeal to scholars in communications, linguistics, and cognitive science, to aviation experts, and to general readers.


( http://www.press.uchicago.edu/presssite/metadata.epl?mode=synopsis&bookkey=44900 )
Cohen



Joined: 30 Dec 2008
Posts: 91
Location: Hong Kong

PostPosted: Thu Jan 22, 2009 1:49 pm    Post subject: Reply with quote

This is becoming an excellent discussion, with some great points having been made.

Yes, specialist, subject-specific corpora are certainly worth their weight in gold, and I would see how this could and should benefit teachers and students in pretty much any ESP course. I would recommend those interested in this to take it one step further and read up on the 'sub-language' theory of the late, great Zellig Harris, who, despite being Chomsky's first mentor and linguistics tutor was in no way a generative grammarian. Harris moved into a highly specialist area, namely, the informatics of the sub-language of immunology. His work of course was highly complex as most of it was in the field of (medical) informatics, natural language generation and/or computational linguistics, but some of his stuff is accessible to those with a general background in technical linguistics (though some training in computational linguistics certainly helps).

Regarding the size of the corpus, it is not too hard to answer why larger corpora are better. A corpus has to be of a certain size so that any findings from that corpus are statistically significant. (On a side note here, there is a lovely, though highly technical, paper by Sampson entitled 'Good-Turing frequency estimation without tears', in his Empirical Linguistics, pp. 94-121. I would also recommend his book on computational models of language origins and evolution, 'Evolutionary Language Understanding', 1996, Cassell.)
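
For anyone curious what Good-Turing estimation looks like in practice, here is a minimal sketch in Python. It shows only the basic textbook form of the estimate (the share of probability reserved for unseen words is N1/N, and a word seen r times gets the adjusted count (r+1) * N(r+1) / N(r)), not the more refined procedure a technical paper would describe. The function name `good_turing` and the toy token list are made up for illustration.

```python
from collections import Counter

def good_turing(tokens):
    """Basic (unsmoothed) Good-Turing estimate for a list of tokens.

    Returns the probability mass reserved for unseen words and a dict
    mapping each observed count r to its adjusted count r*.
    """
    freq = Counter(tokens)             # word -> r (how often it was seen)
    n_r = Counter(freq.values())       # r -> N_r (how many words were seen r times)
    total = sum(freq.values())         # N, the corpus size in tokens

    # Words seen exactly once (hapaxes) stand in for words not yet seen.
    p_unseen = n_r[1] / total

    adjusted = {}
    for r in sorted(n_r):
        # r* is undefined when no word was seen r + 1 times; real
        # implementations smooth N_r first, this sketch simply skips it.
        if n_r[r + 1] > 0:
            adjusted[r] = (r + 1) * n_r[r + 1] / n_r[r]
    return p_unseen, adjusted

# Hypothetical toy "corpus" of ten tokens.
tokens = "a a a b b c c d e f".split()
p_unseen, adjusted = good_turing(tokens)
print(p_unseen, adjusted)
```

In a tiny sample like this a large share of the words are hapaxes, so a lot of probability has to be set aside for words never seen at all, which is one way of putting a number on why small corpora give shaky frequency estimates.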

Back to the general benefits of corpus linguistics. As far as the student goes, they can be helpful in many ways, not least in regards to collocations. People's intuitions about language and language use are often astonishingly inaccurate. Lars Trapp-Jensen for example asked native speakers of Danish what words they thought would be most likely to 'go with' the Danish noun trae (tree, wood). They said, reasonably enough, (the Danish for) 'scented', 'strong', 'green', 'big', 'overturned', and 'wilted'. He then looked at the most frequent collocations of this noun in a 40-million word Danish corpus and found, interestingly enough, that the most common collocations were actually (the Danish for) 'dry-rotten', 'fairest', 'compregnated' (!), 'newly-planted', and 'leafless'. (People it seems simply do not write about 'green trees'.)
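
The kind of count behind a collocation study like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `collocates`, the window size, and the three-sentence English toy corpus are all invented here and have nothing to do with the Danish corpus or the method actually used in that study. Real work would also filter out function words and use an association measure rather than raw counts.

```python
from collections import Counter
import re

def collocates(text, node, window=2):
    """Count the words found within `window` tokens of each `node` occurrence."""
    tokens = re.findall(r"[a-z'-]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            # Words to the left and right of the node, excluding the node itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Hypothetical toy corpus, for illustration only.
toy_corpus = (
    "The leafless tree stood by the road. "
    "A newly-planted tree needs water. "
    "The leafless tree creaked in the wind."
)
top = collocates(toy_corpus, "tree", window=1)
print(top.most_common(3))
```

Even in a toy like this, 'green' never turns up next to 'tree', however strongly one associates the two words.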

Also, it was corpus research that showed how sentences such as 'The ship sank' (i.e., subject + intransitive verb) are exceedingly rare in English, and even committed Chomskyan generative grammarians (such as Fromkin and Rodman) have had to go back and edit and modify their previous statements about such sentences.

Another great use of corpora is to see how lexical items are actually used, that is, how they are distributed in the corpus. It soon becomes apparent that even seemingly synonymous words such as 'small' and 'little' are used in radically different ways by native speakers, either with different collocations, in different structures, or in different registers, or different text types, or perhaps in different modes of communication such as writing or speaking. It is quite incredible; or at least it is to language raincoats such as myself.

As I noted above in another post though, perhaps the greatest benefit and import of corpus techniques is the sheer extent to which they can inform textbooks and other guides, as well as the ability to make them more empirical. There is a classic example of this in Biber, Conrad, and Reppen's introduction to corpus linguistics that I referenced above (pp.80-83). In a previous section they investigate the use of subject that-clauses, such as 'That these identifications are circular is unimportant at the moment', and find such constructions are strongly, almost uniquely, associated with expository written registers and are all but non-existent in spoken discourse (no real surprise there). Furthermore, their use in such written registers was itself strongly associated with complex predicates and the expression of given information. As they state (p.80), "Patterns such as this can be very useful to ESL students, since knowing when to use structures appropriately is an essential part of developing communicative competence in a language." However, they then go on to compare their findings with four major advanced ESL textbooks and find that though all four have a description and explanation of this structure, none of them has any explanation of its variants or use, and one even has two oral exercises dealing with the structure! As they note (p.81), "Students are left knowing that this construction exists, but without knowing what its function is or when it is appropriate to use it", and (pp.81-82) "there is little to be gained by having students practice subject that-clauses orally, given that these constructions are almost never used by native speakers in spoken discourse."

Parallel corpora (i.e., bilingual texts, translations of the same text) can also be of immense worth to students and are one particular area I would recommend to any language teacher.

All in all, I find it hard to believe that there are still many who doubt the import of corpus linguistics, both for teachers and for learners. I can only assume they have not taken the time to use a concordancing package and to investigate the use and distribution of structures and lexical items.
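
For those without a concordancing package to hand, the core idea (a key-word-in-context, or KWIC, listing) can be approximated with a short Python sketch. The function name `kwic`, the `width` parameter, and the sample sentences are assumptions made up for this illustration; a real concordancer works on a proper corpus and offers sorting, tagging, and statistics that this does not.

```python
import re

def kwic(text, node, width=3):
    """Return (left context, key word, right context) for each hit of `node`."""
    # Words, optionally joined by internal hyphens or apostrophes.
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text, for illustration only.
sample = (
    "She lives in a small town. What a nice little town it is! "
    "A small number of people came."
)
for word in ("small", "little"):
    for left, key, right in kwic(sample, word, width=2):
        print(f"{left:>12}  [{key}]  {right}")
```

Lining the hits up like this is what makes differences in distribution, such as those between 'small' and 'little', easy to spot by eye.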
fluffyhamster



Joined: 13 Mar 2005
Posts: 3292
Location: UK > China > Japan > UK again

PostPosted: Thu Jan 22, 2009 3:12 pm    Post subject: Reply with quote

'Green' would surely be valid as an association of (i.e. not necessarily in any actual syntagmatic relationship with) 'tree', but not having read Trapp-Fluffy's research, it's hard to say what the actual elicitation instructions were (but I imagine it would be hard if not counterproductive to convey a definition of collocation that somehow excluded association, unless you wanted people to think long and hard, write out full sentences, edit things etc, all a process which might or might not produce similar things to those more genuine and "spontaneous" items (i.e. unconscious and unelicited, even if also written and perhaps edited) as held in a (natural) corpus).

Sampson's Good-Turing frequency estimation paper is one that I haven't yet read in my copy of EL, will try to check it out soon then! The 1996 Cassell, too (though I'm guessing that might be a bit too technical for me!).
MO39



Joined: 28 Jan 2004
Posts: 1970
Location: El ombligo de la República Mexicana

PostPosted: Thu Jan 22, 2009 6:08 pm    Post subject: Reply with quote

fluffyhamster wrote:
The main corpus that Cambridge has compiled (the CIC) is certainly impressive in size, and the CANCODE subcorpora has produced some interesting and potentially valuable work; then, one only has to compare say a Murphy with an Azar to gain some appreciation of exactly what they might mean by 'Real English'.


Most of this discussion has gone completely over my aging head, since the whole "corpus thingy" came into existence many years after my university training as a language teacher. However, when fluffyhamster mentioned comparing Azar (which I haven't used in a long time) and Murphy (which I discovered a couple of years ago while teaching in Spain and now use as my basic grammar texts), a dim light bulb flickered briefly. My students liked Azar because of the clear explanations and charts, but I always thought that the exercises were quite dull. What I like about Murphy are the examples and exercises, which somehow seem to be more "real" than the ones favored by Ms. Azar. Could it be that the corpus so beloved of the posters on this thread has informed the Murphy texts, thus lending them an air of authenticity? I await your comments!
johnslat



Joined: 21 Jan 2003
Posts: 13859
Location: Santa Fe, New Mexico, USA

PostPosted: Thu Jan 22, 2009 6:28 pm    Post subject: Reply with quote

Dear MO39,

"What I like about Murphy are the examples and exercises, which somehow seem to be more "real" than the ones favored by Ms. Azar."

I'm asking this out of puzzlement because I've used Murphy and I've used Azar, but the examples and exercises in both seem, to me, anyway, to be not so very different.
Is there any way you could point out how Murphy's are "more real"? (I realize that, since it's a subjective judgment, that may not be possible.)

Regards,
John
fluffyhamster



Joined: 13 Mar 2005
Posts: 3292
Location: UK > China > Japan > UK again

PostPosted: Fri Jan 23, 2009 10:03 am    Post subject: Reply with quote

Actually, perhaps I shouldn't've brought up Murphy at least, because I no longer have a copy of the latest edition of his English Grammar in Use (the blue intermediate one), but I'm pretty sure he or Cambridge themselves mention something about it drawing on their available corpora to at least an "informed" extent. Anyway, even if Murphy isn't all like, y'know, 'REAL English', it can't be just my and now MO39's imagination that there was a qualitative difference between Murphy's and Azar's examples (so John and whoever else is interested, I'll simply direct you back to that 'Grammar Questions' thread that Littlebird started, Page 1, if you don't mind).

BTW the Advanced English in Use is by Martin Hewings, and I'm almost certain that he's taken note of corpus findings and included at least a few as near genuine/100% authentic examples as possible.
All times are GMT
This page is maintained by the one and only Dave Sperling.
Contact Dave's ESL Cafe
Copyright © 2018 Dave Sperling. All Rights Reserved.
