
utterances with a low error count will be very high and will decrease dramatically as the number of errors per utterance increases.

A bimodal distribution is like a combination of two normal distributions – there are two peaks. If you find that your data fall in a bimodal distribution you might consider whether the data actually represent two separate populations of measurements. <…>” [Johnson 2008: 13–15].

TASK 2. Open an English dictionary to any page. Count the number of words consisting of one, two, three, and four syllables. What type of distribution is this?
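The syllable-counting exercise above can be sketched in code. The word list and the vowel-group heuristic below are illustrative assumptions, not data from a real dictionary page:

```python
from collections import Counter

def count_syllables(word: str) -> int:
    """Approximate syllable count as the number of vowel groups."""
    vowels = "aeiouy"
    count, prev_is_vowel = 0, False
    for ch in word.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_is_vowel:
            count += 1
        prev_is_vowel = is_vowel
    return max(count, 1)

# Hypothetical sample standing in for one dictionary page.
words = ["cat", "table", "banana", "dog", "paper", "syllable", "pen", "idea"]
distribution = Counter(count_syllables(w) for w in words)
print(sorted(distribution.items()))
```

On a real page the counts typically pile up at one and two syllables and fall off quickly, i.e. a right-skewed rather than symmetric normal distribution.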


2. AREAS OF APPLICATION OF QUANTITATIVE (STATISTICAL) AND EXPERIMENTAL METHODS OF LINGUISTIC ANALYSIS

TASK 3. Read excerpts /9/–/15/ and answer the following questions:

1. In which areas of linguistics are quantitative methods of data analysis used?

2. Which methods are mentioned in the excerpts?

3. What other quantitative (statistical) and experimental methods are used in psycholinguistics, sociolinguistics, and historical linguistics?

4. Which methods are employed in more than one area of linguistics?

/9/ “Increasingly, linguists handle quantitative data in their research. Phoneticians, sociolinguists, psycholinguists, and computational linguists deal in numbers and have for decades. Now also, phonologists, syntacticians, and historical linguists are finding linguistic research to involve quantitative methods. For example, Keller (2003) measured sentence acceptability using a psychophysical technique called magnitude estimation. Also, Boersma and Hayes (2001) employed probabilistic reasoning in a constraint reranking algorithm for optimality theory” [Johnson 2008: 1].

/10/ “Phoneticians have a long tradition of quantifying their observations of pronunciation and hearing. In this history both hypothesis testing and regression techniques have been extensively used. Obviously the full range of methods used in the quantitative analysis of phonetic data cannot be covered in a short chapter on the topic, however I do hope to extend the discussion of t-test and regression in interesting ways here, and to introduce factor analysis” [Johnson 2008: 70].
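The t-test Johnson mentions can be sketched on invented phonetic data. The F1 values, the two speaker groups, and the choice of Welch's two-sample formula are all assumptions for illustration:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch's formula).

    statistics.variance is the sample variance (n - 1 denominator),
    which is what the formula requires.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical F1 measurements (Hz) for one vowel in two speaker groups.
group1 = [512, 498, 530, 505, 521]
group2 = [460, 472, 455, 480, 466]
print(round(welch_t(group1, group2), 2))
```

A large absolute t value relative to its degrees of freedom suggests the group means differ beyond chance; in practice one would look up or compute the corresponding p-value.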


/11/ “With respect to phonetics, researchers also have available to them a range of methods, ranging from the more naturalistic approach of having fieldworkers phonetically transcribe responses to questionnaire items (an approach that is widely used to investigate language variation; see Chapter 6), to the more controlled approaches involving the acoustic analysis of speech (see Thomas 2011 for a recent review; see also Chapters 4, 9, and 17). In experimental elicitation, participants may be asked to read words in isolation or in connected prose, with the goal of conducting an acoustic analysis of intonation, word stress, tone, and various features of consonants and vowels (Ladefoged 2003). There is also a range of imaging techniques to investigate the vocal tract during the production of speech, such as X-rays, computed tomography, MRIs, and ultrasound (see Stone 2010 for an overview of these and other laboratory techniques). In addition to informing debates on phonology and phonetics, these analyses may be used in sociolinguistic studies on language variation, first language studies on the development of pronunciation, and second language studies on the phonological development of non-native speakers” [Schütze, Sprouse 2013: 127].

/12/ “In psycholinguistics experiments, research factors such as word frequency, syntactic construction, or linguistic context are manipulated in a set of materials and then participants provide responses that can be scored on a continuous scale – such as reaction time, preference, or accuracy. The gold standard approach to the quantitative analysis of this type of data (continuous response measure, factorial experimental conditions) testing hypotheses about experimental factors is the analysis of variance (ANOVA)” [Johnson 2008: 104].
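The ANOVA logic described above can be sketched by computing the F statistic directly from sums of squares. The reaction times and the three word-frequency conditions are invented for illustration:

```python
from statistics import mean

def one_way_anova_F(groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical reaction times (ms) for high-, mid-, low-frequency words.
rt = [[420, 435, 410, 428], [455, 470, 462, 449], [505, 490, 515, 498]]
print(round(one_way_anova_F(rt), 1))
```

An F near 1 means the factor explains no more variance than noise; a large F indicates the condition means differ reliably.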


/13/ “Combining corpus and experimental data is also common in the field of psycholinguistics (Gilquin and Gries 2009). For example, a researcher may investigate the frequency of a particular syntactic construction in a corpus and then collect experimental data using one or more of the many procedures available to psycholinguistics, including lexical decision tasks, eye-tracking, priming, sentence completion, moving window experiments, and acceptability judgment tasks (see Fernández and Cairns 2011 for a recent overview of these techniques). Psycholinguists also make use of a variety of neuroimaging techniques to determine which areas of the brain are active during different types of language processing, including electroencephalography (EEG), which measures electrical activity in the brain, and functional magnetic resonance imaging (fMRI), which measures blood flow levels in the brain (see Fernández and Cairns 2011)” [Schütze, Sprouse 2013: 126].

/14/ “The main data that we study in sociolinguistics are counts of the number of realizations of sociolinguistic variables. For example, a phonological variable might span the different realizations of a vowel. In some words, like pen, I say [ɪ] so that pen rhymes with pin, while other speakers say [ɛ]. The data that go into a quantitative analysis of this phonological variable are the categorical judgments of the researcher – did the talker say [ɪ] or [ɛ]? Each word of interest gets scored for the different possible pronunciations of /ɛ/ and several factors that might influence the choice of variant are also noted. For example, my choice of [ɪ] in pen is probably influenced by my native dialect of English and the fact that this /ɛ/ occurs with a following /n/ in the syllable coda. Perhaps, also, the likelihood that I will say [ɪ] is influenced by my age, socioeconomic status, gender, current peer group, etc.

Other sociolinguistic variables have to do with other domains of language. For example, we can count how many times a person uses a particular verb inflection and try to predict this morphological usage as a function of syntactic environment, social group, etc. Or we could count how many times a person uses a particular syntactic construction, and try to model this aspect of language behavior by noting relevant linguistic and social aspects of the performance.

The key difference between these data and the data that we typically deal with in phonetics and psycholinguistics is that the critical variable – the dependent measure – is nominal. We aren’t measuring a property like formant frequency or reaction time on a continuous scale, but instead are noting which of a limited number of possible categorical variants was produced by the speaker” [Johnson 2008: 144].
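With a nominal dependent variable like this, the counts of each variant can be compared with a Pearson chi-square over a contingency table. The token counts below (tokens of [ɪ] vs. [ɛ] in two dialect groups) and the 2×2 design are invented for illustration:

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: dialect group A, dialect group B; columns: [ɪ] tokens, [ɛ] tokens.
counts = [[40, 10], [15, 35]]
print(round(chi_square(counts), 2))
```

The statistic is then compared against the chi-square distribution with (rows − 1)(columns − 1) degrees of freedom; a large value means the choice of variant depends on the group.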

/15/ “Though there are many opportunities to use quantitative methods in historical linguistics, this chapter will focus on how historical linguists take linguistic data and produce tree diagrams showing the historical “family” relationships among languages. For example, Figure 6.1 shows a phylogenetic tree of Indo-European languages (after figure 6 of Nakhleh, Ringe, & Warnow, 2005). This tree embodies many assertions about the history of these languages. For example, the lines connecting Old Persian and Avestan assert that these two languages share a common ancestor language and that earlier proto-Persian/Avestan split in about 1200 BC. As another example, consider the top of the tree. Here the assertion is that proto Indo-European, the hypothesized common ancestor of all of these languages, split in 4000 BC into two languages – one that would eventually develop into Hittite, Luvian and Lycian, and another ancient language that was the progenitor of Latin, Vedic, Greek and all of the other Indo-European languages in this sample. The existence of historical proto-languages is thus asserted by the presence of lines connecting languages or groups of languages to each other, and the relative height of the connections between branches shows how tightly the languages bind to each other and perhaps the historical depth of their connection (though placing dates on proto-languages is generally not possible without external historical documentation)” [Johnson 2008: 182].
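The tree-building step Johnson describes can be sketched with a toy agglomerative (single-linkage) clustering over a hypothetical lexical-distance matrix. Real phylogenetic methods are far more sophisticated; this only illustrates how pairwise distances yield nested groupings:

```python
def single_linkage(names, dist):
    """Merge the two closest clusters until one tree remains (single linkage)."""
    # Each cluster is a pair: (set of leaf names, nested-tuple tree).
    clusters = [({n}, n) for n in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i][0] for b in clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] | clusters[j][0], (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]

# Invented pairwise lexical distances between four hypothetical languages.
langs = ["A", "B", "C", "D"]
d = {"A": {"B": 2, "C": 7, "D": 8}, "B": {"A": 2, "C": 6, "D": 7},
     "C": {"A": 7, "B": 6, "D": 3}, "D": {"A": 8, "B": 7, "C": 3}}
print(single_linkage(langs, d))
```

Here A and B merge first (distance 2), then C and D (distance 3), and the two subfamilies join last, mirroring how shallow versus deep branch connections encode relatedness.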

TASK 4. Find more detailed information about the quantitative and experimental methods used in various areas of linguistics. Prepare a report.


PNRPU

3. METHODS OF COLLECTING, PROCESSING, AND ANALYZING LINGUISTIC DATA

TASK 5. Read excerpts /16/–/23/ and answer the following questions:

1. What was the focus of the researchers who used the methods described below?

2. What do the basic concepts used by the researchers mean?

3. What are the advantages and disadvantages of the methods listed?

4. What other methods could the researchers have used?

/16/ “There are hundreds of different inferential statistical tests that can be used in quantitative analyses. The choice of which test to use depends primarily on the kind and number of variables in your data set, and the sorts of relationships that exist between the variables you consider. <…>

Different statistical tests are used depending on whether the variables you are examining (both independent and dependent) are continuous or categorical. <…> Statistical tests exist for examining continuous independent variables (e.g. correlation analyses) or for examining a combination of continuous and categorical independent variables (e.g. Generalized Linear Models, Linear Mixed Models); <…> Again, tests exist for examining multiple independent and dependent variables (e.g. ANOVAs, MANOVAs, Linear Regressions – see Bryman and Cramer, 2008), but these tests require a more advanced explanation than we can provide here. <…> When the dependent variable is categorical, the statistical test we use is called a chi-square test (sometimes abbreviated as χ2). When the dependent variable is continuous, the statistical test we use is called a t-test” [Levon 2010: 72-73].
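Levon's decision rule for the dependent variable (categorical → chi-square, continuous → t-test) can be captured in a tiny helper. This is only a mnemonic for the quoted rule, not a complete test-selection procedure:

```python
def choose_test(dependent_type: str) -> str:
    """Map the dependent variable's type to the basic test named by Levon."""
    tests = {"categorical": "chi-square", "continuous": "t-test"}
    if dependent_type not in tests:
        raise ValueError("dependent variable must be 'categorical' or 'continuous'")
    return tests[dependent_type]

print(choose_test("categorical"))
print(choose_test("continuous"))
```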


/17/ “In a Likert scale (LS) task, participants are given a numerical scale with the endpoints defined as acceptable or unacceptable, and asked to rate each sentence along the scale. The most commonly used scales, as in Figure 3.3, usually consist of an odd number of points (such as 1–5 or 1–7) because odd numbers contain a precise middle point; however, if the research goals require it, a preference can be forced by choosing an even number of points. One of the primary benefits of LS is that it is both numerical and intuitive” [Schütze, Sprouse 2013: 33].

/18/ “In the magnitude estimation (ME) task, participants are given a reference sentence and told that the acceptability of the reference sentence is a specific numerical value (e.g., 100). The reference sentence is called the standard and the value it is assigned is called the modulus. Participants are then asked to rate additional sentences as a proportion of the value of the standard, as in Figure 3.4. For example, a sentence that is twice as acceptable as the standard would be rated 200.

ME was developed by Stevens (1957) explicitly to overcome the problem of potentially non-uniform, and therefore non-meaningful, intervals in the LS task (in the domain of psychophysics). In the ME task, the standard is meant to act as a unit of measure for all of the other sentences in the experiment. In this way, the intervals between sentences can be expressed as proportions of the standard (the unit of measure). <…> As a numerical task, an ME experiment requires the same design properties as an LS task (see Section 3). The choice of the standard can affect the amount of the number line that is available for ratings: a highly acceptable standard set at a modulus of 100 means that nearly all ratings will be between 0 and 100, whereas a relatively unacceptable standard means that nearly all ratings will be above 100. For this reason, and in order to prevent certain types of response strategies, it is normal practice to employ a standard that is in the middle range of acceptability” [Schütze, Sprouse 2013: 33].
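Magnitude-estimation responses are commonly normalized by dividing each rating by the modulus, and often log-transformed so that equal ratios become equal steps. The ratings below are invented for illustration:

```python
import math

MODULUS = 100  # value assigned to the reference sentence (the standard)

# Hypothetical ME ratings from one participant.
ratings = [50, 100, 150, 200]

# Express each rating as a proportion of the standard ...
proportions = [r / MODULUS for r in ratings]
# ... and on a log scale, where "twice as acceptable" is a constant step.
log_scores = [round(math.log2(p), 2) for p in proportions]
print(proportions, log_scores)
```

On the log scale a sentence rated 200 (twice the standard) sits exactly one unit above the standard, which is why log-transformed ME data are usually what enter the statistical analysis.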

/19/ “Linguists, across the subdisciplines of the field, use sound recordings for a great many purposes – as data, stimuli, and a medium for recording notes. For example, phoneticians often record speech under controlled laboratory conditions to infer information about the production and comprehension of speech in subsequent acoustic and perception studies, respectively. In addition to analyzing acoustic data, phoneticians may employ articulatory methods to observe more directly how speech is produced. By contrast, sociolinguists often record unscripted speech outside of a university environment, such as a speaker’s home. Sometimes these recordings themselves constitute the data (e.g., for sociophonetic analysis), while other times they may be transcribed at varying levels of detail (see Chapter 12), with the resultant text serving as the data (e.g., for the analysis of lexical or morphosyntactic variation and discourse analysis)” [Podesva, Zsiga 2013: 169].

/20/ “Forensic speaker recognition (FSR) is a relatively young type of forensic science. The majority of casework in this field is performed using a so-called “auditory-acoustic” method, in which detailed analytic listening by a human expert is combined with acoustic measurements (Cambier-Langeveld, 2007; Gold & French, 2011). The auditory-acoustic approach is targeted towards analysis and documentation of separable features contained within the speech signal (French & Stevens, 2013). This method is also used at the Netherlands Forensic Institute (NFI) and described briefly below.

Casework in FSR generally consists of questioned audio materials, containing speech which has been attributed to a particular suspect but which the suspect denies having produced. In the casework performed at the NFI, questioned samples most often come from wire-tapped telephone conversations or covert recordings. These questioned samples are compared to reference material (i.e., non-disputed audio samples from the suspect). The first hypothesis to be tested is that the questioned samples and the reference material were produced by the same speaker.

The expert (forensic phonetician) will analyse the materials, keeping in mind a certain population of speakers with the same background as the suspect who could also have produced the questioned material. This population serves to test the alternative hypothesis: that the questioned material and the reference material were produced by different speakers. Similarities and differences between the questioned material and the reference material will always be encountered. For the similarities the expert must consider the discriminating power, i.e. the extent to which the similarity sets a speaker apart in the relevant population. For the differences the expert must consider whether they fall within the variability to be expected within the speech of one person (intra-speaker variability) or whether they involve the kind of variability to be expected when the speech materials are produced by different persons (inter-speaker variability)” [Cambier-Langeveld, Rossum, Vermeulen 2014: 14-15].
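The two-hypothesis comparison described here is often framed quantitatively as a likelihood ratio: how probable is the observed feature value if the suspect produced it versus if some other speaker from the relevant population did. A toy sketch with a single Gaussian-modelled feature follows; all the numbers, and the choice of long-term F0 as the feature, are invented for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Invented long-term F0 statistics (Hz): the suspect's reference model,
# the relevant speaker population, and one questioned-sample measurement.
suspect_mu, suspect_sigma = 118.0, 4.0         # intra-speaker variability
population_mu, population_sigma = 120.0, 15.0  # inter-speaker variability
questioned_f0 = 117.0

lr = (gaussian_pdf(questioned_f0, suspect_mu, suspect_sigma)
      / gaussian_pdf(questioned_f0, population_mu, population_sigma))
print(round(lr, 2))
```

A ratio above 1 lends support to the same-speaker hypothesis, below 1 to the different-speaker hypothesis; real systems combine many features and calibrate the ratios empirically.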

/21/ “Another area where on-line methods have made important contributions has to do with the way in which the human language processing system accesses and makes use of different kinds of linguistic information, such as syntactic vs semantic information. A sentence like “The witness examined by the lawyer turned out to be unreliable” is syntactically temporarily ambiguous, because when only the first few words are available (“The witness examined …”), a comprehender might be tempted to interpret “the witness” as the agentive subject of the verb “examined” – an interpretation that is subsequently shown to be false. This kind of situation, where the parser builds a syntactic structure that is later shown to be
