language testing 5

Language Testing

IV. Some Key Skills & Problems of Proficiency Tests
• 1. Application of language knowledge
• 2. Cloze
• 3. Reading comprehension
• 4. Written expression
• Language ability-oriented, never knowledge-oriented
Testing ≠ Teaching
• Testing focuses on competence discrimination
• Teaching highlights learning progress
• Testing depends on sampling (to infer one’s competence from his/her sampled performance)
• Teaching lays stress on integrated language learning
• A good means of testing can’t be a good way of teaching!
I. The relationship between teaching and testing
Three possibilities
• 1. Part of teaching, serving teaching: diagnosing, consolidating, evaluating, etc.
• 2. Guiding teaching: test-oriented teaching (negative washback)
• 3. Independent of teaching: serving other social functions
• Don’t distinguish them!

Types of Language Tests

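Objective tests come with standard answers, so scoring is relatively simple and can be done by machine.
Problems with objective tests
Such tests can hardly measure candidates' productive language abilities (speaking, writing and translating).
They cannot easily assess candidates' communicative ability in real communicative settings.
6.2 Subjective tests
Answers are open or flexible. Candidates construct their answers in response to the item (constructed-response items). Common item types include speaking, writing and translation.
Reading and listening tests sometimes use semi-subjective items, whose answers are semi-open.
Subjective tests are commonly used to measure candidates' productive language abilities, and can assess their actual communicative ability in real communicative settings.
The main weakness of subjective tests is that scoring criteria are hard to apply consistently, so scoring reliability is hard to guarantee.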
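Problems facing indirect tests
The purpose of an item is hard to pin down: people often disagree about which micro-skill a given item tests.
Items are difficult to write (choosing the source text, deciding the point to be tested, writing the options and judging item difficulty are all hard).
Validity is hard to guarantee.
4. Separating or integrating the language skills tested
Depending on whether language skills are tested separately or together, language tests can be divided into:
Discrete-point tests
Integrative tests
4.1 Discrete-point tests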
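1. Domain of language use covered
General language tests: measure the ability to use language in general settings
Language tests for specific purposes: measure the ability to use language in one or more specialized domains
2. Test purpose
Proficiency tests
Achievement tests
Placement tests
Aptitude tests
Diagnostic tests
2.1 Proficiency tests
They can measure general language ability, or a candidate's language ability in one or more particular domains.
They can measure a candidate's level in one or several language skills.
A proficiency test is theory-based: it rests on some theory of language ability and is not necessarily tied to a particular course or syllabus.
Proficiency tests are usually large-scale standardized examinations, mostly developed and administered by specialized testing bodies.
The overall scores of candidates taking a proficiency test are usually normally distributed.
Proficiency tests are mostly used for selection.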

Language Testing

Lift
1. V-T If you lift your eyes or your head, you look up, for example, when you have been reading and someone comes into the room. 抬起 (眼睛或头)
Example: When he finished he lifted his eyes and looked out the window. 他完成以后抬眼向窗外看去。
2. V-T If people in authority lift a law or rule that prevents people from doing something, they end it. 解除 (法令等)
Example: The European Commission has urged France to lift its ban on imports of British beef. 欧盟委员会已敦促法国解除对英国牛肉进口的禁令。
3. V-T/V-I If something lifts your spirits or your mood, or if they lift, you start feeling more cheerful. 鼓舞
Example: He used his incredible sense of humor to lift my spirits. 他以不可思议的幽默感鼓舞了我的士气。
4. N-COUNT If you give someone a lift somewhere, you take them there in your car as a favor to them. 搭便车
Example: He had a car and often gave me a lift home. 他有一辆汽车,经常让我搭便车回家。

Language Testing

Chen Huaying
1 1/29/2011 12:56 PM
Language Testing
(For M. A. Students Spring Semester, 2003)
Course instructor: Chen Huaying (E-mail: chenhuaying@)
Classroom: Room 631, Main Building
Time: Friday 9:30 – 11:20
Office Hours: Wed. 4:00 – 7:00 pm
Core Readings
Hughes, Arthur. 1991. Testing for Language Teachers. Cambridge: CUP.
Bachman, Lyle F. 1990. Fundamental Considerations in Language Testing. Shanghai: SFEP.
Bachman, Lyle F. & Palmer, Adrian S. 1996. Language Testing in Practice. Shanghai: SFEP.
Assignment/Project Schedule (Please refer to Hughes, Arthur. pp. 53-57; Bachman, Lyle F. & Palmer, Adrian S. 1996. Language Testing in Practice. Shanghai: SFEP. pp. 253-365.)

Language testing: practice quizzes

Quiz for Chapter 2
1. What are the 4 approaches in language testing?
2. What is the key idea of the psychometric-structuralist approach?
3. What levels and modes does this approach analyze language proficiency into?
4. Translate: item difficulty, discrimination, criterion proficiency, subtest, sampling, construction, stem/lead, options, distracters, forced-choice items, constructed-response items, discrete-point items, method effect
5. Although sampling is an indispensable step in test development, there will be no sampling error. F
6. Test construction is the process of converting the test problems chosen into test items or tasks. T
7. The most frequently and extensively used test items in a psychometric-structuralist test are multiple-choice items. T
8. Directions should be brief, simple and unambiguous. T
9. The multiple-choice format remains the most valid measure of language proficiency. F
10. This approach is out of date today. F
11. Psychometric-structuralist tests are direct tests. F
12. The results of the scores are hard to interpret. T

Quiz for Chapter 3
1. The integrative approach proposed to use holistic procedures to test language proficiency. T
2. John W. Oller was the leading figure in the campaign against the psychometric-structuralist approach. T
3. The main ideas of Oller are that language proficiency was indivisible and that this indivisibility could be measured directly by pragmatic tests. T
4. All test items in a psychometric test are context free. F
5. According to Oller, the first feature of the psychometric approach is the involvement of context. F
6. Pragmatic expectancy grammar is the ability responsible for context-dependent language processing. T
7. All dictation tests are pragmatic tests. F
8. Integrative tests are direct tests and their scores have clear meanings. F
9. Cloze tests are designed on the basis of Gestalt psychology. T
10. The linguistic basis for cloze tests is language redundancy. T
11. For standard cloze, there are two ways to delete words from a text: error-counting and right answer-counting. F
12. The test taker’s performance on cloze is scored either by the exact word method or by the contextually acceptable method. T
13. Short-term memory is the container of wording while long-term memory is the container of meaning. T
14. For dictation tests, the pauses should be at natural breakpoints. T
15. Partial dictation is different from standard dictation in that part of it is presented in printed form. T
16. The 1980s witnessed the end of the UCH. T
17. Dictation tests lack domain analysis and sampling, which reduces their validity. T

Quiz for Chapter 4
1. In the early 20th century, there emerged communicative tests. F
2. Both psychometric and integrative tests are ability tests. T
3. Needs analysis was first put forward by Brendan Carroll. F
4. The characteristics of communicative performance testing are more readily seen in tests of writing and speaking. T
5. According to Carroll, test takers’ performance can be scored with a 9-band scale. This indicates communicative performance tests are norm-referenced tests. F
6. Timothy McNamara is the major developer of the OET, a performance-based test of English as a second language for health professionals. T
7. Performance tests are often used for screening and selection. T
8. Communicative performance tests enable test users to make a connection between test performance and future language behavior. T
9. Communicative performance tests are low in validity because they are scored subjectively. F
10. Multiple scoring and the Rasch measurement technique are often introduced to increase the reliability of communicative performance tests. T
11. According to Carroll, listening, speaking, reading, and writing are not equal to language skills; instead, they are language performance. T
12. In Carroll’s model, listening, speaking, reading and writing are presented in isolation, but in real communication they occur in various combinations. T

Quiz for Chapter 5
1. Communicative language ability was proposed by Bachman and is a new development of communicative competence. T
2. Communicative competence was first proposed by Hymes in the 1960s and 1970s, including knowledge that and knowledge how. F
3. Canale and Swain’s model divides knowledge into four components: grammatical, sociolinguistic, discourse and strategic competence. T
4. Bachman’s communicative language ability involves language competence, strategic competence and psychophysiological mechanisms, which are subdivided into different components. T
5. Bachman’s test methods are not specific techniques but a framework of five facets which can be used for different purposes. T
6. The relationship between input and response in language tests can be either of the two: reciprocal and nonreciprocal. F
7. Reciprocal tests must be adaptive tests. T
8. Reciprocal language use is characterized by interaction and feedback. T
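The Chapter 3 items on cloze mention two scoring methods, exact word and contextually acceptable word. A minimal Python sketch of how a cloze could be built and scored both ways is given here for illustration. The function names, the fixed-ratio rule (blanking every n-th word) and the sample passage are assumptions made for this sketch; they are not taken from the course materials.

```python
def make_cloze(text, n=5):
    """Blank out every n-th word (assumed fixed-ratio deletion).

    Returns the cloze passage and the answer key.
    """
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        core = words[i].rstrip(".,;:!?")  # keep trailing punctuation outside the blank
        tail = words[i][len(core):]
        answers.append(core)
        words[i] = "____" + tail
    return " ".join(words), answers


def score_cloze(responses, answers, acceptable=None):
    """Return (exact_word_score, contextually_acceptable_score).

    `acceptable` maps a blank index to alternative words that a rater
    has judged acceptable in context (hypothetical data).
    """
    acceptable = acceptable or {}
    exact = contextual = 0
    for i, (response, key) in enumerate(zip(responses, answers)):
        given = response.strip().lower()
        if given == key.lower():
            exact += 1
            contextual += 1
        elif given in {alt.lower() for alt in acceptable.get(i, ())}:
            contextual += 1
    return exact, contextual


passage = "The quick brown fox jumps over the lazy dog near the quiet river bank."
cloze_text, key = make_cloze(passage, n=5)
print(cloze_text)
print(score_cloze(["jumps", "by"], key, acceptable={1: ["by", "beside"]}))
```

The same set of responses earns a higher score under the contextually acceptable method whenever a test taker supplies a sensible word that differs from the original, which is why the two methods are reported separately.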

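Introduction to the International English Language Testing System (IELTS)

IELTS stands for International English Language Testing System (known in Chinese as "雅思"). It was jointly designed by the University of Cambridge Local Examinations Syndicate, IDP Education Australia and The British Council, and The British Council is responsible for administering the test around the world.

The Cultural and Education Section of the British Embassy in China has a dedicated examinations department that is responsible for IELTS.

The test is a language test that people from non-English-speaking countries must pass in order to study or pursue further training at higher education institutions in Commonwealth countries.

Most Commonwealth countries now also use the test to certify that applicants for skilled migration have reached the required level of English.

IELTS is offered in two modules, Academic and General Training; immigration applicants are required to take General Training. The test has four parts: Listening (30 minutes), Reading (60 minutes), Writing (60 minutes) and Speaking (15 minutes).

The two modules share the same Listening and Speaking papers but use different Reading and Writing papers.

Scores are valid for two years, and candidates must wait three months between consecutive sittings.

The whole test is conducted in person by examiners from the Cultural and Education Section.

There are currently 10 fixed test centres in China: Beijing, Shenyang, Xi'an, Shanghai, Nanjing, Guangzhou, Fuzhou, Shenzhen, Chengdu and Hong Kong.

Beijing and Shanghai hold the test once a month; other centres hold it every two to three months.

Given the rapid growth of China's international exchanges, the Cultural and Education Section is actively developing and promoting the test, and more test centres and test dates will soon be available to meet candidates' needs.

IELTS differs in several ways from TOEFL and from the domestic CET-4 and CET-6 examinations. Its Listening and Speaking parts use British English. Its written parts do not use the standard multiple-choice format; instead they rely mainly on filling in words and short sentences, with a variety of item formats. This assesses candidates' actual English ability better and removes the guessing factor from answering.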

Language testing: listening

Why did the woman have to go to the office twice?
• A. The director could not give her an appointment right away.
• B. The office was closed the first time she went.
• C. The computers were out of service the first time she was there.
• D. She did not have acceptable identification with her on her first visit.

1. Listening material must be spoken-language material
2. Listening material should not be too difficult
3. The amount of listening material should not be too large
4. Listening material should be authentic and varied in type
5. The content of listening material should be fresh
(Liu Runqing, Language Testing and Its Methods: 139-140)
2. Listening material should not be too difficult
• Speech rate:
• In Language Testing and Its Methods, Liu Runqing proposes a speech rate of 120-150 WPM for learners at elementary level (Liu Runqing, 2000: 140)
• In a comprehensive language test, the recording should be kept within 30 minutes.
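The two figures above (120-150 WPM, at most 30 minutes) can be checked mechanically once a transcript and a recording length are available. The following is a small Python sketch under the assumption that words are counted by splitting the transcript on whitespace; the function names and the returned dictionary are hypothetical, chosen only for this illustration.

```python
def words_per_minute(transcript, duration_seconds):
    """Speech rate in words per minute, counting whitespace-separated words."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds


def check_recording(transcript, duration_seconds, wpm_range=(120, 150), max_minutes=30):
    """Compare a recording with the suggested rate range and length limit."""
    low, high = wpm_range
    wpm = words_per_minute(transcript, duration_seconds)
    return {
        "wpm": wpm,
        "rate_ok": low <= wpm <= high,
        "length_ok": duration_seconds <= max_minutes * 60,
    }


# A 270-word passage read in two minutes is spoken at 135 WPM.
sample = "word " * 270
print(check_recording(sample, duration_seconds=120))
```

The default range applies only to the elementary level cited above; a test writer working at another level would pass a different `wpm_range`.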
4. Listening material should be authentic and varied in type
1. Ensure the authenticity of the communicative situation.
2. Include some accented English.
3. Reduce the influence of memory factors on listening comprehension results.

Language Testing

Thank you!
is expected to be able to write or speak, and the people for whom reading and listening materials are primarily intended.
Length of text:
Topics:
Readability:
Structural range: a list of structures which may occur in texts or which should be excluded, or a general indication of the range of structures.
Vocabulary range:
Dialect, accent, style:
Speed of processing: how many words candidates are expected to read or write
Number of passages:
Medium/channel: paper and pen; computer etc.
Techniques: multiple choice; true/false; blank filling
Criterial levels of performance
the required level of performance for success (80% of the items must be responded to correctly)
Scoring procedure
What rating scales will be used? How many people will rate each piece?
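Several of the specification headings above can be written down as a small record, with the criterial level applied as a pass rule. This Python sketch is only an illustration: the class name `SpecSheet`, its field names and the sample values are assumptions, and only the 80% figure comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class SpecSheet:
    """A few test-specification fields (hypothetical names)."""
    techniques: tuple
    medium: str
    number_of_passages: int
    criterial_percent: int = 80  # required percentage of items answered correctly

    def meets_criterion(self, correct, total):
        """True if the share of correct items reaches the criterial level."""
        if total <= 0 or not 0 <= correct <= total:
            raise ValueError("need 0 <= correct <= total and total > 0")
        # Integer comparison avoids floating-point rounding at the boundary.
        return correct * 100 >= self.criterial_percent * total


spec = SpecSheet(
    techniques=("multiple choice", "true/false", "blank filling"),
    medium="paper and pen",
    number_of_passages=4,
)
print(spec.meets_criterion(32, 40))  # exactly 80% of the items
print(spec.meets_criterion(31, 40))  # just below the criterial level
```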

2016/11/15

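For example, suppose several raters assess a student's spoken English. Rater A believes that good speaking requires accurate pronunciation, and since this student's pronunciation is good, he gives a 5. Rater B believes that fluency best reflects a person's speaking level; although the student's pronunciation is good, the fluency is somewhat weaker, so she gives a 3. The same student, scored by different raters, receives different marks. This is natural: the raters did not follow a single set of rules for rating speaking, and the scores diverged as a result. The example reminds us that when we measure psychological traits such as oral expression or reading comprehension, we must first establish rules or standards that are stable and easy to apply. Only then are the measurement results reliable and comparable.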
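One way to picture a "single set of rules" for the rating example is a shared rubric that fixes the criteria and their weights before anyone rates. The sketch below, in Python, is an assumption-laden illustration: the two criteria, the equal weights and the 1-5 scale are hypothetical, and the source does not prescribe any particular rubric.

```python
# Hypothetical shared rubric: every rater scores the same criteria on a 1-5 scale.
RUBRIC = {"pronunciation": 0.5, "fluency": 0.5}


def composite_score(ratings, rubric=RUBRIC):
    """Weighted score under the shared rubric; every criterion must be rated."""
    missing = set(rubric) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weight * ratings[criterion] for criterion, weight in rubric.items())


# Without a rubric, each rater reports the one criterion he or she cares about.
holistic_a = 5  # Rater A looked only at pronunciation
holistic_b = 3  # Rater B looked only at fluency
print("gap without shared rules:", holistic_a - holistic_b)

# With the rubric, both raters record both criteria and apply the same rule.
rater_a = {"pronunciation": 5, "fluency": 3}
rater_b = {"pronunciation": 5, "fluency": 3}
print("rater A:", composite_score(rater_a))
print("rater B:", composite_score(rater_b))
```

Under the shared rubric the two raters disagree only if they judge a criterion differently, not because they are measuring different things.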
Why Tests? In a Classroom
Why Tests? Outside the Classroom

For the purpose of
Selection and admission (screening/admissions test)
Assigning students to different levels (placement test)
Examining testees’ language ability (proficiency test)
Measuring testees’ ability to learn a language (aptitude test)
A boundary to determine in or out
For both CRT and NRT
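(1) Behaviour sample

The purpose of a language test is to measure the test taker's language ability. Language ability is intangible, so how can it be measured? We can only measure its tangible manifestations, that is, language performance: the utterances spoken, the sentences written, the responses given to test items, and so on. These behaviours are all tangible expressions of the intangible language ability, which psychology calls manifestations. A behaviour sample is a valid sampling of the behaviours in which language ability shows itself. A person's language ability shows itself in many forms, and a test cannot, and need not, measure all of them. It can only measure a representative sample and then infer the test taker's language ability from it. A test therefore selects a representative set of behaviours in order to examine the individual's behavioural characteristics in the corresponding domain. When an individual's responses on a test properly reflect what the test is meant to measure, the test gives us useful information. In that sense, the behaviour sample that makes up a test is a valid representative of the corresponding behaviour domain.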

Cut-off score

Evaluation with Tests
Uses (purposes)
Different types
Characteristics of a good test: validity, reliability, practicality, backwash
Measurement, Test and Evaluation
Measurement

Quantifies the characteristics (both physical and mental) of persons

Examples: height, motivation, aptitude
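On the relationship between evaluation, measurement and tests, Bachman (1990) points out that when we evaluate an individual (a student), we can describe the individual qualitatively, quantitatively, or in only one of these ways. A qualitative description characterizes the student's behaviour in words, for example that a student's oral expression is excellent and written expression is good; a quantitative description is, for example, the score on a particular test. He represents the relationship among tests, measurement and evaluation with the following diagram.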
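(3) Standardized measurement

Standardized measurement means that a test follows a rigorous, systematic procedure in its construction, administration, scoring and score interpretation. Only then does the test have a uniform standard, so that different people's results are comparable. Standardization also reduces the influence of irrelevant factors on the results, making them more accurate and reliable. Measurements that are not standardized are not comparable.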
Test
Reading/writing tests
A procedure designed to get specific samples of a person’s ability
A measurement instrument



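A test is defined as "a method for eliciting certain behaviours, with the purpose of inferring from them certain characteristics of the individual".
Anastasi (1982) holds that "a test is essentially an objective and standardized measurement of a sample of behaviour."
A test has three basic elements:
① Behaviour sample
② Objective measurement
③ Standardized measurement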
Functions of Tests

Pedagogical functions: To reinforce learning and to motivate the student, or primarily as a means of assessing the student’s performance in the language.

Traditional paper-and-pencil tests
Format (content and type of questions)
Evaluation without Tests

Alternative assessment
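③ Rules

Rules are the rules and methods on which measurement is based, and they are the key to measurement. If the rules are poor or unreliable, the results will be biased and the measurement loses its meaning. Put simply, if the ruler is inaccurate, the result cannot be trusted. When we measure objects in the physical world, for example their height or weight, there are generally accepted rules or scales, so large deviations are rare. When we measure human (psychological) traits, however, large deviations often occur.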
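(2) Objective measurement

To what extent can the public accept a test as objective? This involves several indicators of a test's objectivity. Item analysis, covering difficulty and discrimination, is the basis for screening items to build a good test. Reliability is the degree to which test results are dependable; validity is the degree to which they are valid. These are the most important indicators of test quality.
Objective measurement therefore concerns whether the standard of measurement matches reality. A test's objectivity can be evaluated by asking: (1) what are the difficulty and discrimination of its items? (2) how dependable are its results? (3) how valid are its results? These three indicators are important measures of a test's quality.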
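Item difficulty and discrimination, named above as the basis of item analysis, can be computed from a table of right/wrong responses. The Python sketch below uses two common textbook conventions as assumptions, since the slides do not give formulas: difficulty as the proportion of test takers answering correctly, and discrimination as the difference in that proportion between the highest-scoring and lowest-scoring groups (27% each by default). The function name and sample data are hypothetical.

```python
def item_analysis(responses, group_fraction=0.27):
    """Per-item difficulty and discrimination.

    responses: one row per test taker, one 0/1 entry per item.
    difficulty: proportion correct (higher means easier).
    discrimination: proportion correct in the upper group minus
    proportion correct in the lower group, groups ranked by total score.
    """
    n = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    ranked = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    results = []
    for j in range(n_items):
        difficulty = sum(row[j] for row in responses) / n
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        results.append({"difficulty": difficulty, "discrimination": p_upper - p_lower})
    return results


# Four test takers, three items; halves are used as groups because the sample is tiny.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for number, stats in enumerate(item_analysis(data, group_fraction=0.5), start=1):
    print(number, stats)
```

An item that nearly everyone gets right, or that weak and strong candidates answer equally well, shows up here as a high difficulty value or a discrimination value near zero, which is the kind of evidence used to screen items.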
Figure: relationship among measurement, test and evaluation (regions labelled Evaluation, Test, Measurement)

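As the diagram shows, evaluating an educational objective (or a student's behaviour) does not necessarily involve a test or a measurement (area 1). This kind of evaluation is qualitative, for example pointing out problems in a student's learning. Sometimes evaluation requires only measurement, without a test (area 2); assigning a level to a student's oral expression is of this kind. To check a student's learning progress, we usually administer a test. This is yet another kind of evaluation, in which the student's achievement is evaluated through a test alone (area 3).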
Figure: two timelines. First: admission, course, completion. Second: admission, test, course, test, completion.
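Pedagogical functions
Teachers, students, parents, administrators
Adjusting teaching plans
Drawing up teaching plans
Finding out how effective teaching actually is
Reflecting learning progress
Reflecting problems in learning
Keeping track of children's academic progress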
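② Assigning numbers or symbols

Assigning numbers or symbols means using a number or symbol to stand for the quantity of a thing or of one of its attributes. For example, Zhang San scored 87 on this reading test and Li Si scored 92, so we say that Li Si scored 5 points more than Zhang San. A number has no meaning in itself; it is only a symbol. When we use it to represent a candidate's reading score, it becomes a quantity that can be interpreted and analysed. Under certain conditions, the data can also be manipulated arithmetically to make inferences about the attribute.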

The definition of measurement contains three elements:
① Objects and their attributes
② Assigning numbers or symbols
③ Rules
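① Objects and their attributes

This is the object or target of measurement. Measuring the height of a table is measuring a physical object, whose attribute, height, is observable and can be measured objectively. In foreign language teaching, what interests us is students' language ability, which is a psychological trait and cannot be measured directly. Mental activity does, however, show itself in concrete actions and behaviour, so we can only infer the level of a student's language ability by measuring overt behaviour or outward performance.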
Teaching evaluation
Curriculum evaluation

Why tests?
In a classroom
Outside the classroom
Evaluation with tests
Evaluation without tests


For teachers’ teaching
Evaluating the effectiveness of the syllabus, teaching materials and texts
Making adjustments
For knowing more about the students
Discuss learners’ abilities in search of suitable texts
Diagnose learners’ strengths/weaknesses (diagnostic test)
Make sure the students keep up with the teaching (progress test)
See if the student is ready for the next level (achievement test)
Motivate students to study