Formal models for cognition - taxonomy of spatial location description and frames of refere
Cognitive Dynamics- Dynamic Cognition

Cognitive Dynamics-Dynamic Cognition?Reginald Ferber1Fachbereich2Universit¨a t–GH PaderbornD-33095Paderborn,Germany2Abstract:In the last ten years a paradigm shift took place in cognitive science.While during the seventies problems were commonly attacked by symbol processing approaches,in the last decade many researchers employed connectionist models.These models can be seen as dynamical systems on metric spaces.There is not yet a developed theory of the behavior of these systems, but they seem to be a challenge for future research.The purpose of this paper is to introduce the problem and the dynamic approach to it.1Cognitive ProcessesThe subject of cognitive science is the description and simulation of cognitive processes and structures,especially in the areas of memory,problem solving,verbal behavior, and image identification.Some cognitive processes,for example the production and the parsing of sentences, seem to employ sophisticated symbol manipulating operations,other processes,such as image identification or access to word meaning,seem to rely on fast processing of huge amounts of rather vague knowledge that has been learned in many different situations. This learning is performed in a smooth way,enabling generalization,context sensitivity and noisy inputs.This use of experience makes it necessary to compare situations,i.e.to decide if a new situation is equal to or resembles an old one.This comparison might be achieved by use of distance measures which are sensitive to many parameters of the situation.Distances between symbolic objects are rather artificial constructions, while they are elementary to elements of metric spaces.Another controversy in cognitive science,which is closely related to the question of symbolic processing,is the question,to what degree the cognitive system is modular ([8]).A modular view assumes that the cognitive apparatus consists of independent modules between which data are exchanged.This assumption seems to be the natural consequence of a symbol processing approach.On the other hand,it seems difficult to explain the high speed of perception processes with modular symbol processing systems based on rather slow biological neurons.There are also empirical data that seem to contradict a strictly modular approach.1This research was supported by the Heinz-Nixdorf-Institut and by the Deutsche Forschungsgemein-schaft(DFG)Kennwort:Worterkennungssimulation2E-mail:ferber@psycho2.uni-paderborn.de1To explain these aspects,models with distributed memory and parallel processes have been proposed that can be interpreted as dynamical systems on metric spaces.These models are known under many names,such as connectionism,parallel distributed processing(pdp),neural networks([10],[12],[1],[7],[9],[13]),and are defined in many different ways.In the following paragraph some formal definitions will be given that catch the central aspects of these models to unify terminology(See also[3]).2Neural NetworksThe following very general definition includes most of the deterministic models used in literature.Beside these,there are non-deterministic or probabilistic models.2.1.Cellular StructureLet be a set and a countable set.is called a configuration of values of on the cells of.denotes the space of all configurations.For every cell let be afinite,ordered subset,the neighborhood of cell.The set of all neighborhoods defines a directed graph with node set and the set of edges,the connection graph,net structure or grid of the cellular structure.For every cell let be a local function.Let further be the set of all local functions.Then is calleda cellular structure.with is called the globalfunction of.If isfinite,is called afinite cellular structure.If isfinite,is called a cellular automaton.Figure1:Three Different grid structures.Neighbors are indicated by anarrow from the neighbor to the cell itself.a)Arbitrary grid,b)Rectangulargrid with VON NEUMANN neighborhood,c)One dimensional circular grid.The global function defines an autonomous dynamical system on the configuration space. The behavior of the system can be influenced by the structure of the grid and by the nature of the local functions.Both kinds of restrictions are used to construct models of cognitive behavior.The following restriction on the local functions is used frequently:22.2.Neural Net1.A cellular structure with andwith(1)and monotonic not decreasing functions is called a(deterministic) neural net.is called the output function of cell.is called the weight from cell to cell.2.A function of the formis called a linear threshold function with weights and threshold.The dynamic behavior of a neural net on a given grid is determined by the output functions and the weights.In many cases the same output function is chosen for all cells.Then the behavior of the system depends only on the weights.They can be chosen to achieve a desired behavior of the system.This can be done either in one single step(see for examples[6],[4])or in a longer adaptation process of small smooth changes either before the use of the net as a model or even during the use of the system. This construction of appropriate weights is often called learning.The following restriction on the structure of the net forces a simple dynamical behavior: 2.3.Feed Forward NetLet be a cellular structure.The set is called the set of input cells.Let be the set of cells that can be reached from the cells of by passing through exactly edges of the connection graph of.If and all are disjoint,the grid is called a feed forward grid and are called layers of the grid.is called input layer,with is called output layer and all layers in between are called hidden layers.is calleda feed forward net or a feed forward network.Feed forward neural nets are used to transform an input pattern of values on the input layer into a pattern of values on the output layer of the grid.A well known example with three layers is the perceptron developed1962by F.ROSENBLA TT[14]and extensively studied by M.MINSKY and S.PAPERT in[11].Other examples with more layers and continuous output functions are back-propagation networks.The name is due to the way in which the weights are computed:First the weights are set to random values;3Figure2:A feed forward grid with5layersthen,using a sample of given input and target patterns as training material,the pattern on the output cells produced by the network from the input pattern is compared with the target ing a gradient descend method,the weights are changed in such a way that for this input pattern the difference between the output of the net and the target is reduced.To adapt the weights between earlier layers,error values are propagated back to these layers,and a correction for the weights is computed using these error values. (For details compare[9].)The dynamic of a feed forward net is quite simple:Starting with an arbitrary config-uration,the values of the input cells in thefirst iterate depend only on their local functions since they have no input(argument).From the second iteration on the values of the cell in are constant,since they have only the constant input from the cells of.In this way the values of the subsequent layers become constant in subsequent iterates.In a net with layers the iteration sequence reaches the samefixed point from every configuration within iterations.3ExampleIn the following we shall concentrate on experiments investigating the process of word recognition.One goal of these experiments is to answer the question o f modularity of the cognitive system,in this case the question,if there is an independent module functioning as a mental lexicon where the words known by a person are stored.First we shall give a brief description of the experimental situation in which data on human word recognition are collected.Then we shall outline a simulation of such data using a back-propagation network.Finally a dynamic model is proposed.43.1Word Recognition and PrimingWord recognition is an experimental paradigm that is used frequently to investigate cognitive processes in verbal behavior.The basic idea is to measure the time people need to respond to the presentation of a written word,the so called target.The requested reactions are either to name the target(naming experiment),or to decide,if a presented string of characters is a word of the native language of the person or not,by pressing an appropriate button(lexical decision experiment).In both cases the time elapsing between the onset of the presentation of the target and the onset of the reaction is measured.There are many studies investigating the effect of–frequency of the target in language–regularity of pronunciation–length of the targetand the like.Priming experiments investigate the effect of context on naming and lexical decision. In this case the presentation of the target is preceded by the brief presentation of another word,the so called prime.This prime can be related to the target in different ways. It can–be the same word typed differently(upper vs.lower case)(identity priming).–be semantically related(semantic or associative priming)–precede the target frequently in natural language(syntactic priming)–be similar as a string of characters(graphemic priming)If the presentation of a target that is related to the preceding prime leads to a quicker reaction,then the mental lexicon is probably not completely modular.The results show complex behavior(see[5],[16]for an overview and references). While some studies found some of the priming effects,others did not.There seem to be many factors influencing the results.At least it seems to be rather unlikely that a mental lexicon exists that is completely modular.3.2A Back-propagation ModelWe shall now present a model of word recognition that catches some of the features of a parallel and distributed system.3.2.1.The ModelIn1989M.SEIDENBERG and J.McCLELLAND proposed a“Distributed,Developmental Model of Word Recognition and Naming”[15].They used a modified back-propagation model and were able to simulate“many aspects of human performance including(a) differences between words in terms of processing difficulty,(b)pronunciation of novel5items,(c)differences between readers in terms of word recognition skill,(d)transition from beginning to skilled reading,and (e)differences in performance on lexical decision and naming tasks.”[15:page 523].The net they used,consisted of 3layers:an “orthographic”input layerof 400cells,a hidden layer with 100to 200cells,and an output layer that was divided in two parts:a “phonological”output part with 460cells and an orthographic part that was similar to the input layer.The phonological part of the output was used to simulate naming data,the orthographic part was used to simulate lexical decision data.The layers were fully forward connected,i.e.for and it holds .orthographicinput orthographicoutputphonologicaloutputerroneous pattern converges to the correct one.This process should take more time, if the error is big.The model was trained with2,884stimulus–target pairs,presented from about14 times for low frequent words up to230times for the most frequent words.With every presentation the weights were changed for the orthographic part of the output and the phonological part of the output.Thus the weights from the input to the hidden layer were trained twice,for the orthographic to phonological net and the orthographic to orthographic net.3.2.2.RemarksSeveral remarks can be made on the model described above(3.2.1).1.The model realizes a“metric”system,since input and output are elements of andimensional space.It can be seen as a continuous mapping from to.This continuity is probably one of the reasons for the ability of the model to exploit regularities of the training material and generalize them to new material.2.The effectiveness of the continuity in generalization depends on the representationof the input.On the one hand it has to represent enough of the crucial information of the individual input to distinguish it from other inputs,on the other hand it has to generalize over the individual inputs to extract features they have in common.The representations used in the model are very sophisticated,hence a good deal of its power may be due to the“constructed”representations.3.As the authors mention the number of cells in the hidden layer has a strong influenceon the performance of the model.It determines how much information can be passed through this layer,i.e.how detailed or generalizing the treatment of a single input can be.4.The special structure of the net with the hidden layer in common for the orthographicto phonological net and the orthographic to orthographic net,can be a reason for the model’s generalization behavior in the simulation of the lexical decision task.The representation of the information on the hidden layer has to take account of both the phonological and the orthographic properties of a word.5.The authors stress the point that their model has no lexicon.But the orthographicto orthographic net is a device that reproduces a word from a string of letters.Due to the continuity it is somewhat robust against small perturbation.It will produce the correct output even if only partial information is given as input.Hence with an appropriate functional definition of a lexicon,it is just a distributed implementation of a mental lexicon,including phonetic influences as described in the last remark(4).76.The authors view their model as part of a larger model including layers for meaningand context.In the present implementation it is not visible how these additional components should be integrated.Hence the simulation of further processes such as priming is not possible.7.Because of the feed forward structure of the net,there is no possibility to explainthe influence of previous inputs or stronger influence of a longer input.To the model it makes no difference if the input is presented once or for a longer time.4A Dynamic Model of Word Recognition and PrimingThe model outlined in3.2.1simulates reaction times by distances between patterns of activities on parts of a neural net and expected target patterns.It is assumed that larger distances result in longer times for the formation of correct patterns as input to the next component of the cognitive system.In the remaining part of the paper we shall outline some ideas how a simulation could work that uses the convergence of a dynamical system on a metric space to simulate word recognition processes.4.1Basic AssumptionsFirst some assumptions are listed that point out the basic principles of the dynamic model.4.1.1.Cognition as Dynamical ProcessThefirst assumption of the dynamical model is,that cognitive processes are simulated by a dynamical system given by the global function of a neural network.The cognitive states are represented by the configurations,the time course of the process is simulated by the iteration sequence.If the iteration sequence approaches a small attractor,for example afixed point,this either corresponds to a stable cognitive state,for example the meaning of a word or image,or it is a constant or periodic input to other neural nets stimulating further processes.In both cases the assumption is central,that only configuration sequences that have some stability over(a short period of)time can cause something to happen.4.1.2.Learning:Detecting Regularities in the InputThe second basic idea is,that the neural net is slowly but constantly changed by its input in such a way that co-occurring events are associated,i.e.the configurations resulting from frequent and frequently co-occurring events in the input of the system should be stable.This enables the net to“detect”regularities in its input(compare[7]).From the point of view of the dynamical system this means that by changing the weights of the8neural net,the attractors of the global function and their basins of attraction have to be changed in such a way that frequent input patterns become attractors.4.1.3.Constantly Triggering InputIn contrast to3.2.1it is assumed,that the grid has no pre-defined structure,especially no feed forward structure,but that the structure develops during learning.It should be not very densely connected and it should contain loops.Input is presented to the net in such a way that the input pattern is added to the activities of the input cells for several applications of the global function;i. e.the system is no longer an autonomous system,but is triggered by the external input.with.The input cells are only a small fraction of all cells of the net.From this fraction the influence of the input spreads out to the other cells. There it can match with the existing patterns(of the previous attractor)or it can force them to change,moving the system to the basin of a different attractor.This constant triggering allows on the one hand to control the duration and strength of the input,on the other hand influences of previous inputs are preserved for a while to interact with new influences,as it is necessary to simulate priming effects(compare3.2.2.7).4.1.4.Subnets and ModularityThe distributed representation of the processed information as pattern on the cells of the grid allows complicated interactions,including modularization.It is possible that a subset of cells is strongly interconnected,but has only a few connections to other cells. Such a subset or subnet could be called a module.It is also possible that the system converges for a while relatively independent on two such subnets towards sub-patterns of different attractors,and that later on conflicts arise between these two sub-patterns.For example there might be subnets incorporating“meaning”,and“context”,as proposed by[15].In such a case the configuration coming from the(orthographic)input may converge on one part of the net(say meaning)to one attractor but on the other part (context)it may converge to another attractor,because the surrounding information points toward a different interpretation.This may lead to a contradiction andfinally one of the two attractors will win.The idea of shaping attraction basins is very powerful.It opens possibilities for the explanation of many effects in word recognition.On the other hand it is not yet in such a concrete state that any one of these explanations can be more than a hypothesis.4.2Simulation of Word Recognition ProcessesIn terms of this model the processes involved in naming,lexical decision and priming can be described in the following way:94.2.1.NamingFor the naming task the system has to stimulate the pronunciation of the written word.In a modular approach it is assumed that this is done by the production of a phonological code,which in turn is the basis for the generation of a motor code that controls articulation.A comparable system is also possible for the dynamical model, as a cascade of neural nets,one stimulating the next one as soon as it has reached a stable state(see also[2]).The dynamic model can explain several other phenomena: Frequent words are named faster,since their attractors are strong;regularly pronounced words are named faster,since the sequence of letters are more frequent and hence lead to faster convergence.4.2.2.Lexical decisionThe lexical decision task requires to distinguish between character strings representing words and character strings that do not represent words.In general the words used for this purpose are well known,short,and frequent words of the native language of the subject.The non–word strings are constructed in such a way that they have the same length and that they are pronounceable.From4.1.2it should follow that there is no attractor for these strings since they are new to the system,and there is no meaning associated to them.Hence in those parts of the grid whose configurations represent meaning there should be no convergence.Of course there can be convergence just by chance,but that is equivalent to a wrong answer of a person.4.2.3.PrimingPriming effects occur,when the system is moved by the influence of the prime towards the attractor of the target:The input of the prime changes the configuration of the net in such a way that,if the following target is related to the prime,the configuration will be already closer to the attractor of the target,than it has been before the prime influenced the net.Hence the attractor is reached faster than without the prime.4.2.3.1Identity priming.If the target is the same word as the prime but written in lower case letters,while the prime was written in upper case letters,most of the patterns induced by the two strings will be the same.Hence the impact of the prime on the net will be very similar to that of the target.4.2.3.2Semantic priming.If the prime and the target are semantically related,they appear more frequently together(see[18]).Hence they can lead to the same attractor concerning“meaning”and“context”:the influence of the prime moves the system closer to an attractor that is in many respects also a possible attractor for the target. 4.2.3.3Syntactic priming is based on frequent co-occurrence of words in language. According to4.1.2this s hould lead to faster convergence.104.2.3.4Graphemic priming is based on the similarity of character strings,i. e.the prime is a string of characters in which only very few characters are changed compared to the target.If the strings are entered by activating input cells that represent short sequences(tuples)of characters,most of these tuples will be the same in the prime and the target.Hence a weak form of identity priming will take place.4.2.4.Priming with ambiguous wordsOf special interest are experiments with ambiguous targets,i.e.letter strings that have several meanings.In general a semantic priming effect is observed only for the primary meaning,i.e.the more frequent meaning.If the prime has a strong impact towards the less frequent meaning(secondary meaning),for example if a whole sentence is used to prime that meaning,the reaction is also faster.A closer analysis of the processes([17]) shows that atfirst both meanings are activated according to their frequency.While the primary meaning quickly reaches a high availability,the availability of the secondary meaning grows slower.After about300ms the secondary meaning reaches nearly the same availability as the primary meaning.Afterwards its availability decreases again.These data could be explained by a process like that described in4.1.4.First there is an relatively independent evolution of patterns on different parts of the net,one representing the primary meaning,one representing the secondary meaning.After a while the developing patterns grow so large that they get into a conflict in which the pattern of the primary meaning suppresses that of the secondary meaning.abFigure4:Two ambiguousfigures:Left the so called Necker Cube:Eithervertex a or vertex b can be seen as being in front.Thefigure on theright can either be seen as two black faces or as a white candlestick.A similar process could cause the well known switching effects for ambiguousfigures like those shown infigure4:The two meanings are represented by neighboring attractors of the dynamical system.The influence of additional information moves the system from the basin of one attractor to that of the other.11References[1]Arbib,M.A.(1987).Brains,Machines,and Mathematics.Springer,(Expanded edition).[2]Dell,G.(1986).A spreading-activation theory of retrieval in sentencde porduction.Psychological Review93(3),283-321.[3]Ferber,R.(1992).Neuronale Netze.In Jahrbuch¨Uberblicke Mathematik,S.C.Chatterji,B.Fuchssteiner,U.Kulisch,R.Liedl,&W.Purkert,Eds.Vieweg,Braunschweig,Wiesbaden,pp.137–157.[4]Ferber,R.V orhersage der Suchwortwahl von professionellen Rechercheuren in Literatur-datenbanken durch assoziative Wortnetze.In Mensch und Maschine–Informationelle Schnittstellen der Kommunikation.(Proceedings ISI’92)(1992),H.H.Zimmermann,H.-D.Luckhardt,&A.Schulz,Eds.,Universit¨a tsverlag Konstanz,pp.208–218.[5]Gorfein,D.S.,Ed.(1989).Resolving Semantic Ambiguity.Springer-Verlag.[6]Hopfield,J.J.(1982).Neural networks and physical systems with emergent collectivecomputational A.[7]Kohonen,T.(1988).Self-Organization and Associative Memory.Springer Verlag,Berlin,(Second edition).[8]Levelt,W.J.M.(1991).Die konnektionistische Mode.Sprache&Cognition10(Heft2),61–72.[9]McClelland,J.L.,Rumelhart, D. E.,&the PDP Research Group(1986).ParallelDistributed Processing Explorations in the Microstructure of Cognition.The MIT Press, Cambridge Massachusetts.[10]McCulloch,W.S.,&Pitts,W.(1943).A logical calculus of the ideas immanent in nervousactivity.Bulletin of Mathematical Biophysics,vol.5.[11]Minsky,M.L.,&Papert,S.A.(1988).Perceptrons.The MIT Press,Cambridge,Ma.,(Expanded edition,first edition1969).[12]Palm,G.(1982).Neural Assemblies An alternative Approach to Artificial Intelligence.Springer.[13]Ritter,H.,Martinez,T.,&Schulten,K.(1990).Neuronale Netze.Addison-Wesley.[14]Rosenblatt,F.(1962).Principles of Perceptrons.Spartan,Washington,DC.[15]Seidenberg,M.S.,&McClelland,J.L.(1989).Distributed,developmental model of wordrecognition and naming.Psychological Review No.4,523–568.[16]Sereno,J. A.(1991).Graphemic,associative,and syntactic priming effects at a briefstimulus onset asynchrony in lexical decision and naming.Journal of Experimental Psychology:Learning,Memory,and Cognition No.3,459–477.[17]Simpson,G. B.,&Burgess, C.(1985).Activation and selection processes in therecognition of ambiguous words.Journal of Experimental Psychology:Human Perception and Performance(11),28–39.[18]Wettler,M.,Rapp,R.,&Ferber,R.(1993).Freie Assoziationen und Kontiguit¨a ten vonW¨o rtern in Texten.Zeitschrift f¨u r Psychologie201.12。
SPSS术语中英文对照

SPSS术语中英文对照【常用软件】SPSS术语中英文对照Absolute deviation, 绝对离差Absolute number, 绝对数Absolute residuals, 绝对残差Acceleration array, 加速度立体阵Acceleration in an arbitrary direction, 任意方向上的加速度Acceleration normal, 法向加速度Acceleration space dimension, 加速度空间的维数Acceleration tangential, 切向加速度Acceleration vector, 加速度向量Acceptable hypothesis, 可接受假设Accumulation, 累积Accuracy, 准确度Actual frequency, 实际频数Adaptive estimator, 自适应估计量Addition, 相加Addition theorem, 加法定理Additivity, 可加性Adjusted rate, 调整率Adjusted value, 校正值Admissible error, 容许误差Aggregation, 聚集性Alternative hypothesis, 备择假设Among groups, 组间Amounts, 总量Analysis of correlation, 相关分析Analysis of covariance, 协方差分析Analysis of regression, 回归分析Analysis of time series, 时间序列分析Analysis of variance, 方差分析Angular transformation, 角转换ANOVA (analysis of variance), 方差分析ANOVA Models, 方差分析模型Arcing, 弧/弧旋Arcsine transformation, 反正弦变换Area under the curve, 曲线面积AREG , 评估从一个时间点到下一个时间点回归相关时的误差ARIMA, 季节和非季节性单变量模型的极大似然估计Arithmetic grid paper, 算术格纸Arithmetic mean, 算术平均数Arrhenius relation, 艾恩尼斯关系Assessing fit, 拟合的评估Associative laws, 结合律Asymmetric distribution, 非对称分布Asymptotic bias, 渐近偏倚Asymptotic efficiency, 渐近效率Asymptotic variance, 渐近方差Attributable risk, 归因危险度Attribute data, 属性资料Attribution, 属性Autocorrelation, 自相关Autocorrelation of residuals, 残差的自相关Average, 平均数Average confidence interval length, 平均置信区间长度Average growth rate, 平均增长率Bar chart, 条形图Bar graph, 条形图Base period, 基期Bayes' theorem , Bayes定理Bell-shaped curve, 钟形曲线Bernoulli distribution, 伯努力分布Best-trim estimator, 最好切尾估计量Bias, 偏性Binary logistic regression, 二元逻辑斯蒂回归Binomial distribution, 二项分布Bisquare, 双平方Bivariate Correlate, 二变量相关Bivariate normal distribution, 双变量正态分布Bivariate normal population, 双变量正态总体Biweight interval, 双权区间Biweight M-estimator, 双权M估计量Block, 区组/配伍组BMDP(Biomedical computer programs), BMDP统计软件包Boxplots, 箱线图/箱尾图Breakdown bound, 崩溃界/崩溃点Canonical correlation, 典型相关Caption, 纵标目Case-control study, 病例对照研究Categorical variable, 分类变量Catenary, 悬链线Cauchy distribution, 柯西分布Cause-and-effect relationship, 因果关系Cell, 单元Censoring, 终检Center of symmetry, 对称中心Centering and scaling, 中心化和定标Central tendency, 集中趋势Central value, 中心值CHAID -χ2 Automatic Interac tion Detector, 卡方自动交互检测Chance, 机遇Chance error, 随机误差Chance variable, 随机变量Characteristic equation, 特征方程Characteristic root, 特征根Characteristic vector, 特征向量Chebshev criterion of fit, 拟合的切比雪夫准则Chernoff faces, 切尔诺夫脸谱图Chi-square test, 卡方检验/χ2检验Choleskey decomposition, 乔洛斯基分解Circle chart, 圆图Class interval, 组距Class mid-value, 组中值Class upper limit, 组上限Classified variable, 分类变量Cluster analysis, 聚类分析Cluster sampling, 整群抽样Code, 代码Coded data, 编码数据Coding, 编码Coefficient of contingency, 列联系数Coefficient of determination, 决定系数Coefficient of multiple correlation, 多重相关系数Coefficient of partial correlation, 偏相关系数Coefficient of production-moment correlation, 积差相关系数Coefficient of rank correlation, 等级相关系数Coefficient of regression, 回归系数Coefficient of skewness, 偏度系数Coefficient of variation, 变异系数Cohort study, 队列研究Column, 列Column effect, 列效应Column factor, 列因素Combination pool, 合并Combinative table, 组合表Common factor, 共性因子Common regression coefficient, 公共回归系数Common value, 共同值Common variance, 公共方差Common variation, 公共变异Communality variance, 共性方差Comparability, 可比性Comparison of bathes, 批比较Comparison value, 比较值Compartment model, 分部模型Compassion, 伸缩Complement of an event, 补事件Complete association, 完全正相关Complete dissociation, 完全不相关Complete statistics, 完备统计量Completely randomized design, 完全随机化设计Composite event, 联合事件Composite events, 复合事件Concavity, 凹性Conditional expectation, 条件期望Conditional likelihood, 条件似然Conditional probability, 条件概率Conditionally linear, 依条件线性Confidence interval, 置信区间Confidence limit, 置信限Confidence lower limit, 置信下限Confidence upper limit, 置信上限Confirmatory Factor Analysis , 验证性因子分析Confirmatory research, 证实性实验研究Confounding factor, 混杂因素Conjoint, 联合分析Consistency, 相合性Consistency check, 一致性检验Consistent asymptotically normal estimate, 相合渐近正态估计Consistent estimate, 相合估计Constrained nonlinear regression, 受约束非线性回归Constraint, 约束Contaminated distribution, 污染分布Contaminated Gausssian, 污染高斯分布Contaminated normal distribution, 污染正态分布Contamination, 污染Contamination model, 污染模型Contingency table, 列联表Contour, 边界线Contribution rate, 贡献率Control, 对照Controlled experiments, 对照实验Conventional depth, 常规深度Convolution, 卷积Corrected factor, 校正因子Corrected mean, 校正均值Correction coefficient, 校正系数Correctness, 正确性Correlation coefficient, 相关系数Correlation index, 相关指数Correspondence, 对应Counting, 计数Counts, 计数/频数Covariance, 协方差Covariant, 共变Cox Regression, Cox回归Criteria for fitting, 拟合准则Criteria of least squares, 最小二乘准则Critical ratio, 临界比Critical region, 拒绝域Critical value, 临界值Cross-over design, 交叉设计Cross-section analysis, 横断面分析Cross-section survey, 横断面调查Crosstabs , 交叉表Cross-tabulation table, 复合表Cube root, 立方根Cumulative distribution function, 分布函数Cumulative probability, 累计概率Curvature, 曲率/弯曲Curvature, 曲率Curve fit , 曲线拟和Curve fitting, 曲线拟合Curvilinear regression, 曲线回归Curvilinear relation, 曲线关系Cut-and-try method, 尝试法Cycle, 周期Cyclist, 周期性D test, D检验Data acquisition, 资料收集Data bank, 数据库Data capacity, 数据容量Data deficiencies, 数据缺乏Data handling, 数据处理Data manipulation, 数据处理Data processing, 数据处理Data reduction, 数据缩减Data set, 数据集Data sources, 数据来源Data transformation, 数据变换Data validity, 数据有效性Data-in, 数据输入Data-out, 数据输出Dead time, 停滞期Degree of freedom, 自由度Degree of precision, 精密度Degree of reliability, 可靠性程度Degression, 递减Density function, 密度函数Density of data points, 数据点的密度Dependent variable, 应变量/依变量/因变量Dependent variable, 因变量Depth, 深度Derivative matrix, 导数矩阵Derivative-free methods, 无导数方法Design, 设计Determinacy, 确定性Determinant, 行列式Determinant, 决定因素Deviation, 离差Deviation from average, 离均差Diagnostic plot, 诊断图Dichotomous variable, 二分变量Differential equation, 微分方程Direct standardization, 直接标准化法Discrete variable, 离散型变量DISCRIMINANT, 判断Discriminant analysis, 判别分析Discriminant coefficient, 判别系数Discriminant function, 判别值Dispersion, 散布/分散度Disproportional, 不成比例的Disproportionate sub-class numbers, 不成比例次级组含量Distribution free, 分布无关性/免分布Distribution shape, 分布形状Distribution-free method, 任意分布法Distributive laws, 分配律Disturbance, 随机扰动项Dose response curve, 剂量反应曲线Double blind method, 双盲法Double blind trial, 双盲试验Double exponential distribution, 双指数分布Double logarithmic, 双对数Downward rank, 降秩Dual-space plot, 对偶空间图DUD, 无导数方法Duncan's new multiple range method, 新复极差法/Duncan新法Effect, 实验效应Eigenvalue, 特征值Eigenvector, 特征向量Ellipse, 椭圆Empirical distribution, 经验分布Empirical probability, 经验概率单位Enumeration data, 计数资料Equal sun-class number, 相等次级组含量Equally likely, 等可能Equivariance, 同变性Error, 误差/错误Error of estimate, 估计误差Error type I, 第一类错误Error type II, 第二类错误Estimand, 被估量Estimated error mean squares, 估计误差均方Estimated error sum of squares, 估计误差平方和Euclidean distance, 欧式距离Event, 事件Event, 事件Exceptional data point, 异常数据点Expectation plane, 期望平面Expectation surface, 期望曲面Expected values, 期望值Experiment, 实验Experimental sampling, 试验抽样Experimental unit, 试验单位Explanatory variable, 说明变量Exploratory data analysis, 探索性数据分析Explore Summarize, 探索-摘要Exponential curve, 指数曲线Exponential growth, 指数式增长EXSMOOTH, 指数平滑方法Extended fit, 扩充拟合Extra parameter, 附加参数Extrapolation, 外推法Extreme observation, 末端观测值Extremes, 极端值/极值F distribution, F分布F test, F检验Factor, 因素/因子Factor analysis, 因子分析Factor Analysis, 因子分析Factor score, 因子得分Factorial, 阶乘Factorial design, 析因试验设计False negative, 假阴性False negative error, 假阴性错误Family of distributions, 分布族Family of estimators, 估计量族Fanning, 扇面Fatality rate, 病死率Field investigation, 现场调查Field survey, 现场调查Finite population, 有限总体Finite-sample, 有限样本First derivative, 一阶导数First principal component, 第一主成分First quartile, 第一四分位数Fisher information, 费雪信息量Fitted value, 拟合值Fitting a curve, 曲线拟合Fixed base, 定基Fluctuation, 随机起伏Forecast, 预测Four fold table, 四格表Fourth, 四分点Fraction blow, 左侧比率Fractional error, 相对误差Frequency, 频率Frequency polygon, 频数多边图Frontier point, 界限点Function relationship, 泛函关系Gamma distribution, 伽玛分布Gauss increment, 高斯增量Gaussian distribution, 高斯分布/正态分布Gauss-Newton increment, 高斯-牛顿增量General census, 全面普查GENLOG (Generalized liner models), 广义线性模型Geometric mean, 几何平均数Gini's mean difference, 基尼均差GLM (General liner models), 一般线性模型Goodness of fit, 拟和优度/配合度Gradient of determinant, 行列式的梯度Graeco-Latin square, 希腊拉丁方Grand mean, 总均值Gross errors, 重大错误Gross-error sensitivity, 大错敏感度Group averages, 分组平均Grouped data, 分组资料Guessed mean, 假定平均数Half-life, 半衰期Hampel M-estimators, 汉佩尔M估计量Happenstance, 偶然事件Harmonic mean, 调和均数Hazard function, 风险均数Hazard rate, 风险率Heading, 标目Heavy-tailed distribution, 重尾分布Hessian array, 海森立体阵Heterogeneity, 不同质Heterogeneity of variance, 方差不齐Hierarchical classification, 组内分组Hierarchical clustering method, 系统聚类法High-leverage point, 高杠杆率点HILOGLINEAR, 多维列联表的层次对数线性模型Hinge, 折叶点Histogram, 直方图Historical cohort study, 历史性队列研究Holes, 空洞HOMALS, 多重响应分析Homogeneity of variance, 方差齐性Homogeneity test, 齐性检验Huber M-estimators, 休伯M估计量Hyperbola, 双曲线Hypothesis testing, 假设检验Hypothetical universe, 假设总体Impossible event, 不可能事件Independence, 独立性Independent variable, 自变量Index, 指标/指数Indirect standardization, 间接标准化法Individual, 个体Inference band, 推断带Infinite population, 无限总体Infinitely great, 无穷大Infinitely small, 无穷小Influence curve, 影响曲线Information capacity, 信息容量Initial condition, 初始条件Initial estimate, 初始估计值Initial level, 最初水平Interaction, 交互作用Interaction terms, 交互作用项Intercept, 截距Interpolation, 内插法Interquartile range, 四分位距Interval estimation, 区间估计Intervals of equal probability, 等概率区间Intrinsic curvature, 固有曲率Invariance, 不变性Inverse matrix, 逆矩阵Inverse probability, 逆概率Inverse sine transformation, 反正弦变换Iteration, 迭代Jacobian determinant, 雅可比行列式Joint distribution function, 分布函数Joint probability, 联合概率Joint probability distribution, 联合概率分布K means method, 逐步聚类法Kaplan-Meier, 评估事件的时间长度Kaplan-Merier chart, Kaplan-Merier图Kendall's rank correlation, Kendall等级相关Kinetic, 动力学Kolmogorov-Smirnove test, 柯尔莫哥洛夫-斯米尔诺夫检验Kruskal and Wallis test, Kruskal及Wallis检验/多样本的秩和检验/H检验Kurtosis, 峰度Lack of fit, 失拟Ladder of powers, 幂阶梯Lag, 滞后Large sample, 大样本Large sample test, 大样本检验Latin square, 拉丁方Latin square design, 拉丁方设计Leakage, 泄漏Least favorable configuration, 最不利构形Least favorable distribution, 最不利分布Least significant difference, 最小显著差法Least square method, 最小二乘法Least-absolute-residuals estimates, 最小绝对残差估计Least-absolute-residuals fit, 最小绝对残差拟合Least-absolute-residuals line, 最小绝对残差线Legend, 图例L-estimator, L估计量L-estimator of location, 位置L估计量L-estimator of scale, 尺度L估计量Level, 水平Life expectance, 预期期望寿命Life table, 寿命表Life table method, 生命表法Light-tailed distribution, 轻尾分布Likelihood function, 似然函数Likelihood ratio, 似然比line graph, 线图Linear correlation, 直线相关Linear equation, 线性方程Linear programming, 线性规划Linear regression, 直线回归Linear Regression, 线性回归Linear trend, 线性趋势Loading, 载荷Location and scale equivariance, 位置尺度同变性Location equivariance, 位置同变性Location invariance, 位置不变性Location scale family, 位置尺度族Log rank test, 时序检验Logarithmic curve, 对数曲线Logarithmic normal distribution, 对数正态分布Logarithmic scale, 对数尺度Logarithmic transformation, 对数变换Logic check, 逻辑检查Logistic distribution, 逻辑斯特分布Logit transformation, Logit转换LOGLINEAR, 多维列联表通用模型Lognormal distribution, 对数正态分布Lost function, 损失函数Low correlation, 低度相关Lower limit, 下限Lowest-attained variance, 最小可达方差LSD, 最小显著差法的简称Lurking variable, 潜在变量Main effect, 主效应Major heading, 主辞标目Marginal density function, 边缘密度函数Marginal probability, 边缘概率Marginal probability distribution, 边缘概率分布Matched data, 配对资料Matched distribution, 匹配过分布Matching of distribution, 分布的匹配Matching of transformation, 变换的匹配Mathematical expectation, 数学期望Mathematical model, 数学模型Maximum L-estimator, 极大极小L 估计量Maximum likelihood method, 最大似然法Mean, 均数Mean squares between groups, 组间均方Mean squares within group, 组内均方Means (Compare means), 均值-均值比较Median, 中位数Median effective dose, 半数效量Median lethal dose, 半数致死量Median polish, 中位数平滑Median test, 中位数检验Minimal sufficient statistic, 最小充分统计量Minimum distance estimation, 最小距离估计Minimum effective dose, 最小有效量Minimum lethal dose, 最小致死量Minimum variance estimator, 最小方差估计量MINITAB, 统计软件包Minor heading, 宾词标目Missing data, 缺失值Model specification, 模型的确定Modeling Statistics , 模型统计Models for outliers, 离群值模型Modifying the model, 模型的修正Modulus of continuity, 连续性模Morbidity, 发病率Most favorable configuration, 最有利构形Multidimensional Scaling (ASCAL), 多维尺度/多维标度Multinomial Logistic Regression , 多项逻辑斯蒂回归Multiple comparison, 多重比较Multiple correlation , 复相关Multiple covariance, 多元协方差Multiple linear regression, 多元线性回归Multiple response , 多重选项Multiple solutions, 多解Multiplication theorem, 乘法定理Multiresponse, 多元响应Multi-stage sampling, 多阶段抽样Multivariate T distribution, 多元T分布Mutual exclusive, 互不相容Mutual independence, 互相独立Natural boundary, 自然边界Natural dead, 自然死亡Natural zero, 自然零Negative correlation, 负相关Negative linear correlation, 负线性相关Negatively skewed, 负偏Newman-Keuls method, q检验NK method, q检验No statistical significance, 无统计意义Nominal variable, 名义变量Nonconstancy of variability, 变异的非定常性Nonlinear regression, 非线性相关Nonparametric statistics, 非参数统计Nonparametric test, 非参数检验Nonparametric tests, 非参数检验Normal deviate, 正态离差Normal distribution, 正态分布Normal equation, 正规方程组Normal ranges, 正常范围Normal value, 正常值Nuisance parameter, 多余参数/讨厌参数Null hypothesis, 无效假设Numerical variable, 数值变量Objective function, 目标函数Observation unit, 观察单位Observed value, 观察值One sided test, 单侧检验One-way analysis of variance, 单因素方差分析Oneway ANOVA , 单因素方差分析Open sequential trial, 开放型序贯设计Optrim, 优切尾Optrim efficiency, 优切尾效率Order statistics, 顺序统计量Ordered categories, 有序分类Ordinal logistic regression , 序数逻辑斯蒂回归Ordinal variable, 有序变量Orthogonal basis, 正交基Orthogonal design, 正交试验设计Orthogonality conditions, 正交条件ORTHOPLAN, 正交设计Outlier cutoffs, 离群值截断点Outliers, 极端值OVERALS , 多组变量的非线性正规相关Overshoot, 迭代过度Paired design, 配对设计Paired sample, 配对样本Pairwise slopes, 成对斜率Parabola, 抛物线Parallel tests, 平行试验Parameter, 参数Parametric statistics, 参数统计Parametric test, 参数检验Partial correlation, 偏相关Partial regression, 偏回归Partial sorting, 偏排序Partials residuals, 偏残差Pattern, 模式Pearson curves, 皮尔逊曲线Peeling, 退层Percent bar graph, 百分条形图Percentage, 百分比Percentile, 百分位数Percentile curves, 百分位曲线Periodicity, 周期性Permutation, 排列P-estimator, P估计量Pie graph, 饼图Pitman estimator, 皮特曼估计量Pivot, 枢轴量Planar, 平坦Planar assumption, 平面的假设PLANCARDS, 生成试验的计划卡Point estimation, 点估计Poisson distribution, 泊松分布Polishing, 平滑Polled standard deviation, 合并标准差Polled variance, 合并方差Polygon, 多边图Polynomial, 多项式Polynomial curve, 多项式曲线Population, 总体Population attributable risk, 人群归因危险度Positive correlation, 正相关Positively skewed, 正偏Posterior distribution, 后验分布Power of a test, 检验效能Precision, 精密度Predicted value, 预测值Preliminary analysis, 预备性分析Principal component analysis, 主成分分析Prior distribution, 先验分布Prior probability, 先验概率Probabilistic model, 概率模型probability, 概率Probability density, 概率密度Product moment, 乘积矩/协方差Profile trace, 截面迹图Proportion, 比/构成比Proportion allocation in stratified random sampling, 按比例分层随机抽样Proportionate, 成比例Proportionate sub-class numbers, 成比例次级组含量Prospective study, 前瞻性调查Proximities, 亲近性Pseudo F test, 近似F检验Pseudo model, 近似模型Pseudosigma, 伪标准差Purposive sampling, 有目的抽样QR decomposition, QR分解Quadratic approximation, 二次近似Qualitative classification, 属性分类Qualitative method, 定性方法Quantile-quantile plot, 分位数-分位数图/Q-Q图Quantitative analysis, 定量分析Quartile, 四分位数Quick Cluster, 快速聚类Radix sort, 基数排序Random allocation, 随机化分组Random blocks design, 随机区组设计Random event, 随机事件Randomization, 随机化Range, 极差/全距Rank correlation, 等级相关Rank sum test, 秩和检验Rank test, 秩检验Ranked data, 等级资料Rate, 比率Ratio, 比例Raw data, 原始资料Raw residual, 原始残差Rayleigh's test, 雷氏检验Rayleigh's Z, 雷氏Z值Reciprocal, 倒数Reciprocal transformation, 倒数变换Recording, 记录Redescending estimators, 回降估计量Reducing dimensions, 降维Re-expression, 重新表达Reference set, 标准组Region of acceptance, 接受域Regression coefficient, 回归系数Regression sum of square, 回归平方和Rejection point, 拒绝点Relative dispersion, 相对离散度Relative number, 相对数Reliability, 可靠性Reparametrization, 重新设置参数Replication, 重复Report Summaries, 报告摘要Residual sum of square, 剩余平方和Resistance, 耐抗性Resistant line, 耐抗线Resistant technique, 耐抗技术R-estimator of location, 位置R估计量R-estimator of scale, 尺度R估计量Retrospective study, 回顾性调查Ridge trace, 岭迹Ridit analysis, Ridit分析Rotation, 旋转Rounding, 舍入Row, 行Row effects, 行效应Row factor, 行因素RXC table, RXC表Sample, 样本Sample regression coefficient, 样本回归系数Sample size, 样本量Sample standard deviation, 样本标准差Sampling error, 抽样误差SAS(Statistical analysis system ), SAS统计软件包Scale, 尺度/量表Scatter diagram, 散点图Schematic plot, 示意图/简图Score test, 计分检验Screening, 筛检SEASON, 季节分析Second derivative, 二阶导数Second principal component, 第二主成分SEM (Structural equation modeling), 结构化方程模型Semi-logarithmic graph, 半对数图Semi-logarithmic paper, 半对数格纸Sensitivity curve, 敏感度曲线Sequential analysis, 贯序分析Sequential data set, 顺序数据集Sequential design, 贯序设计Sequential method, 贯序法Sequential test, 贯序检验法Serial tests, 系列试验Short-cut method, 简捷法Sigmoid curve, S形曲线Sign function, 正负号函数Sign test, 符号检验Signed rank, 符号秩Significance test, 显著性检验Significant figure, 有效数字Simple cluster sampling, 简单整群抽样Simple correlation, 简单相关Simple random sampling, 简单随机抽样Simple regression, 简单回归simple table, 简单表Sine estimator, 正弦估计量Single-valued estimate, 单值估计Singular matrix, 奇异矩阵Skewed distribution, 偏斜分布Skewness, 偏度Slash distribution, 斜线分布Slope, 斜率Smirnov test, 斯米尔诺夫检验Source of variation, 变异来源Spearman rank correlation, 斯皮尔曼等级相关Specific factor, 特殊因子Specific factor variance, 特殊因子方差Spectra , 频谱Spherical distribution, 球型正态分布Spread, 展布SPSS(Statistical package for the social science), SPSS统计软件包Spurious correlation, 假性相关Square root transformation, 平方根变换Stabilizing variance, 稳定方差Standard deviation, 标准差Standard error, 标准误Standard error of difference, 差别的标准误Standard error of estimate, 标准估计误差Standard error of rate, 率的标准误Standard normal distribution, 标准正态分布Standardization, 标准化Starting value, 起始值Statistic, 统计量Statistical control, 统计控制Statistical graph, 统计图Statistical inference, 统计推断Statistical table, 统计表Steepest descent, 最速下降法Stem and leaf display, 茎叶图Step factor, 步长因子Stepwise regression, 逐步回归Storage, 存Strata, 层(复数)Stratified sampling, 分层抽样Stratified sampling, 分层抽样Strength, 强度Stringency, 严密性Structural relationship, 结构关系Studentized residual, 学生化残差/t化残差Sub-class numbers, 次级组含量Subdividing, 分割Sufficient statistic, 充分统计量Sum of products, 积和Sum of squares, 离差平方和Sum of squares about regression, 回归平方和Sum of squares between groups, 组间平方和Sum of squares of partial regression, 偏回归平方和Sure event, 必然事件Survey, 调查Survival, 生存分析Survival rate, 生存率Suspended root gram, 悬吊根图Symmetry, 对称Systematic error, 系统误差Systematic sampling, 系统抽样Tags, 标签Tail area, 尾部面积Tail length, 尾长Tail weight, 尾重Tangent line, 切线Target distribution, 目标分布Taylor series, 泰勒级数Tendency of dispersion, 离散趋势Testing of hypotheses, 假设检验Theoretical frequency, 理论频数Time series, 时间序列Tolerance interval, 容忍区间Tolerance lower limit, 容忍下限Tolerance upper limit, 容忍上限Torsion, 扰率Total sum of square, 总平方和Total variation, 总变异Transformation, 转换Treatment, 处理Trend, 趋势Trend of percentage, 百分比趋势Trial, 试验Trial and error method, 试错法Tuning constant, 细调常数Two sided test, 双向检验Two-stage least squares, 二阶最小平方Two-stage sampling, 二阶段抽样Two-tailed test, 双侧检验Two-way analysis of variance, 双因素方差分析Two-way table, 双向表Type I error, 一类错误/α错误Type II error, 二类错误/β错误UMVU, 方差一致最小无偏估计简称Unbiased estimate, 无偏估计Unconstrained nonlinear regression , 无约束非线性回归Unequal subclass number, 不等次级组含量Ungrouped data, 不分组资料Uniform coordinate, 均匀坐标Uniform distribution, 均匀分布Uniformly minimum variance unbiased estimate, 方差一致最小无偏估计Unit, 单元Unordered categories, 无序分类Upper limit, 上限Upward rank, 升秩Vague concept, 模糊概念Validity, 有效性VARCOMP (Variance component estimation), 方差元素估计Variability, 变异性Variable, 变量Variance, 方差Variation, 变异Varimax orthogonal rotation, 方差最大正交旋转Volume of distribution, 容积W test, W检验Weibull distribution, 威布尔分布Weight, 权数Weighted Chi-square test, 加权卡方检验/Cochran检验Weighted linear regression method, 加权直线回归Weighted mean, 加权平均数Weighted mean square, 加权平均方差Weighted sum of square, 加权平方和Weighting coefficient, 权重系数Weighting method, 加权法W-estimation, W估计量W-estimation of location, 位置W估计量Width, 宽度Wilcoxon paired test, 威斯康星配对法/配对符号秩和检验Wild point, 野点/狂点Wild value, 野值/狂值Winsorized mean, 缩尾均值Withdraw, 失访Youden's index, 尤登指数Z test, Z检验Zero correlation, 零相关Z-transformation, Z变换。
SPSS词汇(中英文对照)

Asymptotic efficiency, 渐近效率Asymptotic variance, 渐近方差Attributable risk, 归因危险度Attribute data, 属性资料Attribution, 属性Autocorrelation, 自相关Autocorrelation of residuals, 残差的自相关Average, 平均数Average confidence interval length, 平均置信区间长度Average growth rate, 平均增长率Bar chart, 条形图Bar graph, 条形图Base period, 基期Bayes' theorem , Bayes定理Bell-shaped curve, 钟形曲线Bernoulli distribution, 伯努力分布Best-trim estimator, 最好切尾估计量Bias, 偏性Binary logistic regression, 二元逻辑斯蒂回归Binomial distribution, 二项分布Bisquare, 双平方Bivariate Correlate, 二变量相关Bivariate normal distribution, 双变量正态分布Bivariate normal population, 双变量正态总体Biweight interval, 双权区间Biweight M-estimator, 双权M估计量Block, 区组/配伍组BMDP(Biomedical computer programs), BMDP统计软件包Boxplots, 箱线图/箱尾图Breakdown bound, 崩溃界/崩溃点Canonical correlation, 典型相关Caption, 纵标目Case-control study, 病例对照研究Categorical variable, 分类变量Catenary, 悬链线Cauchy distribution, 柯西分布Cause-and-effect relationship, 因果关系Cell, 单元Censoring, 终检Center of symmetry, 对称中心Centering and scaling, 中心化和定标Central tendency, 集中趋势Central value, 中心值CHAID -χ2 Automatic Interaction Detector, 卡方自动交互检测Chance, 机遇Chance error, 随机误差Chance variable, 随机变量Characteristic equation, 特征方程Characteristic root, 特征根Characteristic vector, 特征向量Chebshev criterion of fit, 拟合的切比雪夫准则Chernoff faces, 切尔诺夫脸谱图Chi-square test, 卡方检验/χ2检验Choleskey decomposition, 乔洛斯基分解Circle chart, 圆图Class interval, 组距Class mid-value, 组中值Class upper limit, 组上限Classified variable, 分类变量Cluster analysis, 聚类分析Cluster sampling, 整群抽样Code, 代码Coded data, 编码数据Coding, 编码Coefficient of contingency, 列联系数Coefficient of determination, 决定系数Coefficient of multiple correlation, 多重相关系数Coefficient of partial correlation, 偏相关系数Coefficient of production-moment correlation, 积差相关系数Coefficient of rank correlation, 等级相关系数Coefficient of regression, 回归系数Coefficient of skewness, 偏度系数Coefficient of variation, 变异系数Cohort study, 队列研究Column, 列Column effect, 列效应Column factor, 列因素Combination pool, 合并Combinative table, 组合表Common factor, 共性因子Common regression coefficient, 公共回归系数Common value, 共同值Common variance, 公共方差Common variation, 公共变异Communality variance, 共性方差Comparability, 可比性Comparison of bathes, 批比较Comparison value, 比较值Compartment model, 分部模型Compassion, 伸缩Complement of an event, 补事件Complete association, 完全正相关Complete dissociation, 完全不相关Complete statistics, 完备统计量Completely randomized design, 完全随机化设计Composite event, 联合事件Composite events, 复合事件Concavity, 凹性Conditional expectation, 条件期望Conditional likelihood, 条件似然Conditional probability, 条件概率Conditionally linear, 依条件线性Confidence interval, 置信区间Confidence limit, 置信限Confidence lower limit, 置信下限Confidence upper limit, 置信上限Confirmatory Factor Analysis , 验证性因子分析Confirmatory research, 证实性实验研究Confounding factor, 混杂因素Conjoint, 联合分析Consistency, 相合性Consistency check, 一致性检验Consistent asymptotically normal estimate, 相合渐近正态估计Consistent estimate, 相合估计Constrained nonlinear regression, 受约束非线性回归Constraint, 约束Contaminated distribution, 污染分布Contaminated Gausssian, 污染高斯分布Contaminated normal distribution, 污染正态分布Contamination, 污染Contamination model, 污染模型Contingency table, 列联表Contour, 边界线Contribution rate, 贡献率Control, 对照Controlled experiments, 对照实验Conventional depth, 常规深度Convolution, 卷积Corrected factor, 校正因子Corrected mean, 校正均值Correction coefficient, 校正系数Correctness, 正确性Correlation coefficient, 相关系数Correlation index, 相关指数Correspondence, 对应Counting, 计数Counts, 计数/频数Covariance, 协方差Covariant, 共变Cox Regression, Cox回归Criteria for fitting, 拟合准则Criteria of least squares, 最小二乘准则Critical ratio, 临界比Critical region, 拒绝域Critical value, 临界值Cross-over design, 交叉设计Cross-section analysis, 横断面分析Cross-section survey, 横断面调查Crosstabs , 交叉表Cross-tabulation table, 复合表Cube root, 立方根Cumulative distribution function, 分布函数Cumulative probability, 累计概率Curvature, 曲率/弯曲Curvature, 曲率Curve fit , 曲线拟和Curve fitting, 曲线拟合Curvilinear regression, 曲线回归Curvilinear relation, 曲线关系Cut-and-try method, 尝试法Cycle, 周期Cyclist, 周期性D test, D检验Data acquisition, 资料收集Data bank, 数据库Data capacity, 数据容量Data deficiencies, 数据缺乏Data handling, 数据处理Data manipulation, 数据处理Data processing, 数据处理Data reduction, 数据缩减Data set, 数据集Data sources, 数据来源Data transformation, 数据变换Data validity, 数据有效性Data-in, 数据输入Data-out, 数据输出Dead time, 停滞期Degree of freedom, 自由度Degree of precision, 精密度Degree of reliability, 可靠性程度Degression, 递减Density function, 密度函数Density of data points, 数据点的密度Dependent variable, 应变量/依变量/因变量Dependent variable, 因变量Depth, 深度Derivative matrix, 导数矩阵Derivative-free methods, 无导数方法Design, 设计Determinacy, 确定性Determinant, 行列式Determinant, 决定因素Deviation, 离差Deviation from average, 离均差Diagnostic plot, 诊断图Dichotomous variable, 二分变量Differential equation, 微分方程Direct standardization, 直接标准化法Discrete variable, 离散型变量DISCRIMINANT, 判断Discriminant analysis, 判别分析Discriminant coefficient, 判别系数Discriminant function, 判别值Dispersion, 散布/分散度Disproportional, 不成比例的Disproportionate sub-class numbers, 不成比例次级组含量Distribution free, 分布无关性/免分布Distribution shape, 分布形状Distribution-free method, 任意分布法Distributive laws, 分配律Disturbance, 随机扰动项Dose response curve, 剂量反应曲线Double blind method, 双盲法Double blind trial, 双盲试验Double exponential distribution, 双指数分布Double logarithmic, 双对数Downward rank, 降秩Dual-space plot, 对偶空间图DUD, 无导数方法Duncan's new multiple range method, 新复极差法/Duncan新法Effect, 实验效应Eigenvalue, 特征值Eigenvector, 特征向量Ellipse, 椭圆Empirical distribution, 经验分布Empirical probability, 经验概率单位Enumeration data, 计数资料Equal sun-class number, 相等次级组含量Equally likely, 等可能Equivariance, 同变性Error, 误差/错误Error of estimate, 估计误差Error type I, 第一类错误Error type II, 第二类错误Estimand, 被估量Estimated error mean squares, 估计误差均方Estimated error sum of squares, 估计误差平方和Euclidean distance, 欧式距离Event, 事件Event, 事件Exceptional data point, 异常数据点Expectation plane, 期望平面Expectation surface, 期望曲面Expected values, 期望值Experiment, 实验Experimental sampling, 试验抽样Experimental unit, 试验单位Explanatory variable, 说明变量Exploratory data analysis, 探索性数据分析Explore Summarize, 探索-摘要Exponential curve, 指数曲线Exponential growth, 指数式增长EXSMOOTH, 指数平滑方法Extended fit, 扩充拟合Extra parameter, 附加参数Extrapolation, 外推法Extreme observation, 末端观测值Extremes, 极端值/极值F distribution, F分布F test, F检验Factor, 因素/因子Factor analysis, 因子分析Factor Analysis, 因子分析Factor score, 因子得分Factorial, 阶乘Factorial design, 析因试验设计False negative, 假阴性False negative error, 假阴性错误Family of distributions, 分布族Family of estimators, 估计量族Fanning, 扇面Fatality rate, 病死率Field investigation, 现场调查Field survey, 现场调查Finite population, 有限总体Finite-sample, 有限样本First derivative, 一阶导数First principal component, 第一主成分First quartile, 第一四分位数Fisher information, 费雪信息量Fitted value, 拟合值Fitting a curve, 曲线拟合Fixed base, 定基Fluctuation, 随机起伏Forecast, 预测Four fold table, 四格表Fourth, 四分点Fraction blow, 左侧比率Fractional error, 相对误差Frequency, 频率Frequency polygon, 频数多边图Frontier point, 界限点Function relationship, 泛函关系Gamma distribution, 伽玛分布Gauss increment, 高斯增量Gaussian distribution, 高斯分布/正态分布Gauss-Newton increment, 高斯-牛顿增量General census, 全面普查GENLOG (Generalized liner models), 广义线性模型Geometric mean, 几何平均数Gini's mean difference, 基尼均差GLM (General liner models), 一般线性模型Goodness of fit, 拟和优度/配合度Gradient of determinant, 行列式的梯度Graeco-Latin square, 希腊拉丁方Grand mean, 总均值Gross errors, 重大错误Gross-error sensitivity, 大错敏感度Group averages, 分组平均Grouped data, 分组资料Guessed mean, 假定平均数Half-life, 半衰期Hampel M-estimators, 汉佩尔M估计量Happenstance, 偶然事件Harmonic mean, 调和均数Hazard function, 风险均数Hazard rate, 风险率Heading, 标目Heavy-tailed distribution, 重尾分布Hessian array, 海森立体阵Heterogeneity, 不同质Heterogeneity of variance, 方差不齐Hierarchical classification, 组内分组Hierarchical clustering method, 系统聚类法High-leverage point, 高杠杆率点HILOGLINEAR, 多维列联表的层次对数线性模型Hinge, 折叶点Histogram, 直方图Historical cohort study, 历史性队列研究Holes, 空洞HOMALS, 多重响应分析Homogeneity of variance, 方差齐性Homogeneity test, 齐性检验Huber M-estimators, 休伯M估计量Hyperbola, 双曲线Hypothesis testing, 假设检验Hypothetical universe, 假设总体Impossible event, 不可能事件Independence, 独立性Independent variable, 自变量Index, 指标/指数Indirect standardization, 间接标准化法Individual, 个体Inference band, 推断带Infinite population, 无限总体Infinitely great, 无穷大Infinitely small, 无穷小Influence curve, 影响曲线Information capacity, 信息容量Initial condition, 初始条件Initial estimate, 初始估计值Initial level, 最初水平Interaction, 交互作用Interaction terms, 交互作用项Intercept, 截距Interpolation, 内插法Interquartile range, 四分位距Interval estimation, 区间估计Intervals of equal probability, 等概率区间Intrinsic curvature, 固有曲率Invariance, 不变性Inverse matrix, 逆矩阵Inverse probability, 逆概率Inverse sine transformation, 反正弦变换Iteration, 迭代Jacobian determinant, 雅可比行列式Joint distribution function, 分布函数Joint probability, 联合概率Joint probability distribution, 联合概率分布K means method, 逐步聚类法Kaplan-Meier, 评估事件的时间长度Kaplan-Merier chart, Kaplan-Merier图Kendall's rank correlation, Kendall等级相关Kinetic, 动力学Kolmogorov-Smirnove test, 柯尔莫哥洛夫-斯米尔诺夫检验Kruskal and Wallis test, Kruskal及Wallis检验/多样本的秩和检验/H检验Kurtosis, 峰度Lack of fit, 失拟Ladder of powers, 幂阶梯Lag, 滞后Large sample, 大样本Large sample test, 大样本检验Latin square, 拉丁方Latin square design, 拉丁方设计Leakage, 泄漏Least favorable configuration, 最不利构形Least favorable distribution, 最不利分布Least significant difference, 最小显著差法Least square method, 最小二乘法Least-absolute-residuals estimates, 最小绝对残差估计Least-absolute-residuals fit, 最小绝对残差拟合Least-absolute-residuals line, 最小绝对残差线Legend, 图例L-estimator, L估计量L-estimator of location, 位置L估计量L-estimator of scale, 尺度L估计量Level, 水平Life expectance, 预期期望寿命Life table, 寿命表Life table method, 生命表法Light-tailed distribution, 轻尾分布Likelihood function, 似然函数Likelihood ratio, 似然比line graph, 线图Linear correlation, 直线相关Linear equation, 线性方程Linear programming, 线性规划Linear regression, 直线回归Linear Regression, 线性回归Linear trend, 线性趋势Loading, 载荷Location and scale equivariance, 位置尺度同变性Location equivariance, 位置同变性Location invariance, 位置不变性Location scale family, 位置尺度族Log rank test, 时序检验Logarithmic curve, 对数曲线Logarithmic normal distribution, 对数正态分布Logarithmic scale, 对数尺度Logarithmic transformation, 对数变换Logic check, 逻辑检查Logistic distribution, 逻辑斯特分布Logit transformation, Logit转换LOGLINEAR, 多维列联表通用模型Lognormal distribution, 对数正态分布Lost function, 损失函数Low correlation, 低度相关Lower limit, 下限Lowest-attained variance, 最小可达方差LSD, 最小显著差法的简称Lurking variable, 潜在变量Main effect, 主效应Major heading, 主辞标目Marginal density function, 边缘密度函数Marginal probability, 边缘概率Marginal probability distribution, 边缘概率分布Matched data, 配对资料Matched distribution, 匹配过分布Matching of distribution, 分布的匹配Matching of transformation, 变换的匹配Mathematical expectation, 数学期望Mathematical model, 数学模型Maximum L-estimator, 极大极小L 估计量Maximum likelihood method, 最大似然法Mean, 均数Mean squares between groups, 组间均方Mean squares within group, 组内均方Means (Compare means), 均值-均值比较Median, 中位数Median effective dose, 半数效量Median lethal dose, 半数致死量Median polish, 中位数平滑Median test, 中位数检验Minimal sufficient statistic, 最小充分统计量Minimum distance estimation, 最小距离估计Minimum effective dose, 最小有效量Minimum lethal dose, 最小致死量Minimum variance estimator, 最小方差估计量MINITAB, 统计软件包Minor heading, 宾词标目Missing data, 缺失值Model specification, 模型的确定Modeling Statistics , 模型统计Models for outliers, 离群值模型Modifying the model, 模型的修正Modulus of continuity, 连续性模Morbidity, 发病率Most favorable configuration, 最有利构形Multidimensional Scaling (ASCAL), 多维尺度/多维标度Multinomial Logistic Regression , 多项逻辑斯蒂回归Multiple comparison, 多重比较Multiple correlation , 复相关Multiple covariance, 多元协方差Multiple linear regression, 多元线性回归Multiple response , 多重选项Multiple solutions, 多解Multiplication theorem, 乘法定理Multiresponse, 多元响应Multi-stage sampling, 多阶段抽样Multivariate T distribution, 多元T分布Mutual exclusive, 互不相容Mutual independence, 互相独立Natural boundary, 自然边界Natural dead, 自然死亡Natural zero, 自然零Negative correlation, 负相关Negative linear correlation, 负线性相关Negatively skewed, 负偏Newman-Keuls method, q检验NK method, q检验No statistical significance, 无统计意义Nominal variable, 名义变量Nonconstancy of variability, 变异的非定常性Nonlinear regression, 非线性相关Nonparametric statistics, 非参数统计Nonparametric test, 非参数检验Nonparametric tests, 非参数检验Normal deviate, 正态离差Normal distribution, 正态分布Normal equation, 正规方程组Normal ranges, 正常范围Normal value, 正常值Nuisance parameter, 多余参数/讨厌参数Null hypothesis, 无效假设Numerical variable, 数值变量Objective function, 目标函数Observation unit, 观察单位Observed value, 观察值One sided test, 单侧检验One-way analysis of variance, 单因素方差分析Oneway ANOVA , 单因素方差分析Open sequential trial, 开放型序贯设计Optrim, 优切尾Optrim efficiency, 优切尾效率Order statistics, 顺序统计量Ordered categories, 有序分类Ordinal logistic regression , 序数逻辑斯蒂回归Ordinal variable, 有序变量Orthogonal basis, 正交基Orthogonal design, 正交试验设计Orthogonality conditions, 正交条件ORTHOPLAN, 正交设计Outlier cutoffs, 离群值截断点Outliers, 极端值OVERALS , 多组变量的非线性正规相关Overshoot, 迭代过度Paired design, 配对设计Paired sample, 配对样本Pairwise slopes, 成对斜率Parabola, 抛物线Parallel tests, 平行试验Parameter, 参数Parametric statistics, 参数统计Parametric test, 参数检验Partial correlation, 偏相关Partial regression, 偏回归Partial sorting, 偏排序Partials residuals, 偏残差Pattern, 模式Pearson curves, 皮尔逊曲线Peeling, 退层Percent bar graph, 百分条形图Percentage, 百分比Percentile, 百分位数Percentile curves, 百分位曲线Periodicity, 周期性Permutation, 排列P-estimator, P估计量Pie graph, 饼图Pitman estimator, 皮特曼估计量Pivot, 枢轴量Planar, 平坦Planar assumption, 平面的假设PLANCARDS, 生成试验的计划卡Point estimation, 点估计Poisson distribution, 泊松分布Polishing, 平滑Polled standard deviation, 合并标准差Polled variance, 合并方差Polygon, 多边图Polynomial, 多项式Polynomial curve, 多项式曲线Population, 总体Population attributable risk, 人群归因危险度Positive correlation, 正相关Positively skewed, 正偏Posterior distribution, 后验分布Power of a test, 检验效能Precision, 精密度Predicted value, 预测值Preliminary analysis, 预备性分析Principal component analysis, 主成分分析Prior distribution, 先验分布Prior probability, 先验概率Probabilistic model, 概率模型probability, 概率Probability density, 概率密度Product moment, 乘积矩/协方差Profile trace, 截面迹图Proportion, 比/构成比Proportion allocation in stratified random sampling, 按比例分层随机抽样Proportionate, 成比例Proportionate sub-class numbers, 成比例次级组含量Prospective study, 前瞻性调查Proximities, 亲近性Pseudo F test, 近似F检验Pseudo model, 近似模型Pseudosigma, 伪标准差Purposive sampling, 有目的抽样QR decomposition, QR分解Quadratic approximation, 二次近似Qualitative classification, 属性分类Qualitative method, 定性方法Quantile-quantile plot, 分位数-分位数图/Q-Q图Quantitative analysis, 定量分析Quartile, 四分位数Quick Cluster, 快速聚类Radix sort, 基数排序Random allocation, 随机化分组Random blocks design, 随机区组设计Random event, 随机事件Randomization, 随机化Range, 极差/全距Rank correlation, 等级相关Rank sum test, 秩和检验Rank test, 秩检验Ranked data, 等级资料Rate, 比率Ratio, 比例Raw data, 原始资料Raw residual, 原始残差Rayleigh's test, 雷氏检验Rayleigh's Z, 雷氏Z值Reciprocal, 倒数Reciprocal transformation, 倒数变换Recording, 记录Redescending estimators, 回降估计量Reducing dimensions, 降维Re-expression, 重新表达Reference set, 标准组Region of acceptance, 接受域Regression coefficient, 回归系数Regression sum of square, 回归平方和Rejection point, 拒绝点Relative dispersion, 相对离散度Relative number, 相对数Reliability, 可靠性Reparametrization, 重新设置参数Replication, 重复Report Summaries, 报告摘要Residual sum of square, 剩余平方和Resistance, 耐抗性Resistant line, 耐抗线Resistant technique, 耐抗技术R-estimator of location, 位置R估计量R-estimator of scale, 尺度R估计量Retrospective study, 回顾性调查Ridge trace, 岭迹Ridit analysis, Ridit分析Rotation, 旋转Rounding, 舍入Row, 行Row effects, 行效应Row factor, 行因素RXC table, RXC表Sample, 样本Sample regression coefficient, 样本回归系数Sample size, 样本量Sample standard deviation, 样本标准差Sampling error, 抽样误差SAS(Statistical analysis system ), SAS统计软件包Scale, 尺度/量表Scatter diagram, 散点图Schematic plot, 示意图/简图Score test, 计分检验Screening, 筛检SEASON, 季节分析Second derivative, 二阶导数Second principal component, 第二主成分SEM (Structural equation modeling), 结构化方程模型Semi-logarithmic graph, 半对数图Semi-logarithmic paper, 半对数格纸Sensitivity curve, 敏感度曲线Sequential analysis, 贯序分析Sequential data set, 顺序数据集Sequential design, 贯序设计Sequential method, 贯序法Sequential test, 贯序检验法Serial tests, 系列试验Short-cut method, 简捷法Sigmoid curve, S形曲线Sign function, 正负号函数Sign test, 符号检验Signed rank, 符号秩Significance test, 显著性检验Significant figure, 有效数字Simple cluster sampling, 简单整群抽样Simple correlation, 简单相关Simple random sampling, 简单随机抽样Simple regression, 简单回归simple table, 简单表Sine estimator, 正弦估计量Single-valued estimate, 单值估计Singular matrix, 奇异矩阵Skewed distribution, 偏斜分布Skewness, 偏度Slash distribution, 斜线分布Slope, 斜率Smirnov test, 斯米尔诺夫检验Source of variation, 变异来源Spearman rank correlation, 斯皮尔曼等级相关Specific factor, 特殊因子Specific factor variance, 特殊因子方差Spectra , 频谱Spherical distribution, 球型正态分布Spread, 展布SPSS(Statistical package for the social science), SPSS统计软件包Spurious correlation, 假性相关Square root transformation, 平方根变换Stabilizing variance, 稳定方差Standard deviation, 标准差Standard error, 标准误Standard error of difference, 差别的标准误Standard error of estimate, 标准估计误差Standard error of rate, 率的标准误Standard normal distribution, 标准正态分布Standardization, 标准化Starting value, 起始值Statistic, 统计量Statistical control, 统计控制Statistical graph, 统计图Statistical inference, 统计推断Statistical table, 统计表Steepest descent, 最速下降法Stem and leaf display, 茎叶图Step factor, 步长因子Stepwise regression, 逐步回归Storage, 存Strata, 层(复数)Stratified sampling, 分层抽样Stratified sampling, 分层抽样Strength, 强度Stringency, 严密性Structural relationship, 结构关系Studentized residual, 学生化残差/t化残差Sub-class numbers, 次级组含量Subdividing, 分割Sufficient statistic, 充分统计量Sum of products, 积和Sum of squares, 离差平方和Sum of squares about regression, 回归平方和Sum of squares between groups, 组间平方和Sum of squares of partial regression, 偏回归平方和Sure event, 必然事件Survey, 调查Survival, 生存分析Survival rate, 生存率Suspended root gram, 悬吊根图Symmetry, 对称Systematic error, 系统误差Systematic sampling, 系统抽样Tags, 标签Tail area, 尾部面积Tail length, 尾长Tail weight, 尾重Tangent line, 切线Target distribution, 目标分布Taylor series, 泰勒级数Tendency of dispersion, 离散趋势Testing of hypotheses, 假设检验Theoretical frequency, 理论频数Time series, 时间序列Tolerance interval, 容忍区间Tolerance lower limit, 容忍下限Tolerance upper limit, 容忍上限Torsion, 扰率Total sum of square, 总平方和Total variation, 总变异Transformation, 转换Treatment, 处理Trend, 趋势Trend of percentage, 百分比趋势Trial, 试验Trial and error method, 试错法Tuning constant, 细调常数Two sided test, 双向检验Two-stage least squares, 二阶最小平方Two-stage sampling, 二阶段抽样Two-tailed test, 双侧检验Two-way analysis of variance, 双因素方差分析Two-way table, 双向表Type I error, 一类错误/α错误Type II error, 二类错误/β错误UMVU, 方差一致最小无偏估计简称Unbiased estimate, 无偏估计Unconstrained nonlinear regression , 无约束非线性回归Unequal subclass number, 不等次级组含量Ungrouped data, 不分组资料Uniform coordinate, 均匀坐标Uniform distribution, 均匀分布Uniformly minimum variance unbiased estimate, 方差一致最小无偏估计Unit, 单元Unordered categories, 无序分类Upper limit, 上限Upward rank, 升秩Vague concept, 模糊概念Validity, 有效性VARCOMP (Variance component estimation), 方差元素估计Variability, 变异性Variable, 变量Variance, 方差Variation, 变异Varimax orthogonal rotation, 方差最大正交旋转Volume of distribution, 容积W test, W检验Weibull distribution, 威布尔分布Weight, 权数Weighted Chi-square test, 加权卡方检验/Cochran检验Weighted linear regression method, 加权直线回归Weighted mean, 加权平均数Weighted mean square, 加权平均方差Weighted sum of square, 加权平方和Weighting coefficient, 权重系数Weighting method, 加权法W-estimation, W估计量W-estimation of location, 位置W估计量Width, 宽度Wilcoxon paired test, 威斯康星配对法/配对符号秩和检验Wild point, 野点/狂点Wild value, 野值/狂值Winsorized mean, 缩尾均值Withdraw, 失访Youden's index, 尤登指数Z test, Z检验Zero correlation, 零相关Z-transformation, Z变换。
《神经网络与深度学习综述DeepLearning15May2014

Draft:Deep Learning in Neural Networks:An OverviewTechnical Report IDSIA-03-14/arXiv:1404.7828(v1.5)[cs.NE]J¨u rgen SchmidhuberThe Swiss AI Lab IDSIAIstituto Dalle Molle di Studi sull’Intelligenza ArtificialeUniversity of Lugano&SUPSIGalleria2,6928Manno-LuganoSwitzerland15May2014AbstractIn recent years,deep artificial neural networks(including recurrent ones)have won numerous con-tests in pattern recognition and machine learning.This historical survey compactly summarises relevantwork,much of it from the previous millennium.Shallow and deep learners are distinguished by thedepth of their credit assignment paths,which are chains of possibly learnable,causal links between ac-tions and effects.I review deep supervised learning(also recapitulating the history of backpropagation),unsupervised learning,reinforcement learning&evolutionary computation,and indirect search for shortprograms encoding deep and large networks.PDF of earlier draft(v1):http://www.idsia.ch/∼juergen/DeepLearning30April2014.pdfLATEX source:http://www.idsia.ch/∼juergen/DeepLearning30April2014.texComplete BIBTEXfile:http://www.idsia.ch/∼juergen/bib.bibPrefaceThis is the draft of an invited Deep Learning(DL)overview.One of its goals is to assign credit to those who contributed to the present state of the art.I acknowledge the limitations of attempting to achieve this goal.The DL research community itself may be viewed as a continually evolving,deep network of scientists who have influenced each other in complex ways.Starting from recent DL results,I tried to trace back the origins of relevant ideas through the past half century and beyond,sometimes using“local search”to follow citations of citations backwards in time.Since not all DL publications properly acknowledge earlier relevant work,additional global search strategies were employed,aided by consulting numerous neural network experts.As a result,the present draft mostly consists of references(about800entries so far).Nevertheless,through an expert selection bias I may have missed important work.A related bias was surely introduced by my special familiarity with the work of my own DL research group in the past quarter-century.For these reasons,the present draft should be viewed as merely a snapshot of an ongoing credit assignment process.To help improve it,please do not hesitate to send corrections and suggestions to juergen@idsia.ch.Contents1Introduction to Deep Learning(DL)in Neural Networks(NNs)3 2Event-Oriented Notation for Activation Spreading in FNNs/RNNs3 3Depth of Credit Assignment Paths(CAPs)and of Problems4 4Recurring Themes of Deep Learning54.1Dynamic Programming(DP)for DL (5)4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL (6)4.3Occam’s Razor:Compression and Minimum Description Length(MDL) (6)4.4Learning Hierarchical Representations Through Deep SL,UL,RL (6)4.5Fast Graphics Processing Units(GPUs)for DL in NNs (6)5Supervised NNs,Some Helped by Unsupervised NNs75.11940s and Earlier (7)5.2Around1960:More Neurobiological Inspiration for DL (7)5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) (8)5.41979:Convolution+Weight Replication+Winner-Take-All(WTA) (8)5.51960-1981and Beyond:Development of Backpropagation(BP)for NNs (8)5.5.1BP for Weight-Sharing Feedforward NNs(FNNs)and Recurrent NNs(RNNs)..95.6Late1980s-2000:Numerous Improvements of NNs (9)5.6.1Ideas for Dealing with Long Time Lags and Deep CAPs (10)5.6.2Better BP Through Advanced Gradient Descent (10)5.6.3Discovering Low-Complexity,Problem-Solving NNs (11)5.6.4Potential Benefits of UL for SL (11)5.71987:UL Through Autoencoder(AE)Hierarchies (12)5.81989:BP for Convolutional NNs(CNNs) (13)5.91991:Fundamental Deep Learning Problem of Gradient Descent (13)5.101991:UL-Based History Compression Through a Deep Hierarchy of RNNs (14)5.111992:Max-Pooling(MP):Towards MPCNNs (14)5.121994:Contest-Winning Not So Deep NNs (15)5.131995:Supervised Recurrent Very Deep Learner(LSTM RNN) (15)5.142003:More Contest-Winning/Record-Setting,Often Not So Deep NNs (16)5.152006/7:Deep Belief Networks(DBNs)&AE Stacks Fine-Tuned by BP (17)5.162006/7:Improved CNNs/GPU-CNNs/BP-Trained MPCNNs (17)5.172009:First Official Competitions Won by RNNs,and with MPCNNs (18)5.182010:Plain Backprop(+Distortions)on GPU Yields Excellent Results (18)5.192011:MPCNNs on GPU Achieve Superhuman Vision Performance (18)5.202011:Hessian-Free Optimization for RNNs (19)5.212012:First Contests Won on ImageNet&Object Detection&Segmentation (19)5.222013-:More Contests and Benchmark Records (20)5.22.1Currently Successful Supervised Techniques:LSTM RNNs/GPU-MPCNNs (21)5.23Recent Tricks for Improving SL Deep NNs(Compare Sec.5.6.2,5.6.3) (21)5.24Consequences for Neuroscience (22)5.25DL with Spiking Neurons? (22)6DL in FNNs and RNNs for Reinforcement Learning(RL)236.1RL Through NN World Models Yields RNNs With Deep CAPs (23)6.2Deep FNNs for Traditional RL and Markov Decision Processes(MDPs) (24)6.3Deep RL RNNs for Partially Observable MDPs(POMDPs) (24)6.4RL Facilitated by Deep UL in FNNs and RNNs (25)6.5Deep Hierarchical RL(HRL)and Subgoal Learning with FNNs and RNNs (25)6.6Deep RL by Direct NN Search/Policy Gradients/Evolution (25)6.7Deep RL by Indirect Policy Search/Compressed NN Search (26)6.8Universal RL (27)7Conclusion271Introduction to Deep Learning(DL)in Neural Networks(NNs) Which modifiable components of a learning system are responsible for its success or failure?What changes to them improve performance?This has been called the fundamental credit assignment problem(Minsky, 1963).There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses(Sec.6.8).The present survey,however,will focus on the narrower,but now commercially important,subfield of Deep Learning(DL)in Artificial Neural Networks(NNs).We are interested in accurate credit assignment across possibly many,often nonlinear,computational stages of NNs.Shallow NN-like models have been around for many decades if not centuries(Sec.5.1).Models with several successive nonlinear layers of neurons date back at least to the1960s(Sec.5.3)and1970s(Sec.5.5). An efficient gradient descent method for teacher-based Supervised Learning(SL)in discrete,differentiable networks of arbitrary depth called backpropagation(BP)was developed in the1960s and1970s,and ap-plied to NNs in1981(Sec.5.5).BP-based training of deep NNs with many layers,however,had been found to be difficult in practice by the late1980s(Sec.5.6),and had become an explicit research subject by the early1990s(Sec.5.9).DL became practically feasible to some extent through the help of Unsupervised Learning(UL)(e.g.,Sec.5.10,5.15).The1990s and2000s also saw many improvements of purely super-vised DL(Sec.5).In the new millennium,deep NNs havefinally attracted wide-spread attention,mainly by outperforming alternative machine learning methods such as kernel machines(Vapnik,1995;Sch¨o lkopf et al.,1998)in numerous important applications.In fact,supervised deep NNs have won numerous of-ficial international pattern recognition competitions(e.g.,Sec.5.17,5.19,5.21,5.22),achieving thefirst superhuman visual pattern recognition results in limited domains(Sec.5.19).Deep NNs also have become relevant for the more generalfield of Reinforcement Learning(RL)where there is no supervising teacher (Sec.6).Both feedforward(acyclic)NNs(FNNs)and recurrent(cyclic)NNs(RNNs)have won contests(Sec.5.12,5.14,5.17,5.19,5.21,5.22).In a sense,RNNs are the deepest of all NNs(Sec.3)—they are general computers more powerful than FNNs,and can in principle create and process memories of ar-bitrary sequences of input patterns(e.g.,Siegelmann and Sontag,1991;Schmidhuber,1990a).Unlike traditional methods for automatic sequential program synthesis(e.g.,Waldinger and Lee,1969;Balzer, 1985;Soloway,1986;Deville and Lau,1994),RNNs can learn programs that mix sequential and parallel information processing in a natural and efficient way,exploiting the massive parallelism viewed as crucial for sustaining the rapid decline of computation cost observed over the past75years.The rest of this paper is structured as follows.Sec.2introduces a compact,event-oriented notation that is simple yet general enough to accommodate both FNNs and RNNs.Sec.3introduces the concept of Credit Assignment Paths(CAPs)to measure whether learning in a given NN application is of the deep or shallow type.Sec.4lists recurring themes of DL in SL,UL,and RL.Sec.5focuses on SL and UL,and on how UL can facilitate SL,although pure SL has become dominant in recent competitions(Sec.5.17-5.22). Sec.5is arranged in a historical timeline format with subsections on important inspirations and technical contributions.Sec.6on deep RL discusses traditional Dynamic Programming(DP)-based RL combined with gradient-based search techniques for SL or UL in deep NNs,as well as general methods for direct and indirect search in the weight space of deep FNNs and RNNs,including successful policy gradient and evolutionary methods.2Event-Oriented Notation for Activation Spreading in FNNs/RNNs Throughout this paper,let i,j,k,t,p,q,r denote positive integer variables assuming ranges implicit in the given contexts.Let n,m,T denote positive integer constants.An NN’s topology may change over time(e.g.,Fahlman,1991;Ring,1991;Weng et al.,1992;Fritzke, 1994).At any given moment,it can be described as afinite subset of units(or nodes or neurons)N= {u1,u2,...,}and afinite set H⊆N×N of directed edges or connections between nodes.FNNs are acyclic graphs,RNNs cyclic.Thefirst(input)layer is the set of input units,a subset of N.In FNNs,the k-th layer(k>1)is the set of all nodes u∈N such that there is an edge path of length k−1(but no longer path)between some input unit and u.There may be shortcut connections between distant layers.The NN’s behavior or program is determined by a set of real-valued,possibly modifiable,parameters or weights w i(i=1,...,n).We now focus on a singlefinite episode or epoch of information processing and activation spreading,without learning through weight changes.The following slightly unconventional notation is designed to compactly describe what is happening during the runtime of the system.During an episode,there is a partially causal sequence x t(t=1,...,T)of real values that I call events.Each x t is either an input set by the environment,or the activation of a unit that may directly depend on other x k(k<t)through a current NN topology-dependent set in t of indices k representing incoming causal connections or links.Let the function v encode topology information and map such event index pairs(k,t)to weight indices.For example,in the non-input case we may have x t=f t(net t)with real-valued net t= k∈in t x k w v(k,t)(additive case)or net t= k∈in t x k w v(k,t)(multiplicative case), where f t is a typically nonlinear real-valued activation function such as tanh.In many recent competition-winning NNs(Sec.5.19,5.21,5.22)there also are events of the type x t=max k∈int (x k);some networktypes may also use complex polynomial activation functions(Sec.5.3).x t may directly affect certain x k(k>t)through outgoing connections or links represented through a current set out t of indices k with t∈in k.Some non-input events are called output events.Note that many of the x t may refer to different,time-varying activations of the same unit in sequence-processing RNNs(e.g.,Williams,1989,“unfolding in time”),or also in FNNs sequentially exposed to time-varying input patterns of a large training set encoded as input events.During an episode,the same weight may get reused over and over again in topology-dependent ways,e.g.,in RNNs,or in convolutional NNs(Sec.5.4,5.8).I call this weight sharing across space and/or time.Weight sharing may greatly reduce the NN’s descriptive complexity,which is the number of bits of information required to describe the NN (Sec.4.3).In Supervised Learning(SL),certain NN output events x t may be associated with teacher-given,real-valued labels or targets d t yielding errors e t,e.g.,e t=1/2(x t−d t)2.A typical goal of supervised NN training is tofind weights that yield episodes with small total error E,the sum of all such e t.The hope is that the NN will generalize well in later episodes,causing only small errors on previously unseen sequences of input events.Many alternative error functions for SL and UL are possible.SL assumes that input events are independent of earlier output events(which may affect the environ-ment through actions causing subsequent perceptions).This assumption does not hold in the broaderfields of Sequential Decision Making and Reinforcement Learning(RL)(Kaelbling et al.,1996;Sutton and Barto, 1998;Hutter,2005)(Sec.6).In RL,some of the input events may encode real-valued reward signals given by the environment,and a typical goal is tofind weights that yield episodes with a high sum of reward signals,through sequences of appropriate output actions.Sec.5.5will use the notation above to compactly describe a central algorithm of DL,namely,back-propagation(BP)for supervised weight-sharing FNNs and RNNs.(FNNs may be viewed as RNNs with certainfixed zero weights.)Sec.6will address the more general RL case.3Depth of Credit Assignment Paths(CAPs)and of ProblemsTo measure whether credit assignment in a given NN application is of the deep or shallow type,I introduce the concept of Credit Assignment Paths or CAPs,which are chains of possibly causal links between events.Let usfirst focus on SL.Consider two events x p and x q(1≤p<q≤T).Depending on the appli-cation,they may have a Potential Direct Causal Connection(PDCC)expressed by the Boolean predicate pdcc(p,q),which is true if and only if p∈in q.Then the2-element list(p,q)is defined to be a CAP from p to q(a minimal one).A learning algorithm may be allowed to change w v(p,q)to improve performance in future episodes.More general,possibly indirect,Potential Causal Connections(PCC)are expressed by the recursively defined Boolean predicate pcc(p,q),which in the SL case is true only if pdcc(p,q),or if pcc(p,k)for some k and pdcc(k,q).In the latter case,appending q to any CAP from p to k yields a CAP from p to q(this is a recursive definition,too).The set of such CAPs may be large but isfinite.Note that the same weight may affect many different PDCCs between successive events listed by a given CAP,e.g.,in the case of RNNs, or weight-sharing FNNs.Suppose a CAP has the form(...,k,t,...,q),where k and t(possibly t=q)are thefirst successive elements with modifiable w v(k,t).Then the length of the suffix list(t,...,q)is called the CAP’s depth (which is0if there are no modifiable links at all).This depth limits how far backwards credit assignment can move down the causal chain tofind a modifiable weight.1Suppose an episode and its event sequence x1,...,x T satisfy a computable criterion used to decide whether a given problem has been solved(e.g.,total error E below some threshold).Then the set of used weights is called a solution to the problem,and the depth of the deepest CAP within the sequence is called the solution’s depth.There may be other solutions(yielding different event sequences)with different depths.Given somefixed NN topology,the smallest depth of any solution is called the problem’s depth.Sometimes we also speak of the depth of an architecture:SL FNNs withfixed topology imply a problem-independent maximal problem depth bounded by the number of non-input layers.Certain SL RNNs withfixed weights for all connections except those to output units(Jaeger,2001;Maass et al.,2002; Jaeger,2004;Schrauwen et al.,2007)have a maximal problem depth of1,because only thefinal links in the corresponding CAPs are modifiable.In general,however,RNNs may learn to solve problems of potentially unlimited depth.Note that the definitions above are solely based on the depths of causal chains,and agnostic of the temporal distance between events.For example,shallow FNNs perceiving large“time windows”of in-put events may correctly classify long input sequences through appropriate output events,and thus solve shallow problems involving long time lags between relevant events.At which problem depth does Shallow Learning end,and Deep Learning begin?Discussions with DL experts have not yet yielded a conclusive response to this question.Instead of committing myself to a precise answer,let me just define for the purposes of this overview:problems of depth>10require Very Deep Learning.The difficulty of a problem may have little to do with its depth.Some NNs can quickly learn to solve certain deep problems,e.g.,through random weight guessing(Sec.5.9)or other types of direct search (Sec.6.6)or indirect search(Sec.6.7)in weight space,or through training an NNfirst on shallow problems whose solutions may then generalize to deep problems,or through collapsing sequences of(non)linear operations into a single(non)linear operation—but see an analysis of non-trivial aspects of deep linear networks(Baldi and Hornik,1994,Section B).In general,however,finding an NN that precisely models a given training set is an NP-complete problem(Judd,1990;Blum and Rivest,1992),also in the case of deep NNs(S´ıma,1994;de Souto et al.,1999;Windisch,2005);compare a survey of negative results(S´ıma, 2002,Section1).Above we have focused on SL.In the more general case of RL in unknown environments,pcc(p,q) is also true if x p is an output event and x q any later input event—any action may affect the environment and thus any later perception.(In the real world,the environment may even influence non-input events computed on a physical hardware entangled with the entire universe,but this is ignored here.)It is possible to model and replace such unmodifiable environmental PCCs through a part of the NN that has already learned to predict(through some of its units)input events(including reward signals)from former input events and actions(Sec.6.1).Its weights are frozen,but can help to assign credit to other,still modifiable weights used to compute actions(Sec.6.1).This approach may lead to very deep CAPs though.Some DL research is about automatically rephrasing problems such that their depth is reduced(Sec.4). In particular,sometimes UL is used to make SL problems less deep,e.g.,Sec.5.10.Often Dynamic Programming(Sec.4.1)is used to facilitate certain traditional RL problems,e.g.,Sec.6.2.Sec.5focuses on CAPs for SL,Sec.6on the more complex case of RL.4Recurring Themes of Deep Learning4.1Dynamic Programming(DP)for DLOne recurring theme of DL is Dynamic Programming(DP)(Bellman,1957),which can help to facili-tate credit assignment under certain assumptions.For example,in SL NNs,backpropagation itself can 1An alternative would be to count only modifiable links when measuring depth.In many typical NN applications this would not make a difference,but in some it would,e.g.,Sec.6.1.be viewed as a DP-derived method(Sec.5.5).In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth(Sec.6.2).DP algorithms are also essen-tial for systems that combine concepts of NNs and graphical models,such as Hidden Markov Models (HMMs)(Stratonovich,1960;Baum and Petrie,1966)and Expectation Maximization(EM)(Dempster et al.,1977),e.g.,(Bottou,1991;Bengio,1991;Bourlard and Morgan,1994;Baldi and Chauvin,1996; Jordan and Sejnowski,2001;Bishop,2006;Poon and Domingos,2011;Dahl et al.,2012;Hinton et al., 2012a).4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL Another recurring theme is how UL can facilitate both SL(Sec.5)and RL(Sec.6).UL(Sec.5.6.4) is normally used to encode raw incoming data such as video or speech streams in a form that is more convenient for subsequent goal-directed learning.In particular,codes that describe the original data in a less redundant or more compact way can be fed into SL(Sec.5.10,5.15)or RL machines(Sec.6.4),whose search spaces may thus become smaller(and whose CAPs shallower)than those necessary for dealing with the raw data.UL is closely connected to the topics of regularization and compression(Sec.4.3,5.6.3). 4.3Occam’s Razor:Compression and Minimum Description Length(MDL) Occam’s razor favors simple solutions over complex ones.Given some programming language,the prin-ciple of Minimum Description Length(MDL)can be used to measure the complexity of a solution candi-date by the length of the shortest program that computes it(e.g.,Solomonoff,1964;Kolmogorov,1965b; Chaitin,1966;Wallace and Boulton,1968;Levin,1973a;Rissanen,1986;Blumer et al.,1987;Li and Vit´a nyi,1997;Gr¨u nwald et al.,2005).Some methods explicitly take into account program runtime(Al-lender,1992;Watanabe,1992;Schmidhuber,2002,1995);many consider only programs with constant runtime,written in non-universal programming languages(e.g.,Rissanen,1986;Hinton and van Camp, 1993).In the NN case,the MDL principle suggests that low NN weight complexity corresponds to high NN probability in the Bayesian view(e.g.,MacKay,1992;Buntine and Weigend,1991;De Freitas,2003), and to high generalization performance(e.g.,Baum and Haussler,1989),without overfitting the training data.Many methods have been proposed for regularizing NNs,that is,searching for solution-computing, low-complexity SL NNs(Sec.5.6.3)and RL NNs(Sec.6.7).This is closely related to certain UL methods (Sec.4.2,5.6.4).4.4Learning Hierarchical Representations Through Deep SL,UL,RLMany methods of Good Old-Fashioned Artificial Intelligence(GOFAI)(Nilsson,1980)as well as more recent approaches to AI(Russell et al.,1995)and Machine Learning(Mitchell,1997)learn hierarchies of more and more abstract data representations.For example,certain methods of syntactic pattern recog-nition(Fu,1977)such as grammar induction discover hierarchies of formal rules to model observations. The partially(un)supervised Automated Mathematician/EURISKO(Lenat,1983;Lenat and Brown,1984) continually learns concepts by combining previously learnt concepts.Such hierarchical representation learning(Ring,1994;Bengio et al.,2013;Deng and Yu,2014)is also a recurring theme of DL NNs for SL (Sec.5),UL-aided SL(Sec.5.7,5.10,5.15),and hierarchical RL(Sec.6.5).Often,abstract hierarchical representations are natural by-products of data compression(Sec.4.3),e.g.,Sec.5.10.4.5Fast Graphics Processing Units(GPUs)for DL in NNsWhile the previous millennium saw several attempts at creating fast NN-specific hardware(e.g.,Jackel et al.,1990;Faggin,1992;Ramacher et al.,1993;Widrow et al.,1994;Heemskerk,1995;Korkin et al., 1997;Urlbe,1999),and at exploiting standard hardware(e.g.,Anguita et al.,1994;Muller et al.,1995; Anguita and Gomes,1996),the new millennium brought a DL breakthrough in form of cheap,multi-processor graphics cards or GPUs.GPUs are widely used for video games,a huge and competitive market that has driven down hardware prices.GPUs excel at fast matrix and vector multiplications required not only for convincing virtual realities but also for NN training,where they can speed up learning by a factorof50and more.Some of the GPU-based FNN implementations(Sec.5.16-5.19)have greatly contributed to recent successes in contests for pattern recognition(Sec.5.19-5.22),image segmentation(Sec.5.21), and object detection(Sec.5.21-5.22).5Supervised NNs,Some Helped by Unsupervised NNsThe main focus of current practical applications is on Supervised Learning(SL),which has dominated re-cent pattern recognition contests(Sec.5.17-5.22).Several methods,however,use additional Unsupervised Learning(UL)to facilitate SL(Sec.5.7,5.10,5.15).It does make sense to treat SL and UL in the same section:often gradient-based methods,such as BP(Sec.5.5.1),are used to optimize objective functions of both UL and SL,and the boundary between SL and UL may blur,for example,when it comes to time series prediction and sequence classification,e.g.,Sec.5.10,5.12.A historical timeline format will help to arrange subsections on important inspirations and techni-cal contributions(although such a subsection may span a time interval of many years).Sec.5.1briefly mentions early,shallow NN models since the1940s,Sec.5.2additional early neurobiological inspiration relevant for modern Deep Learning(DL).Sec.5.3is about GMDH networks(since1965),perhaps thefirst (feedforward)DL systems.Sec.5.4is about the relatively deep Neocognitron NN(1979)which is similar to certain modern deep FNN architectures,as it combines convolutional NNs(CNNs),weight pattern repli-cation,and winner-take-all(WTA)mechanisms.Sec.5.5uses the notation of Sec.2to compactly describe a central algorithm of DL,namely,backpropagation(BP)for supervised weight-sharing FNNs and RNNs. It also summarizes the history of BP1960-1981and beyond.Sec.5.6describes problems encountered in the late1980s with BP for deep NNs,and mentions several ideas from the previous millennium to overcome them.Sec.5.7discusses afirst hierarchical stack of coupled UL-based Autoencoders(AEs)—this concept resurfaced in the new millennium(Sec.5.15).Sec.5.8is about applying BP to CNNs,which is important for today’s DL applications.Sec.5.9explains BP’s Fundamental DL Problem(of vanishing/exploding gradients)discovered in1991.Sec.5.10explains how a deep RNN stack of1991(the History Compressor) pre-trained by UL helped to solve previously unlearnable DL benchmarks requiring Credit Assignment Paths(CAPs,Sec.3)of depth1000and more.Sec.5.11discusses a particular WTA method called Max-Pooling(MP)important in today’s DL FNNs.Sec.5.12mentions afirst important contest won by SL NNs in1994.Sec.5.13describes a purely supervised DL RNN(Long Short-Term Memory,LSTM)for problems of depth1000and more.Sec.5.14mentions an early contest of2003won by an ensemble of shallow NNs, as well as good pattern recognition results with CNNs and LSTM RNNs(2003).Sec.5.15is mostly about Deep Belief Networks(DBNs,2006)and related stacks of Autoencoders(AEs,Sec.5.7)pre-trained by UL to facilitate BP-based SL.Sec.5.16mentions thefirst BP-trained MPCNNs(2007)and GPU-CNNs(2006). Sec.5.17-5.22focus on official competitions with secret test sets won by(mostly purely supervised)DL NNs since2009,in sequence recognition,image classification,image segmentation,and object detection. Many RNN results depended on LSTM(Sec.5.13);many FNN results depended on GPU-based FNN code developed since2004(Sec.5.16,5.17,5.18,5.19),in particular,GPU-MPCNNs(Sec.5.19).5.11940s and EarlierNN research started in the1940s(e.g.,McCulloch and Pitts,1943;Hebb,1949);compare also later work on learning NNs(Rosenblatt,1958,1962;Widrow and Hoff,1962;Grossberg,1969;Kohonen,1972; von der Malsburg,1973;Narendra and Thathatchar,1974;Willshaw and von der Malsburg,1976;Palm, 1980;Hopfield,1982).In a sense NNs have been around even longer,since early supervised NNs were essentially variants of linear regression methods going back at least to the early1800s(e.g.,Legendre, 1805;Gauss,1809,1821).Early NNs had a maximal CAP depth of1(Sec.3).5.2Around1960:More Neurobiological Inspiration for DLSimple cells and complex cells were found in the cat’s visual cortex(e.g.,Hubel and Wiesel,1962;Wiesel and Hubel,1959).These cellsfire in response to certain properties of visual sensory inputs,such as theorientation of plex cells exhibit more spatial invariance than simple cells.This inspired later deep NN architectures(Sec.5.4)used in certain modern award-winning Deep Learners(Sec.5.19-5.22).5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) Networks trained by the Group Method of Data Handling(GMDH)(Ivakhnenko and Lapa,1965; Ivakhnenko et al.,1967;Ivakhnenko,1968,1971)were perhaps thefirst DL systems of the Feedforward Multilayer Perceptron type.The units of GMDH nets may have polynomial activation functions imple-menting Kolmogorov-Gabor polynomials(more general than traditional NN activation functions).Given a training set,layers are incrementally grown and trained by regression analysis,then pruned with the help of a separate validation set(using today’s terminology),where Decision Regularisation is used to weed out superfluous units.The numbers of layers and units per layer can be learned in problem-dependent fashion. This is a good example of hierarchical representation learning(Sec.4.4).There have been numerous ap-plications of GMDH-style networks,e.g.(Ikeda et al.,1976;Farlow,1984;Madala and Ivakhnenko,1994; Ivakhnenko,1995;Kondo,1998;Kord´ık et al.,2003;Witczak et al.,2006;Kondo and Ueno,2008).5.41979:Convolution+Weight Replication+Winner-Take-All(WTA)Apart from deep GMDH networks(Sec.5.3),the Neocognitron(Fukushima,1979,1980,2013a)was per-haps thefirst artificial NN that deserved the attribute deep,and thefirst to incorporate the neurophysiolog-ical insights of Sec.5.2.It introduced convolutional NNs(today often called CNNs or convnets),where the(typically rectangular)receptivefield of a convolutional unit with given weight vector is shifted step by step across a2-dimensional array of input values,such as the pixels of an image.The resulting2D array of subsequent activation events of this unit can then provide inputs to higher-level units,and so on.Due to massive weight replication(Sec.2),relatively few parameters may be necessary to describe the behavior of such a convolutional layer.Competition layers have WTA subsets whose maximally active units are the only ones to adopt non-zero activation values.They essentially“down-sample”the competition layer’s input.This helps to create units whose responses are insensitive to small image shifts(compare Sec.5.2).The Neocognitron is very similar to the architecture of modern,contest-winning,purely super-vised,feedforward,gradient-based Deep Learners with alternating convolutional and competition lay-ers(e.g.,Sec.5.19-5.22).Fukushima,however,did not set the weights by supervised backpropagation (Sec.5.5,5.8),but by local un supervised learning rules(e.g.,Fukushima,2013b),or by pre-wiring.In that sense he did not care for the DL problem(Sec.5.9),although his architecture was comparatively deep indeed.He also used Spatial Averaging(Fukushima,1980,2011)instead of Max-Pooling(MP,Sec.5.11), currently a particularly convenient and popular WTA mechanism.Today’s CNN-based DL machines profita lot from later CNN work(e.g.,LeCun et al.,1989;Ranzato et al.,2007)(Sec.5.8,5.16,5.19).5.51960-1981and Beyond:Development of Backpropagation(BP)for NNsThe minimisation of errors through gradient descent(Hadamard,1908)in the parameter space of com-plex,nonlinear,differentiable,multi-stage,NN-related systems has been discussed at least since the early 1960s(e.g.,Kelley,1960;Bryson,1961;Bryson and Denham,1961;Pontryagin et al.,1961;Dreyfus,1962; Wilkinson,1965;Amari,1967;Bryson and Ho,1969;Director and Rohrer,1969;Griewank,2012),ini-tially within the framework of Euler-LaGrange equations in the Calculus of Variations(e.g.,Euler,1744). Steepest descent in such systems can be performed(Bryson,1961;Kelley,1960;Bryson and Ho,1969)by iterating the ancient chain rule(Leibniz,1676;L’Hˆo pital,1696)in Dynamic Programming(DP)style(Bell-man,1957).A simplified derivation of the method uses the chain rule only(Dreyfus,1962).The methods of the1960s were already efficient in the DP sense.However,they backpropagated derivative information through standard Jacobian matrix calculations from one“layer”to the previous one, explicitly addressing neither direct links across several layers nor potential additional efficiency gains due to network sparsity(but perhaps such enhancements seemed obvious to the authors).。
cognitive semantics

I. IntroductionII. Prototype Category Theory (原型范畴理论)III. Image Schema Theory (意象图式理论)IV. Conceptual Metaphor Theory (概念隐喻理论)V. Conceptual Metonymy Theory (概念转喻理论)VI. Figure-Ground Theory(图形-背景理论)VII. Conceptual Blending Theory (概念合成理论)(Conceptual Integration Theory) (概念整合理论)(Mental Space Theory) (心理空间理论)Frame Theory (框架理论)Script Theory (脚本理论)Construction Grammar (构式语法)Iconicity TheoryRelevance TheoryCognitive Linguistics is the most rapidly expanding school in modern Linguistics. It aims to create a scientific approach to the study of language, incorporating the tools of philosophy, neuroscience and computer science.Cognition is the way we think.Cognitive linguistics is the scientific study of the relation between the way we communicate and the way we think.It is an approach to language that is based on our experience of the world and the way we perceive and conceptualize it.CL is contrastive to a ‘logical’view of language.Our car has broken downTraditionalExplanation:A grammatical structureCognitive Explanation:Represented by three mainapproachesExperiential view(经验观)Prominence view(凸显观)Attentional view(注意观)(1) Experiential View•Its main claim is that instead of postulating logical rules and objective definitions on the basis of theoretical considerations and introspection, a more practical and empirical path should be pursued.•For example, ask language users to describe what is going on in their minds when they produce and understand words and sentences. (e.g. a car•Car: a box-like shape, wheels, doors, windowscomfort, speed, mobility, independence,social status•The experiential view of words provides a much richer and more natural description of their meanings.Cognitive linguists believe that our shared experience of the world is also stored in our everyday language and can thus be collected from the way we express our ideas. The transfer of our experience of well-known objects and events is even more important where abstract categories like emotions is involved.e.g. Dad exploded.In order to get a full grasp of this utterance and the notion of anger expressed, we will call up our knowledge of actual explosions of gas stoves, fireworks and even bombs.This means that we will make use of our experience of the concrete world around us.•The philosophical Foundation of cognitive linguistics is Embodied Philosophy(体验哲学) or Experientialism/Experiential Realism)—一种“身心合一”或“心寓于身”的认知观。
格莱斯会话含义与合作原则综述

校园英语 / 语言文化研究格莱斯会话含义与合作原则综述北京林业大学/王彦尊【摘要】合作原则于1967年由英国语言哲学家格莱斯提出。
他认为人们的会话之所以流畅且不会产生误会是因为在会话中人们默契地遵守着一些原则。
它包含以下四个准则,即量的准则、质的准则、关系准则和方式准则。
合作原则的四个准则以不同的形式在交际活动中起着重要作用。
出于各种各样的交际或会话目的,人们有时会有意或者无意违反其中一条或多条原则,这时就产生了会话含义。
近年来,关于会话含义和合作原则的违反被应用于不同的领域,产生了各种各样的效果。
本文将对合作原则的运用和违反进行总结和反思,为以后的研究打下基础。
【关键词】合作原则 会话含义 运用一、前言人们顺利的交际来源于对合作原则的遵守,具体表现在对四个准则的遵守。
量的原则指既不要说多余的话也不要提供不足的信息,质的准则指不要说谎话或者与事实不符的话,关系原则是指说的内容不能脱离对话的主体和内容,方式准则指说话要有条理,要避免含糊不清或者歧义。
交际时遵守以上准则会产生正常的会话含义,从而保证会话的顺利进行,如果说话者有意违反以上某条或多条准则,就会产生新的会话含义,会话会产生不一样的效果。
二、关于合作原则和会话含义的综述1.国内对合作原则的研究。
一是对“格莱斯”会话含义的反思,主要是指出合作原则的不足和局限性。
例如合作原则的误用。
梁燕华(2006)在《Grice 的合作原则:偏离与误区》一文中指出,人们对合作的实质存在误解,通常情况下,人们会认为只有积极的合作才算合作,然而,消极的合作例如吵架,本身也是一种合作。
二是合作原则的运用领域,合作原则在国内的研究通常与礼貌原则一起出现,主要用于分析口语会话。
他们指出,实际会话过程中,合作原则通常次于礼貌原则之后出现,即人们会优先考虑所说的话是否礼貌,其次在遵循合作原则完成对话。
三是违反合作原则产生的幽默效果的研究。
通过对情景剧文本进行语用分析,得出是怎样通过违反合作原则及其准则产生幽默效果。
Automated Formal Verification of Model Tranformations

Automated Formal Verification ofModel TranformationsD´a niel Varr´o and Andr´a s PatariczaBudapest University of Technology and EconomicsDepartment of Measurement and Information SystemsH-1521Budapest,Magyar tud´o sok k¨o r´u tja2.varro,pataric@mit.bme.huAbstract.As the Model Driven Architecture(MDA)relies on complex andhighly automated model transformations between arbitrary modeling languages,the quality of such transformations is of immense importance as it can easily be-come a bottleneck of a model-driven design process.Automation surely increasesthe quality of such transformations as errors manually implanted into transforma-tion programs during implementation are eliminated;however,conceptualflawsin transformation design still remain undetected.In this paper,we present a meta-level and highly automated technique to formally verify by model checking that amodel transformation from an arbitrary well-formed model instance of the sourcemodeling language into its target equivalent preserves(language specific)dy-namic consistency properties.We demonstrate the feasibility of our approach ona complex mathematical model transformation from UML statecharts to Petrinets.Keywords:model transformation,graph transformation,model checking,formalverification,MDA,UML statecharts,Petri nets.1IntroductionNowadays,the Model Driven Architecture(MDA)of the Object Management Group (OMG)has become the dominating trend in software engineering.The core technology of MDA is the Unified Modeling Language(UML),which provides a standard way to buildfirst a platform independent model(PIM)of the target system under design, which may be refined afterwards into several platform specific models(PSMs).Finally, the target application code should be generated automatically by off-the-shelf UML CASE tools directly from PSMs.While MDA puts the stress on a precise object-oriented modeling language(i.e., UML)as the core technology,it fails to sufficiently emphasize the importance of precise and highly automated model transformations for designing and implementing mappings from PIMs to PSMs,or PSMs to application code.The methodology(if there is any) behind existing code generators integrated into off-the-shelf UML CASE tools relieson textual programming language translations,which does not scale up for the needs of a UML based visual modeling environment.Moreover,PIM-to-PSM mappings are frequently hard wired into the UML tool thus it is almost impossible to be tailored to special requirements of applications.In case of dependable and safety critical applications,further model transforma-tions are necessitated to map UML models into various mathematical domains(like Petri nets,dataflow networks,transition systems,process algebras,etc.)to(i)define formal semantics for UML in a denotational way[2,7,8,16],and/or(ii)carry out for-mal analysis of UML designs[4,13].In the current paper,we investigate the model transformation problem from a gen-eral perspective,i.e.,to specify how to transform a well-formed instance of a source modeling language(which is typically UML in the context of MDA)into its equivalent in the target modeling language(which can be UML,a target programming language, or a mathematical modeling language).Related work in model transformations Model transformation methodologies have been under extensive research recently.Existing model transformation approaches can be grouped into two main categories:–Relational approaches:these approaches typically declare a relationship between objects(and links)of the source and target language.Such a specification typically based upon a metamodel with OCL constraints[1,11,17].–Operational approaches these techniques describe the process of a model transfor-mation from the source to the target language.Such a specification mainly com-bines metamodeling with(c)graph transformation[5,8,9,14,27],(d)triple graph grammars[22](e)term rewriting rules[28],or(f)XSL transformations[6,19].Many of the previous approaches already tackle the problem of automating model transformations in order to provide a higher quality of transformation programs com-pared with manually written ad hoc transformation scripts.Problem statement However,automation alone cannot protect against conceptualflaws implanted into the specification of a complicated model transformation.Consequently, a mathematical analysis carried out on the UML design after an automatic model trans-formation might yield false results,and these errors will directly appear in the target application code.As a summary,it is crucial to realize that model transformations themselves can also be erroneous and thus becoming a quality bottleneck of MDA.Therefore,prior to analyzing the UML model of a target application,we have to prove that the model transformation itself is free of conceptual errors.Correctness criteria of model transformations Unfortunately,due to their wide range of applications in the MDA environment,it is hard to establish a single notion of correct-ness for model transformations.The most elementary requirements of a model transfor-mation are syntactic.–The minimal requirement is to assure syntactic correctness,i.e.,to guarantee that the generated model is a syntactically well–formed instance of the target language.2–An additional requirement(called syntactic completeness)is to completely cover the source language by transformation rules,i.e.,to prove that there exists a corre-sponding element in the target model for each construct in the source language.However,in order to assure a higher quality of model transformations,at least the following semantic requirements should also be addressed.–Termination:Thefirst thing we must also guarantee is that a model transformation will terminate.This is a very general,and modeling language independent semantic criterion for model transformations.–Uniqueness(Confluence,functionality):As non-determinism is frequently used in the specification of model transformations(as in the case of graph transformation based approaches)we must also guarantee that the transformation yields a unique result.Again,this is a language independent criterion.–Semantic correctness(Dynamic consistency):In theory,a straightforward cor-rectness criterion would require to prove the semantic equivalence of source and tar-get models.However,as model transformations may also define a projection from the source language to the target language(with deliberate loss of information), semantic equivalence between models cannot always be proved.Instead we define correctness properties(which are typically transformation specific)that should be preserved by the transformation.Unfortunately,related work addressing these correctness criteria of model transfor-mations is very limited.Syntactic correctness and completeness was attacked in[27] by planner algorithms,and in[10]by graph transformation.Recently in[15],sufficient conditions were set up that guarantee the termination and uniqueness of transformations based upon the static analysis technique of critical pair analysis[12].However,no approaches exist to reason about the semantic correctness of model transformations.To be precise,the CSP based approach of[9]that aims to ensure dy-namic consistency of UML models has the potential to be extended to reason about properties of transformations.However,defining manually the semantics of an arbitrary modeling language by mapping it into CSP is much more difficult and less intuitive than defining the operational semantics of the language by graph transformation.Our contribution In this paper,we present a meta-level and highly automated frame-work(in Sec.2)to formally verify by model checking that a model transformation from an arbitrary well-formed model instance of the source modeling language(specified by metamodeling and graph transformation techniques)into its target equivalent preserves (language specific)dynamic consistency properties.We demonstrate the feasibility of our approach(in Sec.3)on verifying a semantic property of a complex model transfor-mation from UML statecharts to Petri nets.2Automated Formal Verification of Model TransformationsWe present an automated technique to formally verify(based on the model checking approach of[24])the correctness of the model transformation of a specific source model into its target equivalent with respect to semantic properties.32.1Conceptual overviewA conceptual overview of our approach is given in Fig.1for a model transformation from anfictitious modeling language A(which will be UML statecharts for our demon-strating example later on)to B(Petri nets as in our case).Modeling language A Modeling language BFig.1.Model level formal verification of transformations1.Specification of modeling languages.As a prerequisite for the framework,eachmodeling language(both A and B)should be defined precisely using metamodeling and graph transformation techniques(see,for instance,[26]for further details). 2.Specification of model transformations.Moreover,the A2B model transforma-tion should be specified by a set of(non-conflicting)graph transformation rules.3.Automated model generation.For any specific(but arbitrary)well-formed modelinstance of the source language A,we derive the corresponding target model by automatically generated transformation programs(e.g.,generated by VIATRA[5] as tool support).4.Generating transition systems.As the underlying semantic domain,a behav-iorally equivalent transition system is generated automatically for both the source and the target model on the basis of the transformation algorithm presented in[24] (and with a tool support reported in[21]).5.Select a semantic correctness property.We select(one or more)semantic prop-erty p in the source language A which is structurally expressible as a graphical pattern composed of the elements of the source metamodel(and potentially,some temporal logic operators).Note that the formalization of these criteria for a specific model transformation is not at all straightforward.In many cases,we can reduce the question to a reach-ability problem or a safety property,but even in this casefinding the appropriate4temporal logic formulae is non-trivial.More details on using graphical patterns to capture static well-formedness properties can be found,e.g.,in[10].6.Model check the source model.Transition system A is model-checked automati-cally(by existing model checker tools like SAL[3]or SPIN)to prove property p.This model checking process should succeed,otherwise(i)there are inconsisten-cies in the source model itself(a verification problem occurred),(ii)our informal requirements are not captured properly by property p(a validation problem oc-curred),or(iii)the formal semantics of the source language is inappropriate as a counter example is found which should hold according to our informal expectations (another validation problem).7.Transform and validate the property.We transform the property p into a propertyq in the target language(manually,or using the same transformation program).As a potentially erroneous model transformation might transform incorrectly the property p in to property q,domain experts should validate that property q is really the target equivalent of property p or a strengthened variant.8.Model check the target model.Finally,transition system B is model-checkedagainst property q.–If the verification succeeds,then we conclude that the model transformation is correct with respect to the pair(p,q)of properties for the specific pairs of source and target models having semantics defined by a set of graph transformation rules.–Otherwise,property p is not preserved by the model transformation and de-bugging can be initiated based upon the error trace(s)retrieved by the model checker.As before,this debugging phase mayfix problems in the model trans-formation or in the specification of the target language.Note that at Step2,we only require to use graph transformation rules to specify model transformations in order to use the automatic program generation facilities of VIATRA.Our verification technique is,in fact,independent of the model transforma-tion approach(only requires to use metamodeling and graph transformation for speci-fying modeling languages),therefore it is simultaneously applicable to relational model transformation approaches as well.Prior to presenting the verification case study of a model transformation,we briefly discuss the pros and contras of metamodel level and model level verification of model transformations.2.2Metamodel vs.model level verification of model transformationsIn theory,it would be advisable to prove that a model transformation preserves certain semantic properties for any well-formed model instance,but this typically requires the use of sophisticated theorem proving techniques and tools with a huge verification cost. The reason for that relies in the fact that proving properties even in a highly automated theorem prover require a high-level of user guidance since the invariants derived directly from metamodels should be typically manually strengthened in order to construct the proof.In this sense,the effort(cost and time)related to the verification of a transforma-tion would exceed the efforts of design and implementation which is acceptable only for very specific(safety-critical)applications.5However,the overall aim of model transformations is to provide a precise and au-tomated framework for transforming concrete applications(i.e.,UML models).There-fore,in practice,it is sufficient to prove the correctness of a model transformation for any specific but arbitrary source model.Thanks to existing model checker tools and the transformation presented in[24],the entire verification process can be highly auto-mated.In fact,the selection of a pair(p,q)of corresponding semantic properties is the only part in our framework that requires user interaction and expertise.Even if the a verification of a specific model transformation is practically infea-sible due to state space explosion caused by the complexity of the target application, model checkers can act as highly automated debugging aids for model transformations supposing that relatively simply source benchmark models are available as test sets.As a conclusion,from an industrial perspective,a highly automated debugging aid for model transformations(as provided by our model checking based approach)is(at least)as valuable as a user guided excessive formal verification of a transformation.3Case Study:From UML Statecharts to Petri NetsWe present(an extract of)a complex model transformation case study from UML stat-echarts to Petri nets(denoted as SC2PN)in order to demonstrate the feasibility of our verification technique for model transformations.The SC2PN transformation was origi-nally design and implemented as part of an industrial project where UML statecharts are projected into Petri nets in order to carry out various kinds of formal analysis(e.g.,func-tional correctness[18],performance analysis[13])on UML designs(i.e.,to formally analyze UML models but not model transformations).Due to severe page limitations, we can only provide an overview of the verification case study,the reader is referred to[25]for a more detailed discussion.3.1Defining modeling languages by model transformation systemsPrior to reasoning about this model transformation,both the source and target model-ing languages(UML statecharts and Petri nets)have to be defined precisely.For that purpose,in[26]we proposed to use a combination of metamodeling and graph trans-formation techniques:the static structure of language is described by a corresponding metamodel clearly separating static and dynamic concepts of the language,while the dynamic operational semantics is specified by graph transformation.Graph transformation(see[20]for theoretical foundations)provides a rule-based manipulation of graphs,which is conceptually similar to the well-known Chomsky grammar rules but using graph patterns instead of textual ones.Formally,a graph transformation rule(see e.g.addT okenR in Fig.3for demonstration)is a triple,where is the left-hand side graph,is the right-hand side graph,while is(an optional)negative application condition(grey areas infigures).Informally,and of a rule defines the precondition while de-fines the postcondition for a rule application.The application of a rule to a model(graph)(e.g.,a UML model of the user) alters the model by replacing the pattern defined by with the pattern of the. This is performed by61.finding a match of the pattern in model;2.checking the negative application conditions which prohibits the presence ofcertain model elements;3.removing a part of the model that can be mapped to the pattern but not thepattern yielding an intermediate model;4.adding new elements to the intermediate model which exist in the butcannot be mapped to the yielding the derived model.In our framework,graph transformation rules serve as elementary operations while the entire operational semantics of a language or a model transformation is defined by a model transformation system[27],where the allowed transformation sequences are constrained by controlflow graph(CFG)applying a transformation rule in a specific rule application mode at each node.A rule can be executed(i)parallelly for all matches as in case forall mode;(ii)on a(non-deterministically selected)single matching as in case of try mode;or(iii)as long as applicable(in loop mode).UML statecharts as the source modeling language As the formalization of UML statecharts(abbreviated as SC)by using this technique and a model checking case study were discussed in[23,24],we only concentrate on the precise handling of the target language(i.e.,Petri nets)in this paper.We only introduce below a simple UML model as running example and assume the reader’s familiarity with UML and metamodels. Example1(Voting).The simple UML design of Fig.2)models a voting process which requires a consensus(i.e.,unique decision)from the participants.Fig.2.UML model of a voter systemIn the system,a specific task is carried out by multiple calculation units CalcUnit, and they send their local decision to the Voter in the form of a yes or no message.The voter may only accept the result of the calculation if all processing units voted for yes. After thefinal decision of the voter,all calculation units are notified by an accept or a decline message.In the concrete system,two calculation units are working on the7desired task(see the object diagram in the upper right corner of Fig.2),therefore the statechart of the voter is rather simplified in contrast to a parameterized case.Petri nets as the target modeling language Petri nets(abbreviated as PN)are widely used means to formally capture the dynamic semantics of concurrent systems due to their easy-to-understand visual notation and the wide range of available tools.A precise metamodeling treatment of Petri nets was discussed in[26].Now we briefly revisit the metamodel and the operational semantics of Petri nets in Fig.3.enableTransR delTokenRFig.3.Operational semantics of Petri nets by graph transformation According to the metamodel(the Petri Net package in the upper left corner of Fig.3),a simple Petri net consists of Place s,Transition s,InArc s,and OutArc s as depicted by the corresponding classes.InArcs are leading from(incoming)places to transitions,and OutArcs are leading from transitions to(outgoing)places as shown by the associations.Additionally,each place contains an arbitrary(non-negative)num-ber of token s).Dynamic concepts,which can be manipulated by rules(i.e.,attributes token,andfire)are printed in red.The operational behavior of Petri net models are captured by the notion offiring a transition which is performed as follows.1.First,fire attributes are set to false for each transition of the net by applying ruledelFireR in forall mode.82.A single enabled transition T(i.e.,when all the places P with an incoming arc A tothe transition contain at least one token token0)is selected to befired(by setting thefire attribute to true)when applying rule enableTransR in try mode.3.Whenfiring a transition,a token is removed(i.e.,the counter token is decremented)from each incoming place by applying delTokenR in forall mode.4.Then a token is added to each outgoing place of thefiring transition(by increment-ing the counter token)in a forall application of rule addTokenR.5.When no transitions are enabled,the net is dead.3.2Defining the SC2PN model transformationModeling statecharts by Petri nets Each SC state is modeled with a respective place in the target PN model.A token in such a place denotes that the corresponding state is active,therefore,a single token is allowed on each level of the state hierarchy(forming token ring,or more formally,a place invariant).In addition,places are generated to model messages stored in event queues of a statemachine.However,the proper handling of event queues is out of the scope of the current paper,the reader is referred to[25].Each SC step(i.e.,a collection of SC transitions that can befired in parallel)is projected into a PN transition.When such a transition isfired,(i)tokens are removed from source places(i.e.,places generated for the source states of the step)and event queue places,and(ii)new tokens are generated for all the target places and receiver message queues.Therefore,input and output arcs of the transition should be generated in correspondence with this rule.Example2.In Fig.4,we present a(n extract)of the Petri net equivalent of the voter’s UML model(see Fig.2).For improving legibility,only a single transition(leading from state may forfor accept,decline)and message queues for validevents(like yes).The initial state ismarked by a token in wait vote.The depicted transition has two in-coming arcs as well,one from its sourcestate mayforautomatically,which would yield the target Petri net model(Fig.4)as the output when supplying(the XMI representation of)the voter’s UML model(Fig.2)as the input.Figure5gives a brief extract of transforming SC states into PN places.According to this pair of rules,each initial state(i.e.,that is active initially)in the source SC model is transformed into a corresponding PN place containing a single token,while each non-initial state(i.e.,that is passive initially)is projected into a PN place without a token.active2placeRpassive2placeR Fig.5.Transforming SC states into PN placesIt is worth noted that a model transformation rule in VIATRA is composed of ele-ments of the source language(like State S in the rule),elements of the target language (like Place P),and reference elements(such as RefState R).Latter ones are also defined by a corresponding metamodel.Moreover,they provide bi-directional transformations for the static parts of the models,thus serving as a primary basis for back-annotating the results of a Petri net based analysis into the original UML design.3.3Verification of the SC2PN model transformationFor the SC2PN case study,Steps1–3in our verification framework have already been completed.Now,a transition system(TS)is generated automatically(according to[24]) for source and target models as an equivalent(model-level)representation of the oper-ational semantics defined by graph transformation rules(on the meta-level). Generating transition systems Transition systems(or state transition systems)are a common mathematical formalism that serves as the input specification of various model checker tools.They have certain commonalities with structured programming languages(like C or Pascal)as the system is evolving from a given initial state by executing non-deterministic if-then-else like transitions(or guarded commands)that manipulate state variables.In all practical cases,we must restrict the state variables to havefinite domains,since model checkers typically traverse the entire state space of the system to decide whether a certain property is satisfied.For the current paper,we use the easy-to-read SAL[3]syntax for the concrete representation of transition systems.Our generation technique(described in[24]also including feasibility studies from a verification point of view)enables model checking for graph transformation systems by automatically translating them into transitions systems.The main challenge in such a translation is two fold:(i)we have to“step down”automatically from the meta-level to10the model-level when generating model-level transition systems from meta-level graph transformation systems,and(ii)a naive encoding of the graph representation of models would easily explode both the state space and the number of transitions in the tran-sition system even for simple models.Therefore our technique applies the following sophisticated optimizations:–Introducing state variables in TS only for dynamic concepts of a language.–Including only dynamic parts of the initial model in the initial state of the TS.–Collecting potential applications of a graph transformation rule by partially apply-ing them on the static parts of the rule and generating a distinct transition(guarded command)for each of them that only contains dynamic parts as conditions in guards and assignments in actions.In order to give an impression on the generated target transition system,we give below an extract from the SAL encoding of our Petri net model(of Fig.4).%Type declarationsplaceID:TYPE=wait_for_vote,may_accept,decline,v_yes,c1_accept,c1_accept;transID:TYPE=t,...;pn1:MODULE=BEGIN%declaring state variablesGLOBAL token:ARRAY placeID OF INTEGERGLOBAL fire:ARRAY transID OF BOOLEANINITIALIZATIONtoken[wait_for_vote]=1;token[decline]=0;token[may_accept]=0;token[v_yes]=0;...fire[t]=FALSE;...TRANSITION%generated for one potential matching of rule enableTransR fire[t]=FALSE ANDNOT(token[wait_for_vote]=0)ANDNOT(token[v_yes]=0)-->fire’[t]=TRUE;[]...END;–The objects and variable domains are transformed into type(domain)declarations (see,e.g.,the corresponding value for place decline in type placeID).–State variable arrays are introduced only for attributes token andfire(the only dynamic concepts of Petri nets).–Initialization is consistent with the initial marking of the Petri net(i.e.,place wait vote contains a token thus the corresponding variable token is initialized to1).–The guarded command generated from the potential application of rule enable-TransR to the PN transition depicted Fig.4only checks the corresponding dynamic concepts(thefire attribute is false and there are tokens in both places wait vote and vFormalizing the correctness property Now,a semantic criterion is defined for the verification process that should be preserved by the SC2PN model transformation.Note that the term “safety criterion”below refers to a class of temporal logic properties pro-hibiting the occurrence of an undesired situation (and not to the safety of the source UML design).Definition 1(Safety criterion for statecharts).For all OR-states (non-concurrent composite states)in a UML statechart,only a single substate is allowed to be active at any time during execution.This informal requirement can be formalized by the following graphical invariant in the domain of UML statecharts (cf.Fig.6together with its equivalent logic formula).Informally speaking,it prohibits the simultaneous activeness of two distinct substates S1and S2of the same OR state C (i.e.,non-concurrent composite state).Unfortunately,it is difficult to estab-Fig.6.A sample graphical safety criterion lish the same criterion on the meta level in the target language of Petri nets since the SC2PN transformation defines an ab-straction in the sense that message queuesof objects are also transformed into PNplaces (in addition to states).However,in order to model check a certain sys-tem,this meta-level correctness criterion can be re-introduced on the model level.Therefore,we first automatically instan-tiate (the static parts of)the criterion on the concrete SC model (as done during the transformation to transitions systems)to obtain the model level criterion of Fig.7.Note that the different (model level)patterns denote conjunctions,therefore,none of the de-picted situations are allowed tooccur.Fig.7.Model level safety criterionEquivalent property in the target language This model level criterion is appropri-ate to be transformed into an equivalent criterion for the Petri net model.As the state12。
陈述性偏好法中受访者不确定性的研究综述

大多数研究认为产生偏好不确定性的主要原因是由于受访者对评估物品或服务不熟悉、缺乏先验经 验[6] [7]。Loomis 和 Ekstrand (1998)表明受访者的不确定性程度与标价之间具有显著的二次效应,在标价 很高或者很低时确定性更高,而中间范围标价时受访者的确定性比较低[8]。总结现有研究中产生受访者 不确定性的原因具体有:
Modern Marketing 现代市场营销, 2015, 5(3), 41-47 Published Online August 2015 in Hans. /journal/mom /10.12677/mom.2015.53006
42
王佳,葛姣菊
5) 受访者可能对估值问题中的假设条件和实现方式不能充分理解,或者对政策工具和执行机构产生 迟疑[9],例如,受访者对公共物品的所有权、供应者和相关支付机制的不信任可能会导致偏好不确定性 [18]。
受访者不确定性的产生原因是陈述偏好研究的热点问题,分析受访者不确定性的来源有助于寻找解 决偏好不确定性有效方法。多数学者认为不确定性主要来自于受访者对公共物品、未来供给、收入约束、 价格及问卷方式的不完备知识和信息。目前还没有统一的理论模型解释受访者不确定性的产生原因,对 于不同调查对象的不确定性原因可能有所不同,不同调查国家和地域由于文化差异也可能出现不同的不 确定性来源,因此需要更多的实证研究对受访者不确定性的产生原因进行更深入地探析。
受访者不确定性是指受访者缺乏对评估物品真实价值的相关信息或信心[5]。目前对 SP 法中受访者 不确定性的研究还存在较大争议,特别是我国对于 SP 法中受访者不确定性的研究仍是一块空白。因此本 文系统地回顾陈述偏好法中受访者不确定性的研究,分别从受访者不确定性的产生原因、测量手段和校 正方法三个方面展开梳理总结,指出现有研究的不足和未来研究方向及趋势。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Formal Models for Cognition -Taxonomy of Spatial Location Description andFrames of ReferenceAndrew U. FrankDepartment of GeoinformationTechnical University ViennaA-1040 Vienna, Austriafrank@geoinfo.tuwien.ac.atAbstract. Language uses location description with respect to spatial referenceframes. For the transformation from a visual perception to the relativeexpression the reference frames must fix three parameters:•origin (e.g., the speaker, an object, another person),•orientation (e.g., the ax ial frame of the speakers, of the addressee, ofanother object),•handedness of the coordinate system (same as a person’s or inverse).These parameters characterize a reference frame. The paper describes themethods used in the English language and proposes exact definitions of ego-centric, intrinsic or retinal relative reference frames, and egocentric or allocentriccardinal relative reference frames. Invariants of descriptions with respect toclasses of reference frames are discussed and some hints for t he pragmaticpreference of one or the other reference frame suggested.The paper demonstrates two alternative computational methods for Levelt’sperspective taking, which deduces another person’s egocentric perspective fromthe speaker’s egocentric (perceptive) perspective. One method is assumingimagistic (analog) representations and the other method works with apropositional (qualitative) representation. Precise hypotheses can be formulatedin the formalized framework to construct human subject tests to differentiatebetween these alternatives.1 IntroductionTerminology and implied assumptions often confuse the discussion of cognitive abilities of human beings. An example for terminological problems is the extensive discussion of deictic versus intrinsic frames of reference to describe spatial locations, which is ongoing for many years without conclusive results. It has been pointed out that this was - at least partially - due to varying definitions of terminology: “The analysis of spatial terms in familiar European languages remains deeply confused, and those in other languages almost entirely unexplored.” (Levinson 1996, p.134) The standard solution for clarification of terminology is formalization: defining a few base concepts - preferably not from the discipline to be investigated, thus less likely to lead to confusions - and deriving other concepts from these by formaldefinition (Tarski 1946). This is often attempted with complex computer programs, simulating human p erformance (Gopal and Woodstock 1994). However, the resulting models are so complex, contain so much detail and are so difficult to read even for their authors that they contribute little to the clarification and formalization of the discussion.The current trend towards functional descriptions in cognitive science (Bierwisch 1996) naturally leads to models formulated in a functional language (Bird and Wadler 1988). Formalizations in a functional language can be read as definitions, but can be executed as programs to observe their effects. This allows to understand the interaction of definitions and to test our intuition. It becomes also possible to compare the defined behavior with observed behavior (Montello and Frank 1996).A detailed description is achieved through the construction of a formal model to produce the sentences to describe the spatial relations between objects as they are typically found in the literature (e.g., (Talmy 1988; Levinson 1996): “The bicycle is in front of the church”, “The ball is left of the chair”, etc.). The description presented covers the simple case, just sufficient to capture the ongoing discussion (e.g., (Levinson 1996)). This approach is based on the conviction that an explanation must list all the inputs necessary to produce the observed outputs; computational, formal models are the best way to check that all necessary details are considered. Applying Occam’s razor whenever possible, I strive for the simplest model and posit complications only when justified by observable behavior.A taxonomy of reference frames must be sufficient to define mechanisms which are sufficient to produce and differentiate the different verbal descriptions of spatial location using different reference frames. The formalization shows that the current taxonomy is not sufficient for this purpose. Levinson differentiates for English 3 cases: intrinsic, absolute and relative (1996). Levelt combines this with a differentiation between egocentric and allocentric, to give 6 different cases; he uses the term deictic to mean relative egocentric, intrinsic to mean intrinsic allocentric and absolute for absolute allocentric(1996). (A detailed discussion how these terms are used by the authors is delayed till section 11, when the geometric foundation has been laid). The current taxonomy is not sufficient to differentiate the use of left/right for another person or a stage (both are intrinsic in the standard terminology) (figure 1 left) or with respect to a person or an object (both are deictic) (figure 1 right):Der Ball ist links von Peter (The ball is to the left of Peter).-> from the observers perspective right of Peter,Der Ball ist auf der linken Seite der Buehne (The ball is on the left side of the stage).-> from the observers perspective on the left side.Der Ball ist links von mir (The ball is to my left).Der Ball ist links vom Baum (The ball is to the left of the tree).Front Left Front RightFig. 1. left) Two different uses of ‘left’ (right-handed and left -handed coordinate system)right) Two different forms of deictic reference frame (egocentric and retinal)The formal model presented here leads to the classification of reference frames by 3 values:• The reference object (ground ): the speaker, an observer or another object;• The orientation of the reference frame (which implies the selection of the frame type): speaker, ground, and direction from speaker to ground, or one of an externally fixed system (cardinal directions, up/downhill etc.);• The handedness of the reference frame (right- or left-handed).For English, 4 different situations using the ‘front, left, back, right’ terminology (in subsection 10.1.1) and 2 using the cardinal direction terms ‘north, west, south, east’ (in subsection 10.1.2) are characterized. The classification scheme is deduced from a formal (mathematical) model. The formalization can be used as code, which is sufficient to produce acceptable English descriptions for all cases. The classification can be used for other languages as well. The model is not intended to explain the pragmatics of the selection of one or the other perspective, the use of default values and ellipsis in general, etc.Three points seem novel in this formal approach presented here:• The model includes the world and the observing cognizant subject and gives a formal description of the observation operation. This can lead to quantitative modeling of errors in human performance (Frank 1998).• The model combines distance and directions in a single framework following the discussion in (Frank 1991; Frank 1991; Frank 1992; Frank 1996).• The m odel includes an imagistic and a propositional alternative of spatial reasoning. Specific hypotheses to differentiate between the two can now be formulated and tested.2 Overview of the Overall Cognitive Model UsedThere seems to be sufficient agreement on the overall content of a cognitive model to situate the discussion. For example, Jackendoff gives a ‘coarse sketch of the relation between language and vision’ (1996, p.3 Figure 1.3), which can be summarized as Figure 2.Fig. 2. Relation between language and vision (after (Jackendoff 1996), p.3)In an attempt to achieve a formal discussion, a simpler model (figure 3) is proposed, for which each step can be formally defined. It merges transformations Jackendoff separates into single units if a distinction seems not to be definable, but adds a transformation from the propositional representation of the environment to a perspective representation, following Levelt (1996 p.96, figure 3.11). It merges syntax and phonology, which are of less interest here. The perspective transformations - which are at the core of this paper - and the respective representations are then given a formally defined meaning in this paper. This fixes the intermediate representations and the code for the transformation. ‘Frame of Reference’ gets in this context a well defined, operational meaning: it describes the geometric principles that are used to transform from a primary geometric representation of the environment to a perspective representation, which contains all the elements to translate to a verbal expression by a linguistic (non-geometric) transformation.Unlike Jackendoff's model, the model in figure 3 includes a representation of the world. The representation of the world is not part of the ‘cognitive model’ sensu strictu, and is created and manipulated by special commands. The human observers or cognizant beings, in the sequel called EGO, exist in this world. The EGOs have the facility to observe the world - for example, by a visual channel - and to build an imagistic representation. From this imagistic representation a propositional (qualitative) spatial representation can be deduced by ‘internal inspection’. It would be possible that the subjects in the model (humans, animals) interact with the world and change it, but this is not modeled here (it would lead to models as used in Artificial Life research).Such a simple model is rich enough to posit a number of interesting and testable hypotheses. It can be formalized and provides the framework for other questions, a number of which are sketched at the end of this paper. The novel step is to include the world into the model; this permits to formalize the observation process (perception and cognition in one - here undifferentiated - transformation, following Talmy’s ‘ception’ (1996, with previous literature) in the same model.Fig. 3. The relation between language and vision used here In such a model the question of intermodal transfers - of which the first part of Bloom’s introduction is mostly concerned - can be discussed (Bloom, et al. 1994): what are the transformation functions, which translate from one to the other modality. One can posit a haptic observation channel from the world and then state precisely which kind of knowledge is acquired and how it is integrated with the knowledge gained through the visual channel. Is it integrated with the imagistic representation or with the propositional one (or partially one, partially the other)? The question can be posed much more specifically, because the formalization forces to define the observations that belong to a specific modality and the representation it builds.3 Formalization of the ModelI use here a functional programming language, specifically Gofer (Jones 1991), a derivative of the Haskell language (Hudak, et al. 1992). This is a language in the ML tradition, strictly typed with type inference (Milner 1978), which assures that all operations are called with arguments of the correct type. The language is referentially transparent, meaning that substitution is generally permitted as all variables retain the value they are initially assigned; side effects are not allowed. Code can therefore be read as high-school algebra, the left-hand side of an equation is defined as the right-hand side. The code can be obtained from the author.4 Representation of the WorldThe representation of the world is not part of the ‘cognitive model’ sensu strictu, but it must be included in order to permit to formalize a model of the observation function. The representation of the world - and in this case a spatial world - does not matter from a cognitive point of view; it is not part of the cognitive model, but is the model of theobject, the cognizant beings are observing. The model of the world is not the model of the concept of the world the EGO has. Only the observation function must be cognitively sound. The representation of the world must be sufficient to contain the facts that are necessary to produce the human knowledge about the environment that is compatible with human behavior.The world consists of a collection of objects that can be identified by a name. Some of the objects are cognizant subjects (persons), which can observe the world. The world is constructed at the beginning, but it could also be manipulated with special operations (for example, an object is moved). As a running example, the configuration in figure 4 will be used.yFig. 4. The example worldThe objects in the world are seen as points in a two dimensional plane without extension. We follow Levinson (1996, p. 135) in confining the discussion to the horizontal plane, as this is sufficient to reconstruct and clarify the current discussion in the literature. The objects are ‘axial’ (Landau 1996) and for each object, the location is given with 2 coordinates (location of the centroid) and an orientation. The orientation for things that have a natural orientation (for example, persons) is the azimuth for the ‘front direction’ (i.e. the angle with the positive x coordinate axis, measured clockwise) (see figure 5). Objects that do not have a natural coordinate frame are marked with OmniDir .5 Visual Perception and Imagistic RepresentationThe complete visual observation process is captured here - for simplicity - in a single process. It observes the world from the perspective of the EGO and produces a single -The extension and shape of the objects are not considered; this is sufficient to reconstruct the discussion by Levinson (1996), but would need extension to capture the wo rk of Eschenbach (this volume). The model of the world could be more complex, for example, space could be. This is left out here, however, to present the core of the modeling concept without additional complications.ego-centered - representation of the object positions. The visual obser vation is captured in a distance and direction to each object in the world on a ratio scale and in the relative orientation of this object with respect to the observer (see figure 5). At this stage, the arbitrary orientation of the world coordinate system, in which the position of the object is represented in the world model, is removed and replaced by the egocentric coordinate system.Fig. 5. The construction of the ego-centered representation The (minimalist) EGO consists of i ts own name, its orientation in space (which will be used to deduce an absolute representation - see section 8) and the observation of the world (as seen from its perspective). This ‘view’ consists of a list of objects with their name and the vector (distance and direction) from the observer (same as (O'Keefe 1996)) to these and their orientation (with respect to the observer’s orientation). One could introduce limitation to what is visible from the world, but this is not done here.It is assumed here that the observer builds first a representation in which he uses himself and the organization of space as it emanates from himself as ground and represents the other objects on this ground (egocentric view). Since Piaget, there is an extensive discussion how allocentric representation of space is built from such an egocentric representation, but the observations reported in the literature are not sufficient that a specific, more elaborate formal description can be justified possible. For present purposes, an egocentric representation is sufficient.6 Propositional Representation for Direct (‘Egocentric’) View Following the model of Jackendoff (1996) an abstract propositional representation is deduced from the imagistic representation. An observation function accesses theimagistic representation and deduces relative positions for each object in a propositional representation. Little is known about the encoding of this propositional representation. Levelt and Levinson report experiments, which seem to come to contradictory conclusions (Levelt 1996; Levinson 1996).We assume here qualitative representation in an egocentric system (figure 6), which differentiates 4 distance relations, where each successive range reaches twice as far as the previous one and 8 equidistant directions, following the schema proposed in (Frank 1992; Hong, et al. 1995). This system seems ecologically plausible; reasoning with directions, human performance gives approximately the same level of errors as a model with 8 direction cones (Montello and Frank 1996). Distance is encoded in zones: the zone up to 1 unit is here , between 1 and 2 units is near , 2 to 4 units is far and further is very far(see figure 9).Back Left herenearfarvery farFig. 6. The qualitative distances and directionsThe transformation discretizes for each object the distance and the direction value in 4 levels for distances and 8 values for the direction and replaces the quantitative representation in the imagistic representation by a qualitative representation (“Ball” 3.2 45) becomes (“Ball” Far FrontRight). The propositional representation of the world by the EGO is a propositional, qualitative encoding of the imagistic one. This means that the vectors are discretized (i.e. distances encoded as far, near, etc. and directions as front, left, etc., directions by 8 cardinal direction values). Jackendoff’s model assumes that in the propositional representation sufficient information is available for the production of the linguistic code. Sentences likeSimon says:Paul steht links vor mir. (Paul is to my front left) Der Sessel steht gerade vor mir. (The chair is in front of me)etc. can be produced. This produces the most direct representation of a spatial situation. It is often called intrinsic, but Levinson shows the difficulty of this label. We propose the term egocentric for this. In Levinson’s characteristic, it is described as anintrinsic coordinate system, with origin at the speaker and using as relatum (gro und) the speaker.7 Perception from Other PerspectivesIt is generally observed that people are capable of transforming their perception of space into the perception from a different point - Levelt’s ‘perspective taking’ (Levelt 1996). This ability is fundamental to understand other people’s description of space from their perspective. This is often called the intrinsic or deictic description of the relative position of objects: the position of an object (figure) is described not with respect to the speaker but with respect to another object (ground) as if this were the observer. Very often the addressee is used as ground, as in:Peter speaking to Simon: Der Ball liegt vor Dir. (The ball is in front of you)but any person can be used as a ground (relatum):Peter speaking to Paul: Der Ball liegt vor Simon. (The ball is in front of Simon).This section concentrates on the deduction of the relative position of an object with respect to another one. Perspective taking consists of three steps:PaulPaulSimonvector subtraction rotation + mirroringrotationFig. 7. vector subtraction: V ground->object =W observer->object - U observer->ground, rotation and mirroring1. ‘Path completion’ (Frank 1992; Hong, et al. 1995) or vector subtraction, computesthe vector from the ground to the figure given the vector from the observer to the ground, and the vector from the observer to the figure, (figure 7 left). Vector subtraction gives (in the coordinate system of the observer):(vector Simon - chair(w)) - (vector Simon Paul(u))= (vector Paul - chair(v))2. To achieve Paul’s view one has to rotate the result into Paul’s orientation (figure 7middle).3. The handedness of the resulting coordinate system has to be selected, whichdecides how the reference axis of the ‘new’ coordinate system is or iented (figure 7 right). This differentiates the two cases:Der Ball ist links von dir (the ball is to your left).Die Muenze ist in der linken Schublade des Tisches (the coin is in the left drawer of thetable).Two computational paths could be used to deduce this information: The relative position between two objects observed is either deduced from the imagistic representation or from the discrete representation - using the same formulae. One must assume that in the imagistic representation, some analogue computation similar to vector addition and subtraction can be performed. In the discrete representation, vector operations are carried out as table look up, and the combination for all inputs is stored in tables.7.1 Construction of an Imagistic Representation from Another PerspectiveThe person constructs the imagistic (2D top view or distance-direction vectors) representation as it would be deduced from the other perspective. This is, mathematically, simply a transformation of origin and orientation of the imagistic representation gained directly from one’s own perspective. For the distance-direction representation, it amounts to the subtraction of vectors, rotation and mirroring. The results are exactly the same as if the other person had observed the world.From this imagistic view from another person’s perspective the discrete representation can be deduced by the same discretization as described in section 6. 7.2 Deduction of the Propositional Representation for Another Perspective from thePropositional Ego-Centered RepresentationThis requires a qualitative reasoning about the combination of distance, directions and orientations. Simon reasons (figure 7): if the chair is in front of me and far, and Paul is front-left of me and very far, then the chair is (in Simon’s orientation system) left and near from Paul. The transformation to Paul’s orientation gives: if the chair is in Simon’s orientation system left and near from Paul, and Paul is facing left (in Simon’s system), then the chair is in front (and near) from Paul.The deduction of qualitative relations is known as relation combination and traditionally written with the ‘;’ operator (Schroeder in (Bird and Moor 1997)). In the literature several proposals for qualitative relation combination calculations have been made. They are mostly influenced by Allen’s calculus for time relations (Allen 1983) and follow a mathematical tradition, to give as a result the conjunction of all possible results. If a combination of two relations can more than one relation, the disjunction of these is given as a result (written as a list of terms connected by or).This was applied to direction reasoning by Freksa (Freksa 1991; Hernandez 1993) and to reasoning with topological relations (Egenhofer, et al. 1994). I have proposed an approximate mode for the composition operation, which gives always a single relation for the combination of two relations - the most plausible or most likely one (Frank 1992;, Egenhofer et al. 1995; Frank 1996). This is cognitively more plausible, because there are no indications that humans would handle disjunctions of relations. It agrees with the observed ‘preferred model’ tendency of human subjects (Knauff, this volume).The method used here is based on 4 distances and 8 directions, which results in 32 different positions. It is possible to devise methods using symmetry, such that not the full 32 square matrix for all combination must be present in memory, but only about 120 entries; thus a propositional table look-up seems mentally possible (comparable to the memorized multiplication tables with 100 entries).7.3 Comparison of an Imagistic or a Qualitative Transformation to AnotherPerspectiveIn principle, perspective taking should produce the same result whether applied to an imagistic or a propositional, qualitative representation:discretize (a-b) = discretize (a) - discretize (b)Due to the errors introduced by the discretization process, the deduction i n the discretized (qualiative) representation is only an approximation. This difference can be used to perform experiments, which help to decide which process humans use.Given the data reported in the literature, it cannot be decided if the perspective transformation is performed before or after translation to a qualitative model. We have shown here that a qualitative model is possible and can be formalized. Experiments with human subjects will be necessary to answer this hypothesis; the model here presented is the necessary formal framework to formulate a precise hypothesis and design the corresponding experiments.8 Construction of an Absolute Frame of ReferenceHumans have a potential to refer to the location of objects in an absolute frame, in English mostly the cardinal directions, but also up/down in a valley, towards the sea are used. These are particular cases of perspective taking, different from the cases discussed in section 7. There only the relative position of figure, ground and observer were relevant. If the objects in a relative frame are rotated together, the same expression results. In an absolute frame of reference the directions are invariant under individual rotation of the speaker, the observer or the ground, but if the whole configura tion rotates, then the expression changes. If the speaker, ground and figure all together are rotated by a quarter turn, then what was north becomes west (figure 8).Fig. 8. After rotation ‘the ball is north of the chair’ becomes‘the ball is west of the chair’The difference in what is and is not affected by changes is likely to affect the pragmatics of selecting an absolute or a relative reference frame: Absolute directions,are invariant under individual rotation of the speaker, the observer or the ground object. What is to my north is also to your north, independent of our relative position and orientation (provided we are relatively close together with respect to the figure). Absolute directions are therefore mostly used in geographic setting, where both distance relations are given and rotation of the configuration is not possible. The invariance under rotation of the individual objects and the (relative) invariance to the small changes in the positions of ground may explai n why rural people pragmatically prefer absolute frames. Direction expressions related to the body frame are invariant if the full configuration (speaker, figure, and ground) are rotated together; this compensates for the uncertainty of the position in an absolute frame typically experienced indoors. This would explain the preference of urban people for expressions related to the body frame (Pederson 1993).To construct and understand this kind of spatial relation, the EGO needs to know its orientation in this absolute frame. Then all the orientations can be translated from the egocentric reference frame to the absolute reference frame with a single subtraction of angles. This operation can be performed - similar to subsection 7.1 and 7.2 - in the imagistic or in a qualitative, propositional representation.It is worth noting that absolute directions depend also on the ground object; only the orientation of the reference frame is given by the cardinal direction, or the up/down valley direction etc. This is often not noted, because absolute directions are in English mostly used to express positions in geographic space where the speaker and the listener are relatively close. This advantage is lost when used in proximity: Simon says: The ball is to the west.Peter says: The ball is to the east.9 Definition of a Frame of ReferenceThe computation for ‘perspective taking’ has as input a view and produces another view - for example, the view, as the observer would have it. Section 7 offered an imagistic or a propositional method for the computation in detail. In both cases and for a relative or an absolute reference frame (section 7 and section 8), the computational model shows that a new frame of reference must be specified with three characteristics:•the origin of the coordinate system (independent if one assumes the conventional orthogonal coordinate system or not);•the orientation of the coordinate system, given by the direction of its primary axis (even if the secondary axis is not orthogonal to it);•the handedness of the coordinate system (i.e. the relations of the axes).Any description of a spatial relation is thus a relation between: the figure F and the ground G as set up by the reference frame defined by an Origin (which is the ground), an Orientation and a Handedness. All of these can be left away in a verbal description and replaced by default values (for example, ground is speaker, the origin and orientation of the reference frame is to be taken from the speaker and the handedness is conventional right-handed). The abundant use of default values in verbal description and the fact that certain combinations do not or even cannot occur, has certainly led to the confusion in the characterization of spatial frames of reference.。