文献翻译Page-Level Web Data Extraction from Template Pages
外文文献翻译

外文文献翻译工具大全/node/2151建议收藏在科研过程中阅读翻译外文文献是一个非常重要的环节,许多领域高水平的文献都是外文文献,借鉴一些外文文献翻译的经验是非常必要的。
由于特殊原因我翻译外文文献的机会比较多,慢慢地就发现了外文文献翻译过程中的三大利器:Google“翻译”频道、金山词霸(完整版本)和CNKI“翻译助手”。
具体操作过程如下:1.先打开金山词霸自动取词功能,然后阅读文献;2.遇到无法理解的长句时,可以交给Google处理,处理后的结果猛一看,不堪入目,可是经过大脑的再处理后句子的意思基本就明了了;3.如果通过Google仍然无法理解,感觉就是不通,那肯定是对其中某个“常用单词”理解有误,因为某些单词看似很简单,但是在文献中有特殊的意思,这时就可以通过CNKI的“翻译助手”来查询相关单词的意思,由于CNKI的单词意思都是来源于大量的文献,所以它的吻合率很高。
另外,在翻译过程中最好以“段落”或者“长句”作为翻译的基本单位,这样才不会造成“只见树木,不见森林”的误导。
注:1、Google翻译:/language_toolsgoogle,众所周知,谷歌里面的英文文献和资料还算是比较详实的。
我利用它是这样的。
一方面可以用它查询英文论文,当然这方面的帖子很多,大家可以搜索,在此不赘述。
回到我自己说的翻译上来。
下面给大家举个例子来说明如何用。比如说“电磁感应透明效应”这个词汇,你不知道它怎么翻译,首先你可以在CNKI里查中文的,根据它们的关键词中英文对照来做,一般比较准确。
在此主要是说在google里怎么知道这个翻译意思。
大家应该都有词典吧,按中国人的办法,把一个一个词分着查出来,敲到google里,你的这种翻译一般不太准,当然你需要验证是否准确了,这下看着吧,把你的那支离破碎的翻译在google里搜索,你能看到许多相关的文献或资料,大家都不是笨蛋,看看,也就能找到最精确的翻译了,纯西式的!我就是这么用的。
noteexpress英文文献网页下载与录入

noteexpress英文文献网页下载与录入(一)英文专业文献通过网页进行检索录入1.通过PubMed进行英文专业文献网页检索,并进行题录和全文的录入打开浏览器登录“PubMed”网站,(提示:用IE浏览器登录,因为有些浏览器不支持目录保存操作)下图中红色横线处可以输入“关键词”,然后点击“search”进行检索。
当检索结果出现后,选择需要的文献,在题录前的方框内打勾选中,如下图所示:如图所示,选中题录后点击红色圈框内的菜单图标。
出现下图所示的对话框后点击“MEDLINE”,再点击“APPLY”。
然后会跳转到如下图所示的页面中,点击页面右上方的“设置”图标,然后在出现的菜单中点击“文件”,接下来点击“另存为”,当出现下图的提示框时选择“是”即可。出现下列对话框时,选择保存题录文件的位置,然后在“保存类型”中选择保存为“.TXT”格式的文件,点击保存。
然后打开“NoteExpress”软件,新建一个数据库(方法见“第一题”),选中新建的数据库,点击“导入题录”。出现下图的对话框后,选择所要导入的题录:点击红色圈框内的“…”进行选择,也就是刚才保存网页题录的位置,然后选择过滤器“PubMed”,再点击“开始导入”。如下图所示,显示导入题录成功。题录导入成功后,点击“综述”可以查看文献的基本信息。
下面是全文的下载和导入,打开浏览器登录“PubMed”网站,(提示:用IE浏览器登录,因为有些浏览器不支持目录保存操作)下图中红色横线处可以输入“关键词”,然后点击“search”进行检索。
检索的结果出现后选择免费的文章进行下载,如下图所示显示“Free Article”和“Free PMC Article”的都是可以下载的免费文章。
点击“Free Article”或“Free PMC Article”,会出现如下图所示的页面;点击图中红线圈出的部分进行下载;出现下图的页面后,选择“PDF”格式点击下载;如下图所示,出现保存对话框后点击“另存为”即可(提示:为了下载文章后便于以后查找,可以在保存之前先把文章题目复制下来,便于给下载的文件命名)。
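除了上面在网页上手工导出MEDLINE题录的流程,也可以用脚本批量完成检索和题录下载。下面是一个极简的Python示意(假设已安装Biopython;检索词、邮箱和文件名都是示例参数,仅供参考):

```python
# 用 Biopython 的 Entrez 接口检索 PubMed 并下载 MEDLINE 格式题录的示意脚本
# 假设:已安装 Biopython(pip install biopython),检索词与邮箱均为示例
from Bio import Entrez

Entrez.email = "your_name@example.com"  # NCBI 要求提供联系邮箱(示例值)

# 1. 按关键词检索,取前 20 条结果的 PMID
handle = Entrez.esearch(db="pubmed", term="lung cancer", retmax=20)
id_list = Entrez.read(handle)["IdList"]
handle.close()

# 2. 按 PMID 批量下载 MEDLINE 纯文本题录(与网页上另存为 .TXT 的内容等价)
handle = Entrez.efetch(db="pubmed", id=",".join(id_list),
                       rettype="medline", retmode="text")
with open("pubmed_medline.txt", "w", encoding="utf-8") as f:
    f.write(handle.read())
handle.close()
# 之后即可在 NoteExpress 中选择 PubMed 过滤器导入 pubmed_medline.txt
```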
获取英文文献全文的13种方法

注:由于大部分院校未能购买国内外商业医学数据库,如PubMed、Elsevier等,因而检索国外全文文献很复杂。
这往往成为少数学校的专利。
北大医学院网站上有大量文献题录,但仅供自己学生使用!这太可惜了(由于版权等多种原因)。下面介绍一些可行的方法。
1、根据作者E-mail地址,向作者索要。
这是最有效的方法之一。
为了方便大家向作者索取原文,这里给出一个模板;索取时一定要简洁!作者一般都愿意向你提供。
下面是模板:
Dear Dr. (author name),
I would appreciate receiving a reprint of your article: ********(不必全写), 杂志名. However, this journal is not available in our library.
Thank you very much for your consideration.
Respectfully yours,
Your name
2、去/ 医学空间网,提供全文检索服务。
用户在使用该网站的索取原文服务时必须注意以下事项:(1)在提供原文复印本时是使用扫描的形式,因此原文每页大小为100k左右,所以索取原文时需考虑到索取量。
(2)考虑到索取文献的用户数量和原文复印本扫描件都非常大,因此每一位用户在索取原文复印本时,每日限制索取3篇文献。
(3)本服务针对会员进行免费服务,如果您还不是会员,请马上免费申请加入。
在索取原文之前请查询您所索取的文献是否在收藏范围之内。
请您[下载]目前馆藏目录,以便您随时查询。
另外健康大脑网也提供原文服务,给版主发e-mail。
3、按部就班,根据文章出处,去图书馆查找原文。
当然去一些较大图书馆。
4、去Science网上杂志找文章。
对中国人完全免费!
5、High Wire Press 网站,斯坦福大学主办,文献量十分大,而且free!Free Medical Journals,我的主页的中外期刊中有极详细的说明!
6、CNKI:中国期刊网提供三种类型的数据库:题录数据库、题录摘要数据库和全文数据库。其中前两者属参考数据库类型,只提供目次和摘要,可在网上免费检索;全文数据库需付费。
1专利文献翻译技巧之一专利申请文件的阅读及翻译

2012年第4期112中国发明与专利CHINA INVENTION & PATENT读懂并翻译专利文献是开展科研和专利工作的重要环节。
应广大读者的要求,我们从本期开始,特邀北京斐然翻译有限公司对专利申请文献的翻译中需要注意的问题和技巧陆续进行探讨。
每期稿件初步分为两部分:一部分介绍专利文献的阅读和翻译技巧,另一部分为“日积月累”,用一些篇幅介绍专利翻译中遇到的一些常见套话和容易翻译出错的句型、短语。
——编者
专利申请文件的阅读及翻译  专利文献翻译技巧之一
专利文献的阅读和翻译技巧
在阅读专利申请文件时,首页会看到扉页,扉页含有很多有用的信息。
一篇PCT 专利申请文件的扉页一般包括以下主要信息:(10)International Publication Number 国际公布号,如WO2009/055364 A1(后面的A1表示该专利处于何种状态,例如:A1 Application published with search report 包含检索文件的申请;A2 Application published without search report 未包含检索文件的申请;A3 Search report 检索报告;A4 Supplementary search report 补充检索报告;A8 Modified first page 被修改的第一页;A9 Modified complete specification 被修改的完整文件;B1 Patent specification 授权专利文件;B2 New patent specification 新授权专利文件;B3 After limitation procedure 被限制后的(授权专利文件);B8 Modified first page granted patent 修改后的授权专利首页;B9 Corrected complete granted patent 更正后的完整授权专利;其中最常见的就是A1和B1了);(19)World International Property Organization, International Bureau 世界知识产权组织国际局(公布文献的局);(21)International Application Number 国际申请号;(22)International Filling Date 国际申请日期;(25)Filling Language 申请语言(国际申请提交时的语种);(26)Publication Language 公布语言(国际公布时的语种);(30)Priority Data 优先权数据(巴黎公约优先权数据);(43)International Publication Date 国际公开日(未经审查的专利文献,对于该专利申请在此日或日前尚未授权,通过印刷或类似方法使公众获悉的日期);(51)International Patent Classification 国际专利分类;(54)Title 发明名称;(57)Abstract 摘要;(71)Applicant 申请人姓名;(72)Inventor 发明人姓名;(73)Assignee 权利人、持有者、受让人或权利所有人的名称或姓名;(74)Agent 律师或代理人姓名;(75)Inventor/Applicant 发明人兼申请人的姓名;(81)Designated States (unless otherwise indicated, for every kind of national protection available) 依据专利合作条约的指定国;(84)Designated States (unless otherwise indicated, for every kind of regional protection available)依据地区专利条约的缔约国;一篇专利申请文件主要包括以下几部分:2012年第4期113中国发明与专利CHINA INVENTION & PATENTField of The Invention (Technical Field) 技术领域(写明要求保护的技术方案所属的技术领域);Background of The Invention (Background Art) 技术背景(写明对发明理解、检索、审查有用的背景技术;有可能的,并引证反映这些背景技术的文件);Summary of The Invention 发明内容(写明发明或者实用新型所要解决的技术问题以及解决其技术问题采用的技术方案,并对照现有技术写明发明或者实用新型的有益效果);Brief Description of The Drawings (Figures) 附图说明(对各附图作简略说明);Detailed Description of The Invention (Embodiments) 具体实施方式(详细写明申请人认为实现发明或者实用新型的优选方式;必要时,举例说明;有附图的,对照附图);Examples 实施例;What is claimed is (Claims) 权利要求书。
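扉页上的这些著录项目(INID)代码在批量处理专利文献时经常需要程序化查表。下面把上文列出的部分代码整理成一个Python查找表作为示意(只收录正文提到的部分代码,释义按上文的中文说明填写):

```python
# 将上文列出的部分 PCT 扉页著录项目(INID)代码整理成查找表,便于程序化处理
INID_CODES = {
    "(10)": "国际公布号",
    "(19)": "公布文献的局(世界知识产权组织国际局)",
    "(21)": "国际申请号",
    "(22)": "国际申请日期",
    "(25)": "申请语言",
    "(26)": "公布语言",
    "(30)": "优先权数据",
    "(43)": "国际公开日",
    "(51)": "国际专利分类",
    "(54)": "发明名称",
    "(57)": "摘要",
    "(71)": "申请人姓名",
    "(72)": "发明人姓名",
    "(73)": "权利人/受让人名称",
    "(74)": "律师或代理人姓名",
}

def describe(code: str) -> str:
    """返回著录项目代码的中文释义,未收录的代码原样返回。"""
    return INID_CODES.get(code, code)

print(describe("(54)"))  # 输出:发明名称
```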
记录用webscraper爬取裁判文书网的文书列表信息以及批量下载word文书

这个是一位网友在B站交流的一个问题,这里记录一下。
需求
1、爬取的网站地址:
2、需要抓取的信息:爬取文书列表内容,报告标题、文号、日期、摘要等等信息。
3、需要抓取多页,比如说前10页。
分析网站的情况
1、抓取的页面翻页的时候,url是不会变的。
而在页面的源码当中又找不到内容,说明网页是通过异步加载的。
2、打开F12,就会弹出下面的暂停提示,阻止后续的查看。
没事,点击右下角的取消断点,再运行即可。
3、点击“network”,点击网页的第二页,查看请求的数据。
可以看到,是post请求,后面需要有一堆的参数。一般而言,通过这样请求之后,可以获取到真实的json文件,里面就包含了网页中文书的列表;然而这次却不一样,请求得到的居然是加密的信息,还需要通过一定方式来进行解密才行。
到这里,已经可以感受到网页开发人员的“苦心”,反爬的措施可谓是非常的多。
不过,我还是在网上找到了一篇网友关于用python解决上面这些问题的办法和代码,有需要的时候可以参考一下。
这里有些内容在自己的能力圈之外,就暂时不考虑了。
web scraper爬取
用python比较复杂的话,那么就考虑用web scraper来试试。
python爬取的效率当然高,但是反爬的太厉害了,大部分的网站都会对python进行一定的限制和反爬,这样写代码的成本就无形中增加了很多。
web scraper则不用考虑这么多,只要浏览器里面能看到数据,就能够进行爬取。
回头看看网站的情况:一是url不变,二是数据不在网页源码当中。
那么就考虑“动态加载进行翻页”的这种情况(参考相应教程)。主要配置如图,关键点就是“selector type”和“click selector”的配置:
“selector type”(用于选择网页中的文书列表)选择“Element click”;
“click selector”(用于翻页)这里需要注意,一般如果是直接在网页点选的话,得到的css代码是这样的:.left_7_3 a:nth-of-type(n+2),表示的意思就是从第二个翻页器(上一页为第一个,1为第二个)开始点击,一直到最后一个。
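按照上面的思路,Web Scraper 的站点地图(sitemap)最终会以 JSON 形式保存。下面用一个 Python 字典示意其中与“Element click”翻页相关的关键配置;字段名参照该插件常见的导出格式,属于示例性假设,列表项选择器和起始网址原文未给出,均为占位值,实际请以插件导出的 sitemap 为准:

```python
# 示意:Web Scraper 中“动态加载翻页”的关键 selector 配置
# 注意:字段名和取值为示例性假设,实际请以插件导出的 sitemap JSON 为准
sitemap_sketch = {
    "_id": "wenshu_list",             # 站点地图名称(示例)
    "startUrl": ["<列表页地址>"],       # 原文未给出具体网址,此处留空占位
    "selectors": [
        {
            "id": "row",
            "type": "SelectorElementClick",        # 即界面上的 "Element click"
            "selector": "div.list .list_item",     # 文书列表条目(示例选择器)
            "clickElementSelector": ".left_7_3 a:nth-of-type(n+2)",  # 翻页按钮,来自上文
            "multiple": True,
            "delay": 2000,                          # 等待异步数据加载(毫秒)
        },
        # 在 row 之下再建标题、文号、日期、摘要等 Text 类型的子选择器
    ],
}
```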
数据采集系统中英文对照外文翻译文献

中英文对照外文翻译(文档含英文原文和中文翻译)Data Acquisition SystemsData acquisition systems are used to acquire process operating data and store it on,secondary storage devices for later analysis. Many or the data acquisition systems acquire this data at very high speeds and very little computer time is left to carry out any necessary, or desirable, data manipulations or reduction. All the data are stored on secondary storage devices and manipulated subsequently to derive the variables ofin-terest. It is very often necessary to design special purpose data acquisition systems and interfaces to acquire the high speed process data. This special purpose design can be an expensive proposition.Powerful mini- and mainframe computers are used to combine the data acquisition with other functions such as comparisons between the actual output and the desirable output values, and to then decide on the control action which must be taken to ensure that the output variables lie within preset limits. The computing power required will depend upon the type of process control system implemented. Software requirements for carrying out proportional, ratio or three term control of process variables are relatively trivial, and microcomputers can be used to implement such process control systems. It would not be possible to use many of the currently available microcomputers for the implementation of high speed adaptive control systems which require the use of suitable process models and considerable online manipulation of data.Microcomputer based data loggers are used to carry out intermediate functions such as data acquisition at comparatively low speeds, simple mathematical manipulations of raw data and some forms of data reduction. The first generation of data loggers, without any programmable computing facilities, was used simply for slow speed data acquisition from up to one hundred channels. All the acquired data could be punched out on paper tape or printed for subsequent analysis. Such hardwired data loggers are being replaced by the new generation of data loggers which incorporate microcomputers and can be programmed by the user. They offer an extremely good method of collecting the process data, using standardized interfaces, and subsequently performing the necessary manipulations to provide the information of interest to the process operator. The data acquired can be analyzed to establish correlations, if any, between process variables and to develop mathematical models necessary for adaptive and optimal process control.The data acquisition function carried out by data loggers varies from one to 9 in system to another. Simple data logging systems acquire data from a few channels while complex systems can receive data from hundreds, or even thousands, of input channels distributed around one or more processes. The rudimentary data loggers scan the selected number of channels, connected to sensors or transducers, in a sequential manner and the data are recorded in a digital format. A data logger can be dedicated in the sense that it can only collect data from particular types of sensors and transducers. It is best to use a nondedicated data logger since any transducer or sensor can be connected to the channels via suitable interface circuitry. This facility requires the use of appropriate signal conditioning modules.Microcomputer controlled data acquisition facilitates the scanning of a large number of sensors. 
The scanning rate depends upon the signal dynamics which means that some channels must be scanned at very high speeds in order to avoid aliasing errors while there is very little loss of information by scanning other channels at slower speeds. In some data logging applications the faster channels require sampling at speeds of up to 100 times per second while slow channels can be sampled once every five minutes. The conventional hardwired, non-programmable data loggers sample all the channels in a sequential manner and the sampling frequency of all the channels must be the same. This procedure results in the accumulation of very large amounts of data, some of which is unnecessary, and also slows down the overall effective sampling frequency. Microcomputer based data loggers can be used to scan some fast channels at a higher frequency than other slow speed channels.The vast majority of the user programmable data loggers can be used to scan up to 1000 analog and 1000 digital input channels. A small number of data loggers, with a higher degree of sophistication, are suitable for acquiring data from up to 15, 000 analog and digital channels. The data from digital channels can be in the form of Transistor- Transistor Logic or contact closure signals. Analog data must be converted into digital format before it is recorded and requires the use of suitable analog to digital converters (ADC).The characteristics of the ADC will define the resolution that can be achieved and the rate at which the various channels can be sampled. An in-crease in the number of bits used in the ADC improves the resolution capability. Successive approximation ADC's arefaster than integrating ADC's. Many microcomputer controlled data loggers include a facility to program the channel scanning rates. Typical scanning rates vary from 2 channels per second to 10, 000 channels per second.Most data loggers have a resolution capability of ±0.01% or better, It is also pos-sible to achieve a resolution of 1 micro-volt. The resolution capability, in absolute terms, also depends upon the range of input signals, Standard input signal ranges are 0-10 volt, 0-50 volt and 0-100 volt. The lowest measurable signal varies form 1 t, volt to 50, volt. A higher degree of recording accuracy can be achieved by using modules which accept data in small, selectable ranges. An alternative is the auto ranging facil-ity available on some data loggers.The accuracy with which the data are acquired and logged-on the appropriate storage device is extremely important. It is therefore necessary that the data acquisi-tion module should be able to reject common mode noise and common mode voltage. Typical common mode noise rejection capabilities lie in the range 110 dB to 150 dB. A decibel (dB) is a tern which defines the ratio of the power levels of two signals. Thus if the reference and actual signals have power levels of N, and Na respectively, they will have a ratio of n decibels, wheren=10 Log10(Na /Nr)Protection against maximum common mode voltages of 200 to 500 volt is available on typical microcomputer based data loggers.The voltage input to an individual data logger channel is measured, scaled and linearised before any further data manipulations or comparisons are carried out.In many situations, it becomes necessary to alter the frequency at which particu-lar channels are sampled depending upon the values of data signals received from a particular input sensor. Thus a channel might normally be sampled once every 10 minutes. 
If, however, the sensor signals approach the alarm limit, then it is obviously desirable to sample that channel once every minute or even faster so that the operators can be informed, thereby avoiding any catastrophes. Microcomputer controlledintel-ligent data loggers may be programmed to alter the sampling frequencies depending upon the values of process signals. Other data loggers include self-scanning modules which can initiate sampling.The conventional hardwired data loggers, without any programming facilities, simply record the instantaneous values of transducer outputs at a regular samplingin-terval. This raw data often means very little to the typical user. To be meaningful, this data must be linearised and scaled, using a calibration curve, in order to determine the real value of the variable in appropriate engineering units. Prior to the availability of programmable data loggers, this function was usually carried out in the off-line mode on a mini- or mainframe computer. The raw data values had to be punched out on pa-per tape, in binary or octal code, to be input subsequently to the computer used for analysis purposes and converted to the engineering units. Paper tape punches are slow speed mechanical devices which reduce the speed at which channels can be scanned. An alternative was to print out the raw data values which further reduced the data scanning rate. It was not possible to carry out any limit comparisons or provide any alarm information. Every single value acquired by the data logger had to be recorded eventhough it might not serve any useful purpose during subsequent analysis; many data values only need recording when they lie outside the pre-set low and high limits.If the analog data must be transmitted over any distance, differences in ground potential between the signal source and final location can add noise in the interface design. In order to separate common-mode interference form the signal to be recorded or processed, devices designed for this purpose, such as instrumentation amplifiers, may be used. An instrumentation amplifier is characterized by good common-mode- rejection capability, a high input impedance, low drift, adjustable gain, and greater cost than operational amplifiers. They range from monolithic ICs to potted modules, and larger rack-mounted modules with manual scaling and null adjustments. When a very high common-mode voltage is present or the need for extremely-lowcom-mon-mode leakage current exists(as in many medical-electronics applications),an isolation amplifier is required. Isolation amplifiers may use optical or transformer isolation.Analog function circuits are special-purpose circuits that are used for a variety of signal conditioning operations on signals which are in analog form. When their accu-racy is adequate, they can relieve the microprocessor of time-consuming software and computations. Among the typical operations performed are multiplications, division, powers, roots, nonlinear functions such as for linearizing transducers, rimsmeasure-ments, computing vector sums, integration and differentiation, andcurrent-to-voltage or voltage- to-current conversion. 
Many of these operations can be purchased in available devices as multiplier/dividers, log/antilog amplifiers, and others.When data from a number of independent signal sources must be processed by the same microcomputer or communications channel, a multiplexer is used to channel the input signals into the A/D converter.Multiplexers are also used in reverse, as when a converter must distribute analog information to many different channels. The multiplexer is fed by a D/A converter which continually refreshes the output channels with new information.In many systems, the analog signal varies during the time that the converter takes to digitize an input signal. The changes in this signal level during the conversion process can result in errors since the conversion period can be completed some time after the conversion command. The final value never represents the data at the instant when the conversion command is transmitted. Sample-hold circuits are used to make an acquisition of the varying analog signal and to hold this signal for the duration of the conversion process. Sample-hold circuits are common in multichannel distribution systems where they allow each channel to receive and hold the signal level.In order to get the data in digital form as rapidly and as accurately as possible, we must use an analog/digital (A/D) converter, which might be a shaft encoder, a small module with digital outputs, or a high-resolution, high-speed panel instrument. These devices, which range form IC chips to rack-mounted instruments, convert ana-log input data, usually voltage, into an equivalent digital form. The characteristics of A/D converters include absolute and relative accuracy, linearity, monotonic, resolu-tion, conversion speed, and stability. A choice of input ranges, output codes, and other features are available. The successive-approximation technique is popular for a large number ofapplications, with the most popular alternatives being the counter-comparator types, and dual-ramp approaches. The dual-ramp has been widely-used in digital voltmeters.D/A converters convert a digital format into an equivalent analog representation. The basic converter consists of a circuit of weighted resistance values or ratios, each controlled by a particular level or weight of digital input data, which develops the output voltage or current in accordance with the digital input code. A special class of D/A converter exists which have the capability of handling variable reference sources. These devices are the multiplying DACs. Their output value is the product of the number represented by the digital input code and the analog reference voltage, which may vary form full scale to zero, and in some cases, to negative values.Component Selection CriteriaIn the past decade, data-acquisition hardware has changed radically due to ad-vances in semiconductors, and prices have come down too; what have not changed, however, are the fundamental system problems confronting the designer. Signals may be obscured by noise, rfi,ground loops, power-line pickup, and transients coupled into signal lines from machinery. Separating the signals from these effects becomes a matter for concern.Data-acquisition systems may be separated into two basic categories:(1)those suited to favorable environments like laboratories -and(2)those required for hostile environments such as factories, vehicles, and military installations. 
The latter group includes industrial process control systems where temperature information may be gathered by sensors on tanks, boilers, wats, or pipelines that may be spread over miles of facilities. That data may then be sent to a central processor to provide real-time process control. The digital control of steel mills, automated chemical production, and machine tools is carried out in this kind of hostile environment. The vulnerability of the data signals leads to the requirement for isolation and other techniques.At the other end of the spectrum-laboratory applications, such as test systems for gathering information on gas chromatographs, mass spectrometers, and other sophis-ticated instruments-the designer's problems are concerned with the performing of sen-sitive measurements under favorable conditions rather than with the problem ofpro-tecting the integrity of collected data under hostile conditions.Systems in hostile environments might require components for wide tempera-tures, shielding, common-mode noise reduction, conversion at an early stage, redun-dant circuits for critical measurements, and preprocessing of the digital data to test its reliability. Laboratory systems, on the other hand, will have narrower temperature ranges and less ambient noise. But the higher accuracies require sensitive devices, and a major effort may be necessary for the required signal /noise ratios.The choice of configuration and components in data-acquisition design depends on consideration of a number of factors:1. Resolution and accuracy required in final format.2. Number of analog sensors to be monitored.3. Sampling rate desired.4. Signal-conditioning requirement due to environment and accuracy.5. Cost trade-offs.Some of the choices for a basic data-acquisition configuration include:1 .Single-channel techniques.A. Direct conversion.B. Preamplification and direct conversion.C. Sample-hold and conversion.D. Preamplification, sample-hold, and conversion.E. Preamplification, signal-conditioning, and direct conversion.F. Preamplification, signal-conditioning, sample-hold, and conversion.2. Multichannel techniques.A. Multiplexing the outputs of single-channel converters.B. Multiplexing the outputs of sample-holds.C. Multiplexing the inputs of sample-holds.D. Multiplexing low-level data.E. More than one tier of multiplexers.Signal-conditioning may include:1. Radiometric conversion techniques.B. Range biasing.D. Logarithmic compression.A. Analog filtering.B. Integrating converters.C. Digital data processing.We shall consider these techniques later, but first we will examine some of the components used in these data-acquisition system configurations.MultiplexersWhen more than one channel requires analog-to-digital conversion, it is neces-sary to use time-division multiplexing in order to connect the analog inputs to a single converter, or to provide a converter for each input and then combine the converter outputs by digital multiplexing.Analog MultiplexersAnalog multiplexer circuits allow the timesharing of analog-to-digital converters between a numbers of analog information channels. An analog multiplexer consists of a group of switches arranged with inputs connected to the individual analog channels and outputs connected in common(as shown in Fig. 1).The switches may be ad-dressed by a digital input code.Many alternative analog switches are available in electromechanical and solid-state forms. 
Electromechanical switch types include relays, stepper switches,cross-bar switches, mercury-wetted switches, and dry-reed relay switches. The best switching speed is provided by reed relays(about 1 ms).The mechanical switches provide high do isolation resistance, low contact resistance, and the capacity to handle voltages up to 1 KV, and they are usually inexpensive. Multiplexers using mechanical switches are suited to low-speed applications as well as those having high resolution requirements. They interface well with the slower A/D converters, like the integrating dual-slope types. Mechanical switches have a finite life, however, usually expressed innumber of operations. A reed relay might have a life of 109 operations, which wouldallow a 3-year life at 10 operations/second.Solid-state switch devices are capable of operation at 30 ns, and they have a life which exceeds most equipment requirements. Field-effect transistors(FETs)are used in most multiplexers. They have superseded bipolar transistors which can introduce large voltage offsets when used as switches.FET devices have a leakage from drain to source in the off state and a leakage from gate or substrate to drain and source in both the on and off states. Gate leakage in MOS devices is small compared to other sources of leakage. When the device has a Zener-diode-protected gate, an additional leakage path exists between the gate and source.Enhancement-mode MOS-FETs have the advantage that the switch turns off when power is removed from the MUX. Junction-FET multiplexers always turn on with the power off.A more recent development, the CMOS-complementary MOS-switch has the advantage of being able to multiplex voltages up to and including the supply voltages. A±10-V signal can be handled with a ±10-V supply.Trade-off Considerations for the DesignerAnalog multiplexing has been the favored technique for achieving lowest system cost. The decreasing cost of A/D converters and the availability of low-cost, digital integrated circuits specifically designed for multiplexing provide an alternative with advantages for some applications. A decision on the technique to use for a givensys-tem will hinge on trade-offs between the following factors:1. Resolution. The cost of A/D converters rises steeply as the resolution increases due to the cost of precision elements. At the 8-bit level, the per-channel cost of an analog multiplexer may be a considerable proportion of the cost of a converter. At resolutions above 12 bits, the reverse is true, and analog multiplexing tends to be more economical.2. Number of channels. This controls the size of the multiplexer required and the amount of wiring and interconnections. Digital multiplexing onto a common data bus reduces wiring to a minimum in many cases. Analog multiplexing is suited for 8 to 256 channels; beyond this number, the technique is unwieldy and analog errors be-come difficult to minimize. Analog and digital multiplexing is often combined in very large systems.3. Speed of measurement, or throughput. High-speed A/D converters can add a considerable cost to the system. If analog multiplexing demands a high-speedcon-verter to achieve the desired sample rate, a slower converter for each channel with digital multiplexing can be less costly.4. Signal level and conditioning. Wide dynamic ranges between channels can be difficult with analog multiplexing. 
Signals less than 1V generally require differential low-level analog multiplexing which is expensive, with programmable-gain amplifiers after the MUX operation. The alternative of fixed-gain converters on each channel, with signal-conditioning designed for the channel requirement, with digital multi-plexing may be more efficient.5. Physical location of measurement points. Analog multiplexing is suitedfor making measurements at distances up to a few hundred feet from the converter, since analog lines may suffer from losses, transmission-line reflections, and interference. Lines may range from twisted wire pairs to multiconductor shielded cable, depending on signal levels, distance, and noise environments. Digital multiplexing is operable to thousands of miles, with the proper transmission equipment, for digital transmission systems can offer the powerful noise-rejection characteristics that are required for29 Data Acquisition Systems long-distance transmission.Digital MultiplexingFor systems with small numbers of channels, medium-scale integrated digital multiplexers are available in TTL and MOS logic families. The 74151 is a typical example. Eight of these integrated circuits can be used to multiplex eight A/D con-verters of 8-bit resolution onto a common data bus.This digital multiplexing example offers little advantages in wiring economy, but it is lowest in cost, and the high switching speed allows operation at sampling rates much faster than analog multiplexers. The A/D converters are required only to keep up with the channel sample rate, and not with the commutating rate. When large numbers of A/D converters are multiplexed, the data-bus technique reduces system interconnections. This alone may in many cases justify multiple A/D converters. Data can be bussed onto the lines in bit-parallel or bit-serial format, as many converters have both serial and parallel outputs. A variety of devices can be used to drive the bus, from open collector and tristate TTL gates to line drivers and optoelectronic isolators. Channel-selection decoders can be built from 1-of-16 decoders to the required size. This technique also allows additional reliability in that a failure of one A/D does not affect the other channels. An important requirement is that the multiplexer operate without introducing unacceptable errors at the sample-rate speed. For a digital MUX system, one can determine the speed from propagation delays and the time required to charge the bus capacitance.Analog multiplexers can be more difficult to characterize. Their speed is a func-tion not only of internal parameters but also external parameters such as channel, source impedance, stray capacitance and the number of channels, and the circuit lay-out. The user must be aware of the limiting parameters in the system to judge their ef-fect on performance.The nonideal transmission and open-circuit characteristics of analog multiplexers can introduce static and dynamic errors into the signal path. These errors include leakage through switches, coupling of control signals into the analog path, and inter-actions with sources and following amplifiers. Moreover, the circuit layout can com-pound these effects.Since analog multiplexers may be connected directly to sources which may have little overload capacity or poor settling after overloads, the switches should have a break-before-make action to prevent the possibility of shorting channels together. 
It may be necessary to avoid shorted channels when power is removed and a chan-nels-off with power-down characteristic is desirable. In addition to the chan-nel-addressing lines, which are normally binary-coded, it is useful to have inhibited or enable lines to turn all switches off regardless of the channel being addressed. This simplifies the external logic necessary to cascade multiplexers and can also be useful in certain modes of channeladdressing. Another requirement for both analog and digital multiplexers is the tolerance of line transients and overload conditions, and the ability to absorb the transient energy and recover without damage.数据采集系统数据采集系统是用来获取数据处理和存储在二级存储设备,为后来的分析。
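上文给出了分贝的定义式 n = 10·log10(Na/Nr)。下面用一小段 Python 按这个定义做正反换算,并核对文中提到的共模抑制比范围(110 dB 到 150 dB)对应的功率比,数值仅作演示:

```python
import math

def power_ratio_to_db(na: float, nr: float) -> float:
    """按文中定义 n = 10 * log10(Na / Nr) 计算分贝数。"""
    return 10 * math.log10(na / nr)

def db_to_power_ratio(db: float) -> float:
    """反过来,由分贝数求功率比 Na / Nr。"""
    return 10 ** (db / 10)

print(power_ratio_to_db(1e11, 1))   # 110.0,即 10^11 的功率比对应 110 dB
print(db_to_power_ratio(150))       # 1e+15,即 150 dB 对应 10^15 倍的功率比
```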
中英文参考文献转换
要将中文参考文献转换为英文参考文献,可以按照以下步骤进行:
1. 将中文作者的姓名转换为英文。
可以根据拼音或者姓名的汉语翻译来进行转换。
例如,将“李明”转换为“Li Ming”。
2. 将中文文章的标题转换为英文。
可以直接将中文标题翻译为英文,或者进行适当的修改和调整。
例如,将“中文参考文献转换”翻译为“Translation of Chinese References”。
3. 转换中文期刊名称为英文。
可以参考已有的英文期刊名称进行翻译,或者根据该期刊的英文官方名称进行转换。
4. 转换中文出版地和出版社为英文。
可以直接将中文出版地和出版社名称翻译为英文,或者寻找对应的英文出版地和出版社名称。
5. 根据期刊的引用格式要求,转换中文文献的引用信息为英文。
这包括文章的页码、卷号、期号等信息。
对于已有的中文参考文献,转换为英文参考文献后应该包括以下信息:
- 作者的英文姓名
- 文章的英文标题
- 期刊的英文名称
- 文章的年份
- 文章的页码
- 期刊的卷号和期号
- 出版地和出版社的英文名称
最后,确保转换后的英文参考文献格式符合所使用的引用格式要求,如APA、MLA等。
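其中作者姓名的拼音转换和条目拼装很适合用脚本批量完成。下面是一个极简的 Python 示意(假设已安装 pypinyin 库;“姓取第一个字”只是简化处理,复姓需另行判断;拼出的引用格式也只是示例,实际请按 APA、MLA 等要求调整,示例中的期刊名为虚构):

```python
# 将中文作者姓名转换为“姓 名”拼音形式,并按一种示例格式拼出英文参考文献条目
# 假设:已安装 pypinyin(pip install pypinyin)
from pypinyin import lazy_pinyin

def author_to_english(name: str) -> str:
    """把“李明”转换为“Li Ming”这类形式(简化处理:姓取第一个字)。"""
    syllables = [s.capitalize() for s in lazy_pinyin(name)]
    surname, given = syllables[0], "".join(syllables[1:])
    return f"{surname} {given}"

def format_reference(author: str, title_en: str, journal_en: str,
                     year: int, volume: str, pages: str) -> str:
    """按一种示例格式拼出英文参考文献条目(非任何标准格式)。"""
    return f"{author_to_english(author)}. {title_en}. {journal_en}, {year}, {volume}: {pages}."

print(format_reference("李明", "Translation of Chinese References",
                       "Journal of Example Studies", 2014, "12(3)", "45-52"))
# 输出:Li Ming. Translation of Chinese References. Journal of Example Studies, 2014, 12(3): 45-52.
```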
文献英文转中文
在进行文献的英文转中文时,可以采取以下步骤:
1. 阅读文献的英文摘要和关键词,了解研究主题和内容。
2. 将英文摘要和关键词翻译成中文,可以使用在线翻译工具或字典进行翻译。
3. 仔细阅读文献的全文,理解研究方法、实验结果和讨论。
4. 理解文献的核心内容后,用自己的话将每个部分(引言、方法、结果、讨论等)逐段翻译成中文。
建议使用简洁明了的语言,并与原作者的观点保持一致。
5. 在翻译过程中,尽量忠实地传达原文的意思,但也要注意适当的表达方式和语法规范。
有时候,为了更好地表达意思,可以做适度的调整和重写。
6. 完成翻译后,再仔细检查翻译是否准确、流畅,并修正可能存在的错误和不通顺的地方。
7. 如果有困难或不确定的地方,可以咨询他人的意见或参考其他相似的研究文献,以确保最终的翻译质量和准确性。
请注意,翻译文献需要一定的语言能力和专业知识,准确传达文献内容的同时,也要尊重原著作者的权益。
如果对翻译有疑问,可以请专业的翻译人员或翻译机构提供帮助。
大数据外文翻译文献
大数据外文翻译文献(文档含中英文对照即英文原文和中文翻译)原文:What is Data Mining?Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery consists of an iterative sequence of the following steps:· data cleaning: to remove noise or irrelevant data,· data integration: where multiple data sources may be combined,·data selection : where data relevant to the analysis task are retrieved from the database,·data transformation : where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance,·data mining: an essential process where intelligent methods are applied in order to extract data patterns,·pattern evaluation: to identify the truly interesting patterns representing knowledge based on some interestingness measures, and ·knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user .The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one since it uncovers hidden patterns for evaluation.We agree that data mining is a knowledge discovery process. However, in industry, in media, and in the database research milieu, the term “data mining” is becoming more popular than the longer term of “knowledge discovery in databases”. Therefore, in this book, we choose to use the term “data mining”. We adop t a broad view of data mining functionality: data mining is the process of discovering interestingknowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.Based on this view, the architecture of a typical data mining system may have the following major components:1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spread sheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request.3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such ascharacterization, association analysis, classification, evolution and deviation analysis.5. Pattern evaluation module. This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base. 
Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.6. Graphical user interface. This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.From a data warehouse perspective, data mining can be viewed as an advanced stage of on-1ine analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-styleanalytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.While there may be many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that does not handle large amounts of data can at most be categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases should be more appropriately categorized as either a database system, an information retrieval system, or a deductive database system.Data mining involves an integration of techniques from mult1ple disciplines such as database technology, statistics, machine learning, high performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis. We adopt a database perspective in our presentation of data mining in this book. That is, emphasis is placed on efficient and scalable data mining techniques for large databases. By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision making, process control, information management, query processing, and so on. Therefore,data mining is considered as one of the most important frontiers in database systems and one of the most promising, new database applications in the information industry.A classification of data mining systemsData mining is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and or rough set theory, knowledge representation, inductive logic programming, or high performance computing. 
Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, Information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology.Because of the diversity of disciplines contributing to data mining, data mining research is expected to generate a large variety of data mining systems. Therefore, it is necessary to provide a clear classification of data mining systems. Such a classification may help potential users distinguish data mining systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows.1) Classification according to the kinds of databases mined.A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly.For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to the special types of data handled, we may have a spatial, time -series, text, or multimedia data mining system , or a World-Wide Web mining system . Other system types include heterogeneous data mining systems, and legacy data mining systems.2) Classification according to the kinds of knowledge mined.Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis , similarity analysis, etc.A comprehensive data mining system usually provides multiple and/or integrated data mining functionalities.Moreover, data mining systems can also be distinguished based on the granularity or levels of abstraction of the knowledge mined, includinggeneralized knowledge(at a high level of abstraction), primitive-level knowledge(at a raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should facilitate the discovery of knowledge at multiple levels of abstraction.3) Classification according to the kinds of techniques utilized.Data mining systems can also be categorized according to the underlying data mining techniques employed. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed(e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on ) .A sophisticated data mining system will often adopt multiple data mining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.什么是数据挖掘?许多人把数据挖掘视为另一个常用的术语—数据库中的知识发现或KDD的同义词。
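原文把知识发现(KDD)描述为数据清理、集成、选择、变换、挖掘、模式评估、知识表示这一串迭代步骤。下面用一个不依赖第三方库的 Python 小例子把这条流水线串起来,各步骤的实现都是占位式的示意,仅用来说明步骤之间如何衔接:

```python
# 按原文列出的 KDD 步骤串起来的示意性流水线,各步骤实现均为占位逻辑
def data_cleaning(rows):        # 数据清理:去掉含缺失值的记录
    return [r for r in rows if None not in r.values()]

def data_integration(*sources): # 数据集成:合并多个数据源
    return [r for src in sources for r in src]

def data_selection(rows, keys): # 数据选择:只保留与分析任务相关的字段
    return [{k: r[k] for k in keys if k in r} for r in rows]

def data_transformation(rows):  # 数据变换:按商品做聚合统计
    counts = {}
    for r in rows:
        counts[r["item"]] = counts.get(r["item"], 0) + 1
    return counts

def data_mining(counts):        # 数据挖掘:找出现次数最多的项,代表被发现的模式
    return max(counts.items(), key=lambda kv: kv[1])

def pattern_evaluation(pattern, min_support=2):  # 模式评估:用兴趣度阈值过滤
    return pattern if pattern[1] >= min_support else None

source_a = [{"item": "milk", "price": 3.0}, {"item": "bread", "price": None}]
source_b = [{"item": "milk", "price": 2.8}, {"item": "milk", "price": 3.1}]

rows = data_selection(
    data_integration(data_cleaning(source_a), data_cleaning(source_b)), ["item"])
pattern = pattern_evaluation(data_mining(data_transformation(rows)))
print("knowledge presentation:", pattern)   # ('milk', 3)
```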
教你3天搞定毕设外文翻译
一、外文翻译的方法及相关查找方向
一、关于材料范围:毕业论文中的外文翻译材料,能与论文主题一致最好,若实在找不到切题的原材料,找与论文主题相关的材料亦可。
如毕业论文题目是《我国工伤事故认定的法律实证分析》,外文材料可以是关于工伤、社会保险、劳工法(劳动法)方面的;毕业论文题目是《从“有偿新闻”看我国的新闻立法》,外文材料可以是新闻法、民法方面的;毕业论文题目是《沈家本的“参考古今、博稽中外”的修律思想》,外文材料可以是有关中国法制史、法律思想史的方面的。
特别注意:1、翻译材料首先必须是法律、法学方面的。
2、中文译文已出版发表过的材料不要选。
诸如梅因《古代法》、贝卡利亚《论犯罪与刑罚》等,此外,网上有中外文对照的不要选。
因为这时的翻译要么是抄袭,要么是吃力不讨好。
3、篇幅在3000单词左右。
二、到哪里去找所要的外文原文?可尝试以下途径:1、直接找外文报刊和教材。
如China Daily,21st Century,Times等报纸(有法律法制专栏或法律法学文章);“美国法精要影印本”(多卷本的系列版物,中国法律出版社出版);各类外文版专业教材;各类“法律英语”教材(教材本身应没有中文译文)等等。
上述文献并不难获得。
2、到图书馆获取。
到图书馆外文图书阅览室或外文报刊阅览室查找相关图书或报刊,再确定所要翻译的部分,最后复印下来。
可以先在电脑中输入关键词(最好用law,act,jurisprudence,proceeding等外延较广的词)进行检索。
最好是多请教在阅览室值班的老师,对于学生,他们是文献专家。
3、到网上获取。
方式特多:可在各大搜索引擎(baidu,google等)中用关键词检索;可在相关网站中获得:如各法律法学专业网站,国外各大学法学院网站(如),等等。
可在著名外文文献数据中获得。
这些数据库有综合性外文文献数据库,如“Springerlink外文期刊全文”、“Kluwer Online 全文数据库”、“Wiley外文电子期刊”、“EBSCOhost期刊全文数据库”;有外文法律专业文献数据库,如“westlaw数据库”。
CENTRAL SOUTH UNIVERSITY
本科毕业英文文献翻译
题目:基于模板页面的页面级网络数据提取
学生姓名:颜希萍
指导教师:刘献如 张祖平
学院:信息科学与工程学院
专业班级:电子信息工程1002班
完成时间:2014.3.5

基于模板页面的页面级网络数据提取

摘要:Web数据提取一直是许多Web数据分析应用程序的一个重要组成部分。
在本文中,我们把数据提取问题表述为基于结构化数据和树模板的页面生成过程的解码问题。
我们提出了一种无监督的页面级数据提取方法,为每个单独的网站推导出其模式和模板,这些网站的页面可以包含单条或多条数据记录。
我们提出的系统FiVaTech综合运用树匹配、树对齐和挖掘技术来完成这一具有挑战性的任务。
在实验中,FiVaTech比EXALG具有更高的精度,并可以与ViPER、MSE等记录级提取系统相媲美。
在许多最先进的Web数据提取工作所使用的测试页面上,实验显示了令人鼓舞的结果。
关键词——半结构化数据、网络数据提取、多树合并、包装器归纳。
一、引言
众所周知,深层网络所包含的有价值信息比表面网络多出几个数量级。
然而,由于这些页面的生成是为了可视化而不是数据交换,使得利用这些综合信息需要大量的努力。
因此,从用于网页搜索的网站提取信息成为网络信息集成的一个关键步骤。
为一个给定的搜索表单生成提取程序,相当于包装一个数据源,这样提取器或包装器程序就能以相同的格式返回数据,供信息集成使用。
页面属于同一网站的一个重要特点是,此类页面共享相同的模板,因为他们所有页面都是以一致的方式进行编码的。
换句话说,这些页面是用的同一个预定义的模板插入数据值生成的。
在实践中,模板页面也可以(以静态超链接的形式)存在于表层网站。
例如,商业网站经常有一个模板用于显示公司logo、浏览菜单和版权声明,这样相同的网站的所有页面看起来一致并且是有规划的。
此外,还可以使用模板来呈现一个用来显示相同类型的对象的记录列表。
因此,从模板页面的信息提取可以应用在许多情况下。
模板页面的特别之处在于,对模板网页而言,提取目标几乎就等于页面生成时嵌入的数据值。
因此,与非模板页面的信息提取(如Softmealy [5]、Stalker [9]、WIEN [6]等)不同,这里不需要人工标注提取目标;自动提取的关键在于我们能否自动推断出模板。
一般来说,模板作为所有页面的公共模型,是相当固定的,这与页面中数据值的多样性形成鲜明对比。
找到这样的一个共同的模板需要多个页面或一个页面包含多个记录作为输入。
当多个页面被给定了,提取目标就针对的是页面范围信息(例如RoadRunner[4],EXALG[1])。
给定单个页面时,提取目标通常限制在记录级信息范围内(如IEPAD[2]、DeLa[11]和DEPTA[14]),这就额外引入了记录边界检测的问题。
页面级提取任务虽然不涉及这个额外的边界检测问题,但它需要处理的数据更多,因而是比记录级提取更复杂的任务。
发现模板的一种常见技术是对齐:字符串对齐(例如IEPAD、RoadRunner)或树对齐(例如DEPTA)。
至于区分模板和数据的问题,大多数方法假设HTML标签是模板的一部分;而EXALG采用一个更一般的模型,认为单词标记也可以是模板的一部分,标签标记也可以是数据。
然而,EXALG的方法没有显式地使用对齐,会产生许多偶然的等价类,使重建出的模式不完整。
在本文中,我们关注于页面级提取任务和提出一种新方法,称为FiVaTech,用它来自动检测一个网站的模式。
所提出的技术引入了一种新的结构,称为固定/变异模式树,这棵树携带了识别模板和检测数据模式所需的全部信息。
我们结合几个技术:对齐、模式挖掘,以及树模板的想法来解决页面级模板建设中更困难的问题。
在实验中,FiVaTech比EXALG(为数不多的页面级提取系统之一)具有更高的精度,并且可以与ViPER、MSE等记录级提取系统相媲美。
本文接下来的内容组织如下:第二节定义了数据提取问题。
第三节提供了系统框架以及FiVaTech的详细算法,构建固定/变异模式树的一个例子。
第四节描述模板和网站模式推导的细节。
第五节描述了我们的实验。
第六节比较FiVaTech 与相关网络数据提取技术。
最后,第七节总结了本文。
二、问题公式化
在本节中,我们制定页面创建的模型,它描述了怎样使用一个模板将数据嵌入。
正如我们所知,一个网页是由嵌入数据实例x(取自数据库)到一个预定义的模板生成的。
通常由一个CGI程序执行编码函数,把数据实例和模板结合起来形成网页;网页中所有来自数据库的实例数据都符合一个共同的模式,其定义如下(类似的定义也可以在EXALG[1]中找到)。
定义2.1(结构化数据)一个数据模式可以是以下类型:
1、一个基本类型β表示一个符号串,其中符号是文本的基本单位。
2、如果τ1,τ2,…,τk是类型,那么他们的有序列表< τ1,τ2,…,τk >也形成了一个类型τ。
我们说类型τ是由类型τ1,τ2,…,τk使用k阶类型构造函数构造成的。
一个k序列实例τ是< x 1,x 2,…,x k >的形式,其中x 1,x 2,…,x k分别是类型τ1,τ2,…,τk的实例。
这样的类型τ被称为:
a、一个元组,记作< k > τ,如果每一个实例体基数(实体中实例的个数)均为1。
b、一个可选,记作<k>? τ,如果每一个实例体的基数为0或1。
c、一个集合,记作{k} τ,如果某一些实例体基数大于1。
d、一个析取,记作(τ1|τ2|…|τk),如果所有的τi(i=1,2,…,k)都是可选类型,并且这k个可选项的基数之和等于实例体τ的基数1。
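为了更直观地说明定义2.1,下面用Python的数据类把这几种类型(基本类型、元组、可选、集合、析取)表示出来。这只是帮助理解的示意代码,类名和字段都是为说明而取的,并非原文系统的实现:

```python
# 定义 2.1 中结构化数据模式的一个示意性表示(仅用于说明,非原文系统实现)
from dataclasses import dataclass
from typing import List

@dataclass
class BasicType:          # 基本类型 beta:一串符号(文本)
    name: str = "beta"

@dataclass
class TupleType:          # k 阶元组:每个分量的基数都为 1
    components: List[object]

@dataclass
class OptionType:         # 可选:基数为 0 或 1
    component: object

@dataclass
class SetType:            # 集合:某些实例的基数大于 1
    component: object

@dataclass
class DisjunctionType:    # 析取:各可选项的基数之和为 1
    options: List[object]
```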
例2.1图1显示了一个虚构的网页来显示一个列表的产品。
对于每个产品给出了产品名称,价格,折扣百分比(可选),和一个特性列表(阴影图中的节点)。
这里的数据实例是 { <“产品1”, “现在3.79美元”, “节省5%”, “特性1_1”>, <“产品2”, “现在7.88美元”, ε, {“特性2_1”, “特性2_2”}> },其中ε表示空字符串,表示第二个产品的折扣项为空。
这个页面中嵌入的数据实例图1可以通过两种不同的模式S 和S’表达,分别如图1 b和图1 c。
图1 b显示了一个集合w 1的四个顺序(表示图1 a中的产品列表):前两个属性是基本类型(产品的名称和价格),第三个属性是一个可选w2(折扣百分比),最后一个属性是一个集合w 3(产品)的特性列表。
除了这种简洁的表示,相同的数据也可以按它们在DOM(文档对象模型)树中父节点的结构来表示。
也就是说,我们可以把上面的数据实例重新组织为 { < <“产品1”, <“现在3.79美元”, “折扣5%”> >, {“特性1_1”} >, < <“产品2”, <“现在7.88美元”, ε> >, {“特性2_1”, “特性2_2”} > },它可以用模式S’来表达。
第二种基本数据类型和可选数据(τ4)构成一个二元组τ3(由于每种产品的价格和折扣的可选被嵌入到了网页中同一个父节点下),进一步与第一种基本数据(产品名称)构成另一个二元组(τ2)。
因此,这种新模板S’的根源是一个二元集合(τ1),τ1由τ2和τ5(一元集合)两个部分构成,如图1c所示。
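沿用上面示意的类型表示,例2.1中的两个模式S和S'可以写成下面的嵌套结构(同样只是帮助理解的示意,变量名对应文中的τ记号):

```python
# 例 2.1 中图 1b 的模式 S:一个四元组的集合 {<名称, 价格, (折扣)?, {特性}>}
beta = BasicType()

S = SetType(TupleType([
    beta,                    # 产品名称
    beta,                    # 价格
    OptionType(beta),        # 折扣(可选)
    SetType(beta),           # 特性列表
]))

# 图 1c 的模式 S':价格与可选折扣先组成二元组 tau3,再与名称组成二元组 tau2
tau4 = OptionType(beta)
tau3 = TupleType([beta, tau4])            # <价格, (折扣)?>
tau2 = TupleType([beta, tau3])            # <名称, <价格, (折扣)?>>
tau5 = SetType(beta)                      # {特性}
S_prime = SetType(TupleType([tau2, tau5]))  # 根为二元组的集合,即 tau1
```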
如前所述,模板页面是由在通过CGI程序在一个预定义的模板中嵌入数据实例来生成的。
因此,找寻给定输入网页的模板和数据模式的逆向工程应该建立在一些页面生成模型中,这个我们接下来将会进行描述。
在本文中,我们提出一种基于树的页面生成模型,它由子树连接来编码数据,而不是字符串连接。
这是因为数据模式和网页都是树状结构。
也因此,我们考虑模板树结构。
基于树的页面生成模型的优点是,模板中不会涉及结束标签(如</html>、</body>等),而EXALG所采用的基于字符串的页面生成模型则会涉及。
图1. (a) 一个网页和它的两个不同的模式:(b) S,(c) S'。
由于子项的数据编码必须与模板连接起来形成结果,所以连接是页面生成模型中一个必需的操作。
例如,用实例x对一个k阶类型构造函数τ进行编码,需要把模板树T与x的各个子值的编码树连接起来。
然而,树的连接更为复杂,因为沿着一棵已有树的最右路径,有多个可以附加子树的位置。
因此,我们需要为被连接的子树指明插入位置。
定义2.2 假设T1和T2是两棵树,我们定义操作 T1 ⊕i T2,表示把T2添加到树T1最右路径上从叶子节点数起的第i个节点(位置i)处,从而得到一棵新的树。
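定义2.2描述的“把T2挂到T1最右路径上第i个位置”的操作,可以用下面这个极简的Python树结构来示意。树的表示方式和示例中的标签都是为说明而设的假设,插入位置按从最右路径的叶子节点向上数、从0开始,与下文例子中“插入位置为0的叶节点”一致:

```python
# 定义 2.2 的示意实现:把子树 T2 追加到 T1 最右路径上第 i 个节点之下
class Node:
    def __init__(self, tag, children=None):
        self.tag = tag
        self.children = children or []

    def __repr__(self):
        return self.tag if not self.children else f"{self.tag}({', '.join(map(repr, self.children))})"

def rightmost_path(t):
    """返回从根到最右叶子的节点列表。"""
    path = [t]
    while path[-1].children:
        path.append(path[-1].children[-1])
    return path

def tree_append(t1, t2, i):
    """把 t2 作为新的最后一个孩子,挂到 t1 最右路径上从叶子往上数第 i 个节点。"""
    path = rightmost_path(t1)
    path[len(path) - 1 - i].children.append(t2)
    return t1

# 用法示意:把数据节点 P(“产品1”)挂到模板子树 C 的叶节点(位置 0)之下
C = Node("td", [Node("b", [Node("a")])])   # 模板子树 C 的结构仅为示例
P = Node("'产品1'")
print(tree_append(C, P, 0))   # td(b(a('产品1')))
```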
例如,给定图2上半部分的树:模板C、E和数据内容P、S(分别为“产品1”和“折扣5%”的内容),图的下半部分展示了树连接 C ⊕ P 和 E ⊕ S 的结果。
这些树的虚线圈是虚拟节点,帮助树的表示(比如多个路径连接到树),可以被忽视。
插入点标记为蓝色实线圈。
对于子树C,插入点是节点<a>,子树P(单节点)被插入在了这里。
对于子树E,插入点是节点<br>的上一个虚拟节点,子树S(也是单节点)也被插入到了这里。
图中也显示了两个子树:N(内容数据“现在3.79美元”)和另一棵子树,它们作为兄弟节点被插入到模板D的插入点0之下,我们用 D ⊕0 (…) 表示该操作。
图2. 树连接的例子
定义了树连接操作之后,我们现在可以类似于EXALG那样定义k阶类型构造函数的编码。
基本上,这个想法是允许把k+1个模板放置在k个数据条目的前面、中间或者后面,如下所述:
定义2.3(Level-aware编码)我们为类型τ定义其模板,并用x的各子值的编码来定义实例x的编码λ(T, x),如下所述。
1、如果τ是一种基本类型β,那么编码λ(T, x)被定义为包含字符串x本身的一个节点。
2、如果τ是一个k阶类型构造函数,那么其模板定义为一棵父模板树,连同k+1个子模板以及它们各自的插入位置。
a、对于形式为<x1, …, xk>的单一实例x,λ(Tτ, x)是把k+1个子模板与各子值的编码λ(τ1, x1), …, λ(τk, xk)依次连接到父模板树上而得到的树。
b、对于多个实例(其中每一个都是类型τ的实例),编码是把各实例对应的子树作为兄弟节点插入到父模板P的最右路径上;每一棵这样的子树都按上述单一实例的程序,用以空模板φ(或一个虚拟节点)为父模板的模板进行编码。
参见图3b中关于编码多个实例的说明。
c、析取类型不需要额外的模板,因为其实例x的编码将使用某个可选类型τi的模板,其中x是τi的一个实例。
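上面level-aware编码的核心,是k个子值的编码与k+1个分隔模板交替连接到父模板之下。下面用一个非常简化的列表拼接来示意这个交替顺序;真实的编码作用在树和插入位置上,这里只演示次序本身,属于示意性假设:

```python
# 极简示意:k 个子值编码 c1..ck 与 k+1 个分隔模板 T1..Tk+1 的交替连接次序
# 真实系统中连接的是树并带有插入位置,这里只用扁平列表演示 T1, c1, T2, c2, ..., ck, Tk+1
def interleave(templates, encoded_children):
    assert len(templates) == len(encoded_children) + 1
    out = []
    for t, c in zip(templates, encoded_children):
        out.extend([t, c])
    out.append(templates[-1])
    return out

print(interleave(["T1", "T2", "T3"], ["λ(τ1,x1)", "λ(τ2,x2)"]))
# ['T1', 'λ(τ1,x1)', 'T2', 'λ(τ2,x2)', 'T3']
```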
例2.2我们现在为输入DOM树考虑模式。
我们把相邻的HTML标记圈成模板树A到G(图中的矩形框)。
大部分的模板是单一路径,而模板C和D包含两条路径。
因此,我们添加一个虚拟节点作为根组成一个树结构,如图2所示。
用深度优先的顺序遍历这棵树来给这些模板和数据一个顺序,我们可以看到,大多数数据项插入了之前模板的叶节点。
例如,数据项“产品1”被附加到模板树C的叶节点上(插入位置为0),即图2中所示的那次连接操作;同样,“现在3.79美元”和“特性1_1”分别被附加到模板树D和F的叶节点上(插入位置均为0)。