A Collection of Translated Classic Papers in Distributed Systems


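Distributed Systems Concepts, Part 1: Consensus Protocols, Consistency Models, the Byzantine Problem, Leases, and Replication Protocols

1. Consensus protocols: two-phase commit, Raft, and Paxos

(1) Two-phase commit

In a distributed system, each node knows whether its own operation succeeded or failed, but it cannot know whether the operations on other nodes succeeded or failed.

When a transaction spans multiple nodes, preserving its transactional properties requires a coordinator component that collects the operation results of all the nodes (called participants) and finally instructs them whether to actually commit those results (for example, by writing the updated data to disk).

The idea behind two-phase commit can therefore be summarized as follows: each participant reports the success or failure of its operation to the coordinator, and the coordinator decides, based on the feedback from all participants, whether the participants should commit or abort.

The system thus has two kinds of nodes, a coordinator and participants, and the protocol runs in two phases. For details, see:

Two-phase commit is blocking: while a node waits for the other side's reply, it can do nothing else and does not release the resources it holds.

The protocol mainly guarantees the atomicity of an operation that spans multiple nodes (either every node applies it or none does), whereas a protocol such as Raft is used to guarantee the consistency of operations, meaning that every node executes the same operations.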
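
The two phases described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names introduced here, and the sketch ignores timeouts, crashes, and the blocking behavior mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A node that votes on its local work and then commits or aborts it."""
    name: str
    can_commit: bool = True   # hypothetical flag: did the local operation succeed?
    state: str = "init"

    def prepare(self) -> bool:
        # Phase 1: report local success or failure to the coordinator.
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator logic: returns True if the transaction committed."""
    # Phase 1 (voting): collect every participant's vote.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (decision): commit only if every vote was yes, otherwise abort all.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

A single failed participant is enough to make every participant abort, which is the all-or-nothing property the text attributes to the protocol.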

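For a worked example of two-phase commit, see:

(2) Raft and Paxos

Raft and Paxos serve essentially the same purpose in distributed applications, but Paxos is hard to understand, and Raft is comparatively simpler.

There is a classic paper on Raft:

Its Chinese translation is at:

Another article explains an implementation of Raft in detail: reference 31 of the Raft paper.

The following records one question that came up while reading the paper: why does the "majority rule" guarantee that, for a given term, only one candidate can win the election and become Leader? In Raft, for a given term number, each server casts at most one vote, on a first-come, first-served basis. If a Candidate's RequestVote RPC carrying that term number is granted by more than half of the servers, that Candidate becomes Leader.

Because each server votes at most once per term and a Candidate needs more than half of the votes to become Leader, two Candidates cannot both hold a majority in the same term: any two majorities share at least one server, and that server voted only once. So at most one Candidate becomes Leader in a given term.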
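
The voting rule can be checked with a small simulation. This is a simplified sketch, not Raft itself: the `Server` class and `run_election` function are hypothetical, and the sketch leaves out the parts of a real RequestVote handler that the text does not discuss (term comparison, log up-to-date checks, timeouts).

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.voted_for = {}   # term -> candidate that received this server's vote

    def request_vote(self, term, candidate):
        # First come, first served: the first candidate asking in a term gets
        # the vote, and every later candidate in that term is refused.
        granted_to = self.voted_for.setdefault(term, candidate)
        return granted_to == candidate


def run_election(servers, term, requests):
    """requests: (server, candidate) pairs in arrival order.

    Returns the candidates that collected a strict majority of votes."""
    votes = {}
    for server, candidate in requests:
        voters = votes.setdefault(candidate, set())
        if server.request_vote(term, candidate):
            voters.add(server.name)
    majority = len(servers) // 2 + 1
    return [c for c, voters in votes.items() if len(voters) >= majority]
```

Whatever order the requests arrive in, the returned list has at most one entry. It can also be empty (a split vote), in which case no Leader is elected for that term.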

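Google's Three Major Papers (Chinese Translation)

key, use certain attributes of the web page as column names, and store the page contents in the "contents:" column, with the timestamp at which the page was fetched serving as
the identifier (alex's note: that is, multiple versions of the page data are stored, one per fetch time), as shown in Figure 1.
Figure 1: A slice of an example table that stores Web pages. The row name is a reversed URL. The contents column family holds the page contents, and the anchor column family holds the text of the anchors that reference the page (alex's note: if you do not know HTML's
control the locality of their data. Finally, BigTable schema parameters let clients control whether data is kept in memory or on disk.
Section 2 describes the data model in more detail; Section 3 gives an overview of the client API; Section 4 briefly describes
the underlying Google infrastructure that BigTable depends on; Section 5 describes the key parts of the BigTable implementation; Section 6 describes what we did to
is anchor; each column key in this family represents one anchor, as shown in Figure 1. The qualifier in the anchor family is the name of the site that references the
page; each cell in the anchor family holds the link text.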
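
The Webtable fragments above describe cells addressed by a row key, a column key, and a timestamp. A toy in-memory version of that addressing scheme is sketched below. The `TinyTable` class and its methods are hypothetical and bear no relation to BigTable's real client API, and the row, column, and value strings in the usage note are illustrative only.

```python
from collections import defaultdict


class TinyTable:
    """Toy map from (row key, column key, timestamp) to a value."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._cells.get((row, column), {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)
```

For example, after `t.put("com.cnn.www", "contents:", 3, "<html>v1")` and `t.put("com.cnn.www", "contents:", 5, "<html>v2")`, a call to `t.get("com.cnn.www", "contents:")` returns the newer version, while passing `timestamp=3` returns the older one. A column such as `"anchor:cnnsi.com"` would hold the link text under the referring site's name as qualifier.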
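Access control and both disk and memory accounting are performed at the column-family level. In our Webtable example, these controls
let us manage different types of applications: some applications may add new base data, some may read base data
dynamically control the layout and format of their data (alex's note: that is, as far as BigTable is concerned the data has no format; in database terms,
the data has no schema, and users define the schema themselves), and users can also reason about (alex's note: "reason about" in the original)
the locality of the data in the underlying storage (alex's note: locality can be understood like this: in a tree structure, for example, data sharing the same prefix is
Rows
Row keys in a table are arbitrary strings (currently up to 64KB in size, although for most users 10-100 bytes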

Chinese-English Parallel Translation of Scientific Literature

Sensing Human Activity: GPS Tracking

Stefan van der Spek 1,*, Jeroen van Schaick 1, Peter de Bois 1,2 and Remco de Haan 1

Abstract: The enhancement of GPS technology enables the use of GPS devices not only as navigation and orientation tools, but also as instruments used to capture travelled routes: as sensors that measure activity on a city scale or the regional scale. TU Delft developed a process and database architecture for collecting data on pedestrian movement in three European city centres, Norwich, Rouen and Koblenz, and in another experiment for collecting activity data of 13 families in Almere (The Netherlands) for one week. The question posed in this paper is: what is the value of GPS as 'sensor technology' measuring activities of people? The conclusion is that GPS offers a widely useable instrument to collect invaluable spatial-temporal data on different scales and in different settings adding new layers of knowledge to urban studies, but the use of GPS-technology and deployment of GPS-devices still offers significant challenges for future research.

Abstract (Chinese version, rendered in English): Enhanced GPS technology supports the use of GPS devices not only as navigation and positioning tools but also as instruments for capturing travelled routes: as sensors that measure activity at the scale of a city or a region.

Foreign-Language Reference Literature for Automation Majors

1 Original Foreign Text

A: Fundamentals of Single-chip Microcomputer

The single-chip microcomputer is the culmination of both the development of the digital computer and the integrated circuit, arguably the two most significant inventions of the 20th century [1].

These two types of architecture are found in single-chip microcomputer. Some employ the split program/data memory of the Harvard architecture, shown in Fig.3-5A-1, others follow the philosophy, widely adapted for general-purpose computers and microprocessors, of making no logical distinction between program and data memory as in the Princeton architecture, shown in Fig.3-5A-2.

In general terms a single-chip microcomputer is characterized by the incorporation of all the units of a computer into a single device, as shown in Fig.3-5A-3.

Fig.3-5A-1 A Harvard type
Fig.3-5A-2 A conventional Princeton computer
Fig.3-5A-3 Principal features of a microcomputer

Read only memory (ROM). ROM is usually for the permanent, non-volatile storage of an applications program. Many microcomputers and microcontrollers are intended for high-volume applications and hence the economical manufacture of the devices requires that the contents of the program memory be committed permanently during the manufacture of chips. Clearly, this implies a rigorous approach to ROM code development since changes cannot be made after manufacture. This development process may involve emulation using a sophisticated development system with a hardware emulation capability as well as the use of powerful software tools.

Some manufacturers provide additional ROM options by including in their range devices with (or intended for use with) user programmable memory. The simplest of these is usually a device which can operate in a microprocessor mode by using some of the input/output lines as an address and data bus for accessing external memory. This type of device can behave functionally as the single chip microcomputer from which it is derived, albeit with restricted I/O and a modified external circuit. The use of these ROMless devices is common even in production circuits where the volume does not justify the development costs of custom on-chip ROM [2]; there can still be a significant saving in I/O and other chips compared to a conventional microprocessor based circuit. More exact replacement for ROM devices can be obtained in the form of variants with 'piggy-back' EPROM (Erasable programmable ROM) sockets or devices with EPROM instead of ROM.

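Google Spanner (Chinese Edition)

Translator: Lin Ziyu, faculty member, Department of Computer Science, Xiamen University
Translation date: September 2012
E-mail: ziyulin@ Homepage: /linziyu

[Abstract] Spanner is a scalable, multi-version, globally distributed, synchronously replicated database developed by Google.

It is the first system to distribute data at global scale and support externally consistent distributed transactions.

This paper describes Spanner's architecture, its features, the rationale behind various design decisions, and a new time API that exposes clock uncertainty.

This API and its implementation are critical to supporting external consistency and a number of powerful features: non-blocking reads, lock-free read-only transactions, and atomic schema changes.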
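
To make "a time API that exposes clock uncertainty" concrete, here is a minimal sketch of an interval-returning clock. The method names `now`, `after`, and `before` follow the TrueTime interface in the Spanner paper. The rest is assumed for illustration: the class name `TrueTimeSketch`, the fixed error bound `epsilon`, and the use of the local system clock are hypothetical, and the real system derives its bound from GPS and atomic-clock time masters.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float   # true time is guaranteed to be no earlier than this
    latest: float     # ... and no later than this


class TrueTimeSketch:
    def __init__(self, epsilon=0.005, clock=time.time):
        self.epsilon = epsilon    # assumed constant uncertainty, in seconds
        self._clock = clock       # injectable clock, convenient for testing

    def now(self) -> TTInterval:
        # Return an interval instead of a single instant.
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet.
        return self.now().latest < t
```

The design point is that callers never see a single timestamp they might wrongly trust. A caller that needs to be sure a timestamp is in the past waits until `after(t)` becomes true.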

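[Keywords] Google Spanner, Bigtable, distributed database

[Table of contents]
1. Introduction
2. Implementation
2.1 Spanserver software stack
2.2 Directories and placement
2.3 Data model
3. TrueTime
4. Concurrency control
4.1 Timestamp management
4.2 Details
5. Evaluation
5.1 Microbenchmarks
5.2 Availability
5.3 TrueTime
5.4 F1
6. Related work
7. Future work
8. Conclusions
Acknowledgements
References

1 Introduction

Spanner is a scalable, globally distributed database designed, built, and deployed at Google.

At the highest level of abstraction, Spanner is a database that shards data across many Paxos [21] state machines located in datacenters spread all over the world.

Replication is used for global availability and geographic locality.

Clients automatically fail over between replicas.

As the data and the set of servers change, Spanner automatically reshards the data, so that it can cope with load changes and failures.

Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows.

Applications can use Spanner for high availability: by replicating data within a continent and across continents, they keep the data available even in the face of wide-area natural disasters.

Our initial customer was F1 [35], a rewrite of Google's advertising backend.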

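Graduation Thesis: A Cloud Disaster-Recovery Storage System Based on HDFS: Research on Reliable Storage and Load-Balancing Methods

2013-8-15 17:54:55

Graduation project thesis
Title: A Cloud Disaster-Recovery Storage System Based on HDFS: Research on Reliable Storage and Load-Balancing Methods
School (Department): Computer Science and Technology
Major: Network Engineering
Class: 2012

Abstract

With the development of computer and Internet technology, data has become an important resource for modern enterprises and for every individual. Lost or stolen data causes heavy losses, so the secure storage and backup of data is especially important.

This thesis designs a Hadoop-based cloud disaster-recovery storage system for storing data.

The work creates a Hadoop distributed file system on Linux virtual machines, and the distributed system manages and backs up users' data.

The distributed system consists of one namenode and multiple datanodes: the namenode manages how data is stored, and the datanodes are responsible for physically storing it.

To keep a namenode failure from bringing down the system, a secondary namenode must be configured as redundancy; it periodically processes and saves the namenode's system log.

For safe backup, data must be copied into multiple replicas stored on multiple datanodes.

The system must support not only the storage of massive amounts of data but also the management of a massive number of users.

To keep any one datanode from becoming overloaded and causing excessive latency for user operations, the load across datanodes must also be balanced, so that a massive number of users can access HDFS smoothly at the same time.

In this thesis, safe backup is achieved by configuring multiple datanodes and setting, on the namenode, the number of replicas to keep for each file, so that user data is stored in multiple copies on different servers.

The namenode uses a sorted table to decide which datanode responds when a user accesses data. The sorted table lets the most lightly loaded datanode respond first, which balances the load across the datanodes.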
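
The sorted-table idea can be sketched as follows. This is an illustration of the selection rule only, not the thesis's implementation and not HDFS's actual scheduling code: the function names, the `load` dictionary (open requests per datanode), and the tie-break by node name are all assumptions made for the example.

```python
def pick_datanode(replicas, load):
    """Return the replica-holding datanode with the lowest current load.

    replicas: names of the datanodes that store a copy of the block
    load:     dict mapping datanode name -> number of requests in progress
    """
    # Sort by load first; break ties by name so the choice is deterministic.
    ranked = sorted(replicas, key=lambda node: (load[node], node))
    return ranked[0]


def serve(block_replicas, load, block):
    """Route one read of `block` and record the extra load on the chosen node."""
    node = pick_datanode(block_replicas[block], load)
    load[node] += 1
    return node
```

With three equally loaded replicas, three consecutive reads of the same block land on three different datanodes, because each read raises the load of the node that served it.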

Keywords: Hadoop; cloud disaster recovery; reliable storage; load balancing

ABSTRACT

With the development of computer technology and Internet technology, information has become an important resource for modern enterprises and for every individual, so lost or stolen data brings significant losses. Secure storage and backup of data is particularly important. This paper designs a cloud disaster-recovery storage system based on Hadoop to store data. The paper uses Linux virtual machines to create a Hadoop distributed file system, and the distributed system manages and backs up the user's data. The distributed system consists of a namenode and multiple datanodes: the namenode manages the storage of data, and the datanodes are responsible for the physical storage of data. To prevent a namenode failure from leading to the collapse of the system, we configure a secondary namenode as the namenode's redundancy, which regularly processes and saves the system log. For secure backup, data is replicated into multiple copies stored on multiple datanodes. The system must support not only mass data storage but also the management of a mass of users. To prevent the overloading of a datanode, which would make the delay of user operations too large, we must handle load balancing across the datanodes so that access by many users is simultaneously smooth. This paper configures multiple datanodes and sets on the namenode the number of copies to save for each file to achieve secure backup; user data is stored as multiple copies on different servers. The namenode controls which datanode serves a user's access through a sorting table. This sorting table lets the low-loaded datanode respond to user access first, so as to achieve load balancing of all datanodes.

Key words: Hadoop; cloud disaster recovery; reliable storage; load balancing

Contents
1 Introduction
1.1 Research background
1.2 State of research
1.3 Main work of the thesis
1.4 Organization and structure of the thesis
2 Hadoop background
2.1 Data storage and analysis
2.2 Development and current status of Hadoop
2.3 Advantages of Hadoop for data disaster recovery
2.4 The Hadoop distributed file system
2.4.1 The design of HDFS
2.4.2 Blocks
2.4.3 Namenodes and datanodes
2.5 The command-line interface
2.6 Hadoop file systems
3 Building a Hadoop cluster
3.1 Cluster description
3.2 Setting up and installing the cluster on Linux
3.2.1 Installing Linux
3.2.2 Enabling SSH and passwordless login
3.2.3 Configuring the Java environment
3.2.4 Installing Hadoop
3.3 Configuration file settings
3.3.1 Configuration management
3.3.2 Hadoop configuration files
3.4 Running the Hadoop cluster
4 Reliable storage in HDFS
4.1 The secondary namenode
4.2 Redundant backup on datanodes
4.3 Network partition
5 Load balancing in HDFS
5.1 Overview
5.2 The importance of load balancing
5.3 Implementing load balancing in HDFS
6 Testing the Hadoop cluster
6.1 Testing that Hadoop runs
6.2 Uploading local files
6.3 Downloading files
7 Conclusion
7.1 Summary of work
7.2 Lessons learned
7.3 Further improvements
References
Afterword
Appendix 1 Foreign-literature translation (translated text)
Appendix 2 Foreign-literature translation (English original)

1 Introduction

1.1 Research background

The rapid growth of the Internet has made computers an indispensable tool for individuals and enterprises. In daily life, work, and study, computers bring convenient and efficient applications, yet all of them depend on data. People no longer record data only as written text, which is too inefficient, and instead need computers to store it.

The_Part-Time_Parliament (Chinese Translation of the Paxos Algorithm)

1.1. The Island of Paxos
1.2. Requirements
A modern parliament could employ secretaries to record its every action, but in Paxos no one was willing to remain in the chamber through every session to serve as a secretary. Instead, each legislator maintained a ledger in which he recorded the sequence of decrees that had been passed, each decree carrying a number. For example, the ledger of legislator Λ (translator's note: because ancient Greek letters are hard to type, each Greek name in the original is replaced by one of its letters)
2.1. Mathematical Results
2.2. The Preliminary Protocol
Our knowledge of the Paxon Parliament is therefore fragmentary. Although the basic protocol is known, we are ignorant of many details, and these are precisely what interest us, so I will take the liberty of speculating on what the Paxons might have done in those details.

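Computer Science English, Translation 6 (Chinese)

Chapter 6 Databases

Part 1 Reading and Translation

Section A Introduction to Distributed Databases

A distributed database is a database under the control of a central database management system in which the storage devices are not all attached to a common CPU.

(1) It may be stored on multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers.

Collections of data (in a database) can be distributed across multiple physical locations.

A distributed database is divided into separate fragments.

Each fragment of a distributed database may be replicated (that is, kept as redundant fail-over copies, in the manner of a redundant array of independent disks).

Besides replication and fragmentation, there are many other design techniques for distributed databases.

Examples are local autonomy and synchronous and asynchronous distributed database technologies.

How these techniques are implemented depends on the needs of the business and on the sensitivity or confidentiality of the data, and hence on the price the business is willing to pay for data security, consistency, and integrity.

[1] Basic architecture

Database users access the distributed database through:

● Local applications: applications that do not require data from other sites.

● Global applications: applications that do require data from other sites.

Important considerations

A distributed database must satisfy the following requirements:

● The distribution is transparent: users must be able to interact with the system as if it were one logical system.

This applies to the system's performance and methods of access, among other things.

● Transactions are transparent: each transaction must maintain correctness across multiple databases.

Each transaction is also divided into parts, each of which operates on one database system.

Advantages of distributed databases

● Reflects organizational structure: database fragments are located in the departments they relate to.

● Local autonomy: a department can control the data about itself (because it is the one most familiar with that data).

● Improved availability: a fault in one database system affects only one fragment rather than the entire database.

● Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, so the load on the database can be balanced among servers.

(2) (A high load on one module of a distributed database does not affect the other modules of the database.)

● Economics: it costs less to build a network of smaller computers with the power of a single large computer.

● Modularity: systems can be modified, added to, and removed from the distributed database without affecting other modules (systems).

Disadvantages of distributed databases

● Complexity: the DBAs must do extra work to ensure that the distributed nature of the system is transparent.

Extra work must also be done to maintain multiple disparate systems instead of one big one.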


18. PowerDrill: Processing a Trillion Cells per Mouse Click
19. Google-Wide Profiling:A Continuous Profiling Infrastructure for Data Centers
20. Spanner: Google’s Globally-Distributed Database
21. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
22. The Datacenter as a Computer
Google Papers Translation Collection (compilation)
II. Distributed Theory Series
00. Appraising Two Decades of Distributed Computing Theory Research
0. Translator's Preface to the Distributed Theory Series
1. A brief history of Consensus_ 2PC and Transaction Commit (translated)
2. The Byzantine Generals Problem (translated) --Leslie Lamport
3. Impossibility of distributed consensus with one faulty process (translated)
4. Leases: the lease mechanism (translated)
5. Time Clocks and the Ordering of Events in a Distributed System (translated) --Leslie Lamport
6. The Part Time Parliament (translated, reposted) --Leslie Lamport
7. How to Build a Highly Available System Using Consensus
8. Paxos Made Simple (translated) --Leslie Lamport
9. Fast Paxos --Leslie Lamport
10. Paxos Made Live - An Engineering Perspective (translated)
11. Paxos made code - Implementing a high throughput Atomic Broadcast
12. Distributed Snapshots: Determining Global States of a Distributed System --Leslie Lamport
13. Virtual Time and Global States of Distributed Systems
14. Timestamps in Message-Passing Systems That Preserve the Partial Ordering
15. Fundamentals of Distributed Computing:A Practical Tour of Vector Clock Systems
16. Unreliable Failure Detectors for Reliable Distributed Systems
17. Wait-Free Synchronization
18. Knowledge and Common Knowledge in a Distributed Environment
19. Uniform consensus is harder than consensus
III. Database Theory Series
0. A Relational Model of Data for Large Shared Data Banks --E.F.Codd 1970
1. SEQUEL: A Structured English Query Language 1974
2. Implementation of a Structured English Query Language 1975
3. System R: Relational Approach to Database Management 1976
4. Granularity of Locks and Degrees of Consistency in a Shared DataBase --Jim Gray 1976
5. Access Path Selection in a RDBMS 1979
6. The Transaction Concept: Virtues and Limitations --Jim Gray
7. 2PC (two-phase commit): Notes on Data Base Operating Systems --Jim Gray
8. 3PC (three-phase commit): Nonblocking Commit Protocols
9. Life beyond Distributed Transactions:an Apostate’s Opinion
10. A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem --Jim Gray
11. Consensus on Transaction Commit --Jim Gray & Leslie Lamport
12. What Goes Around Comes Around - Michael Stonebraker, Joseph M. Hellerstein
13. A Formal Model of Crash Recovery in a Distributed System - Skeen, D. Stonebraker
14. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging-1992
IV. Large-Scale Storage and Computing (NoSQL Theory Series)
0. Towards Robust Distributed Systems: Brewer's 2000 PODC keynote
1. CAP theorem
2. Harvest, Yield, and Scalable Tolerant Systems
IX. Other
On Computable Numbers with an Application to the Entscheidungsproblem-1936.5.28-A.M.Turing
The First Draft Report on the EDVAC-1945.6.30-John von Neumann
Reflections on Trusting Trust --Ken Thompson
Who Needs an Architect?
Go To statements considered harmful --Edsger W.Dijkstra
No Silver Bullet Essence and Accidents of Software Engineering --Frederick P. Brooks
When reposting, please credit the author: phylips@bmy 2011-4-30
Source: /blog/static/709717672011330101333271/ Another recommended related article: /html/1647.html
Most of the papers on the two lists are the same, though each list also has a few of its own.