Categories and Subject Descriptors B.7.2 [Design Aids] Algorithms General Terms

合集下载

IE论文——精选推荐

A virtual network laboratory for learning IP networkingLluís FàbregaJordi MassaguerTeodor JovéDavid MéridaInstitut d’Informàtica i Aplicacions (IIiA)Universitat de Girona (UdG)Lluís Santaló Av., 17071 Girona, SPAIN+34 972418475{ fabrega | jmassa | teo | dmerida }@eia.udg.esABSTRACTIn this paper, a network laboratory for distance learning of basic concepts in IP networking is presented. Through a web interface,students can choose various configurations of a real private network (number of routers and subnetworks, use of IPv4/IPv6,etc.), and learn about them using Linux network commands. The utilization of the virtual laboratory and how it is implemented are described.Categories and Subject DescriptorsK.3.1 [Computers and Education ]: Computer uses in education–distance learning.General TermsDesign, Experimentation.KeywordsRemote laboratory, networking.1. INTRODUCTIONThe possibilities offered by the use of Internet in teaching activities are increasingly important. However, the physical presence of students in laboratories is required when the subject has a practical component, and this makes distance learning more difficult. Virtual laboratories can be used to overcome this situation.We have built a virtual network laboratory for distance learning of IP networking concepts such as IP addressing, routing tables,address resolution between IP and Ethernet, and the combined use of IPv4 and IPv6. Students access the virtual laboratory through a web interface and can change the network configuration by choosing one of the available preset configurations. Then, using Linux network commands, they can learn about these configurations and how to test the network.The virtual network laboratory is a private IP over an Ethernet network. It consists of several PCs (with one or more Ethernet cards) connected through a configurable Ethernet switch. One ofthese PCs, which is connected to the Internet, runs the web server and performs the different network configurations upon receiving a student’s request.The paper is organized as follows. In section 2 we describe the user interface of the virtual laboratory and some examples of lab classes to show how it can be used. Section 3 deals with the implementation of the virtual laboratory, the composition of the physical network and the remote configuration method used for the switch and for IPv4 and IPv6 in the PCs. Conclusions and future work conclude the paper.2. LEARNING IN THE VIRTUAL LABThe virtual network uses IP over an Ethernet and consists of four nodes. These nodes can be grouped to build IP subnetworks in different topologies, so they act as a host or router depending on the topology. The student can choose between four available topologies (see Figure 1).Our objective was to build a tool whereby remote students could learn the basic concepts of IP networking and related Linux network commands. We want them to learn the following:-IP addresses within an IP subnetwork share a common prefix that defines the subnetwork address.-The composition of the routing tables in hosts and routers,which defines the next node to send a packet depending on the IP subnetwork where the destination host belongs.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.ITiCSE’02, June 24-26, 2002, Aarhus, DenmarkCopyright 2002 ACM 1-58113-499-1/02/0006…$5.00.Figure 1. The configured network topologies.host routerD subnet A BCDsubnet subnet C A B 12B subnet subnet subnetC A D34Dsubnet subnet subnet B C A-Address resolution between Ethernet and IP.-The main differences between the addressing scheme in IPv4 and IPv6.-The issues derived from the combined utilization of IPv6and IPv4, and the need for tunneling that allows hosts to send IPv6 packets through IPv4 networks.-The Linux commands for network administration such as ifconfig for interface addressing, route for the routing tables,traceroute for the routing path between two nodes, ping for checking connectivity, arp for the address resolution table,and others.In a lab class the student first chooses one of the available network configurations and then studies it by using Linux network commands. Section 2.1 describes the user interface and section 2.3 shows some examples of lab classes.2.1 The user interfaceStudents access the virtual laboratory through a web interface after an authentication phase. Then they choose one of the following options, “topologies”, “protocols”, “tunneling”,“commands” or “exit”:-The “topologies” option allows the student to choose one of several network topologies that differ in the number of IP subnetworks (1, 2 or 3) and therefore in the number of routers (see Figure 1).-Using “protocols”, the student can choose to configure each network node either with only IPv4 or with both IPv4 and IPv6. Predefined addresses are assigned to each node.-Using “tunneling”, the student can choose one of several network configurations where some nodes use IPv4 and others IPv6, which have configured the required tunnels to work properly (see Figure 2).-With “commands”, the student can choose any of the nodes of the network and then study how it is configured by using Linux network commands. The student is not allowed to change any configuration parameter, but can test it.2.2 Some examples of lab classesThe basic scheme of a lab class is first to choose one of the network configurations using the appropriate options (“topology”,“protocols”, and “tunneling”), and second, to study it using Linux network commands (“commands” option).A first example is the study of IP addressing and routing tables using the 4th topology in Figure 1 (three subnetworks and one router) with IPv4 in all the nodes. This results in the configuration of IP addresses and the routing table in each node. Then, using the ifconfig command, students can see the IP addressing used, by using route , the contents of the routing table, and by using traceroute , the routing path between two nodes.A second example is the study of the address resolution between IP and Ethernet using the third topology in Figure 1 (three subnetworks and two routers) with IPv4 in all nodes. The use of arp in each node allows the student to see the contents of the ARP (Address Resolution Protocol) table listing IP and Ethernet addresses. Other commands like ping , traceroute , or route , would be also interesting for studying the configuration.The third example is the study of the combined use of IPv4 and IPv6 nodes. Here the chosen topology is the third one in Figure 1,IPv4 in the B and C nodes and IPv4/IPv6 in the A and D nodes.First, no tunnel configuration is selected. Then, students can see that there is no virtual interface created in the A and D nodes by using the ifconfig command, that the routing table is wrong by using route , and that there is no connectivity between the A and D nodes by using ping . After that, the tunnel configuration number 2(see Figure 2) is chosen and the network configuration is tested again to check the right behavior.3. IMPLEMENTATION 3.1 EquipmentThe network of the virtual laboratory consists of four PCs running the Linux operating system (kernel 2.2.12-20, RedHat 6.1distribution) and an Ethernet switch (Cisco Catalyst 2920XL with 24 ports). Each PC has one or more Ethernet cards connected to the switch as shown in Figure 3. It is a private network that is accessed from the Internet through the node A, which has two Ethernet cards, one connected to the private network and the other to the Internet. Node A does not forward packets from or to the private network and the Internet.The Ethernet switch is used to build the subnetworks of each topology. Through its configuration, each switch port is assignedFigure 2. The configured tunnels.123Figure 3. The physical network and the groups of portsfor the third topology.ABCDEthernet switchto a specific group of ports. The different groups are isolated, i.e. the switch does not provide connectivity between two nodes that belong to different groups of ports. In this way, “virtual” Ethernet subnetworks are built, as if the nodes of each subnetwork were physically connected to different Ethernet switches. The groups of ports for creating the third topology (see Figure 1) are shown in Figure 3.3.2The remote configurationThe remote configuration of the PCs and the Ethernet switch is made through a web server (Apache release 1.3.9 [1]) running in node A. Using a CGI (Common Gateway Interface), the web server executes different UNIX scripts files. In turn these scripts execute a Telnet client that connects to the Telnet server running in the target device (PC or switch), and then executes the necessary commands to configure it. The script files use the expect scripts language [2] that allows CGIs to interact with the Telnet client, so an interpreter of this language needs to be installed in node A. This is the usual method for the configuration. The next sections focus on the configuration of the switch and the nodes.3.3The switch configuration and topologies The four different network topologies (see Figure 1) are built by configuring the (virtual) subnetworks in the switch. The number of subnetworks varies from one to three, as does the number of hosts that belong to a subnetwork. Note that only one of the two interfaces of node A can belong to the different network topologies, while the other interface is always connected to the Internet.The switch configuration is made via Telnet by using some specific commands [4] for assigning ports to (virtual) subnetworks. As explained in section 3.2, this is carried out by expect scripts files, which use these commands for each configuration. An example file is shown in Figure 4. First a Telnet session is initiated and then the commands for assigning ports to subnetworks are executed.3.4IPv4 and IPv6 in the nodesThe user can choose a topology and the IP protocol version for each node, i.e. only IPv4 or both IPv4 and IPv6 (there are no only IPv6 nodes). IPv4 is the default option. According to the selected option and topology, a predefined IP address (IPv4 and/or IPv6) is assigned to each network interface of the node depending on the subnetwork it belongs to. The corresponding routing table of the node is then configured. Moreover, nodes with interfaces in different subnetworks act as IP routers, so packet forwarding must be enabled in them.Linux systems configure network aspects at the boot time by executing several UNIX scripts files and using the information written in several configuration files [3]. All nodes are dual, i.e. both IP versions are installed on them. In order to configure IPv6, we have added some specific UNIX script files and configuration files [5], which are the equivalents to the installed files for IPv4. The configuration files for IPv4 are the following: ifcfg-eth0, for information about the Ethernet interface 0, such as the assigned IPv4 address (and the corresponding files for each interface); the file static-routes, for adding static routes to the routing table; the file network, for enabling or disabling networking and forwarding; the file hosts, with the table that relates DNS names and IP addresses. Network configuration is made at the boot time by executing the script network.The configuration files for IPv6 are the following: network-ip6, for enabling or disabling networking and forwarding, and also for defining the files names where there is the information about the interfaces, the routing and the tunnels; the file network-ip6.conf (or the file name defined in network-ip6), for the interfaces (equivalent to ifcfg-ethx files in IPv4) and for the static routes of the routing table (equivalent to the static-route file in IPv4); the file tunnels.conf (or the file name defined in network-ip6) for setting up virtual interfaces that define the IPv6/IPv4 tunnels (see next section); the file hosts, with the table relating Domain Name Services (DNS) names and IPv6 addresses (the same file as in IPv4). Network configuration is performed at the boot time by executing the script network-ip6.init.For each one of the different PCs’ configurations we have created the corresponding specific files. For example, for the file ifcfg-eth0, we have created the files 1.ifcfg-eth0, 2.ifcfg-eth0 and 3.ifcfg-eth0, corresponding to network topologies 1, 2 and 3 respectively, and using IPv4. Remote configuration is performed by replacing these files (e.g., ifcfg-eth0 by 1.ifcfg-eth0), and then enforcing the system to a network reconfiguration (e.g., executing the script network). As explained in section 3.2, this is done through expect scripts via Telnet.3.5IPv4 and IPv6 interconnectionIn order to provide connectivity between IPv6 nodes through a path that involves IPv4 nodes, it is necessary to set up tunnels between them, i.e. IPv6 packets are encapsulated in IPv4 packets and IPv6 addresses are mapped to IPv4 addresses at the sending point [6]. This operation is undone at the receiving point. Tunnels are created between two nodes by configuring a network virtual interface at both ends. The interface encapsulates and sends the packets to a real interface (i.e. eth0), and it is used in the IPv6 routing tables for a specific route. In Figure 2 three examples of tunnels are shown:-In the first one, there are three IPv6 subnetworks and two IPv6 routers. The second subnetwork uses IPv6 over IPv4.A tunnel is created between theB andC routers for therouting path that connects the first and the third subnetworks.-In the second example there is a single IPv6 subnetwork with IPv6 over IPv4. A tunnel is created between the A andD hosts.-In the third example there are three IPv6 subnetworks and two IPv6 routers. The first subnetwork uses IPv6 over IPv4.A tunnel is created between A andB for the routing paththat connects the first subnetwork with the other ones.3.6Remote commands executionOur platform allows the students to execute commands in a remote way at any of the nodes, such as ifconfig, route,and traceroute. These commands are executed in a Telnet session as a non-root user using expect scripts. As mentioned in section 2.1, students can test any configuration parameter but they are not allowed to change them.4.CONCLUSIONS AND FUTURE WORK We have built a virtual network laboratory for learning basic concepts of IP networking and have shown how students can use it. Using a web interface, students can choose one of the available configurations and study them by using (and learning) the main Linux network commands.There are several areas for future work. One is to allow students not only to choose and study predefined configurations, but also to build it on demand. In this way students would create their own topologies, assign IP addresses, and configure the routing tables (graphically and/or through Linux commands).Another interesting possibility is using the Simple Network Management Protocol (SNMP) for configuration of the network laboratory. We are also planning to add an enhanced user management (e.g. session scheduling, the ability of saving the configuration for recovery in a future session), and a network monitor for capturing Ethernet packets in order to study TCP/IP protocols, such as e-mail, web, DNS and others.5.ACKNOWLEDGMENTSThis study was partially supported by the CICYT (Spanish Education Ministry) contract TEL-98-0408-C02-01 and the Acciones Integradas program ref. HI1998-0032.6.REFERENCES[1]The Apache Software Foundation. HTTP Apache Server,[2]Don Libes. Exploring Expect: A Tcl-Based Toolkit forAutomating Interactive Programs. O’Reilly and Associates(1995)[3]Richard Petersen. Linux - Manual de Referencia.Osborne/MacGraw-Hill (1997)[4]Cisco Systems, Inc. Cisco IOS Desktop Switching SoftwareConfiguration Guide – Catalyst 2900 Series XL Cisco IOSRelease 11.2(8)SA6 (1999)[5]Peter Bieringer. Linux: IPv6.http://www.bieringer.de/linux/IPv6/index.html[6]Christian Huitema. IPv6 – The new Internet Protocol.Prentice Hall (1998)。

Mesh 医学主题词表

Mesh 医学主题词表The following is a table of the categories and subcategories of the tree structure chart:A。

AnatomyA1.Body ns___A3.Digestive SystemA4.Respiratory SystemA5.Urogenital SystemA6.Endocrine SystemA7.vascular SystemA8.Nervous SystemA9.Sense OrgansA10.TissuesA11.CellsA12.Fluids and nsA13.Animal Structures___A15.___A16.Embryonic StructuresB。

OrganismsB1.InvertebratesB2.___B3.BacteriaB4.VirusesB5.Algae and FungiB6.PlantsB7.ArchaeaC。

Diseases___C2.Virus DiseasesC3.Parasitic DiseasesC4.NeoplasmsC5.Musculoskeletal Diseases C6.Digestive System Diseases ___C8.Respiratory Tract Diseases___C10.Nervous System DiseasesC11.Eye Diseases______ of the tree structure chart。

The categories are Anatomy。

Organisms。

and Diseases。

with ___ each。

The Anatomy category includes Body ns。

Musculoskeletal System。

Digestive System。

Respiratory System。

基于屏幕空间的泊松表面重建

Screened Poisson Surface ReconstructionMICHAEL KAZHDANJohns Hopkins UniversityandHUGUES HOPPEMicrosoft ResearchPoisson surface reconstruction creates watertight surfaces from oriented point sets.In this work we extend the technique to explicitly incorporate the points as interpolation constraints.The extension can be interpreted as a generalization of the underlying mathematical framework to a screened Poisson equation.In contrast to other image and geometry processing techniques,the screening term is deﬁned over a sparse set of points rather than over the full domain.We show that these sparse constraints can nonetheless be integrated efﬁciently.Because the modiﬁed linear system retains the sameﬁnite-element discretization,the sparsity structure is unchanged,and the system can still be solved using a multigrid approach. Moreover we present several algorithmic improvements that together reduce the time complexity of the solver to linear in the number of points, thereby enabling faster,higher-quality surface reconstructions.Categories and Subject Descriptors:I.3.5[Computer Graphics]:Compu-tational Geometry and Object ModelingAdditional Key Words and Phrases:screened Poisson equation,adaptive octree,ﬁnite elements,surfaceﬁttingACM Reference Format:Kazhdan,M.,and Hoppe,H.Screened Poisson surface reconstruction. ACM Trans.Graph.NN,N,Article NN(Month YYYY),PP pages.DOI=10.1145/XXXXXXX.YYYYYYY/10.1145/XXXXXXX.YYYYYYY1.INTRODUCTIONPoisson surface reconstruction[Kazhdan et al.2006]is a well known technique for creating watertight surfaces from oriented point samples acquired with3D range scanners.The technique is resilient to noisy data and misregistration artifacts.However, as noted by several researchers,it suffers from a tendency to over-smooth the data[Alliez et al.2007;Manson et al.2008; Calakli and Taubin2011;Berger et al.2011;Digne et al.2011].In this work,we explore modifying the Poisson reconstruc-tion algorithm to incorporate positional constraints.This mod-iﬁcation is inspired by the recent reconstruction technique of Calakli and Taubin[2011].It also relates to recent work in im-age and geometry processing[Nehab et al.2005;Bhat et al.2008; Chuang and Kazhdan2011],in which a dataﬁdelity term is used to“screen”the associated Poisson equation.In our surface recon-struction context,this screening term corresponds to a soft con-straint that encourages the reconstructed isosurface to pass through the input points.The approach we propose differs from the traditional screened Poisson formulation in that the position and gradient constraints are deﬁned over different domain types.Whereas gradients are constrained over the full3D space,positional constraints are introduced only over the input points,which lie near a2D manifold. We show how these two types of constraints can be efﬁciently integrated,so that we can leverage the original multigrid structure to solve the linear system without incurring a signiﬁcant overhead in space or time.To demonstrate the beneﬁts of screening,Figure1compares results of the traditional Poisson surface reconstruction and the screened Poisson formulation on a subset of11.4M points from the scan of Michelangelo’s David[Levoy et al.2000].Both reconstructions are computed over a spatial octree of depth10,corresponding to an effective voxel resolution of10243.Screening generates a model that better captures the input data(as visualized by the surface cross-sections overlaid with the projection of nearby samples), even though both reconstructions have similar complexity(6.8M and6.9M triangles respectively)and required similar processing time(230and272seconds respectively,without parallelization).1 Another contribution of our work is to modify both the octree structure and the multigrid implementation to reduce the time complexity of solving the Poisson system from log-linear to linear in the number of input points.Moreover we show that hierarchical point clustering enables screened Poisson reconstruction to attain this same linear complexity.2.RELA TED WORKReconstructing surfaces from scanned points is an important and extensively studied problem in computer graphics.The numerous approaches can be broadly categorized as follows. Combinatorial Algorithms.Many schemes form a triangula-tion using a subset of the input points[Cazals and Giesen2006]. Space is often discretized using a tetrahedralization or a voxel grid,and the resulting elements are partitioned into inside and outside regions using an analysis of cells[Amenta et al.2001; Boissonnat and Oudot2005;Podolak and Rusinkiewicz2005], eigenvector computation[Kolluri et al.2004],or graph cut [Labatut et al.2009;Hornung and Kobbelt2006].Implicit Functions.In the presence of sampling noise,a common approach is toﬁt the points using the zero set of an implicit func-tion,such as a sum of radial bases[Carr et al.2001]or piecewise polynomial functions[Ohtake et al.2005;Nagai et al.2009].Many techniques estimate a signed-distance function[Hoppe et al.1992; 1The performance of the unscreened solver is measured using our imple-mentation with screening weight set to zero.The implementation of the original Poisson reconstruction runs in412seconds.ACM Transactions on Graphics,V ol.VV,No.N,Article XXX,Publication date:Month YYYY.2•M.Kazhdan and H.HoppeFig.1:Reconstruction of the David head ‡,comparing traditional Poisson surface reconstruction (left)and screened Poisson surface reconstruction which incorporates point constraints (center).The rightmost diagram plots pixel depth (z )values along the colored segments together with the positions of nearby samples.The introduction of point constraints signiﬁcantly improves ﬁt accuracy,sharpening the reconstruction without amplifying noise.Bajaj et al.1995;Curless and Levoy 1996].If the input points are unoriented,an important step is to correctly infer the sign of the resulting distance ﬁeld [Mullen et al.2010].Our work extends Poisson surface reconstruction [Kazhdan et al.2006],in which the implicit function corresponds to the model’s indicator function χ.The function χis often deﬁned to have value 1inside and value 0outside the model.To simplify the derivations,inthis paper we deﬁne χto be 12inside and −12outside,so that its zero isosurface passes near the points.The function χis solved using a Laplacian system discretized over a multiresolution B-spline basis,as reviewed in Section 3.Alliez et al.[2007]form a Laplacian system over a tetrahedral-ization,and constrain the solution’s biharmonic energy;the de-sired function is obtained as the solution to an eigenvector prob-lem.Manson et al.[2008]represent the indicator function χusing a wavelet basis,and efﬁciently compute the basis coefﬁcients using simple local sums over an adapted octree.Calakli and Taubin [2011]optimize a signed-distance function to have value zero at the points,have derivatives that agree with the point normals,and minimize a Hessian smoothness norm.The resulting optimization involves a bilaplacian operator,which requires estimating derivatives of higher order than in the Laplacian.The reconstructed surfaces are shown to have good accuracy,strongly suggesting the importance of explicitly ﬁtting the points within the optimization.This motivated us to explore whether a Laplacian system could be extended in this respect,and also be compatible with a multigrid solver.Screened Poisson Surface Fitting.The method of Nehab et al.[2005],which simultaneously ﬁts position and normal constraints,may also be viewed as the solution of a screened Poisson equation.The ﬁtting algorithm assumes that a 2D parametric domain (i.e.,a plane or triangle mesh)is already established.The position and derivative constraints are both deﬁned over this 2D domain.In contrast,in Poisson surface reconstruction the 2D domain manifold is initially unknown,and therefore the goal is to infer anindicator function χrather than a parametric function.This leadsto a hybrid problem with derivative (Laplacian)constraints deﬁned densely over 3D and position constraints deﬁned sparsely on the set of points sampled near the unknown 2D manifold.3.REVIEW OF POISSON SURFACE RECONSTRUCTIONThe approach of Poisson surface reconstruction is based on the observation that the (inward pointing)normal ﬁeld of the boundary of a solid can be interpreted as the gradient of the solid’s indicator function.Thus,given a set of oriented points sampling the boundary,a watertight mesh can be obtained by (1)transforming the oriented point samples into a continuous vector ﬁeld in 3D,(2)ﬁnding a scalar function whose gradients best match the vector ﬁeld,and (3)extracting the appropriate isosurface.Because our work focuses primarily on the second step,we review it here in more detail.Scalar Function Fitting.Given a vector ﬁeld V :R 3→R 3,thegoal is to solve for the scalar function χ:R 3→R minimizing:E (χ)=∇χ(p )− V (p ) 2d p .(1)Using the Euler-Lagrange formulation,the minimum is obtainedby solving the Poisson equation:∆χ=∇· V .System Discretization.The Galerkin formulation is used totransform this into a ﬁnite-dimensional system [Fletcher 1984].First,a basis {B 1,...,B N }:R 3→R is chosen,namely a collection of trivariate (usually triquadratic)B-spline functions.With respect to this basis,the discretization becomes:∆χ,B i [0,1]3= ∇· V ,B i [0,1]31≤i ≤Nwhere ·,· [0,1]3is the standard inner-product on the space of(scalar-and vector-valued)functions deﬁned on the unit cube:F ,G [0,1]3=[0,1]3F (p )·G (p )d p , U , V [0,1]3=[0,1]3U (p ), V (p ) d p .Since the solution is itself expressed in terms of the basis functions:χ(p )=N∑i =1x i B i (p ),ACM Transactions on Graphics,V ol.VV ,No.N,Article XXX,Publication date:Month YYYY .1.离散化->连续2.找个常量函数最佳拟合这些这些向量域;3.抽取等值面这里已经将离散的有向点转化为了连续的向量域表示;点集合的最初的思考Screened Poisson Surface Reconstruction•3ﬁnding the coefﬁcients{x i}of the solution reduces to solving the linear system Ax=b where:A i j= ∇B i,∇B j [0,1]3and b i= V,∇B i [0,1]3.(2) The basis functions{B1,...,B N}are chosen to be compactly supported,so most pairs of functions do not have overlapping support,and thus the matrix A is sparse.Because the solution is expected to be smooth away from the input samples,the linear system is discretized byﬁrst adapting an octree to the input samples and then associating an(appropriately scaled and translated)trivariate B-spline function to each octree node. This provides high-resolution detail in the vicinity of the surface while reducing the overall dimensionality of the system.System Solution.Given the hierarchy deﬁned by an octree of depth D,a multigrid approach is used to solve the linear system. The basis functions are partitioned according to the depths of their associated nodes and,for each depth d,a linear system A d x d=b d is deﬁned using the corresponding B-splines{B d1,...,B d Nd},such thatχ(p)=∑D d=0∑i x d i B d i(p).Because the octree-selected B-spline functions do not form a complete grid at each depth,it is generally not possible to prolong the solution x d at depth d into the solution x d+1at depth d+1. (The B-spline associated with a given node is a sum of B-spline functions associated not only with its own child nodes,but also with child nodes of its neighbors.)Instead,the constraints at depth d+1are adjusted to account for the part of the solution already realized at coarser depths.Pseudocode for a cascadic solver,where the solution is only relaxed on the up-stroke of the V-cycle,is given in Algorithm1.Algorithm1:Cascadic Poisson Solver1For d∈{0,...,D}Iterate from coarse toﬁne2For d ∈{0,...,d−1}Remove the constraints3b d=b d−A dd x d met at coarser depths4Relax A d x d=b d Adjust the system at depth dHere,A dd is the N d×N d matrix used to transform solution coefﬁcients at depth d into constraints at depth d:A dd i j= ∇B d i,∇B d j [0,1]3.Note that,by deﬁnition,A d=A dd.Isosurface Extraction.Solving the Poisson equation,one obtains a functionχthat approximates the indicator function.Ideally,the function’s zero level-set should therefore correspond to the desired surface.In practice however,the functionχcan differ from the true indicator function due to several sources of error:—The point sampling may be noisy,possibly containing outliers.—The Galerkin discretization is only an approximation of the continuous problem.—The point sampling density is approximated during octree construction.To mitigate these errors,in[Kazhdan et al.2006]the implicit function is adjusted by globally subtracting the average value of the function at the input samples.4.INCORPORA TING POINT CONSTRAINTSThe original Poisson surface reconstruction algorithm adjusts the implicit function using a single global offset such that its average value at all points is zero.However,the presence of errors can cause the implicit function to drift so that no global offset is satisfactory. Instead,we seek to explicitly interpolate the points.Given the set of input points P with weights w:P→R≥0,we add to the energy of Equation1a term that penalizes the function’s deviation from zero at the samples:E(χ)=V(p)−∇χ(p) 2d p+α·Area(P)∑p∈P∑p∈Pw(p)χ2(p)(3)whereαis a weight that trades off the importance ofﬁtting the gradients andﬁtting the values,and Area(P)is the area of the reconstructed surface,estimated by computing the local sampling density as in[Kazhdan et al.2006].In our implementation,we set the per-sample weights w(p)=1,although one can also use conﬁdence values if these are available.The energy can be expressed concisely asE(χ)= V−∇χ, V−∇χ [0,1]3+α χ,χ (w,P)(4)where ·,· (w,P)is the bilinear,symmetric,positive,semi-deﬁnite form on the space of functions in the unit-cube,obtained by taking the weighted sum of function values:F,G (w,P)=Area(P)∑p∈P w(p)∑p∈Pw(p)·F(p)·G(p).4.1Interpretation as a Screened Poisson EquationThe energy in Equation4combines a gradient constraint integrated over the spatial domain with a value constraint summed at discrete points.As shown in the appendix,its minimization can be interpreted as a screened Poisson equation(∆−α˜I)χ=∇· V with an appropriately deﬁned operator˜I.4.2DiscretizationWe apply a discretization similar to that in Section3to the minimization of the energy in Equation4.The coefﬁcients of the solutionχwith respect to the basis{B1,...,B N}are again obtained by solving a linear system of the form Ax=b.The right-hand-side b is unchanged because the constrained value at the sample points is zero.Matrix A now includes the point constraints:A i j= ∇B i,∇B j [0,1]3+α B i,B j (w,P).(5) Note that incorporating the point constraints does not change the sparsity of matrix A because B i(p)·B j(p)is nonzero only if the supports of the two functions overlap,in which case the Poisson equation has already introduced a nonzero entry in the matrix.As in Section3,we solve this linear system using a cascadic multigrid algorithm–iterating over the octree depths from coarsest toﬁnest,adjusting the constraints,and relaxing the system.Similar to Equation5,the matrix used to transform a solution at depth d to a constraint at depth d is expressed as:A dd i j= ∇B d i,∇B d j [0,1]3+α B d i,B d j (w,P).ACM Transactions on Graphics,V ol.VV,No.N,Article XXX,Publication date:Month YYYY.4•M.Kazhdan and H.HoppeFig.2:Visualizations of the reconstructed implicit function along a planar slice through the cow ‡(shown in blue on the left),for the original Poisson solver,and for the screened Poisson solver without and with scale-independent screening.This operator adjusts the constraint b d (line 3of Algorithm 1)not only by removing the Poisson constraints met at coarser resolutions,but also by modifying the constrained values at points where the coarser solution does not evaluate to zero.4.3Scale-Independent ScreeningTo balance the two energy terms in Equation 3,it is desirable to adjust the screening parameter αsuch that (1)the reconstructed surface shape is invariant under scaling of the input points with respect to the solver domain,and (2)the prolongation of a solution at a coarse depth is an accurate estimate of the solution at a ﬁner depth in the cascadic multigrid approach.We achieve both these goals by adjusting the relative weighting of position and gradient constraints across the different octree depths.Noting that the magnitude of the gradient constraint scales with resolution,we double the weight of the interpolation constraint with each depth:A ddi j = ∇B d i ,∇B dj [0,1]3+2d α B d i ,B dj (w ,P ).The adaptive weight of 2d is chosen to keep the Laplacian and screening constraints around the surface in balance.To see this,assume that the points are locally planar,and consider the row of the system matrix corresponding to an octree node overlapping the points.The coefﬁcients of the system in that row are the sum of Laplacian and screening terms.If we consider the rows corresponding to the child nodes that overlap the surface,we ﬁnd that the contribution from the Laplacian constraints scales by a factor of 1/2while the contribution from the screening term scales by a factor of 1/4.2Thus,scaling the screening weights by a factor of two with each resolution keeps the two terms in balance.Figure 2shows the beneﬁt of scale-independent screening in reconstructing a cow model.The leftmost image shows a plane passing through the bounding cube of the cow,and the images to the right show the values of the computed indicator function along that plane,for different implementations of the solver.As the ﬁgure shows,the unscreened Poisson solver provides a good approximation of the indicator functions,with values inside (resp.outside)the surface approximately 1/2(resp.-1/2).However,applying the same solver to the screened Poisson equation (second from right)provides a solution that is only correct near the input samples and returns to zero near the faces of the bounding cube,2Forthe Laplacian term,the Laplacian scales by a factor of 4with reﬁnement,and volumetric integrals scale by a factor of 1/8.For the screening term,area integrals scale by a factor of 1/4.potentially resulting in spurious surface sheets away from the surface.It is only with scale-independent screening (right)that we obtain a high-quality solution to the screened Poisson ing this resolution adaptive weighting,our system has the property that the reconstruction obtained by solving at depth D is identical to the reconstruction that would be obtained by scaling the point set by 1/2and solving at depth D +1.To see this,we consider the two energies that guide the reconstruc-tion,E V (χ)measuring the extent to which the gradients of the so-lution match the prescribed vector ﬁeld,and E (w ,P )(χ)measuring the extent to which the solution meets the screening constraint:E V (χ)=V (p )−∇χ(p )2d p E (w ,P )(χ)=Area (P )∑p ∈P w (p )∑p ∈Pw (p )χ2(p ).Scaling by 1/2,we obtain a new point set (˜w ,˜P)with positions scaled by 1/2,unchanged weights,˜w (p )=w (2p ),and scaled area,Area (˜P )=Area (P )/4;a new scalar ﬁeld,˜χ(p )=χ(2p );and a new vector ﬁeld,˜ V (p )=2 V (2p ).Computing the correspondingenergies,we get:E ˜ V (˜χ)=1E V(χ)and E (˜w ,˜P )(˜χ)=1E (w ,P )(χ).Thus,scaling the screening weight by a factor of two with eachsuccessive depth ensures that the sum of energies is unchanged (up to multiplication by a constant)so the minimizer remains the same.4.4Boundary ConditionsIn order to deﬁne the linear system,it is necessary to deﬁne the behavior of the function space along the boundary of the integration domain.In the original Poisson reconstruction the authors imposed Dirichlet boundary conditions,forcing the implicit function to havea value of −12along the boundary.In the present work we extend the implementation to support Neumann boundary conditions as well,forcing the normal derivative to be zero along the boundary.In principle these two boundary conditions are equivalent for watertight surfaces,since the indicator function has a constant negative value outside the model.However,in the presence of missing data we ﬁnd Neumann constraints to be less restrictive because they only require that the implicit function have zero derivative across the boundary of the integration domain,a property that is compatible with the gradient constraint since the guiding vector ﬁeld V is set to zero away from the samples.(Note that when the surface does cross the boundary of the domain,the Neumann boundary constraints create a bias to crossing the domain boundary orthogonally.)Figure 3shows the practical implications of this choice when reconstructing the Angel model,which was only scanned from the front.The left image shows the original point set and the reconstructions using Dirichlet and Neumann boundary conditions are shown to the right.As the ﬁgure shows,imposing Dirichlet constraints creates a water-tight surface that closes off before reaching the boundary while using Neumann constraints allows the surface to extend out to the boundary of the domain.ACM Transactions on Graphics,V ol.VV ,No.N,Article XXX,Publication date:Month YYYY .Screened Poisson Surface Reconstruction•5Fig.3:Reconstructions of the Angel point set‡(left)using Dirichlet(center) and Neumann(right)boundary conditions.Similar results can be seen at the bases of the models in Figures1 and4a,with the original Poisson reconstructions obtained using Dirichlet constraints and the screened reconstructions obtained using Neumann constraints.5.IMPROVED ALGORITHMIC COMPLEXITYIn this section we discuss the efﬁciency of our reconstruction al-gorithm.We begin by analyzing the complexity of the algorithm described above.Then,we present two algorithmic improvements. Theﬁrst describes how hierarchical clustering can be used to re-duce the screening overhead at coarser resolutions.The second ap-plies to both the unscreened and screened solver implementations, showing that the asymptotic time complexity in both cases can be reduced to be linear in the number of input points.5.1Efﬁciency of basic solverLet us begin by analyzing the computational complexity of the unscreened and screened solvers.We assume that the points P are evenly distributed over a surface,so that the depth of the adapted octree is D=O(log|P|)and the number of octree nodes at depth d is O(4d).We also note that the number of nonzero entries in matrix A dd is O(4d),since the matrix has O(4d)rows and each row has at most53nonzero entries.(Since we use second-order B-splines, basis functions are supported within their one-ring neighborhoods and the support of two functions will overlap only if one is within the two-ring neighborhood of the other.)Assuming that the matrices A dd have already been computed,the computational complexity for the different steps in Algorithm1is: Step3:O(4d)–since A dd has O(4d)nonzero entries.Step4:O(4d)–since A d has O(4d)nonzero entries and the number of relaxation steps performed is constant.Steps2-3:∑d−1d =0O(4d)=O(4d·d).Steps2-4:O(4d·d+4d)=O(4d·d).Steps1-4:∑D d=0O(4d·d)=O(4D·D)=O(|P|·log|P|). There still remains the computation of matrices A dd .For the unscreened solver,the complexity of computing A dd is O(4d),since each entry can be computed in constant time.Thus, the overall time complexity remains O(|P|·log|P|).For the screened solver,the complexity of computing A dd is O(|P|)since deﬁning the coefﬁcients requires accumulating the screening contribution from each of the points,and each point contributes to a constant number of rows.Thus,the overall time complexity is dominated by the cost of evaluating the coefﬁcients of A dd which is:D∑d=0d−1∑d =0O(|P|)=O(|P|·D2)=O(|P|·log2|P|).5.2Hierarchical Clustering of Point ConstraintsOurﬁrst modiﬁcation is based on the observation that since the basis functions at coarser resolutions are smooth,it is unnecessary to constrain them at the precise sample locations.Instead,we cluster the weighted points as in[Rusinkiewicz and Levoy2000]. Speciﬁcally,for each depth d,we deﬁne(w d,P d)where p i∈P d is the weighted average position of the points falling into octree node i at depth d,and w d(p i)is the sum of the associated weights.3 If all input points have weight w(p)=1,then w d(p i)is simply the number of points falling into node i.This alters the computation of the system matrix coefﬁcients:A dd i j= ∇B d i,∇B d j [0,1]3+2dα B d i,B d j (w d,P d).Note that since d>d ,the value B d i,B d j (w d,P d)is obtained by summing over points stored with theﬁner resolution.In particular,the complexity of computing A dd for the screened solver becomes O(|P d|)=O(4d),which is the same as that of the unscreened solver,and both implementations now have an overall time complexity of O(|P|·log|P|).On typical examples,hierarchical clustering reduces execution time by a factor of almost two,and the reconstructed surface is visually indistinguishable.5.3Conforming OctreesTo account for the adaptivity of the octree,Algorithm1subtracts off the constraints met at all coarser resolutions before relaxing at a given depth(steps2-3),resulting in an algorithm with log-linear time complexity.We obtain an implementation with linear complexity by forcing the octree to be conforming.Speciﬁcally, we deﬁne two octree cells to be mutually visible if the supports of their associated B-splines overlap,and we require that if a cell at depth d is in the octree,then all visible cells at depth d−1must also be in the tree.Making the tree conforming requires the addition of new nodes at coarser depths,but this still results in O(4d)nodes at depth d.While the conforming octree does not satisfy the condition that a coarser solution can be prolonged into aﬁner one,it has the property that the solution obtained at depths{0,...,d−1}that is visible to a node at depth d can be expressed entirely in terms of the coefﬁcients at depth d−ing an accumulation vector to store the visible part of the solution,we obtain the linear-time implementation in Algorithm2.3Note that the weight w d(p)is unrelated to the screening weight2d introduced in Section4.3for scale-independent screening.ACM Transactions on Graphics,V ol.VV,No.N,Article XXX,Publication date:Month YYYY.6•M.Kazhdan and H.HoppeHere,P d d−1is the B-spline prolongation operator,expressing a solution at depth d−1in terms of coefﬁcients at depth d.The number of nonzero entries in P d d−1is O(4d),since each column has at most43nonzero entries,so steps2-5of Algorithm2all have complexity O(4d).Thus,the overall complexity of both the unscreened and screened solvers becomes O(|P|).Algorithm2:Conforming Cascadic Poisson Solver1For d∈{0,...,D}Iterate from coarse toﬁne.2ˆx d−1=P d−1d−2ˆx d−2Upsample coarseraccumulation vector.3ˆx d−1=ˆx d−1+x d−1Add in coarser solution.4b d=b d−A d d−1ˆx d−1Remove constraintsmet at coarser depths.5Relax A d x d=b d Adjust the system at depth d.5.4Implementation DetailsThe algorithm is implemented in C++,using OpenMP for multi-threaded parallelization.We use a conjugate-gradient solver to re-lax the system at each multigrid level.With the exception of the octree construction,most of the operations involved in the Poisson reconstruction can be categorized as operations that either“accu-mulate”or“distribute”information[Bolitho et al.2007,2009].The former do not introduce write-on-write conﬂicts and are trivial to parallelize.The latter only involve linear operations,and are par-allelized using a standard map-reduce approach:in the map phase we create a duplicate copy of the data for each thread to distribute values into,and in the reduce phase we merge the copies by taking their sum.6.RESULTSWe evaluate the algorithm(Screened)by comparing its accuracy and computational efﬁciency with several prior methods:the original Poisson reconstruction of Kazhdan et al.[2006](Poisson), the Wavelet reconstruction of Manson et al.[2008](Wavelet),and the Smooth Signed Distance reconstruction of Calakli and Taubin [2011](SSD).For the new algorithm,we set the screening weight toα=4and use Neumann boundary conditions in all experiments.(Numerical results obtained using Dirichlet boundaries were indistinguishable.) For the prior methods,we set algorithmic parameters to values recommended by the authors,using Haar Wavelets in the Wavelet reconstruction and setting the value/normal/Hessian weights to 1/1/0.25in the SSD reconstruction.For Poisson,SSD,and Screened we set the“samples-per-node”parameter to1and the “bounding-box-scale”parameter to1.1.(For Wavelet the bounding box scale is hard-coded at1and there is no parameter to adjust the sampling density.)6.1AccuracyWe run three different types of experiments.Real Scanner Data.To evaluate the accuracy of the different reconstruction algorithms on real-world data,we gathered several scanned datasets:the Awakening(10M points),the Stanford Bunny (0.2M points),the David(11M points),the Lucy(1.0M points), and the Neptune(2.4M points).For each dataset,we randomly partitioned the points into two equal-sized subsets:input points for the reconstruction algorithms,and validation points to measure point-to-reconstruction distances.Figure4a shows reconstructions results for the Neptune and David models at depth10.It also shows surface cross-sections overlaid with the validation points in their vicinity.These images reveal that the Poisson reconstruction(far left),and to a lesser extent the SSD reconstruction(center left),over-smooth the data,while the Wavelet reconstruction(center left)has apparent derivative discontinuities.In contrast,our screened Poisson approach(far right)provides a reconstruction that faithfullyﬁts the samples without introducing noise.Figure4b shows quantitative results across all datasets,in the form of RMS errors,measured using the distances from the validation points to the reconstructed surface.(We also computed the maximum error,but found that its sensitivity to individual outlier points made it an unreliable and unindicative statistic.)As theﬁgure indicates,the Screened Poisson reconstruction(blue)is always more accurate than both the original Poisson reconstruction algorithm(red)and the Wavelet reconstruction(purple),and generates reconstruction whose RMS errors are comparable to or smaller than those of the SSD reconstruction(green).Clean Uniformly Sampled Data.To evaluate reconstruction accuracy on clean data,we used the approach of Osada et al.[2001] to generate oriented point sets by uniformly sampling the surfaces of the Fandisk,Armadillo Man,Dragon,and Raptor models.For each model,we generated datasets of100K and1M points and reconstructed surfaces from each point set using the four different reconstruction algorithms.As an example,Figure5a shows the reconstructions of the fandisk and raptor models using1M point samples at depth10.Despite the lack of noise in the input data,the Wavelet reconstruction has spurious high-frequency detail.Focusing on the sharp edges in the model,we also observe that the screened Poisson reconstruction introduces less smoothing,providing a reconstruction that is truer to the original data than either the original Poisson or the SSD reconstructions.Figure5b plots RMS errors across all models,measured bidirec-tionally between the original surface and the reconstructed surface using the Metro tool[Cignoni and Scopigno1998].As in the case of real scanner data,screened Poisson reconstruction always out-performs the original Poisson and Wavelet reconstructions,and is comparable to or better than the SSD reconstruction. Reconstruction Benchmark.We use the benchmark of Berger et al.[2011]to evaluate the accuracy of the algorithms under different simulations of scanner error,including nonuniform sampling,noise,and misalignment.The dataset consists of mul-tiple virtual scans of implicit surfaces representing the Anchor, Dancing Children,Daratech,Gargoyle,and Quasimodo models. As an example,Figure6a visualizes the error in the reconstructions of the anchor model from a virtual scan consisting of210K points (demarked with a dashed rectangle in Figure6b)at depth9.The error is visualized using a red-green-blue scale,with red signifyingACM Transactions on Graphics,V ol.VV,No.N,Article XXX,Publication date:Month YYYY.。

Advanced Micro Devices, Inc.

To appear in ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling in Linked Data Structures and its Application to Multi-Chain PrefetchingSEUNGRYUL CHOIUniversity of Maryland,College ParkNICHOLAS KOHOUTEVI Technology LLC.SUMIT PAMNANIAdvanced Micro Devices,Inc.andDONGKEUN KIM and DONALD YEUNGUniversity of Maryland,College ParkThis research was supported in part by NSF Computer Systems Architecture grant CCR-0093110, and in part by NSF CAREER Award CCR-0000988.Author’s address:Seungryul Choi,University of Maryland,Department of Computer Science, College Park,MD20742.Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for proﬁt or commercial advantage,the ACM copyright/server notice,the title of the publication,and its date appear,and notice is given that copying is by permission of the ACM,Inc.To copy otherwise,to republish, to post on servers,or to redistribute to lists requires prior speciﬁc permission and/or a fee.c 2001ACM1529-3785/2001/0700-0001$5.00ACM Transactions on Computer Systems2·Seungryul Choi et al.Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains.While the traversal of any single pointer chain leads to the seri-alization of memory operations,the traversal of independent pointer chains provides a source of memory parallelism.This article investigates exploiting such inter-chain memory parallelism for the purpose of memory latency tolerance,using a technique called multi-chain prefetching. Previous works[Roth et al.1998;Roth and Sohi1999]have proposed prefetching simple pointer-based structures in a multi-chain fashion.However,our work enables multi-chain prefetching for arbitrary data structures composed of lists,trees,and arrays.This article makesﬁve contributions in the context of multi-chain prefetching.First,we intro-duce a framework for compactly describing LDS traversals,providing the data layout and traversal code work information necessary for prefetching.Second,we present an oﬀ-line scheduling algo-rithm for computing a prefetch schedule from the LDS descriptors that overlaps serialized cache misses across separate pointer-chain traversals.Our analysis focuses on static traversals.We also propose using speculation to identify independent pointer chains in dynamic traversals.Third,we propose a hardware prefetch engine that traverses pointer-based data structures and overlaps mul-tiple pointer chains according to the computed prefetch schedule.Fourth,we present a compiler that extracts LDS descriptors via static analysis of the application source code,thus automating multi-chain prefetching.Finally,we conduct an experimental evaluation of compiler-instrumented multi-chain prefetching and compare it against jump pointer prefetching[Luk and Mowry1996], prefetch arrays[Karlsson et al.2000],and predictor-directed stream buﬀers(PSB)[Sherwood et al. 2000].Our results show compiler-instrumented multi-chain prefetching improves execution time by 40%across six pointer-chasing kernels from the Olden benchmark suite[Rogers et al.1995],and by3%across four pared to jump pointer prefetching and prefetch arrays,multi-chain prefetching achieves34%and11%higher performance for the selected Olden and SPECint2000benchmarks,pared to PSB,multi-chain prefetching achieves 27%higher performance for the selected Olden benchmarks,but PSB outperforms multi-chain prefetching by0.2%for the selected SPECint2000benchmarks.An ideal PSB with an inﬁnite markov predictor achieves comparable performance to multi-chain prefetching,coming within6% across all benchmarks.Finally,speculation can enable multi-chain prefetching for some dynamic traversal codes,but our technique loses its eﬀectiveness when the pointer-chain traversal order is highly dynamic.Categories and Subject Descriptors:B.8.2[Performance and Reliability]:Performance Anal-ysis and Design Aids;B.3.2[Memory Structures]:Design Styles—Cache Memories;C.0[Gen-eral]:Modeling of computer architecture;System Architectures; C.4[Performance of Sys-tems]:Design Studies;D.3.4[Programming Languages]:Processors—CompilersGeneral Terms:Design,Experimentation,PerformanceAdditional Key Words and Phrases:Data Prefetching,Memory parallelism,Pointer Chasing CodeA General Framework for Prefetch Scheduling·3performance platforms.The use of LDSs will likely have a negative impact on memory performance, making many non-numeric applications severely memory-bound on future systems. LDSs can be very large owing to their dynamic heap construction.Consequently, the working sets of codes that use LDSs can easily grow too large toﬁt in the processor’s cache.In addition,logically adjacent nodes in an LDS may not reside physically close in memory.As a result,traversal of an LDS may lack spatial locality,and thus may not beneﬁt from large cache blocks.The sparse memory access nature of LDS traversal also reduces the eﬀective size of the cache,further increasing cache misses.In the past,researchers have used prefetching to address the performance bot-tlenecks of memory-bound applications.Several techniques have been proposed, including software prefetching techniques[Callahan et al.1991;Klaiber and Levy 1991;Mowry1998;Mowry and Gupta1991],hardware prefetching techniques[Chen and Baer1995;Fu et al.1992;Jouppi1990;Palacharla and Kessler1994],or hybrid techniques[Chen1995;cker Chiueh1994;Temam1996].While such conventional prefetching techniques are highly eﬀective for applications that employ regular data structures(e.g.arrays),these techniques are far less successful for non-numeric ap-plications that make heavy use of LDSs due to memory serialization eﬀects known as the pointer chasing problem.The memory operations performed for array traver-sal can issue in parallel because individual array elements can be referenced inde-pendently.In contrast,the memory operations performed for LDS traversal must dereference a series of pointers,a purely sequential operation.The lack of memory parallelism during LDS traversal prevents conventional prefetching techniques from overlapping cache misses suﬀered along a pointer chain.Recently,researchers have begun investigating prefetching techniques designed for LDS traversals.These new LDS prefetching techniques address the pointer-chasing problem using several diﬀerent approaches.Stateless techniques[Luk and Mowry1996;Mehrotra and Harrison1996;Roth et al.1998;Yang and Lebeck2000] prefetch pointer chains sequentially using only the natural pointers belonging to the LDS.Existing stateless techniques do not exploit any memory parallelism at all,or they exploit only limited amounts of memory parallelism.Consequently,they lose their eﬀectiveness when the LDS traversal code contains insuﬃcient work to hide the serialized memory latency[Luk and Mowry1996].A second approach[Karlsson et al.2000;Luk and Mowry1996;Roth and Sohi1999],which we call jump pointer techniques,inserts additional pointers into the LDS to connect non-consecutive link elements.These“jump pointers”allow prefetch instructions to name link elements further down the pointer chain without sequentially traversing the intermediate links,thus creating memory parallelism along a single chain of pointers.Because they create memory parallelism using jump pointers,jump pointer techniques tolerate pointer-chasing cache misses even when the traversal loops contain insuﬃcient work to hide the serialized memory latency.However,jump pointer techniques cannot commence prefetching until the jump pointers have been installed.Furthermore,the jump pointer installation code increases execution time,and the jump pointers themselves contribute additional cache misses.ACM Transactions on Computer Systems4·Seungryul Choi et al.Finally,a third approach consists of prediction-based techniques[Joseph and Grunwald1997;Sherwood et al.2000;Stoutchinin et al.2001].These techniques perform prefetching by predicting the cache-miss address stream,for example us-ing hardware predictors[Joseph and Grunwald1997;Sherwood et al.2000].Early hardware predictors were capable of following striding streams only,but more re-cently,correlation[Charney and Reeves1995]and markov[Joseph and Grunwald 1997]predictors have been proposed that can follow arbitrary streams,thus en-abling prefetching for LDS traversals.Because predictors need not traverse program data structures to generate the prefetch addresses,they avoid the pointer-chasing problem altogether.In addition,for hardware prediction,the techniques are com-pletely transparent since they require no support from the programmer or compiler. However,prediction-based techniques lose their eﬀectiveness when the cache-miss address stream is unpredictable.This article investigates exploiting the natural memory parallelism that exists between independent serialized pointer-chasing traversals,or inter-chain memory parallelism.Our approach,called multi-chain prefetching,issues prefetches along a single chain of pointers sequentially,but aggressively pursues multiple independent pointer chains simultaneously whenever possible.Due to its aggressive exploitation of inter-chain memory parallelism,multi-chain prefetching can tolerate serialized memory latency even when LDS traversal loops have very little work;hence,it can achieve higher performance than previous stateless techniques.Furthermore,multi-chain prefetching does not use jump pointers.As a result,it does not suﬀer the overheads associated with creating and managing jump pointer state.Andﬁnally, multi-chain prefetching is an execution-based technique,so it is eﬀective even for programs that exhibit unpredictable cache-miss address streams.The idea of overlapping chained prefetches,which is fundamental to multi-chain prefetching,is not new:both Cooperative Chain Jumping[Roth and Sohi1999]and Dependence-Based Prefetching[Roth et al.1998]already demonstrate that simple “backbone and rib”structures can be prefetched in a multi-chain fashion.However, our work pushes this basic idea to its logical limit,enabling multi-chain prefetching for arbitrary data structures(our approach can exploit inter-chain memory paral-lelism for any data structure composed of lists,trees,and arrays).Furthermore, previous chained prefetching techniques issue prefetches in a greedy fashion.In con-trast,our work provides a formal and systematic method for scheduling prefetches that controls the timing of chained prefetches.By controlling prefetch arrival, multi-chain prefetching can reduce both early and late prefetches which degrade performance compared to previous chained prefetching techniques.In this article,we build upon our original work in multi-chain prefetching[Kohout et al.2001],and make the following contributions:(1)We present an LDS descriptor framework for specifying static LDS traversalsin a compact fashion.Our LDS descriptors contain data layout information and traversal code work information necessary for prefetching.(2)We develop an oﬀ-line algorithm for computing an exact prefetch schedulefrom the LDS descriptors that overlaps serialized cache misses across separate pointer-chain traversals.Our algorithm handles static LDS traversals involving either loops or recursion.Furthermore,our algorithm computes a schedule even ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling·5when the extent of dynamic data structures is unknown.To handle dynamic LDS traversals,we propose using speculation.However,our technique cannot handle codes in which the pointer-chain traversals are highly dynamic.(3)We present the design of a programmable prefetch engine that performs LDStraversal outside of the main CPU,and prefetches the LDS data using our LDS descriptors and the prefetch schedule computed by our scheduling algorithm.We also perform a detailed analysis of the hardware cost of our prefetch engine.(4)We introduce algorithms for extracting LDS descriptors from application sourcecode via static analysis,and implement them in a prototype compiler using the SUIF framework[Hall et al.1996].Our prototype compiler is capable of ex-tracting all the program-level information necessary for multi-chain prefetching fully automatically.(5)Finally,we conduct an experimental evaluation of multi-chain prefetching us-ing several pointer-intensive applications.Our evaluation compares compiler-instrumented multi-chain prefetching against jump pointer prefetching[Luk and Mowry1996;Roth and Sohi1999]and prefetch arrays[Karlsson et al.2000], two jump pointer techniques,as well as predictor-directed stream buﬀers[Sher-wood et al.2000],an all-hardware prediction-based technique.We also inves-tigate the impact of early prefetch arrival on prefetching performance,and we compare compiler-and manually-instrumented multi-chain prefetching to eval-uate the quality of the instrumentation generated by our compiler.In addition, we characterize the sensitivity of our technique to varying hardware stly,we undertake a preliminary evaluation of speculative multi-chain prefetching to demonstrate its potential in enabling multi-chain prefetching for dynamic LDS traversals.The rest of this article is organized as follows.Section2further explains the essence of multi-chain prefetching.Then,Section3introduces our LDS descriptor framework.Next,Section4describes our scheduling algorithm,Section5discusses our prefetch engine,and Section6presents our compiler for automating multi-chain prefetching.After presenting all our algorithms and techniques,Sections7and8 then report on our experimental methodology and evaluation,respectively.Finally, Section9discusses related work,and Section10concludes the article.2.MULTI-CHAIN PREFETCHINGThis section provides an overview of our multi-chain prefetching technique.Sec-tion2.1presents the idea of exploiting inter-chain memory parallelism.Then, Section2.2discusses the identiﬁcation of independent pointer chain traversals. 2.1Exploiting Inter-Chain Memory ParallelismThe multi-chain prefetching technique augments a commodity microprocessor with a programmable hardware prefetch engine.During an LDS computation,the prefetch engine performs its own traversal of the LDS in front of the processor,thus prefetching the LDS data.The prefetch engine,however,is capable of traversing multiple pointer chains simultaneously when permitted by the application.Conse-quently,the prefetch engine can tolerate serialized memory latency by overlapping cache misses across independent pointer-chain traversals.ACM Transactions on Computer Systems6·Seungryul Choi et al.<compute>ptr = A[i];ptr = ptr->next;while (ptr) {for (i=0; i < N; i++) {a)b)}<compute>ptr = ptr->next;while (ptr) {}}PD = 2INIT(ID ll);stall stall stallINIT(ID aol);stall stallFig.1.Traversing pointer chains using a prefetch engine.a).Traversal of a single linked list.b).Traversal of an array of lists data structure.To illustrate the idea of exploiting inter-chain memory parallelism,weﬁrst de-scribe how our prefetch engine traverses a single chain of pointers.Figure1a shows a loop that traverses a linked list of length three.Each loop iteration,denoted by a hashed box,contains w1cycles of work.Before entering the loop,the processor ex-ecutes a prefetch directive,INIT(ID ll),instructing the prefetch engine to initiate traversal of the linked list identiﬁed by the ID ll label.If all three link nodes suﬀer an l-cycle cache miss,the linked list traversal requires3l cycles since the link nodes must be fetched sequentially.Assuming l>w1,the loop alone contains insuﬃcient work to hide the serialized memory latency.As a result,the processor stalls for 3l−2w1cycles.To hide these stalls,the prefetch engine would have to initiate its linked list traversal3l−2w1cycles before the processor traversal.For this reason, we call this delay the pre-traversal time(P T).While a single pointer chain traversal does not provide much opportunity for latency tolerance,pointer chasing computations typically traverse many pointer chains,each of which is often independent.To illustrate how our prefetch engine exploits such independent pointer-chasing traversals,Figure1b shows a doubly nested loop that traverses an array of lists data structure.The outer loop,denoted by a shaded box with w2cycles of work,traverses an array that extracts a head pointer for the inner loop.The inner loop is identical to the loop in Figure1a.In Figure1b,the processor again executes a prefetch directive,INIT(ID aol), causing the prefetch engine to initiate a traversal of the array of lists data structure identiﬁed by the ID aol label.As in Figure1a,theﬁrst linked list is traversed sequentially,and the processor stalls since there is insuﬃcient work to hide the serialized cache misses.However,the prefetch engine then initiates the traversal of subsequent linked lists in a pipelined fashion.If the prefetch engine starts a new traversal every w2cycles,then each linked list traversal will initiate the required P T cycles in advance,thus hiding the excess serialized memory latency across multiple outer loop iterations.The number of outer loop iterations required to overlap each linked list traversal is called the prefetch distance(P D).Notice when P D>1, ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling·7 the traversals of separate chains overlap,exposing inter-chain memory parallelism despite the fact that each chain is fetched serially.2.2Finding Independent Pointer-Chain TraversalsIn order to exploit inter-chain memory parallelism,it is necessary to identify mul-tiple independent pointer chains so that our prefetch engine can traverse them in parallel and overlap their cache misses,as illustrated in Figure1.An important question is whether such independent pointer-chain traversals can be easily identi-ﬁed.Many applications perform traversals of linked data structures in which the or-der of link node traversal does not depend on runtime data.We call these static traversals.The traversal order of link nodes in a static traversal can be determined a priori via analysis of the code,thus identifying the independent pointer-chain traversals at compile time.In this paper,we present an LDS descriptor frame-work that compactly expresses the LDS traversal order for static traversals.The descriptors in our framework also contain the data layout information used by our prefetch engine to generate the sequence of load and prefetch addresses necessary to perform the LDS traversal at runtime.While compile-time analysis of the code can identify independent pointer chains for static traversals,the same approach does not work for dynamic traversals.In dynamic traversals,the order of pointer-chain traversal is determined at runtime. Consequently,the simultaneous prefetching of independent pointer chains is limited since the chains to prefetch are not known until the traversal order is computed, which may be too late to enable inter-chain overlap.For dynamic traversals,it may be possible to speculate the order of pointer-chain traversal if the order is pre-dictable.In this paper,we focus on static LDS ter in Section8.7,we illustrate the potential for predicting pointer-chain traversal order in dynamic LDS traversals by extending our basic multi-chain prefetching technique with specula-tion.3.LDS DESCRIPTOR FRAMEWORKHaving provided an overview of multi-chain prefetching,we now explore the al-gorithms and hardware underlying its implementation.We begin by introducing a general framework for compactly representing static LDS traversals,which we call the LDS descriptor framework.This framework allows compilers(and pro-grammers)to compactly specify two types of information related to LDS traversal: data structure layout,and traversal code work.The former captures memory refer-ence dependences that occur in an LDS traversal,thus identifying pointer-chasing chains,while the latter quantiﬁes the amount of computation performed as an LDS is traversed.After presenting the LDS descriptor framework,subsequent sections of this article will show how the information provided by the framework is used to perform multi-chain prefetching(Sections4and5),and how the LDS descriptors themselves can be extracted by a compiler(Section6).3.1Data Structure Layout InformationData structure layout is speciﬁed using two descriptors,one for arrays and one for linked lists.Figure2presents each descriptor along with a traversal code exampleACM Transactions on Computer Systems8·Seungryul Choi etal.a).b).Bfor (i = 0 ; i < N ; i++) {... = data[i].value;}for (ptr = root ; ptr != NULL; ) { ptr = ptr->next;}Fig.2.Two LDS descriptors used to specify data layout information.a).Array descriptor.b).Linked list descriptor.Each descriptor appears inside a box,and is accompanied by a traversal code example and an illustration of the data structure.and an illustration of the traversed data structure.The array descriptor,shown in Figure 2a,contains three parameters:base (B ),length (L ),and stride (S ).These parameters specify the base address of the array,the number of array elements traversed by the application code,and the stride between consecutive memory ref-erences,respectively.The array descriptor speciﬁes the memory address stream emitted by the processor during a constant-stride array traversal.Figure 2b illus-trates the linked list descriptor which contains three parameters similar to the array descriptor.For the linked list descriptor,the B parameter speciﬁes the root pointer of the list,the L parameter speciﬁes the number of link elements traversed by the application code,and the ∗S parameter speciﬁes the oﬀset from each link element address where the “next”pointer is located.The linked list descriptor speciﬁes the memory address stream emitted by the processor during a linked list traversal.To specify the layout of complex data structures,our framework permits descrip-tor composition.Descriptor composition is represented as a directed graph whose nodes are array or linked list descriptors,and whose edges denote address generation dependences.Two types of composition are allowed.The ﬁrst type of composition is nested composition .In nested composition,each address generated by an outer descriptor forms the B parameter for multiple instantiations of a dependent inner descriptor.An oﬀset parameter,O ,is speciﬁed in place of the inner descriptor’s B parameter to shift its base address by a constant oﬀset.Such nested descriptors cap-ture the memory reference streams of nested loops that traverse multi-dimensional data structures.Figure 3presents several nested descriptors,showing a traversal code example and an illustration of the traversed multi-dimensional data structure along with each nested descriptor.Figure 3a shows the traversal of an array of structures,each structure itself containing an array.The code example’s outer loop traverses the array “node,”ac-cessing the ﬁeld “value”from each traversed structure,and the inner loop traverses ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling·9a).b).c).for (i = 0 ; i < L 0 ; i++) {... = node[i].value;for (j = 0 ; j < L 1 ; j++) {... = node[i].data[j];}}for (i = 0 ; i < L 0 ; i++) {down = node[i].pointer;for (j = 0 ; j < L 1 ; j++) {... = down->data[j];}}node for (i = 0 ; i < L 0 ; i++) {for (j = 0 ; j < L 1 ; j++) {... = node[i].data[j];}down = node[i].pointer;for (j = 0 ; j < L 2 ; j++) {... = down->data[j];}}node Fig.3.Nested descriptor composition.a).Nesting without indirection.b).Nesting with indirection.c).Nesting multiple descriptors.Each descriptor composition appears inside a box,and is accompanied by a traversal code example and an illustration of the composite data structure.each embedded array “data.”The outer and inner array descriptors,(B,L 0,S 0)and (O 1,L 1,S 1),represent the address streams produced by the outer and inner loop traversals,respectively.(In the inner descriptor,“O 1”speciﬁes the oﬀset of each inner array from the top of each structure).Figure 3b illustrates another form of descriptor nesting in which indirection is used between nested descriptors.The data structure in Figure 3b is similar to the one in Figure 3a,except the in-ner arrays are allocated separately,and a ﬁeld from each outer array structure,“node[i].pointer,”points to a structure containing the inner array.Hence,as shown in the code example from Figure 3b,traversal of the inner array requires indirect-ing through the outer array’s pointer to compute the inner array’s base address.In our framework,this indirection is denoted by placing a “*”in front of the inner descriptor.Figure 3c,our last nested descriptor example,illustrates the nestingACM Transactions on Computer Systems10·Seungryul Choi et al.main() { foo(root, depth_limit);}foo(node, depth) { depth = depth - 1; if (depth == 0 || node == NULL)return;foo(node->child[0], depth);foo(node->child[1], depth);foo(node->child[2], depth);}Fig.4.Recursive descriptor composition.The recursive descriptor appears inside a box,and is accompanied by a traversal code example and an illustration of the tree data structure.of multiple inner descriptors underneath a single outer descriptor to represent the address stream produced by nested distributed loops.The code example from Fig-ure 3c shows the two inner loops from Figures 3a-b nested in a distributed fashion inside a common outer loop.In our framework,each one of the multiple inner array descriptors represents the address stream for a single distributed loop,with the order of address generation proceeding from the leftmost to rightmost inner descriptor.It is important to note that while all the descriptors in Figure 3show two nesting levels only,our framework allows an arbitrary nesting depth.This permits describ-ing higher-dimensional LDS traversals,for example loop nests with >2nesting depth.Also,our framework can handle non-recurrent loads using “singleton”de-scriptors.For example,a pointer to a structure may be dereferenced multiple times to access diﬀerent ﬁelds in the structure.Each dereference is a single non-recurrent load.We create a separate descriptor for each non-recurrent load,nest it under-neath its recurrent load’s descriptor,and assign an appropriate oﬀset value,O ,and length value,L =1.In addition to nested composition,our framework also permits recursive compo-sition .Recursively composed descriptors describe depth-ﬁrst tree traversals.They are similar to nested descriptors,except the dependence edge ﬂows backwards.Since recursive composition introduces cycles into the descriptor graph,our frame-work requires each backwards dependence edge to be annotated with the depth of recursion,D ,to bound the size of the data structure.Figure 4shows a simple recursive descriptor in which the backwards dependence edge originates from and terminates to a single array descriptor.The “L”parameter in the descriptor spec-iﬁes the fanout of the tree.In our example,L =3,so the traversed data structure is a tertiary tree,as shown in Figure 4.Notice the array descriptor has both B and O parameters–B provides the base address for the ﬁrst instance of the descriptor,while O provides the oﬀset for all recursively nested instances.In Figures 2and 4,we assume the L parameter for linked lists and the D parame-ter for trees are known a priori,which is generally not ter in Section 4.3,we discuss how our framework handles these unknown descriptor parameters.In addi-ACM Transactions on Computer Systems。

2009-Challenges and opportunities for virtualized security in the clouds

Keynote TalkChallenges and Opportunitiesfor Virtualized Security in the CloudsFrank SiebenlistArgonne National Laboratory – University of ChicagoArgonne, IL, USAfranks@AbstractThe virtualization technologies that underlie the cloud computing infrastructures pose challenges on enforcing security policy when we have a sense of ambiguity concerning the actual physical properties of the resources. On the other hand, the virtual machine managers provide us with better sandboxing, detailed monitoring capabilities and fine-grained access control on the resource usage.As we expect the whole world to virtualize over the next 5-10 years, the presentation will present a forward-looking view on the cloudy road ahead with a focus on the associated security topics.1Categories & Subject Descriptors: D.4.6 [OPERATING SYSTEMS] Security andProtection; K.6.5 [MANAGEMENT OF COMPUTING AND INFORMATION SYSTEMS]Security and Protection; C.2.0 [COMPUTER-COMMUNICATION NETWORKS] General ---Security and protection (e.g., firewalls)General Terms: Security, Management, DesignBioFrank Siebenlist is a Senior Security Architect at the Mathematics and Computer Science Division at Argonne National Laboratory and a Fellow at the Computation Institute of the University of Chicago. Frank has a Ph.D. in Physics from the University of Amsterdam in the Netherlands. He has extensive experience with distributed computing and security. He has worked for major financial institutions (VP at Citibank and Senior Consultant at J.P. Morgan) in Hong Kong and New York. He has also worked for a number of technology companies, including start-ups in Silicon Valley (Chief Architect at DASCOM and Chief Security Architect at Eazel), and at IBM asa Senior Security Architect. He currently works on the security aspects of various DOE/NSF/NIH-funded Grid projects that deal with cancer research, climate modeling, astronomy, elementary particles, etc. Furthermore, Frank authored, influenced and contributed to numerous of security related standards at X/Open, Open Group, IETF, OMG, OGF, and OASIS.1 This work was supported by the U.S. Dept. of Energy under contract DE-AC02-06CH11357.Copyright is held by the author/owner(s).SACMAT’09, June 3–5, 2009, Stresa, Italy.ACM 978-1-60558-537-6/09/06.1。

Boid规则

Abstract
The aggregate motion of a flock of birds, a herd of land animals, or a school of fish is a beautiful and familiar part of the n a t u r a l world. But this type of complex motion is rarely seen in computer animation. This paper explores an approach based on simulation as an alternative to scripting the paths of each bird individually. The simulated flock is an elaboration of a particle system, with the simulated birds being the particles. The aggregate motion of the simulated flock is created by a distributed behavioral model much like that at work in a natural flock; the birds choose their own course. Each simulated bird is implemented as an independent actor that navigates according to its local perception of the dynamic environment, the laws of simulated physics that rule its motion, and a set of behaviors programmed into it by the "animator." The aggregate motion of the simulated flock is the result of the dense interaction of the relatively simple behaviors of the individual simulated birds. Categories and Subject Descriptors: 1.2.10 [Artificial Intelligence]: Vision and Scene Understanding; 1.3.5 [Computer Graphics]: Computational Geometry and Object Modeling; 1.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism--Animation; 1.6.3 [Simulation and Modeling[: Applications. General Terms: Algorithms, design. Additional Key Words, and Phrases: flock, herd, school, bird, fish, aggregate motion, particle system, actor, flight, behavioral animation, constraints, path planning. it seems randomly arrayed and yet is magnificently synchronized. Perhaps most puzzling is the strong impression of intentional, centralized control. Yet all evidence indicates that flock motion must be merely the aggregate result of the actions of individual animals, each acting solely on the basis of its own local perception of the world. One area of interest within computer animation is the description and control of all types of motion. Computer animators seek both to invent wholly new types of abstract motion and to duplicate (or make variations on) the motions found in the real world. At first glance, producing an animated, computer graphic portrayal of a flock of birds presents significant difficulties. Scripting the path of a large number of individual objects using traditional computer animation techniques would be tedious. Given the complex paths that birds follow, it is doubtful this specification could be made without error. Even if a reasonable number of suitable paths could be described, it is unlikely that the constraints of flock motion could be maintained (for example, preventing collisions between all birds at each frame). Finally, a flock scripted in this manner would be hard to edit (for example, to alter the course of all birds for a portion of the animation). It is not impossible to script flock motion, but a better approach is needed for efficient, robust, and believable animation of flocks and related group motions. This paper describes one such approach. This approach assumes a flock is simply the result of the interaction between the behaviors of individual birds. To simulate a flock we simulate the behavior of an individual bird (or at least that portion of the bird's behavior that allows it to participate in a flock). To support this behavioral "control structure" we must also simulate portions of the bird's perceptual mechanisms and aspects of the physics of aerodynamic flight. If this simulated bird model has the correct flock-member behavior, all that should be required to create a simulated flock is to create some instances of the simulated bird model and allow them to interact. ** Some experiments with this sort of simulated flock are described in more detail in the remainder of this paper. The suc*In this paperflock refers generically to a group of objects that exhibit this general class of polarized, noncolliding, aggregate motion. The term polarization is from zoology, meaning alignment of animal groups. English is rich with terms for groups of animals; for a charming and literate discussion of such words see An Exultation of Larks. [16] **This paper refers to these simulated bird-like, "bird-old" objects generically as "boids" even when they represent other sorts of creatures such as schooling fish.

科技外文文献原文

AMBULANT:A Fast,Multi-Platform Open Source SML Player Dick C.A. Bulterman, Jack Jansen, Kleanthis Kleanthous, Kees Blom and Daniel Benden CWI: Centrum voor Wiskunde en InformaticaKruislaan 4131098 SJ Amsterdam, The Netherlands +31 20 592 43 00 Dick.Bulterman@cwi.nl ABSTRACTThis paper provides an overview of the Ambulant Open SMIL player. Unlike other SMIL implementations, the Ambulant Player is a reconfigureable SMIL engine that can be customized for use as an experimental media player core.The Ambulant Player is a reference SMIL engine that can be integrated in a wide variety of media player projects. This paper starts with an overview of our motivations for creating a new SMIL engine then discusses the architecture of the Ambulant Core (including the scalability and custom integration features of the player).We close with a discussion of our implementation experiences with Ambulant instances for Windows,Mac and Linux versions for desktop and PDA devices.Categories and Subject Descriptors H.5.1 Multimedia Information Systems [Evaluation]H.5.4 Hypertext/Hypermedia [Navigation]. General TermsExperimentation, Performance, V erification KeywordsSMIL, Player, Open-Source, Demos1.MOTIV ATIONThe Ambulant Open SMIL Player is an open-source, full featured SMIL 2.0 player. It is intended to be used within the researcher community (in and outside our institute) in projects that need source code access to a production-quality SMIL player environment. It may also be used as a stand-alone SMIL player for applications that do not need proprietary mediaformats.The player supports a range of SMIL 2.0 profiles ( including desktop and mobile configurations) and is available in distributions for Linux, Macintosh, and Windows systems ranging from desktop devices to PDA and handheld computers. While several SMIL player implementationsexist,including the RealPlayer [4], InternetExplorer [5], PocketSMIL [7],GRiNS [6],X-SMILES [8] and various proprietary implementations for mobile devices, we developed Ambulant for three reasons:Permission to make digital or hard copiesof all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish,to post on servers or to redistribute tolists,requires prior specific permissionand/or a fee.'MM' 04, October 10-16, 2004, New Y ork, New Y ork, USA.Copyright 2004 ACM 1-58113-893-8/04/0010...$5.00.•N one of the existi ng SMIL players provides a complete and correct SMIL 2.0 implementation. The Ambulant player implements all of SMIL, based on the SMIL 2.0 Language profile plus extensions to support advanced animation and the needs of the mobile variant used by the 3GPP/PSS-6 SMIL specification [9]. •A ll commercial SMIL players are geared to the presentation of proprietary media. The Ambulant player uses open-source media codecs and open-source network transfer protocols, so that the player can be easily customized foruse in a wide range of researchprojects.• Our goal is to build a platform that will encourage the development of comparable multimedia research output.By providing what we expect will be a standard baseline player, other researchers and developmentorganizations can concentrate on integratingextensions to the basic player (either in terms of new media codecs or new network control algorithms). These extensions can then be shared by others.In contrast to the Helix client architecture [10], which also moved to a GPL core in mid-2004, the Ambulant player supports a wider range of SMIL target application architectures,it provides a more complete and correct implementation of the SMIL language,it provides much better performance on low-resource devices and it provides a more extensible media player architecture. It also provides an implementation that includes all of the media codecs as part of the open client infrastructure.The Ambulant target community is not viewers of media content, but developers of multimedia infrastructures, protocols and networks. Our goal has been to augument the existing partial SMIL implementations produced by many groups with a complete implementation that supports even the exotic features of the SMIL language. The following sections provide an introduction to the architecture of the player and describe the state of the various Ambulant implementations. We then discuss how the Ambulant Core can be re-purposed in other projects. We start with a discussion of Ambulant 's functional support for SMIL.2.FUNCTIONAL SUPPORT FOR SMIL 2.0The SMIL 2.0 recommendation [1] defines 10 functional groups that are used to structure the standard '5s0+ modules. These modules define the approximately 30 XML elements and 150 attributes that make up the SMIL 2.0 language. In addition to defining modules, the SMIL 2.0 specification also defines a number of SMIL profiles: collection of elements, attributes and attribute values that are targeted to meet the needs of a particular implementation community. Common profiles include the full SMIL 2.0 Language, SMIL Basic, 3GPP SMIL,XHTML+SMIL and SMIL 1.0 profiles.A review of these profiles is beyond the scope of this paper(see [2]), but a key concern of Ambulant ' sdevelopment has been to provide a player core that can be used to support a wide range of SMIL target profiles with custom player components.This has resulted in an architecture that allows nearly all aspects of the player to be plug-replaceable via open interfaces. In this way, tailored layout, scheduling, media processing and interaction modules can be configured to meet the needs of individual profile requirements. The Ambulant player is the only player that supports this architecture.The Ambulant player provides a direct implementation of the SMIL 2.0 Language profile, plus extensions that provide enhanced support for animation and timing control. Compared with other commercial and non-commercial players, the Ambulant player implements not only a core scheduling engine, it also provides complete support for SMIL layout,interaction, content control and networking facilities.Ambulant provides the most complete implementation of the SMIL language available to date.3.AMBULANT ARCHITECTUREThis section provides an overview of the architecture of the Ambulant core. While this discussion is high-level, it will provide sufficient detail to demonstrate the applicability of Ambulant to a wide range of projects. The sections below consider thehigh-level interface structure, the common services layer and the player com mon core architecture.3.1The High-Level Interface StructureFigure 1 shows the highest level player abstract ion. The player core support top-level con trol exter nal entry points (in clud ing play/stop/pause) and in turn man ages a collection of external factories that provide in terfaces to data sources (both for sta ndard and pseudo-media), GUI and window system interfaces and in terfaces to ren derers. Unlike other players that treat SMIL as a datatype [4],[10], the Ambula nt en gi ne has acen tral role in in teractio n with the input/output/scree n/devices in terfaces.This architecture allows the types of entry points (and the moment of evaluation) to be customized and separated from the various data-sources and renderers. This is important forintegration with environments that may use non-SMIL layout or special device in terface process ing.Figuit 1 k Ambulaittliigk-ljtwLstruchm.3.2The Common Services LayerFigure 2 shows a set of com mon services that are supplied for the player to operate. These in clude operati ng systems in terfaces, draw ing systems in terfaces and support for baseli ne XML fun ctio ns.All of these services are provided by Ambulant; they may also be integrated into other player-related projects or they may be replaced by new service components that are optimized for particular devices or algorithms. Hsurt 2. Amldant Common [Services Liwr/3.3The Player Common CoreFigure 3 shows a slightly abstracted view ofthe Ambula nt com mon core architecture. The view is essentially that of a single instanceof the Ambula nt player. Although only oneclass object is shown for eachservice,multiple interchangeable implementations have been developed for all objects (except the DOM tree) during theplayer 'development. As an example,multiple schedulers have bee n developed to match the fun cti onalcapabilities of various SMIL profiles.Arrows in the figure denote that one abstract class depends on the services offered by the other abstract class. Stacked boxes denote that a si ngle in sta nce of the player will con tain in sta nces of multiple con crete classes impleme nting that abstract class: one for audio, one for images, etc. All of the stacked-box abstract classes come with a factory function to create the in sta nces of the required con crete class.The bulk of the player implementation is architected to be platform in depe ndent. As we will discuss, this platform in depe ndent component has already been reused for five separate player impleme ntati ons. The platform dependent portions of the player include support for actual ren deri ng, UI in teract ion and datasource processing and control. When the player is active, there is asingle instanee of the scheduler and layout manager, both of which depend on the DOM tree object. Multiple instances of data source and playable objects are created. These in teract with multiple abstract rendering surfaces. The playable abstract class is the scheduler in terface (play, stop) for a media no de, while the renderer abstract class is the drawing in terface (redraw). Note that not all playables are ren derers (audio, SMIL ani mati on). The architecture has bee n desig ned to have all comp onents be replaceable, both in terms of an alter native impleme ntati on of a give n set of functionality and in terms of a complete re-purposing of the player components. In this way, the Ambulant core can be migrated to being a special purpose SMIL engine or a non-SMIL engine (such as support for MPEG-4 or other sta ndards).The abstract in terfaces provided by the player do not require a “ SMIL on Top” model of docume nt process ing. The abstract in terface can be used with other high-level control 4.1 Implementation PlatformsSMIL profiles have been defined for a widerange of platforms and devices, ranging fromdesktop implementations to mobile devices. Inorder to support our research on distributedmodels (such as in an XHTML+SMIL implementation), or to control non-SMILlower-level rendering (such as timed text).Note that in order to improve readability of theillustrati on, all auxiliary classes (threadi ng, geometry and color han dli ng, etc.) and several classes that were not important for general un dersta nding (player driver engine, transitions, etc.) have been left out of the diagram.4. IMPLEMENTATION EXPERIENCESThis sectio nwill briefly review ourimpleme ntatio n experie nces with theAmbula nt player. We discuss the implementation platforms used during SMIL ' s development and describe a set of test documents that were created to test the fun cti on ality of the Ambula nt player core. We con clude with a discussi on on the performa nee of the Ambula nt player.SMIL document extensions and to provide a player that was useful for other research efforts, we decided to provide a wide range of SMIL implementations for the Ambulant project. The Ambulant core is available as a single C++ source distribution that provides support for the following platforms:•Linux: our source distributi on in elude makefiles that are used with the RH-8 distribution of Linux. We provide support for media using the FF-MPEG suite [11]. The player interface is built using the Qt toolkit [12]. •Macintosh:Ambulant supports Mac OS X 10.3. Media rendering support is available via the internal Quicktime API and via FF-MPEG . The player user interface uses standard Mac conventions and support (Coca). •Windows: Ambulant provides conventional Win32 support for current generation Windows platforms. It has been most extensivelytested with XP (Home,Professional and TabletPC) and Windows-2000. Media rendering include third-party and local support for imaging and continuous media. Networking and user interface support are provided using platform-embeddedlibraries.•PocketPC: Ambulant supports PocketPC-2000,PocketPC-2002andWindows Mobile 2003 systems. The PocketPC implementations provide support for basic imaging, audio and text facilities.•Linux PDA support:Ambulant provides support for the Zaurus Linux-PDA. Media support is provided via the FF-MPEG library and UI support is provide via Qt. Media support includes audio, images and simple text.In each of these implementations, our initial focus has been on providing support for SMIL scheduling and control functions.We have not optimized media renderer support in the Ambulant 1.0 releases, but expect to provide enhanced support in future versions. 4.2 Demos and Test SuitesIn order to validate the Ambulant player implementation beyond that available with the standard SMIL test suite [3], several demo and test documents have been distributed with the player core. The principal demos include: •Welcome: A short presentation that exercises basic timing,media rendering, transformations and animation.•NYC: a short slideshow in desktop and mobile configurations that exercises scheduling, transformation and media rendering.•News: a complex interactive news document that tests linking, event-based activation, advanced layout, timing and media integration. Like NYC, this demo support differentiated mobile and desktop configurations.•Links: a suite of linking and interaction test cases.•Flashlight: an interactive user'sguide that tests presentation customization using custom test attributes and linking/interaction support. These and other demos are distributed as part of the Ambulant player web site [13].4.3Performance EvaluationThe goal of the Ambulant implementation was to provide a complete and fast SMIL player. We used a C++ implementation core instead of Java or Python because our experience had shownthat on small devices (which we feel hold significant interest for future research), the efficiency of the implementation still plays a dominant role. Our goal was to be able to read, parse, model and schedule a 300-node news presentation in less than two seconds on desktop and mobile platforms. This goal was achieved for all of the target platforms used in the player project. By comparison, the same presentation on the Oratrix GRiNS PocketPC player took 28 seconds to read, parse and schedule. (The Real PocketPC SMIL player and the PocketSMIL players were not able to parseand schedule the document at all because of their limited SMIL language support.)In terms of SMIL language performance, our goal was to provide a complete implementation of the SMIL 2.0 Language profile[14]. Where other players have implemented subsets of this profile,Ambulant has managed to implement the entire SMIL 2.0 feature set with two exceptions: first, we currently do not support the prefetch elements of the content control modules; second, we provide only single top-level window support in the platform-dependent player interfaces. Prefetch was not supported because of the close association of an implementation with a given streaming architecture. The use of multiple top-level windows, while supported in our other SMIL implementation, was not included in version 1.0 of Ambulant because of pending working on multi-screen mobile devices. Both of these feature are expected to be supported in the next release of Ambulant.5.CURRENT STATUS AND AVAILABILITYT his paper describes version 1.0 of the Ambulant player, which was released on July 12, 2004. (This version is also known as the Ambulant/O release of the player.) Feature releases and platform tuning are expected to occur in the summer of 2004. The current release of Ambulant is always available via our SourceForge links [13], along with pointers to the most recent demonstrators and test suites.The W3C started its SMIL 2.1 standardization in May, 2004.At the same time, the W3C' s timed text working group is completing itsfirst public working draft. We will support both of these activities in upcoming Ambulant releases.6.CONCLUSIONSWhile SMIL support is becoming ubiquitous (in no small part due to its acceptance within the mobile community), the availability of open-source SMIL players has been limited. This has meant that any group wishing to investigate multimedia extensions or high-/low-level user or rendering support has had to make a considerable investment in developing a core SMIL engine.We expect that by providing a high-performance, high-quality and complete SMIL implementation in an open environment, both our own research and the research agendas of others can be served. By providing a flexible player framework, extensions from new user interfaces to new rendering engines or content control infrastructures can be easily supported.7.ACKNOWLEDGEMENTS This work was supported by the Stichting NLnet in Amsterdam.8.REFERENCES[1]W3C,SMIL Specification,/AudioVideo.[2]Bulterman,D.C.A and Rutledge, L.,SMIL 2.0:Interactive Multimedia for Weband Mobile Devices, Springer, 2004.[3]W3C,SMIL2.0 Standard Testsuite,/2001/SMIL20/testsuite/[4]RealNetworks,The RealPlayer 10,/[5]Microsoft,HTML+Time in InternetExplorer 6,/workshop/author/behaviors/time.asp[6]Oratrix, The GRiNS 2.0 SMIL Player./[7]INRIA,The PocketSMIL 2.0 Player,wam.inrialpes.fr/software/pocketsmil/. [8],X-SMILES: An Open XML-Browser for ExoticApplications./[9]3GPP Consortium,The Third-GenerationPartnership Project(3GPP)SMIL PSS-6Profile./ftp/Specs/archive/26_series/26.246/ 26246-003.zip[10]Helix Community,The Helix Player./.[11]FFMPEG ,FF-MPEG:A Complete Solution forRecording,Converting and Streaming Audioand Video./[12]Trolltech,Qtopia:The QT Palmtop/[13]Ambulant Project,The Ambulant 1.0 Open Source SMIL 2.0Player, /.[14]Bulterman,D.C.A.,A Linking andInteraction Evaluation Test Set for SMIL,Proc. ACM Hypertext 2004, SantaCruz,August, 2004.。

3 Natural Language Question Answering over RDF - A Graph Data Driven Approach

Natural Language Question Answering over RDF — A Graph Data Driven ApproachLei ZouPeking University Beijing, ChinaRuizhe HuangPeking University Beijing, ChinaHaixun Wang ∗Microsoft Research Asia Beijing, Chinazoulei@ Jeffrey Xu YuThe Chinese Univ. of Hong Kong Hong Kong, Chinahuangruizhe@ Wenqiang HePeking University Beijing, Chinahaixun@ Dongyan ZhaoPeking University Beijing, Chinayu@.hk ABSTRACThewenqiang@zhaody@RDF question/answering (Q/A) allows users to ask questions in natural languages over a knowledge base represented by RDF. To answer a national language question, the existing work takes a twostage approach: question understanding and query evaluation. Their focus is on question understanding to deal with the disambiguation of the natural language phrases. The most common technique is the joint disambiguation, which has the exponential search space. In this paper, we propose a systematic framework to answer natural language questions over RDF repository (RDF Q/A) from a graph data-driven perspective. We propose a semantic query graph to model the query intention in the natural language question in a structural way, based on which, RDF Q/A is reduced to subgraph matching problem. More importantly, we resolve the ambiguity of natural language questions at the time when matches of query are found. The cost of disambiguation is saved if there are no matching found. We compare our method with some state-of-theart RDF Q/A systems in the benchmark dataset. Extensive experiments conﬁrm that our method not only improves the precision but also speeds up query performance greatly.and predicates are edge labels. Although SPARQL is a standard way to access RDF data, it remains tedious and difﬁcult for end users because of the complexity of the SPARQL syntax and the RDF schema. An ideal system should allow end users to proﬁt from the expressive power of Semantic Web standards (such as RDF and SPARQLs) while at the same time hiding their complexity behind an intuitive and easy-to-use interface [13]. Therefore, RDF question/answering (Q/A) systems have received wide attention in both NLP (natural language processing) [29, 2] and database areas [30].1.1MotivationCategories and Subject DescriptorsH.2.8 [Database Management]: Database Applications—RDF, Graph Database, Question Answering1.INTRODUCTIONAs more and more structured data become available on the web, the question of how end users can access this body of knowledge becomes of crucial importance. As a de facto standard of a knowledge base, RDF (Resource Description Framework) repository is a collection of triples, denoted as subject, predicate, object , and can be represented as a graph, where subjects and objects are vertices∗ Haixun Wang is currently with Google Research, Mountain View, CA.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proﬁt or commercial advantage and that copies bear this notice and the full citation on the ﬁrst page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior speciﬁc permission and/or a fee. Request permissions from Permissions@. SIGMOD’14, June 22–27, 2014, Snowbird, UT, USA. Copyright 2014 ACM 978-1-4503-2376-5/14/06 ...$15.00. /10.1145/2588555.2610525 ...$.There are two stages in RDF Q/A systems: question understanding and query evaluation. Existing systems in the ﬁrst stage translate a natural language question N into SPARQLs [29, 6, 13], and in the second stage evaluate all SPARQLs translated in the ﬁrst stage. The focus of the existing solutions is on query understanding. Let us consider a running example in Figure 1. The RDF dataset is given in Figure 1(a). For the natural language question N “Who was married to an actor that played in Philadelphia ? ”, Figure 1(b) illustrates the two stages done in the existing solutions. The inherent hardness in RDF Q/A is the ambiguity of natural language. In order to translate N into SPARQLs, each phrase in N should map to a semantic item (i.e, an entity or a class or a predicate) in RDF graph G. However, some phrases have ambiguities. For example, phrase “Philadelphia” may refer to entity Philadelphia(ﬁlm) or Philadelphia_76ers . Similarly, phrase “play in” also maps to predicates starring or playForTeam . Although it is easy for humans to know the mapping from phrase “Philadelphia” (in question N ) to Philadelphia_76ers is wrong, it is uneasy for machines. Disambiguating one phrase in N can inﬂuence the mapping of other phrases. The most common technique is the joint disambiguation [29]. Existing disambiguation methods only consider the semantics of a question sentence N . They have high cost in the query understanding stage, thus, it is most likely to result in slow response time in online RDF Q/A processing. In this paper, we deal with the disambiguation in RDF Q/A from a different perspective. We do not resolve the disambiguation problem in understanding question sentence N , i.e., the ﬁrst stage. We take a lazy approach and push down the disambiguation to the query evaluation stage. The main advantage of our method is it can avoid the expensive disambiguation process in the question understanding stage, and speed up the whole performance. Our disambiguation process is integrated with query evaluation stage. More speciﬁcally, we allow that phrases (in N ) correspond313Subject Antonio_Banderas Antonio_Banderas Antonio_Banderas Philadelphia_(film) Jonathan_Demme Philadelphia Aaron_McKie James_Anderson Constantin_Stanislavski Philadelphia_76ers An_Actor_Prepares c1 actor <type> u2 Antonio_Banderas <starring> <spouse> u1 Melanie_Griffith c2Predicate type spouse starring type director type bornIn create type type c3Object actor Melanie_Griffith Philadelphia_(film) film Philadelphia_(film) city Philadelphia An_Actor_Prepares Basketball_team Book Philadelphia city actor play in Disambiguation be married to SPARQL GenerationWho was married to an actor that play in Philadelphia ? Generating a Semantic Query Graph <spouse> <actor> <An_Actor_Prepares> <starring> <playedForTeam> <Philadelphia> <Philadelphia(film)> <Philadelphia_76ers> ?who v1 Who be married to “that” v2 play in actor <spouse, 1.0> <playForTeam, 1.0> <actor, 1.0> <Philadelphia, 1.0> <starring, 0.9> <Philadelphia(film), 0.9> <An_Actor_Prepares, 0.9> <director, 0.5> <Philadelphia_76ers, 0.8> Philadelphia v3 Semantic Query GraphplayedForTeam Philadelphia_76ersfilm <type> u3 Philadelphia_(film) <director> u4 Jonathan_Demme <type> u5 Philadelphia <bornIn > u6 Aaron_McKie c4 Basketball_team <type>SELECT ?y WHERE { ?x starring Philadelphia_ ( film ) . ?x type actor . ?x spouse ?y. } Query EvaluationFinding Top-k Subgraph Matches c1 actor u2 <spouse> u1 Melanie_Griffith <type> Antonio_Banderasu7 James_Anderson<playedForTeam>u10 Philadelphia_76ers u8 c5 Book Constantin_Stanislavski <create> <type> u10 An_Actor_Prepares (a) RDF Dataset and RDF GraphSPARQL Query Engine<starring> u3 Philadelphia_(film)?y: Melanie_Griffith (b) SPARQL Generation-and-Query Framework(c) Our FrameworkFigure 1: Question Answering Over RDF Dataset to multiple semantic items (e.g., subjects, objects and predicates) in RDF graph G in the question understanding stage, and resolve the ambiguity at the time when matches of the query are found. The cost of disambiguation is saved if there are no matching found. In our problem, the key problem is how to deﬁne a “match” of question N in RDF graph G and how to ﬁnd matches efﬁciently. Intuitively, a match is a subgraph (of RDF graph G) that can ﬁt the semantics of question N . The formal deﬁnition of the match is given in Deﬁnition 3 (Section 2). We illustrate the intuition of our method by an example. Consider a subgraph of graph G in Figure 1(a) (the subgraph induced → by vertices u1 , u2 , u3 and c1 ). Edge − u− 2 c1 says that “Antonio Ban− − → deras is an actor”. Edge u2 u1 says that “Melanie Grifﬁth is mar→ ried to Antonio Banderas”. Edge − u− 2 u3 says that “Antonio Banderas starred in a ﬁlm Philadelphia(ﬁlm) ”. The natural language question N is “Who was married to an actor that played in Philadel→ − − → phia”. Obviously, the subgraph formed by edges − u− 2 c1 , u2 u1 and − − → u2 u3 is a match of N . “Melanie Grifﬁth” is a correct answer. On the other hand, we cannot ﬁnd a match (of N ) containing Philadelphia _76ers in RDF graph G. Therefore, the phrase “Philadelphia” (in N ) cannot map to Philadelphia_76ers . This is the basic idea of our data-driven approach. Different from traditional approaches, we resolve the ambiguity problem in the query evaluation stage. A challenge of our method is how to deﬁne a “match” between a subgraph of G and a natural language question N . Because N is unstructured data and G is graph structure data, we should ﬁll the gap between two kinds of data. Therefore, we propose a semantic query graph QS to represent the question semantics of N . We formally deﬁne QS in Deﬁnition 2. An example of QS is given in Figure 1(c), which represents the semantic of the question N . Each edge in QS denotes a semantic relation. For example, edge v1 v2 denotes that “who was married to an actor”. Intuitively, a match of question N over RDF graph G is a subgraph match of QS over G (formally deﬁned in Deﬁnition 3). N . The coarse-grained framework is given in Figure 1(c). In the question understanding stage, we interpret a natural language question N as a semantic query graph QS (see Deﬁnition 2). Each edge in QS denotes a semantic relation extracted from N . A semantic relation is a triple rel,arg 1,arg 2 , where rel is a relation phrase, and arg 1 and arg 2 are its associated arguments. For example, “play in”,“actor”,“Philadelphia” is a semantic relation. The edge label is the relation phrase and the vertex labels are the associated arguments. In QS , two edges share one common endpoint if the two corresponding relations share one common argument. For example, there are two extracted semantic relations in N , thus, we have two edges in QS . Although they do not share any argument, arguments “actor” and “that” refer to the same thing. This phenomenon is known as “coreference resolution” [25]. The phrases in edges and vertices of QS can map to multiple semantic items (such as entities, classes and predicates) in RDF graph G. We allow the ambiguity in this stage. For example, the relation phrase “play in” (in edge v2 v3 ) corresponds to three different predicates. The argument “Philadelphia” in v3 also maps to three different entities, as shown in Figure 1(c). In the query evaluation stage, we ﬁnd subgraph matches of QS over RDF graph G. For each subgraph match, we deﬁne its matching score (see Deﬁnition 6) that is based on the semantic similarity of the matching vertices and edge in QS and the subgraph match in G. We ﬁnd the top-k subgraph matches with the largest scores. For example, the subgraph induced by u1 , u2 and u3 matches query QS , as shown in Figure 1(c). u2 matches v2 (“actor”), since u2 ( Antonio_Banderas ) is a type-constraint entity and u2 ’s type is actor . u3 ( Philadelphia(ﬁlm) ) matches v3 (“Philadelphia”) and u1 ( Melanie_Grifﬁth ) matches v1 (“who”). The result to question N is Melanie_Grifﬁth . Also based on the subgraph match query, we cannot ﬁnd a subgraph containing u10 ( Philadelphia_76ers ) to match QS . It means that the mapping from “Philadelphia” to u10 is a false alarm. We deal with disambiguation in query evaluation based on the matching result. Pushing down disambiguation to the query evaluation stage not only improves the precision but also speeds up the whole query response time. Take the up-to-date DEANNA [20] as an example. DEANNA [29] proposes a joint disambiguation technique. It mod-1.2Our ApproachAlthough there are still two stages “question understanding” and “query evaluation” in our method, we do not adopt the existing framework, i.e., SPARQL generation-and-evaluation. We propose a graph data-driven solution to answer a natural language question314Table 1: NotationsNotation G(V, E ) N Q Y D T rel vi /ui Cvi /Cvi vj d Deﬁnition and Description RDF graph and vertex and edge sets A natural language question A SPARQL query The dependency tree of qN L The paraphrase dictionary A relation phrase dictionary A relation phrase A vertex in query graph/RDF graph Candidate mappings of vertex vi /edge vi vj Candidate mappings of vertex vi /edge vi vjĂĂFigure 3: Paraphrase Dictionary D Philadelphia(ﬁlm) and Philadelphia_76ers . We need to know which one is users’ concern. In order to address the ﬁrst challenge, we extract the semantic relations (Deﬁnition 1) implied by the question N , based on which, we build a semantic query graph QS (Deﬁnition 2) to model the query intention in N . D EFINITION 1. (Semantic Relation). A semantic relation is a triple rel, arg 1, arg 2 , where rel is a relation phrase in the paraphrase dictionary D, arg 1 and arg 2 are the two argument phrases. In the running example, “be married to”, “who”,“actor” is a semantic relation, in which “be married to” is a relation phrase, “who” and “actor” are its associated arguments. We can also ﬁnd another semantic relation “play in”, “that”,“Philadelphia” in N . D EFINITION 2. (Semantic Query Graph) A semantic query graph is denoted as QS , in which each vertex vi is associated with an argument and each edge vi vj is associated with a relation phrase, 1 ≤ i, j ≤ |V (QS )| . Actually, each edge in QS together with the two endpoints represents a semantic relation. We build a semantic query graph QS as follows. We extract all semantic relations in N , each of which corresponds to an edge in QS . If the two semantic relations have one common argument, they share one endpoint in QS . In the running example, we get two semantic relations, i.e., “be married to”, “who”,“actor” and “play in”, “that”,“Philadelphia” , as shown in Figure 2. Although they do not share any argument, arguments “actor” and “that” refer to the same thing. This phenomenon is known as “coreference resolution” [25]. Therefore, the two edges also share one common vertex in QS (see Figure 2(c)). We will discuss more technical issues in Section 4.1. To deal with the ambiguity issue (the second challenge), we propose a data-driven approach. The basic idea is: for a candidate mapping from a phrase in N to an entity (i.e., vertex) in RDF graph G, if we can ﬁnd the subgraph containing the entity that ﬁts the query intention in N , the candidate mapping is correct; otherwise, this is a false positive mapping. To enable this, we combine the disambiguation with the query evaluation in a single step. For example, although “Philadelphia” can map three different entities, in the query evaluation stage, we can only ﬁnd a subgraph containing Philadelphia_ﬁlm that matches the semantic query graph QS . Note that QS is a structural representation of the query intention in N . The match is based on the subgraph isomorphism between QS and RDF graph G. The formal deﬁnition of match is given in Deﬁnition 3. For the running example, we cannot ﬁnd any subgraph match containing Philadelphia or Philadelphia_76ers of QS . The answer to question N is “Melanie_Grifﬁth” according to the resulting subgraph match. Generally speaking, there are ofﬂine and online phases in our solution.els the disambiguation as an ILP (integer liner programming) problem, which is an NP-hard problem. To enable the disambiguation, DEANNA needs to build a disambiguation graph. Some phrases in the natural language question map to some candidate entities or predicates in RDF graph as vertices. In order to introduce the edges in the disambiguation graph, DEANNA needs to compute the pairwise similarity and semantic coherence between every two candidates on the ﬂy. It is very costly. However, our method avoids the complex disambiguation algorithms, and combines the query evaluation and the disambiguation in a single step. We can speed up the whole performance greatly. In a nutshell, we make the following contributions in this paper. 1. We propose a systematic framework (see Section 2) to answer natural language questions over RDF repositories from a graph data-driven perspective. To address the ambiguity issue, different from existing methods, we combine the query evaluation and the disambiguation in a single step, which not only improves the precision but also speed up query processing time greatly. 2. In the ofﬂine processing, we propose a graph mining algorithm to map natural language phrases to top-k possible predicates (in a RDF dataset) to form a paraphrase dictionary D, which is used for question understanding in RDF Q/A. 3. In the online processing, we adopt two-stage approach. In the query understanding stage, we propose a semantic query graph QS to represent the users’ query intention and allow the ambiguity of phrases. Then, we reduce RDF Q/A into ﬁnding subgraph matches of QS over RDF graph G in the query evaluation stage. We resolve the ambiguity at the time when matches of the query are found. The cost of disambiguation is saved if there are no matching found. 4. We conduct extensive experiments over several real RDF datasets (including QALD benchmark) and compare our system with some state-of-the-art systems. Experiment results show that our solution is not only more effective but also more efﬁcient.2.FRAMEWORKThe problem to be addressed in this paper is to ﬁnd the answers to a natural language question N over an RDF graph G. Table 1 lists the notations used throughout this paper. There are two key challenges in this problem. The ﬁrst one is how to represent the query intention of the natural language question N in a structural way. The underlying RDF repository is a graph structured data, but, the natural language question N is unstructured data. To enable query processing, we need a graph representation of N . The second one is how to address the ambiguity of natural language phrases in N . In the running example, “Philadelphia” in the question N may refer to different entities, such as315(a) Natural Language Question1 Who actor 23 1 2 3(b) Semantic Relations Extraction and Building Semantic Query Graph3 5 2?who <spouse, 1.0> 1 <actor, 1.0> 7469 8An_Actor_Prepares5 10Figure 2: Natural Language Question Answering over Large RDF Graphs2.1OfﬂineTo enable the semantic relation extraction from N , we build a paraphrase dictionary D, which records the semantic equivalence between relation phrases and predicates. For example, in the running example, natural language phrases “be married to” and “play in” have the similar semantics with predicates spouse and starring , respectively. Some existing systems, such as Patty [18] and ReVerb [10], provide a rich relation phrase dataset. For each relation phrase, they also provide a support set with entity pairs, such as ( Antonio_Banderas , Philadelphia(ﬁlm) ) for the relation phrase “play in”. Table 2 shows two sample relation phrases and their supporting entity pairs. The intuition of our method is as follows: for each relation phrase reli , let Sup(reli ) denotes a set of supporting entity pairs. We assume that these entity pairs also occur in RDF graph. Experiments show that more than 67% entity pairs in the Patty relation phrase dataset occur in DBpedia RDF graph. The frequent predicates (or predicate paths) connecting the entity pairs in Sup(reli ) have the semantic equivalence with the relation phrase reli . Based on this idea, we propose a graph mining algorithm to ﬁnd the semantic equivalence between relation phrases and predicates (or predicate paths).2.2OnlineThere are two stages in RDF Q/A: question understanding and query evaluation. 1) Question Understanding. The goal of the question understanding in our method is to build a semantic query graph QS for representing users’ query intention in N . We ﬁrst apply Stanford Parser to N to obtain the dependency tree Y of N . Then, we extract the semantic relations from Y based on the paraphrase dictionary D. The basic idea is to ﬁnd a minimum subtree (of Y ) that contains all words of rel, where rel is a relation phrase in D. The subtree is called an embedding of rel in Y . Based on the embedding position in Y , we also ﬁnd the associated arguments according to some linguistics rules. The relation phrase rel together with the two associated arguments form a semantic relation, denoted as a triple rel,arg 1,arg 2 . Finally, we build a semantic query graphQS by connecting these semantic relations. We will discuss more technical issues in Section 4.1. 2) Query Evaluation. As mentioned earlier, a semantic query graph QS is a structural representation of N . In order to answer N , we need to ﬁnd a subgraph (in RDF graph G) that matches QS . The match is deﬁned according to the subgraph isomorphism (formally deﬁned in Deﬁnition 3) First, each argument in vertex vi of QS is mapped to some entities or classes in the RDF graph. Given an argument argi (in vertex vi of QS ) and an RDF graph G, entity linking [31] is to retrieve all entities and classes (in G) that possibly correspond to argi , denoted as Cvi . Each item in Cvi is associated with a conﬁdence probability. In Figure 2, argument “Philadelphia” is mapped to three different entities Philadelphia , Philadelphia(ﬁlm) and Philadelphia_76ers , while argument “actor” is mapped to a class Actor and an entity An_ Actor_Prepares . We can distinguish a class vertex and an entity vertex according to RDF’s syntax. If a vertex has an incoming adjacent edge with predicate rdf:type or rdf:subclass , it is a class vertex; otherwise, it is an entity vertex. Furthermore, if arg is a wh-word, we assume that it can match all entities and classes in G. Therefore, for each vertex vi in QS , it also has a ranked list Cvi containing candidate entities or classes. Each relation phrase relvi vj (in edge vi vj of QS ) is mapped to a list of candidate predicates and predicate paths. This list is denoted as Cvi vj . The candidates in the list are ranked by the conﬁdence probabilities. It is important to note that we do not resolve the ambiguity issue in this step. For example, we allow that “Philadelphia” maps to three possible entities, Philadelphia_76ers , Philadelphia and Philadelphia(ﬁlm) . We push down the disambiguation to the query evaluation step. Second, if a subgraph in RDF graph can match QS if and only if the structure (of the subgraph) is isomorphism to QS . We have the following deﬁnition about match. D EFINITION 3. (Match) Consider a semantic query graph QS with n vertices {v1 ,...,vn }. Each vertex vi has a candidate list Cvi , i = 1, ..., n. Each edge vi vj also has a candidate list of Cvi vj ,316Joseph_P._Kennedy,_Sr.Table 2: Relation Phrases and Supporting Entity PairsRelation Phrase “play in” “uncle of” ( ( ( ( Supporting Entity Pairs Antonio_Banderas , Philadelphia(ﬁlm) ), Julia_Roberts , Runaway_Bride ),...... Ted_Kennedy , John_F._Kennedy,_Jr. ) Peter_Corr , Jim_Corr ),......Antonio_BanderashasChildTed_KennedyhasChildJohn_F._KennedyhasChild hasGenderMale John_F._Kennedy,_Jr.starringPhiladelphia(film) (a) Āplay ināwhere 1 ≤ i = j ≤ n. A subgraph M containing n vertices {u1 ,...,un } in RDF graph G is a match of QS if and only if the following conditions hold: 1. If vi is mapping to an entity ui , i = 1, ..., n, ui must be in list Cvi ; and 2. If vi is mapping to a class ci , i = 1, ..., n, ui is an entity whose type is ci (i.e., there is a triple ui rdf:type ci in RDF graph) and ci must be in Cvi ; and → − − → u− 3. ∀vi vj ∈ QS ; − i uj ∈ G ∨ uj ui ∈ G. Furthermore, the − → − − → predicate Pij associated with u− i uj (or uj ui ) is in Cvi vj , 1 ≤ i, j ≤ n. Each subgraph match has a score, which is derived from the probability conﬁdences of each edge and vertex mapping. Deﬁnition 6 deﬁnes the score, which we will discuss later. Our goal is to ﬁnd all subgraph matches with the top-k scores. A TA-style algorithm [11] is proposed in Section 4.2.2 to address this issue. Each subgraph match of QS implies an answer to the natural language question N , meanwhile, the ambiguity is resolved. For example, in Figure 2, although “Philadelphia” can map three different entities, in the query evaluation stage, we can only ﬁnd a subgraph (included by vertices u1 , u2 , u3 and c1 in G) containing Philadelphia_ﬁlm that matches the semantic query graph QS . According to the subgraph graph, we know that the result is “Melanie_Grifﬁth”, meanwhile, the ambiguity is resolved. Mapping phrases “Philadelphia” to Philadelphia or Philadelphia_76ers of QS is false positive for the question N , since there is no data to support that.hasGender(b) Āuncle ofāFigure 4: Mapping Relation Phrases to Predicates or Predicate Paths Although mapping these relation phrases into canonicalized representations is the core challenge in relation extraction [17], none of the prior approaches consider mapping a relation phrase to a sequence of consecutive predicate edges in RDF graph. Patty demo [17] only ﬁnds the equivalence between a relation phrase and a single predicate. However, some relation phrases cannot be interpreted as a single predicate. For example, “uncle of” corresponds to a length-3 predicate path in RDF graph G, as shown in Figure 3. In order to address this issue, we propose the following approach. Given a relation phrase reli , its corresponding support set containing entity pairs that occurs in RDF graph is denoted as Sup(reli ) j 1 m = { (vi , vi1 ), ..., (vi , vim )}. Considering each pair (vi , vij ), j j j = 1, ..., m, we ﬁnd all simple paths between vi and vi in RDF j graph G, denoted as P ath(vi , vij ). Let P S (reli ) = j =1,...,m j j P ath(vi , vi ). For example, given an entity pair ( Ted_Kennedy , John_F._Kennedy,_Jr. ), we locate them at RDF graph G and ﬁnd simple pathes between them (as shown in Figure 4). If a path L is frequent in P S (“uncle of”), L is a good candidate to represent the semantic of relation phrase “uncle of”. For efﬁciency considerations, we only ﬁnd simple paths with no longer than a threshold1 . We adopt a bi-directional BFS (breathj j ﬁrst-search) search from vertices vi and vij to ﬁnd P ath(vi , vij ). Note that we ignore edge directions (in RDF graph) in a BFS process. For each relation phrase reli with m supporting entity pairs, j we have a collection of all path sets P ath(vi , vij ), denoted as j j P S (reli ) = j =1,...,m P ath(vi , vi ). Intuitively, if a predicate path is frequent in P S (reli ), it is a good candidate that has semantic equivalence with relation phrase reli . However, the above simple intuition may introduce noises. For example, we ﬁnd that (hasGender , hasGender) is the most frequent predicate path in P S (“uncle of”) (as shown in Figure 4). Obviously, it is not a good predicate path to represent the sematic of relation phrase “uncle of”. In order to eliminate noises, we borrow the intuition of tf-idf measure [15]. Although (hasGender ,hasGender) is frequent in P S (“uncle of”), it is also frequent in the path sets of other relation phrases, such as P S (“is parent of”), P S (“is advisor of”) and so on. Thus, (hasGender ,hasGender) is not an important feature for P S (“uncle of”). It is exactly the same with measuring the importance of a word w with regard to a document. For example, if a word w is frequent in lots of documents in a corpus, it is not a good feature. A word has a high tf-idf, a numerical statistic in measuring how important a word is to a document in a corpus, if it occurs in a document frequently, but the frequency of the word in the whole corpus is small. In our problem, for each relation phrase reli , i = 1, ..., n, we deem P S (reli ) as a virtual document. All predicate paths in P S (reli ) are regarded as virtual words. The corpus contains all P S (reli ), i = 1, ..., n. Formally, we deﬁne tf-idf value of a predicate path L in the following deﬁnition. Note that if L is a length-1 predicate path, L is a predicate P .1 We set the threshold as 4 in our experiments. More details about the parameter setting will be discussed in Section 6.3.OFFLINEThe semantic relation extraction relies on a paraphrase dictionary D. A relation phrase is a surface string that occurs between a pair of entities in a sentence [17], such as “be married to” and “play in” in the running example. We need to build a paraphrase dictionary D, such as Figure 3, to map relation phrases to some candidate predicates or predicate paths. Table 2 shows two sample relation phrases and their supporting entity pairs. In this paper, we do not discuss how to extract relation phrases along with their corresponding entity pairs. Lots of NLP literature about relation extraction study this problem, such as Patty [18] and ReVerb [10]. For example, Patty [18] utilizes the dependency structure in sentences and ReVerb [10] adopts the n-gram to ﬁnd relation phrases and the corresponding support set. In this work, we assume that the relation phrases and their support sets are given. The task in the ofﬂine processing is to ﬁnd the semantic equivalence between relation phrases and the corresponding predicates (and predicate paths) in RDF graphs, i.e., building a paraphrase dictionary D like Figure 3. Suppose that we have a dictionary T = {rel1 , ..., reln }, where each reli is a relation phrase, i = 1, ..., n. Each reli has a support set of entity pairs that occur in RDF graph, i.e., Sup (reli ) 1 m = { (vi , vi1 ), ..., (vi , vim )}. For each reli , i = 1, ..., n, the goal is to mine top-k possible predicates or predicate paths formed by consecutive predicate edges in RDF graph, which have sematic equivalence with relation phrase reli .317。

广域存储的一致性和稳定性COPS理论

Wyatt Lloyd , Michael J. Freedman , Michael Kaminsky† , and David G. Andersen‡
Princeton University, † Intel Labs, ‡ Carnegie Mellon University
ABSTRACT
To appear in Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP’11)
Don’t Settle for Eventual:
Scalable Causal Consistency for Wide-Area Storage with COPS1.INTRODUCTION
Categories and Subject Descriptors
C.2.4 [Computer Systems Organization]: Distributed Systems
General Terms
Design, Experimentation, Performance
Distributed data stores are a fundamental building block of modern Internet services. Ideally, these data stores would be strongly consistent, always available for reads and writes, and able to continue operating during network partitions. The CAP Theorem, unfortunately, proves it impossible to create a system that achieves all three [13, 23]. Instead, modern web services have chosen overwhelmingly to embrace availability and partition tolerance at the cost of strong consistency [16, 20, 30]. This is perhaps not surprising, given that this choice also enables these systems to provide low latency for client operations and high scalability. Further, many of the earlier high-scale Internet services, typically focusing on web search, saw little reason for stronger consistency, although this position is changing with the rise of interactive services such as social networking applications [46]. We refer to systems with these four properties—Availability, low Latency, Partition-tolerance, and high Scalability—as ALPS systems. Given that ALPS systems must sacriﬁce strong consistency (i.e., linearizability), we seek the strongest consistency model that is achievable under these constraints. Stronger consistency is desirable because it makes systems easier for a programmer to reason about. In this paper, we consider causal consistency with convergent conﬂict handling, which we refer to as causal+ consistency. Many previous systems believed to implement the weaker causal consistency [10, 41] actually implement the more useful causal+ consistency, though none do so in a scalable manner. The causal component of causal+ consistency ensures that the data store respects the causal dependencies between operations [31]. Consider a scenario where a user uploads a picture to a web site, the picture is saved, and then a reference to it is added to that user’s album. The reference “depends on” the picture being saved. Under causal+ consistency, these dependencies are always satisﬁed. Programmers never have to deal with the situation where they can get the reference to the picture but not the picture itself, unlike in systems with weaker guarantees, such as eventual consistency. The convergent conﬂict handling component of causal+ consistency ensures that replicas never permanently diverge and that conﬂicting updates to the same key are dealt with identically at all sites. When combined with causal consistency, this property ensures that clients see only progressively newer versions of keys. In comparison, eventually consistent systems may expose versions out of order. By combining causal consistency and convergent conﬂict handling, causal+ consistency ensures clients see a causally-correct, conﬂict-free, and always-progressing data store. Our COPS system (Clusters of Order-Preserving Servers) provides causal+ consistency and is designed to support complex online applications that are hosted from a small number of large-scale datacenters, each of which is composed of front-end servers (clients of

OpenFlow Enabling Innovation in Campus Network以及中文翻译

附录A 外文原文OpenFlow: Enabling Innovation in Campus NetworksNick McKeown Stanford University Guru Parulkar Stanford UniversityTom AndersonUniversity of WashingtonLarry PetersonPrinceton UniversityHari BalakrishnanMITJennifer RexfordPrinceton UniversityScott Shenker University of California,BerkeleyJonathan Turner Washington University inSt. LouisThis article is an editorial note submitted to CCR. It has NOT been peer reviewed.Authors take full responsibility for this article’s technical ments can be posted through CCR Online.ABSTRACTThis whitepaper proposes OpenFlow: a way for researchers to run experimental protocols in the networks they use every day. OpenFlow is based on an Ethernet switch, with an internal ﬂow-table, and a standardized interface to add and remove ﬂow entries. Our goal is to encourage networking vendors to add OpenFlow to their switch products for deployment in college campus backbones and wiring closets. We believe that OpenFlow is a pragmatic compromise: on one hand, it allows researchers to run experiments on heterogeneous switches in a uniform way at line-rate and with high port-density; while on the other hand, vendors do not need to expose the internal workings of their switches. In addition to allowing researchers to evaluate their ideas in real-world traffic settings, OpenFlow could serve as a useful campus component in proposed large-scale testbeds like GENI. Two buildingsat Stanford University will soon run OpenFlow networks, using commercial Ethernet switches and routers. We will work to encourage deployment at other schools; and we encourage you to consider deploying OpenFlow in your university network too.Categories and Subject DescriptorsC.2 [Internetworking]: RoutersGeneral TermsExperimentation, DesignKeywordsEthernet switch, virtualization, flow-based1. THE NEED FOR PROGRAMMABLE NETWORKSNetworks have become part of the critical infrastructure of our businesses, homes and schools. This success has been both a blessing and a curse for networking researchers; their work is more relevant, but their chance of making an impact is more remote. The reduction in real-world impact of any given network innovation is because the enormous installed base of equipment and protocols, and the reluctance to experiment with production traffic, which have created an exceedingly high barrier to entry for new ideas. Today, there is almost no practical way to experiment with new network protocols (e.g., new routing protocols, or alternatives to IP) in sufficiently realistic settings (e.g., at scale carrying real traffi c) to gain the conﬁdence needed for their widespread deployment. The result is that most new ideas from the networking research community go untried and untested; hence the commonly held belief that the network infrastructure has “ossiﬁed”.Having recognized the problem, the networking community is hard at work developing programmable networks, such as GENI [1] a proposed nationwide research facility for experimenting with new network architectures and distributed systems. These programmable networks call for programmable switches and routers that (using virtualization) can process packets for multiple isolated experimental networks simultaneously. For example, in GENI it is envisaged that a researcher will be allocated a slice of resources across the whole network, consisting of a portion of network links, packet processing elements (e.g. routers) and end-hosts; researchers program their slices to behave as they wish. A slice could extend across the backbone, into access networks, into college campuses, industrial research labs, and include wiring closets, wireless networks, and sensor networks.Virtualized programmable networks could lower the barrier to entry for new ideas, increasing the rate of innovation in the network infrastructure. But the plans for nationwide facilities are ambitious (and costly), and it will take years for them to be deployed.This whitepaper focuses on a shorter-term question closer to home: As researchers, how can we run experiments in our campus networks? If we can ﬁgure out how, we can start soon and extend the technique to other campuses to beneﬁt the whole community.To meet this challenge, several questions need answering, including: In the early days, how will college network administrators get comfortable putting experimental equipment (switches, routers, access points, etc.) into their network? How will researchers control a portion of their local network in a way that does not disrupt others who depend on it? And exactly whatfunctionality is needed in network switches to enable experiments? Our goal here is to propose a new switch feature that can help extend programmability into the wiring closet of college campuses.One approach -that we do not take -is to persuade commercial “name-brand” equipment vendors to provide an open, programmable, virtualized platform on their switches and routers so that researchers can deploy new protocols, while network administrators can take comfort that the equipment is well supported. This outcome is very unlikely in the short-term. Commercial switches and routers do not typically provide an open software platform, let alone provide a means to virtualize either their hardware or software. The practice of commercial networking is that the standardized external interfaces are narrow (i.e., just packet forward ing), and all of the switch’s internalﬂexibility is hidde n. The internals differ from vendor to vendor, with no standard platform for researchers to experiment with new ideas. Further, network equipment vendors are understandably nervous about opening up interfaces inside their boxes: they have spent years deploying and tuning fragile distributed protocols and algorithms, and they fear that new experiments will bring networks crashing down. And, of course, open platforms lower the barrier-to-entry for new competitors.A few open software platforms already exist, but do not have the performance or port-density we need. The simplest example is a PC with several network interfaces and an operating system. All well-known operating systems support routing of packets between interfaces, and open-source implementations of routing protocols exist (e.g., as part of the Linux distribution, or from XORP [2]); and in most cases it is possible to modify theoperating system to process packets in almost any manner (e.g., using Click [3]). The problem, of course, is performance: A PC can neither support the number of ports needed for a college wiring closet (a fan out of 100+ ports is needed per box), nor the packet-processing performance (wiring closet switches process over 100Gbits/s of data, whereas a typical PC struggles to exceed 1Gbit/s; and the gap between the two is widening).Existing platforms with specialized hardware for line-rate processing are not quite suitable for college wiring closets either. For example, an ATCA-based virtualized programmable router called the Supercharged Planet Lab Platform [4] is under development at Washington University, and can use network processors to process packets from many interfaces simultaneously at line-rate. This approach is promising in the long-term, but for the time being is targeted at large switching centers and is too expensive for widespread deployment in college wiring closets. At the other extreme is NetFPGA [5] targeted for use in teaching and research labs. NetFPGA is a low-cost PCI card with a user-programmable FPGA for processing packets, and 4 ports of Gigabit Ethernet. NetFPGA is limited to just four network interfaces—insufficient for use in a wiring closet.Thus, the commercial solutio ns are too closed and inﬂex ible and the research solutions either have insufficient performance or fan out, or are too expensive. It seems unlikely that the research solutions, with their complete generality, can overcome their performance or cost limitations. A more promising approach is to compromise on generality and to seek a degree of switch flexibility that is:Figure 1 Idealized OpenFlow Switch. The Flow Table is controlled by a remote controllervia the Secure Channel.•Amenable to high-performance and low-cost implementations.•Capable of supporting a broad range of research.•Assured to isolate experimental traffic from production traffic. •Consistent with vendors’ need for closed platforms.This paper describes the OpenFlow Switch—a specification that is an initial attempt to meet these four goals.2. THE OPENFLOW SWITCHThe basic idea is simple: we exploit the fact that most modern Ethernet switches and routers contain ﬂow-tables (typically built from TCAMs) that run at line-rate to im plement ﬁrewalls, NAT, QoS, and to collect statistics. While each vendor’s ﬂow-table is diﬀerent, we’ve identiﬁed an interesting common set of functions that run in many switches and routers. OpenFlow exploits this common set of functions.OpenFlow provides an open protocol to program the ﬂow-table in diﬀerent switches and routers. A network administrator can partition traﬃc into production and research ﬂows. Researchers can control their own ﬂows -by choosing the routes their packets follow and the processing they receive. Inthis way, researchers can try new routing protocols, security models, addressing schemes, and even alternatives to IP. On the same network, the production traﬃc is isolated and processed in the same way as today.The datapath of an OpenFlow Switch consists of a Flow Table, and an action associated with each ﬂow entry. The set of actions supported by an OpenFlow Switch is extensible, but below we describe a minimum requirement for all switches. For high-performance and low-cost the data-path must have a carefully prescribed degree of ﬂexibility. This means forgoing the ability to specify arbitrary handling of each packet and seeking a more limited, but still useful, range of actions. Therefore, later in the pape r, deﬁne a basic required set of actions for all OpenFlow switches.An OpenFlow Switch consists of at least three parts: (1) A Flow Table, with an action associated with each ﬂow entry, to tell the switch how to process the ﬂow, (2) A Secure Channel that connects the switch to a remote control process (called the controller), allowing commands and packets to be sent between a controller and the switch using (3) The OpenFlow Protocol, which provides an open and standard way for a controller to communicate with a switch. By specifying a standard interface (the OpenFlow Protocol) through which entries in the Flow Table can be deﬁned externally, the OpenFlow Switch avoids the need for researchers to program the switch.It is useful to categorize switches into dedicated OpenFlow switches that do not support normal Layer 2 and Layer 3 processing, and OpenFlow-enabled general purpose commercial Ethernet switches and routers, to which the Open-Flow Protocol and interfaces have been added as a new feature.Dedicated OpenFlow switches. A dedicated OpenFlow Switch is a dumb datapath element that forwards packets between ports, as deﬁned b y a remote control process. Figure 1 shows an example of an OpenFlow Switch.In this context, ﬂows are broadly deﬁned, and are limit ed only by the capabilities of the particular implementation of the Flow Table. For example, a ﬂow could be a TCP con nection, or all packets from a particular MAC address or IP address, or all packets with the same VLAN tag, or all packets from the same switch port. For experiments involving non-IPv4 packets, a ﬂow could be deﬁned as all packets matching a speciﬁc (but non-standard) header.Each ﬂow-entry has a simple action associated with it; the three basic ones (that all dedicated OpenFlow switches must support) are:1 Forward this ﬂow’s packets to a given port (or ports). This allows packets to be routed through the network. In most switches this is expected to take place at line-rate.2 Encapsulate and forw ard this ﬂow’s packets to a con troller. Packet is delivered to Secure Channel, where it is encapsulated and sent to a controller. Typically used for the ﬁrst packet in a new ﬂow, so a controller can decide if the ﬂow should be added to the Flow Table. Or in some experiments, it could be used to forward all packets to a controller for processing.3 Drop this ﬂow’s packets. Can be used for security, to curb denial of service attacks, or to reduce spurious broadcast discovery traffic from end-hosts.An entry in the Flow-Table has three ﬁelds: (1) A packet header that deﬁnes the ﬂow, (2) The action, which deﬁnes how the packets should be processed, and (3) Statistics, which keep track of the number of packets and bytes foreach ﬂow, and the time since the last packet matched the ﬂow (to help with the removal of inactive ﬂows).In the ﬁrst generation “Type 0” switches, the ﬂow header is a 10-tuple shown in T able 1. A TCP ﬂow could be speciﬁed by all ten ﬁelds, whereas an IP ﬂow might not include the transport ports in its deﬁnition. Each header ﬁeld can be a wildcard to allow for aggregation of ﬂows, such as ﬂows in which only the VLAN ID is deﬁned would apply to all traﬃc on a particular VLAN.Table 1 The header ﬁelds matched in a “Type 0” OpenFlow switch.The detailed requirements of an OpenFlow Switch are defined by the OpenFlow Switch Specification [6].OpenFlow-enabled switches. Some commercial switches, routers and access points will be enhanced with the OpenFlow feature by adding the Flow Table, Secure Channel and OpenFlow Protocol (we list some examples in Section 5). Typically, the Flow Table will re-use existing hardware, such as a TCAM; the Secure Channel and Proto col will be ported to run on the switch’s operating system. Figure 2 shows a network of OpenFlow-enabled commercial switches and access points. In this example, all the Flow Tables are managed by the same controller; the OpenFlow Protocol allows a switch to be controlled by two or more controllers for increased performance or robustness. Our goal is to enable experiments to take place in an existing production network alongside regular traﬃc and applications. Therefore, to win theconﬁdence of network administrators, OpenFlow-enabled switches must isolate experimental traﬃc (processed by the Flow Table) from productiontraﬃc that is to be processed by the normal Layer 2 and Layer 3 pipeline of the switch. There are two ways to achieve this separation. One is to add a fourth action:4. Forward this ﬂow’s packets through the switch’s nor mal processing pipeline.The other is to deﬁne separat e sets of VLANs for experimental and production traﬃc. Both approaches allow normal production traﬃc that isn’t part of an experiment to be processed in the usual way by the switch. All OpenFlow-enabled switches are required to support one approach or the other; some will support both.Additional features. If a switch supports the header formats and the four basic actions mentioned above (and detailed in the OpenFlow SwitchSpeciﬁcation), then we call it a “Type 0” switch. We expect that many switches will support additional actions, for example to rewrite portions of the packet header (e.g., for NAT, or to obfuscate addresses on intermediate links), and to map packets to a priority class. Likewise, some Flow Tables will be able to match on arbitrary ﬁelds in the packet header, enabling experiments with new non-IP protocols. As a particular set of features emerges, we willdeﬁne a “Type 1” switch.Controllers.A controller adds and removes ﬂow-entries from the Flow Table on behalf of experiments. For example, a static controller might be a simple application running on a PC to statica lly establish ﬂows to interconnect a set of test computers for the duration of an experiment. In this case the ﬂows resemble VLANs in current networks— providing a simple mechanism toisolate experimental traﬃc from the production network. Viewed this way, OpenFlow is a generalization of VLANs.One can also imagine more sophisticated controllers that dynamicallyadd/remove ﬂows as an experiment progresses. In one usage model, a researcher might control the complete network of OpenFlow Switches and be free to decide how all ﬂows are processed.Figure 2 Example of a network of OpenFlow-enabled commercial switches and routers.A more sophisticated controller might support multiple researchers, each with diﬀerent accounts and permissions, enabling them to run multiple independent experiments on diﬀerent sets of ﬂows. Flows identiﬁed as under the control of a particular researcher (e.g., by a policy table running in a controller) could be delivered to a researcher’s user-level control program which then decides if a new ﬂow-entry should be added to the network of switches.3. USING OPENFLOWAs a simple example of how an OpenFlow Switch might be used imagine that Amy (a researcher) invented Amy-OSPF as a new routing protocol toreplace OSPF. She wants to try her protocol in a network of OpenFlow Switches, without changing any end-host software. Amy-OSPF will run in a controller; each time a new application ﬂow starts Amy-OSPF picks a route through a series of OpenFlow Switches, and adds a ﬂow-entry in each switch along the path. In her experiment, Amy decides to use Amy-OSPF for thetraﬃc entering the OpenFlow network from her own desktop PC— so she doesn’t disrupt the network for others. To do this, she deﬁnes one ﬂow to be all the traﬃc entering the Open-Flow switch through the switch port her PC is connected to, and adds a ﬂow-entry with the action “Enca psulate and forward all packets to a controller”. When her packets reach a controller, her new protocol chooses a route and adds a new ﬂow-entry (for the application ﬂow) to every switch along the chosen path. When subsequent packets arrive at a switch, they are processed quickly (and at line-rate) by the Flow Table. There are legitimate questions to ask about the performance, reliability and scalability of a controller that dynam ically adds and removes ﬂows as an experiment progresses: Can such a centralized controller be fast enough to process new ﬂows and program the Flow Switches? What happens when a controller fails? To some extent these questions were addressed in the context of the Ethane prototype, which used simple ﬂow switches and a central controller [7]. Preliminary results suggested that an Ethane controller based on a low-cost desktop PC could process over 10,000 new ﬂows per second —enough for a large college campus. Of course, the rate at which ne w ﬂows can be processed will depend on the complexity of the processing required by the re searcher’s experiment. But it gives us conﬁdence that mean ingful experiments can be run. Scalability and redundancy are possible by making acontroller (and the experiments) stateless, allowing simple load-balancing over multiple separate devices.3.1 Experiments in a Production NetworkChances are, Amy is testing her new protocol in a network used by lots of other people. We therefore want the network to have two additional properties:1 Packets belonging to users other than Amy should be routed using a standard and tested routing protocol running in the switch or router from a “name-brand” v endor.2 Amy should only be able to add ﬂow entries for her traﬃc, or for any traﬃc her network administrator has allowed her to control.Property 1 is achieved by OpenFlow-enabled switches. In Amy’s experiment, the default action for all packets that don’t come from Amy’s PC could be to forward them through the normal processing pipeline. Amy’s own packets would be forwarded directly to the outgoing port, without being processed by the normal pipeline.Property 2 depends on the controller. The controller should be seen as a platform that enables researchers to implement various experiments, and the restrictions of Property 2 can be achieved with the appropriate use of permissions or other ways to limit the powers of individual researchers to control ﬂow e ntries. The exact nature of these permission-like mechanisms will depend on how the controller is implemented. We expect that a variety of controllers will emerge. As an example of a concrete realization of a controller, some of the authors are working on a controller called NOX as a follow-on to the Ethane work [8]. A quite diﬀerent controller might emerge by extending the GENI management software to OpenFlow networks.3.2 More ExamplesAs with any experimental platform, the set of experiments will exceed those we can think of up-front — most experiments in OpenFlow networks are yet to be thought of. Here, for illustration, we oﬀer some examples of how OpenFlow-enabled networks could be used to experiment with new network applications and architectures.Example 1: Network Management and Access Con trol. We’ll use Ethane as our ﬁrst example [7] as it was the research that inspired OpenFlow. In fact, an OpenFlow Switch can be thought of as a generalization of Ethane’s datapath switch. Ethane used a speciﬁc implementation of a controller, suited for network management and control, that manages the admittance and routing of ﬂows. The basic idea of Ethane is to allow network managers to deﬁne a network-wide policy in the central controller, which is enforced directly by making admission control decisions for each new ﬂow. A controller checks a new ﬂow against a set of rules, such as “Guests can communicate using HTTP, but only via a web proxy” or “VoIP phones are not allowed to communicate with laptops.” A controller associates pack ets with their senders by managing all the bindings between names and addresses — it essentially takes over DNS, DHCP and authenticates all users when they join, keeping track of which switch port (or access point) they are connected to. One could envisage an extension to Ethane in which a policy dictates that particular ﬂows are sent to a user’s process in a controller, hence allowing researcher-speciﬁc processing to be performed in the network.Example 2: VLANs. OpenFlow can easily provide users with their own isolated network, just as VLANs do. The simplest approach is to statically declare a set of ﬂows which specify the ports accessible by traﬃc on a givenVLAN ID. Traﬃc identiﬁed as coming from a single user (for example, originating from speciﬁc switch ports or MAC addresses) is tagged by the switches (via an action) with the appropriate VLAN ID.A more dynamic approach might use a controller to manage authentication of users and use the knowledge of the users’ locations for tagging traﬃc at runtime.Example 3: Mobile wireless VOIP clients. For this example consider an experiment of a new call-handoﬀ mechanism for WiFi-enabled phones. In the experiment VOIP clients establish a new connection over the OpenFlow-enabled network. A controller is implemented to track the location of clients, re-routing connections — by reprogramming the Flow Tables — as users move through the network, allowing seamless handoﬀ from one access point to another.Example 4: A non-IP network. So far, our examples have assumed an IP network, but OpenFlow doesn’t require packets to be of any one format — so long as the Flow Table is able to match on the packet header. This would allow experiments using new naming, addressing and routing schemes. There are several ways an OpenFlow-enabled switch can support non-IP traﬃc. For example, ﬂows could be identiﬁed using their Ethernet header (MAC src and dst addresses), a new EtherType value, or at the IP level, by a new IP Version number. More generally, we hope that future switches will allow a controller to create a generic mask (oﬀset + value + mask), allowing packets to be processed in a researcher-speciﬁed way.Example 5: Processing packets rather than ﬂows.The examples above are for experiments involving ﬂows — where a controller makes decisions when the ﬂow starts. There are, of course, interesting experiments to be performed that require every packet to be processed. For example, an intrusion detection system that inspects every packet, an explicit congestion control mechanism, or when modifying the contents of packets, such as when converting packets from one protocol format to another.Figure 3: Example of processing packets through anexternal line-rate packet-processing device, such as a programmable NetFPGA router.There are two basic ways to process packets in an OpenFlow-enabled network. First, and simplest, is to force all of a ﬂow’s packets to pass t hrough a controller. To do this, a controller doesn’t add a new ﬂow entry into the Flow Switch — it just allows the switch to default to forwarding every packet to a controller. This has the advantage of ﬂexibility, at the cost of performance. It might provide a useful way to test the functionality of a new protocol, but is unlikely to be of much interest for deployment in a large network.The second way to process packets is to route them to a programmable switch that does packet processing — for example, a NetFPGA-based programmable router. The advantage is that the packets can be processed at line-rate in a user-deﬁnable way; Figure 3 shows an example of how this could be done, in which the OpenFlow-enabled switch operates essentially as a patch-panel to allow the packets to reach the NetFPGA. In some cases, the NetFPGA board (a PCI board that plugs into a Linux PC) might be placed in the wiring closet alongside the OpenFlow-enabled switch, or (more likely) in a laboratory.4. THE OPENFLOW CONSORTIUMThe OpenFlow Consortium aims to popularize OpenFlow and maintain the OpenFl ow Switch Speciﬁcation. The Con sortium is a group of researchers and network administrators at universities and colleges who believe their research mission will be enhanced if OpenFlow-enabled switches are installed in their network.Membership is open and free for anyone at a school, college, university, or government agency worldwide. The OpenFlow Consortium welcomes individual members who are not employed by companies that manufacture or sell Ethernet switches, routers or wireless access points (because we want to keep the consortium free of vendor inﬂuence). To join, send email to***********************.The Consortium web-site contain s the OpenFlow Switch Speciﬁcation, a list of consortium members, and reference implementations of OpenFlow switches.Licensing Model: The OpenFlow Switch Speciﬁcation is free for all commercial and non-commercial use. (The exact wording is on the web-site.) Commercial switches and routers claiming to be “OpenFlow-enabled” must conform to the requirements of an OpenFlow Type 0 Switch, as deﬁned in the OpenFlow Switch Speciﬁcation. OpenFlow is a trademark of Stanford University, and will be protected on behalf of the Consortium.5. DEPLOYING OPENFLOW SWITCHESWe believe there is an interesting market opportunity for network equipment vendors to sell OpenFlow-enabled switches to the research community. Every building in thousands of colleges and universities contains wiring closets with Ethernet switches and routers, and with wireless access points spread across campus.We are actively working with several switch and router manufacturers who are adding the OpenFlow feature to their products by implementing a Flow Table in existing hardware; i.e. no hardware change is needed. The switches run the Secure Channel software on their existing processor.We have found network equipment vendors to be very open to the idea of adding the OpenFlow feature. Most vendors would like to support the research community without having to expose the internal workings of their products. We are deploying large OpenFlow networks in the Computer Science and Electrical Engineering departments at Stanford University. The networks in two buildings will be replaced by switches running OpenFlow. Eventually, all traﬃc will run over the OpenFlow network, with production traﬃc and experimental traﬃc being isolated on different VLANs under the control of。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Geometric Programming for Circuit Optimization[Extended Abstract]Stephen P.Boyd Department of Electrical Engineering Stanford Universityboyd@Seung Jean Kim Department of Electrical Engineering Stanford Universitysjkim@ABSTRACTThis tutorial concerns a method for solving a variety of cir-cuit sizing and optimization problems,which is based on formulating the problem as a geometric program(GP),or a generalized geometric program(GGP).These nonlinear, constrained optimization problems can be transformed to convex optimization problems,and then solved(globally) very eﬃciently.Categories and Subject DescriptorsB.7.2[Design Aids]:AlgorithmsGeneral TermsAlgorithms,DesignKeywordsCircuit sizing,geometric programming,generalized geomet-ric programming,convex optimization1.CIRCUIT SIZING AND OPTIMIZATION We assume that a circuit topology has been selected,and what remains is to choose appropriate sizes for the gates, transistors,wires,and other components.It is also possible to include other design variables such as threshold voltage, supply voltage,and oxide thickness.In many practical cases, some of these design variables are restricted to lie in discrete sets of values;in other cases,they can be well modeled as continuous variables.The choice of these design variables determines various top level circuit objectives,such as the total area of the circuit,the total power it consumes,speed at which it can operate(for a digital circuit),its bandwidth(for an analog circuit),and other objectives such as noise tolerance,ro-bustness to process and environment parameter variations, and so on.These objectives can be very complex functions Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proﬁt or commercial advantage and that copies bear this notice and the full citation on theﬁrst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior speciﬁc permission and/or a fee.ISPD’05,April3–6,2005,San Francisco,California,USA. Copyright2005ACM1-59593-021-3/05/0004...$5.00.of the design variables.In addition,there are many con-straints that must be satisﬁed.There are many approaches to circuit sizing,including heuristics,and methods based on circuit simulation coupled to a generic numerical optimiza-tion method.In this tutorial we focus on a particular approach,in which the sizing problem is modeled(at least approximately)as a geometric program(GP),a special type of mathematical optimization problem.We refer the reader to the paper A Tutorial on Geometric Programming[5]for an introduction to geometric programming,some of the basic tricks used to formulate problems in GP form,a number of examples,and an extensive list of references.Over the last ten years,eﬃcient interior-point methods for GP have been developed.Current implementations ap-proach the eﬃciency of linear program(LP)solvers.Sparse GPs with tens of thousands of variables,and hundreds of thousands of constraints,can be reliably solved in minutes, on a personal computer.For more on algorithms for solving GPs,as well as the broader context of convex optimization, we refer the reader to[6].2.GP-BASED SIZINGGP-based circuit sizing is not new;it has been used for digital circuits since the1980s.In1985Fishburn and Dun-lop[16]proposed a method for transistor and wire sizing, based on Elmore delay,that was later found to be a GP. Since then many digital circuit design problems have been formulated as GPs or closely related optimization problems. Work on gate and device sizing can be found in,e.g.,[10, 22,31,31,36,37].These are all based on gate delay mod-els that are compatible with geometric programming;see [22,38,33,1]for more on such models.The logical eﬀort method,described in the inﬂuential book[38]by Sproull, Sutherland,and Harris,does not explicitly rely on GP,but is very closely connected.See[4,35]for more on GP-based digital circuit sizing.Work on interconnect sizing related to GP includes[12, 13,14,17,23,25,26,34];simultaneous gate and wire sizing is considered in[21].In some of these papers,the authors develop custom methods for solving the resulting GPs,in-stead of using general purpose interior-point methods(see, e.g.,[10,41]).For some simple problems,analytic solutions are available(see,e.g.,[8,17]).Other problems in digital circuit design where GP plays a role include buﬀering and wire sizing[8,9],sizing and placement[7],yield maximiza-tion[24,30],parasitic reduction[32],and routing[2]. Geometric programming has also been used for the designof analog circuits[15,19,27,39],mixed-signal circuits[11, 18]and RF(radio frequency)circuits[20,29,28,40].We refer the reader to the tutorial material[3]for GP-based sizing of analog and RF circuits.3.GP MODELINGThe tutorial focuses on the modeling of a variety of prob-lems in GP form.There are several advantages to modeling a problem,at least approximately,as a GP.Theﬁrst is com-putational:new methods can(globally)solve even large GP problems eﬃciently.Even if these new methods are not ex-ploited to solve the problem,the knowledge that a problem is(approximately)a GP is useful.For example,it tells us that a particular logarithmic transformation of the variables and constraints yields a convex optimization problem,and this can be exploited to develop a more eﬃcient solution method.In addition,we have the very useful conclusion that any local solution of the problem is in fact global.If an ad hoc solution method can be shown toﬁnd a local so-lution,we can conclude that itﬁnds a global solution.(For more discussion of these issues,see[5,6].)Another advantage of expressing a sizing problem in GP form is conceptual:we claim that GP serves as a unifying standard form for circuit sizing problems,the same way that linear programming(LP)serves as a unifying standard form for a wide variety of simple resource allocation problems. Like all methods,the GP modeling approach has advan-tages and disadvantages.One advantage is that complex interactions between the optimization variables are easily accounted for,and additional constraints are easily added. The method handles complex problems,such as joint op-timization of devices sizes,threshold,and supply voltage; robust design over corners or taking statistical variations into account;and the design of circuits that operate in mul-tiple modes(such as a low power and a high performance mode).When compared with other methods based on nu-merical optimization,methods based on GP(and interior-point solution methods)have the advantage of not needing an initial design,or any algorithm parameter tuning,and alwaysﬁnding the global solution.We can also list some shortcomings of the approach.The method does not give much insight into why some set of speciﬁcations cannot be achieved,nor does it suggest how the designer might change the circuit topology to do better. While solving GPs is fast,it is not as fast as methods that choose sizes using simple rules,with a few passes over the circuit.4.ACKNOWLEDGEMENTSSome of the work reported in the tutorial was carried out with support from C2S2,the MARCO Focus Center for Circuit and System Solutions,under MARCO contract 2003-CT-888.5.REFERENCES[1]A.Abou-Seido,B.Nowak,and C.Chu.Fitted Elmoredelay:a simple and accurate interconnect delaymodel.IEEE Transactions on VLSI Systems,12(7):691–696,2004.[2]M.Borah,R.Owens,and M.Irwin.A fast algorithmfor minimizing the Elmore delay to identiﬁed criticalsinks.IEEE Transactions on Computer-Aided Designof Integrated Circuits and Systems,16(7):753–759,1997.[3]S.Boyd,S.-J.Kim,and S.Mohan.Geometricprogramming and its applications to EDA problems.Design and Test in Europe(DATE)2005tutorial.Available from/~boyd/date05.html.[4]S.Boyd,S.-J.Kim,D.Patil,and M.Horowitz.Digitalcircuit optimization via geometric programming.Available from/~boyd/~gp_digital_ckt.html,Jan.2005.Technical Report,Stanford University.[5]S.Boyd,S.-J.Kim,L.Vandenberghe,and A.Hassibi.A tutorial on geometric programming,2004.Manuscript.Available from/boyd/~gp_tutorial.html.[6]S.Boyd and L.Vandenberghe.Convex Optimization.Cambridge University Press,2004.[7]W.Chen,C.-T.Hseih,and M.Pedram.Simultaneousgate sizing and placement.IEEE Transactions onComputer-Aided Design of Integrated Circuits andSystems,19(2):206–214,Feb.2000.[8]C.Chu and D.Wong.Closed form solutions tosimultaneous buﬀer insertion/sizing and wire sizing.ACM Transactions on Design Automation ofElectronic Systems,6(3):343–371,2001.[9]C.Chu and D.Wong.An eﬃcient and optimalalgorithm for simultaneous buﬀer and wire sizing.IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems,18(9):1297–1304,1999.[10]C.Chu and D.Wong.VLSI circuit performanceoptimization by geometric programming.Annals ofOperations Research,105:37–60,2001.[11]D.Colleran,C.Portmann,A.Hassibi,C.Crusius,S.Mohan,S.Boyd,T.Lee,and M.Hershenson.Optimization of phase-locked loop circuits viageometric programming.In Proceedings of the Custom Integrated Circuits Conference(CICC),pages326–328,Sept.2003.[12]J.Cong and C.-K.Koh.Simultaneous driver and wiresizing for performance and power optimization.IEEE Transactions on Very Large Scale Integration Systems, 2(4):408–423,1994.[13]J.Cong and K.-S.Leung.Optimal wiresizing underElmore delay model.IEEE Transactions onComputer-Aided Design of Integrated Circuits andSystems,14(3):321–336,1995.[14]J.Cong and Z.Pan.Wire width planning forinterconnect performance optimization.IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems,21(3):319–329,2002.[15]J.Dawson,S.Boyd,M.Hershenson,and T.Lee.Optimal allocation of local feedback in multistageampliﬁers via geometric programming.IEEETransactions on Circuits and Systems I,48(1):1–11,January2001.[16]J.Fishburn and A.Dunlop.TILOS:A posynomialprogramming approach to transistor sizing.In IEEEInternational Conference on Computer-Aided Design: ICCAD-85.Digest of Technical Papers,pages326–328.IEEE Computer Society Press,1985.[17]Y.Gao and D.Wong.Optimal shape function for abidirectional wire under Elmore delay model.IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems,18(7):994–999,1999.[18]M.Hershenson.Design of pipeline analog-to-digitalconverters via geometric programming.In Proceedings of the IEEE/ACM International Conference onComputer Aided Design,pages317–324,San Jose,CA,November2002.[19]M.Hershenson,S.Boyd,and T.Lee.GPCAD:A toolfor CMOS op-amp synthesis.In Proceedings of theIEEE/ACM International Conference on ComputerAided Design,pages296–303,Nov.1998.[20]M.Hershenson,A.Hajimiri,S.Mohan,S.Boyd,andT.Lee.Design and optimization of LC oscillators.InProceedings of the IEEE/ACM InternationalConference on Computer-Aided Design,pages65–69,1999.[21]I.Jiang,Y.Chang,and J.Jou.Crosstalk-driveninterconnect optimization by simultaneous gate andwire sizing.IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems,19(9):999–1010,2000.[22]K.Kasamsetty,M.Ketkar,and S.Sapatnekar.A newclass of convex functions for delay modeling and itsapplication to the transistor sizing problem.IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems,19(7):779–788,July2000. [23]R.Kay and L.Pileggi.EWA:Eﬃcient wiring-sizingalgorithm for signal nets and clock nets.IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems,17(1):40–49,Jan.1998.[24]S.-J.Kim,S.Boyd,S.Yun,D.Patil,andM.Horowitz.A heuristic for optimizing stochasticactivity networks with applications to statisticaldigital circuit sizing,2005.Manuscript.Available from /~boyd/heur_san_opt.html. [25]Y.-M.Lee,C.Chen,and D.Wong.Optimalwire-sizing function under the Elmore delay modelwith bounded wire sizes.IEEE Transactions onCircuits and Systems I:Fundamental Theory andApplications,49(11):1671–1677,2002.[26]T.Lin and L.Pileggi.RC(L)interconnect sizing withsecond order considerations via posynomialprogramming.In Proceedings of the ACM/SIGDAInternational Symposium on Physical Design,pages16–21,Sonoma,CA,June2001.[27]P.Mandal and V.Visvanathan.CMOS op-amp sizingusing a geometric programming formulation.IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems,20(1):22–38,Jan.2001.[28]S.Mohan,M.Hershenson,S.Boyd,and T.Lee.Simple accurate expressions for planar spiralinductances.IEEE Journal of Solid-State Circuit,34(10):1419–1424,Oct.1999.[29]S.Mohan,M.Hershenson,S.Boyd,and T.Lee.Bandwidth extension in CMOS with optimizedon-chip inductors.IEEE Journal of Solid-StateCircuits,35(3):346–355,March2000.[30]D.Patil,Y.Yun,S.-J.Kim,S.Boyd,andM.Horowitz.A new method for robust design ofdigital circuitss,2004.To be presented atInternational Symposium on Quality Electronic Design (ISQED)2005.[31]M.Pattanaik,S.Banerjee,and B.Bahinipati.GPbased transistor sizing for optimal design of nanoscale CMOS inverter.In IEEE Conference onNanotechnology,pages524–527,Aug.2003.[32]Z.Qin and C.-K.Cheng.Realizable parasiticreduction using generalized Y-∆transformation.InProceedings of the40th IEEE/ACM DesignAutomation Conference,pages220–225,June2003. [33]J.Rubenstein,P.Penﬁeld,and M.Horowitz.Signaldelay in RC tree networks.IEEE Transactions onComputer-Aided Design of Integrated Circuits andSystems,2(3):202–211,July1983.[34]S.Sapatnekar.Wire sizing as a convex optimizationproblem:Exploring the area-delay tradeoﬀ.IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems,15:1001–1011,Aug.1996. [35]S.Sapatnekar.Timing.Kluwer Academic Publishers,2004.[36]S.Sapatnekar and W.Chuang.Power-delayoptimization in gate sizing.ACM Transactions onDesign Automation of Electronic Systems,5(1):98–114,2000.[37]S.Sapatnekar,V.Rao,P.Vaidya,and S.Kang.Anexact solution to the transistor sizing problem forCMOS circuits using convex optimization.IEEETransactions on Computer-Aided Design of Integrated Circuits and Systems,12(11):1621–1634,Nov.1993. [38]I.Sutherland,B.Sproull,and D.Harris.LogicalEﬀort:Designing Fast CMOS Circuits.MorganKaufmann Publishers,San Francisco,CA,1999. [39]J.Vanderhaegen and R.Brodersen.Automated designof operational transconductance ampliﬁers usingreversed geometric programming.In Proceedings of the 41st IEEE/ACM Design Automation Conference,pages133–138,San Diego,CA,June2004.[40]Y.Xu,L.Pileggi,and S.Boyd.ORACLE:Optimization with recourse of analog circuitsincluding layout extraction.In Proceedings of the41st IEEE/ACM Design Automation Conference,pages151–154,San Diego,CA,June2004.[41]F.Young,C.Chu,W.Luk,and Y.Wong.Handlingsoft modules in general nonslicingﬂoorplan usingLagrangian relaxation.IEEE Transactions onComputer-Aided Design of Integrated Circuits andSystems,20(5):687–629,May2001.。