一篇关于FPGA的英文文献及翻译

合集下载

英文翻译

关于基于FPGA的波形发生器论文翻译的译文FPGA 技术介绍概述场域可程式化闸阵列(FPGA) 技术正持续发展，而全世界FPGA 市场的产值，则预估可从2005 年的19 亿美金提升到2010 年的27 亿 5 千万美金。

FPGA 是在1984 年由Xilinx 公司所发明，从简单的胶合逻辑(Glue logic) 晶片，演变为可取代客制的特定应用积体电路(ASIC) 与处理器，适用于讯号处理与控制应用。

为何FPGA 技术如此成功？此篇文章将介绍FPGA，并说明数项让FPGA 如此独特的优点。

何谓FPGA？最笼统来说，FPGAs 即为可再程式化的晶片。

透过预先建立的逻辑区块与可程式化路由资源，不需更改面包板或焊锡部分，即可设定这些晶片以建置客制硬体功能。

使用者可于软体中开发数位运算系统(Computing task) 并将之编译为组态档案或位元流(Bitstream)，可包含元件接线的相关资讯。

此外，FPGA 完全为可重设性质，当使用者重新编译不同的电路设定时，可立刻拥有不同的特性。

在过去，工程师必须深入了解数位硬体设计，才能够使用FPGA 技术。

然而，高阶设计工具的新技术可针对图形化程式区或 C 程式码，转换为数位硬体电路，即变更了FPGA 程式设计的规则。

FPGA 整合了ASIC 与处理器架构系统的最佳部分，使FPGA 晶片可应用于所有产业。

FPGA 具有硬体时脉的速度与可靠性，且其仅需少量即可进行作业；可降低客制化ASIC 设计的费用。

可重新程式设计的晶片，具有与软体相同的弹性，却不受限于处理核心的数量。

与处理器不同的是，FPGA 为实际的平行架构，因此不同的处理作业并不需要占用相同资源。

每个独立的处理作业均将指派至专属的晶片区块，不需影响其他逻辑区块即可自动产生功能。

因此，当新增其他处理作业时，应用某部分的效能亦不会受到影响。

FPGA 技术的5 大优点效能–透过硬体的平行机制，FPGA 可突破依序执行(Sequential execution) 的固定运算，并于每时脉循环完成更多作业，以超越数位讯号处理器(DSP) 的计算功能。

FPGA外文文献

High Level Programming for Real TimeFPGA Based Image ProcessingD Crookes, K Benkrid, A Bouridane, K Alotaibi and A BenkridSchool of Computer Science, The Queen‟s University of Belfast, Belfast BT7 1NN, UK ABSTRACTReconfigurable hardware in the form of Field Programmable Gate Arrays (FPGAs) has been proposed as a way of obtaining high performance for computationally intensive DSP applications such us Image Processing (IP), even under real time requirements. The inherent reprogrammability of FPGAs gives them some of the flexibility of software while keeping the performance advantages of an application specific solution.However, a major disadvantage of FPGAs is their low level programming model. To bridge the gap between these two levels, we present a high level software environment for FPGA-based image processing, which aims to hide hardware details as much as possible from the user. Our approach is to provide a very high level Image Processing Coprocessor (IPC) with a core instruction set based on the operations of Image Algebra. The environment includes a generator which generates optimised architectures for specific user-defined operations.1. INTRODUCTIONImage Processing application developers require high performance systems for computationally intensive Image Processing (IP) applications, often under real time requirements. In addition, developing an IP application tends to be experimental and interactive. This means the developer must be able to modify, tune or replace algorithms rapidly and conveniently.Because of the local nature of many low level IP operations (e.g. neighbourhood operations), one way of obtaining high performance in image processing has been to use parallel computing [1]. However, multiprocessor IP systems have generally speaking not yet fulfilled their promise. This is partly a matter of cost, lack of stability and software support for parallel machines; it is also a matter of communications overheads particularly if sequences of images are being captured and distributed across the processors in real time.A second way of obtaining high performance in IP applications is to use Digital Signal Processing (DSP) processors [2,3]. DSP processors provide a performance improve-ment over standard microprocessors while still maintaining a high level programming model. However, because of the software based control, DSP processors have still difficulty in coping with real time video processing.At the opposite end of the spectrum lie the dedicated hardware solutions. Application Specific Integrated Circuits (ASICs) offer a fully customised solution to a particular algorithm [4]. However, this solution suffers from a lack of flexibility, plus the high manufacturing cost and the relatively lengthy development cycle.Reconfigurable hardware solutions in the form of FPGAs [5] offer high performance, with the ability to be electrically reprogrammed dynamically to perform other algorithms. Though the first FPGAs were only capable of modest integration levels and were thus usedmainly for glue logic and system control, the latest devices [6] have crossed the Million gate barrier hence making it possible to implement an entire System On a Chip. Moreover, the introduction of the latest IC fabrication techniques has increased the maximum speed at which FPGAs can run. Design‟s performance exceeding 150MHz are no longer outside the realm of possibilities in the new FPGA parts, hence allowing FPGAs to address high bandwidth applications such as video processing.A range of commercial FPGA based custom computing systems includes: the Splash-2 system [7]; the G-800 system [8] and VCC‟s HOTWorks HOTI & HOTII development [9]. Though this solution seems to enjoy the advantages of both the dedicated solution and the software based one, many people are still reluctant to move toward this new technology because of the low level programming model offered by FPGAs. Although behavioural synthesis tools have made enormous progress [10, 11], structural design techniques (including careful floorplanning) often still result in circuits that are substantially smaller and faster than those developed using only behavioural synthesis tools [12].In order to bridge the gap between these two levels, this paper presents a high level software environment for an FPGA-based Image Processing machine, which aims to hide the hardware details from the user. The environment generates optimised architectures for specific user-defined operations, in the form of a low level netlist. Our system uses Prolog as the basic notation for describing and composing the basic building blocks. Our current implementation of the IPC is based on the Xilinx 4000 FPGA series [13].The paper first outlines the programming environment at the user level (the programming model). This includes facilities for defining low level Image Processing algorithms based on the operators of Image Algebra [14], without any reference to hardware details. Next, the design of the basic building blocks necessary for implementing the IPC instruction set are presented. Then, we describe the runtime execution environment.2. THE USER’S PROGRAMMING MODELAt its most basic level, the programming model for our image processing machine is a host processor (typically a PC programmed in C++) and an FPGA-based Image Processing Coprocessor (IPC) which carries out complete image operations (such as convolution, erosion etc.) as a single coprocessor instruction. The instruction set of the IPC provides a core of instructions based on the operators of Image Algebra. The instruction set is also extensible in the sense that new compound instructions can be defined by the user, in terms of the primitive operations in the core instruction set. (Adding a new primitive instruction is a task for an architecture designer).The coprocessor core instruction setMany IP neighbourhood operations can be described by a template (a static window with user defined weights) and one of a set of Image Algebra operators. Indeed, simple neighbourhood operations can be split in two stages:∙ A …local‟ operato r applied between an image pixel and the corresponding window coefficient.∙ A …global‟ operator applied to the set of intermediate results of the local operation, to reduce this set to a single result pixel.The set of local operators contains …Add‟ (…+‟) and …multiplication‟ (…*‟), whereas the global operator contains …Accumulation‟ (…∑‟), …Maximum‟ (…Max‟) and …Minimum‟ (…Min‟). With these local and global operators, the following neighbourhood operations can be built:For instance, a simple Laplace operation would be performed by doing convolution (i.e. Local Operation = …∑‟ and Global operation= …*‟) with the following template:The programmer interface to this instruction set is via a C++ class. First, the programmer creates the required instruction object (and its FPGA configuration), and subsequently applies it to an actual image. Creating an instruction object is generally in two phases: firstly build an object describing the operation, and then generate the configuration, in a file. For neighbourhood operations, these are carried out by two C++ object constructors:image_operator (template & operator details)image_instruction (operator object, filename)For instructions with a single template operator, these can be conveniently combined in a single constructor:Neighbourhood_instruction (template, operators, filename)The details required when building a new image operator object include:∙The dimension of the image (e.g. 256 ⨯ 256)∙The pixels size (e.g. 16 bits).∙The size of the window (e.g. 3⨯3).∙The weights of the neighbourhood window.∙The target position within the window, for aligning it with the image pixels (e.g. 1,1).∙The …local‟ and …global‟ operations.Later, to apply an instruction to an actual image, the apply method of the instruction object is used:Result = instruction_object.apply (input image)This will reconfigure the FPGA (if necessary), download the input pixel data and store the result pixels in the RAM of the IPC as they are generated.The following example shows how a programmer would create and perform a 3 by 3 Laplace operation. The image is 256 by 256; the pixel size is 16 bits.2.1 Extending the Model for Compound OperationsIn practical image processing applications, many algorithms comprise more than a single operation. Such compound operations can be broken into a number of primitive core instructions.Instruction Pipelining: A number of basic image operations can be put together in series. A typical example of two neighbourhood operations in series is the …Open‟ operation. To do an …Open‟ operation, an …Erode‟ neighbourhood operation is first performed, and the resulting image is fed into a …Dilate‟ neighbourhood operation as shown in Figure 1.Figure 1 ‘Open’ complex operationThis operation is described as follows in our high level environment:Task parallel: A number of basic image operations can be put together in parallel.For example, the Sobel edge detection algorithm can be performed (approximately) by adding the absolute results of two separate convolutions. Assuming that the FPGA has enough computing resources available, the best solution is to implement the operations in parallel using separate regions of the FPGA chip.Figure 2 Sobel complex operationThe following is an example of the code, based on our high level instruction set, to define and use a Sobel edge detection instruction. The user defines two neighbourhood operators(horizontal and vertical Sobel), and builds the image instruction by summing the absolute results from the two neighbourhood operations.The generation phase will automatically insert the appropriate delays to synchronise the two parallel operations.3. ARCHITECTURES FROM OPERATIONSWhen a new Image_instruction object(e.g. Neighbourhood_instruction) is created (by new), the corresponding FPGA configuration will be generated dynamically. In this section, we will present the structure of the FPGA configurations necessary to implement the high level instruction set for the neighbourhood operations described above. As a key example, the structure of a general 2-D convolver will be presented. Other neighbourhood operations are essentially variations of this, with different local and global operators sub-blocks.A general 2D convolverAs mentioned earlier, any neighbourhood image operation involves passing a 2-D window over an image, and carrying out a calculation at each window position.To allow each pixel to be supplied only once to the FPGA, internal line delays are required. These synchronise the supply of input values to the processing elements, ensuringthat all the pixel values involved in a particular neighbourhood operation are processed at the same instant[15, 16]. Assuming a vertical scan of the image, Figure 3 shows the architecture of a generic 2-D convolver with a P by Q template. Each Processing Element (PE) performs the necessary Multiply/Accumulate operation.Figure 3 Architecture of a generic 2-D, P by Q convolution operation Architecture of a Processing ElementBefore deriving the architecture of a Processing Element, we first have to decide which type of arithmetic to be used- either bit parallel or bit serial processing.While parallel designs process all data bits simultaneously, bit serial ones process input data one bit at a time. The required hardware for a parallel implementation is typically …n‟ times the equivalent serial implementation (for an n-bit word). On the other hand, the bit serial approach requires …n… clock cycles to process an n-bit word while the equivalent parallel one needs only one clock cycle. However, bit serial architectures operates at a higher clock frequency due to their smaller combinatorial delays. Also, the resulting layout in a serial implementation is more regular than a parallel one, because of the reduced number of interconnections needed between PEs (i.e. less routing stress). This regularity feature means that FPGA architectures generated from a high level specification can have more predictable layout and performance. Moreover, a serial architecture is not tied to a particular processing word length. It is relatively straightforward to move from one word length to another withvery little extra hardware (if any). For these reasons, we decided to implement the IPC hardware architectures using serial arithmetic.Note, secondly, that the need to pipeline the bit serial Maximum and Minimum operations common in Image Algebra suggests we should process data Most Significant Bit first (MSBF). Following on from this choice, because of problems in doing addition MSBF in 2‟s complement, there are certain advantages in using an alternative number representation to 2‟s complement. For the p urposes of the work described in this paper, we have chosen to use a redundant number representation in the form of a radix-2 Signed Digit Number system (SDNR) [17]. Because of the inherent carry-free property of SDNR add/subtract operations, the corresponding architectures can be clocked at high speed. There are of course several alternative representations which could have been chosen, each with their own advantages. However, the work presented in this paper is based on the following design choices:∙Bit serial arithmetic∙Most Significant Bit First processing∙Radix-2 Signed Digit Number Representation (SDNR) rather than 2‟s complement.Because image data may have to be occasionally processed on the host processor, the basic storage format for image data i s still, however, 2‟s complement. Therefore, processing elements first convert their incoming image data to SDNR. This also reduces the chip area required for the line buffers (in which data is held in 2‟s complement). A final unit to convert a SDNR resu lt into 2‟s complement will be needed before any results can be returned to the host system. With these considerations, a more detailed design of a general Processing Element (in terms of a local and a global operation) is given in Figure 4.Figure 4 Architecture of a standard Processing ElementDesign of the Basic Building BlocksIn what follows, we will present the physical implementation of the five basic building blocks stated in section 2 (the adder, multiplier, accumulator and maximum/ minimum units). These basic components were carefully designed in order to fit together with as little wastage as possible.The ‘multiplier’ unitThe multiplier unit used is based on a hybrid serial-parallel multiplier outlined in [18]. It multiplies a serial SDNR input with a two‟s complement parallel coefficient B=b N b N-1…b1 as shown in Figure 5. The multiplier has a modular, scaleable design, and comprises four distinct basic building components [19]: Type A, Type B, Type C and Type D. An N bit coefficient multiplier is constructed by:Type A → Type B→ (N-3)*TypeC → Type DThe coefficient word length may be varied by varying the number of type C units. On the Xilinx 4000 FPGA, Type A, B and C units occupy one CLB, and a Type D unit occupies 2 CLBs. Thus an N bit coefficient multiplier is 1 CLB wide and N+1 CLBs high. The online delay of the multiplier is 3.In+In-Figure 5 Design of an N bit hybrid serial-parallel multiplierThe ‘accumulation’ g lobal operation unitThe accumulation unit is the global operation used in the case of a convolution. It adds two SDNR operands serially and outputs the result in SDNR format as shown in Figure 6. The accumulation unit is based on a serial online adder presented in [20]. It occupies 3 CLBs laid out vertically in order to fit with the multiplier unit in a convolver design.Figure 6Block diagram and floorplan of an accumulation unitThe ‘Addition’ local operation unitThis unit is used in additive/maximum and additive/minimum operations. It takes a single SDNR input value and adds it to the corresponding window template coefficient. The coefficient is stored in 2‟s complement format into a RAM addressed by a counter whose period is the pixel word length. To keep the design compact, we have implemented the counter using Linear Feedback Shift Registers (LFSRs). The coefficient bits are preloaded into the appropriate RAM cells according to the counter output sequence. The input SDNR operand is added to the coefficient in bit serial MSBF.+-+-Figure 7. Block diagram and floorplan of an …Addition‟ local operation unitOut-Out+The adder unit occupies 3 CLBs. The whole addition unit occupies 9 CLBs laid out in a 3x3 array. The online delay of this unit is 3 clock cycles.The Maximum/Minimum unitThe Maximum unit selects the maximum of two SDNR inputs presented to its input serially, most significant bit first. Figure 10 shows the transition diagram of the finite state machine performing the maximum …O‟ of two SDNRs …X‟ and ‟Y‟. The physical impl ementation of this machine occupies an area of 13 CLBs laid out in 3 CLBs wide by 5 high. Note that this will allow this unit to fit the addition local operation in an Additive/Maximumneighbourhood operation. The online delay of this unit is 3, compatible with the online delay of the accumulation global operation.*(O=X)*(O=Y)X +X --+Figure 8. State diagram and floorplan of a Maximum unitThe minimum of two SDNRs can be determined in a similar manner knowing that Min(X,Y)=- Max(-X,-Y).5. THE COMPLETE ENVIRONMENTThe complete system is given in Figure 11. For internal working purposes, we have developed our own intermediate high level hardware description notation called HIDE4k [21]. This is Prolog-based [22], and enables highly scaleable and parameterised component descriptions to be written.In the front end, the user programs in a high level software environment (typically C++) or can interact with a Dialog-based graphical interface, specifying the IP operation to be carried out on the FPGA in terms of Local and Global operators, window template coefficients etc. The user can also specify:The desired operating speed of the circuit.∙The input pixel bit-length.∙Whether he or she wants to use our floorplanner to place the circuit or leave this task to the FPGA vendor‟s Placement and Routing tools.The system provides the option of two output circuit description formats: EDIF netlist (the normal), and VHDL at RTL level.Behind the scenes, when the user gives all the parameters needed for the specific IP operation, the intermediate HIDE code is generated. Depending on the choice of the output netlist format, the HIDE code will go through either the EDIF generator tool to generate an EDIF netlist, or the VHDL generator tool to generate a VHDL netlist. In the latter case, the resulting VHDL netlist needs to be synthesised into an EDIF netlist by a VHDL synthesiser tool. Finally, the resulting EDIF netlist will go through the FPGA vendor‟s specific tools to generate the configuration bitstream file. The whole process is invisible to the user, thus making the FPGA completely hidden from the user‟s point of view. Note that the resulting configuration is stored in a library, so it will not be regenerated if exactly the same operation happens to be defined again.Complete and efficient configurations have been produced from our high level instruction set for all the Image Algebra operations and for a variety of complex operations including…Sobel‟, …Open‟ and …Close‟. They have been successfully simulat ed using the Xilinx Foundation Project Manager CAD tools.Figure 10 presents the resulting layout for a Sobel edge detection operation on XC4036EX-2 for 256x256 input image of 8-bits pixels. An EDIF configuration file, with all the placement information, has been generated automatically by our tools from the high level description in 2.1. Note that the generator optimises the design, and uses just a single shared line buffer area for the two (task parallel) neighbourhood operations. The resulting EDIF file is fed to Xilinx PAR tools to generate the FPGA configuration bitstream. The circuit occupies 475 CLBs. Timing simulation shows that the circuit can run at a speed of 75MHz which leads to a theoretical frame rate of 143 frames per second.Figure 10 Physical configuration of Sobel operation on XC4036EX-2 Figure 11 presents the resulting layout for an 'Open' operation on XC4036EX-2 for 256x256 input image of 8-bits pixels. As previously, EDIF configuration file with all the placement information has been generated, automatically by our tools from the correspondinghigh level description presented in section 2.1. The resulting EDIF file is then fed to Xilinx PAR tools to generate the FPGA configuration bitstream. The circuit occupies 962 CLBs. Timing simulation shows that the circuit can run at a speed of 75MHz which leads to a theoretical frame rate of 133 frames per second.Figure 11 Physical configuration of Open operation on XC4036EX-26. CONCLUSIONSIn this paper, we have presented the design of an FPGA-based Image Processing Coprocessor (IPC) along with its high level programming environment. The coprocessor instruction set is based on a core level containing the operations of Image Algebra. Architectures for user-defined compound operations can be added to the system. Possibly the most significant aspect of this work is that it opens the way to image processing application developers toexploit the high performance capability of a direct hardware solution, while programming in an application-oriented model. Figures presented for actual architectures show that real time video processing rates can be achieved when staring from a high level design.The work presented in this paper is based specifically on Radix-2 SDNR, bit serial MSBF processing. In other situations, alternative number representations may be more appropriate. Sets of alternative representations are being added to the environment, including a full bit parallel implementation of the IPC [23]. This will give the user a choice when trying to satisfy competing constraints.Although our basic approach is not tied to a particular FPGA, we have implemented our system on XC4000 FPGA series. However, the special facilities provided by the new Xilinx VIRTEX family (e.g. large on-chip synchronous memory, built in Delay Locked Loops etc.) make it a very suitable target architecture for this type of application. Upgrading our system to operate on this new series of FPGA chips is underway.REFERENCES[1] Webber, H C (ed.), …Image processing and transputers‟, IOS Press, 1992.[2] Rajan, K, Sangunni, K S and Ramakrishna, J, …Dual-DSP systems for signal and image-processing‟, Microprocessing & Microsystems, Vol 17, No 9, pp 556-560, 1993.[3] Akiyama, T, Aono, H, Aoki, K, et al,…MPEG2 video codec using Image compressionDSP‟, IEEE Transactions on Consumer Electronics, Vol 40, No 3, pp 466-472, 1994. [4] L.A. Christopher, W.T. Mayweather and S.S. Perlman, …VLSI median filter for impulsenoi se elimination in composite or component TV signals‟, IEEE Transactions on Consumer Electronics, Vol 34, no. 1, pp. 263-267, 1988.[5] J. Rose and A. Sangiovanni-Vincentelli, …Architecture of Field Programmable GateArrays‟, Proceedings of the IEEE Volume 81, No7, pp 1013-1029, 1993.[6] /products/virtex/ss_vir.htm[6] Arnold, J M, Buell, D A and Davis, E G, …Splash-2‟, Proceedings of the 4th AnnualACM Symposium on Parallel Algorithms and Architectures, ACM Press, pp 316-324, June 1992.[7] Gigaops Ltd., The G-800 System, 2374 Eunice St. Berkeley, CA 94708.[8] Chan, S C, Ngai, H O and Ho, K L, …A programmable image processing system usingFPGAs‟, International Journal of Electronics, Vol 75, No 4, pp 725-730, 1993.[9] /[10] /news/pubs/snug/snug99_papers/Jaffer_Final.pdf[11] FPL99.[12] Hutchings.[13] Xilinx 4000.[14] Ritter G X, Wilson J N and Davidson J L, …Image Algebra: an overview‟, ComputerVision, Graphics and Image Processing, No 49, pp 297-331, 1990.[15] Shoup, R G, …Parameterised Convolution Filtering in an FPGA‟, More FPGAs, WMoore and W Luk (editors), Abington, EE&CS Books, pp 274, 1994.[16] Kamp, W, Kunemund, H, Soldner and Hofer, H, …Programmable 2D linear filter forvideo applications‟, IEEE Journal of Solid State Circuits, pp 735-740, 1990.[17] Avizienis A, …Signed Digit Number Representation for Fast Parallel Arithmetic”, IRETransactions on Electronic Computer, Vol. 10, pp 389-400, 1961.[18] Moran, J, Rios, I and Mene ses, J, …Signed Digit Arithmetic on FPGAs‟, More FPGAs, WMoore and W Luk (editors), Abington, EE&CS Books, pp 250, 1994.[19] Donachy, P, …Design and implementation of a high level image processing machineusing reconfigurable hardware‟, PhD Thesis, Depar tment of Computer Science, The Queen‟s University of Belfast, 1996.[20] Duprat, J, Herreros, Y and Muller, J, …Some results about on-line computation offunction‟, 9th Symposium on Computer Arithmetic, Santa Monica, September 1989. [21]D Crookes, K Alota ibi, A Bouridane, P Donachy and A Benkrid, 1998, …An Environmentfor Generating FPGA Architectures for Image Algebra-based Algorithms‟, ICIP98, Vol.3, pp. 990-994.[22]Clocksin W F and Melish C S, 1994, …Programming in Prolog‟, Springer-Verlag.。

毕设文献翻译

TranslationThe Micro-architecture of FPGA BasedSoft ProcessorsPeter Yiannacouras, Jonathan Rose, and J. Gregory Steffan Department of Electrical and Computer EngineeringUniversity of Toronto 10 King’s College RoadToronto, CanadaABSTRACTAs more embedded systems are built using FPGA platforms, there is an increasing need to support processors in FPGA. One option is the soft processor, a programmable instruction processor implemented in the reconfigurable logic of the FPGA. Commercial soft processors have been widely deployed, and hence we are motivated to understand their micro-architecture. We must re-evaluate micro-architecture in the soft processor context because an FPGA platform is significantly different than an ASIC platformor, example, the relative speed of memory and logic is quite different in the two platforms, as is the area cost. In this paper we present an infrastructure for rapidly generating RTL models of soft processors, as well as a methodology for measuring their area, performance, and power. Using our automatically-generated soft processors we explore the micro-architecture trade-off space including: (1) hardware vs software multiplication support; (ii) shifter implementations; and (iii) pipeline depth, organization, and forwarding. For example, we find that a 3-stage pipeline has better wall-clock-time performance than deeper pipelines, despite lower clock frequency. We also compare our designs to Altera's NiosII commercial soft processor variations and find that our automatically generated designs span the design space while remaining very competitive.General TermsKeywords Soft processor, FPGA, exploration, micro-architecture, RTL generation, application specific tradeoff, nios, embedded processor, pipeline, ASIP。

FPGA论文相关的英文文献

FPGA Implementation of RS232 to Universalserial bus converter1 V.Vijaya, (PhD) M.Tech2 Rama Valupadasu (Ph D), M.Tech, 3.B.RamaRao Chunduri, PhD, M.TechAssoc. Professor. VCEW Asst.Professor, NIT, Warangal Professor, NIT, Warangalvsrtej@yahoo.co.in agnivesh91@yahoo.co.in cbrr@nitw.ac.in4. Ch.Kranthi Rekha, M.Tech,5. B.Sreedevi, M.Tech,Asst.Professor Assoc. Professor. VCEWLUC, Mantin, Malaysia vaagvijs_15@yahoo.co.inmadakranthirekha@yahoo.co.inAbstract— Universal Serial Bus (USB) is a new personal computer interconnection protocol, developed to make the connection of peripheral devices to a computer easier and more efficient. It reduces the cost for the end user, improves communication speed and supports simultaneous attachment of multiple devices (up to127)RS232, in another hand, was designed to single device connection, but is one of the most used communication protocols. An embedded converter from RS232 to USB is very interesting, since it would allow serial-based devices to experience USB advantages without major changes. This work describes the specification and development of such converter and it is also a useful guide for implementing other USB devices. The main blocks in the implementation are USB device, UART (RS232 protocol engine) and interface FIFO logic. The USB device block has to know how to detect and respond to events at a USB port and it has to provide a way for the device to store data to be sent and retrieve data that have been received UART consists of different blocks which handle the serial communication through RS232 protocol. There are a set of control registers to control the data transfer. The interface FIFO logic has FIFO to bridge the data rate differences between USB and RS232 protocols. Index Terms— First-In-First-Out, RS-232, Universal Asynchronous Receive Transmit, Universal Serial Bus.I.INTRODUCTIONThis paper describes the specification and implementation of a converter from RS232 to USB (Universal Serial Bus). Thisconverter is responsible for receiving data from a peripheraldevice’s serial interface and sending it to a computer’s USBinterface. In the same way, it must be able to send data from the PC’s USB interface to the device. The problems faced with the old standards stimulated the development of a newcommunication protocol, which should be easier to use,faster, and more efficient. RS232 is a definition for serial communication on a 1:1 base. RS232defines the interface layer, but not the application layer. To use RS232 in a specific situation, application specific software must be written on devices on both ends of the connecting RS232 cable. RS232 ports can be either accessed directly by an application, or via a device driver in the operating system. USB is a new personal computer interconnection standard developed by industry and telecommunication leaders, which implements the Plug and Play technology. It allows multiple devices connection (up to 127) ranges. The use of a the devices attachment to PCs. USB is a low cost, easing solution and supports transfer rates up to 12Mbs, comprehending the low-speed and mid-speed data converter from a serial interface to USB would free a serial communication port to other applications, allowing a device that uses a serial interface to communicate using an USB interface. USB on the other hand is a bus system which allows more than one peripheral to be connected to a host computer via one USB port. Hubs can be used in the USB chain to extend the cable length and allow for even more devices to connect to the same USB port. The standard not only describes the physical properties of the interface, but also the protocols to be used. Because of the complex USB protocol requirements, communication with USB ports on a computer is always performed via a device driver. This way, we are not limited to the availability of a serial port and we can experience the USB advantages. Using a converter allows us to have the device unchanged, making the converter responsible for treating the differences between the protocols. This work was based on protocol engine which can be managed by exchanging data with a PC across a serial interface. Most of the times, this communication is not done constantly, since it is necessary to have a serial port available just for it. This paper presents the converter implementation, focusing on the development process, which comprehends the device itself and the PC-side software that will communicate with it. This methodology can be extended to other devices. We first present some important USB standard concepts. Then, we define the system specification, divided on host and device requirements. After, we describe the hardware (UART) features and software design and implementation. Finally, we discuss about achieved results and future workII.PROBLEM DESCRIPTIONThe USB specification describes bus attributes, protocol definition, programming interface and other features required to design and build systems and peripherals compliant with the USB standard. We briefly explain features used in our project.2011 IEEE Symposium on Computers & InformaticsThe USB interface does not give this flexibility. When however an RS232 port is used via an USB to RS232 converter, this flexibility should be present in some way. Therefore to use an RS232 port via an USB port, a second device driver is necessary which emulates a RS232 UART, but communicates via USB. USB works as a Master/Slave bus, where the USB Host is the Master and the devices are the Slaves. The only system resources required by a USB system are the memory locations used by USB system software and the memory and/or I/O address space and IRQ line used by the USB host controller. USB devices can be functional (displays, mice, etc) or hubs, used to connect other devices in the bus. They can be implemented as low or high-speed devices. Low-speed devices are limited to a maximum 1.5 Mb/s rate. Each device has a number of individual registers - known as Endpoints which are indirectly accessed by the device drivers for data exchange. Each endpoint supports particular transfer characteristic has a unique address and direction. A special case is Endpoint 0, which is used for control operations and can do bi-directional transfers. It must be present in all devices. According to the device’s characteristics, other types of endpoints can be defined. USB Host verifies the attachment and detachment of new devices, initiating the enumeration process and managing all the following transactions. It is responsible to install device driver (based on information provided by device descriptors), to automatically reconfigure the system (hot attachment) and to collect statistics and status of each device. USB on the other hand is a bus system which allows more than one peripheral to be connected to a host computer via one USB port. Hubs can be used in the USB chain to extend the cable length and allow for even more devices to connect to the same USB port. The standard not only describes the physical properties of the interface, but also the protocols to be used. Because of the complex USB protocol requirements, communication with USB ports on a computer is always performed via a device driver. Device’s descriptors specify USB devices attributes and characteristics and describe device communication requirements (Endpoint Descriptors). The USB host uses this information to configure the device, to find its driver, and to access it. Devices with similar functions are grouped into classes [1, 2] in order to share common features and even use the same device drivers. Each class can define their own descriptors (class-specific descriptors), as for example, HID (Human Interface Device) Class Descriptors and Report Descriptors. The HID class consists of devices used by people to control computer systems. It defines a structure that describes a HID device, with specific communication requirements. According to the converter characteristics, it can be implemented as a HID device, using already developed HID drivers. A HID device’s descriptors must support an Interrupt IN endpoint and the firmware must also contain a report descriptor that defines the format for transmitted and received device data.A. RequestsThe USB protocol is based on requests sent by the host and processed by the USB devices. These requests can be directed to a device or a specific endpoint in it. Standard requests must be implemented by all devices and are used for configuring a device and controlling the state of its USB interface, among other features. Two HID-specific requests must be supported by the converter: Set Report and Get Report. These requests enable the device to receive and send generic device information to the host. Set Report request is the only way the host can send data to a HID device, once it does not have an Interrupt OUT endpointB. Communication FlowUSB is a shared bus and many devices might use it at the same time. The devices share the bandwidth using a protocol based on tokens and commanded by the host. USB communication is based on transferring data at regular intervals called frames. A frame is composed by one or more transactions that must be executed in a 1 ms time. USB data transfers are typically originated by a USB Device Driver when it needs to communicate with its device. It supplies a memory buffer used to store the data in transfers to or from the USB device. The USB Driver provides the interface between USB Device Driver and USB Host Controller, translating transfer requests into USB transactions, consistent with the bandwidth requirements and protocol structure. Some of these transfers consist of a large block of data, which need to be splitted into several transactions. The Host Controller generates the transaction based on the Transfer Descriptor, which describes the frame sharing among the several devices requests. When a transaction is sent to the bus, all devices see it. Each transaction begins with a packet that determines its type and the endpoint address. The USB driver controls this addressing scheme. Inside the device, the USB Device Layer comprehends the actual USB communication mechanism and transfer characteristics. USB Logical Device implements a collection of endpoints that comprise a given functional interface, which can be manipulated by its respective USB client.C. Transfer TypesThe USB specification defines four transfer types: Control, Interrupt, Isochronous and Bulk. Control transfers send requests and data relating to the device’s abilities and configuration. They can also be used to transfer blocks of information for any other purpose. Control transfers consist of a Setup stage, followed by a Data stage, which is composed of one or more Data transactions, and a Status stage. All data transactions in a Data Stage must be in the same direction (In or out). Interrupt transfers are typically used for devices that need to transfer data at regular period of time, and consequently must be polled periodically. The polling interval is defined in the Endpoint Descriptor. The data payloadfor this kind of transfer for low-speed devices is 8 bytes. Error correction is done in this kind of transfer. Two other transfer types are Isochronous and Bulk , which are used for devices thatneed a guaranteed transfer rate or for large blocks of data transfers. They are not used in this work.III. PROCEDURE/ALGORITHM A. System SpecificationTo develop a USB peripheral we need all the following: A host that supports USB. Driver software on the host to communicate with the peripheral. An application executing in the host that communicates with the peripheral device. A UART with a USB interface. Code implementation on the USB controller to carry Out the USB communication. Code implementation on the USB controller to carry out the peripheral functions. Hardware specific problem arises from handshaking to prevent buffer overflows at the receiver's side. RS232 applications can use two types of handshaking, either with control commands in the data stream, called software flow control, or with physical lines, called hardware flow control. Not all USB to RS232 converters provide these hardware flow control lines. It is not always easily identified if an application needs them. Some applications do not use hardware flow control at all, and those cheap USB to RS232 converters will work without problems. Other applications use hardware flow control, but infrequently. Only with large data bursts, or in situations where the CPU is busy performing other tasks, hardware flow control might kick in to prevent data loss. In those situations, communications may seem error free, but with sometimes bytes lost, or unspecified errors in the communications. In a UART& FIFO used to store sent and received data in the USB communication process. Two endpoints were defined for the converter, where the first one is Endpoint 0, used for control operations and the second one is an Interrupt IN Endpoint, defined for sending data to the host. This way, a converter from a serial interface to USB can be implemented as a HID device with the features mentioned above.Fig 2.RS232 to USB ConverterB. HOST REQUIREMENTSThe choice of the Operating System used by the host wasdone in 1999, based on the USB support it provides. Itshould provide the entire drivers infrastructure and supportthe protocol characteristics, as for example, Plug and Play. The host must be able to receive USB data using its device drivers and make them available to the applications that have done the request. It is essential that we have a driver in the host to process USB transfers, recognizing the device, receiving and sending data to a USB device.A. Device requirementsSome communication requirements, such as transmission speed, frequency and amount of data to be transferred, were essential in communication the process of defining the UART be used. Considering the speeds available for USB devices, it was clear that the converter could be implemented as a low speed device, where the communication speed varies from 10 to 100Kb/s. Considering the amount of data transferred and the transmission frequency, the converter was defined to use Interrupt transfers, a transfer type where considerable amounts of data must be transferred in pre defined amounts of time. The host is responsible for verifying if the device needs to transmit data from time to time. Interrupt transfers can be done in both directions, but needs to transmit data from time to time. Interrupt transfers can be done in both directions, but not at the same time. For the converter, they could be used to send and receive data from the PC. The Operating System provides HID drivers that allow us to use this transfer type. The maximum packet size for one transaction is 8 bytes for low speed devices. If we are sending larger amounts of data, they need to be splitted into many transactions, once USB is a shared bus. Another feature defined for the converter was the number of endpoints needed. As explained before, endpoints are buffersFig 1.RS232 to USB Interface DiagramIII HARDWARE DESCRIPTIONIt is a low-cost solution for low-speed applications with high I/O requirements. RS232 ports which are physically mounted in a computer are often powered by three power sources: +5 Volts for the UART logic, and -12 Volts and +12 Volts for the outputdrivers. USB however only provides a +5 Volt power source.Some USB to RS232converters use integrated DC /DC converters to create the appropriate voltage levels for the RS232 signals, implementations, the +5 Volt voltages is directly used to drive the output The UART has serial interface to the RS232 driver. The operation of UART is controlled by an external host processor. There is an 8-bit data interface to host along with read and write control signals. Clock is fed from external crystal. Thefamily is USB specification [1] compliant and supports one address and three data endpoints [5]. The choice of a UART with three endpoint was done in order to allow us to have, beyond the Interrupt IN, an Interrupt OUT endpoint for receiving data from the host (OUT). Its definition requires we have an odd endpoint number besides Endpoint 0. This configurationcould not be implemented at the time the project was being developed once the Operating System did not offer support for Interrupt OUT endpoints, which were defined in a later version of the specification. The instruction set has been optimized specifically for USB operations, USB controller provides one USB device address with three endpoints. The USB device address is assigned to the device and saved in the USB Device Address Register (7 bits) during the USB enumeration process. The USB controller communicates with the host using dedicated FIFO, one per endpoint. Each endpoint FIFO is implemented as 8 bytes of dedicated SRAM and the status and control of each of them can be done using its Mode Register and Count Register.IV. SOFTWARE DESIGN AND IMPLEMENTATIONThe development of the converter was divided in phases: Descriptors definition. Device detection and enumeration module (request treatment), Serial data exchange module, USB/serial modules interface be overlapped. USB data exchange module (request treatment). The phases definition does not imply that they cannot be overlapped.A .Descriptors definitionThe main structure to be data implemented consists of device descriptors, as defined by the USB specification [1] These descriptors store information about the device and the USB communication process, used by the host to identify the device and its characteristics. The Device Descriptor is the first descriptor the host reads on device attachment. It includes the basic information the host needs in order to retrieve further characteristics from the device. Its fields' values were defined according to the converter characteristics [7]. To implement a new device, some of these values must be re evaluated and changed if necessary. The converter was defined to use just one interface and two endpoints (Control and Interrupt IN). Interrupt OUT endpoints were defined just in a later version of HID specification. To solve this problem, data packets are sent to the UPS across Endpoint 0, using the SET REPORT request, and received through Endpoint 1, using Interrupt transfers. The data reception is done through Output Reports, which were defined as 16 8-bit fields, according to the largest command sent to the UPS. Sending data to the host is done through Input Reports, which were defined as 8, 8- bit fields. Report Descriptors define the size and uses for the data that implements the device’s functionality.B. Device Detection and EnumerationThe second phase consists of the implementation of the code that enables the host to detect and enumerate the device. The implementation of these routines was based on some example codes [8, 9, 10]. Inside the we must have the code to access the descriptors, to recognize and to respond to the request codes that the host sends when it enumerates the device.C. The process of sending and receiving dataThe process of sending data to the UPS is done through ControlTransfers using SET REPORT on Endpoint 0. The host sends a request to the USB device, indicating it wants to send data. Aninterrupt informs the device when new data have arrived onEndpoint 0 and the corresponding Interrupt Service Routinecopies it into a data buffer, which is used in the serial communication process.. The maximum packet size that isreceived from the host was defined according to the largestcommand that must be sent to the function must be changed toallow receiving an arbitrary number of bytes. These routines are called after the Host or the controller sends a packet to the bus.Endpoint 0 ISR receives. Using hardware flow control impliesthat more lines must be present between the sender and thereceiver, leading to a thicker and more expensive cable. Therefore, software flow control is a good alternative if it is notneeded to gain maximum performance in communications.Software flow control makes use of the data channel between thetwo devices which reduces the bandwidth. The reduce of bandwidth is in most cases however not so astonishing that it is areason to not use it. First, the computer sets its RTS line to signalthe device that some information is present. The device checks ifthere is room to receive the information and if so, it sets the CTS line to start the transfer. When using a null modem connection,this is somewhat different. There are two ways to handle thistype of handshaking in that situation. One is, where the RTS ofeach side is connected with the CTS side of the other. In that way, the communication protocol differs somewhat from theoriginal one. The RTS output of computer A signals computer B that A is capable of receiving information, rather than a requestfor sending information as in the original configuration. Thistype of communication can be performed with a null modemcable for full handshaking. Although using this cable is not completely compatible with the original way hardware flow control was designed, if software is properly designed for it it can achieve the highest possible speed because no overhead ispresent for requesting on the RTS line and answering on the CTSline. In the second situation of null modem communication withhardware flow control, the software side looks quite similar to the original use of the handshaking lines. The CTS and RTS linesof one device are connected directly to each other. This means,that the request to send query answers itself. As soon as the RTSoutput is set, the CTS input will detect a high logical value indicating that sending of information is allowed. This impliesthat information will always be sent as soon as sending isrequested by a device if no further checking is present. Toprevent this from happening, two other pins on the connector are used, the data set ready DSR and the data terminal ready DTR. These two lines indicate if the device attached is working properly and willing to accept data. When these lines are cross-connected (as in most null modem cables) flow control can be performed using these lines. A DTR output is set, if that computer accepts incoming characters.V.R ESULT A NALYSIS:Fig.4.Shows the Waveforms of RS232USBconverterFIG. 5. RTLSCHEMATICSFIG. 6.The Routed designVI. CONCLUSIONSAn embedded converter from RS232 to USB is designed in this project. VHDL will be used for implementing all these blocks. ModelSim Simulator tool will be used for functional simulation of the design. Reduces the cost for the end user, improves communication speed and supports simultaneous attachment of multiple devices (up to 127). USB protocol operates at 480 Mbps FPGA implementation of the design is done on Spartan 3E FPGA (XC3S500E). The design used 6% of the FPGA area and a maximum frequency of 130MHz is obtained.ACKNOWLEDGMENTWe are grateful to management Vaagdevi college of Engineering, Warangal, NIT Warangal, Linton University College, and Mantin for the facilities to provide to complete the project in time.REFERENCES1.Ana Luiza de Almeida Pereira Zuquim, Claudionor JosCNunes Coelho Jr, Antanio Ot6vio Fernández, Marcos PCgode Oliveira, AndrCa Iabrudi Tavares, “An EmbeddedConverter from RS232 to Universal Serial Bus”, IEEE2.Jan axelson, “USB Complete, Everything you need todevelop custom USB peripherals”, Penram Intl.Publishing(India), 19993.Universal Serial Bus Specification Revision 2.04.5.Charles H.Roth, Jr, “Digital Systems Design using VHDL”,PWS publishing company, 1996.6.ZainalabediNavabi,“VHDL Analysis and Modelling ofDigital Systems”, McGraw – Hill, Second Edition.7.8.9.Douglas L. Perry ,”VHDL”, Second Edition, McGraw-Hill,Inc, 199310..au/catalog/targus-usb-to-parallel-adapter-p-1160.html11. USB Complete: The Developer's Guide, 4th Edition12. USB Mass Storage: Designing and ProgrammingDevices and Embedded Hosts14. FPGAPrototyping by VHDL Examples: Xilinx Spartan-3Version. Pong P.ChuBibliographical notesV.Vijaya obtained her B.Tech Degree in Electronics & Communication Engg., from (JNTU) Jawaharlal Nehru Technological University College of Engg., Ananthapur, and M.Tech. Degree in Instrumentation and Control Systems, from JNTUK college of Engg Kakinada and Pursuing PhD from JNTUH, Hyderabad. V.Vijaya worked at APEL Radio Communication Systems, Hyderabad and presently, she is working as Associate Professor in the ECE Dept of Vaagdevi College of Engineering at Warangal. She has 10 years of Teaching Experience and 2 years of Industrial Experience. Attended 15 workshops/refresher courses/short term courses at various places. Member of Project Review Committee (UG/PG); CRC for (UG/PG).She is the project coordinator for UG/PG. Her area of interest are Image processing, Signal processing, VLSI, Mobile Communications, Wireless Communications. She is life member of ISTE, IETE.She is the member of IEEE. She has published no. of papers in national conferences and international conferences.V.Rama obtained her B.Tech in Electronics &Communication Engg., from JNTU, Kakinada, and M.Tech. from NIT, Warangal. Pursuing PhD from NIT, Warangal She is working as Asst Professor in the ECE Dept., at NIT, Warangal. Staff adviser of ECE Dept., Incharge for basic Electronics Lab. She involved inextracurricular activites at institute. She has 12 years of Teaching Experience. She organized no. of UGC workshops in NITW. Her area of research is Bio Medical Signal Processing. Her areas of interest are Image processing, Signal processing, Tele medicine. She is the member of IEEE. She has published no. of papers in national and international conferences.CH.Kranthi Rekha had received her B.E in Electronics and Communication Engineering from Madurai Kamaraj University in 2000 and Completed M.Tech from JNTUH, Hyderabad. Presently she is working as Lecturer in Linton university college, Mantin, Malaysia, She has more than 10 years of teaching experience. She is the Author of two Books (Digital communications and Digital Image processing). Organized student level technical symposium technocraft-’09. Attended 10 workshops/refresher courses/short term courses at various places. As a resource person to talk on Image processing. Member of Project Review Committee (UG/PG); CRC for (UG/PG). Her area of interest are Neural networks, Image processing, Signal processing, VLSI, Communications. She is life member of ISTE, IETE.She has published no. of papers in national conferences and international conferences.B.Sreedevi obtained her AMIE Degree in Electronics & Communication Engg., from Institution of Engineers, Calcutta, and M.Tech. Degree in Digital System Computer Electronics, from JNTUA college of Engg Ananthapur. She is working as Associate Professor in the ECE Dept of Vaagdevi College of Engineering at Warangal. She has 10 years of Teaching Experience. Attended 12 workshops/refresher courses/short term courses at various places. Member of Project Review Committee (UG/PG); CRC for (UG/PG). Her area of interest are Image processing, Signal processing, VLSI, Communications. She is life member of ISTE, IETE. She has published no. of papers in national conferences and international conferences.C.B.RamaRao obtained his B.Tech in Electronics & Communication Engg., from JNTU Kakinada, and M.Tech. from JNTU Kakinada, Ph.D from IIT, kharagpur. He is working as Professor in the ECE Dept., at NIT, Warangal. At present he is the Head of ECE Dept.,. He involved in various activities at institute. He acted as associate dean of academic affairs at NITW. He has 28 years of Teaching Experience. He organized no. of workshops at NITW. His area of research is in advanced digital signal processing. His areas of interest are Bio Medical Signal Processing, Image processing, Signal processing. He is the member of IEEE. He has published no. of papers in national and international conferences.。

FPGA外文资料91

2.

Estimating energy using very detailed post-route power and delay models, we determine the energy savings obtained by our poweraware technology mapping, clustering, placement, and routing algorithms and investigate how the savings behave when the algorithms are applied concurrently. The individual savings of the power-aware technology-mapping, clustering, placement, and routing algorithms were 7.6%, 12.6%, 3.0%, and 2.6% respectively. The majority of the overall savings were achieved during the technology mapping and clustering stages of the poweraware FPGA CAD flow. In addition, the savings were mostly cumulative when the individual power-aware CAD algorithms were applied concurrently with an overall energy reduction of 22.6%.
Permission to make digiel or hard copies of a l l or pan of this work for personal or classroom use is granted without fee provided that copies are not mvdc or dislribulcd for profit or commercial advantage and that copies bcar this notice and the full citation on the firs1 page. To copy otherwise, 10 republish, 10 post on scrvers or lo redistribute to lists, requires prior specific permission andlor a fee. lCCAD’O3. November I 1 -I3, 2003, San Jose, California. U S A . Copyright 2003 ACM I - 5 x 1 13-762-11031001I ... $5.00.

一篇关于FPGA的英文文献及翻译精品文档19页

使用LabVIEW FPGA模块开发可编程自动化控制器Building Programmable Automation Controllers with LabVIEW FPGA OverviewProgrammable Automation Controllers (PACs) are gaining acceptance within the industrial control market as the ideal solution for applications that require highly integrated analog and digital I/O, floating-point processing, and seamless connectivity to multiple processing nodes. National Instruments offers a variety of PAC solutions powered by one common software development environment, NI LabVIEW. With LabVIEW, you can build custom I/O interfaces for industrial applications using add-on software, such as the NI LabVIEW FPGA Module.With the LabVIEW FPGA Module and reconfigurable I/O (RIO) hardware, National Instruments delivers an intuitive, accessible solution for incorporating the flexibility and customizability of FPGA technology into industrial PAC systems. You can define the logic embedded in FPGA chips across the family of RIO hardware targets without knowing low-level hardware description languages (HDLs) or board-level hardware design details, as well as quickly define hardware for ultrahigh-speed control, customized timing and synchronization, low-level signal processing, and custom I/O with analog, digital, and counters within a single device. Youalso can integrate your custom NI RIO hardware with image acquisition and analysis, motion control, and industrial protocols, such as CAN and RS232, to rapidly prototype and implement a complete PAC system.Table of Contents1.Introduction2.NI RIO Hardware for PACs3.Building PACs with LabVIEW and the LabVIEW FPGA Module4.FPGA Development Flowing NI SoftMotion to Create Custom Motion Controllers6.Applications7.ConclusionIntroductionYou can use graphical programming in LabVIEW and the LabVIEW FPGA Module to configure the FPGA (field-programmable gate array) on NI RIO devices. RIO technology, the merging of LabVIEW graphical programming with FPGAs on NI RIO hardware, provides a flexible platform for creating sophisticated measurement and control systems that you could previously create only with custom-designed hardware.An FPGA is a chip that consists of many unconfigured logic gates. Unlike the fixed, vendor-defined functionality of an ASIC(application-specific integrated circuit) chip, you can configure and reconfigure the logic on FPGAs for your specific application. FPGAs are used in applications where either the cost of developing and fabricating an ASIC is prohibitive, or the hardware must be reconfigured after being placed into service. The flexible, software-programmable architecture of FPGAs offer benefits such as high-performance execution of custom algorithms, precise timing and synchronization, rapid decision making, and simultaneous execution of parallel tasks. Today, FPGAs appear in such devices as instruments, consumer electronics, automobiles, aircraft, copy machines, and application-specific computer hardware. While FPGAs are often used in industrial control products, FPGA functionality has not previously been made accessible to industrial control engineers. Defining FPGAs has historically required expertise using HDL programming or complex design tools used more by hardware design engineers than by control engineers.With the LabVIEW FPGA Module and NI RIO hardware, you now can use LabVIEW, a high-level graphical development environment designed specifically for measurement and control applications, to create PACs that have the customization, flexibility, and high-performance of FPGAs. Because the LabVIEW FPGA Module configures custom circuitry in hardware, your system can process and generate synchronized analog and digitalsignals rapidly and deterministically. Figure 1 illustrates many of the NI RIO devices that you can configure using the LabVIEW FPGA Module.Figure 1. LabVIEW FPGA VI Block Diagram and RIO Hardware Platforms NI RIO Hardware for PACsHistorically, programming FPGAs has been limited to engineers who have in-depth knowledge of VHDL or other low-level design tools, which require overcoming a very steep learning curve. With the LabVIEW FPGA Module, NI has opened FPGA technology to a broader set of engineers who can now define FPGA logic using LabVIEW graphical development. Measurement and control engineers can focus primarily on their test and control application, where their expertise lies, rather than the low-level semantics of transferring logic into the cells of the chip. The LabVIEW FPGA Module model works because of the tight integration between the LabVIEW FPGA Module and the commercial off-the-shelf (COTS) hardware architecture of the FPGA and surrounding I/O components.National Instruments PACs provide modular, off-the-shelf platforms for your industrial control applications. With the implementation of RIO technology on PCI, PXI, and Compact Vision System platforms and the introduction of RIO-based CompactRIO, engineers now have the benefits of a COTS platform with the high-performance, flexibility, and customizationbenefits of FPGAs at their disposal to build PACs. National Instruments PCI and PXI R Series plug-in devices provide analog and digital data acquisition and control for high-performance, user-configurable timing and synchronization, as well as onboard decision making on a single device. Using these off-the-shelf devices, you can extend your NI PXI or PCI industrial control system to include high-speed discrete and analog control, custom sensor interfaces, and precise timing and control.NI CompactRIO, a platform centered on RIO technology, provides a small, industrially rugged, modular PAC platform that gives you high-performance I/O and unprecedented flexibility in system timing. You can use NI CompactRIO to build an embedded system for applications such as in-vehicle data acquisition, mobile NVH testing, and embedded machine control systems. The rugged NI CompactRIO system is industrially rated and certified, and it is designed for greater than 50 g of shock at a temperature range of -40 to 70 °C.NI Compact Vision System is a rugged machine vision package that withstands the harsh environments common in robotics, automated test, and industrial inspection systems. NI CVS-145x devices offer unprecedented I/O capabilities and network connectivity for distributed machine vision applications.NI CVS-145x systems use IEEE 1394 (FireWire) technology, compatible with more than 40 cameras with a wide range of functionality,performance, and price. NI CVS-1455 and NI CVS-1456 devices contain configurable FPGAs so you can implement custom counters, timing, or motor control in your machine vision application.Building PACs with LabVIEW and the LabVIEW FPGA ModuleWith LabVIEW and the LabVIEW FPGA Module, you add significant flexibility and customization to your industrial control hardware. Because many PACs are already programmed using LabVIEW, programming FPGAs with LabVIEW is easy because it uses the same LabVIEW development environment. When you target the FPGA on an NI RIO device, LabVIEW displays only the functions that can be implemented in the FPGA, further easing the use of LabVIEW to program FPGAs. The LabVIEW FPGA Module Functions palette includes typical LabVIEW structures and functions, such as While Loops, For Loops, Case Structures, and Sequence Structures as well as a dedicated set of LabVIEW FPGA-specific functions for math, signal generation and analysis, linear and nonlinear control, comparison logic, array and cluster manipulation, occurrences, analog and digital I/O, and timing. You can use a combination of these functions to define logic and embed intelligence onto your NI RIO device.Figure 2 shows an FPGA application that implements a PID control algorithm on the NI RIO hardware and a host application on a Windows machine or an RT target that communicates with the NI RIO hardware. Thisapplication reads from analog input 0 (AI0), performs the PID calculation, and outputs the resulting data on analog output 0 (AO0). While the FPGA clock runs at 40 MHz the loop in this example runs much slower because each component takes longer than one-clock cycle to execute. Analog control loops can run on an FPGA at a rate of about 200 kHz. You can specify the clock rate at compile time. This example shows only one PID loop; however, creating additional functionality on the NI RIO device is merely a matter of adding another While Loop. Unlike traditional PC processors, FPGAs are parallel processors. Adding additional loops to your application does not affect the performance of your PID loop.Figure 2. PID Control Using an Embedded LabVIEW FPGA VI withCorresponding LabVIEW Host VI.FPGA Development FlowAfter you create the LabVIEW FPGA VI, you compile the code to run on the NI RIO hardware. Depending on the complexity of your code and the specifications of your development system, compile time for an FPGA VI can range from minutes to several hours. To maximize development productivity, with the R Series RIO devices you can use a bit-accurate emulation mode so you can verify the logic of your design before initiating the compile process. When you target the FPGA Device Emulator, LabVIEW accesses I/O from the device and executes the VI logic on the Windowsdevelopment computer. In this mode, you can use the same debugging tools available in LabVIEW for Windows, such as execution highlighting, probes, and breakpoints.Once the LabVIEW FPGA code is compiled, you create a LabVIEW host VI to integrate your NI RIO hardware into the rest of your PAC system. Figure 3 illustrates the development process for creating an FPGA application. The host VI uses controls and indicators on the FPGA VI front panel to transfer data between the FPGA on the RIO device and the host processing engine. These front panel objects are represented as data registers within the FPGA. The host computer can be either a PC or PXI controller running Windows or a PC, PXI controller, Compact Vision System, or CompactRIO controller running a real-time operating system (RTOS). In the above example, we exchange the set point, PID gains, loop rate, AI0, and AO0 data with the LabVIEW host VI.Figure 3. LabVIEW FPGA Development FlowThe NI RIO device driver includes a set of functions to develop a communication interface to the FPGA. The first step in building a host VI is to open a reference to the FPGA VI and RIO device. The Open FPGA VI Reference function, as seen in Figure 2, also downloads and runs the compiled FPGA code during execution. After opening the reference, you read and write to the control and indicator registers on the FPGA using theRead/Write Control function. Once you wire the FPGA reference into this function, you can simply select which controls and indicators you want to read and write to. You can enclose the FPGA Read/Write function within a While Loop to continuously read and write to the FPGA. Finally, the last function within the LabVIEW host VI in Figure 2 is the Close FPGA VI Reference function. The Close FPGA VI Reference function stops the FPGA VI and closes the reference to the device. Now you can download other compiled FPGA VIs to the device to change or modify its functionality.The LabVIEW host VI can also be used to perform floating-point calculations, data logging, networking, and any calculations that do not fit within the FPGA fabric. For added determinism and reliability, you can run your host application on an RTOS with the LabVIEW Real-Time Module. LabVIEW Real-Time systems provide deterministic processing engines for functions performed synchronously or asynchronously to the FPGA. For example, floating-point arithmetic, including FFTs, PID calculations, and custom control algorithms, are often performed in the LabVIEW Real-Time environment. Relevant data can be stored on a LabVIEW Real-Time system or transferred to a Windows host computer for off-line analysis, data logging, or user interface displays. The architecture for this configuration is shown in Figure 4. Each NI PAC platform that offers RIO hardware can run LabVIEW Real-Time VIs.Figure 4. Complete PAC Architecture Using LabVIEW FPGA, LabVIEW Real-Timeand Host PCWithin each R Series and CompactRIO device, there is flash memory available to store a compiled LabVIEW FPGA VI and run the application immediately upon power up of the device. In this configuration, as long as the FPGA has power, it runs the FPGA VI, even if the host computer crashes or is powered down. This is ideal for programming safety power down and power up sequences when unexpected events occur.Using NI SoftMotion to Create Custom Motion ControllersThe NI SoftMotion Development Module for LabVIEW provides VIs and functions to help you build custom motion controllers as part of NI PAC hardware platforms that can include NI RIO devices, DAQ devices, and Compact FieldPoint. NI SoftMotion provides all of the functions that typically reside on a motion controller DSP. With it, you can handle path planning, trajectory generation, and position and velocity loop control in the NI LabVIEW environment and then deploy the code on LabVIEW Real-Time or LabVIEW FPGA-based target hardware.NI SoftMotion includes functions for trajectory generator and spline engine and examples with complete source code for supervisory control, position, and velocity control loop using the PID algorithm. Supervisory control and the trajectory generator run on a LabVIEW Real-Time target and run at millisecond loop rates. The spline engine and the control loop can run either on a LabVIEW Real-Time target at millisecond loop rates or on a LabVIEW FPGA target at microsecond loop rates.ApplicationsBecause the LabVIEW FPGA Module can configure low-level hardware design of FPGAs and use the FPGAs within in a modular system, it is ideal for industrial control applications requiring custom hardware. These custom applications can include a custom mix of analog, digital, and counter/timer I/O, analog control up to 125 kHz, digital control up to 20 MHz, and interfacing to custom digital protocols for the following:•Batch control•Discrete control•Motion control•In-vehicle data acquisition•Machine condition monitoring•Rapid control prototyping (RCP)•Industrial control and acquisition•Distributed data acquisition and control•Mobile/portable noise, vibration, and harshness (NVH) analysisConclusionThe LabVIEW FPGA Module brings the flexibility, performance, and customization of FPGAs to PAC platforms. Using NI RIO devices and LabVIEW graphical programming, you can build flexible and custom hardware using the COTS hardware often required in industrial control applications. Because you are using LabVIEW, a programming language already used in many industrial control applications, to define your NI RIO hardware, there is no need to learn VHDL or other low-level hardware design tools to create custom hardware. Using the LabVIEW FPGA Module and NI RIO hardware as part of your NI PAC adds significant flexibility and functionality for applications requiring ultrahigh-speed control, interfaces to custom digital protocols, or a custom I/O mix of analog, digital, and counters.使用LabVIEW FPGA（现场可编程门阵列）模块开发可编程自动化控制器综述工业控制上的应用要求高度集成的模拟和数字输入输出、浮点运算和多重处理节点的无缝连接。

FPGA外文资料125

Testing Conﬁgurable LUT-Based FPGA’s Wei Kang Huang,Fred J.Meyer,Member,IEEE,Xiao-Tao Chen,and Fabrizio Lombardi,Member,IEEEAbstract—We present a new technique for testingﬁeldprogrammable gate arrays(FPGA’s)based on look-up tables(LUT’s).We consider a generalized structure for the basic FPGAlogic element(cell);it includes devices such as LUT’s,sequentialelements(ﬂip-ﬂops),multiplexers and control circuitry.We usea hybrid fault model for these devices.The model is based on aphysical as well as a behavioral characterization.This permitsdetection of all single faults(either stuck-at or functional)andsome multiple faults using repeated FPGA reprogramming.Weshow that different arrangements of disjoint one-dimensional(1-D)cell arrays with cascaded horizontal connections andcommon vertical input lines provide a good logic testing regimen.The testing time is independent of the number of cells in thearray(C-testability).We deﬁne new conditions for C-testabilityof programmable/reconﬁgurable arrays.These conditions donot suffer from limited I/O pins.Cell conﬁguration affects thecontrollability/observability of the iterative array.We applythe approach to various Xilinx FPGA families and compareit to prior work.Index Terms—C-testability,ﬁeld programmable gate array,programmability,reconﬁgurability,testing.I.N OTATION AND D EFINITIONSHorizontal The internal inputs(outputs)of an iterative array.These propagate dependency between the CLB’sin the array.Vertical The external inputs of an iterative array.Thesecan be directly speciﬁed in the test patterns andrequire I/O blocks.Phase Each testing phase is a reprogramming of theFPGA followed by test vector application.Sincereprogramming is slow,the number of phases isa good measure of testing time.Session The application of every CLB test conﬁguration tothose CLB’s that are under test.Multiple sessionsare required if not all CLB’s can be under testsimultaneously.C-testable An FPGA is C-testable with a given testingmethod if the number of programmings isindependent of the circuit size.In particular,foran iterative array,it is independent of the lengthof the array.and segments with programmable devices.Customization is accomplished by conﬁguring the interconnect and the CLB’s by loading them with appropriate data from an external storage device.The FPGA also includes input/output blocks(IOB’s), which provide the interface between the package pins and the internal logic.The numbers of CLB’s and IOB’s vary widely depending on the particular chip and manufacturer[2].FPGA’s are versatile and in widespread use,due to their programmable nature and their ease of reconﬁguration[3].Internal static conﬁguration (memory)elements determine the logic functions and the interconnections.The SRAM in memory-based FPGA’s can be used to conﬁgure functions via look-up tables(LUT’s). Also,they often have a mode where the conﬁguration SRAM is usable as memory.We focus on CLB testing for SRAM-based FPGA’s that implement functions via LUT’s.III.B ACKGROUNDTesting FPGA’s is addressed in the literature such as[4]–[7]. These works and this paper deal with manufacturing test.Other tests in theﬁeld,such as verifying correctly loaded conﬁgu-ration data,are typically handled by architectural features for reprogrammable FPGA’s[2].Reference[4]discusses testing of row-based(segmented channel)FPGA’s.The approach sequentially tests every cell using a modiﬁed scan procedure, providing100%fault coverage of single stuck-at faults.It requires many tests and does not fully exploit the regularity of the FPGA to reduce test time.The methodology in[8]for testing uncommitted segmented channel FPGA’s for single stuck-at faults is based on connecting the cells of each row as a one-dimensional(1-D)unilateral array,such that the FPGA could be tested as a set of disjoint arrays.This yields considerable reduction in both vectors and test circuitry. Simultaneous testing of disjoint arrays helps achieve constant test set size(C-testability),so that test cost will be independent of chip size[9].In[7],the FPGA is conﬁgured to conduct direct output comparisons of pairs of logic cells using full cell control-lability.Test generation and output response comparison are handled internally using some of the logic resources in a built-in self-test(BIST)arrangement.This requires at least one extra “session,”i.e.,a doubling of chip programmings so that the cells previously used for test pattern generation and for output comparisons can become cells under test.Fault simulation established that100%fault coverage can be accomplished for single stuck-at faults.In[10],the logic resources are arranged as an iterative logic array(ILA)[9].This allows better scalability than the previous BIST arrangement[7]; however,it also requires another additional session—i.e.,a tripling of chip programmings.A simple testing arrangement(referred to as“naive”)was mentioned in[5].It connects together all input lines to the CLB’s(cells)under test from the IOB’s,and uses the remaining IOB’s for direct observability of the output lines of each cell under test.Fig.1shows a single programming phase with the three leftmost CLB’s under test.EachCLB Fig.1.One programming phase with the naive testingmethod.Fig.2.Interior of a Xilinx XC5200CLB.in theﬁgure has three inputs and two outputs.Three IOB’s are consumed in order to provide the cells under test with their input vectors.The cells under test have no connections between them.Their output response is directly observed at the IOB’s.In each programming phase,only a few CLB’s can be tested in parallel.This is basically restricted by the number of IOB’s and the number of output lines of each CLB.In Fig.1, only three CLB’s can be under test,because,after three IOB’s are used for CLB inputs,only seven IOB’s remain to observe output response.IV.P ROPOSED F AULT M ODELFig.2shows a portion of a Xilinx XC5200CLB.A full CLB consists of four stacked copies of theﬁgure(with the carry in (CI)and carry out(CO)signals rippled through)—plus a little extra logic.The portion in Fig.2has a single LUT with four inputs,so it has24conﬁguration bits to specify its function. Of the three multiplexers,M1is a conventional multiplexer. M2and M3are programmable multiplexers;each needs only a single conﬁguration bit to specify which input it passes.There is also a Dﬂip-ﬂop.Our generalized internal CLB structure permits these de-vices:LUT’s,programmable(conﬁgurable)multiplexers,con-ventional multiplexers,andﬂip-ﬂops.Some conventional logic usually does not interfere with test generation.We assume the interconnect and the IOB’s have already been tested;the interested reader should refer to[11]and[12] for a detailed treatment.In our proposed testing strategy,wedivide CLB’s into independent sets(linear arrays).For our fault model,within each linear array,we assume at most one faulty CLB;otherwise,fault masking might occur.For the single faulty CLB,the nature of the fault could take any form. For simplicity of illustration,in our investigation,we limited a CLB to a single faulty device.The nature of a device fault varies with the device.We model single device faults both physically(e.g.,stuck-at)and functionally[13].This hybrid fault model is adaptable to emerging FPGA technology and to different products as they become commercially available[2].The fault model was shown suitable to FPGA’s in[5]and conﬁrmed by industrial experiments.In particular,by device as follows.•For a LUT,a fault can occur in any one of the:memory matrix,decoders,and input and output lines.A faulty memory matrix makes some memory cells incapable of storing the correct logic value(the LUT has a single bit output,so this value is either0or1)[2].Any number of memory cells in the LUT could be faulty.If the fault is in the decoder,the erroneous address leads to reading the contents of the wrong cell,i.e.,a one-bit address fault.The third possible LUT fault is on the I/O lines,with respect to which we allow any single stuck-at fault.The one-bit decoder address fault can be collapsed to the stuck-at fault of a LUT input line.So this fault type is detected when the decoders are tested.A stuck-at fault on a LUT output line is covered by the tests for the LUT memory matrix.•For a multiplexer,we use a functional fault model, because the internal logic structure varies from FPGA to FPGA[2].Testing conﬁrms the multiplexer’s ability to select each input.•For the Dﬂip-ﬂops,we use a functional fault model.A fault can cause aﬂip-ﬂop to receive no data,to be unable to be triggered by the correct clock edge,or to be unable to be set or reset.Our testing objectives are as follows:•a100%fault coverage under a single faulty device model with neither delay nor area overhead;•ease of test pattern generation,because patterns are generated for a CLB,not the whole FPGA;•efﬁcient implementation of the testing process as mea-sured by the amount of memory required to store the test instructions(conﬁguration bits and test patterns);•the number of programming phases must be as small as possible,because reprogramming time is much greater than test pattern application time[3].V.T ESTING A CLBWe generate test patterns in two steps according to the CLB partitioning.Consider initially a CLB made of a single LUT.We can test the LUT memory matrix by reading all the memory bits in two phases.The programmed memory matrix contents in the second phase are complements of theﬁrst.The scenario is different for testing stuck-at faults at the LUT inputs.The LUT matrix contents must be arranged such that the boolean difference is one for the input to be tested;multiple patterns are required.Furthermore,each LUT output must be observable at a primary CLB output.So we require a sensitized path from the LUT output to a primary CLB output.By deﬁnition of the combinational partition,this can be achieved by conﬁguring the multiplexers(or other devices) in the partition.We use a functional test for the multiplexers.Since a multiplexer selects from among all inputs,each data input must be active in at least one phase.Further,the functional test consists of applying logic1to the selected input while holding all others at logic0,and a second test pattern with these logic values reversed.The multiplexer output must be observable from at least one primary CLB output.If a multiplexer is not simultaneously controllable/observable,additional phases could be required.We need atleast phases to test a multiplexerwith inputs.A multiplexer can be tested with a LUT(if connected);a possible way to accomplish this consists of choosing a memory matrix for the LUT that satisﬁes the multiplexer(s)testability conditions.In this way,test phases can be overlapped(reduced).A.Testing the Sequential PartitionThe sequential partition includes the Dﬂip-ﬂops as well as multiplexers and control circuits emanating from them or only observable through them.During test generation,we seek to overlap testing of programmable multiplexers with that of ﬂip-ﬂops.In some FPGA’s,ﬂip-ﬂops are more complicated than the D type.In particular,the Xilinx XC4000family[2]has Dﬂip-ﬂops plus added logic that can be programmed to add set/reset capability.The S/R controllers are conﬁgurable to allow a set function,a reset function,or neither.For the XC4000,this requires three separate programming phases;however,testing of the S/R controllers can be overlapped with testing the ﬂip-ﬂops.1)Testing the D Flip-Flops:We functionally test the D ﬂip-ﬂops.We test the input and hold function with the data sequence010(or101)at D.Separate phases are required to test both rising and falling edge trigger mechanisms,if applicable.We can test the set(reset)function by applying the set(reset)signal after aﬂip-ﬂop is in the“0”(“1”)state. The set/reset disable functions must also be tested if present, leading to another phase.To test the clock enable function,we use theﬁve-vector sequence given in Fig.3.Some functional tests can be overlapped to reduce the number of phases.We can possibly also overlap phases with those for multiplexers, depending on the sequential partition’s structure.VI.P ROPOSED T EST S TRATEGYFig.4shows a linear iterative array.There is a cascaded (horizontal)input reﬂecting the dependence of the cells in the iterative array,and testvector,etc.The period of thearray,Fig.3.Testing the enablefunction.Fig.4.Iterative array with period three.number of cells we must traverse in order to repeat the cascaded input.We do not allow all test patterns;the test generation process must ensure the periodicity is satisﬁed as it searches the input space.In Fig.4it must ensurethat—couldall be different,but we will constrain them to be identical—so,if wehaveand(and schedule the corresponding vertical CLBinputs)and we will have every CLB experience testvectorvertical inputs.Then,weneedIOB’s forcontrollability/observabilitywhen•For testing the CLB sequential partition,we program a 1-D (sequential)array and use the pipeline technique of [8].To reduce the number of required IOB’s,as many as possible vertical inputs are made common (i.e.,they will have identical logic values when test vectors are applied).This is beneﬁcial because,in a sequential array,the requirements of controllability and observability are far more stringent than for a combinational array of the same size [8],[9].Let the number of vertical inputs with different logic values in the test process for thepipeline techniquebe.So,IOB’s are used as primary vertical inputs for thevertical inputs of the CLB’s with different logic valuesduring the test processandIOB’s are used as primary common vertical inputs for those vertical inputs that do not need to be distinguished.The total required IOB’s isthenwherewheretestsandhence,the totalFig.5.Block diagram of an XC5200CLB.number of test patterns for the CLB sequential partitionsisthe total number of phasesisandAgain,the number of requiredphases for testing the whole FPGA is the same as that fortesting a single CLB,so The conﬁguredarrays are sequential in12of the required19phases.For thecombinational arrays,the number of vectors required in eachphase is the same as for a single ing the sequentialarrays in[8],we need additional cycles equal to the numberof CLB’s in an iterative array.Therefore,A.Example:The XC5200FPGA FamilyAn XC5200CLB has four independent four-input LUT’s,14multiplexers,and four Dﬂip-ﬂops(Fig.5).This is a stack offour independent logic cells(LC’s).Each LC in Fig.2containsone four-input LUT,three multiplexers,and one Dﬂip-ﬂopfor a total ofﬁve independent inputs().Also,it hasa),while the sequential partition hasﬁveinputspatterns tests LC0/2.At the same time,we notice that wehavetestedtostuck faults.We need two phases,because weneed to program each memory cell for both0and1.There are16memory cells in each LUT and we need toaccess each in both phases for a total of32test patterns;however,six of these have already been performed(twoeach in phases2–4).•Phase7:In this sequential test phase wetesttostuckfaults.•Phase8:In this sequential test phase wetestfollowsthe LUT output,we propagate errors by horizontallyconnectingpatterns to testthe whole FPGA,becausedenotes the numberof test“sessions.”In each session,different CLB’s are underparing the formulaefor isequivalent in naturetoisgood for the Xilinx families studied,especially the XC3000.Table IIIgivesin Table IIIwould be increased if the BIST-ILA conﬁguration of[10]were used.For the array-based method,all CLB’s are undertest simultaneously,so the equivalenttoandof anapproach—i.e.,the maximum number of faults such that testinvalidation(fault masking)cannot occur.Wegiveis under worst case conditions.The upper bound can be achieved if the faulty CLB locationsare favorable.We assume that test invalidation always occursif the CLB locations permitit.TABLE IVCLB A RCHITECTURAL C OMPARISON•Naive Approach.Every CLB is tested independently,hencewas barelyachievable.)•Array-Based Approach.We assume test invalidation oc-curs in a1-D array if there is more than one faultyCLB.Since we conﬁgure the arrays along the rows ofthe FPGA,IX.A RCHITECTURES AND C OMPARATIVE A NALYSISWe compare the Xilinx FPGA’s in the series3000,4000,and5200with respect to their CLB structures and IOBlimitations.We compare each CLB architecture by consider-ing programmability and controllability/observability and therequired programming phases.Further details are in[14].parison of CLB Devices and FeaturesTable IV gives the numbers of Dﬂip-ﬂops,LUT’s,andprogrammable and conventional multiplexers—together withthe test patterns and programming phases needed by the array-based method.It also gives the conﬁguration memory sizedue to theﬂip-ﬂops and programmable multiplexers.Thedifference between the XC3000and XC4000families is notlarge.The XC4000has an extra8-bit LUT connected in serieswith the other two LUT’s.This partially affects controllabilityand observability;however,some of the required tests forthe additional LUT can still be combined with the tests forthe other two LUT’s.We consider the R/S control elementsequivalent to two MUX’s each,so the XC4000CLB has moreprogrammable MUX’s.Also,six MUX’s have four inputs.Since the number of programming phases to test a MUXdepends linearly on the number of MUX input lines[5],theseneed more phases(and test vectors)for the XC4000comparedto the XC3000.Two main XC5200features contribute to its suitability tothe array-based testing approach.First,it can be treated asfour parallel simple logic cells(LC’s)with little hassle.Likethe XC4000LUT’s,internal signals are not independent(dueto MUX”and“”isneutral.So“IOB’s cf.CLB outputs,and better CLB observability compared to other families.For a test method,“。

外文文献及翻译-fpga实现实时适应图像阈值-其他专业

FPGA实现实时适应图像阈值Elham Ashari电气与计算机工程系,滑铁卢大学理查德霍恩西计算机科学和工程系,纽约大学摘要：本文提出了一种基于实时阈值的通用FPGA结构。

硬件架构是基于一种加权聚类算法的架构,这种算法的重点就在于聚类的前景和背景像素的阈值问题。

该方法采用聚类的二值加权神经网络法找到两个像素组的质心。

图像的阈值是两个质心的平均值。

因为对于每个输入的像素，选定的最近的权值是用来更新的，因而推荐一种自适应的阈值技术。

更新是基于输入像素的灰度级和相关权值的差额的，通过学习快慢因素来衡量其速率。

硬件系统是在FPGA平台上实现的，它包含两个功能模块。

第一个模块获得图像框架阈值，另一个模块将阈值应用于图像的框架。

两个模块的并行性和简单的硬件组成部分使其适用于实时应用程序，并且，其性能可与经常用于离线阈值技术相媲美。

通过利用FPGA对无数的例子进行模拟和实验，得到该算法的结果。

这项工作的基本应用是确定激光的质心，但接下来将会讨论它在其他方面的应用。

关键词:实时阈值,自适应阈值,FPGA实现、神经网络1 简介图像二值化是图像处理的一个主要问题。

如果要从一张图像上提取有用的信息，我们需要将它分成不同的部分（例如背景色和前景色）来进行更为详细的分析。

一般来说，前景色的像素的灰度级与背景色的灰度级是不同的。

现在已有一些较好的使图像二值化地算法，就性能而不是就速度而言，这些算法的主要目标在于高效率，然而对于一些应用，尤其对是在那些定制的硬件和实时应用程序来说，速度则是最关键的要求。

可实现的快速而简单的阈值技术在实际成像系统中得到广泛应用。

例如，结合了CMOS图像传感器的片上图像处理技术普遍存在于各种各样的成像系统当中。

在这样一个系统当中，图像的实时处理及其得到的相关信息是至关重要的。

实时阈值技术的应用领域包括机器人、汽车、目标追踪以及激光测距。

在激光测距，即确定目标的运动范围的过程中，所捕获的图像为二值图像。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Building Programmable Automation Controllers with LabVIEWFPGAOverviewProgrammable Automation Controllers (PACs) are gaining acceptance within the industrial control market as the ideal solution for applications that require highly integrated analog and digital I/O, floating-point processing, and seamless connectivity to multiple processing nodes. National Instruments offers a variety of PAC solutions powered by one common software development environment, NI LabVIEW. With LabVIEW, you can build custom I/O interfaces for industrial applications using add-on software, such as the NI LabVIEW FPGA Module.With the LabVIEW FPGA Module and reconfigurable I/O (RIO) hardware, National Instruments delivers an intuitive, accessible solution for incorporating the flexibility and customizability of FPGA technology into industrial PAC systems. You can define the logic embedded in FPGA chips across the family of RIO hardware targets without knowing low-level hardware description languages (HDLs) or board-level hardware design details, as well as quickly define hardware for ultrahigh-speed control, customized timing and synchronization, low-level signal processing, and custom I/O with analog, digital, and counters within a single device. You also can integrate your custom NI RIO hardware with image acquisition and analysis, motion control, and industrial protocols, such as CAN and RS232, to rapidly prototype and implement a complete PAC system.Table of Contents1.Introduction2.NI RIO Hardware for PACs3.Building PACs with LabVIEW and the LabVIEW FPGA Module4.FPGA Development Flowing NI SoftMotion to Create Custom Motion Controllers6.Applications7.ConclusionIntroductionYou can use graphical programming in LabVIEW and the LabVIEW FPGA Module to configure the FPGA (field-programmable gate array) on NI RIO devices. RIO technology, the merging of LabVIEW graphical programming with FPGAs on NI RIO hardware, provides a flexible platform for creating sophisticated measurement and control systems that you could previously create only with custom-designed hardware.An FPGA is a chip that consists of many unconfigured logic gates. Unlike the fixed, vendor-defined functionality of an ASIC (application-specific integrated circuit) chip, you can configure and reconfigure the logic on FPGAs for your specific application. FPGAs are used in applications where either the cost of developing and fabricating an ASIC is prohibitive, or the hardware must be reconfigured after being placed into service. The flexible,software-programmable architecture of FPGAs offer benefits such as high-performance execution of custom algorithms, precise timing and synchronization, rapid decision making, and simultaneous execution of parallel tasks. Today, FPGAs appear in such devices as instruments, consumer electronics, automobiles, aircraft, copy machines, andapplication-specific computer hardware. While FPGAs are often used in industrial control products, FPGA functionality has not previously been made accessible to industrial control engineers. Defining FPGAs has historically required expertise using HDL programming or complex design tools used more by hardware design engineers than by control engineers.With the LabVIEW FPGA Module and NI RIO hardware, you now can use LabVIEW, a high-level graphical development environment designed specifically for measurement and control applications, to create PACs that have the customization, flexibility, andhigh-performance of FPGAs. Because the LabVIEW FPGA Module configures custom circuitry in hardware, your system can process and generate synchronized analog and digital signals rapidly and deterministically. Figure 1 illustrates many of the NI RIO devices that you can configure using the LabVIEW FPGA Module.Figure 1. LabVIEW FPGA VI Block Diagram and RIO Hardware PlatformsNI RIO Hardware for PACsHistorically, programming FPGAs has been limited to engineers who have in-depth knowledge of VHDL or other low-level design tools, which require overcoming a very steep learning curve. With the LabVIEW FPGA Module, NI has opened FPGA technology to a broader set of engineers who can now define FPGA logic using LabVIEW graphical development. Measurement and control engineers can focus primarily on their test and control application, where their expertise lies, rather than the low-level semantics of transferring logic into the cells of the chip. The LabVIEW FPGA Module model works because of the tightintegration between the LabVIEW FPGA Module and the commercial off-the-shelf (COTS) hardware architecture of the FPGA and surrounding I/O components.National Instruments PACs provide modular, off-the-shelf platforms for your industrial control applications. With the implementation of RIO technology on PCI, PXI, and Compact Vision System platforms and the introduction of RIO-based CompactRIO, engineers now have the benefits of a COTS platform with the high-performance, flexibility, and customization benefits of FPGAs at their disposal to build PACs. National Instruments PCI and PXI R Series plug-in devices provide analog and digital data acquisition and control for high-performance, user-configurable timing and synchronization, as well as onboard decision making on a single device. Using these off-the-shelf devices, you can extend your NI PXI or PCI industrial control system to include high-speed discrete and analog control, custom sensor interfaces, and precise timing and control.NI CompactRIO, a platform centered on RIO technology, provides a small, industrially rugged, modular PAC platform that gives you high-performance I/O and unprecedented flexibility in system timing. You can use NI CompactRIO to build an embedded system for applications such as in-vehicle data acquisition, mobile NVH testing, and embedded machine control systems. The rugged NI CompactRIO system is industrially rated and certified, and it is designed for greater than 50 g of shock at a temperature range of -40 to 70 °C.NI Compact Vision System is a rugged machine vision package that withstands the harsh environments common in robotics, automated test, and industrial inspection systems. NICVS-145x devices offer unprecedented I/O capabilities and network connectivity for distributed machine vision applications.NI CVS-145x systems use IEEE 1394 (FireWire) technology, compatible with more than 40 cameras with a wide range of functionality, performance, and price. NI CVS-1455 and NI CVS-1456 devices contain configurable FPGAs so you can implement custom counters, timing, or motor control in your machine vision application.Building PACs with LabVIEW and the LabVIEW FPGA Module With LabVIEW and the LabVIEW FPGA Module, you add significant flexibility and customization to your industrial control hardware. Because many PACs are already programmed using LabVIEW, programming FPGAs with LabVIEW is easy because it uses the same LabVIEW development environment. When you target the FPGA on an NI RIO device, LabVIEW displays only the functions that can be implemented in the FPGA, further easing the use of LabVIEW to program FPGAs. The LabVIEW FPGA Module Functions palette includes typical LabVIEW structures and functions, such as While Loops, For Loops, Case Structures, and Sequence Structures as well as a dedicated set of LabVIEWFPGA-specific functions for math, signal generation and analysis, linear and nonlinear control, comparison logic, array and cluster manipulation, occurrences, analog and digital I/O, and timing. You can use a combination of these functions to define logic and embed intelligence onto your NI RIO device.Figure 2 shows an FPGA application that implements a PID control algorithm on the NI RIO hardware and a host application on a Windows machine or an RT target that communicates with the NI RIO hardware. This application reads from analog input 0 (AI0), performs the PID calculation, and outputs the resulting data on analog output 0 (AO0). While the FPGA clock runs at 40 MHz the loop in this example runs much slower because each component takes longer than one-clock cycle to execute. Analog control loops can run on an FPGA at a rate of about 200 kHz. You can specify the clock rate at compile time. This example shows only one PID loop; however, creating additional functionality on the NI RIO device is merely a matter of adding another While Loop. Unlike traditional PC processors, FPGAs are parallel processors. Adding additional loops to your application does not affect the performance of your PID loop.Figure 2. PID Control Using an Embedded LabVIEW FPGA VI with Corresponding LabVIEW HostVI.FPGA Development FlowAfter you create the LabVIEW FPGA VI, you compile the code to run on the NI RIO hardware. Depending on the complexity of your code and the specifications of your development system, compile time for an FPGA VI can range from minutes to several hours.To maximize development productivity, with the R Series RIO devices you can use abit-accurate emulation mode so you can verify the logic of your design before initiating the compile process. When you target the FPGA Device Emulator, LabVIEW accesses I/O from the device and executes the VI logic on the Windows development computer. In this mode, you can use the same debugging tools available in LabVIEW for Windows, such as execution highlighting, probes, and breakpoints.Once the LabVIEW FPGA code is compiled, you create a LabVIEW host VI to integrate your NI RIO hardware into the rest of your PAC system. Figure 3 illustrates the development process for creating an FPGA application. The host VI uses controls and indicators on the FPGA VI front panel to transfer data between the FPGA on the RIO device and the host processing engine. These front panel objects are represented as data registers within the FPGA. The host computer can be either a PC or PXI controller running Windows or a PC, PXI controller, Compact Vision System, or CompactRIO controller running a real-time operating system (RTOS). In the above example, we exchange the set point, PID gains, loop rate, AI0, and AO0 data with the LabVIEW host VI.Figure 3. LabVIEW FPGA Development FlowThe NI RIO device driver includes a set of functions to develop a communication interface to the FPGA. The first step in building a host VI is to open a reference to the FPGA VI and RIO device. The Open FPGA VI Reference function, as seen in Figure 2, also downloads and runs the compiled FPGA code during execution. After opening the reference, you read and write to the control and indicator registers on the FPGA using the Read/Write Control function. Once you wire the FPGA reference into this function, you can simply select which controls and indicators you want to read and write to. You can enclose the FPGA Read/Write function within a While Loop to continuously read and write to the FPGA. Finally, the last function within the LabVIEW host VI in Figure 2 is the Close FPGA VI Reference function. The Close FPGA VI Reference function stops the FPGA VI and closes the reference to the device. Now you can download other compiled FPGA VIs to the device to change or modify its functionality.The LabVIEW host VI can also be used to perform floating-point calculations, data logging, networking, and any calculations that do not fit within the FPGA fabric. For added determinism and reliability, you can run your host application on an RTOS with the LabVIEW Real-Time Module. LabVIEW Real-Time systems provide deterministicprocessing engines for functions performed synchronously or asynchronously to the FPGA. For example, floating-point arithmetic, including FFTs, PID calculations, and custom control algorithms, are often performed in the LabVIEW Real-Time environment. Relevant data can be stored on a LabVIEW Real-Time system or transferred to a Windows host computer for off-line analysis, data logging, or user interface displays. The architecture for this configuration is shown in Figure 4. Each NI PAC platform that offers RIO hardware can run LabVIEW Real-Time VIs.Figure 4. Complete PAC Architecture Using LabVIEW FPGA, LabVIEW Real-Time and Host PC Within each R Series and CompactRIO device, there is flash memory available to store a compiled LabVIEW FPGA VI and run the application immediately upon power up of the device. In this configuration, as long as the FPGA has power, it runs the FPGA VI, even if the host computer crashes or is powered down. This is ideal for programming safety power down and power up sequences when unexpected events occur.Using NI SoftMotion to Create Custom Motion ControllersThe NI SoftMotion Development Module for LabVIEW provides VIs and functions to help you build custom motion controllers as part of NI PAC hardware platforms that can include NI RIO devices, DAQ devices, and Compact FieldPoint. NI SoftMotion provides all of the functions that typically reside on a motion controller DSP. With it, you can handle path planning, trajectory generation, and position and velocity loop control in the NI LabVIEW environment and then deploy the code on LabVIEW Real-Time or LabVIEW FPGA-based target hardware.NI SoftMotion includes functions for trajectory generator and spline engine and examples with complete source code for supervisory control, position, and velocity control loop using the PID algorithm. Supervisory control and the trajectory generator run on a LabVIEW Real-Time target and run at millisecond loop rates. The spline engine and the control loop can run either on a LabVIEW Real-Time target at millisecond loop rates or on a LabVIEW FPGA target at microsecond loop rates.ApplicationsBecause the LabVIEW FPGA Module can configure low-level hardware design of FPGAs and use the FPGAs within in a modular system, it is ideal for industrial controlapplications requiring custom hardware. These custom applications can include a custom mix of analog, digital, and counter/timer I/O, analog control up to 125 kHz, digital control up to 20 MHz, and interfacing to custom digital protocols for the following:•Batch control•Discrete control•Motion control•In-vehicle data acquisition•Machine condition monitoring•Rapid control prototyping (RCP)•Industrial control and acquisition•Distributed data acquisition and control•Mobile/portable noise, vibration, and harshness (NVH) analysis ConclusionThe LabVIEW FPGA Module brings the flexibility, performance, and customization of FPGAs to PAC platforms. Using NI RIO devices and LabVIEW graphical programming, you can build flexible and custom hardware using the COTS hardware often required in industrial control applications. Because you are using LabVIEW, a programming language already used in many industrial control applications, to define your NI RIO hardware, there is no need to learn VHDL or other low-level hardware design tools to create custom hardware. Using the LabVIEW FPGA Module and NI RIO hardware as part of your NI PAC adds significant flexibility and functionality for applications requiring ultrahigh-speed control, interfaces to custom digital protocols, or a custom I/O mix of analog, digital, and counters.使用LabVIEW FPGA（现场可编程门阵列）模块开发可编程自动化控制器综述工业控制上的应用要求高度集成的模拟和数字输入输出、浮点运算和多重处理节点的无缝连接。