

Java Card Development Kit Simulator Release Notes


Java Card Development Kit Simulator Release Notes
Version 3.1.0u5
F12192-06
March 2021

Table of Contents
• Introduction
• What's New
• System Requirements
• Installation
• Known Issues
• Documentation
• Product Information

Introduction

Java Card technology enables secure elements, such as smart cards and other tamper-resistant security chips, to host applications, called applets, which employ Java technology. Java Card technology offers a secure and interoperable execution platform that can store and update multiple applications on a single resource-constrained device, while retaining the highest certification levels and compatibility with standards. Java Card developers can build, test, and deploy applications and services rapidly and securely. This accelerated process reduces development costs, increases product differentiation, and enhances value to customers.

The Java Card Development Kit is a suite of tools for designing implementations of Java Card technology and developing applets based on the Java Card Specifications. It is available as two independent downloads:

• The Java Card Development Kit Tools are used to convert and verify Java Card applications.
• The Java Card Development Kit Simulator offers a testing and debugging reference for Java Card applications. It includes a Java Card simulation environment and an Eclipse plug-in.

Together, these two downloads provide a complete, stand-alone development environment in which applications written for the Java Card platform can be developed and tested.

These release notes describe the Java Card Development Kit Simulator version 3.1.0u5, which is based on version 3.1 of the Java Card Platform Specifications.
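As background for the APDU-based tooling mentioned in these notes: applets on a Java Card are driven by ISO 7816-4 command APDUs, which come in four standard cases, case 4 being the form that carries both command data and an expected response length. The following Python sketch is purely illustrative (it is not part of the development kit) and classifies a short-form command APDU:

```python
def apdu_case(apdu: bytes) -> int:
    """Classify a short-form ISO 7816-4 command APDU into case 1-4."""
    if len(apdu) < 4:
        raise ValueError("APDU needs at least CLA INS P1 P2")
    body = apdu[4:]
    if len(body) == 0:
        return 1                 # case 1: header only
    if len(body) == 1:
        return 2                 # case 2: header + Le
    lc = body[0]
    if len(body) == 1 + lc:
        return 3                 # case 3: header + Lc + command data
    if len(body) == 2 + lc:
        return 4                 # case 4: header + Lc + data + Le
    raise ValueError("malformed short-form APDU")
```

For example, a SELECT header alone (4 bytes) is case 1, while the same header followed by Lc, data, and a trailing Le byte is case 4.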
What's New

The following are the new features, changes, and bug fixes in the Java Card Development Kit Simulator, version 3.1.0u5:
• Fixes to the setKey() and getKey() methods of HMACKey and GenericSecretKey, as per the clarifications provided in the specification.

The following are the new features, changes, and bug fixes in version 3.1.0u4:
• Added the usage of the target platform when CAP file verification is done in Eclipse plug-in projects
• Added the usage of the target platform when CAP file verification is done for building the samples using the Ant tool
• Fixes in the HMAC implementation and in the handling of APDU case 4 in the APDU tool

The following are the new features, changes, and bug fixes in version 3.1.0u3:
• Fixed an installer issue when loading multiple applet and library CAP files with static resources
• Fixed a minor issue in the ByteBuffer.slice method

The following are the new features and changes in version 3.1.0u1:
• The Java Card plug-in for Eclipse now supports all features of the latest Java Card Development Kit Tools, with usability improvements for their integration. New features include:
  – Support for extended CAP files
  – Support for static resources in CAP files
  – Support for the target parameter of the converter for the following versions of the Java Card Platform Specifications: 3.0.4, 3.0.5, and 3.1.0
  – New and improved management of CAP file configurations and the build process
  – Persistence of CAP file configurations in files that can be used directly by the converter tool on the command line
  – Support for Java Card Development Kit Tools path configuration as an independent setting. This ensures that the plug-in can always be configured to access the latest tools bundle.
  – Support for Java Card framework API debugging (source bundles only)

The following are the new features and changes in version 3.1:
• The Java Card Development Kit Simulator now supports version 3.1 of the Java Card Platform Specifications. New features include:
  – Improved API extensibility using a virtual method mapping table
  – Support for array views
  – Support for large CAP files that may comprise multiple packages
  – Support for static resources in CAP files
  – Certificate API
  – Monotonic Counter API
  – System Time API
  – Extended I/O Framework
  For a complete list of supported features, see the Java Card Development Kit User Guide, Version 3.1.
• The Java Card Development Kit Tools and the Java Card Development Kit Simulator are now two independent bundles, which can be downloaded separately, enabling easier updates of the Development Kit Tools.
• New samples are available to demonstrate the use of:
  – Array views
  – Certificate handling

The Java Card Development Kit Simulator, version 3.1 also supports new cryptographic algorithms. For a list of the supported cryptographic algorithms, see the Supported Cryptography Classes section in the Java Card Development Kit User Guide, Version 3.1.

System Requirements

This product is targeted for use on a PC running the Microsoft Windows 7, Windows 8, or Windows 10 operating system.

The following software must be installed for the Java Card Development Kit Simulator to work:

• Java Development Kit (JDK): This release has been verified and tested with Oracle JDK 11 (64-bit) and OpenJDK 11 (64-bit).
  Download the JDK software from /technetwork/java/javase/downloads and install it according to the instructions on the website.

• Eclipse IDE: The Eclipse IDE is optional and is required only for using the Eclipse plug-in. Download the Windows Eclipse IDE (Eclipse Neon, Oxygen, or Photon) from the following URL, and install it according to the instructions on the website: https:///

Installation

The Java Card Specifications, Development Kit Simulator, and Development Kit Tools must be downloaded and installed individually.

• See the Downloading the Specification Documents topic of the Java Card Platform Specification Release Notes, Version 3.1 for details on how to download the Java Card Specification bundle.
• See the Installation topic of the Java Card Development Kit User Guide for details on how to install the Java Card Development Kit Simulator and the Java Card Development Kit Tools.
• The Java Card Development Kit Simulator installer is provided on the Oracle Technology Network download website (https:///technetwork/java/embedded/javacard/overview/index.html). Install the Development Kit Simulator by downloading and running the Java Card Development Kit Simulator .msi installer.
Contents of the Development Kit Simulator

This release of the Java Card Development Kit Simulator contains the Java Card simulation environment and the Eclipse plug-in. The following list describes the files and directories that are installed in the root installation directory (JC_HOME_SIMULATOR):

• bin: Contains all shell scripts or batch files for running the tools (such as the apdutool, capdump, converter, and so on), and the cref (Java Card Platform Simulator) binary executable.
• docs: Contains subdirectories, each with compilations of the Javadoc tool files for the APDU I/O API, the Java Card 3.1 API, and the Java Card Client RMI API.
• eclipse-plugin: The repository for the Java Card plug-in for Eclipse.
• legal: Contains license files.
• lib: Contains all Java programming language JAR files required for running the tools by using the shell scripts or batch files provided in the bin directory.
• samples: Contains sample applets and applications.

Known Issues

After installing the Java Card Eclipse plug-in, when you start Eclipse for the first time, the Java Card Run/Debug configurations in the Run/Debug Settings of the project properties might not appear. To fix this issue, reopen the Run/Debug Settings window.

Documentation

The Java Card Documentation web site provides online product documentation for the Java Card Platform.

• Java Card Development Kit User Guide: This document describes how to use the Java Card Development Kit to develop applications for the Java Card Platform.
  It is available in HTML and PDF formats.
• Java Card Platform Specifications: The following specification documents are available for the Java Card Platform:
  – Java Card Platform Runtime Environment Specification, Classic Edition, Version 3.1 (PDF format)
  – Java Card Platform Virtual Machine Specification, Classic Edition, Version 3.1 (PDF format)
  – Java Card Platform Application Programming Interface, Classic Edition, Version 3.1 (HTML format)
  – Java Card Platform Specification Release Notes, Version 3.1 (HTML and PDF formats)

Product Information

The Java Card Technology website provides useful information about the Java Card product. Visit the Java Card Technology website to access the most up-to-date information on the following:
• Product news and reviews
• Release notes and product documentation

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at /pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit /pls/topic/lookup?ctx=acc&id=info or visit /pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Java Card Development Kit Simulator Release Notes, Version 3.1.0u5
F12192-06
Copyright © 1998, 2021, Oracle and/or its affiliates. All rights reserved.
Simulator release notes for Java Card Development Kit, Version 3.1.0u5.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means.
Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms governing the U.S. Government’s use of Oracle cloud services are defined by the applicable contract for such services. No other rights are granted to the U.S. Government.This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. 
If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.

Rockwell FactoryTalk Batch Control System Manual


FactoryTalk® Batch
Efficient, Consistent and Predictable Batch Control for More Comprehensive Operations

Overview

FactoryTalk Batch allows you to apply one control and information system across your process to improve capacity and product quality, save energy and raw materials, and reduce process variations and human intervention. It enables you to develop modern batch control strategies by supporting flexible production capabilities and standardized company procedures while accelerating product and process development. The net result: your equipment is better utilized, product quality is improved, the need for visibility and access to actionable data is better met, and your costs are reduced.

With FactoryTalk Batch, you can:
• Create and manage recipes and execute them automatically
• Reduce the hours needed for validating and commissioning
• Configure physical and procedural models
• Collect electronic batch data to generate detailed reports for compliance or process improvement
• Simulate your entire batch process

FactoryTalk Batch provides:

A System that is Scalable
FactoryTalk Batch combines industry standards with proven technology, providing the flexibility needed for everything from enterprise-wide architectures to single-unit applications.

Intuitive Interfaces for Operations
Modern interfaces and workflows allow operators to more easily navigate the system, while the option for mobile accessibility enables access to critical information from anywhere in the plant.

Faster and More Reliable Control
FactoryTalk Batch promotes responsive interactions between server and controller-based batch architectures.
This allows for reliable and responsive step changes closer to the process, ensuring that tight control parameters can be achieved.

Secured Information
Flexible yet secured access to the system allows adoption of a batch system in applications that require manual additions, material tracking and version control of recipes.

Problem Solved

Historically, plants have used manual SOPs, custom programming, or expensive, specialized control systems for batch applications. Other control applications, such as material handling, continuous processes, palletizing and utilities, were implemented with different control systems and strategies. To meet the need for higher-level information, interfaces were maintained between these disparate systems. While this enabled data from across the enterprise to be compiled, it required timely and costly integration into the overall IT structure.

Meeting Diverse Design, Operations and Production Needs

FactoryTalk Batch supports the diverse design and production requirements of system integrators, skid vendors and end users, and provides the technology that puts them in control. Using ANSI/ISA-88-based functionality, recipes and processes can be developed independently of process equipment. You can easily change recipe parameters, add new batches, or define which equipment your batches use, without requiring engineering or automation system changes or revalidation. FactoryTalk Batch is a complete batch automation environment, giving companies the ability to meet the broadest suite of batch applications.
It supports:
• Integration with smart instruments and devices
• Equipment allocation and arbitration to more effectively manage batch assignments
• Production history
• Material tracking
• Reporting

The Flexibility and Functionality You Need to Manage and Increase Production

Configure the Physical Model

In FactoryTalk Batch, the physical model is configured through the Batch Equipment Editor in a logical progression, starting with the area and building down to the equipment module level. Using the graphical interface in the Batch Equipment Editor, you use templates to create and maintain information about your process equipment. Once you have defined the physical model, information in that physical model is available to all other FactoryTalk Batch components.

The Equipment Editor allows you to easily:
• Scale a single recipe or procedure to adjust an overall batch size
• Choose the best equipment for particular batch requirements
• Manage batches across multiple pieces of equipment or multiple lines, even when many pieces of equipment are involved
• Dynamically allocate and reassign equipment and processes to maintain effective control of product and better manage your resources
• Use integrated scheduling and supervisory functions to better utilize production equipment

FactoryTalk Batch can proactively assign equipment to maximize asset utilization and increase production. The system also allows you to use batch capabilities within the controller, or to integrate them with the plant-wide batch management system, depending on the complexity of equipment and material.

Define Recipes and the Procedural Model

FactoryTalk Batch procedures are added in the Batch Recipe Editor, which is used to build recipes that define the sequences of equipment actions in a batch process. The Batch Recipe Editor provides a simple way to configure, organize, and store recipe information.
As in the Batch Equipment Editor, recipes are built hierarchically and consist of procedures, unit procedures, operations, and phases. Recipes also include descriptive information, formula information, equipment requirements, and the procedures used to make the batch. Additionally, you can add comments into the recipe structure that can be viewed both at design time and at runtime.

Validating and Commissioning

FactoryTalk Batch provides security features designed with input from a cross-segment of industry users. The security levels can be customized to meet the most demanding requirements, such as Good Manufacturing Practice (GMP) regulations. It includes configurable electronic signature templates that represent a signature and its associated data, such as signoff level, comments, security requirements, and date and time stamps. Up to three signatures can be required for verification of runtime batch events. All signatures are stored in the event journal and are non-editable, fully supporting 21 CFR Part 11 compliance.

Material Tracking and Tracing

FactoryTalk Batch provides real-time material management and traceability in batch execution systems, improving corporate inventory solutions and allowing more effective management of raw inventory. It:
• Complements ERP-level resource management by collecting the detailed material and equipment tracking information needed for optimizing your supply chain
• Tracks the use of materials, vessels, containers and permanent/transient storage
• Supports recipe execution by determining which equipment must be used to meet a request

Leverage Additional Features to Meet Your Specific Batch Needs

Material Manager
Optimized Production Flows

Material Manager provides just-in-time, plant-level material management and tracking that integrates with company-wide inventory management applications. It complements your ERP systems by collecting the detailed tracking information needed for optimizing your supply chain and e-business fulfillment.
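The ISA-88 procedural hierarchy used by the Batch Recipe Editor (procedures containing unit procedures, which contain operations, which contain phases) can be pictured as a nested data structure. The sketch below is illustrative only; all recipe names are hypothetical and it is not FactoryTalk Batch code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Phase:                  # smallest procedural element; maps to equipment actions
    name: str

@dataclass
class Operation:              # an ordered set of phases
    name: str
    phases: List[Phase] = field(default_factory=list)

@dataclass
class UnitProcedure:          # the procedure carried out on one unit
    name: str
    operations: List[Operation] = field(default_factory=list)

@dataclass
class Procedure:              # top level of an ISA-88 recipe's procedural model
    name: str
    unit_procedures: List[UnitProcedure] = field(default_factory=list)

# Hypothetical recipe illustrating the four-level hierarchy
recipe = Procedure("Make_Batch", [
    UnitProcedure("Mix", [
        Operation("Charge", [Phase("Add_Water"), Phase("Add_Premix")]),
    ]),
])
```

Because the hierarchy is equipment-independent, the same procedural structure can be bound to different physical units at runtime, which is the point of the ISA-88 separation described above.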
eProcedure®
Process Management for Manual Operations

The eProcedure software automates manual procedures using an interactive, web-based interface to sequence and document your manufacturing operations. eProcedure provides the consistency of automated controls in manual operations. Use eProcedure to guide operators through manual activity, or add links to production documents such as operating procedures, material safety data sheets, and equipment maintenance manuals. Enforce signoffs when required to meet 21 CFR Part 11 compliance.

SequenceManager™
Controller or Server Level Batch Sequencing

• Allows machine builders to develop and deliver fully tested skids that end users can integrate into their batch process, with minimal validation and commissioning effort
• Enables batch sequencing to occur at the controller level, closer to the equipment, providing faster transitions for time-sensitive procedures. End users will discover new opportunities to adopt batching methodology for equipment that previously would not tolerate lags from server-initiated step changes or network latency
• Minimizes the rework required when manufacturers with small, controller-based batch systems expand to larger, server-based batch systems

Publication PROCES-PP017A-EN-P – February 2017. Copyright © 2017 Rockwell Automation, Inc. All Rights Reserved. Printed in USA. Allen-Bradley, eProcedure, FactoryTalk, LISTEN. THINK. SOLVE., Rockwell Software and SequenceManager are trademarks of Rockwell Automation, Inc.
Trademarks not belonging to Rockwell Automation are property of their respective companies.

Batch Reporting

Pre-configured, open-source, web-based batch reports can help solve your most common needs, including electronic batch records (EBR), track-and-trace genealogy and production exceptions. Additional custom reports can be quickly configured from the following included reports:
• Batch Reports: Batch Listing, Batch Summary, Batch Detail
• Material Reports: Material Usage, Forward Tracking, Backward Tracing
• Analysis Reports: Batch Execution, Duration Comparison, Batch Exceptions

Simulation

FactoryTalk Batch provides a powerful tool that allows recipes to be tested against plant configurations without running them in the plant. It can also be configured to match your specific project and/or process-connected device. The simulator is fully configurable and includes phase modification and changes to phase states during runtime. The simulator also supports cutover of one phase at a time, speeding the debugging process as startup approaches.

Mobility and Enhanced HMI Features

• Graphics are capable of adapting to any mobile device, such as iPhone and Android
• Can be leveraged on existing FactoryTalk Batch workstations as the HMI visualization interface to the batch system
• Helps create intuitive workflows, reduces procedural steps and increases collaboration
• Access real-time information, interact with processes and secure approvals from anywhere in a plant

A Valued Partner

Across industries and processes, companies can leverage our global experience and the resources within our PartnerNetwork to tailor solutions and services capabilities to meet their needs. Rockwell Automation understands that a profitable, safe and sustainable operation is a primary goal for companies. We offer industry- and technology-specific expertise to address unique production challenges.
Our commitment is to help reduce project risk and provide solutions specific to the companies we support, executed globally and supported locally. For more information about our modern batch solutions, please visit: /go/process

TRACE32 Simulator for ARM


Simulator for ARM and XSCALE

TRACE32 Online Help > TRACE32 Documents > ICD In-Circuit Debugger > Software Simulator Target Guides > Simulator for ARM and XSCALE

Contents:
• Quick Start of the Simulator
• Operation
• Peripheral Simulation
• Troubleshooting (FAQ, Memory Classes, Virtual Terminal, Semihosting, Coprocessors)
• ARM Specific SYStem Commands (SYStem.Mode, SYStem.CPU, SYStem.CpuAccess, SYStem.MemAccess, SYStem.Option Alignment, SYStem.Option BigEndian, SYStem.Option DisMode, SYStem.Option MMU, SYStem.RESetOut)
• Support (Available Tools, ARM7, ARM9, ARM10, ARM11, Compilers, Realtime Operating Systems, Debuggers)
• Products (Product Information, Order Information)

Version: June 22, 2005

For general information about the In-Circuit Debugger, refer to the ‘Software Simulator User’s Guide’. All general commands are described in the ‘IDE Reference Guide’ and the ‘General Reference Guide’.

Quick Start of the Simulator

Starting up the simulator is done as follows:

1. Select the device prompt for the ICD Debugger and reset the system. The device prompt B:: is normally already selected in the command line. If this is not the case, enter B:: to set the correct device prompt. The RESet command is only necessary if you do not start directly after booting the TRACE32 development tool.

2. Specify the CPU-specific settings.
   The default values of all other options are set in such a way that it should be possible to work without modification. Please consider that this is probably not the best configuration for your target.

3. Enter debug mode. This command resets the CPU and enters debug mode. After this command is executed, it is possible to access memory and registers.

4. Load the program. The format of the Data.LOAD command depends on the file format generated by the compiler. Refer to Supported Compilers to find the command that is necessary for your compiler. A detailed description of the Data.LOAD command and all available options is given in the reference guide.

5. Startup example. A typical start sequence is shown below. This sequence can be written to an ASCII file (script file) and executed with the command DO <filename>.

   b::
   RESet
   SYStem.CPU <cputype>
   SYStem.Up
   Data.LOAD.format <filename>      ; load program and symbols

A fully annotated sequence:

   b::                              Select the ICD device prompt
   WinClear                         Clear all windows
   SYStem.CPU <cpuname>             Select CPU type
   SYStem.Up                        Reset the target and enter debug mode
   Data.LOAD.format <filename>      Load the application
   Register.Set pc main             Set the PC to function main
   Data.List *)                     Open source code window
   Register *)                      Open register window
   Variable.Local *)                Open window with local variables

   *) These commands open windows on the screen.
The window position can be specified with the WINPOS command.

Operation

For more information see the ‘Software Simulator User’s Guide’.

Peripheral Simulation

For more information see ‘API for Software Simulators’.

Troubleshooting

FAQ

• No information available

Memory Classes

The following ARM-specific memory classes are available:

   P     Program Memory
   D     Data Memory
   SP    Supervisor Program Memory (privileged access)
   UP    User Program Memory (non-privileged access)
   SR    Supervisor ARM Memory (privileged access)
   ST    Supervisor Thumb Memory (privileged access)
   UR    User ARM Memory (non-privileged access)
   UT    User Thumb Memory (non-privileged access)
   U     User Memory (non-privileged access)
   S     Supervisor Memory (privileged access)
   R     ARM Memory
   T     Thumb Memory
   ICE   ICE Breaker Register (debug register; ARM7, ARM9)
   C14   Coprocessor 14 Register (debug register; ARM10, ARM11)
   C15   Coprocessor 15 Register (if implemented)
   ETM   Embedded Trace Macrocell Registers (if implemented)
   VM    Virtual Memory (memory on the debug system)
   USR   Access to Special Memory via User-Defined Access Routines
   E     Run-time memory access (see SYStem.CpuAccess and SYStem.MemAccess)

To access a memory class, write the class in front of the address. Example: Data.dump ICE:0--3

Normally there is no need to use the memory classes P, D, SP, UP, SR, ST, UR, UT, U, S, R, or T; the memory class is set automatically depending on the setting of SYStem.Option DisMode.

Virtual Terminal

The command TERM opens a terminal window which allows communication with the ARM core over the ICEbreaker Debug Communications Channel (DCC). All data received from the comms channel are displayed, and all data entered in this window are sent to the comms channel. Communication occurs byte-wide or up to four bytes per transfer. The four-byte ASCII mode (DCC4A) does not allow transferring the byte 00; each non-zero byte of the 32-bit word is a character in this mode. The four-byte binary mode (DCC4B) can be used to transfer non-ASCII 32-bit data (e.g. to or from a file).
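As an aside, the DCC4A packing just described (each non-zero byte of the 32-bit word carries one character, and the byte 00 cannot be transferred) can be sketched in Python. This is an illustrative model, not TRACE32 code; the placement of the first character in the lowest byte is an assumption:

```python
def dcc4a_pack(text: str) -> int:
    """Pack up to four non-NUL ASCII characters into one 32-bit DCC word.
    Unused byte positions stay zero; 0x00 marks 'no character' and
    therefore cannot itself be transferred in DCC4A mode."""
    data = text.encode("ascii")
    if len(data) > 4 or b"\x00" in data:
        raise ValueError("DCC4A carries at most 4 non-NUL ASCII bytes")
    word = 0
    for i, b in enumerate(data):
        word |= b << (8 * i)      # assumption: first char in the low byte
    return word

def dcc4a_unpack(word: int) -> str:
    """Recover the characters from a 32-bit DCC4A word, skipping zero bytes."""
    return "".join(
        chr((word >> (8 * i)) & 0xFF)
        for i in range(4)
        if (word >> (8 * i)) & 0xFF
    )
```

The round trip makes the restriction visible: any payload containing a NUL byte is rejected, which is why DCC4B (raw binary) or DCC3 exists for arbitrary data.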
The three-byte mode (DCC3) allows binary transfers of up to 3 bytes per DCC transfer. The upper byte defines how many bytes are transferred (0 = one byte, 1 = two bytes, 2 = three bytes). This is the preferred mode of operation, as it combines arbitrary-length messages with high bandwidth. The TERM.METHOD command selects which mode is used (DCC, DCC3, DCC4A or DCC4B).

The communication mechanism is described e.g. in the ARM7TDMI data sheet in chapter 9.11. Only three move to/from coprocessor 14 instructions are necessary. The TRACE32 demo/arm/etc/terminal directory contains the file TERM.CMM, which demonstrates how the communication works.

Semihosting

The command TERM.GATE opens a terminal window which supports ARM-compatible semihosting. The communication can either be done by stopping the target at the SWI or by using the DCC interface channel, which provides non-stop operation of the target.

The SWI emulation mode requires stopping the target at the SWI exception vector. On ARM7 this can be done only with an on-chip or software breakpoint at location 8. On other ARM cores it can be done by enabling the ICEbreaker breakpoint at the SWI vector (TrOnchip.Set SWI ON). The terminal must be set to the ARMSWI method (TERM.METHOD ARMSWI). The handling of the SWI is only active while the TERM.GATE window exists. An example can be found in demo/etc/arm/semihost/swisoft.cmm.

The DCC communication mode requires a target agent for the SWI. The communication is done in the DCC3 method of the TERM command. An example and the source of the SWI agent can be found in demo/etc/arm/semihost/swidcc.cmm.

Coprocessors

It is not possible to access coprocessors which are not included in an ARM macrocell from debug mode. This means that coprocessors added to ARM cores by customers cannot be accessed from debug mode. The following coprocessors can be accessed if available in the processor:

• Coprocessor 14.
  Please refer to the chapter Virtual Terminal and to your ARM documentation for details.

• Coprocessor 15, which allows control of basic CPU functions. This coprocessor can be accessed with the access class C15. For the detailed definition of the CP15 registers, please refer to the ARM data sheet. The CP15 registers can also be controlled in the PER window.

The TRACE32 address is composed of the CRn, CRm, op1, op2 fields of the corresponding coprocessor register command

   <MCR|MRC> p15, <op1>, Rd, CRn, CRm, <op2>

BIT0-3: CRn, BIT4-7: CRm, BIT8-10: <op2>, BIT12-14: <op1> is the corresponding TRACE32 address (one nibble for each field).

ARM Specific SYStem Commands

SYStem.Mode          Establish the communication with the simulator

   Format: SYStem.Mode <mode>
   <mode>: Down | NoDebug | Go | Up

Default: Down. Selects the target operating mode.

   Down      The CPU is in reset. Debug mode is not active. Default state and state after fatal errors.
   NoDebug   The CPU is running. Debug mode is not active. The debug port is tristate. In this mode the target should behave as if the debugger is not connected.
   Go        The CPU is running. Debug mode is active. After this command the CPU can be stopped with the break command or if any break condition occurs.
   Up        The CPU is not in reset but halted. Debug mode is active. In this mode the CPU can be started and stopped. This is the most typical way to activate debugging.

If the mode "Go" is selected, this mode will be entered, but the control button in the SYStem window jumps to the mode "Up".

SYStem.CPU          Select the used CPU

   Format: SYStem.CPU <cpu>
   <cpu>: ARM7TDMI | ARM740TD | ... (JTAG Debugger ARM7)
          ARM9TDMI | ARM920T | ARM940T | ... (JTAG Debugger ARM9)
          ARM1020E | ARM1022E | ARM1026EJ | ... (JTAG Debugger ARM10)
          ARM1136J | ARM1136JF | ... (JTAG Debugger ARM11)
          JANUS2 (JTAG Debugger Janus)

Selects the processor type. If your ASIC is not listed, select the type of the integrated ARM core.

SYStem.CpuAccess          Run-time memory access (intrusive)

   Format: SYStem.CpuAccess Enable | Denied

Default: Denied.
For the ARM7 and the ARM9 onchip breakpoints can always be set while programexecution is running.SYStem.MemAccess Run-time memory access Format:SYStem.MemAccess <mode><mode>:CPUDeniedDefault: Denied.If SYStem.MemAccess is not Denied, it is possible to to read from memory, to write to memory and to set software breakpoints while the CPU is executing the program. This requires one of the following monitors.CPU A run-time memory access is made without CPU intervention whilethe program is running. This is only possible on the instruction setsimulator.Denied No memory access is possible while the CPU is executing theprogram.If specific windows, that display memory or variables should be updated while the program is running select the memory class E: or the format option %E.Data.dump E:0x100Var.View %E firstSYStem.Option Alignment Enable alignment exceptions Format:SYStem.Option AlignmentCauses the processor to go into a DAbort exeception for any unaligned access. Otherwise the data will be handled according to the ARM core specification.SYStem.Option BigEndian Define byte order (endianess) Format:SYStem.Option BigEndian [ON | OFF]Default: OFF. This option selects the byte ordering mechanism. For correct operation the following three settings must correspond:•this option•the compiler setting (-li or -bi compiler option)SYStem.Option DisMode Define disassembler mode Format:SYStem.Option DisMode <option><option>:AUTOACCESSARMTHUMBThis command specifies the selected disassembler. Default: AUTO.AUTO The information provided by the compiler output file is used for thedisassembler selection. If no information is available it has the samebehavior as the option ACCESS.ACCESS The selected disassembler depends on the T bit in the CPSR or onthe selected access class. (e.g. Data.List SR:0 for ARM mode orData.List ST:0 for THUMB mode).ARM Only the ARM disassembler is used (highest priority).THUMB Only the THUMB disassembler is used (highest priority). 
SYStem.Option MMU Debugging of multi-spaced applications Format:SYStem.Option MMU [ON | OFF]Default: OFF. Debugging of multi-spaced applications. Extends the address scheme of the debugger to include memory spaces (16:32 address format). The option is not required when the MMU is doing a static address translation.SYStem.RESetOut CPU reset command for ARM simulator Format:SYStem.RESetOutSpecial reset command for ARM simulator.SupportAvailable ToolsARM7C P UI C EF I R EI C D D E B U GI C D M O N I T O R I C D T R A C EP O W E R I N T E G R A T O RI N S T R U C T I O N S I M U L A T O R AD20MSP430YES YES YES AD6522YES YES YES AD6526YES YES YES AD6528YES YES YES AD6529YES YES YES AD6532YES YES YES ADUC7020YES YES YES ADUC7021YES YES YES ADUC7022YES YES YES ADUC7024YES YES YES ADUC7025YES YES YES ADUC7026YES YES YES ADUC7027YES YES YES ARM710TYES YES YES YES ARM710T -AMBA YES YES YES YES YES ARM720TYES YES YES YES ARM720T -AMBA YES YES YES YES YES ARM740TYES YES YES YES ARM740T -AMBA YES YES YES YES YES ARM7TDMIYES YES YES YES YES YES ARM7TDMI-AMBA YES YESYES YES YES YES ARM7TDMI-S YES YES YESYES AT75C220YES YES YES AT75C310YES YES YES AT75C320YES YES YES AT76C501YES YES YES AT76C502YES YES YES AT76C502A YES YES YES AT76C503YES YES YES AT76C503A YES YES YES AT76C510YESYESYESAT76C551YES YES YES AT76C901YES YES YES AT78C1501YES YES YES AT91F40416YES YES YES AT91F40816YES YES YES AT91FR40162YES YES YES AT91FR4042YES YES YES AT91FR4081YES YES YES AT91M40100YES YES YES AT91M40400YES YES YES AT91M40403YES YES YES AT91M40800YES YES YES AT91M40807YES YES YES AT91M42800A YES YES YES AT91M43300YES YES YES AT91M55800A YES YES YES AT91M63200YES YES YES AT91R40008YES YES YES AT91R40807YES YES YES AT91SC321RC YES YES YES BC6911YES YES YES BERYLLIUM YES YES YES BU7611AKU YES YES YES CBC32XXA YES YESYES CDC3207G YES YES YES YES CDC3272G YES YES YES YES CDC32XXG YES YES YES CDL-82YES YES YES CDMAX YES YES YES CEA32XXA YES YES YES CL-PS7110YES YES YES CL-PS7111YES YES YES CL-PS7500FE 
YES YES YES CL-SH8665YES YES YES CL-SH8668YES YES YES CLARITY YES YES YES CS22210YES YES YES CS22220YES YES YES CS22230YES YES YES CS22250YES YES YES CS22270YES YES YES C I C F I I C D I C M I C T R P I N I N S ICS89712YES YES YES CSM5000YES YES YES CSM5200YES YES YES CX81210YES YES YES CX81400YES YES YES D5205YES YES YES D5313YES YES YES D5314YES YES YES EP7209YES YES YES EP7211YES YES YES EP7212YES YES YES EP7309YES YES YES EP7311YES YES YES EP7312YES YES YES EP7339YES YES YES EP7407YES YESYES GEMINIYES YES YES YES GMS30C7201YES YES YES GP4020YES YES YES HELIUM 100YES YES YES HELIUM 200YES YES YES HELIUM 210YES YES YES HMS30C7202YES YES YES HMS31C2816YES YES YES HMS39C70512YES YES YES HMS39C7092YES YES YES IXP220YES YES YES IXP225YES YES YES KS17C40025YES YES YES KS17F80013YES YES YES KS32C61100YES YES YES KS32P6632YES YES YES L64324YES YES YES L7200YES YES YES L7205YES YES YES L7210YES YES YES LH75400YES YES YES LH75401YES YES YES LH75410YES YES YES LH75411YES YES YES LH77790YES YES YES C I C F I I C D I C M I C T R P I N I N S ILH79520YES YES YES LITHIUMYES YES YES LOGIC CBP3.0YES YES YES LOGIC CBP4.0YES YES YES LOGIC L64324YES YES YES LPC2104YES YES YES YES LPC2105YES YES YES YES LPC2106YES YES YES YES LPC2109YES YES YES YES LPC2112YES YES YES YES LPC2114YES YES YES YES LPC2119YES YES YES YES LPC2124YES YES YES YES LPC2129YES YES YES YES LPC2131YES YES YES YES LPC2132YES YES YES YES LPC2138YES YES YES YES LPC2194YES YES YES YES LPC2210YES YES YES YES LPC2212YES YES YES YES LPC2214YES YES YES YES LPC2290YES YES YES YES LPC2292YES YES YES YES LPC2294YES YES YES YES M4641YES YES YES MAC7101YES YES YES YES MAC7111YES YES YES YES MAC7116YES YES YES MAC7121YES YES YES YES MAC7131YES YES YES YES MAC7141YES YES YESYES MKY -82A YES YES YES MKY -85YES YES YES ML670100YES YES YES ML671000YES YES YES ML674000YES YES YES ML674001YES YES YES ML674080YES YES YES ML675001YES YES YES ML675200YES YES YES ML675300YES YESYES C I C F I I C D I C M I C T R P I N I N S IML67Q2300YES 
YES YES ML67Q2301YES YES YES ML67Q4002YES YES YES ML67Q4003YES YES YES ML67Q4100YES YES YES ML67Q5002YES YES YES ML67Q5003YES YES YES ML67Q5200YES YES YES ML67Q5300YES YES YES ML70511LA YES YES YES ML7051LA YES YES YES MN1A7T0200YES YES YES MODEM YES YES YES MSM3000YES YES YES MSM3100YES YES YES MSM3300YES YES YES MSM5000YES YES YES MSM5100YES YES YES MSM5105YES YES YES MSM5200YES YES YES MSM5500YES YES YES MSM6000YES YES YES MSM6050YES YES YES MSM6200YES YES YES MSM6600YES YES YES MSP1000YES YES YES MT1020A YES YES YES MT92101YES YES YES MTC-20276YES YES YES MTC-20277YES YES YES MTC-30585YES YES YES MTK-20141YES YES YES MTK-20280YES YES YES MTK-20285YES YES YES NET+15YES YES YES NET+20YES YES YES NET+40YES YES YES NET+50YES YES YES NITROGEN YES YES YES NS7520YES YES YES OMAP710YES YES YES C I C F I I C D I C M I C T R P I N I N S IOMAP730YES YES YES OMAP732YES YES YES PBM 990 90YES YES YES PCC-ISES YES YES YES PCD80703YES YES YES YES PCD80705YES YES YES YES PCD80708YES YES YES YES PCD80715YES YES YES YES PCD80716YES YES YES YES PCD80718YES YES YES YES PCD80720YES YES YES YES PCD80721YES YES YES YES PCD80725YES YES YES YES PCD80727YES YES YES YES PCD80728YES YES YESYES PCF26002YES YES YES PCF26003YES YES YES PCF87750YES YES YES PCI2010YES YES YES PCI3610YES YES YES PCI3620YES YES YES PCI3700YES YES YES PCI3800YES YES YES PCI5110YES YES YES PCI9501YES YES YES PH21101YES YES YES PMB7754YES YES YES PS7500FE YES YES YES PUC3030A YES YES YES PUC303XA YES YES YES S3C3400A YES YES YES S3C3400X YES YES YES S3C3410X YES YES YES S3C44A0A YES YES YES S3C44B0X YES YES YES S3C4510B YES YES YES S3C4520A YES YES YES S3C4530A YES YES YES S3C4610D YES YES YES S3C4620D YES YES YES S3C4640XYES YESYES C I C F I I C D I C M I C T R P I N I N S IS3C4650D YES YES YES S3C46C0YES YES YES S3C46M0X YES YES YES S3C4909A YES YES YES S3C49F9X YES YES YES S3F441FX YES YES YES S3F460H YES YES YES S5N8946YES YES YES S5N8947YES YESYES SE470R1VB8AD YES YES YES YES SIRFSTARII YES YES YES SJA2020YES 
YES SOCLITE+YES YES YES ST30F7XXA YES YES YES YES ST30F7XXC YES YES YES YES ST30F7XXZ YES YES YESYES STR710YES YES YES STR711YES YES YES STR712YES YES YES STR720YES YES YESYES STW2400YES YES YES TA7S05YES YES YES TA7S12YES YES YES TA7S20YES YES YES TA7S32YES YES YES TMS320VC5470YES YES YESYES TMS320VC5471YES YES YES TMS470R1VC336A YES YES YES TMS470R1VC338YES YES YES TMS470R1VC346A YES YES YES TMS470R1VC348YES YES YES TMS470R1VC688YES YES YES TMS470R1VF288YES YES YES TMS470R1VF336YES YES YES TMS470R1VF336A YES YES YES TMS470R1VF338YES YES YES TMS470R1VF346A YES YES YES TMS470R1VF348YES YES YES TMS470R1VF356A YES YES YES TMS470R1VF37A YES YES YES TMS470R1VF448YESYESYES C I C F I I C D I C M I C T R P I N I N S IARM9TMS470R1VF45A YES YES YES TMS470R1VF45AA YES YES YES TMS470R1VF45B YES YES YES TMS470R1VF45BA YES YES YES TMS470R1VF478YES YES YES TMS470R1VF48B YES YES YES TMS470R1VF48C YES YES YES TMS470R1VF55B YES YES YES TMS470R1VF55BA YES YES YES TMS470R1VF67A YES YES YES TMS470R1VF688YES YES YES TMS470R1VF689YES YES YES TMS470R1VF76B YES YES YES TMS470R1VF7AC YES YES YES UPD65977YES YES YES UPLAT CORE YES YES YES VCS94250YES YES YES VMS747YES YES YES VWS22100YES YES YES VWS22110YES YES YES VWS23112YES YES YES VWS23201YES YES YES VWS23202YES YES YES VWS26001YES YES YESC P UI C EF I R EI C D D E B U GI C D M O N I T O RI C D T R A C EP O W E R I N T E G R A T O RI N S T R U C T I O N S I M U L A T O R 88E6208YES YES YES 88E6218YES YES YES AAEC-2000YES YESYES ARM7EJ-S YES YES YES YES ARM910T YES YES YES YES ARM920TYES YES YESYES C I C F I I C D I C M I C T R P I N I N S IARM922T YES YES YES YES ARM926EJ-S YES YES YES YES ARM940T YES YES YES YES ARM946E-S YES YES YES YES ARM966E-S YES YES YES YES ARM968E-S YES YES YES YES ARM9E-S YES YES YES YES ARM9TDMI YES YES YES YES AT91RM9200YES YES YESYES CN9414YES YES YES CX22490YES YES YES CX22491YES YES YES CX22492YES YES YES CX22496YES YES YES CX82100YES YES YES DIGICOLOR-OA980YES YES YES DRAGONBALL MX1YES YES YESYES 
EP9301YES YES YES EP9312YES YES YES EPXA1YES YES YES YES EPXA10YES YES YES YES EPXA4YES YES YESYES FA526YES YES YES HELIUM 500YES YES YES INFOSTREAM YES YES YES LH7A400YES YES YES LH7A404YES YES YES LH7A405YES YES YES LPC3000YES YES YES YES MC9328MX1YES YES YESYES MC9328MX21YES YES YES ML67Q2003YES YES YES MSM6100 3G YES YES YES YES MSM6250YES YES YES YES MSM6300YES YES YES YES MSM6500YES YES YES YES MSM7XXX YES YES YES NEXPERIA YES YES YES YES NOMADIC YES YES YESYES NS9360YES YES YES NS9750YES YESYES C I C F I I C D I C M I C T R P I N I N S IARM10NS9775YES YES YES OMAP1510YES YES YES YES OMAP1610YES YES YES YES OMAP1611YES YES YES YES OMAP1612YES YES YES YES OMAP1710YES YES YES YES OMAP310YES YES YES YES OMAP5910YES YES YES YES OMAP5912YES YES YESYES OMAP710YES YES YES OMAP730YES YES YES OMAP732YES YESYES PMB8870 S-GOLD YES YES YES PMB8875 S-GOLDL YES YES YES PMB8876 S-GOLD2YES YES YES S3C2400X YES YES YES S3C2410YES YES YES S3C2410X YES YES YES S3C2440YES YES YES S3C2500A YES YES YES S3C2510YES YES YES S3C2800X YES YES YES SCORPIOYES YES YES T6TC1XB-0001YES YES YES T8300YES YES YES T8302YESYESYESC P UI C EF I R EI C D D E B U GI C D M O N I T O RI C D T R A C EP O W E R I N T E G R A T O RI N S T R U C T I O N S I M U L A T O R ARM1020E YES YES YES YES ARM1022E YES YES YES YES ARM1026EJ-S YES YES YESYES C I C F I I C D I C M I C T R P I N I N S IARM11CompilersC P UI C EF I R EI C D D E B U GI C D M O N I T O R I C DT R A C EP O W E R I N T E G R A T O RI N S T R U C T I O N S I M U L A T O R ARM1136J-S YES YES YES YES ARM1136JF-S YES YES YES YES MSM7XXX YES YES YES OMAP2410YES YES YES YES OMAP2420YESYESYESYES Language CompilerCompanyOptionCommentC ARMCC ARM AIFC ARMCC ARM ELF/DWARF C GCCARM FSF COFF/STABS C GCCARMFSFELF/DWARF2C GREENHILLS C Greenhills ELF/DWARF2C ICCARM IARELF/DWARF2C ICCV7-ARM Imagecraft ELF/DWARF ARM7C HIGH-C MetawareELF/DWARF C TI-C Texas Instruments COFF C GNU-CWind River Systems COFFC++ARM SDT 2.50ARM ELF/DWARF2C++GCCARM FSF COFF/STABS 
C++GNUFSF EXE/STABS C++GCCARMFSFELF/DWARF2C++GREENHILLS C++Greenhills ELF/DWARF2C++HIGH-C++Metaware ELF/DWARF C++MSVCMicrosoftEXE/CV5WindowsCERealtime Operation SystemDebuggersNameCompanyComment OSEK -via ORTI ProOSEK 3Softvia ORTI AMXKADAK Products ChorusOS Sun Microsystems micro & basic ECOS eCosCentric Limited 1.3.1 and 2.0embOS Segger Linux -Kernel Version 2.4 and 2.6LinuxMontaVistaVersion 3.0 and 3.1Nucleus PLUS Accelerated Tech.OSE Basic Enea OSE Systems (OSARM)OSE Epsilon Enea OSE Systems (OSARM), 3.x Precise/MQX Precise Software 2.40 and 2.50pSOS+Integrated Systems 2.1 to 2.5, 3.0QNXQNX Software Systems 6.0 to 6.3RealTime Craft GSI tecsi(XECARM)RTXC 3.2Quadros Systems Inc.RTXC Quadros Quadros Systems Inc.Sciopta ScioptaSMXMicro Digital Symbian OS Symbian6.x,7.0s,8.0a+b, 8.1a+b,9.1ThreadX Express Logic uC/OS-II Micrium Inc. 2.0 to 2.7uCLinux freewareKernel Version 2.4VxWorks Wind River Systems 5.x, with TRACE32 BSP Windows CEMicrosoft4.0 to 4.2CPU DebuggerCompany Host ALL EASYCASE BKR GmbHWindows ALL X-TOOLS / X32blue river software Windows ALL RHAPSODY IN MICROC I-Logix Windows ALL A TTOL TOOLS MicroMax Windows ALL VISUAL BASIC INTERFACE MicrosoftWindows ALL CODEWRIGHT Premia Corporation Windows ALLDA-CRistanCASEWindows31 ProductsProduct InformationOrder InformationOrderNo CodeText LA-8809SIM-ARM Instruction Set Simulator for ARM Instruction Set Simulatorfor ARM, XSCALE, Janus2Order No.Code Text LA-8809SIM-ARM Instruction Set Simulator for ARM。

英语作文-集成电路设计行业的智能芯片与系统解决方案

英语作文-集成电路设计行业的智能芯片与系统解决方案

英语作文-集成电路设计行业的智能芯片与系统解决方案The design and development of intelligent chips and system solutions in the integrated circuit design industry have revolutionized the way we interact with technology. These advancements have not only enhanced the performance and efficiency of electronic devices but have also opened up new possibilities for innovation in various fields.One of the key aspects of intelligent chip design is the integration of artificial intelligence (AI) algorithms. By incorporating AI into the chip architecture, designers are able to create systems that can learn and adapt to different situations, making them more efficient and versatile. This has led to the development of smart devices that can recognize speech, images, and patterns, enabling them to provide personalized experiences for users.Moreover, intelligent chips have also played a crucial role in the development of autonomous systems. By combining sensors, processors, and communication modules, designers have been able to create self-driving cars, drones, and robots that can navigate and interact with their environment without human intervention. These advancements have not only improved efficiency and safety but have also opened up new opportunities for automation in various industries.In addition to AI integration, intelligent chip design also focuses on energy efficiency and miniaturization. By optimizing the power consumption of chips and reducing their size, designers are able to create devices that are not only more environmentally friendly but also more portable and convenient for users. This has led to the development of wearable devices, smart home appliances, and IoT devices that can seamlessly integrate into our daily lives.Furthermore, intelligent chip design has also enabled the development of advanced security features. 
By incorporating encryption, authentication, and secure bootmechanisms into the chip architecture, designers are able to create systems that can protect sensitive data and prevent unauthorized access. This has become increasingly important in today's interconnected world, where cyber threats are becoming more sophisticated and prevalent.Overall, the integration of intelligent chips and system solutions in the integrated circuit design industry has transformed the way we interact with technology. From AI-powered devices to autonomous systems and energy-efficient gadgets, these advancements have not only improved the performance and efficiency of electronic devices but have also opened up new possibilities for innovation in various fields. As technology continues to evolve, intelligent chip design will play a crucial role in shaping the future of electronics and revolutionizing the way we live and work.。

集成电路工艺模拟软件SSUPREM4的校验

集成电路工艺模拟软件SSUPREM4的校验

集成电路工艺模拟软件SSU PREM4的校验Ξ朱兆 阮 刚 庞海舟 冒慧敏 (复旦大学电子工程系,上海200433) (上海先进半导体制造有限公司,上海200233) 【提要】 本文对知名的集成电路工艺模拟软件SSU PREM4进行了较仔细的校验,用SSU PREM4模拟了氧化、扩散工艺,并同实验值进行了比较,模拟值和实验值的偏差在10%以内,与集成电路器件模拟软件S2PISCES联用校验了SSU PREM4的全工序模拟结果,校验结果有较大参考价值.关键词:集成电路,工艺模拟,软件校验Calibration of IC Proce ss Simulator SSUPREM4Zhu Zhaomin,Ruan G ang(Dept.of Elect ronic Engi neeri ng,Fudan U niv.,S hanghai200433)Pang Haizhou,Mao Huimin(A S M C,S hanghai200233)Abstract: Careful calibration of IC process simulator SSU PREM4is made in thispaper.We simulate oxidation and dif2 fusion by SSU PREM4.The simulated values are com pared with the measured values,and the relative deviation between them is within10%.We also calibrate the simulated results of full process by SSU PREM4together with device simulator S2 PISCES.The calibrated results have some values for references.K ey words: Integrated circuit,Process simulation,S oftware calibration一、引 言 A THENA是美国SILVACO公司推出的集成电路工艺模拟的商用软件包.A THENA包括SSU PREM4、EL ITE、OP2 TOL ITH和FLASH四个软件.SSU PREM4是Stanford大学开发的二维工艺模拟软件SU PREM4的商用改进版,它是A2 THENA中最重要的组成部分.SSU PREM4是世界上公认的最先进的集成电路工艺模拟软件之一.本文通过对SSU PREM4的校验,来估算它的实际模拟精度,校验结果对有关软件的开发者、使用者及有兴趣的科技人员有较大的参考价值.二、工艺模拟软件SSUPREM4的校验 SSU PREM4的模拟运算在SUN2UL TRA1型号的工作站上进行,校验用实验样品和测试数据由上海先进半导体制造公司(ASMC)提供,氧化层厚度是用椭偏仪测得,椭偏仪型号为Nanometrix公司的M5000型,测量范围是100A~10000A,测试精度为10A;方块电阻是用四探针法测得,所用仪器型号为美国Prometrix公司生产的VP-10型,测量范围是5Ω/ ~5MΩ/ ,测试精度为011%;结深用扩展电阻法测得,所用仪器型号为SSM公司生产的130型,测试精度为50A.在工艺模拟校验中,我们主要校验集成电路芯片制造中最重要和最普遍使用的两种工艺;氧化工艺和扩散工艺.11氧化工艺的校验(1)三种氧化模型的比较SSU PREM4在模拟氧化时可选用三种数值氧化模型,这三种模型都基于Deal和Grove的线性2抛物线理论[1].它们是垂直模型、压缩模型、粘滞性模型[1~4].在垂直模型中,氧化层只是严格在垂直方向上生长,它不能模拟鸟嘴现象等二维氧化效应,压缩模型能模拟二维氧化效应,粘滞性模型还能计算应力.我们用三种模型模拟了表2的样品号为OXHCL1的工艺,这种工艺无鸟嘴现象,模拟结果见表1.表1 三种氧化模型的比较模型名称模拟所需时间氧化层厚度(埃)模拟值测量值相对偏差垂直模型9分46秒6231662616-0149%压缩模型10分11秒6231662616-0149%粘滞性模型12分22秒62817626160134% 由上表我们可以看出,在模拟该工艺时,三种模型的模拟结果差别并不大,而模型越复杂,模拟时间就越长.在模拟无鸟嘴现象的氧化时,用垂直模型和压缩模型模拟的结果完全相同,这是我们所预期的:考虑了应力效应的粘滞性模型的模拟,其结果比其它二种模型的模拟值约大1%,这说明应力效 第8期1999年8月电 子 学 报ACTA EL ECTRONICA SINICAVol.27 No.8Aug. 
1999Ξ1998年2月收到,1998年6月修改定稿应有着增强氧化的作用,但对该样品的修正值很小,一般可忽略.(2)干氧氧化工艺的校验计算干氧氧化SSU PREM4所用的模型公式为dX 0/dt =B/(A +2X 0)其中,A=2D eff (1K+1h)B =2D effC 3N 1(1)X 0是氧化层厚度,D eff 是有效氧化系数,K 是享利常数,h 是输运系数,C 3是在二氧化硅中氧的平衡浓度,N 1是氧化层内单位体积的氧分子数.校验结果见图1,模拟值和测量值的偏差小于413%.图1 不同温度时干氧氧化的氧化层厚度的模拟值和测量值比较.所用衬底为N 型,〈111〉晶向,衬底浓度为1×1015cm -3,氧流量为8升/分,氧化时间为60分钟(3)含HC1氧化工艺的校验计算含HC1氧化SSU PREM4所用的模型公式同(1),A 中的参数或用干氧或用湿氧,而B 可用B/A =L 0L P L HC1L baf 来表示,其中L 0是本征线性氧化速率,有速率的量纲.L P 、L HC1、L baf 分别是同压强、HC1浓度、掺杂有关的系数,是无量纲的量.校验结果见表2,所有样品的衬底浓度为1×1015cm -3.表2 含HCl 氧化工艺的校验结果样品号氧化条件氧化层厚度(埃)模拟值测量值相对偏差OXCHL1干氧,氧流量为14升/分,温度1000℃,时间40分,HCl 流量0142sccm ,N 型衬底,〈111〉6231662616-0149%OXCHL2湿氧,温度1000℃,时间26分,HCl 流量0112sccm ,N 型衬底,〈100〉211510196510716%OXCHL3湿氧,温度1000℃,时间163分,HCl 流量0124sccm ,P 型衬底,〈100〉744710706510514%从上述各种氧化条件下的氧化厚度的模拟值和测量值的比较可以看出,其相对偏差的绝对值最大为716%,最小为0134%,10个实例的平均偏差的绝对值为3189%.21扩散工艺的校验(1)SSUPRE M4中所使用的扩散模型在SSU PREM4软件里,模拟杂质扩散总共有三种点缺陷模型,即FERMI 模型、TWO.DIM 模型和FULL.CPL 模型.FERMI 模型把点缺陷密度近似为仅是费米能级的函数,TWO.DIM 模型把点缺陷扩散方程定为具有时间依赖性的瞬态方程.FULL.CPL 模型考虑了点缺陷和杂质之间的全耦合.在SSU PREM4软件中模拟扩散时,缺省模型是FEIMI 模型,适用于低掺杂、没有或几乎没有氧化的条件.TWO.DIM 模型适用于氧化增强扩散.在高浓度扩散时,与杂质耦合的点缺陷的流量可以同未与杂质耦合的点缺陷的流量相比拟,此时用FULL.CPL 模型模拟出的结果和FERMI 模型有很大不同.这里的高浓度一般是指大于1020cm -3的掺杂浓度.FULL.CPL 模型还考虑了其它模型未考虑到的掺杂剂间的互作用效应.因此模型在模拟基于存在高点缺陷浓度情况下的扩散有较高的精确性[5~7].我们分别用这三种模型模拟了表5中样品号为DP1的磷扩散工艺,比较结果见表3和图2.表3 三种扩散模型的比较模型名称模拟时间方块电阻(Ω/ )模拟值测量值相对偏差FERMI 模型8分28秒1061411715-914%TWO 1DIM 模型70分49秒1051511715-10%FULL 1CPL 模型170分1051211715-10%图2 使用不同模型情况下模拟出来的磷的浓度分布比较图 由表3和图2我们可以看出,在掺杂浓度比较低(<1×1017/cm 3)的情况下,用三种扩散模型模拟出的浓度分布区别不大,但它们的运算时间却相差很大,所以在模拟低浓度扩散时,为了节省模拟时间,可采用FERMI 模型.(2)硼扩散的校验计算硼扩散SSU PREM4所用的模型公式为5C T 5t = [D V C A C V C 3V log (C A C V p C 3V n i )+D I C A C I C 3I log (C A C I pC 3I n i)]D V =(1-F I )(D x +D+pn i)D I =F I (D x+D 3p n i)F I =D I D I +D V(2)其中C T 是化学浓度,C A 是激活浓度,D V 、D I 分别是空位和填隙子的本征扩散系数,C I 和C V 分别是填隙子和空位的浓度,C 3I 和C 3V 分别是填隙子和空位的平衡浓度,F I 是相对的填隙子因子,D X 和D +分别是中性和带一个正电荷的空位和填隙子的扩散系数,p 是空穴浓度,n i 是本征电子浓度.我们对样品进行三道工艺,先在硅片上氧化,再硼离子注72第 8 期朱兆 :集成电路工艺模拟软件SSU 
PREM4的校验入,然后再进行氧化推进扩散.所选用的衬底类型为N型,晶向为〈111〉,在模拟硼扩散时,我们用TWO.DIM模型进行模拟,模拟结果见表4.表4 硼扩散工艺的校验结果样品号工艺条件方块电阻(Ω/□)模拟值测量值相对偏差DB1前氧化温度1100℃,时间77分,干氧,氧流量8升/分,注入剂量1×1014/cm2,能量65keV,后推进温度1200℃,时间616分141213410611%DB2前氧化温度900℃,时间95分,干氧,氧流量8升/分,注入剂量3185×1014/cm2,能量40keV,后氧化温度1135℃,时间40分,干氧,氧流量8升/分1741616514516%DB3前氧化温度1100℃,时间77分,干氧,氧流量8升/分,注入剂量312×1014/cm2,能量65keV,后氧化温度1135℃,时间23分,湿氧1911520416-614% (3)磷扩散的校验计算磷扩散SSU PREM4所用的模型公式为5C T5t= [D V C A C VC3Vlog(C A C V nC3V n i)+D I C AC IC3Ilog(C A C I nC3I n i)]D I=D x I+D-I nn i+D=In2n2iD V=D x v+nn iD-v+n2n2iD=v(3)其中,C T是化学浓度,D V是空位的本征扩散系数,DI是填隙子的本征扩散系数,C A是激活浓度,C I和C V分别是填隙子和空位的浓度,C3I和C3V分别对应填隙子和空位的平衡浓度,D X、D-和D=分别是中性、带一负电荷和带两个负电荷的填隙子或空位的扩散系数,n是电子浓度,n i是本征电子浓度.我们对样品进行两道工艺,先在硅片上磷离子注入,再进行氧化推进扩散,衬底为P型,晶向为〈111〉,模拟结果见表5.表5 磷扩散工艺的校验结果样品号工艺条件方块电阻(Ω/ )模拟值测量值相对偏差DP1注入剂量7×1013/cm2,能量80keV,氧化温度1250℃,时间140分,干氧,氧流量8升/分1061411715-914%DP2注入剂量215×1013/cm2,能量80keV,氧化温度1150℃,时间53分,湿氧4571244513217%DP3注入剂量2×1013/cm2,能量80keV,氧化温度1150℃,时间53分,湿氧5231352610-0152% 从以上扩散工艺的校验中可以看出,不管是硼、磷的哪一种杂质,不管是二步或三步工艺,模拟值和测量值的相对偏差的绝对值在10%以内,其中最大值为914%,最小值为0152%,8个实例的平均偏差的绝对值为6134%.31与器件模拟软件S2PISCES联用校验SSUPRE M4的全工序模拟S2PISCES是一套可用来进行一维、二维和三维模拟的器件模拟软件,它和SSU PREM4的联用可用来较验SSU P2 REM4全工序模拟值的精度.首先,我们用SSU PREM4模拟了一个N沟MOSFET二维结构,做栅氧化层使用的工艺条件为氧化温度1000℃,时间40分,干氧,氧流量为14升/分,HCl流量为0142sccm;做P阱离子注入的能量为150K ev,剂量为413×1012cm-2.测量出的栅氧化层厚度为500埃,模拟值为49012埃,相对偏差-210%.图3给出了nMOSFET在V DS=011V时的转移特性曲线的测量值和模拟值的比较.图3 N沟MOSFET的转移特性的测量曲线和模拟曲线比较从上面我们可以看出,模拟曲线和测量曲线符合得很好.我们采用阈值电压的一种定义方式:漏极电压上加011V的电压,当漏极电流为1μA时,此时的栅电压就设为阈值电压.我们用此定义从以上曲线中求出测量的阈值电压为0159V,而模拟的阈值电压为016V,相对偏差为117%.三、结 论 我们所用的样品的工艺条件和测量数据都取自有较大规模生产的实际工艺生产线上给出的典型值,而且用于测量氧化层厚度、方块电阻和结深的测试仪器都具有较大的精度,这些都保证了测量值的可靠性和精确性.从上列氧化工艺、扩散工艺以及对N沟MOSFET全工序的模拟值和测量值的比较,其相对偏差在±10%范围内,全工序模拟的相对偏差虽然包括了器件模拟软件S2PISCES的模拟值和测量值的相对偏差,但其最终相对偏差的绝对值也在10%以内,这也表明虽然我们没有对离子注入、刻蚀等工艺单独进行校验,但可以预期它们的模拟值和实测值的相对偏差也不会超越±10%的范围.从上列校验结果来看,在容许模拟值和测量值有10%的偏差的范围内,SSU PREM4是一种可以用作集成电路工艺计算机辅助设计和模拟的有力的工具,和器件模拟软件S2 PISCES的联用可用于MOS场效应管特性的快速设计和分析.在模拟的过程中我们也发现,SSU 
PREM4中所用氧化和扩散等主要工艺的模型有多种可供选择,这样在模拟过程中82 电 子 学 报1999年就可以灵活应用.比如在模拟简单的扩散工艺时,可以用简单的FERMI模型,这样可以大大缩短模拟时间,而模拟高浓度扩散时,就要用较复杂的FULL.CPL模型,这样模拟得出的结果才会更精确,不过运算时间将会大大地增加.参 考 文 献1 B.E.Deal and A.S.Grove.G eneral relationship for the thermal oxi2 dation of silicon.J.Appl.Phys.,1965,36:37702 C.P.Ho and J.D.Plummer.J.Electrochem.Soc.,1979,126:1576 3 N.Gullemot.A new analytical model of the bird’s beak.IEEE Trans.on ED.,1987,34:1033~10384 D.Chin.Two Dimensional Oxidation.Modeling and Applications.Ph.D Thesis,Department of Electrical Engineering,Stanford Uni2 versity,19835 H.Eyring.Viscocity,Plasticity,and Diffusion as Examples of Abso2 lute Reaction Rate.J.Chem.Phys.,1936,4:2836 S.P.Murarka.Silicides for VLSI Applications.Academic Press,Or2 lando,Folorida32887,1983:607 Conor S.Rafferty.Stess Effects in Silicon Oxidation2Simulation and Experiments.Integrated Circuits Laboratory,Department of Electri2 calEngineering,Stanford University,Stanford,CA94305,1989:123~125朱兆 1973年生,1995年毕业于南开大学电子科学系,获理学学士学位,同年转入复旦大学微电子所攻读硕士学位.现在从事工艺、器件模拟软件及其埋沟MOS器件等方面的研究.阮 刚 教授,博士生导师,上海电子学会副理事长.1960年研制成功我国第一批锗固体电路.1983年10月至1985年1月为美国伊里诺大学访问学者,1992年8月至1993年8月为新加坡南洋理工大学访问教授,1996年7月起为德国开姆尼茨技术大学访问教授.已在国内外发表学术论文180余篇,译著3本.目前从事VL2SI/ULSI工艺、器件及电路的模型和模拟研究、半导体新器件的物理研究.92第 8 期朱兆 :集成电路工艺模拟软件SSU PREM4的校验。

modelsim教程

modelsim教程

Secondary
– Units in the same library may use a common name – VHDL • Architectures • Package bodies – No Verilog secondary units

VHDL Predefined Libraries
Where
– – – – _primary.dat - encoded form of Verilog module or VHDL entity _primary.vhd - VHDL entity representation of Verilog ports <arch_name>.dat - encoded form of VHDL architecture verilog.asm and <arch_name>.asm - executable code files

ModelSim Design Units
Primary
– Must have a unique name in a given library – VHDL • Entities • Package Declarations • Configurations – Verilog • Modules • User Defined Primitives
Model Technology’s ModelSim
Main Window: Source Window:
Structure Window Wave & List Windows:
Process Window:
Signals & Variables Windows: Dataflow Window:

纹理物体缺陷的视觉检测算法研究--优秀毕业论文

纹理物体缺陷的视觉检测算法研究--优秀毕业论文

摘 要
在竞争激烈的工业自动化生产过程中,机器视觉对产品质量的把关起着举足 轻重的作用,机器视觉在缺陷检测技术方面的应用也逐渐普遍起来。与常规的检 测技术相比,自动化的视觉检测系统更加经济、快捷、高效与 安全。纹理物体在 工业生产中广泛存在,像用于半导体装配和封装底板和发光二极管,现代 化电子 系统中的印制电路板,以及纺织行业中的布匹和织物等都可认为是含有纹理特征 的物体。本论文主要致力于纹理物体的缺陷检测技术研究,为纹理物体的自动化 检测提供高效而可靠的检测算法。 纹理是描述图像内容的重要特征,纹理分析也已经被成功的应用与纹理分割 和纹理分类当中。本研究提出了一种基于纹理分析技术和参考比较方式的缺陷检 测算法。这种算法能容忍物体变形引起的图像配准误差,对纹理的影响也具有鲁 棒性。本算法旨在为检测出的缺陷区域提供丰富而重要的物理意义,如缺陷区域 的大小、形状、亮度对比度及空间分布等。同时,在参考图像可行的情况下,本 算法可用于同质纹理物体和非同质纹理物体的检测,对非纹理物体 的检测也可取 得不错的效果。 在整个检测过程中,我们采用了可调控金字塔的纹理分析和重构技术。与传 统的小波纹理分析技术不同,我们在小波域中加入处理物体变形和纹理影响的容 忍度控制算法,来实现容忍物体变形和对纹理影响鲁棒的目的。最后可调控金字 塔的重构保证了缺陷区域物理意义恢复的准确性。实验阶段,我们检测了一系列 具有实际应用价值的图像。实验结果表明 本文提出的纹理物体缺陷检测算法具有 高效性和易于实现性。 关键字: 缺陷检测;纹理;物体变形;可调控金字塔;重构
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
II

英语作文-探索集成电路设计中的新技术与应用前景

英语作文-探索集成电路设计中的新技术与应用前景

英语作文-探索集成电路设计中的新技术与应用前景As integrated circuit (IC) design continues to evolve, new technologies are constantly emerging, offering exciting possibilities for innovation and advancement. In this essay, we will explore some of the latest trends and applications in IC design, highlighting their potential impact on various industries and the future landscape of technology.One of the most significant advancements in IC design is the development of 3D integration technology. Unlike traditional 2D designs, which place all components on a single plane, 3D integration allows for stacking multiple layers of integrated circuits, thereby increasing functionality and performance while reducing footprint. This technology enables the creation of smaller, more power-efficient devices, making it ideal for applications in mobile devices, wearables, and IoT devices.Another area of innovation in IC design is the use of advanced materials such as graphene and carbon nanotubes. These materials offer unique electrical and mechanical properties that can greatly enhance the performance of integrated circuits. For example, graphene-based transistors have demonstrated higher electron mobility and faster switching speeds compared to traditional silicon transistors, paving the way for next-generation computing devices with unprecedented speed and efficiency.In addition to new materials, machine learning and artificial intelligence (AI) are playing an increasingly important role in IC design. By leveraging AI algorithms, designers can automate the process of optimizing chip architectures, reducing time-to-market and improving overall performance. AI-driven design tools can analyze vast amounts of data to identify the most efficient circuit layouts and power management strategies, leading to more reliable and cost-effective ICs.Moreover, the integration of photonics into IC design is opening up new possibilities for high-speed data communication and processing. 
Photonic integrated circuits (PICs)use light instead of electricity to transmit and manipulate data, offering significant advantages in terms of bandwidth and latency. PICs are already being used in data centers and telecommunications networks to improve the performance and scalability of optical communication systems.Furthermore, the emergence of quantum computing represents a paradigm shift in IC design, with the potential to solve complex problems that are currently intractable for classical computers. Quantum ICs, which exploit the principles of quantum mechanics to perform calculations, have the potential to revolutionize fields such as cryptography, materials science, and drug discovery. While quantum computing is still in its infancy, ongoing research and development efforts are rapidly advancing the state-of-the-art, bringing us closer to realizing the full potential of this transformative technology.In conclusion, the field of IC design is experiencing rapid innovation driven by advancements in materials science, machine learning, photonics, and quantum computing. These technologies hold the promise of delivering faster, more efficient, and more powerful integrated circuits, with profound implications for a wide range of industries and applications. As we continue to push the boundaries of what is possible, the future of IC design looks brighter than ever before.。


A Simulator for SMT Architectures: Evaluating Instruction Cache Topologies

Ronaldo Gonçalves (1,*), Eduard Ayguadé (2), Mateo Valero (2), Philippe Navaux (3)

(1) Departamento de Informática, Universidade Estadual de Maringá, Avenida Colombo 5790, Maringá, Brazil. {ronaldo@din.uem.br}
(2) Departament d'Arquitectura de Computadors (†), Universitat Politècnica de Catalunya, Jordi Girona 1-3, Barcelona, Spain. {eduard, mateo@ac.upc.es}
(3) Instituto de Informática, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves 9500, Porto Alegre, Brazil. {navaux@inf.ufrgs.br}

* PhD student at the II/UFRGS, supported by CAPES.
† Supported by the Spanish Ministry of Education (TIC98-511).

Abstract

SMT (Simultaneous MultiThreaded) is becoming one of the major trends in the design of future generations of microarchitectures. Its ability to exploit both intra- and inter-thread parallelism makes it possible to exploit the potential ILP (instruction-level parallelism) offered by future processor designs. SMT architectures can hide the high latencies of instructions, taking better advantage of the hardware resources through the simultaneous execution of many diverse instructions from different threads. In order to provide detailed and accurate information about the performance of this approach, an SMT simulator has been developed on top of the SimpleScalar Tool Set.

The SMT simulator allows the configuration of a large set of architectural parameters (cache and reservation station topologies, number of slots and branch prediction accuracy), in addition to the parameters originally inherited from the basic simulator (size of the cache memories, tables and queues, instruction scheduling policy and pipeline width). The SMT simulator has been exhaustively tested with workloads composed of several SPEC95 benchmarks and under different instruction cache topologies. The simulator has proved to be an efficient tool for the performance evaluation of this kind of architecture.
The paper describes the main features of this simulator and analyses the simulation results.

Keywords: superscalar, SMT, performance evaluation

I. INTRODUCTION

The technological advance in microprocessor design in recent years has followed two main trends. The first one tries to increase the microprocessor clock frequency using new digital components and modern VLSI solutions. The second one tries to exploit parallelism at the level of instructions by applying increasingly aggressive techniques to exploit instruction-level parallelism (ILP): pipelining, superscalar out-of-order execution and simultaneous multithreading.

Pipelining consists in dividing the execution of an instruction into a set of ordered and synchronized stages, which can operate in parallel. Using this approach, instructions are executed in subsequent steps, allowing their partial overlapping. The ideal performance for this technique is the retirement of one instruction per cycle. Although it could be attractive to have as many stages as possible (with the subsequent reduction in the cycle time), the penalties incurred by pipeline hazards [HEN 94] have forced current designs to have a moderate number of stages.

To boost the performance of the pipelining technique, superscalar architectures [JOH 91, SMI 95] replicate some hardware resources (such as registers and functional units). Superscalar execution allows the initiation of more than one instruction per cycle. Although this approach is used in current commercial microprocessors, such as the Pentium [SAI 93, AND 95], Power/PowerPC [CHA 94, DIE 95, YOU 96], MIPS R10000 [MIP 95] and UltraSparc [ULT 96], its performance is limited by instruction dependencies [JOU 89, BUT 91, TRA 92, WAL 93].

Since dataflow dependences limit the maximum parallelism that can be reached by single-threaded applications [LIP 96], some researchers have investigated the simultaneous execution of several instruction flows [HIR 92, YAM 94, TUL 95, GON 98].
This new approach is called SMT (Simultaneous MultiThreading) and its advantage rests on two main observations. First, the ILP from the different threads can be combined if there is no communication among them (for instance, when the threads come from independent applications). Second, the availability of threads can hide the execution of high-latency instructions (such as memory misses) in other threads.

SMT is a recent concept and only a few performance evaluation studies have been conducted so far. In 1996, Tullsen [TUL 96] analyzed different instruction fetch policies and concluded that the fetch unit must favor the threads with fewer instructions in the pipeline. In 1998, Hily [HIL 98] analyzed the contention on the secondary cache and concluded that it cannot be ignored if accurate results are to be obtained from the simulations. Also in 1998, Lo [LO 98] concluded that many cache conflicts could be eliminated in software by a suitable policy of virtual-to-physical address mapping for memory pages and by using per-thread memory address offsets. In 1999, Gonçalves [GON 99] showed analytically that the cache miss rate caused by interference among threads could be reduced by prefetching instructions before the thread is scheduled at a context switch. Also in 1999, Sigmund [SIG 99] concluded that the choice of the cache replacement policy is fundamental when there are restrictions on the memory bandwidth.

All these studies have shown the importance of SMT architectures, analyzing and evaluating particular aspects of them. However, much more must be done. An efficient SMT simulator that models the whole system (including the memory hierarchy) is absolutely necessary. This paper presents an SMT simulator that has been developed on top of the SimpleScalar Tool Set [BUR 97]. Our simulator is portable and provides many facilities to obtain detailed statistical information about the performance of this new architectural approach under different configurations.
This paper describes the main features of the simulator and analyses the simulation results.

The paper is organized as follows. Section II describes the basic simulator. Section III describes the SMT simulator. Section IV presents and analyses the results of the cache simulations. The conclusions are presented in Section V and the references can be found in the last section.

II. BASE SIMULATOR

The SimpleScalar Tool Set has been developed at the University of Wisconsin-Madison as part of the MultiScalar Project. SimpleScalar has gained popularity and is used as the base in the development of current execution-driven architecture simulators. It contains a set of C functions that can be used to decode SS binaries (a variation of the MIPS instruction set) and to simulate caches and branch predictors, besides other I/O and resource management functions. In addition, there are tools for the generation of SS binaries from C source programs. The package contains pre-compiled SS benchmarks, allowing fast testing of simulators in progress.

The SimpleScalar Tool Set also includes some basic simulators. One of them, called Sim-outorder, simulates a superscalar architecture with branch prediction, register renaming and out-of-order execution. This simulator uses the RUU (Register Update Unit [SOH 90]) to store instructions and to control both renaming and dependencies. The RUU keeps instructions until they can be committed. This architecture has a pipeline with 6 stages: Fetch, Decode, Issue, Execution, Write-back and Commit, as shown in Figure 1. The Fetch stage fetches instructions from the instruction cache (i-cache), stores them in a buffer (i-queue) and predicts branches.

Fig. 1. General view of the architecture simulated by Sim-outorder

The virtual addresses of instructions and data are mapped to real addresses through the instruction TLB (i-tlb) and data TLB (d-tlb), respectively. Misses in the i-cache or the i-tlb block the Fetch stage for a specific number of cycles.
The number of useful fetched instructions per cycle depends on the fetch width, the i-queue availability and the branch prediction accuracy. The instructions available in the i-queue are decoded and renamed in the Decode stage, in order, and stored in the RUU queue (ruu-q). Load/Store instructions are split in two parts: an Add instruction, which computes the effective memory address and is stored in the ruu-q; and the Load/Store instruction itself, which is stored in a load/store queue (ls-q).

Both the ruu-q and the ls-q are pools of reservation stations [TOM 67], ordered like a reorder buffer [SMI 95]. They contain decode information, operands, busy bits and tags for dependence control. The number of decoded instructions per cycle depends on both the decode width and the availability of instructions (in the i-queue) and reservation stations (in the RUU). The Issue stage verifies which instructions from both the ruu-q and the ls-q are ready to execute (i.e., all their operands are available and their memory dependences are satisfied), and issues them to the appropriate functional unit. The number of issued instructions depends on the number of ready instructions, the issue width and the availability of functional units and memory ports.

The Execution stage executes the instructions and keeps each functional unit busy during the operation latency. Among the memory instructions, only Load instructions are executed in this stage. Store instructions are executed in the Commit stage, when the computation is tagged as non-speculative. Both memory and branch instructions are executed with the highest priority. The results of executed instructions are sent back to the RUU in order to free the execution of other instructions that are waiting for them. The Commit stage verifies the ruu-q and retires, in order, the concluded instructions.
When an Add instruction for a memory address is retired from the end of the ruu-q, the last entry from the ls-q, which must be the other component of the memory instruction, is retired too. When a branch instruction is found, the prediction is validated. If the prediction was wrong, all other entries of the ruu-q and ls-q are eliminated and the instruction fetch is redirected to the correct target. The non-speculative results are definitively stored in registers and memory.

During each simulated cycle, all pipeline stages are executed and execution statistics are collected. The Sim-outorder simulator is fully configurable and allows the definition of L1 and L2 caches, TLBs, the branch predictor, as well as all the other internal parameters of the architecture mentioned in the previous paragraphs. This simulator has been used to develop the SMT simulator presented in the next section.

III. THE SMT SIMULATOR

A simulator for SMT architectures was developed1, as part of the SEMPRE2 Project [GON 98], using the Sim-outorder simulator. The first step in the implementation consisted in developing a multiprocessor version of Sim-outorder, through the replication of all the structures (slots). This mainly implied the expansion of scalar variables to vectors and of n-dimensional data structures to (n+1)-dimensional ones. All functions were adapted to accept a new argument that corresponds to the processor identifier (the slot's index). Each slot is devoted to executing a single application. At each cycle, all processors execute instructions from their applications in parallel. The individual results for each application are the ones reported by the original Sim-outorder simulator.

1 Cooperative work between II/UFRGS and DAC/UPC.
2 An SMT architecture that executes processes simultaneously, providing facilities to help the operating system.
The basic code of the multiprocessor is shown in Figure 2.

Fig. 2. Simplified code for the multiprocessor simulator

After the implementation of the multiprocessor simulator, all pipeline stages were unified and many resources were shared in order to build the SMT simulator, as shown in Figure 3. Note that in this work each thread corresponds to an independent application. The resource set containing the register file, tables and queues used to keep the context of one thread is still called a slot. The new Fetch stage fetches one instruction block per cycle, which is composed of instructions from just one thread (scheduled each time in a round-robin fashion). The other stages schedule just one block of mixed instructions per cycle, which is composed of instructions from different slots in a round-robin fashion, until the corresponding bus width is filled.

There is an i-queue for each slot to ensure that each thread has its instructions fetched and also to ease the mixing of instructions inside the pipeline. The Fetch stage fetches instructions from the il1-cache, giving priority to the thread that has the fewest instructions in the pipeline. This technique, called ICOUNT in [TUL 96], achieves better performance. From these fetch buffers, instructions are decoded and dispatched, in order, to the reservation stations (ruu-q and ls-q). From the reservation stations, the instructions are always issued to a shared pool of functional units. Regarding registers, each slot has an individual frame to store a different context.

Many features were inherited from the original Sim-outorder simulator, such as out-of-order and speculative execution, branch prediction and register renaming. However, new features have been developed. The first one is the control of the branch prediction accuracy. An SMT architecture with a small hardware budget usually requires a good branch predictor to efficiently exploit the potential ILP.
On the other hand, having more resources relieves the system from the need for highly accurate predictors. With the control of this feature, it is possible to force the accuracy rate in order to evaluate these questions. The simulator also allows the evaluation of different organizations for the reservation stations. Two different topologies have been considered (as shown in Figure 4): (a) per-thread distributed and (b) shared topologies. In that figure, both topologies receive a mix of instructions from the decode stage in an SMT-4 architecture.

Fig. 3. General view of the SMT architecture

In 1995, Jourdan [JOU 95] evaluated the performance of different issue topologies on superscalar processors, showing that the per-functional-unit distributed topology could take better advantage of the reservation stations. In 1997, Palacharla [PAL 97] showed that the issue logic on superscalar processors could become a major bottleneck in the future. However, because of the dynamic nature of SMT architectures, a new study of the issue topology is necessary. Depending on both the branch prediction accuracy and the cache miss rate, besides other parameters such as cache topologies and types of functional units, this feature can help decide which configuration best improves performance. [GON 00] concludes that the per-thread distributed topology can be more appropriate. In the present work we use this topology in order to achieve maximum performance.

Another very important feature of this simulator is the ability to configure the decode depth, which defines the number of instructions from each fetch buffer that can be inspected simultaneously per cycle. If each fetch buffer has n entries, n is the maximum decode depth. Note that inspected instructions are not necessarily dispatched.

Fig. 4. Issue buffer topologies (RUU/LSQ buffers): (a) per-thread distributed and (b) shared

Previous studies have considered that any entry of any fetch buffer can be inspected and, if possible, decoded to fill the dispatch width.
Thus, the decoder can choose the best schedule to dispatch. However, expensive decode logic is required to implement this, and the cycle time can increase. The availability of multiple threads makes it possible to reduce the number of instructions inspected from each slot with a minimal loss of performance. As a consequence, the decoder can be simplified and the cycle time reduced. Figure 5 shows two examples of SMT-4 pipelines with decode depths of 1 and 2, respectively. In that figure, the first SMT-4 (a) can decode just one instruction from each instruction buffer, while the second SMT-4 (b) can decode up to two instructions. So, the first SMT-4 can inspect up to 4 instructions in order to dispatch up to 4 instructions. However, the second SMT-4 can inspect up to 8 instructions in order to dispatch up to 4 instructions.

Fig. 5. Examples of decode depth: (a) depth 1 and (b) depth 2

In addition, if the decode depth is small, the dispatch width might not be filled completely, due to an insufficient number of inspected instructions per cycle. This situation can happen even if there are other instructions in the fetch buffers. [GON 00] concluded that it is not necessary to decode more than 2 instructions per thread per cycle to achieve more than 96% of the best performance. In this paper we use the maximum decode depth in order to achieve maximum performance.

This simulator also allows the definition of several memory hierarchies, which can use multiplexed banks on the same bus and modules on independent buses, as exemplified in Figure 6 for SMT-4. There are 2 modules of instruction cache, which can be accessed in parallel. Each one serves 2 fetch buffers. Inside each module there are 2 banks, which can be accessed exclusively. Each bank can be used by more than one thread; nevertheless, a thread must be located on just one bank.
The sharing of the same cache bank is allowed because a field that identifies the slot owner (thread) was included in the cache block. In the present work we have evaluated these questions in order to certify the efficiency of this simulator.

Fig. 6. Example of instruction cache topology (modules 0 and 1)

IV. EVALUATING INSTRUCTION CACHE TOPOLOGIES

One of the main problems related to SMT architectures is the low performance of the instruction cache [GON 99] due to memory addressing conflicts [LO 98] among different threads. In this work we have simulated different topologies for the instruction cache that try to overcome this problem.

TABLE 1
Hardware Latencies

Type of latency                                Number of cycles
l1 hit                                         1
l2 hit                                         6
tlb miss                                       30
l2 miss (for n+1 chunks)                       18 + n*2
int-alu functional unit                        1
fp-alu functional unit                         2
ld/st functional unit                          1
int-mul functional unit (Mult / Div)           3 / 20
fp-mult functional unit (Mult / Div / Sqrt)    4 / 12 / 24

The experimental evaluation in this section is carried out for two processor configurations, named SMT-4 and SMT-8, with 4 and 8 slots, respectively. Table 1 shows the latencies considered for the functional units and Table 2 the total hardware amount used in the two configurations.

TABLE 2
Total Hardware Amount

SMT-4: pipeline width = 8; unif-l2-cache = 128k; instr-l1-cache = 16k; data-l1-cache = 16k; 7 funct-units (2 int-alu, 2 fp-alu, 1 int-mult, 1 fp-mult, 1 ld/st); (ruu, lsq) sizes = (16, 8) entries.

SMT-8: pipeline width = 16; unif-l2-cache = 256k; instr-l1-cache = 32k; data-l1-cache = 32k; 14 funct-units (4 int-alu, 4 fp-alu, 2 int-mult, 2 fp-mult, 2 ld/st); (ruu, lsq) sizes = (32, 16) entries.

Different memory topologies are considered, called CacheXYZ. X is the Cache Modularity, which defines the number of modules connected on different buses that can be accessed in parallel. Y is the Cache Separativity, which defines the total number of multiplexed banks distributed among the modules.
Z is the Cache Associativity, which defines the number of entries of each cache bank related to the same memory address. The product Y·Z (separativity times associativity) is called the ST (space of threads). In order to provide sufficient space for the co-existence of many threads in the cache, the ST must be greater than the total number of threads sharing it. Also, the maximum number of threads located in the same module is ST/X. Inside a module, the multiplexing of the banks reduces the external hardware complexity, making possible the use of just one fetch bus. However, the multiplexing means that just one bank can be accessed per cycle. Figures 7 and 8 show the cache topologies that have been simulated on the SMT-4 and SMT-8 architectures, respectively. For the two configurations, all the proposed topologies use the same hardware amount, as shown in Table 2. Also, when the cache is distributed into two modules, the total bus width is split into two parts.

In our analysis we have measured the performance in terms of ipc (instructions per cycle) and the ratio between two ipc values (speed-up). We have used eight programs from the SPEC95 suite: 4 integer benchmarks (perl, ijpeg, gcc and li) and 4 floating-point benchmarks (swim, mgrid, wave5 and fpppp). For the simulation of the SMT-4 configuration, 4 different workloads composed of 4 benchmarks each (2 integer and 2 floating-point) are used. For the simulation of the SMT-8 configuration, a single workload composed of the eight programs is used. Table 3 summarizes the composition of the workloads.
All simulations are executed until one of the benchmarks in the workload completes 250 million instructions, of which the first 50 million are skipped to reduce the warm-up stage.

TABLE 3
Benchmark Arrangements

SMT-4, workload 1: swim, perl, mgrid, ijpeg
SMT-4, workload 2: wave5, gcc, fpppp, li
SMT-4, workload 3: li, fpppp, ijpeg, mgrid
SMT-4, workload 4: gcc, wave5, perl, swim
SMT-8: swim, perl, mgrid, ijpeg, wave5, gcc, fpppp, li

Fig. 7. Instruction cache topologies for SMT-4 (Cache141, Cache122, Cache114, Cache241 and Cache222)

Fig. 8. Instruction cache topologies for SMT-8 (Cache181, Cache142, Cache124, Cache118, Cache281, Cache242 and Cache224)

Figure 9 shows the performance achieved by the SMT-4 architecture for the cache topologies previously defined. In that figure it is possible to see that ijpeg achieves the best individual performance (over 1 ipc) and perl achieves the worst individual performance (close to 0.5 ipc). Notice that the overall performance reaches more than 3 ipc for all topologies. Also, we can note that changing the topology does not cause a significant difference in the individual performance of each benchmark. Regarding the overall performance, the difference is higher, as shown in Table 4. That table shows that the speed-up of the Cache114 topology (best overall performance) over the Cache241 topology (worst overall performance) reaches 9.48%. Moreover, two observations can be made about the space of threads. First, the associativity is more important than the separativity to improve the performance. Second, a modularity of 2 does not improve the performance.

Fig. 9. SMT-4 performance

Figure 10 shows the results obtained for the SMT-8 architecture. The overall performance is much higher than that obtained with the SMT-4 architecture, reaching more than 7 ipc for both the Cache242 and Cache224 topologies. The best speed-up is achieved by the Cache224 topology over the Cache181, reaching 34.74%, as shown in Table 4.
However, the average performance of each benchmark is similar on both the SMT-4 and SMT-8 architectures, as shown in Figure 11. This happens because the pipeline stages of these architectures are based on a round-robin algorithm, which gives priority to inter-thread parallelism over intra-thread parallelism. The differences in the average performance between SMT-4 and SMT-8 are 1.62%, 1.31% and 0.34%, respectively, for perl, mgrid and ijpeg, as shown in Figure 11. The maximum speed-up reaches about 10.70%, for swim on SMT-4. The best speed-ups between SMT-4 and SMT-8 are obtained with the floating-point benchmarks wave5, fpppp and swim, ranging from 8% to 11%.

Fig. 10. SMT-8 performance

Fig. 11. Average performance

These simulations allow us to verify that the associativity is more important than the separativity to improve the performance on SMT-8, just as on SMT-4. This happens because the separativity does not allow the banks to be used by any thread, in contrast to the associativity. However, unlike the SMT-4, the utilization of a modularity of 2 on the SMT-8 provides better performance. This is due to the fetch width. In the SMT-4 architecture, a fetch width of 4 instructions is not sufficient to exploit the ILP available inside each thread, due to the breaking of basic blocks. This does not happen on SMT-8 because a fetch width of 8 instructions ensures the fetching of a greater number of complete basic blocks.

TABLE 4
Maximum Performance Speed-up

SMT-4: Cache114/Cache241 = 9.48%
SMT-8: Cache224/Cache181 = 34.74%

We cannot ignore the tradeoff between performance and complexity when evaluating associativity, separativity and modularity. High associativity is very expensive to implement in terms of cycle time, because it is necessary to check a large number of cache positions in order to find the target address. However, this technique can reduce conflicts and make better use of the cache.
On the other hand, the separativity provides fast access to the target bank through the multiplexing and reduces conflicts too. However, under-utilized banks cannot be used by other threads. Regarding the modularity, the utilization of more than 1 module requires the duplication of the fetch stage. Moreover, fetch buffers under-utilized by one module cannot be used by another module. However, due to the simplicity of each fetch sub-stage and the absence of conflicts among modules, this approach could be used to improve performance in some cases. We believe that a solution combining these concepts could be better.

A final consideration about our simulations is that our threads come from independent applications. Consequently, there is no communication or synchronization among them. This topic is very important but out of the scope of this paper, in which we have been interested in the use of multithreading to achieve better workload performance.

V. CONCLUSIONS

Research activities around SMT architectures are expanding widely due to the potential benefits that can be derived from multithreaded applications. From our point of view, it is important to analyze and evaluate all pipeline components in order to design efficient SMT architectures. The SMT simulator described in this paper is the key tool that allows the analysis and performance evaluation of different configurations of these SMT architectures, easing the design phase of aggressive microprocessors. The functionality and capabilities of the simulator have been tested through the evaluation of several instruction cache topologies.

Several conclusions have been drawn from our study. First, as expected, an SMT-8 architecture can provide better overall performance than an SMT-4 architecture.
However, the individual performance of each benchmark remains equivalent on both architectures.

Second, the utilization of a space of threads smaller than the number of active threads on an SMT architecture can significantly decrease performance or even make the execution of applications impossible. This situation happens due to memory addressing conflicts, which increase the number of cache misses. Sometimes the processor is not able to execute useful instructions, performing only cache replacements.

Third, despite its hardware complexity, cache associativity can contribute more to better performance than cache separativity. In addition, modularity can help improve performance, depending on the fetch width.

A final remark is that when the cache topologies are organized appropriately, they can provide a reasonable speed-up, reaching more than 9% on SMT-4 and more than 30% on SMT-8, using SPEC95 benchmarks.

REFERENCES

[AND 95] Anderson, D. & Shanley, T., Pentium Processor System Architecture, Second Edition, MindShare, Inc., Addison-Wesley, Massachusetts, 433p., February, 1995.
[BUR 97] Burger, D., Austin, T. M., The SimpleScalar Tool Set, Version 2.0, Technical Report #1342, University of Wisconsin-Madison, June, 1997.
[BUT 91] Butler, M., et al., Single Instruction Stream Parallelism Is Greater Than Two, Proceedings of the 18th Annual International Symposium on Computer Architecture, Toronto, Canada, May, 1991.
[CHA 94] Chakravarty, D. & Cannon, C., PowerPC: Concepts, Architecture, and Design, J. Ranade Workstations Series, McGraw-Hill, Inc., USA, 363p., 1994.
[DIE 95] Diep, T. A., Nelson, C., Shen, J. P., Performance Evaluation of the PowerPC 620 Microarchitecture, Proceedings of the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June, 1995.
[GON 98] Gonçalves, R. A. L., Navaux, P. O. A., SEMPRE: Superscalar Architecture with Multiple Processes in Execution (in Portuguese), X SBAC-PAD, Búzios, Brazil, September, 1998.
[GON 99] Gonçalves, R. A. L., Sagula, R. L., Divério, T. A., Navaux, P. O. A., Process Prefetching for a Simultaneous Multithreaded Architecture, SBAC-PAD'99 (11th Symposium on Computer Architecture and High Performance Computing), Natal, Brazil, Sept/October, 1999.
[GON 00] Gonçalves, R. A. L., Ayguadé, E., Valero, M., Navaux, P. O. A., Performance Evaluation of Issue Topology and Decode Depth on Simultaneous Multithreaded Architectures, Technical Report, UFRGS, Brazil, April, 2000.
[HEN 94] Hennessy, J., Patterson, D. A., Computer Architecture: A Quantitative Approach, San Mateo, CA: Morgan Kaufmann, 1994.
[HIL 98] Hily, S., Seznec, A., Standard Memory Hierarchy Does Not Fit Simultaneous Multithreading, Proceedings of the Workshop on Multithreaded Execution, Architecture and Compilation (MTEAC), 1998.
[HIR 92] Hirata, H., et al., An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads, Proceedings of the 19th Annual International Symposium on Computer Architecture, ACM & IEEE-CS, pp. 136-145, May, 1992.
[JOH 91] Johnson, M., Superscalar Microprocessor Design, Prentice Hall Series in Innovative Technology, PTR Prentice Hall, Englewood Cliffs, New Jersey, 288p., 1991.
[JOU 89] Jouppi, N. P. & Wall, D. W., Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines, Research Report, Digital Western Research Laboratory, Palo Alto, California, July, 1989.
[JOU 95] Jourdan, S., Sainrat, P., Litaize, D., An Investigation of the Performance of Various Instruction-Issue Buffer Topologies, Proceedings of the 28th International Symposium on Microarchitecture (MICRO-28), Ann Arbor, Michigan, December, 1995.
[LIP 96] Lipasti, M. H. & Shen, J. P., Exceeding the Dataflow Limit via Value Prediction, Proceedings of the 29th International Symposium on Microarchitecture (MICRO-29), Paris, France, December, 1996.
[LO 98] Lo, J., et al., An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors, Proceedings of the 25th Annual International Symposium on Computer Architecture (ISCA'98), June 29-July 1, 1998.
[MIP 95] MIPS R10000 Microprocessor User's Manual, Version 1.0, MIPS Technologies, Inc., North Shoreline, Mountain View, California, June, 1995.
[PAL 97] Palacharla, S., Jouppi, N. P., Smith, J. E., Complexity-Effective Superscalar Processors, Proceedings of ISCA'97, Denver, USA, 1997.
[SIG 99] Sigmund, U., Ungerer, T., Memory Hierarchy Studies of Multimedia-enhanced Simultaneous Multithreaded Processors for MPEG-2 Video Decompression, Workshop on Multi-Threaded Execution, Architecture and Compilation (MTEAC), Toulouse, January, 2000.
[SMI 95] Smith, J. E., Sohi, G. S., The Microarchitecture of Superscalar Processors, Proceedings of the IEEE, 83(12), pp. 1609-1624, December, 1995.
[SOH 90] Sohi, G. S., Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers, IEEE Transactions on Computers, 39(3):349-369, March, 1990.
[TOM 67] Tomasulo, R. M., An Efficient Algorithm for Exploiting Multiple Arithmetic Units, IBM Journal, pp. 25-33, January, 1967.
[TUL 95] Tullsen, D. M., et al., Simultaneous Multithreading: Maximizing On-Chip Parallelism, Proceedings of ISCA'95, Santa Margherita Ligure, Italy, Computer Architecture News, v. 23, n. 2, 1995.
[ULT 96] UltraSPARC User's Manual, UltraSPARC-I/UltraSPARC-II, Revision 2.0, Sun Microsystems, Mountain View, CA, USA, May, 1996.
[WAL 93] Wall, D. W., Limits of Instruction-Level Parallelism, Research Report, Digital Western Research Laboratory, Palo Alto, California, June, 1993.
[YAM 94] Yamamoto, W., et al., Performance Estimation of Multistreamed, Superscalar Processors, Proceedings of the Hawaii International Conference on Systems Sciences, January, 1994.
[YOU 96] Young, J. L., Por Dentro do Power PC, Editora Berkeley Brasil, São Paulo, 313p., 1996.
