Solutions to the Exercises (Complete) for Computer Organization and Architecture, 4th Edition (Wang Aiying, Tsinghua University Press)

Computer Arithmetic
As a single-precision floating-point number:

1000 1111 1110 1111 1100 0000 0000 0000
S = 1, so the sign factor is (-1)^1 = -1
E = (0001 1111)2 = 31 (decimal)
F' = (1.110 1111 1100 0000 0000 0000)2

Single-precision value = (-1)^S × F' × 2^(E-127) = -(1.110 1111 11)2 × 2^(31-127) = -(1.110111111)2 × 2^-96
WANG Wei, Computer Organization and Architecture, Copyright 2004 TJU
If this bit pattern represents each of the following three kinds of numbers, what is its value in each case?
A two's complement integer; an unsigned integer; a single-precision floating-point number.
Analysis and solution:

As a two's complement integer:
(1000 1111 1110 1111 1100 0000 0000 0000)comp
= (1111 0000 0001 0000 0100 0000 0000 0000) in sign-magnitude
= -(111 0000 0001 0000 0100 0000 0000 0000)2
= -1880113152 (decimal)
WANG Wei, Computer Organization and Architecture, Copyright 2004 TJU
Logic Components of the Computer
+6
0100 0001 0010 1000 0000 0000 0000 0000
University Computer Organization and Architecture Exercises

Homework for the First Two Chapters

1. What are the four basic functions of a computer?
2. In the computer's top-level structural view, what are the four structural components?
3. Who proposed the stored-program concept? Can you briefly describe it?
4. What is the full English name of CPU, and what does it mean?
5. What is the full English name of ALU, and what does it mean?
6. What are the five major components of von Neumann's IAS machine?
7. Is there an essential difference between the four structural components of question 2 and the components of von Neumann's IAS machine in question 6?
8. What are the Fundamental Computer Elements? How do they relate to the computer's four basic functions?
9. What is the Chinese translation of Moore's Law? What general trend does it describe?
10. Both the book's subtitle and the title of Section 2.2 are "Designing for Performance". What does Performance mainly refer to? What does Performance Balance have to balance?
11. The author limits his scope to "desktop, workstation, server". What are their Chinese names, and what is the working range of each?

Chapter 3 Homework

1. PC means _________.
   A. personal computer  B. programming controller  C. program counter  D. portable computer
2. PC holds _________.
   A. address of next instruction  B. next instruction  C. address of operand  D. operand
3. At the end of the fetch cycle, MAR holds _________.
   A. address of instruction  B. instruction  C. address of operand  D. operand
4. Interrupt processing steps are _________.
   A. suspending, resuming, branching & processing  B. branching, suspending, processing & resuming  C. suspending, branching, processing & resuming  D. processing, branching, resuming & suspending
5. An unsigned binary number is n bits, so it can represent a value in the range _________.
   A. 0 to n-1  B. 1 to n  C. 0 to 2^n - 1  D. 1 to 2^n
6. The length of the address code is 32 bits, so the addressing range is _________.
   A. 4G  B. from -2G to 2G  C. 4G-1  D. from 1 to 4G
7. There are three kinds of buses. Which does not belong to them?
   A. address bus  B. system bus  C. data bus  D. control bus

Questions
1. Translate the following terms (noting their functions): PC, MAR, MBR, IR, AC, bus, system bus, data bus, address bus, control bus, handler*, opcode, bus arbitration*, multiplexed bus*, interrupt, ISR, instruction cycle, fetch cycle, execute cycle (items marked * are optional).
2. Page 90, Problem 3.1: What general categories of functions are specified by computer instructions?
3. Briefly describe the operations of PC and IR in an instruction cycle.
4. Suppose the word length is n bits; briefly describe the operand format and instruction format.
5. Briefly describe the interrupt procedure.
6. Briefly describe the types and functions of the bus.

Part I: Multiple Choice
1. The computer memory system refers to _________.
   A. RAM  B. ROM  C. main memory  D. registers, main memory, cache, and external memory
2. If the memory word is 16 bits, which of the following is right?
   A. The address width is 16 bits  B. The address width is related to 16 bits  C. The address width is not related to 16 bits  D. The address width is not less than 16 bits
3. The characteristics of internal memory compared to external memory are _________.
   A. big capacity, high speed, low cost  B. big capacity, low speed, high cost  C. small capacity, high speed, high cost  D. small capacity, high speed, low cost
4. In cache address mapping, any block of main memory can be mapped to any line of the cache; this is _________.
   A. associative mapping  B. direct mapping  C. set-associative mapping  D. random mapping
5. The cache's write-through policy means a write operation goes to main memory _________.
   A. as well as to cache  B. only when the cache line is replaced  C. when a difference between cache and main memory is found  D. only when direct mapping is used
6. The cache's write-back policy means a write operation goes to main memory _________.
   A. as well as to cache  B. only when the relevant cache line is replaced  C. when a difference between cache and main memory is found  D. only when direct mapping is used
7. In cache address mapping, the data in any block of main memory can be mapped only to a fixed line of the cache; this is _________.
   A. associative mapping  B. direct mapping  C. set-associative mapping  D. random mapping
8. In cache address mapping, the data in any block of main memory can be mapped to any line (way) of a fixed set of the cache; this is _________.
   A. associative mapping  B. direct mapping  C. set-associative mapping  D. random mapping

Part II: Calculation Problems (from page 126)
Problems 4.1, 4.3, 4.4, 4.5, 4.7, 4.10

Chapter 5 Homework
1. Which type of memory is volatile?
   A. ROM  B. E2PROM  C. RAM  D. flash memory
2. Which type of memory has a 6-transistor cell structure?
   A. DRAM  B. SRAM  C. ROM  D. EPROM
3. Using a Hamming code, the purpose is _________ of one-bit errors.
   A. detecting and correcting  B. detecting  C. correcting  D. none of the above
4. Flash memory is _________.
   A. read-only memory  B. read-mostly memory  C. read-write memory  D. volatile
5. Which statement about internal memory is not true?
   A. RAM can be accessed at any time, but data are lost on power-down.  B. When accessing RAM, access time is unrelated to storage location.  C. In internal memory, data cannot be modified.  D. Each addressable location has a unique address.
Page 161 Problems: 5.4, 5.5, 5.6, 5.7, 5.8

Chapter 6 Homework
I. Multiple Choice
1. RAID levels _________ make use of an independent access technique.
   A. 2  B. 3  C. 4  D. all
2. In RAID 4, calculating the new parity involves _________ reads.
   A. one  B. two  C. three  D. four
3. During a read/write operation, the head is _________.
   A. moving  B. stationary  C. rotating  D. all of the above
4. On a movable-head system, the time it takes to position the head at the track is known as _________.
   A. seek time  B. rotational delay  C. access time  D. transfer time
5. RAID makes use of stored _________ information that enables the recovery of data lost due to a disk failure.
   A. parity  B. user data  C. OS  D. any of them
6. Recording and retrieval are done via a _________ called a head.
   A. conductive coil  B. aluminium  C. glass  D. magnetic field
7. In the Winchester disk track format, _________ is a unique identifier or address used to locate a particular sector.
   A. SYNCH  B. gap  C. ID field  D. data field
8. Data are transferred to and from the disk in _________.
   A. tracks  B. sectors  C. gaps  D. cylinders
9. In _________, each logical strip is mapped to two separate physical disks.
   A. RAID 1  B. RAID 2  C. RAID 3  D. RAID 4
10. With _________, the bits of an error-correcting code are stored in the corresponding bit positions on multiple parity disks.
   A. RAID 1  B. RAID 2  C. RAID 3  D. RAID 4
11. The write-once read-many CD is known as _________.
   A. CD-ROM  B. CD-R  C. CD-R/W  D. DVD
II. How are data written onto a magnetic disk?
III. In the context of RAID, what is the distinction between parallel access and independent access?

Chapter 7 Homework
1. "When the CPU issues a command to the I/O module, it must wait until the I/O operation is complete." In programmed I/O, the word "wait" means _________.
   a. the CPU stops and does nothing  b. the CPU does something else  c. the CPU periodically reads & checks the status of the I/O module  d. the CPU waits for the interrupt request signal
2. See Figure 7.7. PSW & PC and the remainder of the state are saved onto the stack; why are they restored in reverse order? Because stack operations are _________.
   a. first in, first out  b. random  c. last in, first out  d. sequential
3. The stack is used to save PC and the rest of the state because _________.
   a. some information is needed for resuming the current program at the point of interrupt  b. when the interrupt occurs the instruction has not finished executing, so the instruction at the point of interrupt must be executed once again  c. the stack must get some information for LIFO  d. the start address of the ISR must be transferred via the stack
4. Interrupt request and acknowledgement signals are exchanged between the CPU and the requesting I/O module. The reason for the CPU's acknowledgement is _________.
   a. to let the I/O module remove the request signal  b. to let the CPU get the vector from the data bus  c. both a & b  d. other aims
5. In DMA, the DMA module takes over data transfer from the CPU; this means _________.
   a. the DMA module can fetch and execute instructions like the CPU does  b. the DMA module can control the bus to transfer data to or from memory using the cycle-stealing technique  c. the DMA module and CPU work together (cooperate) to transfer data to or from memory  d. when the DMA module is ready, it issues an interrupt request signal to the CPU to get interrupt service
6. Three types of techniques can be used to transfer data with I/O modules. Which one does not belong to them?
   a. interrupt-driven I/O  b. programmed I/O  c. direct I/O access  d. DMA
7. Consider two different kinds of data transfer: inputting a word from the keyboard and outputting a data block of several sectors to the hard disk. The best choice is _________.
   a. interrupt-driven I/O and DMA  b. DMA and programmed I/O  c. both interrupt-driven I/O  d. both DMA
8. Compared with interrupt-driven I/O, DMA further raises CPU utilization because _________.
   a. the CPU does not need to save & restore the context  b. the CPU does not need to intervene in the data transfer  c. the CPU does not need to read & check status repeatedly  d. both a and b
9. Briefly describe all the actions when using the interrupt-driven I/O technique to transfer data with an I/O module. (Please insert the "vector" at steps 3 & 5.)
10. See Figures 7.7 & 7.8. Redraw Figure 7.8 and mark sequence numbers according to Figure 7.7 to indicate the order of the information flow.
11. For the DMA technique, write down all the information the CPU sends to the DMA module, state at which time the DMA module issues the interrupt request signal to the CPU, and explain why INTR is issued.

Chapter 9 Homework
1. Suppose the two's complement word length is 5 bits. Which arithmetic operation causes overflow?
   A. 5+8  B. (-8)+(-8)  C. 4-(-12)  D. 15-7
2. Overflow sometimes occurs in _________ arithmetic operations.
   A. add  B. subtract  C. add and subtract  D. multiply
3. In two's complement, when two positive integers are added, when does overflow occur?
   A. there is a carry  B. the sign bit is 1  C. there is a carry and the sign bit is 0  D. cannot determine
4. The 8-bit two's complement number 1001 0011, extended to 16 bits, equals _________.
   A. 1000 0000 1001 0011  B. 0000 0000 1001 0011  C. 1111 1111 1001 0011  D. 1111 1111 0110 1101
5. The 8-bit two's complement number 0001 0011, extended to 16 bits, equals _________.
   A. 1000 0000 1001 0011  B. 0000 0000 0001 0011  C. 1111 1111 0001 0011  D. 1111 1111 1110 1101
6. Booth's algorithm is used for two's complement _________.
   A. addition  B. subtraction  C. multiplication  D. division
7. In floating-point arithmetic, addition can be divided into 4 steps: _________.
   A. load first operand, add second operand, check overflow and store result  B. compare exponents, shift significand, add significands and normalize  C. fetch instruction, indirectly address operand, execute instruction and interrupt  D. process scheduling states: create, get ready, running and blocked
8. In floating-point arithmetic, multiplication can be divided into 4 steps: _________.
   A. load first operand, add second operand, check overflow and store result  B. fetch instruction, indirectly address operand, execute instruction and interrupt  C. process scheduling states: create, get ready, running and blocked  D. check for zero, add exponents, multiply significands, normalize and round
9. The main functions of the ALU are _________.
   A. logic  B. arithmetic  C. logic and arithmetic  D. only addition
10. Which is true?
   A. Subtraction cannot be performed by the adder and complement circuits in the ALU  B. Carry and overflow are not the same  C. In two's complement, the negation of an integer can be formed with the following rule: bitwise NOT (excluding the sign bit), then add 1  D. In two's complement, addition is normal binary addition, but the sign bit must be monitored for overflow
Page 326: 9.4, 9.5 and 9.7 (9.4 optional).
To prove: in two's complement, the sign-extension rule (converting between different bit lengths) and the negation rule ([-X]comp = NOT [X]comp + 1).

Chapter 10 and Chapter 11 Homework
1. In an instruction, when the number of addresses is 0, the operands' addresses are implied; they are in _________.
   A. accumulator  B. program counter  C. top of stack  D. any register
2. Which of the following addressing modes can achieve a branch target in a program?
   A. direct addressing  B. register addressing  C. base-register addressing  D. relative addressing (question flagged as problematic in the original)
3. In index-register addressing mode, the address of the operand equals _________.
   A. the content of the base register plus a displacement  B. the content of the index register plus a displacement  C. the content of the program counter plus a displacement  D. the content of AC plus a displacement
4. The address of the operand is in the instruction; this is _________.
   A. direct addressing  B. register indirect addressing  C. stack addressing  D. displacement addressing
5. Which of the following is not an area in which source and result operands can be stored?
   A. main or virtual memory  B. CPU register  C. I/O device  D. instruction
6. Compared with indirect addressing, the advantage of register indirect addressing is _________.
   A. large address space  B. multiple memory references  C. limited address space  D. fewer memory accesses
7. With base-register addressing, the _________ register is used.
   A. base  B. index  C. PC  D. any
8. The disadvantage of indirect addressing is _________.
   A. large addressing range  B. no memory access  C. more memory accesses  D. large value range
9. Which is not an advantage of register indirect addressing?
   A. just one access for the operand  B. large memory space  C. large value range  D. no memory reference
10. Register addressing is very fast, but it has _________.
   A. a very limited value range  B. a very limited address space  C. more memory accesses  D. very complex address calculation
11. The disadvantage of immediate addressing is _________.
   A. limited address range  B. more memory accesses  C. limited value range  D. fewer memory accesses
12. In an instruction, when the number of addresses is 2, one address does double duty as both _________.
   A. a result and the address of the next instruction  B. an operand and a result  C. an operand and the address of the next instruction  D. two adjacent operands
13. In an instruction, when the number of addresses is 3, they are _________.
   A. two operands and one result  B. two operands and an address of the next instruction  C. one operand, one result and an address of the next instruction  D. two operands and an address of the next instruction
14. An address is regarded as a type of data because it is represented by _________.
   A. a floating-point number  B. a signed integer  C. an unsigned integer  D. a hexadecimal number
15. Which is not a feature of the Pentium?
   A. complex and flexible addressing  B. abundant instruction set  C. simple format and fixed instruction length  D. strong support for high-level languages
16. Which is not a feature of the PowerPC?
   A. few and simple addressing modes  B. basic and simple instruction set  C. variable instruction length and complex formats  D. strong support for high-level languages

Chapter 12 and Chapter 18 Homework
1. After the information flow of the fetch subcycle, the content of MBR is _________.
   A. operand  B. address of instruction  C. instruction  D. address of operand
2. After the information flow of the indirect subcycle, the content of MBR is _________.
   A. operand  B. address of instruction  C. instruction  D. address of operand
3. The worst factor limiting the performance of an instruction pipeline is _________.
   A. a conditional branch delays operation until the target address is known  B. the number of pipeline stages can't exceed 6  C. two's complement arithmetic is too complex  D. too few general-purpose registers
4. The main factor affecting instruction pipeline effectiveness is _________.
   A. the number of stages  B. the number of instructions  C. the conditional branch instruction  D. the number of pipelines
5. RISC rejects _________.
   A. few, simple addressing modes  B. a limited and simple instruction set  C. few, simple instruction formats  D. a small number of general-purpose registers
6. RISC rejects _________.
   A. a large number of general-purpose registers  B. indirect addressing  C. a single instruction size  D. a small number of addressing modes
7. Which is NOT a characteristic of a RISC processor?
   A. a highly optimized pipeline  B. register-to-register operations  C. a large number of general-purpose registers  D. a complex instruction format
8. The control unit uses input signals to produce control signals that open the gates of information paths and let the micro-operations execute. Which is NOT an input signal of the control unit?
   A. clock and flags  B. instruction register  C. interrupt request signal  D. memory read or write
9. The control unit uses output signals to cause operations. Which is not included in the output signals?
   A. signals that cause data movement  B. signals that activate specific functions (e.g. add/sub/...)  C. flags  D. read, write or acknowledgement
10. A Symmetric Multi-Processor (SMP) system is tightly coupled by _________.
   A. high-speed data link and distributed memory  B. shared RAIDs and high-speed data link  C. distributed caches and shared memory  D. interconnection network and distributed memory
11. SMP means _________.
   A. Sharing Memory Processes  B. Split Memory to Parts  C. Stack and Memory Pointer  D. Symmetric Multi-Processor
12. "MESI" means the states _________.
   A. Modified, Exclusive, Stored and Inclusive  B. Modified, Expected, Shared and Interrupted  C. Modified, Exclusive, Shared and Invalid  D. Moved, Exchanged, Shared and Invalid
13. The MESI protocol is also called _________.
   A. write-back policy  B. write-update protocol  C. write-invalidate protocol  D. write-through policy

Chapter 12 Homework
1. Which register is user-visible but not directly operated on in the 8086?
   A. DS  B. SP  C. IP  D. BP
2. The indirect subcycle occurs _________.
   A. before the fetch subcycle  B. after the execute subcycle  C. after the interrupt subcycle  D. after the fetch subcycle and before the execute subcycle
3. Within the indirect subcycle, what the CPU must do is _________.
   A. fetch the operand or store the result  B. fetch the operand's address from memory  C. fetch the next instruction from memory  D. nothing
4. In general, which register is used for relative addressing, where the content of this register plus the address field A supplied by the instruction makes the target address in branch or loop instructions?
   A. SP  B. IR  C. BR  D. PC
5. The Memory Address Register connects to the _________ bus.
   A. system  B. address  C. data  D. control
6. The Memory Buffer Register links to the _________ bus.
   A. system  B. address  C. data  D. control
7. After an indirect cycle, there is a(n) _________ cycle.
   A. fetch  B. indirect  C. execute  D. interrupt
8. The interrupt cycle is _________ the execute cycle.
   A. always after  B. never after  C. sometimes after  D. maybe before
9. The correct cycle sequence is _________.
   A. fetch, indirect, execute and interrupt  B. fetch, execute, indirect and interrupt  C. fetch, indirect, interrupt and execute  D. indirect, fetch, execute and interrupt
10. The aim of the indirect cycle is to get _________.
   A. an operand  B. an instruction  C. the address of an instruction  D. the address of an operand
11. Which is not in the ALU?
   A. shifter  B. adder  C. complementer  D. accumulator
12. The registers in the CPU are divided into _________ registers and _________ registers.
   A. general purpose, user-visible  B. user-visible, control and status  C. data, address  D. general purpose, control and status
13. The Base register is a(n) _________ register in the 8086.
   A. general purpose  B. data  C. address  D. control
14. The Instruction Pointer is a(n) _________ register in the 8086.
   A. general purpose  B. data  C. address  D. control
15. The Index register is a(n) _________ register in the 8086.
   A. general purpose  B. data  C. address  D. control
16. The Stack Pointer is a(n) _________ register in the 8086.
   A. general purpose  B. data  C. address  D. control
17. The Accumulator is a(n) _________ register in the 8086.
   A. general purpose  B. data  C. address  D. control
18. The Program Status Word is a(n) _________ register.
   A. general purpose  B. data  C. address  D. control

Show all the micro-operations and control signals for the following instructions:
1. ADD AX, X; the contents of AX are added to the contents of location X, and the result is stored in AX.
2. MOV AX, [X]; the operand pointed to by the content of location X is moved to AX, i.e. ((X)) -> AX. Here [ ] means indirect addressing.
3. ADD AX, [BX]; the operand pointed to by the content of register BX is added to AX, i.e. (AX) + ((BX)) -> AX. Here [ ] means register indirect addressing.
4. JZ NEXT1; if ZF = 1 (the result was zero), jump to (PC) + NEXT1.
5. CALL X; call function X, saving the return address on the top of the stack.
6. RETURN; return from the top of the stack to PC.
Computer Systems: A Programmer's Perspective (2nd Edition): Homework Answers

int int_shifts_are_arithmetic() {
    int x = -1;
    return (x >> 1) == -1;
}

2.63 For sra, the main job is to extend bit w-k-1 of xsrl into the high-order bits above it.
This can be done with the NOT-and-add-1 trick, except that here the "1" being added is 1 << (w-k-1).
If bit w-k-1 of x is 0, then after the NOT-and-add the high-order bits are all 0; if it is 1, they all become 1.
Finally, apply the appropriate mask to obtain the result.
For srl, the main job is to clear the high-order bits, i.e. xsra & ((1 << (w-k)) - 1).
Note also that when k == 0, 1 << (w-k) cannot be used (shifting by the full word width is undefined), so 2 << (w-k-1) is used instead.
int sra(int x, int k) {
    int xsrl = (unsigned) x >> k;
    int w = sizeof(int) << 3;
    unsigned z = 1 << (w - k - 1);
    unsigned mask = z - 1;
    unsigned right = mask & xsrl;
    unsigned left = ~mask & (~(z & xsrl) + z);
    return left | right;
}

int srl(unsigned x, int k) {
    int xsra = (int) x >> k;
    int w = sizeof(int) * 8;
    unsigned z = 2 << (w - k - 1);
    return (z - 1) & xsra;
}

2.74 For signed integer subtraction, the overflow rules can be summarized as follows. Let t = a - b. If a and b have the same sign, overflow can never occur.
If a >= 0 && b < 0, overflow has occurred only when t <= 0.
If a < 0 && b >= 0, overflow has occurred only when t >= 0.
A Brief Introduction to NVIDIA GPU Architecture

An Introduction to Modern GPU Architecture
Ashu Rege, Director of Developer Technology

Agenda
• Evolution of GPUs
• Computing revolution
• Stream processing
• Architecture details of modern GPUs

Evolution of GPUs (1995-1999)
• 1995: NV1
• 1997: Riva 128 (NV3), DX3
• 1998: Riva TNT (NV4), DX5; 32-bit color, 24-bit Z, 8-bit stencil; dual texture, bilinear filtering; 2 pixels per clock (ppc)
• 1999: Riva TNT2 (NV5), DX6; faster TNT; 128b memory interface; 32 MB memory; "the chip that would not die" ☺
[Screenshot: Virtua Fighter (SEGA Corporation) on NV1, 1995: 50K triangles/sec, 1M pixel ops/sec, 1M transistors, 16-bit color, nearest filtering]

Evolution of GPUs (Fixed Function)
• GeForce 256 (NV10), DirectX 7.0
• Hardware T&L, cubemaps, DOT3 bump mapping, register combiners
• 2x anisotropic filtering, trilinear filtering, DXT texture compression, 4 ppc
• The term "GPU" is introduced
[Screenshot: Deus Ex (Eidos/Ion Storm) on NV10, 1999: 15M triangles/sec, 480M pixel ops/sec, 23M transistors, 32-bit color, trilinear filtering]

NV10 Register Combiners
[Diagram: RGB and alpha portions of a register combiner. Input RGB/alpha registers pass through input mappings into operands A, B, C, D; the RGB function computes A op1 B, C op2 D, and AB op3 CD, while the alpha function computes AB op4 CD; results go through scale/bias into the next combiner's RGB and alpha registers]

Evolution of GPUs (Shader Model 1.0)
• GeForce 3 (NV20); NV2A is the Xbox GPU; DirectX 8.0
• Vertex and pixel shaders, 3D textures, hardware shadow maps
• 8x anisotropic filtering, multisample AA (MSAA), 4 ppc
[Screenshot: Ragnarok Online (Atari/Gravity) on NV20, 2001: 100M triangles/sec, 1G pixel ops/sec, 57M transistors, vertex/pixel shaders, MSAA]

Evolution of GPUs (Shader Model 2.0)
• GeForce FX series (NV3x), DirectX 9.0
• Floating-point and "long" vertex and pixel shaders
• Shader Model 2.0: 256 vertex ops; 32 texture + 64 arithmetic pixel ops
• Shader Model 2.0a: 256 vertex ops; up to 512 ops
• Shading languages: HLSL, Cg, GLSL
[Screenshot: Dawn demo (NVIDIA) on NV30, 2003: 200M triangles/sec, 2G pixel ops/sec, 125M transistors, Shader Model 2.0a]

Evolution of GPUs (Shader Model 3.0)
• GeForce 6 series (NV4x), DirectX 9.0c, Shader Model 3.0
• Dynamic flow control in vertex and pixel shaders (branching, looping, predication, ...); some flow control was first introduced in SM2.0a
• Vertex texture fetch
• High dynamic range (HDR): 64-bit render target, FP16x4 texture filtering and blending
[Screenshot: Far Cry HDR (Ubisoft/Crytek) on NV40, 2004: 600M triangles/sec, 12.8G pixel ops/sec, 220M transistors, Shader Model 3.0, rotated-grid MSAA, 16x aniso, SLI]
[Screenshots: Far Cry, no-HDR vs HDR comparison]

Evolution of GPUs (Shader Model 4.0)
• GeForce 8 series (G8x), DirectX 10.0: Shader Model 4.0, geometry shaders, no "caps bits", unified shaders
• New driver model in Vista; CUDA-based GPU computing
• GPUs become true computing processors measured in GFLOPS
[Screenshot: Crysis (EA/Crytek) on G80, 2006: unified shader cores with stream processors, 681M transistors, Shader Model 4.0, 8x MSAA, CSAA. Images courtesy of Crytek]

As of Today
• GeForce GTX 280 (GT200), DX10
• 1.4 billion transistors, 576 mm² in 65nm CMOS
• 240 stream processors, 933 GFLOPS peak, 1.3 GHz processor clock
• 1 GB DRAM, 512-pin DRAM interface, 142 GB/s peak
[Screenshots: stunning graphics realism; lush, rich worlds (Crysis © 2006 Crytek / Electronic Arts; Hellgate: London © 2005-2006 Flagship Studios, Inc., licensed by NAMCO BANDAI Games America, Inc.); incredible physics effects; core of the definitive gaming platform]

What Is Behind This Computing Revolution?
• Unified scalar shader architecture
• Highly data-parallel stream processing
• Next, let's try to understand what these terms mean

Unified Scalar Shader Architecture

Graphics pipelines for the last 20 years used a processor per function:
• Vertex: T&L, which evolved into vertex shading
• Triangle: triangle, point, and line setup
• Pixel: flat shading, texturing, eventually pixel shading
• ROP: blending, Z-buffering, antialiasing
• Memory: wider and faster over the years

Shaders in Direct3D
• DirectX 9: vertex shader, pixel shader
• DirectX 10: vertex shader, geometry shader, pixel shader
• DirectX 11: vertex shader, hull shader, domain shader, geometry shader, pixel shader, compute shader
• Observation: all of these shaders require the same basic functionality, namely texturing (or data loads) and math ops

Unified Pipeline
[Diagram: geometry (new in DX10), physics, vertex, pixel, compute (CUDA, DX11 Compute, OpenCL) and future stages all feed a single texture + floating-point processor, which feeds the ROP and memory]

Why Unify?
• In a non-unified architecture, utilization is unbalanced and inefficient: a heavy geometry workload saturates the vertex shader while pixel-shader hardware idles (perf = 4), and a heavy pixel workload does the reverse (perf = 8)
• In a unified architecture, the shared shader pool is optimally utilized: perf = 11 for both the heavy geometry and the heavy pixel workload

Why a Scalar Instruction Shader? (1)
• With a vector ALU, efficiency varies by instruction:
  • MAD r2.xyzw, r0.xyzw, r1.xyzw: 4 ops, 100% utilization
  • DP3 r2.w, r0.xyz, r1.xyz: 3 ops, 75%
  • MUL r2.xy, r0.xy, r1.xy: 2 ops, 50%
  • ADD r2.w, r0.x, r1.x: 1 op, 25%

Why a Scalar Instruction Shader? (2)
• A vector ALU with co-issue is better but not perfect:
  • DP3 r2.x, r0.xyz, r1.xyz co-issued with ADD r2.w, r0.w, r1.w: 4 ops, 100%
  • DP3 r2.w, r0.xyz, r1.xyz and ADD r2.w, r0.w, r2.w cannot co-issue (3 + 1 ops)
• Vector/VLIW architectures require more compiler work
• G8x, GT200: scalar, always 100% efficient and simple to compile
• Up to 2x effective throughput advantage relative to vector

Complex Shader Performance on a Scalar Architecture
[Bar chart: procedural Perlin-noise fire, relative performance of 7900GTX vs 8800GTX]

Conclusion
• Build a unified architecture with scalar cores where all shader operations are done on the same processors

Stream Processing

The Supercomputing Revolution (1), (2)
[Charts]

What Accounts for This Difference?
• Need to understand how CPUs and GPUs differ:
  • Latency intolerance versus latency tolerance
  • Task parallelism versus data parallelism
  • Multi-threaded cores versus SIMT (Single Instruction Multiple Thread) cores
  • 10s of threads versus 10,000s of threads

Latency and Throughput
• "Latency is a time delay between the moment something is initiated, and the moment one of its effects begins or becomes detectable." For example, the delay between a request for a texture read and the texture data returning.
• Throughput is the amount of work done in a given amount of time. For example, how many triangles are processed per second.
• CPUs are low-latency, low-throughput processors; GPUs are high-latency, high-throughput processors.
• GPUs are designed for tasks that can tolerate latency. Example (simplified): graphics in a game. The CPU generates frames 0, 1, 2, ... while the GPU renders them one frame behind; the latency between frame generation and rendering is on the order of milliseconds.
• To be efficient, GPUs must have high throughput, i.e. process millions of pixels in a single frame.
• CPUs, by contrast, are designed to minimize latency (example: mouse or keyboard input). Caches are needed to minimize latency, so CPUs are designed to maximize running operations out of cache, with instruction pre-fetch, out-of-order execution, and flow control.
• CPUs need a large cache; GPUs do not. GPUs can dedicate more of the transistor area to computation horsepower.

CPU versus GPU Transistor Allocation
• GPUs can have more ALUs for the same sized chip and therefore run many more threads of computation
• Modern GPUs run 10,000s of threads concurrently
[Diagram: CPU die dominated by control logic and cache with a few ALUs attached to DRAM; GPU die dominated by ALUs]

Managing Threads on a GPU
• How do we avoid synchronization issues between so many threads? How do we dispatch, schedule, cache, and context switch 10,000s of threads? How do we program 10,000s of threads?
• Design GPUs to run specific types of threads:
  • Independent of each other, so there are no synchronization issues
  • SIMD (Single Instruction Multiple Data) threads, to minimize thread management and reduce hardware overhead for scheduling, caching, etc.
  • Program blocks of threads (e.g. one pixel shader per draw call, or per group of pixels)
• Which problems can be solved with this type of computation?

Data Parallel Problems
• Plenty of problems fall into this category (luckily ☺): graphics, image & video processing, physics, scientific computing, ...
• This type of parallelism is called data parallelism, and GPUs are the perfect solution for it
• In fact, the more the data, the more efficient GPUs become at these algorithms
• Bonus: you can relatively easily add more processing cores to a GPU and increase the throughput

Parallelism in CPUs vs. GPUs
• CPUs use task parallelism: multiple tasks map to multiple threads; tasks run different instructions; 10s of relatively heavyweight threads run on 10s of cores; each thread is managed and scheduled explicitly; each thread has to be individually programmed
• GPUs use data parallelism: SIMD model (Single Instruction Multiple Data); the same instruction runs on different data; 10,000s of lightweight threads run on 100s of cores; threads are managed and scheduled by hardware; programming is done for batches of threads (e.g. one pixel shader per group of pixels, or per draw call)

Stream Processing
• What we just described: given a (typically large) set of data (a "stream"), run the same series of operations (a "kernel" or "shader") on all of the data (SIMD)
• GPUs use various optimizations to improve throughput:
  • Some on-chip memory and local caches to reduce bandwidth to external memory
  • Batching groups of threads to minimize incoherent memory access; bad access patterns lead to higher latency and/or thread stalls
  • Eliminating unnecessary operations by exiting or killing threads; example: Z-culling and early-Z to kill pixels which will not be displayed

To Summarize
• GPUs use stream processing to achieve high throughput
• GPUs are designed to solve problems that tolerate high latencies: high latency tolerance means lower cache requirements, which means less transistor area for cache, which means more area for computing units, which means 10,000s of SIMD threads and high throughput. GPUs win ☺
• Additionally: threads are managed by hardware, so you are not required to write code for each thread and manage them yourself, and it is easier to increase parallelism by adding more processors
• So, the fundamental unit of a modern GPU is a stream processor

G80 and GT200 Streaming Processor Architecture

Building a Programmable GPU
• The future of high-throughput computing is programmable stream processing
• So build the architecture around unified scalar stream-processing cores
• GeForce 8800 GTX (G80) was the first GPU architecture built with this new paradigm

G80 Replaces the Pipeline Model
[Diagram: the host feeds an input assembler and vertex/geometry/pixel thread-issue units; 128 unified streaming processors (SPs), grouped with texture filter (TF) units and L1 caches, connect through L2 caches to the frame buffer (FB) partitions, all managed by a thread processor]

GT200 Adds More Processing Power
[Diagram: host CPU and system memory connect through the host interface to the GPU's input assembler; vertex, geometry, pixel, and compute work distribution; viewport/clip/setup/raster/ZCull; an interconnection network; and ROP/L2 partitions attached to eight DRAM channels]
• 8800 GTX (high-end G80): 16 streaming multiprocessors, each containing 8 unified streaming processors, 128 in total
• GTX 280 (high-end GT200): 30 streaming multiprocessors, each containing 8 unified streaming processors, 240 in total

Inside a Streaming Multiprocessor (SM)
• Scalar register-based ISA
• Multithreaded instruction unit: up to 1024 concurrent threads, hardware thread scheduling, in-order issue
• 8 SP thread processors: IEEE 754 32-bit floating point; 32-bit and 64-bit integer; 16K 32-bit registers
• 2 SFU special function units: sin, cos, log, exp
• Double-precision unit: IEEE 754 64-bit floating point, fused multiply-add
• 16KB shared memory

Multiprocessor Programming Model
• Workloads are partitioned into blocks of threads among multiprocessors: a block runs to completion, and a block doesn't run until resources are available
• Allocation of hardware resources: shared memory is partitioned among blocks; registers are partitioned among threads
• Hardware thread scheduling: any thread not waiting for something can run; context switching is free, every cycle

Memory Hierarchy of G80 and GT200
• An SM can directly access device memory (video memory): not cached, read & write; GT200: 140 GB/s peak
• An SM can access device memory via the texture unit: cached, read-only, for textures and constants; GT200: 48 GTexels/s peak
• On-chip shared memory is shared among threads in an SM: important for communication amongst threads, provides low-latency temporary storage; G80 & GT200: 16KB per SM

Performance per Millimeter
• For a GPU, performance == throughput, and caches are limited in the memory hierarchy
• Strategy: hide latency with computation, not cache. Heavy multithreading: switch to another group of threads when the current group is waiting for memory access
• Implication: a large number of threads is needed to hide latency. Occupancy: typically 128 threads/SM minimum; maximum 1024 threads/SM on GT200 (1024 × 30 SMs = 30,720 threads in total)
• Strategy: Single Instruction Multiple Thread (SIMT)

SIMT Thread Execution
• Group 32 threads (vertices, pixels or primitives) into warps: threads in a warp execute the same instruction at a time, with shared instruction fetch/dispatch; hardware automatically handles divergence (branches)
• Warps are the primitive unit of scheduling: pick 1 of 24 warps for each instruction slot
• SIMT execution is an implementation choice: shared control logic leaves more space for ALUs, and it is largely invisible to the programmer

Shader Branching Performance
• G8x/G9x/GT200 branch granularity is 32 threads (1 warp)
• If threads diverge, both sides of the branch will execute on all 32
• More efficient compared to an architecture with a branch granularity of 48 threads
[Chart: number of coherent 4x4 tiles vs pixel-shader branching efficiency, comparing G80's 32-pixel coherence against 48-pixel coherence]

Conclusion: G80 and GT200 Streaming Processor Architecture
• Executing in blocks maximally exploits data parallelism: it minimizes incoherent memory access, and adding more ALUs yields better performance
• Data processing is performed in SIMT fashion: 32 threads are grouped into warps, and threads in a warp execute the same instruction at a time
• Thread scheduling is handled automatically by hardware: context switching is free (every cycle), scalability is transparent, and programming is easy
• Memory latency is covered by a large number of in-flight threads
• Cache is mainly used for read-only memory access (textures, constants)
Prof. Wen-mei Hwu's CUDA Lectures (in Chinese), Lecture 1

© David Kirk/NVIDIA and Wen-mei W. Hwu Taiwan, June 30-July 2, 2008
Historic GPGPU Constraints
• Dealing with graphics API
– Working with the corner cases of the graphics API
• The same computation executed on many data elements in parallel – low control-flow overhead with high single-precision floating-point arithmetic intensity
  • Many calculations per memory access
  • Currently also need a high floating-point to integer ratio
CUDA - No more shader functions.
• CUDA: an integrated CPU + GPU application C program
  – Serial or modestly parallel C code executes on the CPU
  – Highly parallel SPMD kernel C code executes on the GPU
[Diagram: the legacy GPGPU programming model – a Fragment Program reading Input Registers, with resources allocated per thread, per shader, and per context.]
• Addressing modes
Common STM32 English Abbreviations

A collection of common embedded-systems English abbreviations and vocabulary. This article supplements the original; please credit the source when reproducing. Link: /s/blog_574d08530100hzo2 (thanks to the original author for sharing).

English abbreviations
ARM: Advanced RISC Machine
AAPCS: ARM Architecture Procedure Call Standard
RISC: Reduced Instruction Set Computer
RTOS: Real-Time Operating System
DMA: Direct Memory Access
EXTI: External Interrupts
FSMC: Flexible Static Memory Controller
FPB: Flash Patch and Breakpoint unit
HSE: High-Speed External (oscillator)
HSI: High-Speed Internal (oscillator)
LSE: Low-Speed External (oscillator)
LSI: Low-Speed Internal (oscillator)
LSU: Load Store Unit
PFU: Prefetch Unit
ISR: Interrupt Service Routine
NMI: Non-Maskable Interrupt
NVIC: Nested Vectored Interrupt Controller
MPU: Memory Protection Unit
MIPS: Million Instructions Per Second
RCC: Reset and Clock Control
RTC: Real-Time Clock
IWDG: Independent Watchdog
WWDG: Window Watchdog
TIM: Timer
GAL: Generic Array Logic
PAL: Programmable Array Logic
ASIC: Application-Specific Integrated Circuit
FPGA: Field-Programmable Gate Array
CPLD: Complex Programmable Logic Device

Ports
AFIO: Alternate-Function I/O
GPIO: General-Purpose Input/Output
IOP(A–G): I/O Port A through I/O Port G (for example, IOPA: I/O Port A)
CAN: Controller Area Network
FLITF: Flash memory interface
I2C: Inter-Integrated Circuit
IIS: Integrated Interface of Sound (audio interface)
JTAG: Joint Test Action Group
SPI: Serial Peripheral Interface
SDIO: SD card Input/Output
UART: Universal Asynchronous Receiver/Transmitter
USB: Universal Serial Bus

Registers
CPSR: Current Program Status Register
SPSR: Saved Program Status Register
CSR: Clock control/Status Register
LR: Link Register
SP: Stack Pointer
MSP: Main Stack Pointer
PSP: Process Stack Pointer
PC: Program Counter

Debug
ICE: In-Circuit Emulator
ICE Breaker: embedded in-circuit emulation unit
DBG: Debug
IDE: Integrated Development Environment
DWT: Data Watchpoint and Trace unit
ITM: Instrumentation Trace Macrocell
ETM: Embedded Trace Macrocell
TPIU: Trace Port Interface Unit
TAP: Test Access Port
DAP: Debug Access Port
TP: Trace Port
DP: Debug Port
SWJ-DP: Serial Wire / JTAG Debug Port
SW-DP: Serial Wire Debug Port
JTAG-DP: JTAG Debug Port

System
IRQ: Interrupt Request
FIQ: Fast Interrupt Request
SWI: Software Interrupt
RO: Read-Only (section)
RW: Read-Write (section)
ZI: Zero-Initialized (section)
BSS: Block Started by Symbol (uninitialized data segment)

Buses
Bus Matrix
Bus Splitter
AHB-AP: Advanced High-performance Bus Access Port
APB: Advanced Peripheral Bus
APB1: low-speed APB; APB2: high-speed APB
PPB: Private Peripheral Bus

Miscellaneous
ALU: Arithmetic Logic Unit
CLZ: Count Leading Zeros (instruction)
SIMD: Single Instruction stream, Multiple Data stream
VFP: Vector Floating Point

Words/phrases
Big Endian: big-end-first storage order
Little Endian: little-end-first storage order
Context switch: task switch (switching of CPU register contents)
Literal pool: data buffer pool

Vocabulary
arbitration; access; assembler; disassembly; binutils (binary utilities); bit-banding; bit-band alias; bit-band region; banked (registers); buffer/cache; ceramic (resonator); fetch; decode; execute; Harvard (architecture); handler; heap; stack; latency; load (LDR: load memory contents into register Rn); store (STR: store register Rn contents to memory); loader; optimization; process; thread; prescaler; prefetch; perform; pre-emption; tail-chaining; late-arriving (interrupt); resonator

Instruction-related
instruction; pseudo-instruction; directive (pseudo-operation); comments
FA: Full Ascending (stack mode)
EA: Empty Ascending (stack mode)
FD: Full Descending (stack mode)
ED: Empty Descending (stack mode)

Translation notes
1. "The number of wait states for a read operation programmed on-the-fly": the read-operation wait-state count can be configured dynamically (on the fly).

References
1. BSS (/view/453125.htm?Fr=ala0_1): BSS is the uninitialized data segment produced by the Unix linker.
The other sections are the "text" segment containing the program code and the "data" segment containing the initialized data. Variables in the BSS section have a name and a size but no value. The name was later adopted by many object-file formats, including PE. "Block started" refers to where the compiler gathers uninitialized data. The BSS section holds no data; it merely records a start and end address so the memory region can be zeroed efficiently at run time. The BSS section does not exist in the application's binary image file.
In memory-managed architectures (such as Intel's 80x86 systems), BSS (Block Started by Symbol) usually denotes the memory area that stores a program's uninitialized global variables; the BSS section is normally cleared during initialization. BSS memory is statically allocated and is zeroed at program start-up.
For example, after a C program is compiled, initialized global variables are saved in the .data segment, while uninitialized global variables are saved in the .bss section. Both the text and data sections are in the executable (in embedded systems, typically burned into the image file) and are loaded from it; the BSS section is not in the executable file and is initialized by the system.
2. ISR reference: /view/32247?fromTaglist
3. DMA reference: /view/32471.htm?Fr=ala0_1
During a DMA transfer the DMA controller takes direct charge of the bus, so bus control must be handed over. Before the transfer, the CPU grants bus control to the DMA controller; after the transfer, the DMA controller immediately returns bus control to the CPU. A complete DMA transfer goes through the following four steps:
1. DMA request: the CPU initializes the DMA controller and issues the operation command to the I/O interface, and the I/O interface raises a DMA request.
2. DMA response: the DMA controller arbitrates priority and masking among DMA requests and issues a bus request to the bus-arbitration logic. When the CPU finishes the current bus cycle it releases control of the bus; the arbitration logic then issues a bus grant, indicating that the DMA request has been acknowledged, and the DMA controller notifies the I/O interface to start the transfer.
3. DMA transfer: once the DMA controller holds bus control, the CPU suspends (or executes only internal operations) while the DMA controller issues the read/write commands and directly manages the transfer with the I/O interface.
4. DMA end: after the specified batch of data has been transferred, the DMA controller releases bus control and sends an end signal to the I/O interface. On receiving it, the I/O interface stops the I/O device and raises an interrupt request to the CPU; the CPU leaves its hands-off state and executes code that checks the DMA transfer. Finally the original program resumes with the results and status of the operation.
Therefore, a DMA transfer requires no CPU-directed data movement and no interrupt-style saving and restoring of context; the hardware opens a direct data path between RAM and the I/O device, greatly improving CPU efficiency.
Principles and Applications of DSP: Exam Questions (1)
()
7. The DSP selects among program memory space, data memory space, and I/O space independently via three chip-select lines: PS, DS, and IS.
( )
9. DSP pipeline conflicts can be eliminated by changing the programming approach or by inserting NOP instructions.
( )
10. In TMS320C54x DSP assembly language, a branch instruction takes 4 machine cycles to execute. ( )
…is a hardware-programmable device and implements data processing in hardware.
( )
2. In the C54x DSP interrupt vector table, the entry addresses of the interrupt vectors are spaced 4 words apart.
( )
4. The maximum length of the C54x DSP interrupt vector table is only 128 words.
( )
5. The DSP selects among program memory space, data memory space, and I/O space independently via three chip-select lines: PS, DS, and IS
4. Answer: program
5. In C54x DSP processors, the component that multiplies or divides the clock frequency is _____________.
5. Answer: the phase-locked loop (PLL)
6. After power-on reset, a TMS320C54x DSP processor starts executing from the specified memory address ________.
6. Answer: FF80h
7. A TMS320C54x DSP processor has _____ general-purpose I/O pins, namely _________.
)
8. In DSP programming, different sections of a program can be placed in different memories.
( )
10. The TMS320C54x assembly instruction WRITA can address 1M words of program space. ( )
4. The DSP processor TMS320VC5402 has no dedicated division instruction.
( )
5. The fixed-point DSP processor TMS320VC5402 can perform floating-point fractional arithmetic.
7. Answer: 2, namely BIO and XF
8. DSP processors fall into two categories by data format, namely ________ and ________.
311143040 System-Level Programming (Paper B, closed-book)
2015-2016-1  Problem 1 (40 points): Multiple choice questions
1. Compared to a sequence of machine code instructions, a fragment of C code
A. describes the actions of the computer, not just of the CPU
B. is the native way to program most computers
C. does not engage any transistors during its execution
D. may describe the same algorithm
2. In C, using default floating-point settings, what happens when a floating-point computation results in an overflow?
A. An exception is raised unless disabled by calling _controlfp().
B. An erroneous value is computed and execution continues.
C. A special value "infinity" is computed, testable with _finite().
D. Program execution is halted.
3. Programs compiled for an Intel Pentium processor do not execute properly on a SPARC processor from Sun Microsystems because
A. the operation codes understood by the two processors are different
B. the memory of a SPARC CPU is numbered from top to bottom
C. copyrights regarding code cannot be violated
D. the assembly mnemonics for the same "opcode" are different in the two processors
Professional English for Computer Science
1.1–1.10
1. The cache memories are designed for ( )
a. The processor can read data from the register file much faster than from memory.
b. It is easier and cheaper to make main memory run faster than to make processors run faster.
c. Faster devices are more expensive to build than their slower counterparts.
d. Larger storage devices are slower than smaller storage devices.
2. The cache memories ( )
a. serve as temporary staging areas for information that the processor is likely to need in the near future.
b. have a larger memory capacity than internal storage.
c. exchange speed is much slower than internal storage.
d. have little impact on the performance of the CPU.
3. Which answer is wrong about the memory hierarchy? ( )
a. From the top of the hierarchy to the bottom, the devices become slower and larger.
b. From the top of the hierarchy to the bottom, the devices cost less per byte.
c. The register file occupies the bottom level in the hierarchy, known as level 0 or L0.
d. Storage at one level serves as a cache for storage at the previous higher level.
4. Regularly, which is wrong among these choices? ( )
a. Caches L2 and L3 are caches for L1 and L2.
b. The L3 cache is a cache for the main memory.
c. The main memory is a cache for the disk.
d. The local disk serves as a cache for data stored on the disks of other systems.
5. Which hardware technology is wrong with these caches? ( )
a. The L1 and L2 caches are implemented with SRAM.
b. The L2 and L3 caches are implemented with SRAM.
c. The L2 and L3 caches are implemented with DRAM.
d. The L3 and main memory caches are implemented with DRAM.
6. The preprocessor (cpp) modifies the original C program according to directives that begin with the _____ character. ( )
A. "*"  B. "#"  C. "("  D. ";"
7. ________ translates hello.s into machine-language instructions, packages them in a form known as a relocatable object program, and stores the result in the object file hello.o. ( )
A. The assembler  B. The preprocessor  C. The compiler  D. The linker
8. On a Unix system, the translation from source file to object file is performed by a _______ driver. (C)
A. assembler  B. preprocessor  C. compiler  D. linker
9. A computer system is a collection of ________ components that work together to run computer programs. (C)
A. hardware  B. software  C. hardware and software  D. hardware or software
10. In order to run hello.c on the system, the individual C statements must be translated by other programs into a sequence of _________ instructions. ( )
A. high-level machine-language  B. low-level machine-language  C. C language  D. JAVA language
11. What is concurrency?
A. It is a system with multiple, simultaneous activities.
B. The use of concurrency to make a system run faster.
C. An abstraction that provides each process with the illusion that it has exclusive use of the main memory.
D. The operating system's abstraction for a running program.
12. What is instruction-level parallelism?
A. Modern processors can execute multiple instructions at one time.
B. The use of concurrency to make a system run faster.
C. It allows a single instruction to cause multiple operations to be performed in parallel.
D. The operating system's abstraction for a running program.
13. What is Single-Instruction, Multiple-Data (SIMD) parallelism?
A. SIMD allows a single instruction to cause multiple operations to be performed in parallel.
B. The use of concurrency to make a system run faster.
C. An abstraction that provides each process with the illusion that it has exclusive use of the main memory.
D. The operating system's abstraction for a running program.
14. What are the parts of a virtual machine?
A. operating system, I/O devices, main memory
B. processor, I/O devices, operating system
C. main memory, operating system
D. all of the above
15. What is the abstraction of a running program?
A. operating system  B. I/O devices  C. main memory  D. processes
16. We can think of the ( ) as a layer of software interposed between the application program and the hardware.
A. processor  B. operating system  C. main memory  D. I/O devices
17. ( ) cannot be modeled as a file.
A. displays  B. registers  C. disks  D. keyboards
18. When the operating system decides to transfer control from the current process to some new process, it performs a ( )
A. context switch  B. process  C. thread  D. main memory
19. Application programs are not allowed to read or write the contents of this area or to directly call functions defined in the ( )
A. stack  B. shared libraries  C. heap  D. kernel virtual memory
20. ( ) is not included in the CPU chip.
A. bus interface  B. ALU  C. expansion slots  D. register file
21. Why do programmers need to understand how compilation systems work?
① optimizing program performance ② understanding link-time errors ③ avoiding security holes ④ better programming
A. ①③④  B. ①②③  C. ②③④  D. ①②④
22. What is the hardware organization of a typical system?
① buses ② disk ③ I/O devices ④ mouse ⑤ processor ⑥ main memory
A. ①③④⑤  B. ②③⑤⑥  C. ①③⑤⑥  D. ②④⑤⑥
23. What might the CPU carry out at the request of an instruction?
① I/O read ② I/O write ③ jump ④ load ⑤ update ⑥ store
A. ①②③④  B. ②③④⑤⑥  C. ④⑥⑤①②③  D. ③④②①⑤⑥
24. Intel Pentium systems have a word size of _____ bytes; Sun SPARCs have a word size of _____ bytes.
A. 4, 8  B. 1, 4  C. 4, 1  D. 8, 4
25. What are the basic styles of allocator?
A. explicit allocator, video allocator
B. video allocator, implicit allocator
C. implicit allocator, signal allocator
D. explicit allocator, implicit allocator

10.1–10.10
1. Why bother learning about Unix I/O? ( )
a. It helps you understand other systems concepts.
b. Sometimes you have no choice but to use Unix I/O.
c. Unix I/O is useful.
d. The higher-level I/O functions do not work well.
2. Unix I/O enables all input and output to be performed in a uniform and consistent way, except ( )
a. opening files
b. getting the current file position
c. reading and writing files
d. closing files
3. Which flag is not right? ( )
a. O_RDONLY: reading only
b. O_WRONLY: writing only
c. O_RDWR: reading and writing
d. O_WRRD: writing and reading
4. Which flag is right? ( )
a. O_CREAT: if the file exists, then create a truncated version of it
b. O_TRUNC: if the file already exists, then truncate it
c. O_APPEND: after each write operation, set the file position to the end of the file
d. S_IXOTH: no one can execute this file
5. Short counts occur for a number of reasons besides: ( )
a. encountering EOF on reads
b. reading text lines from a terminal
c. reading and writing network sockets
d. writing to disk files
6. _______ provides convenient, robust, and efficient I/O in applications such as network programs that are subject to short counts. ( )
A. the IP package  B. the UDP package  C. the IP and UDP packages  D. the RIO package
7. ______ transfer data directly between memory and a file, with no application-level buffering. ( )
A. Unbuffered input functions  B. Buffered input functions  C. Unbuffered input and output functions  D. Buffered input and output functions
8. The rio_readinitb function is called ______ per open descriptor. ( )
A. once  B. twice  C. three times  D. four times
9. Applications can transfer data directly between memory and a file by calling the ______. ( )
A. rio_readn function  B. rio_writen function  C. neither of them  D. both of them
10. The Rio functions are inspired by the ______. ( )
A. readline function  B. readn function  C. writen function  D. all of the above
11. What is I/O redirection?
A. It consists of bits that describe the type of the mapped object.
B. The allocator maintains an area of a process's virtual memory known as the heap.
C. The operating system's abstraction for a running program.
D. It allows users to associate standard input and output with disk files.
12. "unix> ls > foo.txt" — what does this mean?
A. It causes the shell to load and execute the ls program, with standard output redirected to the disk file foo.txt.
B. It causes the shell to load the ls program, with standard output redirected to the disk file foo.txt.
C. It causes the shell to load and execute the ls program, without standard output redirected to the disk file foo.txt.
D. It causes the shell to load and execute the ls program, with redirection to the disk file foo.txt.
13. Which function does the Unix I/O library provide?
A. fflush()  B. lseek()  C. stat()  D. open()
14. How do you open files in Unix?
A. use the fopen() function  B. use the fopens() function  C. use the fopen_s() function  D. use the fopens_s() function
15. How do you read strings from files in Unix?
A. use the fgets() function  B. use the fget() function  C. use the fread() function  D. use the freads() function
16. The st_size member contains the ( ) in bytes.
A. permission  B. size  C. type  D. blocks
17. The kernel represents open files using three related data structures, except for the ( )
A. descriptor table  B. file table  C. v-node table  D. data table
18. The kernel will not delete the file table entry until its reference count is ( )
A. one  B. two  C. three  D. zero
19. Which one is right? ( )
A. Each process has its own distinct descriptor table.
B. Each process has one distinct descriptor table.
C. All processes have their own open files.
D. All processes have their own v-node tables.
20. Parent and child must both ( ) their descriptors before the kernel will delete the corresponding file table entry.
A. open  B. close  C. either A or B  D. neither
21. The sbrk function grows or shrinks the heap by adding incr to the kernel's brk pointer. If successful, it returns the old value of brk; otherwise it returns _____.
A. 0  B. 1  C. -1  D. -2
22. If an allocator completes 500 allocate requests and 500 free requests in 1 second, what is its throughput in operations per second?
A. 1000  B. 2000  C. 1500  D. 500
23. What are the two approaches to maintaining the free list?
A. last-in first-out order, address order
B. first-in first-out order, address order
C. first-in first-out order, last-in first-out order
D. time order, address order
24. A Mark&Sweep garbage collector consists of a ______ and a ______.
A. mark phase  B. sweep phase  C. delete phase  D. read phase
25. An application frees a previously allocated block by calling the _______.
A. mm_free function  B. mm_malloc function  C. extend_heap function  D. mm_init function
Microsoft Summer Intern Written Test Questions
2012 Microsoft Intern Hiring Written Test
1. Suppose that a Selection Sort of 80 items has completed 32 iterations of the main loop. How many items are now guaranteed to be in their final spot (never to be moved again)?
(A) 16  (B) 31  (C) 32  (D) 39  (E) 40
2. Which synchronization mechanism(s) is/are used to avoid race conditions among processes/threads in operating systems?
(A) Mutex  (B) Mailbox  (C) Semaphore  (D) Local procedure call
3. There is a sequence of n numbers 1, 2, 3, ..., n and a stack which can keep at most m numbers. Push the n numbers onto the stack following the sequence and pop out randomly. Suppose n is 2 and m is 3; the output sequence may be 1, 2 or 2, 1, so we get 2 different sequences. Suppose n is 7 and m is 5; please choose the possible output sequences of the stack:
(A) 1, 2, 3, 4, 5, 6, 7
(B) 7, 6, 5, 4, 3, 2, 1
(C) 5, 6, 4, 3, 7, 2, 1
(D) 1, 7, 6, 5, 4, 3, 2
(E) 3, 2, 1, 7, 5, 6, 4
4. What is the result of the binary number 01011001 after multiplying by 0111001 and adding 1101110?
(A) 0001 0100 0011 1111
(B) 0101 0111 0111 0011
(C) 0011 0100 0011 0101
5. What is the output if you compile and execute the following code?
void main() {
    int i = 11;
    int const *p = &i;
    p++;
    printf("%d", *p);
}
(A) 11  (B) 12  (C) Garbage value  (D) Compile error  (E) None of the above
6. Which of the following C++ code is correct?
(A) int f() {
        int *a = new int(3);
        return *a;
    }
(B) int *f() {
        int a[3] = {1, 2, 3};
        return a;
    }
(C) vector<int> f() {
        vector<int> v(3);
        return v;
    }
(D) void f(int *ret) {
        int a[3] = {1, 2, 3};
        ret = a;
        return;
    }
7. Given that the 180-degree rotated image of a 5-digit number is another 5-digit number and the difference between the numbers is 78633, what is the original 5-digit number?
(A) 60918  (B) 91086  (C) 18609  (D) 10968  (E) 86901
8. Which of the following statements are true?
(A) We can create a binary tree from given inorder and preorder traversal sequences.
(B) We can create a binary tree from given preorder and postorder traversal sequences.
(C) For an almost sorted array, insertion sort can be more effective than Quicksort.
(D) Suppose T(n) is the runtime of resolving a problem with n elements: T(n) = Θ(1) if n = 1; T(n) = 2T(n/2) + Θ(n) if n > 1; so T(n) is Θ(n log n).
(E) None of the above.
9. Which of the following statements are true?
(A) Insertion sort and bubble sort are not efficient for large data sets.
(B) Quick sort makes O(n^2) comparisons in the worst case.
(C) There is an array: 7, 6, 5, 4, 3, 2, 1. If using selection sort (ascending), the number of swap operations is 6.
(D) Heap sort uses two heap operations: insertion and root deletion.
(E) None of the above.
10. Assume both x and y are integers; which one of the following returns the minimum of the two integers?
(A) y ^ ((x ^ y) & ~(x < y))
(B) y ^ (x ^ y)
(C) x ^ (x ^ y)
(D) (x ^ y) ^ (y ^ x)
(E) None of the above
11. The Orchid Pavilion (兰亭集序) is well known as the pinnacle of "行书" (running script) in the history of Chinese literature. The most fascinating sentence: "Well I know it is a lie to say that life and death is the same thing, and that longevity and early death make no difference! Alas!" ("固知一死生为虚诞，齐彭殇为妄作。