Thunk-lifting: Reducing heap usage in an implementation of a lazy functional language

A. Reza Haydarlou    Pieter H. Hartel

Abstract

Thunk-lifting is a program transformation for lazy functional programs. The transformation aims at reducing the amount of heap space allocated to the program when it executes. Thunk-lifting transforms a function application that contains as arguments further, nested, function applications into a new function application without nesting. The transformation thus essentially folds some function applications. The applications to be folded are selected on the basis of a set of conditions, which have been chosen such that thunk-lifting never increases the amount of heap space required by a transformed program.

Thunk-lifting has been implemented and applied to a number of medium-size benchmark programs. The results show that the number of cell claims in the heap decreases on average by 5%, with a maximum of 16%.

1 Introduction

Graph reduction [11] is a technique for implementing lazy functional languages. An expression is represented as a graph that is located in the heap. During each reduction step, the evaluator performs a transformation on the graph. The transformation process terminates as soon as there are no more reducible expressions left. Much of the creation and interpretation of a graph is realised at run time, which requires time and space. Thus any method to avoid building graph in the heap and subsequently reducing it may improve the situation. Thunk-lifting is such a method. It is an optimisation that is used with the FAST compiler.
FAST (Functional programming on ArrayS of Transputers) is an optimising compiler [5] for a lazy functional language, which is basically a subset of Miranda [15]. The FAST compiler translates lazy functional programs to a subset of C called functional C [9]. For each function in the functional program a corresponding C function is generated. The run-time system is based on the G-machine [8].

The thunk-lifting program transformation lifts certain nested expressions, which at run time will be represented as thunks, to the top level. This makes it possible for the compiler to avoid building graph (suspensions) for thunks. A thunk is a special suspension that satisfies criteria that will be developed in Section 2. Section 3 presents some experiments. Section 4 compares thunk-lifting in the FAST compiler and the equivalent of thunk-lifting in the Spineless Tagless G-machine (STG [12]). Section 5 presents the conclusions.

2 Thunk-lifting

Thunk-lifting is a transformation on a lazy functional program. It tries to achieve the following aim:

    The generated code for the transformed program, when executed, must allocate
    less space than the code for the original program.

Thunk-lifting needs information which is provided by compile-time strictness analysis. Consider the following function definitions:

> h1 c d       = (plus (square c) (square d)) : NIL
> plus (x) (y) = x + y
> square (x)   = x * x

Here the strict arguments, on the left-hand side of the definitions, have been annotated by enclosing the arguments in parentheses. The FAST compiler translates the definitions into C code as follows:

    h1(c, d)
    {
        return cons(vap2(plus', vap1(square', c), vap1(square', d)), nil);
    }

    plus(x, y)                      plus'(x, y)
    {                               {
        return x + y;                   return plus(reduce(x), reduce(y));
    }                               }

    square(x)                       square'(x)
    {                               {
        return x * x;                   return square(reduce(x));
    }                               }

The library functions vap1, vap2, … build a suspension for the functions plus' and square'.
The suspended functions are so-called prelude functions; they are different from the original functions plus and square. The library function reduce evaluates a previously built suspension. To evaluate a suspension, of square' say, reduce calls the prelude function square'. The prelude function makes sure that the strict arguments are in reduced form. This is done by invoking reduce on all strict arguments (x in this case). When all strict arguments have been reduced, the prelude function calls the original function (square). This mechanism can be optimised in a number of ways [4], but for the present discussion this simplified description suffices.

The run-time graph built by the function h1 is shown in Figure 1(a). The root node of the graph is a cons cell. Its right child (tail) is NIL and its left child (head) is a suspension which is represented by a vap3-node. A vap-node is the graph representation of an expression which is built by the library function vap. The vap3-node has three children, which, from left to right, are the name of the function plus (the pointer to the code for plus) and the suspensions for the expressions (square c) and (square d). The compiler knows that the constructor function cons is not strict in its arguments. Therefore code is generated which builds a suspension for plus and its arguments, (square c) and (square d), and which thus postpones the evaluation of plus and square. We call the suspensions for (square c) and (square d) thunks. In Figure 1(a), we see that the thunks for (square c) and (square d) are inside the suspension for plus. The question is:

    Is it possible to lift the thunks for (square c) and (square d) to the top level?
    In other words, is it possible to generate straight calls to the function square
    instead of building suspensions for them?

2.1 When is thunk-lifting beneficial?

Thunk-lifting may be performed when a function call occurs in a non-strict context. However, thunk-lifting of an expression creates an extra function definition. Thus the code of the transformed program
becomes larger and perhaps less efficient. In the following sections, we give more examples which lead to the criteria on which the thunk-lifting transformation is based. The results are summarised in Section 2.1.3.

2.1.1 A thunk must occur in a strict argument position

The arguments of a function f, whose call appears in a non-strict context, may contain suspensions. Not all these suspensions are thunks. Suspensions which occur in the non-strict arguments of f will also be built by the transformed version. Consider the following function definitions:

> h2 c d      = (first c (square d)) : NIL
> first (a) b = a

The corresponding C code is:

    h2(c, d)
    {
        return cons(vap2(first', c, vap1(square', d)), nil);
    }

    first(a, b)                     first'(a, b)
    {                               {
        return a;                       return first(reduce(a), b);
    }                               }

Thunk-lifting yields:

> h2T c d      = (firstT c d) : NIL
> firstT (x) y = first x (square y)

The corresponding C code is now:

    h2T(c, d)
    {
        return cons(vap2(firstT', c, d), nil);
    }

    firstT(x, y)                            firstT'(x, y)
    {                                       {
        return first(x, vap1(square', y));      return firstT(reduce(x), y);
    }                                       }

The function first is strict in its first argument and non-strict in its second argument. Thunk-lifting creates a new function firstT whose body consists of a call to the function first. We see that the compiler makes a suspension for the expression (square y) occurring in the body of the new function firstT. In the generated C code for the original program (body of h2), a suspension has been made for the same expression (square d) as well. As a result, an extra function firstT is created, whereas we do not avoid building graph in the heap. We conclude that if a suspension occurs in a non-strict position, thunk-lifting should not be applied to that suspension.

2.1.2 A transformed program must not claim more space

Each vap-node (cell) contains a pointer to the code of the suspended function and pointers to the arguments of the function. To compute the size of each cell, we assume that each pointer occupies one heap location. In the FAST system this is a 32-bit word.

The transformed program may occupy more heap locations than the original
version if the number of free variables occurring in the arguments of the suspended functions is large. Consider the following function definitions:

> h3 p q k s r       = (second (f3 q k s r) (square p)) : NIL
> second a (b)       = b
> f3 (a) (b) (c) (d) = (a + b) - (c * d)

The corresponding C code is:

    h3(p, q, k, s, r)
    {
        return cons(vap2(second', vap4(f3', q, k, s, r), vap1(square', p)), nil);
    }

    second(a, b)                    second'(a, b)
    {                               {
        return b;                       return second(a, reduce(b));
    }                               }

    f3(a, b, c, d)                  f3'(a, b, c, d)
    {                               {
        return (a + b) - (c * d);       return f3(reduce(a), reduce(b), reduce(c), reduce(d));
    }                               }

Thunk-lifting yields:

> h3T p q k s r       = (secondT p q k s r) : NIL
> secondT (p) q k s r = second (f3 q k s r) (square p)

The corresponding C code is now:

    h3T(p, q, k, s, r)
    {
        return cons(vap5(secondT', p, q, k, s, r), nil);
    }

    secondT(p, q, k, s, r)                      secondT'(p, q, k, s, r)
    {                                           {
        return second(vap4(f3', q, k, s, r),        return secondT(reduce(p), q, k, s, r);
                      square(p));
    }                                           }

Consider the generated C code for the original program. The function second occurs in a non-strict context. Therefore suspensions are built for the function second as well as its arguments (f3 q k s r) and (square p). The suspension for the function second occupies 3 heap locations and the suspensions for its first and second arguments occupy respectively 5 and 2 heap locations.
As a result, the application of second in the original program occupies 3 + 5 + 2 = 10 heap locations.

The compiler knows that second is not strict in its first argument. In the generated C code for the transformed function secondT, a suspension is made for the first argument of second, namely (f3 q k s r), which occupies 5 heap locations. No suspension is built for its second argument (square p) because it occurs in a strict context. Thus we have succeeded in saving 2 heap locations. But in the generated C code for h3T, a suspension is built for the new function secondT, which has 5 arguments and thus occupies 6 heap locations. The total number of heap locations that is used by the transformed program is 6 + 5 = 11, which is more than for the original program.

    prog ::= fd_1 ;; … ;; fd_n                          program
    fd   ::= f x_1 … x_n = e                            function definition
    e    ::= let v_1 = se_1 ; … ; v_k = se_k in se      top-level let-expression
    se   ::= f se_1 … se_n                              function application
          |  variable
          |  constant

    Figure 2: Abstract syntax of the thunk-lifter input language

2.1.3 The definition of a thunk

Now we can formulate the criteria for thunk-lifting which are guaranteed to improve the efficiency:

1. A lifted function must occur in a non-strict context.
2. At least one of the arguments of a lifted function must be a function application. This will be called a composite argument.
3. The lifted function must be strict in that argument.
4. The total number of heap locations allocated by the transformed program should be less than the number of heap locations allocated by the original program.

Each composite argument that satisfies the above criteria is defined to be a thunk.

2.2 Developing a formula for thunk-lifting

The abstract syntax of the input language for the thunk-lifting transformation is a simple functional language as given in Figure 2. Programs in this form are produced as a result of compiling away more elaborate syntax. A thunk-lifter input program consists of a set of function definitions. Without loss of generality we may assume that let-expressions occur only at the top level. A constant can be a number, boolean, character or a string. No pattern
matching is permitted for function arguments and no local recursive definitions are allowed. Data structures are built and accessed using primitive functions such as cons and hd. The results of strictness analysis are present in the form of annotations, which are not shown in the abstract syntax.

To guarantee that thunk-lifting is beneficial, the total size of the vap-nodes built by the transformed program must be less than the size of the vap-nodes built by the original program. The schemes in Figure 3 compute the total size of the vap-nodes built by a program. These vap-nodes are built only for expressions which are function applications, and then only when they occur in a non-strict context.

To support the development of a formula on which to base the thunk-lifting strategy, we give three auxiliary functions. These definitions use the schemes in Figure 3. The first function returns the list of expressions that appear in the strict argument positions of a function:

    strict (f e_1 … e_n) = [ e_i | 1 <= i <= n, f is strict in its i-th argument position ]

Thunk-lifting should only take place if an expression appears in a strict argument position. Such an expression must not be a single constant or variable, but it must be a “composite”
Such an expression must not be a single constant or variable,but it must be a“composite”::1let11;............;in11 ::The new program fragment pT uses heap locations to the amount of:pT fT112Here f1.Comparing the two expressions(1)and(2),we can formalise the criterion for thunk-lifting as follows:pTSimplification yields:pTsubstitute(1)and(2)11subtract(1)from both sidesrearrangementThe inequality()denotes the criterion on which the thunk-lifting strategy is based.It means that thunk-lifting of f1is beneficial when the size of the vap-nodes,occurring in the strict arguments of f,is larger than the difference between the number of free variables in f1and the number of arguments of f.The formal thunk-lifting criterion governs the transformation of just one expression in an entire program.The thunk-lifting of one expression forms a new program in which the next expression is considered.This gives rise to a chain of programs012.The differences between and1are due to thunk-lifting of precisely one expression that occurs in.Since the thunk-lifting criterion guarantees that1allocates less heap space than the program,we may conclude that allocates less heap space than0.It is impractical to actually implement thunk-lifting by considering just one function at the time.Our implementation considers all unrelated expressions at the same time,and repeats this process until no more expressions can be lifted.The net result of this procedure will not affect thefinal form of the thunk-lifted program.Before moving on to the experiments we should like to point out that thunk-lifting optimises for space.This is often beneficial for the execution time as well.However,since a thunk-lifted program will contain more,but smaller functions,execution time may increase in some cases. 
To take into account the effects of breaking up large functions into smaller ones, a model would be required which allows the compiler to reason about execution times. Such a model would not only account for, say, the cost of a function call, but it would also take into account the effects that breaking up the program has on the arrangement of the code in memory. This is necessary to deal with possible effects on the cache. This is perhaps not impossible, but it seems that before embarking on such an exercise one should first consider the scope of the effects that pure space-oriented thunk-lifting has on some real programs. Should the effects be large, then further investigations are justified.

3 Experiments

To test the effect of the thunk-lifting, a number of benchmark programs [6] have been transformed, compiled and executed. The benchmark programs are applications from different areas. The benchmark set contains small and medium size programs, each of which runs on a realistic input data set. The largest program comprises 653 lines. There are a few numerical programs.

(Table: total heap space used by the original and the transformed versions of the benchmark programs.)

Errors in the execution times are probably larger than 5%. The behaviour of a complex architecture is very difficult to capture in just two simple parameters. Caches for instance are notorious for causing this sort of behaviour. On one occasion we found that only a very slight modification of a large program, which consisted of the removal of two unused functions, caused it to run two times slower. Similar results are reported in [3].

4 The equivalent of thunk-lifting in the STG machine

There are several abstract machine designs to support functional languages. Examples are the G-machine [8] and the Spineless Tagless G-machine (STG) [12]. They have in common the property
that each function, in a functional program, is compiled into an instruction sequence for these abstract machines. The two machine models differ in the way laziness is implemented. It is interesting to see whether thunk-lifting applies as well to the STG machine as it does to the G-machine, for which it has been developed.

4.1 The STG language

The STG language (see [12] for the syntax of the STG language) is the abstract machine code for the Spineless Tagless G-machine. An STG program is just a collection of bindings. There is a special form of binding whose general form is:

    f = {v1,...,vn} \pi {x1,...,xm} -> e

where v1,...,vn are the free variables occurring in e and x1,...,xm are the parameters of the function. From an operational point of view, f is bound to a heap-allocated closure. A closure is entered by loading a pointer to it into a special register called Node and jumping to the code pointer in the closure. The code accesses its free variables via Node. The update flag pi indicates whether the closure should be updated when it reaches its normal form. The closure is updatable when the update flag is \u and it is non-updatable if the flag is \n [12].

The STG language supports boxed as well as unboxed values. An unboxed value is the bit-pattern representing the value itself, on which the built-in machine instructions operate. A boxed value is a pointer to a heap-allocated box containing an unboxed value [13]. In the STG language, the primitive integers 0#, 1#, ... are unboxed values and the primitive operators +#, -#, *# and /# operate only on unboxed values. When a variable of unboxed type is bound, the expression to which it is bound must be evaluated immediately. Since let and letrec expressions always build closures, a variable of unboxed type cannot be bound to these expressions.
Instead, such a binding can be made using a case expression, because case expressions always perform evaluation. As an example, we give the Miranda and the STG version of the function map:

>map f [] = []
>map f (y:ys) = (f y) : (map f ys)

map = {} \n {f,xs} ->
        case xs of
          Nil {}      -> Nil {}
          Cons {y,ys} -> let fy  = {f,y}  \u {} -> f {y}
                             mfy = {f,ys} \u {} -> map {f,ys}
                         in Cons {fy,mfy}

4.2 Implicit evaluation of strict arguments is made explicit

The function map is strict in its second argument, which is of the algebraic data type list. In the STG version of the function map, the strict argument (xs) of map is evaluated explicitly using a case expression. When the strict arguments are numbers their evaluation is implicit. Consider the function plus and its (non-optimised) STG version:

>plus x y = x + y

plus = {} \n {x,y} -> + {x,y}

The arguments x and y must be evaluated before they can be added, but this fact is implicit. To make the evaluation of x and y explicit, the following data type can be declared:

int ::= MkInt int#

This declares the data type of (boxed) integers, int, as an algebraic data type with a single constructor, MkInt. The latter has a single argument of type int#, the type of unboxed integers. So the value (MkInt 2#) represents the boxed integer 2, and 2# stands for the unboxed constant 2, of type int#. Since the STG language supports unboxed values, the evaluation of the arguments of the function plus can be made completely explicit as follows:

plus = {} \n {x,y} ->
         case x of
           MkInt x# -> case y of
                         MkInt y# -> case (plus# x# y#) of
                                       t# -> MkInt t#

plus# = {} \n {x#,y#} -> x# +# y#

The arguments x and y are now explicitly evaluated by case expressions and plus# is used, which produces an unboxed number. The final result is boxed again. To exploit the information obtained from strictness analysis, a transformational framework has been presented in [13]. In this framework, a function is transformed to a semantically equivalent version in which the strict arguments are explicitly evaluated by case expressions.
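The boxing discipline behind this split can be imitated in Python, modelling a boxed integer as a tagged tuple; the representation and all names are illustrative assumptions, not part of the STG machine:

```python
# Illustrative model of explicit boxing/unboxing; a boxed integer
# is modelled as the tagged tuple ('MkInt', n).

def mk_int(n_u):
    # Box an unboxed value.
    return ('MkInt', n_u)

def plus_worker(x_u, y_u):
    # plus#: operates on unboxed values only.
    return x_u + y_u

def plus_boxed(x, y):
    # plus: unbox both arguments, do the real work, box the result.
    (_, x_u), (_, y_u) = x, y
    return mk_int(plus_worker(x_u, y_u))

print(plus_boxed(mk_int(2), mk_int(3)))   # -> ('MkInt', 5)
```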
Using the rules of the transformational framework in [13], the function plus is split into two functions called wrapper and worker. The types of the two functions are given below:

    wrapper function  plus  : int  -> int
    worker function   plus# : int# -> int#

The wrapper function takes a boxed type (integer), extracts the unboxed value from the box and gives it to the worker function. The latter does the real work. It explicitly evaluates all strict arguments before passing them to functions. So far we have shown how the STG machine exploits the results of strictness analysis. In the following section, we will use the results of strictness analysis to decrease the number of closures built by let expressions.

4.3 Reducing the number of closures in the STG machine

In previous sections, we have considered thunk-lifting in connection with the FAST compiler, which is based on a variant of the G-machine. The goal of the transformation was decreasing the number of vap-nodes that are constructed by a program at run-time. The aim of such a transformation in the STG machine can be decreasing the number of closures that are constructed in the heap at run-time. In the STG language, let expressions construct closures in the heap at run-time. Consider the function h1 as defined in Section 2:

>h1 c d = (plus (square c) (square d)) : NIL

The non-optimised STG version of the function h1 looks as follows:

h1 = {} \n {c,d} ->
       let ps  = {c,d} \u {} ->
                   let sc = {c} \u {} -> square {c}
                       sd = {d} \u {} -> square {d}
                   in plus {sc,sd}
           nil = {} \n {} -> Nil {}
       in Cons {ps,nil}

The constructor function cons is not strict in its arguments, hence closures for ps and for nil are constructed. If the first argument of cons is needed later, in evaluated form (head of the cons cell), the closure for ps must be entered. The code then builds closures for sc and sd.
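The effect on closure counts can be illustrated with a small Python model that charges one allocation per suspension; the counter, the helper names, and the lifted variant that evaluates the nested applications inside a single suspension are all illustrative assumptions:

```python
# Illustrative count of the suspensions (updatable closures) built
# when the head of the result list is demanded.

built = [0]

def suspend(code):
    # Model of allocating a \u closure; forcing it = calling it.
    built[0] += 1
    return code

def square(x): return x * x
def plus(x, y): return x + y

def h1(c, d):
    # Original: three suspensions (sc, sd and ps) are allocated
    # before the head of the list is fully evaluated.
    sc = suspend(lambda: square(c))
    sd = suspend(lambda: square(d))
    ps = suspend(lambda: plus(sc(), sd()))
    return (ps, None)

def h1_lifted(c, d):
    # Lifted: one suspension evaluates the strict arguments of
    # plus directly, so sc and sd are never allocated.
    ps = suspend(lambda: plus(square(c), square(d)))
    return (ps, None)

built[0] = 0; print(h1(2, 3)[0](), built[0])          # -> 13 3
built[0] = 0; print(h1_lifted(2, 3)[0](), built[0])   # -> 13 1
```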
In cases such as this, the construction of the closures for sc and for sd can actually be avoided by thunk-lifting. The function plus is strict in its arguments, and the arguments (square c) and (square d) can be evaluated explicitly using case expressions. It is possible to apply the principles of thunk-lifting in several different ways. Let us consider the same route which has been taken in the previous sections, in relation with the FAST compiler. This means that we have to transform the function h1 and generate a new function as follows:

>h1T c d = (plusT c d) : NIL
>plusT x y = plus (square x) (square y)

The corresponding STG versions of these functions are as follows:

h1T = {} \n {c,d} ->
        let ps' = {c,d} \u {} -> plusT {c,d}
            nil = {} \n {} -> Nil {}
        in Cons {ps',nil}

plusT = {} \n {c,d} ->
          case c of
            MkInt c# -> case d of
              MkInt d# -> case (square# c#) of
                sc# -> case (square# d#) of
                  sd# -> case (plus# sc# sd#) of
                    ps# -> MkInt ps#

The functions plus# and square# are the worker functions which take the unboxed and evaluated arguments. No closures are built for the expressions (square c) and (square d). They are evaluated explicitly by case expressions. Since the STG machine binds all top-level functions (globals) to a statically allocated closure, a new closure must be allocated for the newly generated (global) function plusT. In addition, the generation of extra functions requires time. Now the question is: is it possible to generate optimised STG code without generating new functions? The answer is yes. By simply unfolding the function plusT, which occurs in the body of the closure for ps', we get the following optimised STG version of the function h1:

h1T = {} \n {c,d} ->
        let ps' = {c,d} \u {} ->
                    case c of
                      MkInt c# -> case d of
                        MkInt d# -> case (square# c#) of
                          sc# -> case (square# d#) of
                            sd# -> case (plus# sc# sd#) of
                              ps# -> MkInt ps#
            nil = {} \n {} -> Nil {}
        in Cons {ps',nil}

When the closure for ps' is entered, a pointer to it is loaded into Node and then a jump is made to the code, which begins with the evaluation of c. The important point is that the code can access the value of the free variables c
and d via Node. Closures in the STG machine play the role of an environment via which the value of the free variables can be accessed. In the case of the FAST compiler, thunk-lifting has to generate new functions to access the value of the free variables occurring in the code. This is perhaps not the most natural way to apply thunk-lifting in the context of the STG machine, but it does show the benefits of the method. If one were to implement thunk-lifting in an STG-based compiler, slightly different criteria and transformations would be used instead.

5 Conclusions

Thunk-lifting transforms a function application that contains as arguments further, nested function applications into a new function application without nesting. The transformation folds function applications, which are selected on the basis of a set of conditions. The conditions take a number of properties of functions into account, such as the (non-)strictness of the argument positions of the functions involved. Also the amount of space required to build suspended function applications, and the number of arguments as well as the number of free variables in the expressions, are taken into account by the conditions. This makes it possible to guarantee that thunk-lifting never increases the amount of heap space required by a program. On average, the transformed versions of a set of medium size benchmark programs require 5% less heap space than the original versions, with a maximum of 16%.

Thunk-lifting may on the other hand increase the number of function calls, though in our experiments we have found such increases to occur rarely, and if they occur the effect is small. Thunk-lifting causes most of the transformed programs to run faster.

Thunk-lifting is shown to be applicable to the G-machine as well as the STG machine. These are both abstract machine designs underlying the implementation of many lazy functional languages. The transformation thus has a wide range of applicability.
Acknowledgements

We thank Marcel Beemster and the referees for their comments on a draft version of the paper. The FAST compiler represents joint work with Hugh Glaser and John Wild, which was supported by the Science and Engineering Research Council, UK, under grant No. GR/F35081, FAST: Functional programming for ArrayS of Transputers.

References

[1] Using divide and conquer for parallel geometric evaluation. PhD thesis, School of Computer Studies, Univ. of Leeds, England, Sep 1992.

[2] J. Glas, R. F. H. Hofman, and W. G. Vree. Parallelization of branch-and-bound algorithms in a functional programming environment. In H. Kuchen and R. Loogen, editors, 4th Parallel implementation of functional languages, pages 47-58, Aachen, Germany, Sep 1992. Aachener Informatik-Berichte 92-19, RWTH Aachen, Fachgruppe Informatik.

[3] K. Hammond, G. L. Burn, and D. B. Howe. Spiking your caches. In K. Hammond and J. T. O'Donnell, editors, Functional programming, pages 58-68, Ayr, Scotland, Jul 1993. Springer-Verlag, Berlin.

[4] P. H. Hartel, H. W. Glaser, and J. M. Wild. On the benefits of different analyses in the compilation of functional languages. In H. W. Glaser and P. H. Hartel, editors, 3rd Implementation of functional languages on parallel architectures, pages 123-145, Southampton, England, Jun 1991. CSTR 91-07, Dept. of Electr. and Comp. Sci., Univ. of Southampton, England.

[5] P. H. Hartel, H. W. Glaser, and J. M. Wild. Compilation of functional languages using flow graph analysis. Software - Practice and Experience, 24(2):127-173, Feb 1994.

[6] P. H. Hartel and K. G. Langendoen. Benchmarking implementations of lazy functional languages. In 6th Functional programming languages and computer architecture, pages 341-349, Copenhagen, Denmark, Jun 1993. ACM.

[7] P. H. Hartel and W. G. Vree. Arrays in a lazy functional language - a case study: the fast Fourier transform. In G. Hains and L. M. R. Mullin, editors, 2nd Arrays, functional languages, and parallel systems (ATABLE), pages 52-66. Publication 841, Dept. d'informatique et de recherche opérationnelle, Univ. de Montréal, Canada, Jun 1992.

[8] T. Johnsson. Compiling lazy functional languages. PhD thesis, Dept. of
Comp. Sci., Chalmers Univ. of Technology, Göteborg, Sweden, 1987.

[9] K. G. Langendoen and P. H. Hartel. FCG: a code generator for lazy functional languages. In U. Kastens and P. Pfahler, editors, Compiler construction (CC'92), LNCS 641, pages 278-296, Paderborn, Germany, Oct 1992. Springer-Verlag, Berlin.

[10] H. L. Muller. Simulating computer architectures. PhD thesis, Dept. of Comp. Sys., Univ. of Amsterdam, Feb 1993.

[11] S. L. Peyton Jones. The implementation of functional programming languages. Prentice Hall, Englewood Cliffs, New Jersey, 1987.

[12] S. L. Peyton Jones. Implementing lazy functional languages on stock hardware: the spineless tagless G-machine. Journal of Functional Programming, 2(2):127-202, Apr 1992.

[13] S. L. Peyton Jones and J. Launchbury. Unboxed values as first class citizens in a non-strict functional language. In R. J. M. Hughes, editor, 5th Functional programming languages and computer architecture, LNCS 523, pages 636-666, Cambridge, Massachusetts, Sep 1991. Springer-Verlag, Berlin.

[14] Q. F. Stout. Supporting divide-and-conquer algorithms for image processing. Journal of Parallel and Distributed Computing, 4(1):95-115, Feb 1987.

[15] D. A. Turner. Miranda system manual. Research Software Ltd, 23 St Augustines Road, Canterbury, Kent CT1 1XP, England, Apr 1990.

[16] W. G. Vree. Design considerations for a parallel reduction machine. PhD thesis, Dept. of Comp. Sys., Univ. of Amsterdam, Dec 1989.

[17] H. H. Wang. A parallel method for tri-diagonal equations. ACM Transactions on Mathematical Software, 7(2):170-183, Jun 1981.