Java Runtime Systems Characterization and Architectural Implications


Java Runtime Parameter Settings


1. Overview

Java supports the following kinds of runtime options:
- Standard options (-): every JVM implementation must provide these, and they are backward compatible.
- Non-standard options (-X): the default JVM provides these, but not every JVM implementation is guaranteed to support them, and backward compatibility is not guaranteed.
- Non-stable options (-XX): these differ from one JVM implementation to another and may be withdrawn at any time, so use them with caution.

2. Standard options

Standard options fall into several groups: execution-mode options such as -server and -client; class and jar path options such as -cp and -classpath; debugging options such as the assertion switches (-ea and -da) and the -verbose family (-verbose:class, -verbose:gc, and so on); and the -D option for setting system properties.

2.1 Execution-mode options

On -client versus -server: running the JVM in server mode can improve performance considerably, but application startup is roughly 10 percent slower than in client mode.

When neither option is given, the VM checks at startup whether the host is a server-class machine; if so, it starts in server mode, otherwise in client mode.

Client mode starts quickly and server mode starts slowly, but once the program has warmed up into steady-state operation it runs much faster under server mode than under client mode.

This is because server mode starts the heavyweight virtual machine, which applies more optimizations to the program, whereas client mode starts the lightweight virtual machine.

So server mode is slow to start, but much faster than client mode once it stabilizes.

2.2 Class and jar path options

-cp: class search path of directories and zip/jar files. -classpath: the same as -cp.

Both take a list of directories, JAR archives, and ZIP archives, separated by : (on Unix), which is searched for class files.

2.3 Debugging options

(1) -verbose:class. How many classes does a program actually load while it runs? Even a simple program loads hundreds! You can watch this with -verbose:class: run java -verbose:class XXX (where XXX is the program name) and the console shows each class as it is loaded.
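The -D option above can be sketched with a short program; this is a minimal example (the class name and the property key app.mode are illustrative, not part of any standard), showing how a value passed on the command line as -Dapp.mode=debug is read back with System.getProperty:

```java
// Run as:  java -Dapp.mode=debug PropertyDemo
public class PropertyDemo {
    static String mode() {
        // Falls back to "default" when the property was not set on the command line.
        return System.getProperty("app.mode", "default");
    }

    public static void main(String[] args) {
        System.out.println("app.mode = " + mode());
    }
}
```

Without the -D flag the program prints "app.mode = default"; with -Dapp.mode=debug it prints the supplied value.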

Java Exception Handling: Runtime Exceptions (RuntimeException) Explained with Examples


RuntimeException and its common subclasses, with the coding habit that avoids each one:
- ClassCastException: in polymorphic code, guard casts with an instanceof check.
- ArithmeticException: use an if check and return early when the divisor is 0.
- NullPointerException: use an if check for null.
- ArrayIndexOutOfBoundsException: use the array's length field to stay within bounds.
All of these can be avoided through good programming habits.
1. When a runtime exception occurs, there is no need to handle it; go straight to the offending code and fix it.

2. As with a toothache, find the root cause and cure it yourself. 3. The compiler does not check whether the programmer handles this kind of exception. 4. A runtime exception does not need to be declared on the method.
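The four defensive checks listed above can be sketched in one class; this is a minimal illustration (the class and method names are made up for the example):

```java
public class GuardDemo {
    static int safeDiv(int x, int y) {
        if (y == 0) return 0;                    // avoids ArithmeticException
        return x / y;
    }

    static int safeLen(String s) {
        if (s == null) return 0;                 // avoids NullPointerException
        return s.length();
    }

    static int safeGet(int[] a, int i) {
        if (i < 0 || i >= a.length) return -1;   // avoids ArrayIndexOutOfBoundsException
        return a[i];
    }

    static String asString(Object o) {
        if (o instanceof String) return (String) o;  // avoids ClassCastException
        return null;
    }

    public static void main(String[] args) {
        System.out.println(safeDiv(10, 0));      // returns a fallback instead of crashing
        System.out.println(safeGet(new int[]{7, 8}, 5));
    }
}
```

Each method replaces a potential runtime exception with an explicit fallback value; in real code the fallback would follow your business rules rather than a sentinel.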

A worked example:
1. A division method, div(int x, int y).
2. An if check: if the divisor is 0, throw new ArithmeticException().
3. The method declares throws ArithmeticException.
4. main calls div without handling the exception.
5. The program compiles and runs normally.
6. If the divisor is 0, the exception is reported and the program stops.

7. Since this is a runtime exception, declaring it on the method is unnecessary.

A note on the JDK itself: Object's wait() method internally throws two exceptions, IllegalMonitorStateException and InterruptedException, yet declares only one of them (throws InterruptedException); IllegalMonitorStateException is a runtime exception and is not declared.

    class Demo {
        public static void main(String[] args) {
            div(2, 1);
        }

        public static void div(int x, int y) {
            if (y == 0) {
                throw new ArithmeticException();
            }
            System.out.println(x / y);
        }
    }

Thanks for reading; we hope this helps.

How Java RuntimeExceptions Are Triggered


In Java programming, exception handling is a very important topic.

All exceptions in Java derive from the java.lang.Throwable class, which has two subclasses: Error and Exception.

Error usually indicates a serious problem such as a system error or resource exhaustion, while Exception represents recoverable problems that arise while the program runs.

RuntimeException is a direct subclass of Exception. It is an unchecked exception, which means the compiler does not force us to catch it.

So what is a RuntimeException, and how is one triggered? This article answers these questions step by step.

1. What is a RuntimeException?

In the Java language, RuntimeException is a special exception type covering the "normal" error conditions that can be thrown on the Java Virtual Machine.

These exceptions are usually caused by programming errors, such as dividing by zero, indexing an array out of bounds, or dereferencing a null pointer.

2. Why learn about RuntimeException?

Although RuntimeExceptions are caused by programming errors, we cannot entirely prevent them from happening.

It is therefore important to know and practice how to handle them.

Knowing how a RuntimeException is triggered also helps us avoid this class of problem when writing code.

3. How is a RuntimeException triggered?

1. Null pointer exception (NullPointerException). This is one of the most common RuntimeExceptions.

It is thrown when you access a field of, or call a method on, a null reference.

For example:

    String str = null;
    System.out.println(str.length());

Here str is null, yet we call its length() method, so a NullPointerException is thrown.

2. Array index out of bounds (ArrayIndexOutOfBoundsException). This exception is thrown when you try to access an array element that does not exist.
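Both triggers can be observed safely in one program; this is a small sketch (the class name and the trigger helper are invented for the example) that deliberately provokes each exception inside a try/catch and reports which one occurred:

```java
public class TriggerDemo {
    // Runs a task and reports the simple name of any RuntimeException it throws.
    static String trigger(Runnable r) {
        try {
            r.run();
            return "no exception";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Null pointer: calling a method on a null reference.
        System.out.println(trigger(() -> { String t = null; t.length(); }));
        // Out of bounds: index 5 in an array of length 2.
        System.out.println(trigger(() -> { int[] a = new int[2]; int x = a[5]; }));
    }
}
```

The first line prints NullPointerException and the second prints ArrayIndexOutOfBoundsException; catching them here is only for demonstration, since in real code you would fix the underlying bug instead.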

Using the System Class in Java


In Java, System is a commonly used class. It contains many useful methods and fields that a program can use to obtain information and carry out tasks.

This article describes how the System class is used in Java and why it matters.

1. Why the System class matters

The System class in Java provides methods and fields that let us query the program's runtime environment and perform system-related tasks.

Its value is that these methods and fields let us write more flexible programs, which can improve a program's efficiency and performance.

2. Common System methods

1. System.currentTimeMillis(): returns the current system time in milliseconds. It is very widely used, for example for timing and for recording how long a program runs.

2. System.out.println(): prints its argument to the console; very commonly used for debugging and for printing a program's results.

3. System.exit(int status): forcibly ends the running program; status is the exit code returned when the program terminates. It is often used when handling fatal errors or shutting the program down.

4. System.getProperty(String key): returns a system property, for example the operating system type or the Java version.
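The two most common of these, getProperty and currentTimeMillis, can be combined in a short sketch (the class name and the timeIt helper are made up for this example; os.name and java.version are standard JDK property keys):

```java
public class SystemInfoDemo {
    // Measures how long a task takes, in milliseconds.
    static long timeIt(Runnable work) {
        long start = System.currentTimeMillis();
        work.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        System.out.println("os.name      = " + System.getProperty("os.name"));
        System.out.println("java.version = " + System.getProperty("java.version"));

        long ms = timeIt(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });
        System.out.println("loop took " + ms + " ms");
    }
}
```

Note that currentTimeMillis reflects wall-clock time; for fine-grained benchmarking, System.nanoTime() is usually preferred.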

5. System.in.read(): reads one byte from standard input, typically used to read user input from the console.

3. A worked example

The short program below demonstrates the System class by getting the current system time and printing it to the console:

    import java.util.Date;

    public class SystemDemo {
        public static void main(String[] args) {
            long now = System.currentTimeMillis();
            Date date = new Date(now);
            System.out.println("Current system time: " + date);
        }
    }

As the program shows, the System class is simple to use and very practical.

Java Terms in Chinese Translation


As a widely used programming language, Java's distinctive character and power have made it a first choice for developers and companies.

For beginners, however, understanding and mastering Java is not always easy.

This article walks step by step through the Chinese translations of common Java terms, to help you get a better grip on the language.

The first step is to learn the Chinese names for the elements of a Java program.

Below is a reference list of common Java keywords and terms with their Chinese equivalents:

1. 源代码 (source code): the Java code the programmer writes.

2. 编译 (compile): translating source code into machine-language code the computer can execute.

3. 编译器 (compiler): the software tool that turns source code into an executable file.

4. 运行时 (runtime): the period during which the program executes on the computer.

5. 类 (class): the basic unit of a Java program, containing data and methods.

6. 对象 (object): a concrete instance; the materialization of a class.

7. 方法 (method): a function defined in a class that performs a specific task.

8. 变量 (variable): a memory area used to store data.

9. 参数 (parameter): an input value a method receives.

10. 返回值 (return value): the value a method hands back to its caller after it runs.

Once you know the Chinese equivalents of these common terms, you can start writing code and translating it.

The second step is translating the comments and identifiers in code.

Comments explain and document code and are normally used to improve its readability.

Identifiers are the names that label variables, methods, classes, and other elements.

Below are suggested Chinese names for some common comment and identifier terms:

1. 单行注释 (single-line comment): a comment on a single line, introduced with the // marker.

2. 多行注释 (multi-line comment): a comment spanning several lines, wrapped in /* ... */ markers.

3. 标识符规则 (identifier rules): Java's naming rules for identifiers; for example, a name must begin with a letter (or an underscore or dollar sign) and may continue with letters, digits, or underscores.
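The comment styles and identifier rules just described can be seen together in a tiny sketch (the class, fields, and method are invented for illustration):

```java
public class IdentifierDemo {
    // single-line comment: everything after // on this line is ignored

    /* multi-line comment:
       spans several lines until the closing marker */

    static int _count = 0;   // identifiers may start with a letter, '_', or '$'
    static int $total = 3;   // digits are allowed after the first character

    static int sum2() {
        return _count + $total;
    }

    public static void main(String[] args) {
        System.out.println(sum2());
    }
}
```

Leading underscores and dollar signs are legal but discouraged by convention; they appear here only to demonstrate the rule.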

A Custom Global Exception Handler (Java)


In a typical business system with a separated frontend and backend, the API must return an error message even when the system hits an unknown exception, and the exception status codes and messages need to follow the business rules.

So a global exception handler is needed.
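Web frameworks provide their own hooks for this (for example a controller-advice class in Spring), but the idea can be sketched without any framework using the JVM's default uncaught-exception handler; this minimal sketch (all names invented) installs one global fallback that records any exception no try/catch dealt with:

```java
import java.util.concurrent.atomic.AtomicReference;

public class GlobalHandlerDemo {
    static final AtomicReference<String> lastError = new AtomicReference<>("none");

    // Install a JVM-wide fallback for exceptions no handler caught.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, e) ->
            lastError.set(e.getClass().getSimpleName() + ": " + e.getMessage()));
    }

    // Run a task on a fresh thread and report what the global handler captured.
    static String runAndCapture(Runnable task) {
        install();
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();  // the handler has run by the time join() returns
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return lastError.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture(() -> { throw new IllegalStateException("boom"); }));
    }
}
```

A real global handler would map each exception type to a business status code and message instead of a string, but the shape is the same: one central place receives everything that escaped.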

Background: below is the Java exception inheritance diagram:

                            Object
                              ▲
                              │
                          Throwable
                              ▲
                  ┌───────────┴───────────┐
                  │                       │
                Error                 Exception
                  ▲                       ▲
            ┌─────┘               ┌───────┴────────┐
            │                     │                │
    OutOfMemoryError ...   RuntimeException   IOException ...
                                  ▲
                      ┌───────────┴──────────────┐
                      │                          │
         NullPointerException   IllegalArgumentException ...

By whether the compiler requires them to be caught, exceptions fall into two classes: 1. exceptions the compiler requires you to catch when writing code, where failing to catch them is a compile error; 2. exceptions that, once thrown, do not have to be caught; the compiler does nothing about them.

Exceptions that must be caught (checked): Exception and its subclasses, excluding RuntimeException and its subtree.

Exceptions that need not be caught (unchecked): Error and its subclasses; RuntimeException and its subclasses.
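The two classes above can be contrasted in a short sketch (the class and method names are invented): the checked IOException must be declared or caught, while the unchecked exceptions compile without either.

```java
import java.io.IOException;

public class CheckedDemo {
    static void checkedThrow() throws IOException {   // must be declared: checked
        throw new IOException("disk");
    }

    static void uncheckedThrow() {                    // no declaration needed
        throw new IllegalArgumentException("bad arg");
    }

    // Mirrors the classification rule: Error and RuntimeException
    // subtrees are unchecked, everything else under Throwable is checked.
    static String classify(Throwable t) {
        return (t instanceof RuntimeException || t instanceof Error)
                ? "unchecked" : "checked";
    }

    public static void main(String[] args) {
        System.out.println(classify(new IOException()));
        System.out.println(classify(new NullPointerException()));
    }
}
```

Deleting the throws clause from checkedThrow makes the file fail to compile, which is exactly the "must be caught or declared" rule in action.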

A Summary of Common Java Runtime Exceptions


Source: reposted from the web. Published 2011-01-04 12:29:21 by wanzhuanIT.

A summary of several common RuntimeExceptions in Java.

NullPointerException (null pointer):

    package com.darkmi.basic;

    public class Test {
        public static void main(String[] args) {
            System.out.println(toUpper(null));
        }

        public static String toUpper(String str) {
            return str.toUpperCase();
        }
    }

The exception reported:

    Exception in thread "main" java.lang.NullPointerException
        at com.darkmi.basic.Test.toUpper(Test.java:11)
        at com.darkmi.basic.Test.main(Test.java:6)

ArrayIndexOutOfBoundsException (array index out of bounds):

    package com.darkmi.basic;

    public class Test {
        public static void main(String[] args) {
            int[] a = {0, 1, 2, 3};
            System.out.println(a[4]);
        }
    }

The exception reported:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4
        at com.darkmi.basic.Test.main(Test.java:7)

ArithmeticException (arithmetic error):

    package com.darkmi.basic;

    public class Test {
        public static void main(String[] args) {
            int a = 10 / 0;
            System.out.println(a);
        }
    }

The exception reported:

    Exception in thread "main" java.lang.ArithmeticException: / by zero
        at com.darkmi.basic.Test.main(Test.java:6)

ClassCastException (illegal type cast): example: package com.dark...

Common Java Exceptions (RuntimeException): A Detailed Introduction and Summary


This article concentrates on the concepts behind Java's exception mechanism.

I am writing it so that if, after a long time, I have forgotten this material, I can recall it quickly from this article.

1. The exception mechanism

1.1 The exception mechanism determines how a program responds once an error has occurred.

Concretely, it gives the program a safe channel through which to exit.

When an error occurs, the flow of execution changes and control of the program transfers to an exception handler.

1.2 The traditional way to handle errors is for a function to return a special value that signals the problem (usually a value agreed on by convention) and for the calling program to check and analyze the return value.

That approach has several drawbacks: if, say, -1 signals an error, there is confusion whenever the function legitimately needs to return -1; readability drops, because program logic is mixed together with error-handling code; and the caller has to analyze the error itself, which demands a deep understanding of the library.

1.3 The flow of exception handling

1.3.1 On an error, the method ends at once without returning a value and throws an exception object.

1.3.2 The calling program does not continue either; instead, a search is made for an exception handler that can handle the exception, and its code is executed.

2. Classification of exceptions

2.1 The classification

2.1.1 The inheritance hierarchy: Throwable is the base class; Error and Exception extend Throwable; RuntimeException, IOException, and others extend Exception; and the concrete runtime exceptions extend RuntimeException.
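The flow in 1.3 can be made concrete with a small sketch (class and method names invented): the throwing method stops at the throw, the caller's remaining statements are skipped, and control lands in the matching handler.

```java
public class FlowDemo {
    static final StringBuilder log = new StringBuilder();

    static void failingMethod() {
        log.append("before-throw;");
        throw new IllegalStateException("oops");
        // nothing after the throw ever runs: the method ends immediately (1.3.1)
    }

    static String run() {
        log.setLength(0);  // reset so the demo can run repeatedly
        try {
            failingMethod();
            log.append("unreachable;");  // skipped: the caller does not continue (1.3.2)
        } catch (IllegalStateException e) {
            log.append("handler:").append(e.getMessage());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The resulting log reads "before-throw;handler:oops": execution jumped straight from the throw site to the handler, skipping everything in between.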

2.1.2 Error, RuntimeException, and their subclasses are called unchecked exceptions; all other exceptions are checked.

2.2 Properties of each exception type

2.2.1 The Error hierarchy. The Error class hierarchy describes internal errors of the Java runtime system and resource-exhaustion situations.

Applications should not throw objects of this type (they are generally thrown by the virtual machine).

When such an error occurs, there is little to be done beyond trying to let the program exit safely.


Java Runtime Systems: Characterization and Architectural Implications

Ramesh Radhakrishnan, Member, IEEE, N. Vijaykrishnan, Member, IEEE, Lizy Kurian John, Senior Member, IEEE, Anand Sivasubramaniam, Member, IEEE, Juan Rubio, Member, IEEE, and Jyotsna Sabarinathan

Abstract: The Java Virtual Machine (JVM) is the cornerstone of Java technology and its efficiency in executing the portable Java bytecodes is crucial for the success of this technology. Interpretation, Just-In-Time (JIT) compilation, and hardware realization are well-known solutions for a JVM and previous research has proposed optimizations for each of these techniques. However, each technique has its pros and cons and may not be uniformly attractive for all hardware platforms. Instead, an understanding of the architectural implications of JVM implementations with real applications can be crucial to the development of enabling technologies for efficient Java runtime system development on a wide range of platforms. Toward this goal, this paper examines architectural issues from both the hardware and JVM implementation perspectives. The paper starts by identifying the important execution characteristics of Java applications from a bytecode perspective. It then explores the potential of a smart JIT compiler strategy that can dynamically interpret or compile based on associated costs and investigates the CPU and cache architectural support that would benefit JVM implementations. We also study the available parallelism during the different execution modes using applications from the SPECjvm98 benchmarks. At the bytecode level, it is observed that less than 45 out of the 256 bytecodes constitute 90 percent of the dynamic bytecode stream. Method sizes fall into a trinodal distribution with peaks of 1, 9, and 26 bytecodes across all benchmarks. The architectural issues explored in this study show that, when Java applications are executed with a JIT compiler, selective translation using good heuristics can improve performance, but the saving is only 10-15 percent at best. The instruction and data cache performance of Java applications are seen to be better than that of C/C++ applications, except in the case of data cache performance in the JIT mode. Write misses resulting from installation of JIT compiler output dominate the misses and deteriorate the data cache performance in JIT mode. A study on the available parallelism shows that Java programs executed using JIT compilers have parallelism comparable to C/C++ programs for small window sizes, but fall behind when the window size is increased. Java programs executed using the interpreter have very little parallelism due to the stack nature of the JVM instruction set, which is dominant in the interpreted execution mode. In addition, this work gives revealing insights and architectural proposals for designing an efficient Java runtime system.

Index Terms: Java, Java bytecodes, CPU and cache architectures, ILP, performance evaluation, benchmarking.

(R. Radhakrishnan, L.K. John, and J. Rubio are with the Laboratory for Computer Architecture, Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712. N. Vijaykrishnan and A. Sivasubramaniam are with the Department of Computer Science and Engineering, 220 Pond Lab., Pennsylvania State University, University Park, PA 16802. J. Sabarinathan is with the Motorola Somerset Design Center, 6263 McNeil Dr. #1112, Austin, TX 78829. Manuscript received 28 Apr. 2000; revised 16 Oct. 2000; accepted 31 Oct. 2000. IEEECS Log Number 112014. IEEE Transactions on Computers, vol. 50, no. 2, February 2001. 0018-9340/01/$10.00 © 2001 IEEE.)

1 INTRODUCTION

The Java Virtual Machine (JVM) [1] is the cornerstone of Java technology, epitomizing the "write-once run-anywhere" promise. It is expected that this enabling technology will make it a lot easier to develop portable software and standardized interfaces that span a spectrum of hardware platforms. The envisioned underlying platforms for this technology include powerful (resource-rich) servers, network-based and personal computers, together with resource-constrained environments such as hand-held devices, specialized hardware/embedded systems, and even household appliances. If this technology is to succeed, it is important that the JVM provide an efficient execution/runtime environment across these diverse hardware platforms. This paper examines different architectural issues, from both the hardware and JVM implementation perspectives, toward this goal.

Applications in Java are compiled into the bytecode format to execute in the Java Virtual Machine (JVM). The core of the JVM implementation is the execution engine that executes the bytecodes. This can be implemented in four different ways:

1. An interpreter is a software emulation of the virtual machine. It uses a loop which fetches, decodes, and executes the bytecodes until the program ends. Due to the software emulation, the Java interpreter has an additional overhead and executes more instructions than just the bytecodes.

2. A Just-In-Time (JIT) compiler is an execution model which tries to speed up the execution of interpreted programs. It compiles a Java method into native instructions on the fly and caches the native sequence. On future references to the same method, the cached native method can be executed directly without the need for interpretation. JIT compilers have been released by many vendors, like IBM [2], Symantec [3], and others. Compiling during program execution, however, inhibits aggressive optimizations because compilation must only incur a small overhead. Another disadvantage of JIT compilers is the two to three times increase in the object code, which becomes critical in memory constrained embedded systems. There are many ongoing projects in developing JIT compilers that aim to achieve C++-like performance, such as CACAO [4].

3. Off-line bytecode compilers can be classified into two types: those that generate native code and those that generate an intermediate language like C. Harissa [5], TowerJ [6], and Toba [7] are compilers that generate C code from bytecodes. The choice of C as the target language permits the reuse of extensive compilation technology available in different platforms to generate the native code. In bytecode compilers that generate native code directly, like NET [8] and Marmot [9], portability becomes extremely difficult. In general, only applications that operate in a homogeneous environment and those that undergo infrequent changes benefit from this type of execution.

4. A Java processor is an execution model that implements the JVM directly on silicon. It not only avoids the overhead of translation of the bytecodes to another processor's native language, but also provides support for Java runtime features. It can be optimized to deliver much better performance than a general purpose processor for Java applications by providing special support for stack processing, multithreading, garbage collection, object addressing, and symbolic resolution. Java processors can be cost-effective to design and deploy in a wide range of embedded applications, such as telephony and web tops. The picoJava [10] processor from Sun Microsystems is an example of a Java processor.

It is our belief that no one technique will be universally preferred/accepted over all platforms in the immediate future. Many previous studies [11], [12], [13], [10], [14] have focused on enhancing each of the bytecode execution techniques. On the other hand, a three-pronged attack at optimizing the runtime system of all techniques would be even more valuable. Many of the proposals for improvements with one technique may be applicable to the others as well. For instance, an improvement in the synchronization mechanism could be useful for an interpreted or JIT mode of execution. Proposals to improve the locality behavior of Java execution could be useful in the design of Java processors, as well as in the runtime environment on general purpose processors. Finally, this three-pronged strategy can also help us design environments that efficiently and seamlessly combine the different techniques wherever possible.

A first step toward this three-pronged approach is to gain an understanding of the execution characteristics of different Java runtime systems for real applications. Such a study can help us evaluate the pros and cons of the different runtime systems (helping us selectively use what works best in a given environment), isolate architectural and runtime bottlenecks in the execution to identify the scope for potential improvement, and derive design enhancements that can improve performance in a given setting. This study embarks on this ambitious goal, specifically trying to answer the following questions:

- Do the characteristics seen at the bytecode level favor any particular runtime implementation? How can we use the characteristics identified at the bytecode level to implement more efficient runtime implementations?
- Where does the time go in a JIT-based execution (i.e., in translation to native code or in executing the translated code)? Can we use a hybrid JIT-interpreter technique that can do even better? If so, what is the best we can hope to save from such a hybrid technique?
- What are the execution characteristics when executing Java programs (using an interpreter or JIT compiler) on a general-purpose CPU (such as the SPARC)? Are these different from those for traditional C/C++ programs? Based on such a study, can we suggest architectural support in the CPU (either general-purpose or a specialized Java processor) that can enhance Java executions?

To our knowledge, there has been no prior effort that has extensively studied all these issues in a unified framework for Java programs. This paper sets out to answer some of the above questions using applications drawn from the SPECjvm98 [15] benchmarks, available JVM implementations such as JDK 1.1.6 [16] and Kaffe VM 0.9.2 [17], and simulation/profiling tools on the Shade [18] environment. All the experiments have been conducted on Sun UltraSPARC machines running SunOS 5.6.

1.1 Related Work

Studies characterizing Java workloads and performance analysis of Java applications are becoming increasingly important and relevant as Java increases in popularity, both as a language and software development platform. A detailed characterization of the JVM workload for the UltraSparc platform was done in [19] by Barisone et al. The study included a bytecode profile of the SPECjvm98 benchmarks, characterizing the types of bytecodes present and their frequency distribution. In this paper, we start with such a study and extend it to characterize other metrics, such as locality and method sizes, as they impact the performance of the runtime environment very strongly. Barisone et al. use the profile information collected from the interpreter and JIT execution modes as an input to a mathematical model of a RISC architecture to suggest architectural support for Java workloads. Our study uses a detailed superscalar processor simulator and also includes studies on available parallelism to understand the support required in current and future wide-issue processors.

Romer et al. [20] studied the performance of interpreters and concluded that no special hardware support is needed for increased performance. Hsieh et al. [21] studied the cache and branch performance of interpreted Java code, the C/C++ version of the Java code, and native code generated by Caffine (a bytecode to native code compiler) [22]. They attribute the inefficient use of the microarchitectural resources by the interpreter as a significant performance penalty and suggest that an offline bytecode to native code translator is a more efficient Java execution model. Our work differs from these studies in two important ways. First, we include a JIT compiler in this study, which is the most commonly used execution model presently. Second, the benchmarks used in our study are large real world applications, while the above-mentioned studies use microbenchmarks due to the unavailability of a Java benchmark suite at the time of their study. We see that the characteristics of the applications used favor different execution modes and, therefore, the choice of benchmarks used is important.

Other studies have explored possibilities of improving performance of the Java runtime system by understanding the bottlenecks in the runtime environment and ways to eliminate them. Some of these studies try to improve the performance through better synchronization mechanisms [23], [24], [25], more efficient garbage collection techniques [26], and understanding the memory referencing behavior of Java applications [27], etc. Improving the runtime system, tuning the architecture to better execute Java workloads, and better compiler/interpreter performance are all equally important to achieve efficient performance for Java applications.

The rest of this paper is organized as follows: The next section gives details on the experimental platform. In Section 3, the bytecode characteristics of the SPECjvm98 are presented. Section 4 examines the relative performance of JIT and interpreter modes and explores the benefits of a hybrid strategy. Section 5 investigates some of the questions raised earlier with respect to the CPU and cache architectures. Section 6 collates the implications and inferences that can be drawn from this study. Finally, Section 7 summarizes the contributions of this work and outlines directions for future research.

2 EXPERIMENTAL PLATFORM

We use the SPECjvm98 benchmark suite to study the architectural implications of a Java runtime environment. The SPECjvm98 benchmark suite consists of seven Java programs which represent different classes of Java applications. The benchmark programs can be run using three different inputs, which are named s100, s10, and s1. These problem sizes do not scale linearly, as the naming suggests. We use the s1 input set to present the results in this paper; the effects of the larger data sets, s10 and s100, have also been investigated. The increased method reuse with larger data sets results in increased code locality, reduced time spent in compilation as compared to execution, and other such issues as can be expected. The benchmarks are run at the command line prompt and do not include graphics, AWT (graphical interfaces), or networking. A description of the benchmarks is given in Table 1. All benchmarks except mtrt are single-threaded. Java is used to build applications that span a wide range, which includes applets at the lower end to server-side applications on the high end. The observations cited in this paper hold for those subsets of applications which are similar to the SPECjvm98 benchmarks when run with the dataset used in this study.

[Table 1: Description of the SPECjvm98 benchmarks.]

Two popular JVM implementations have been used in this study: the Sun JDK 1.1.6 [16] and Kaffe VM 0.9.2 [17]. Both these JVM implementations support the JIT and interpreted modes. Since the source code for the Kaffe VM compiler was available, we could instrument it to obtain the behavior of the translation routines for the JIT mode in detail. Some of the data presented in Sections 4 and 5 are obtained from the instrumented translate routines in Kaffe. The results using Sun's JDK are presented for the other sections and only differences, if any, from the Kaffe VM environment are mentioned. The use of two runtime implementations also gives us more confidence in our results, filtering out any noise due to the implementation details.

To capture architectural interactions, we have obtained traces using the Shade binary instrumentation tool [18] while running the benchmarks under different execution modes. Our cache simulations use the cachesim5 simulators available in the Shade suite, while branch predictors have been developed in-house. The instruction level parallelism studies are performed utilizing a cycle-accurate superscalar processor simulator. This simulator can be configured to a variety of out-of-order multiple issue configurations with desired cache and branch predictors.

3 CHARACTERISTICS AT THE BYTECODE LEVEL

We characterize bytecode instruction mix, bytecode locality, method locality, etc. in order to understand the benchmarks at the bytecode level. The first characteristic we examine is the bytecode instruction mix of the JVM, which is a stack-oriented architecture. To simplify the discussion, we classify the instructions into different types based on their inherent functionality, as shown in Table 2.

[Table 2: Classification of bytecodes.]

Table 3 shows the resulting instruction mix for the SPECjvm98 benchmark suite. The total bytecode count ranges from 2 million for db to approximately a billion for compress. Most of the benchmarks show similar distributions for the different instruction types. Load instructions outnumber the rest, accounting for 35.5 percent of the total number of bytecodes executed on the average. Constant pool and method call bytecodes come next, with average frequencies of 21 percent and 11 percent, respectively. From an architectural point of view, this implies that transferring data elements to and from the memory space allocated for local variables and the Java stack is frequent. Comparing this with the benchmark 126.gcc from the SPEC CPU95 suite, which has roughly 25 percent of memory access operations when run on a SPARC V.9 architecture, it can be seen that the JVM places greater stress on the memory system. Consequently, we expect that techniques such as instruction folding proposed in [28] for Java processors and instruction combining proposed in [29] for JIT compilers can improve the overall performance of Java applications.

[Table 3: Dynamic instruction mix at the bytecode level.]

The second characteristic we examine is the dynamic size of a method. (A Java method is equivalent to a "function" or "procedure" in a procedural language like C.) Invoking methods in Java is expensive, as it requires the setting up of an execution environment and a new stack for each new method [1]. Fig. 1 shows the method sizes for the different benchmarks. A trinodal distribution is observed, where most of the methods are either 1, 9, or 26 bytecodes long. This seems to be a characteristic of the runtime environment itself (and not of any particular application) and can be attributed to a frequently used library. However, the existence of single bytecode methods indicates the presence of wrapper methods to implement specific features of the Java language like private and protected methods or interfaces. These methods consist of a control transfer instruction which transfers control to an appropriate routine.

[Fig. 1: Dynamic method size.]

Further analysis of the traces shows that a few unique bytecodes constitute the bulk of the dynamic bytecode stream. In most benchmarks, fewer than 45 distinct bytecodes constitute 90 percent of the executed bytecodes and fewer than 33 bytecodes constitute 80 percent of the executed bytecodes (Table 4). It is observed that memory access and memory allocation-related bytecodes dominate the bytecode stream of all the benchmarks. This also suggests that if the instruction cache can hold the JVM interpreter code corresponding to these bytecodes (i.e., all the cases of the switch statement in the interpreter loop), the cache performance will be better.

[Table 4: Number of distinct bytecodes that account for 80 percent, 90 percent, and 100 percent of the dynamic instruction stream.]

Table 5 presents the number of unique methods and the frequency of calls to those methods. The number of methods and the dynamic calls are obtained at runtime by dynamically profiling the application. Hence, only methods that execute at least once have been counted. Table 5 also shows that the static size of the benchmarks remains constant across the different data sets (since the number of unique methods does not vary), although the dynamic instruction count increases for the bigger data sets (due to increased method calls). The number of unique calls has an impact on the number of indirect call sites present in the application. Looking at the three data sets, we see that there is very little difference in the number of methods across data sets.

[Table 5: Total number of method calls (dynamic) and unique methods for the three data sets.]

Another bytecode characteristic we look at is the method reuse factor for the different data sets. The method reuse factor can be defined as the ratio of method calls to the number of methods visited at least once. It indicates the locality of methods. The method reuse factor is presented in Table 6. The performance benefits that can be obtained from using a JIT compiler are directly proportional to the method reuse factor, since the cost of compilation is amortized over multiple calls in JIT execution. The higher number of method calls indicates that the method reuse in the benchmarks for larger data sets would be substantially more. This would then lead to better performance for the JITs (as observed in the next section). In Section 5, we show that the instruction count when the benchmarks are executed using a JIT compiler is much lower than when using an interpreter for the s100 data set. Since there is higher method reuse in all benchmarks for the larger data sets, using a JIT results in better performance over an interpreter. The bytecode characteristics described in this section help in understanding some of the issues involved in the performance of the Java runtime system (presented in the remainder of the paper).

[Table 6: Method reuse factor for the different data sets.]

4 WHEN OR WHETHER TO TRANSLATE

Dynamic compilation has been popularly used [11], [30] to speed up Java executions. This approach avoids the costly interpretation of JVM bytecodes while sidestepping the issue of having to precompile all the routines that could ever be referenced (from both the feasibility and performance angles). Dynamic compilation techniques, however, pay the penalty of having the compilation/translation to native code falling in the critical path of program execution. Since this cost is expected to be high, it needs to be amortized over multiple executions of the translated code. Or else, performance can become worse than when the code is just interpreted. Knowing when to dynamically compile a method (using a JIT), or whether to compile at all, is extremely important for good performance. To our knowledge, there has not been any previous study that has examined this issue in depth in the context of Java programs, though there have been previous studies [13], [31], [12], [4] examining efficiency of the translation procedure and the translated code. Most of the currently available execution environments, such as JDK 1.2 [16] and Kaffe [17], employ limited heuristics to decide on when (or whether) to JIT. They typically translate a method on its first invocation, regardless of how long it takes to interpret/translate/execute the method and how many times the method is invoked. It is not clear if one could do better (with a smarter heuristic) than what many of these environments provide. We investigate these issues in this section using five SPECjvm98 [15] benchmarks, together with a simple HelloWorld program, on the Kaffe environment. (While we do not make any major conclusions based on this simple program, it serves to observe the behavior of the JVM implementation while loading and resolving system classes during system initialization.)

Fig. 2 shows the results for the different benchmarks. All execution times are normalized with respect to the execution time taken by the JIT mode on Kaffe. On top of the JIT execution bar is given the ratio of the time taken by this mode to the time taken for interpreting the program using Kaffe VM. As expected (from the method reuse characteristics for the various benchmarks), we find that translating (JIT-ing) the invoked methods significantly outperforms interpreting the JVM bytecodes for the SPECjvm98. The first bar, which corresponds to execution time using the default JIT, is further broken down into two components, the total time taken to translate/compile the invoked methods and the time taken to execute these translated (native code) methods. The considered workloads span the spectrum, from those in which the translation times dominate, such as hello and db (because most of the methods are neither time consuming nor invoked numerous times), to those in which the native code execution dominates, such as compress and jack (where the cost of translation is amortized over numerous invocations).

[Fig. 2: Dynamic compilation: How well can we do?]

The JIT mode in Kaffe compiles a method to native code on its first invocation. We next investigate how well the smartest heuristic can do, so that we compile only those methods that are time consuming (where the translation/compilation cost is outweighed by the execution time) and interpret the remaining methods. This can tell us whether we should strive to develop a more intelligent selective compilation heuristic at all and, if so, what the performance benefit is that we can expect. Let us say that a method i takes s_i time to interpret, t_i time to translate, and e_i time to execute the translated code. Then, there exists a crossover point x_i = t_i / (s_i - e_i), where it would be better to translate the method if the number of times the method is invoked n_i > x_i, and to interpret it otherwise. We assume that an oracle supplies n_i (the number of times a method is invoked) and x_i (the ideal cut-off threshold for a method). If n_i < x_i, we interpret all invocations of the method; otherwise, we translate it on the very first invocation. The second bar in Fig. 2 for each application shows the performance with this oracle, which we shall call opt. It can be observed that there is very little difference between the naive heuristic used by Kaffe and opt for compress and jack, since most of the time is spent in the execution of the actual code anyway (very little time in translation or interpretation). As the translation component gets larger (applications like db, javac, or hello), the opt model suggests that some of the less time-consuming (or less frequently invoked) methods be interpreted to lower the execution time. This results in a 10-15 percent savings in execution time for these applications. It is to be noted that the exact savings would definitely depend on the efficiency of the translation routines, the translated code execution, and interpretation.

The opt results give useful insights. Fig. 2 shows that, by improving the heuristic that is employed to decide on when/whether to JIT, one can at best hope to trim 10-15 percent off the execution time. It must be observed that the 10-15 percent gains observed can vary with the amount of method reuse and the degree of optimization that is used. For example, we observed that the translation time for the Kaffe JVM accounts for a smaller portion of overall execution time with larger data sets (7.5 percent for the s10 dataset, shown in Table 7, as opposed to the 32 percent for the s1 dataset). Hence, reducing the translation overhead will be of lesser importance when execution time dominates translation time. However, as more aggressive optimizations are used, the translation time can consume a significant portion of execution time for even larger datasets. For instance, the base configuration of the translator in IBM's Jalapeno VM [32] takes negligible translation time when using the s100 data set for javac. However, with more aggressive optimizations, about 30 percent of overall execution time is consumed in translation to ensure that the resulting code is executed much faster [32]. Thus, there exists a trade-off between reducing the amount of time spent in optimizing the code and the amount of time spent in actually executing the optimized code.
