Java Memory Model
The Java Memory Model: An Introduction to the JSR-133 Specification

I have recently been reading "Understanding the Java Virtual Machine: Advanced JVM Features and Best Practices", whose chapters on threading cover the Java memory model, that is, the specification defined by JSR-133.
I read the first few chapters of the JSR-133 specification systematically and found them very rewarding.
Without further ado, here is a brief introduction to the Java memory model.
What is a memory model? JSR-133 defines it as follows: "A memory model describes, given a program and an execution trace of that program, whether the execution trace is a legal execution of the program. For the Java programming language, the memory model works by examining each read in an execution trace and checking that the write observed by that read is valid according to certain rules." In other words, a memory model describes, for a given program and an execution trace of it, whether that trace is a legal execution of the program.
For the Java programming language, the memory model works by examining each read in the execution trace and checking, according to certain rules, that the write observed by that read is valid.
The Java memory model defines only a specification; a concrete implementation is free to realize it in whatever way suits the actual platform.
The implementation must, however, satisfy the rules that the Java memory model defines.
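As a concrete illustration of "legal executions", the classic litmus test below (the class and method names are my own, not from the specification) runs two racing threads many times and records the outcomes observed; under the JMM every observed pair must be one of the four legal results, including the counter-intuitive (0, 0) that only reordering can produce.

```java
import java.util.Set;
import java.util.TreeSet;

public class ReorderingLitmus {
    static int x, y;   // shared variables written by the two threads
    static int r1, r2; // values each thread reads from the other's variable

    // Runs the two-thread race repeatedly and collects every "(r1,r2)" outcome.
    static Set<String> run(int iterations) throws InterruptedException {
        Set<String> seen = new TreeSet<>();
        for (int i = 0; i < iterations; i++) {
            x = 0; y = 0; r1 = 0; r2 = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { y = 1; r2 = x; });
            t1.start(); t2.start();
            t1.join(); t2.join();
            seen.add(r1 + "," + r2);
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        // Any of "0,0", "0,1", "1,0", "1,1" is a legal execution under the JMM;
        // "0,0" can only arise if a write is reordered after the following read.
        System.out.println(run(10_000));
    }
}
```

Checking each observed trace against the set of legal outcomes is exactly the kind of per-read validity check the specification's definition describes.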
Interaction between the processor and memory. Thanks to the progress of the silicon industry, processors have become ever more powerful.
Essentially all processors on the market today are multi-core.
How to exploit multi-core processors so that program performance improves dramatically is one of the most important questions of the moment.
All computation is done by the processor. In college we learned the basic notion "program = data + algorithm": the processor does the computing, but where does the data come from? Data can live in processor registers (current x86 processors are register-based architectures), in processor caches, in main memory, on disk, on optical media, and so on.
The Java JC Mechanism

Contents
1. Overview of the Java memory model and the JC mechanism
2. Components of the Java memory model
3. How the JC mechanism works
4. Applications of the JC mechanism within the Java memory model
5. Strengths and weaknesses of the JC mechanism

1. Overview of the Java memory model and the JC mechanism
The Java Memory Model (JMM) is the specification used in multithreaded Java programming to describe Java memory operations.
The JMM defines the visibility of variables and the atomicity of operations on them.
To better address the problems that arise under the Java memory model, the JC (Java Compiler) mechanism was proposed: a compile-time optimization technique that can effectively improve the performance of Java programs.
2. Components of the Java memory model
The Java memory model consists mainly of the following parts:
1. Main memory: the memory area shared by all threads.
2. Working memory: each thread's private memory area, holding copies of the main-memory variables that the thread uses.
3. Heap: the memory area that stores object instances.
4. Method area: stores class and method metadata.
5. Stack: stores local variables and method invocations.
3. How the JC mechanism works
By optimizing against the Java memory model, the JC mechanism can effectively improve program performance.
Its operation comprises the following points:
1. Compile-time detection: at compile time, JC inspects the code for constructs that may violate the memory model, such as non-atomic operations.
2. Compile-time optimization: JC optimizes the code, for instance merging repeated accesses to the same variable into a single access, reducing the number of memory accesses.
3. Run-time optimization: at run time, JC optimizes memory accesses, for example using lock elision to reduce locking and improve performance.
4. Applications of the JC mechanism within the Java memory model
The JC mechanism is applied within the Java memory model mainly in the following ways:
1. Guaranteeing visibility: it ensures that, in a multithreaded environment, one thread's modification of a variable is immediately visible to other threads.
2. Guaranteeing atomicity: it ensures that operations on a variable are atomic, i.e., cannot be interrupted by other threads.
Java Memory Management: The Memory Model

1. Runtime data areas
  1. Program counter (program counter register)
  2. Java virtual machine stack (JVM stack)
  3. Native method stack
  4. Java heap
  5. Method area
  6. Runtime constant pool
  7. Direct memory

1. Program counter (program counter register)
1.1 Concept. The program counter is a small memory area that can be viewed as the line-number indicator for the bytecode being executed by the current thread.
In other words, the program counter records the address of the bytecode instruction that the current thread is executing.
Note: if the current thread is executing a native method, the program counter is undefined (empty).
(A native method here is a method not written in the Java language.)
1.2 Role. 1) The bytecode interpreter advances the program counter to read instructions one after another, thereby implementing control flow: sequential execution, branching, loops, and exception handling.
2) Under multithreading, the program counter records where the current thread has executed to, so that when the thread is switched back in, it knows where it left off.
1.3 Characteristics. 1) Thread-private:
each thread has its own program counter.
2) It is the only memory area that never throws an OutOfMemoryError.
(java.lang.OutOfMemoryError signals that the JVM has run out of memory.) 3) Its lifetime follows the thread: it is created when the thread is created and dies when the thread ends.
2. Java virtual machine stack (JVM stack)
2.1 Concept. The JVM stack is the memory model that describes the execution of Java methods. For each Java method about to run, the JVM stack creates a region called a "stack frame" that stores the information the method needs while running: 1) the local variable table (primitive types, reference types, and returnAddress values; returnAddress is a JVM-internal type with no corresponding type in the Java language); 2) the operand stack; 3) the dynamic link; 4) method return information, and so on. When a method creates a local variable during execution, the variable's value is stored in the frame's local variable table.
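The frame-per-invocation rule can be illustrated with a sketch (the class name and threshold are mine, not from the text): each recursive call pushes a new stack frame, so unbounded recursion eventually exhausts the JVM stack and raises StackOverflowError.

```java
public class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;    // one more invocation means one more stack frame
        recurse();  // never returns normally: frames pile up until the stack is full
    }

    // Returns how many frames fit before the JVM stack overflowed.
    static int overflowDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // each frame consumed stack space until none was left
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("overflowed after " + overflowDepth() + " frames");
    }
}
```

The exact depth varies with the configured stack size (the JVM's -Xss option), but it is always finite, which is precisely why each frame's local variable table must be bounded.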
The Java Memory Model: Volatile and Synchronized

The shared-memory model referred to here is the Java Memory Model (JMM). The JMM determines when one thread's write to a shared variable becomes visible to another thread.
Viewed abstractly, the JMM defines the relationship between threads and main memory: shared variables are stored in main memory, and each thread has a private local memory holding copies of the shared variables that the thread reads and writes.
Local memory is an abstraction of the JMM; it does not literally exist.
It covers caches, write buffers, registers, and other hardware and compiler optimizations.
As the figure above shows, for thread A and thread B to communicate, two steps must occur: 1. first, thread A flushes the updated shared variables from its local memory A to main memory;
2. then, thread B reads from main memory the shared variables that thread A has updated.
The diagram illustrates these two steps: as shown above, local memory A and local memory B each hold a copy of the shared variable x from main memory.
Suppose that initially the value of x is 0 in all three memories.
While executing, thread A temporarily keeps its updated value of x (say, 1) in its own local memory A.
When thread A and thread B need to communicate, thread A first flushes the modified x from its local memory to main memory, so the value of x in main memory becomes 1.
Thread B then reads thread A's updated x from main memory, and the copy of x in thread B's local memory also becomes 1.
Taken together, the two steps amount to thread A sending a message to thread B, with the communication necessarily passing through main memory.
By controlling the interaction between main memory and each thread's local memory, the JMM provides Java programmers with memory visibility guarantees.
Summary: what is the Java memory model? The JMM defines when one thread's actions become visible to another thread.
Shared variables live in main memory and each thread has its own local memory; when multiple threads access the same data, a thread's local memory may not have been flushed to main memory in time, which is how thread-safety problems arise.
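The visibility problem just summarized can be sketched with a stop flag (the class name is illustrative): declaring the flag volatile forces the write to go to main memory and the read to come from main memory, so the spinning thread is guaranteed to terminate.

```java
import java.util.concurrent.TimeUnit;

public class VisibilityDemo {
    // Without volatile, the reader thread might spin forever on a stale local copy.
    private static volatile boolean stop = false;

    // Returns true if the reader thread observed the flag and terminated.
    static boolean runOnce() throws InterruptedException {
        stop = false;
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait until the writer's update becomes visible
            }
        });
        reader.start();
        TimeUnit.MILLISECONDS.sleep(50);
        stop = true;       // volatile write: flushed to main memory
        reader.join(2000); // volatile read guarantees the reader eventually sees it
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader terminated: " + runOnce());
    }
}
```

With a plain (non-volatile) boolean the same program is allowed, under the JMM, to keep reading the stale copy in local memory, which is exactly the flush-not-yet-happened scenario described above.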
How Java's Atomic Classes Are Implemented

Java's atomic classes are tools for multithreaded concurrent programming that provide thread-safe operations.
They guarantee the correctness and consistency of operations when multiple threads manipulate a shared variable concurrently.
Their implementation relies mainly on support from the underlying hardware and on the CAS (Compare-And-Swap) operation.
CAS is a lock-free atomic operation with three operands: a memory location V, the expected old value A, and the new value B.
If and only if the value at V equals A, V is set to B; otherwise nothing happens.
CAS is atomic because it is implemented with atomic instructions provided by the underlying hardware.
In Java, the atomic classes are built on the memory model and the CAS operation.
The memory model defines the visibility of shared variables and the ordering of operations when multiple threads execute concurrently.
The atomic classes exploit these properties of the memory model to guarantee thread safety in multithreaded environments.
The implementation can be viewed in two parts: the memory model and the CAS operation.
The memory model defines the visibility of shared variables and the ordering of operations on them.
In a multithreaded environment, each thread has its own working memory, which holds the thread's local variables and copies of the shared variables.
When a thread modifies a shared variable, it must flush the modified value to main memory so that other threads can see the change.
And when a thread needs to read a shared variable, it must load the value from main memory into its own working memory before operating on it.
CAS, as described above, is the lock-free primitive with operands V, A, and B that performs this conditional update of main memory atomically, relying on the hardware's atomic instructions to guarantee atomicity.
CAS is central to the implementation of the atomic classes.
Every method of an atomic class is implemented in terms of CAS.
When a thread calls a method of an atomic class, it first reads the shared variable's value into its working memory, operates on that value, and finally uses CAS to write the modified value back to main memory.
If the CAS fails, another thread has already modified the shared variable; the current thread re-reads the value and retries the operation.
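The read-operate-CAS-retry loop just described can be sketched as follows (a simplified stand-in for what AtomicInteger.incrementAndGet does internally; the class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // The read / compute / compareAndSet retry loop described in the text.
    public int increment() {
        while (true) {
            int expected = value.get();   // read the current value
            int updated = expected + 1;   // operate on the local copy
            if (value.compareAndSet(expected, updated)) {
                return updated;           // CAS succeeded: write-back took effect
            }
            // CAS failed: another thread changed the value; re-read and retry.
        }
    }

    public int get() { return value.get(); }

    public static void main(String[] args) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(counter.get()); // no lost updates: always 40000
    }
}
```

The loop never blocks: a losing thread simply observes the fresher value on the next iteration, which is why this pattern is called lock-free.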
Java Internals Interview Questions

1. Introduction. Java is a high-level programming language released by Sun Microsystems (now Oracle Corporation) in 1995.
As an object-oriented language, Java is widely used across many domains.
This article introduces some Java fundamentals and presents common interview questions about them.
2. The Java Virtual Machine (JVM). 1. What is the JVM? The JVM (Java Virtual Machine) is the runtime environment for Java programs.
It interprets or compiles Java bytecode (the intermediate code produced by the Java compiler) into machine code, allowing Java programs to run on different operating systems and hardware platforms.
2. What are the components of the JVM? The JVM consists mainly of: - the class loader (ClassLoader), which loads bytecode into the JVM;
- the runtime data areas, including the method area, heap, stacks, and so on;
- the execution engine, which interprets or compiles bytecode for execution;
- the native interface, for interacting with native libraries;
- the garbage collection system, which automatically reclaims unused objects.
3. How does the JVM work? The JVM first loads bytecode into the method area via the class loader.
The execution engine then interprets or compiles the bytecode into machine code for the target platform.
The execution engine hands the machine code to the processor for execution and returns the results to the JVM.
Meanwhile, the garbage collection system reclaims unused objects and frees memory.
3. Java memory management. 1. What is the Java memory model? The Java Memory Model (JMM) defines how Java programs store and access data in memory.
It specifies how threads communicate with main memory and with one another.
2. What memory regions does Java have? Java memory is divided into: - the method area, which stores class structure information and static variables;
- the heap, which stores object instances.
Java Memory Model Interview Questions

The Java Memory Model (JMM) is the specification by which the Java virtual machine handles memory operations in multithreaded programs.
Below we work through common JMM interview questions and offer suitable answers.
1. What is the Java memory model? The JMM is a specification describing how, in a multithreaded environment, the Java virtual machine handles the visibility, ordering, and atomicity of shared data.
It defines how threads communicate with one another and how they exchange data with main memory.
The JMM gives programmers a powerful tool for ensuring thread safety and reliability.
2. Briefly describe the main properties of the Java memory model.
- Visibility: one thread's modification of a shared variable becomes visible to other threads.
- Ordering: the program appears to execute in the order the code was written.
- Atomicity: reads and writes of primitive types are atomic (with the caveat that 64-bit long and double accesses are not guaranteed atomic unless the field is volatile).
- The happens-before relation: the use of synchronization or volatile variables establishes the execution order between two operations.
3. What is the visibility problem, and how is it solved? The visibility problem arises when one thread modifies a shared variable and other threads may not immediately see the latest value.
It can be solved by declaring the shared variable volatile, which guarantees that modifications are visible to all threads.
Locking (synchronized or Lock) also guarantees visibility, because releasing and then acquiring a lock makes prior modifications visible to other threads.
4. What is the ordering problem, and how is it solved? The ordering problem is that the illusion of instructions executing in program order breaks down in a multithreaded environment.
It can be addressed with the volatile or synchronized keywords, which ensure that instructions execute in the expected order.
In addition, the atomic classes in the concurrent package (such as AtomicInteger) can be used.
5. What is the atomicity problem, and how is it solved? The atomicity problem concerns operations that must execute without interruption, as a single indivisible whole.
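A minimal sketch of the atomicity problem and its standard fix (the class name and iteration counts are illustrative): a plain increment is a read-modify-write that can lose updates under contention, while AtomicInteger never does.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {
    static int plain = 0;                                     // plain++ is NOT atomic
    static final AtomicInteger atomic = new AtomicInteger();  // atomic increments

    // Races the two counters and returns {plain, atomic} final values.
    static int[] race(int threads, int perThread) throws InterruptedException {
        plain = 0;
        atomic.set(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    plain++;                  // read-modify-write: updates can be lost
                    atomic.incrementAndGet(); // indivisible: no lost updates
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return new int[] { plain, atomic.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = race(4, 100_000);
        // atomic is always threads * perThread; plain is usually smaller
        System.out.println("plain=" + result[0] + " atomic=" + result[1]);
    }
}
```

synchronized around the increment would fix the plain counter as well; AtomicInteger simply achieves the same effect without a lock.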
Implementing and Using Shared Memory in Java

Shared memory is an inter-process communication mechanism that lets multiple processes share the same region of memory.
In Java, shared memory is realized and applied mainly in the following ways: memory mapping, the concurrent collection classes, Java's shared-data model, and inter-process communication.
First, memory mapping is a shared-memory mechanism that Java provides.
Through the MappedByteBuffer class in Java's NIO (New Input/Output) library, a file or region of memory can be mapped into memory, and processes can communicate through the same mapped file or memory region.
This improves the efficiency of inter-process communication and makes large amounts of data convenient to handle.
For example, some processes can write data into the shared memory while other processes read it directly, avoiding unnecessary data copies and communication overhead.
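A minimal single-process sketch of the mapping API (file name and value are illustrative; real inter-process use would have two separate processes map the same file):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedMemoryDemo {
    // Writes a value through one mapping and reads it back through another,
    // the way a producer and a consumer process would share the same file.
    static int writeAndReadBack(int value) throws IOException {
        Path file = Files.createTempFile("shared", ".dat");
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer writer = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8);
            writer.putInt(0, value);  // "producer" side: write into shared region
            writer.force();           // flush the mapped region to the backing file

            MappedByteBuffer reader = ch.map(FileChannel.MapMode.READ_ONLY, 0, 8);
            return reader.getInt(0);  // "consumer" side sees the same bytes
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadBack(42));
    }
}
```

Because both mappings are views of the same file region, the reader observes the writer's bytes without any explicit copy, which is the point of the technique.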
Second, Java provides a family of concurrent collection classes, such as ConcurrentHashMap and ConcurrentLinkedQueue, which internally use shared memory to give multiple threads safe access.
Using non-blocking algorithms, lock striping, and similar techniques, these collections implement efficient access to shared memory.
As a result, multiple threads can read and write the shared structure concurrently without explicit synchronization.
This approach is widely used in concurrent programming, for example in multithreaded producer-consumer designs and in thread pools.
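A small sketch of the queue-based hand-off (names are illustrative; a real producer-consumer would run both sides concurrently):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueHandoffDemo {
    // A lock-free queue shared between producer and consumer threads.
    static final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();

    // Produces 1..items on one thread, then drains the queue and sums the items.
    static int produceAndConsume(int items) throws InterruptedException {
        queue.clear();
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= items; i++) queue.offer(i); // non-blocking enqueue
        });
        producer.start();
        producer.join();

        int sum = 0;
        Integer item;
        while ((item = queue.poll()) != null) { // non-blocking dequeue
            sum += item;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(produceAndConsume(100)); // sum of 1..100
    }
}
```

Neither offer nor poll ever blocks or takes a lock; the queue's internal CAS-based algorithm is what makes the unsynchronized sharing safe.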
In addition, Java provides a shared-data model: the Java Memory Model (JMM).
The JMM specifies how multiple threads share data, using keywords such as volatile and synchronized to guarantee the visibility and consistency of shared memory.
The JMM offers a convenient way to share data between threads, making concurrent programs easier to write.
For example, multiple threads can use shared variables for state synchronization and inter-thread communication.
Finally, inter-process communication in Java can also be built on shared memory.
Using the operating system's low-level APIs, Java can create a shared memory region and share it across different processes.
For example, Java provides a mechanism called JNI (Java Native Interface) that allows a Java program to call native code and access the operating system's low-level facilities.
Impact of Java Memory Model on Out-of-Order Multiprocessors

Tulika Mitra, Abhik Roychoudhury, Qinghua Shen
School of Computing, National University of Singapore
{tulika,abhik,shenqing}@.sg

Abstract

The semantics of Java multithreading dictates all possible behaviors that a multithreaded Java program can exhibit on any platform. This is called the Java Memory Model (JMM) and describes the allowed reorderings among the memory operations in a thread. However, multiprocessor platforms traditionally have memory consistency models of their own. In this paper, we study the interaction between the JMM and the multiprocessor memory consistency models. In particular, memory barriers may have to be inserted to ensure that the multiprocessor execution of a multithreaded Java program respects the JMM. We study the impact of these additional memory barriers on program performance. Our experimental results indicate that the performance gain achieved by relaxed hardware memory consistency models far exceeds the performance degradation due to the introduction of JMM.

1. Introduction

The Java programming language supports shared memory multithreading where threads can manipulate shared objects. The threads can execute on a single processor or a multiprocessor. However, multiprocessors have different memory consistency models. A memory consistency model defines the values that can be returned on the read of a shared variable, and thereby provides a model of execution to the programmer. The most restrictive memory consistency model is sequential consistency [9], which does not allow any reordering of the memory operations within a thread. Other memory consistency models allow certain reordering of memory operations within a thread to improve performance as long as uniprocessor control/data dependencies are not violated (e.g., a memory read can be re-ordered w.r.t. a memory write as long as they access two different locations). If the reordering within a thread becomes visible to other threads, it may
produce "undesirable" results from the programmer's viewpoint. Therefore, for each platform, the programmer needs to explicitly disable certain reorderings to produce the desired result. However, this kind of platform-dependent programming violates Java's write once, run everywhere policy. To solve this problem, the Java Language Specification provides its own software memory consistency model. This model should be supported by any implementation of multithreaded Java irrespective of the underlying hardware memory consistency model. This is called the Java Memory Model (henceforth called JMM). Conceptually, the JMM defines the set of all possible behaviors that a multithreaded Java program can demonstrate on any implementation platform. These possible behaviors are generated by: (a) arbitrary interleaving of the operations from different threads, and (b) certain reordering of memory operations within each thread. Note that only the reordering of shared memory operations can change a multithreaded program's output.

The introduction of a memory model at the programming language level raises new challenges. In particular, if the threads are run on multiple processors, then we need to ensure that all the executions respect the JMM. Clearly, if the hardware memory model allows more reorderings than the JMM, then we have to disable them in order to enforce the JMM. The reorderings are disabled by inserting memory barrier instructions (processors execute operations across the barriers in program order) at the cost of degrading performance. As a result, the performance of a multithreaded Java program depends on the subtle interaction of the software and hardware memory models. Even though the relative performance of different hardware memory models is well understood [4, 14], there has been no effort to quantify the performance impact of the interaction between the software and hardware memory models. We seek to fill this gap.
We study the effect of enforcing JMM on five different hardware memory models: Sequential Consistency (SC), Total Store Order (TSO), Partial Store Order (PSO), Weak Ordering (WO), and Release Consistency (RC) [2].

We note that the specification of the JMM is currently a topic of intense discussion [8]. An initial JMM was proposed in the Java Language Specification [6]. Subsequently, this was found to be restrictive in allowing various common compiler optimizations, and too hard to understand [16]. Thus, an expert group has been formed to completely revise the JMM, and a concrete proposal has emerged (but not yet been finalized) [15]. In this paper, we study the old and new proposals for JMM and their impact on multiprocessor execution of multithreaded Java programs.

Related Work. Some work has been done at the processor level to evaluate the performance of different hardware consistency models. The work of Gharachorloo et al. [4] showed that in a platform with blocking reads and delayed commit of writes, performance is substantially improved by allowing reads to bypass writes. It also showed that SC performs poorly relative to all other models. Pai et al. [14] studied the implementation of SC and RC models on current generation processors with aggressive exploitation of instruction level parallelism (ILP). They found that hardware prefetching and speculative loads dramatically improve the performance of SC. However, the gap between SC and RC depends on the cache write policy and the complexity of the cache-coherence protocol, and in most cases RC significantly outperforms SC.

Recently there have been many efforts to study software memory models for the Java programming language. These works primarily focus on understanding the allowed behaviors of the JMM. Some work has been done to formalize the old JMM [5, 17]. We have previously developed an operational executable specification of the old JMM in [17].
Also, [22] has used an executable framework called uniform memory model to specify a new JMM developed by Manson and Pugh [12].

To the best of our knowledge, there has been little systematic study to measure the performance impact of JMM on multiprocessor platforms. From this perspective, the recent work by Doug Lea [10] is closest to ours. This work serves as a comprehensive guide for implementing the new JMM (as currently specified by JSR-133). It provides brief background about why various rules in JMM exist and concentrates on the consequences of these rules for compilers and Java virtual machines with respect to instruction reorderings, choice of memory barrier instructions, and atomic operations. It includes a set of recommended recipes for complying to JSR-133. However, no quantitative performance evaluation results are presented.

Contributions. Concretely, the contributions/findings of this paper can be summarized as follows.

- We study the interaction of hardware and software memory models and their impact on the performance of multithreaded Java programs running on out-of-order multiprocessors. This involves (a) a theoretical study of how a software memory model adds barriers at the hardware level, and (b) experimental evaluation of how much performance degradation occurs due to these added memory barriers.
- Our experimental results indicate that the performance gain achieved by relaxed hardware memory consistency models far exceeds the performance degradation due to the introduction of JMM. That is, given a relaxed hardware model, the performance reduction due to the introduction of a software memory model is typically not substantial. Overall, the differences between the old and new JMMs also do not produce substantial differences in performance in our benchmarks.
- The main feature of JMM that leads to appreciable performance difference in our benchmarks is the treatment of volatile variables¹; both the old and new JMM provide special semantics for volatile variables. The
introduction of volatile variable semantics leads to up to 5% performance degradation in some of the benchmarks. We also note that the semantics of volatile variables is different in the old and new JMM. However, this difference in semantics does not translate to a substantial difference in performance for our benchmarks.

¹ A variable whose access always leads to access of its master copy.

Section Organization. The rest of this paper is organized as follows. In the next section, we review the technical background on hardware memory consistency models and JMMs. Section 3 describes the theoretical methodology to identify the performance effects of JMM; this is done by identifying the memory barriers to be inserted in order to enforce JMM. Section 4 describes the experimental setup for measuring the effects of a software memory model on multiprocessor platforms. Section 5 describes the experimental results obtained from evaluating the performance of multithreaded Java Grande benchmarks under various hardware and software memory models. Discussions and future work appear in Section 6.

2. Background

In this section, we describe the multiprocessor memory consistency models (hardware memory models) as well as the JMMs (software memory models).

2.1. Hardware Memory Models

Table 1: Reorderings between memory operations for hardware memory models

                        2nd operation
1st operation   Read               Write         Lock      Unlock
Read            WO, RC             WO, RC        RC
Write           TSO, PSO, WO, RC   PSO, WO, RC   PSO, RC   PSO
Lock
Unlock          TSO, PSO, RC       PSO, RC       PSO       PSO

Memory consistency models have been used in shared-memory multiprocessors for many years. The simplest model of memory consistency was proposed by Lamport and is called Sequential Consistency (SC) [9]. This model allows operations across threads to be interleaved in any order. Operations within each thread are however constrained to proceed in program order. SC serves as a very simple and intuitive model of execution to the programmer. However, it disallows most compiler and hardware optimizations. For this reason, shared memory
multiprocessors have employed relaxed memory models, which allow certain reordering of operations within a thread. In this paper, we study the performance impact of enforcing the JMM on four relaxed hardware memory models: Total Store Order (TSO), Partial Store Order (PSO), Weak Ordering (WO) and Release Consistency (RC). Details of these memory models appear in [2, 3]. Note that all the memory models only allow reorderings which do not violate the uniprocessor data/control flow dependencies within a thread.

The TSO and PSO models (supported by the SUN SPARC architecture [21]) differ from SC in that they allow memory write operations to be bypassed. By bypassing we mean that even if a write operation is stalled, a succeeding memory operation can execute. Memory read operations, however, are blocking in TSO and PSO. The TSO model only allows reads to bypass previous writes. PSO is a more relaxed model than TSO as it allows both reads and writes to bypass previous writes. Unlike TSO and PSO, the WO and RC models allow non-blocking reads (i.e., reads may be bypassed). However, they classify memory operations as data operations (normal reads/writes) and synchronization operations (lock/unlock). Both WO and RC allow the data operations between two synchronization operations to be arbitrarily re-ordered. However, they differ in handling the reorderings between a data operation and a synchronization operation. Both WO and RC maintain the order between a lock and its succeeding data operations as well as an unlock and its preceding data operations for proper implementation of synchronization semantics. WO, in addition, maintains the order between a lock and its preceding data operations as well as an unlock and its succeeding data operations.
RC relaxes these two restrictions.

Table 1 shows the reordering of operations allowed by different hardware memory models. A blank entry indicates that this reordering is not allowed by any memory model. The entries associated with lock and unlock in Table 1 require some explanation. Note that lock involves an atomic read-modify-write operation. Therefore, operations after lock cannot bypass it under TSO and PSO. For WO and RC, on the other hand, lock is identified as a special memory operation and memory reads/writes after a lock are not allowed to bypass it. However, it is possible for lock to bypass previous reads/writes under certain memory models such as PSO and RC. The situation with unlock is different. Unlock is just an atomic write to a shared memory location (synchronization variable). Therefore, an unlock can be re-ordered w.r.t. both the preceding and succeeding reads/writes under certain hardware memory models. In particular, note that as PSO does not distinguish unlock as a special operation, it is possible for an unlock to bypass previous writes, which violates the semantics of unlock. This bypassing is prevented by including a memory barrier instruction in the software implementation of the unlock routine on a multiprocessor with the PSO model (such as the SUN SPARC architecture [21]).

2.2. Java Memory Models

We consider two candidate Java Memory Models: (a) JMM_old, the old JMM (since outdated) given in the Java Language Specification [6], and (b) a revised JMM developed by Manson and Pugh [12], henceforth called JMM_new. Note that there have been other candidate proposals for a new JMM (such as Maessen, Arvind and Shen's work [11] and Adve's work [1]). Our study can be (and indeed should be) extended to these models as well. However, the purpose of our study is not to compare JMM_old and JMM_new point-by-point. Instead we seek to evaluate the performance impact of software memory models on multithreaded Java program performance.

JMM_old. This model is specified in Chapter 17 of The Java Language
Specification [6]. It is a set of abstract rules dictating the allowed reorderings of read/write operations of shared variables. The Java threads interact among themselves via shared variables. The JMM essentially imposes ordering constraints on the interaction of the threads with the master copy of the variables and thus with each other. A major difficulty in reasoning about JMM_old seems to be these ordering constraints. They are given in an informal, rule-based, declarative style. It is difficult to reason how multiple rules determine the applicability/non-applicability of a reordering. As a result, this framework is hard to understand. In this paper, we use the easy-to-read operational style formal specification developed by us [17] to decide the enabling/disabling of different reorderings.

Table 2: Reorderings between memory operations for JMM_old

                Reorder? (2nd operation)
1st operation   Read   Write   Lock   Unlock
Read            Yes    Yes     No     No
Write           Yes    Yes     Yes    No
Lock            No     No      No     No
Unlock          Yes    No      No     No

Table 3: Reorderings between memory operations for JMM_new

                Reorder? (2nd operation)
1st operation   Read   Write   Lock   Unlock
Read            Yes    Yes     Yes    No
Write           Yes    Yes     Yes    No
Lock            No     No      No     No
Unlock          Yes    Yes     No     No

Table 2 shows the allowed reorderings for read, write, lock, and unlock operations in JMM_old. In addition, JMM_old provides special treatment to volatile variables that we will describe later.

JMM_new. The lack of rigor in the specification of JMM_old has led to some problems. For example, some important compiler optimizations, such as fetch elimination (elimination of a memory read operation if it is preceded by a read/write operation to the same variable), are prohibited in JMM_old [16]. Therefore, the JMM is currently going through an official revision by an expert group, JSR-133 [8]. There have been multiple proposals for a revised JMM: JMM_new by Manson and Pugh (appeared in [12] and subsequently revised further), Maessen et al. [11], and Adve [1]. The JSR-133 expert group is now converging towards a revised JMM by drawing on the
concrete proposals. A full-fledged discussion on the planned features of the revised JMM appears in [15]. In this paper, we choose JMM_new as the candidate revised JMM and use its formal executable description given in [23, 22]. Table 3 shows the allowed reorderings for read, write, lock, and unlock operations in JMM_new. Similar sets of allowed reorderings also appear in the reordering table of Doug Lea's cookbook [10]. JMM_new also provides special treatment for both volatile variables and final fields.

Major Differences. Both the JMMs allow arbitrary reordering of shared variable read/write operations. However, some other characteristics distinguish JMM_old from JMM_new in terms of performance as follows.

- Synchronization Operations: JMM_old does not allow locks to be re-ordered w.r.t. preceding reads and unlocks w.r.t. following writes. JMM_new relaxes this constraint (compare Tables 2 and 3).
- Volatile Variables: A volatile variable is one for which the Java Virtual Machine (JVM) always accesses the shared copy. JMM_old does not allow reads/writes of volatile variables to be re-ordered among themselves. Thus a read/write of volatile variable u cannot bypass a preceding read/write of another volatile variable v. In contrast, JMM_new simply treats a volatile variable read as the acquire of a lock and a volatile variable write as the release of a lock.
- Final Fields: JMM_new has separate semantics for final fields (fields which are written only once, i.e., in the constructor). In particular, it proposes that values written to a final field f of an object within the constructor be visible to reads by other threads that use f.

3. Interaction of Memory Models

In this section, we identify how the JMM is enforced in a multiprocessor platform and how it can affect the performance of multithreaded Java programs.

3.1. Overview

Figure 1 shows the relationship between the JMM and the underlying hardware memory model in a multiprocessor implementation of Java multithreading. Both the compiler reorderings as well as the
reorderings introduced by the hardware memory model need to respect the JMM. Pugh has studied how an inappropriate choice of JMM can disable common compiler reorderings [16]. In this paper, we systematically study how the choice of JMM can enable/disable reorderings allowed by the hardware memory models. Note that if the hardware memory model is more relaxed (allows more reorderings and thereby allows more behaviors) than the JMM, then the Java Virtual Machine (JVM) needs to disable these reorderings. This disabling is done via inserting memory barrier instructions at appropriate places. A memory barrier instruction forces the processor to execute memory operations across the barrier in program order. If the JMM is more restrictive than the hardware memory model, a multithreaded Java program will execute with too many memory barriers on multiprocessor platforms. On the other hand, if the hardware memory model is too restrictive compared to the JMM, the performance enhancing features of the JMM cannot be exploited in that framework. This explains how the choice of JMM can affect multithreaded program performance on multiprocessors.

We want to study the effect of the JMM in enabling or disabling reorderings allowed by the hardware memory models. To evaluate the impact of a JMM on multiprocessor performance, we need to check whether the JMM permits the relaxations allowed by the multiprocessor memory
We do so in this paper for two JMMs onfive different hard-ware memory models.3.2.Memory Barrier InsertionWe now discuss which reorderings of operations by hardware need to be prevented(by inserting memory bar-rier instructions)to maintain the JMM semantics on specific hardware memory models.First,we partition the operations in question into two classes:shared variable reads/writes, volatile variable reads/writes and synchronization opera-tions(lock/unlock).Furthermore,as JMM new gives special semantics forfinalfields,we considerfinalfield writes sep-arately.Notations We will employ the following notations to clas-sify the reorderings that need to be prevented.If we asso-ciate a requirement Rd↑(W r↑)with operation x,it means that all read(write)operations occurring before x must be completed before x starts.On the other hand,if a require-ment of Rd↓(W r↓)is associated with operation x,then all read(write)operations occurring after x must start af-ter x completes.Finally,RW↑≡Rd↑∧W r↑and RW↓≡Rd↓∧W r↓.Barriers for Reads/Writes In the absence of locks,unlocks, and volatile variables,reads/writes to shared variables can be arbitrarily re-ordered within a thread for JMM old.These reorderings are subject to satisfying the uniprocessor con-trol/data dependencies.Even though JMM new advocates that the actions are“usually done in their original order”within a thread,the semantics of the read/write actions al-lows for other reorderings to be deduced.This fact is ex-plicitly mentioned in[12,13]and summarized in[10]in the form of a reordering table.According to the reordering ta-ble,shared variable accesses can be arbitrarily re-ordered within a thread.Since both the JMMs freely allow reorder-ing of shared variable reads/writes among themselves,no memory barrier needs to be inserted before read/write in-structions(in the absence of lock,unlock,and volatile vari-ables)to satisfy the JMM semantics under any of the hard-ware memory models.Barriers for Lock/Unlock In both the 
JMMs, the order between a lock and the succeeding reads/writes as well as an unlock and the preceding reads/writes is maintained for proper implementation of synchronization semantics. In addition, JMM_old does not allow locks to be re-ordered w.r.t. preceding reads and unlocks w.r.t. following writes. JMM_new has a more relaxed model for synchronization operations which is similar to the RC hardware memory model. It relaxes the order between a lock and the preceding reads/writes as well as an unlock and the following reads/writes. Finally, both the JMMs do not allow synchronization operations to bypass each other.

The memory barrier insertion requirements for lock and unlock to satisfy JMM_old and JMM_new are summarized in Table 4. These results are derived by comparing Table 1 with Table 2 and Table 3, respectively. For example, consider a particular hardware memory model, say PSO, and a software memory model, say JMM_new. By comparing Table 1 with Table 3, we can see that PSO allows the following reorderings that are not allowed by JMM_new:

- write followed by unlock
- unlock followed by unlock
- unlock followed by lock

The first two reorderings are disabled by associating Wr↑ with unlock, and the last reordering is disabled by associating Wr↑ with lock.

Table 4 is not a conclusive guide on the expected performance of multithreaded Java benchmarks on various hardware memory models. For example, the WO memory model does not introduce any barriers before/after unlock as shown in Table 4. However, the effect of barriers is achieved by the hardware itself. Therefore, to measure actual performance of multithreaded Java programs on various multiprocessor platforms, a simulation study is essential.

Barriers for Volatile Reads/Writes. Recall that a volatile variable is one for which the JVM always accesses the shared copy. JMM_old does not allow reads/writes of volatile variables to be re-ordered among themselves, though they may be re-ordered w.r.t. reads/writes of normal variables.
Thus the compiler has to introduce memory barriers with volatile reads/writes. Note that a memory barrier before an instruction I simply forces all reads and/or writes before I to commit before I starts. It is not possible to introduce a barrier that selectively forces only the incomplete volatile reads/writes to commit.

Table 4: Satisfying the reordering requirements for lock and unlock in the JMMs

(a) JMM_old
Operation   SC   TSO   PSO         WO   RC
Lock                                    Rd↑
Unlock                 Wr↑ ∧ Wr↓        Wr↓

(b) JMM_new
Operation   SC   TSO   PSO   WO   RC
Lock                   Wr↑
Unlock                 Wr↑

The memory barrier insertion requirements for volatile reads/writes under JMM_old are:

Operation   SC   TSO   PSO   WO    RC
VRd              Wr↑   Wr↑   RW↑   RW↑
VWr                    Wr↑   RW↑   RW↑

Allowing arbitrary reordering of normal and volatile reads/writes creates some problems for JMM_old. Consider the following pseudo-code.

Thread 1               Thread 2
write a, 1             read volatile v
write a, 2             read a
write volatile v, 1

Assuming v and a are initialized to 0, it is possible to read v = 1 and a = 1 in the second thread. Indeed, it is this weakness of the volatile variable semantics which prevents an easy fix of the "Double Checked Locking" idiom [18] using volatile variables.

In JMM_new, this problem is rectified by assigning "acquire-release" semantics to volatile variable operations. Thus a volatile write behaves like a "release" operation and cannot bypass the previous normal reads/writes; similarly, a volatile read behaves like an "acquire" operation and cannot be bypassed by the following normal reads/writes. Volatile variable reads/writes are still not allowed to bypass each other, just like JMM_old. Reordering requirements for volatile reads/writes w.r.t. normal reads/writes in order to satisfy JMM_new are shown in the following.
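For reference, the volatile-based Double-Checked Locking idiom that becomes safe under the revised acquire-release volatile semantics can be sketched as follows (the class is illustrative, not taken from the paper):

```java
public class LazySingleton {
    // Under JMM_new's acquire-release semantics for volatile, this
    // double-checked locking idiom is correct; under JMM_old it was not,
    // because normal writes in the constructor could be reordered past
    // the volatile write that publishes the instance.
    private static volatile LazySingleton instance;

    private final int payload;

    private LazySingleton() {
        payload = 42; // must be visible before the instance is published
    }

    public static LazySingleton getInstance() {
        LazySingleton local = instance;            // volatile read (acquire)
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;
                if (local == null) {
                    local = new LazySingleton();
                    instance = local;              // volatile write (release)
                }
            }
        }
        return local;
    }

    public int payload() { return payload; }
}
```

The release on the volatile write keeps the constructor's normal writes from drifting below the publication, which is exactly the acquire-release pairing the paper describes.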
constructor). JMM_old does not prescribe any special semantics for final fields and treats them as normal variables. However, JMM_new provides specialized semantics for final fields. Here we only consider final fields which are not visible to other threads before the constructor terminates (called "properly constructed" final fields in JMM_new). This semantics requires that all the writes in a constructor (writes to final as well as non-final fields) be visible when a final field is frozen (i.e., initialized). Since we can assume that all final fields are frozen before the termination of the constructor, the effect of this semantics can be achieved by inserting a barrier at the end of the constructor. Thus, the return statement of the constructor comes with the requirement Wr↑.

To measure the performance impact of this semantics of JMM_new, it is not enough to measure the time taken by these additional barriers. Due to the additional safety provided by the semantics, certain synchronizations can now be eliminated. However, identifying these newly redundant synchronizations is rather difficult, and we have not done so in this paper. Nevertheless, we found that the overhead due to the additional barriers for final field writes is not substantial.
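A small sketch of the final-field semantics just described (our own example; the class is not from the paper or the benchmarks): once the constructor returns, carrying the implicit Wr↑ barrier, any thread that obtains a reference to the object must see the final fields fully initialized, even without synchronization.

```java
// "Properly constructed" object in the JMM_new sense: `this` does not
// escape the constructor, so the final fields are frozen at the
// implicit Wr^ barrier when the constructor returns, and every reader
// sees them initialized. (Names are illustrative.)
public class Point {
    private final int x;  // final: covered by the freeze guarantee
    private final int y;
    private int tag;      // non-final: its in-constructor write is also
                          // made visible by the barrier at return

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
        this.tag = x + y;
        // implicit Wr^ here: all writes above commit before any
        // reference to this Point can be observed by another thread
    }

    public int x() { return x; }
    public int y() { return y; }
    public int tag() { return tag; }
}
```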
4. Performance Evaluation

In this section, we compare the impact of JMM_old and JMM_new on multithreaded Java program performance through a simulation study. First, we consider the performance of various Java Grande benchmarks on different hardware memory models. We then study the performance of the same benchmarks when these hardware memory models are required to comply with JMM_old and JMM_new.

4.1. Benchmarks

We choose five benchmarks from the multithreaded Java Grande suite (SOR, LU, Series, Sync, and Ray) which are suitable for parallel execution on shared memory multiprocessors [7]. Sync is a low-level benchmark that measures the performance of synchronized methods and blocks. SOR, LU, and Series are moderate-sized kernels. LU solves a 40×40 linear system using LU factorization followed by a triangular solve; it is a Java version of the well-known Linpack benchmark. SOR performs 100 iterations of successive over-relaxation on a 50×50 grid. Series computes the first 30 Fourier coefficients of the function f(x) = (x+1)^x on the interval 0...2. Ray is a large-scale application that renders a 3D scene containing 64 spheres. Each benchmark is run with four parallel threads.

Table 5 shows the number of volatile reads/writes, synchronization operations, and final field writes for our benchmarks. The LU and SOR benchmarks have a substantial number of volatile variable reads/writes, accounting for up to 15% of the total memory operations.

[Table 5: Characteristics of the benchmarks used. For each benchmark (SOR, LU, Series, Sync, Ray), it lists the number of volatile reads, volatile writes, constructors with final field writes, locks, and unlocks.]

4.2. Methodology

We use Simics [20], a full-system simulator for multiprocessor platforms, in our performance evaluation. Simics is a system-level architectural simulator developed by Virtutech and supports various processors such as SPARC, Alpha, and x86. It can run completely unmodified operating systems.
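To make the Series kernel concrete, the following sketch approximates Fourier coefficients of f(x) = (x+1)^x on [0, 2] with the trapezoidal rule. This is our own minimal illustration, not the Java Grande code, and the normalization convention (no division by the period) is a choice of this sketch.

```java
// Minimal sketch of the Series computation: trapezoidal-rule estimates
// of Fourier coefficients a_n (cosine) and b_n (sine) of
// f(x) = (x+1)^x on [0, 2]. The Java Grande kernel computes the
// first 30 coefficients; we compute a few for illustration.
public class SeriesSketch {
    static double f(double x) { return Math.pow(x + 1.0, x); }

    // Integral of f(x) * cos(n*pi*x) (or sin) over [0, 2]; with period
    // T = 2 the angular factor 2*pi*n*x/T reduces to n*pi*x.
    static double coeff(int n, boolean cosine, int steps) {
        double h = 2.0 / steps, sum = 0.0;
        for (int i = 0; i <= steps; i++) {
            double x = i * h;
            double w = (i == 0 || i == steps) ? 0.5 : 1.0;  // trapezoid weights
            double basis = cosine ? Math.cos(n * Math.PI * x)
                                  : Math.sin(n * Math.PI * x);
            sum += w * f(x) * basis;
        }
        return sum * h;
    }

    public static void main(String[] args) {
        for (int n = 0; n < 4; n++) {
            System.out.printf("a_%d = %.6f%n", n, coeff(n, true, 10000));
        }
    }
}
```

The kernel parallelizes naturally because each coefficient's integral is independent, which is why Series scales well across the four threads used in the experiments.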
We take advantage of the set of application programming interfaces (APIs) provided by Simics to write new components, add new commands, and write control/analysis routines.

Multiprocessor platform. We simulate a shared memory multiprocessor (SMP) consisting of four SUN UltraSPARC II processors. The processors are configured as 4-way superscalar out-of-order execution engines with 64-entry reorder buffers. We use separate 256 KB instruction and data caches; each is 2-way set associative with a 32-byte line size. The caches use a write-back, write-allocate policy with a cache miss penalty of 100 clock cycles. We use the MESI cache coherence protocol.

Simics provides two basic execution modes: an in-order execution mode and an out-of-order mode. In in-order execution mode, instructions are scheduled sequentially in program order, so there are no reorderings among memory operations. The out-of-order execution mode has the features of a modern pipelined out-of-order processor: instructions need not be issued to the functional units and completed in program order. In Simics, this is achieved by breaking instructions into several phases that can be scheduled independently. This mode can produce multiple outstanding memory requests that do not necessarily occur in program order. Clearly, we must use the out-of-order execution mode in our experiments so that we can simulate multiprocessor platforms with different hardware memory models.

Multithreading Support. We use a Linux SMP kernel as the operating system and Kaffe as the JVM. Since our simulated platform is a four-processor SMP machine, some measures are taken so that the multithreaded programs make the best use of the multiprocessors. Linux supports multithreading through POSIX Threads (Pthreads), message passing libraries, and multiple processes. Since both message passing libraries and multiple processes typically communicate either by means of Inter-Process Communication (IPC) or a messaging API, they do not map naturally to SMP machines. Only Pthreads
enforce a shared memory paradigm, so we decide to use Pthreads.

Moreover, Kaffe provides several methods for the implementation of Java multithreading, including kernel-level and application-level threads. However, threads at the application level do not take advantage of kernel threading: all the threads are mapped to a single process. As a result, application-level threads cannot take advantage of the SMP. Consequently, we decide to use the kernel Pthreads library so that the Java threads are scheduled to different processors.

Hardware Memory Models. We configure the Simics consistency controller [19] to simulate various hardware memory models. The consistency controller ensures that the architecturally defined consistency model is not violated. It can be constrained through the following attributes (setting an attribute to zero implies no constraint):

• load-load: if set to non-zero, loads are issued in program order
• load-store: if set to non-zero, program order is maintained for stores following loads
• store-load: if set to non-zero, program order is maintained for loads following stores
• store-store: if set to non-zero, stores are issued in program order

Obviously, if all four attributes are set to non-zero, program order is maintained for all memory operations; in this case, the hardware memory model is SC. For TSO, where writes can be reordered w.r.t. the following reads, we only need to set store-load to zero and the other attributes to non-zero. For PSO, store-load and store-store are set to zero and the other two to non-zero. In addition, for implementing PSO, we had to modify the Simics consistency controller. The default consistency controller stalls a store operation if there is an earlier instruction that can cause an exception and