October 31, 2006 on 11:06 pm | In Java

Chapter 8. JNI

You cannot dedicate a book to Java performance without addressing the Java Native Interface (JNI). After all, many perceive JNI as the performance silver bullet when all else fails. Imagine working on performance-challenged Java code, trying all the ideas set forth in earlier chapters, and still falling short of your efficiency goals. In desperation, you may be tempted to turn to JNI as a last resort. There are actually three popular scenarios for the use of native methods:

- You may have complex, preexisting software written in C/C++, and converting that code to Java is not a practical option for some reason. You can hook this software into your Java code via JNI. In this case, JNI bails you out of a costly porting effort.
- You may need computational results that are simply not available in the Java environment and that require a system call. There are many such examples in the JDK that you may not be aware of. For example, System.currentTimeMillis() retrieves a timestamp from the underlying physical machine.
- You may want to speed up a slow computation, and you expect JNI to help by executing it in C/C++.

In this chapter we discuss only the last (third) item, as it is rooted in performance considerations that are open to debate. There's a popular misperception that JNI is always a performance winner. The reality is not that simple.
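To make the second scenario concrete, the short program below asks the reflection API whether a few JDK methods are declared native. It is a sketch, not part of the original text: the class and helper names are hypothetical, and which methods are native is a JDK implementation detail (on OpenJDK-based JDKs, System.currentTimeMillis() and System.arraycopy() are declared native, while String.toUpperCase() is ordinary Java code).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodsInJdk {
    // Returns true if the named public method of `cls` is declared native,
    // i.e. its body is supplied by the JVM or a native library, not by bytecode.
    static boolean isNative(Class<?> cls, String name, Class<?>... params) {
        try {
            Method m = cls.getMethod(name, params);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("System.currentTimeMillis(): native = "
                + isNative(System.class, "currentTimeMillis"));
        System.out.println("System.arraycopy(...): native = "
                + isNative(System.class, "arraycopy",
                        Object.class, int.class, Object.class, int.class, int.class));
        System.out.println("String.toUpperCase(): native = "
                + isNative(String.class, "toUpperCase"));
    }
}
```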
Optimization 46: JNI Surprise

We start with a counterintuitive example that will take most Java programmers by surprise: a nontrivial task that executes faster in Java than through JNI. A JNI call involves substantial work under the covers to cross from Java to the native environment and back. At the very least, we must convert data structures from their JVM representation to the ones expected by C/C++. Take a String, for example. Java represents it as a sequence of Unicode characters, each a two-byte entity. A C/C++ string is a sequence of single-byte characters terminated by a null byte. Bouncing String objects back and forth across those two environments takes work.

Consider the concrete task of converting an ASCII String to uppercase. On the face of it, this task favors C/C++, where you can convert the individual characters in place. In Java, however, we must create a new String object to hold the uppercase result; the original (lowercase) string remains intact, because Java String objects are immutable. Given that, I was surprised to find that the Java version was faster than JNI. Both versions were tested with a string of length 128:

    String s = "a";
    for (int i = 1; i < 8; i++) { // Double the string 7 times: 2^7 = 128 characters
        s += s;
    }

The measurement consisted of 10,000 iterations of the Java version:

    String p = s.toUpperCase();

as compared to the corresponding JNI version:

    String p = jniToUpperCase(s);

The C++ implementation of jniToUpperCase() is given by

    #include <jni.h>
    #include <cctype>
    #include <string>

    extern "C" JNIEXPORT jstring JNICALL
    Java_Jni_jniToUpperCase(JNIEnv *env, jclass thisClass, jstring s) {
        // Convert the Java string to a C string of (modified) UTF-8 bytes.
        // The buffer is const: it may alias VM-internal data.
        const char *utf_string = env->GetStringUTFChars(s, NULL);
        if (utf_string == NULL) {
            return NULL;  // an OutOfMemoryError is already pending
        }
        // Uppercase a writable copy in place (ASCII input assumed).
        std::string buf(utf_string);
        for (size_t i = 0; i < buf.size(); i++) {
            buf[i] = (char) std::toupper((unsigned char) buf[i]);
        }
        // Always release, whether or not the VM returned a copy.
        env->ReleaseStringUTFChars(s, utf_string);
        return env->NewStringUTF(buf.c_str());
    }

The Java version was substantially faster.
See Figure 8.1.

Figure 8.1. JNI is not an automatic performance winner
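The measurement above can be reproduced with a harness along the following lines. This is a minimal sketch, not the author's benchmark: it assumes the C++ function is compiled into a shared library named jni (a hypothetical name) and that the Java side lives in a class named Jni, which is what the native symbol Java_Jni_jniToUpperCase implies. If the library cannot be loaded, only the Java timing runs. Absolute numbers depend on the JVM, the hardware, and JIT warm-up, so only the relative comparison is meaningful.

```java
public class Jni {
    // Static native method matching the C++ symbol Java_Jni_jniToUpperCase
    // (static because the native side receives a jclass, not a jobject).
    static native String jniToUpperCase(String s);

    // True only if the hypothetical native library "jni" could be loaded.
    static final boolean NATIVE_AVAILABLE = loadNative();

    private static boolean loadNative() {
        try {
            System.loadLibrary("jni"); // hypothetical name: libjni.so / jni.dll
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Builds the 128-character test string by doubling "a" seven times.
    static String buildTestString() {
        String s = "a";
        for (int i = 1; i < 8; i++) {
            s += s;
        }
        return s;
    }

    // Elapsed nanoseconds for `iterations` calls of String.toUpperCase().
    static long timeJava(String s, int iterations) {
        String p = null;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            p = s.toUpperCase();
        }
        long elapsed = System.nanoTime() - start;
        if (!p.equals(s.toUpperCase())) throw new IllegalStateException("bad result");
        return elapsed;
    }

    // Elapsed nanoseconds for `iterations` calls of the native version.
    static long timeJni(String s, int iterations) {
        String p = null;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            p = jniToUpperCase(s);
        }
        long elapsed = System.nanoTime() - start;
        if (!p.equals(s.toUpperCase())) throw new IllegalStateException("bad result");
        return elapsed;
    }

    public static void main(String[] args) {
        final int iterations = 10_000;
        String s = buildTestString();
        timeJava(s, iterations); // warm-up run, giving the JIT a chance to compile the loop
        System.out.printf("Java: %.3f ms%n", timeJava(s, iterations) / 1e6);
        if (NATIVE_AVAILABLE) {
            timeJni(s, iterations); // warm-up
            System.out.printf("JNI:  %.3f ms%n", timeJni(s, iterations) / 1e6);
        } else {
            System.out.println("Native library not found; skipping JNI timing.");
        }
    }
}
```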
Key Points

- SMP is currently the dominant MP architecture. It consists of multiple symmetric processors connected via a single bus to a single memory system. The bus is the scalability weak link in the SMP architecture. Large caches, one per processor, are meant to keep bus contention under control.
- Amdahl's law puts an upper limit on the potential scalability of an application. Scalability is limited by the portions of the computation that are synchronized or otherwise single-threaded.
- Straight-line execution of unsynchronized code is faster than that of synchronized code, even without contention. Synchronized code hides the cost of acquiring and releasing the lock associated with a class or object.
- In the presence of thread contention, synchronized code can become a severe performance and scalability inhibitor.

The trick to scalability is to reduce and, if possible, eliminate synchronized code. Following are some steps you can take toward that goal:

- Division of labor. Split a monolithic task into multiple subtasks that are conducive to parallel execution by concurrent threads.
- False sharing. If two class (or object) members are logically unrelated, don't use the associated class (or object) lock to synchronize access to both. Doing so forces the two unrelated data entities to share a lock, which increases contention. Protect access to those members using distinct locks.
- Code motion. Synchronized code should contain only access to shared data and nothing else. Code that does not directly manipulate shared resources should not reside within the scope of synchronization.
- Share nothing. If you need only a small, fixed number of resource instances, avoid public resource pools. Make those instances private to the thread and recycle them.
- Partial sharing. Two identical pools, each with half the contention, are better than a single shared pool.
- Reader/writer locks. Shared data that is read-mostly benefits from these locks, which eliminate contention among reader threads.
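Amdahl's law can be written as S(n) = 1 / (s + (1 - s) / n), where s is the fraction of the work that is serialized (synchronized or single-threaded) and n is the number of processors; as n grows, S(n) approaches 1/s. The small calculator below is an illustrative sketch with hypothetical names. With an assumed 10 percent serial fraction, eight processors give a speedup of about 4.7, and no number of processors can push it past 10.

```java
public class Amdahl {
    // Upper bound on speedup with n processors when a fraction `serial`
    // of the work cannot run in parallel (synchronized or single-threaded).
    static double speedup(double serial, int n) {
        return 1.0 / (serial + (1.0 - serial) / n);
    }

    public static void main(String[] args) {
        double serial = 0.10; // assumed: 10% of the work is serialized
        for (int n : new int[] {1, 2, 4, 8, 16, 1024}) {
            System.out.printf("n=%4d  speedup=%.2f%n", n, speedup(serial, n));
        }
        // As n grows without bound, the speedup approaches 1 / serial.
        System.out.printf("limit = %.2f%n", 1.0 / serial);
    }
}
```

The serial fraction dominates quickly: shrinking s (less synchronized code) raises the ceiling far more than adding processors does.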
Optimization 45: Read/Write Locks

Another way to ease the pain of synchronization is to relax the requirement that only one thread at a time may access shared data. The need to synchronize access to shared data stems from the fact that one of the threads accessing it may modify it. It follows that we must give exclusive access only to threads that intend to modify shared data (writers). Threads that merely read shared data (readers), on the other hand, could access it concurrently. Reader/writer locks allow multiple readers to access shared data at the same time instead of waiting for exclusive access. A thread requesting read access to shared data is granted it in one of two cases:

- No other thread has been granted access.
- The only threads granted access are readers.

If a writer thread has been granted access, all readers must wait for the writer to leave the critical section. A writer thread is granted access if and only if no other thread has been granted access to the shared resource.

Java's synchronized keyword provides only exclusive locking, not read/write synchronization, but you can build your own reader/writer lock from the available synchronization primitives. See D. Lea, Concurrent Programming in Java [LEA97], for one such implementation; since J2SE 5.0, the java.util.concurrent.locks package also provides ReentrantReadWriteLock.

If all your threads try to modify a shared resource, reader/writer locks will not help. In fact, they will hurt performance, because their implementation is by nature more complex, and therefore slower, than that of plain locks. If, however, your shared data is read-mostly, reader/writer locks will improve scalability by eliminating contention among reader threads.
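The two granting rules above map directly onto a monitor-based lock. The class below is a minimal sketch built only from synchronized, wait(), and notifyAll(); its name and methods are hypothetical, and it is not Lea's implementation. It favors readers, so a steady stream of readers can starve a waiting writer; ReentrantReadWriteLock is the more complete alternative where it is available.

```java
public class SimpleReadWriteLock {
    private int readers = 0;        // threads currently holding read access
    private boolean writer = false; // true while a writer holds exclusive access

    // Read access: granted when no writer holds the lock; other readers are fine.
    public synchronized void acquireRead() throws InterruptedException {
        while (writer) {
            wait();
        }
        readers++;
    }

    public synchronized void releaseRead() {
        readers--;
        if (readers == 0) {
            notifyAll(); // the last reader out lets a waiting writer proceed
        }
    }

    // Write access: granted only when no other thread holds any access.
    public synchronized void acquireWrite() throws InterruptedException {
        while (writer || readers > 0) {
            wait();
        }
        writer = true;
    }

    public synchronized void releaseWrite() {
        writer = false;
        notifyAll(); // wake both waiting readers and waiting writers
    }
}
```

Each acquire should be paired with its release in a try/finally block, for example: lock.acquireRead(); try { /* read shared data */ } finally { lock.releaseRead(); }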
Optimization 44: Partial Sharing

Optimization 43 discussed the various design options for providing concurrent access to an HTML document cache. This is a particular instance of a more general issue: resource pooling. When a resource is expensive to acquire and release, we want to spread that cost by recycling the resource many times before letting go of it (Optimizations 36, 37, and 38). Examples of such resources you are likely to encounter are file contents, JDBC connections, and threads. A pooled resource may also be a user-defined object that your application uses frequently; pooling it may be more efficient than relying on the garbage-collection subsystem.

We have already mentioned the two opposite extremes of resource sharing: the publicly shared resource pool and the thread-private resource instance. Between these two extremes lies a middle ground: the partial-sharing resource pool. When each thread requires a single instance of a resource, you can easily eliminate contention by making that instance thread-private (Optimization 36). If the required number of instances cannot be determined in advance, or if the side effects of maintaining thread-private resources are too severe, you need a resource pool that is shared among all threads (Optimization 37). Such shared pools often become thread contention hot spots that severely degrade performance and scalability, as threads spend significant cycles spinning idle. Partial sharing of resource pools offers a way out of a hotly contended pool without paying a heavy toll in side effects such as memory and cache consumption.

At one extreme you can have a single resource pool serving all threads, as in Figure 7.12.

Figure 7.12. A single shared resource pool

Our goal is to reduce thread contention by reducing the number of threads competing for each resource. Toward that goal, we convert the single resource pool into multiple identical subpools. Two pools (Figure 7.13) with half the contention each, or four pools with one-fourth the contention each, are preferable to a single pool that draws all the activity.

Figure 7.13. Spreading the contention over two pools
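A partial-sharing pool can be sketched as an array of identical subpools, each guarded by its own lock, with each thread mapped to one subpool. The class below is an illustrative sketch under stated assumptions: the names are hypothetical, threads are assigned to subpools by identity hash code (which spreads them only approximately evenly), and a subpool that runs dry simply creates a new instance through the supplied factory.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class PartitionedPool<T> {
    private final Deque<T>[] subpools;  // identical subpools, each with its own lock
    private final Supplier<T> factory;  // creates an instance when a subpool is empty

    @SuppressWarnings("unchecked")
    public PartitionedPool(int partitions, Supplier<T> factory) {
        subpools = (Deque<T>[]) new Deque<?>[partitions];
        for (int i = 0; i < partitions; i++) {
            subpools[i] = new ArrayDeque<>();
        }
        this.factory = factory;
    }

    // Map the calling thread to a fixed subpool, so each subpool lock is
    // contended by only about 1/partitions of the threads.
    private Deque<T> subpoolForCurrentThread() {
        int h = System.identityHashCode(Thread.currentThread());
        return subpools[Math.floorMod(h, subpools.length)];
    }

    public T acquire() {
        Deque<T> pool = subpoolForCurrentThread();
        synchronized (pool) {
            T item = pool.pollFirst();
            if (item != null) {
                return item;
            }
        }
        return factory.get(); // subpool empty: create a new instance outside the lock
    }

    public void release(T item) {
        Deque<T> pool = subpoolForCurrentThread();
        synchronized (pool) {
            pool.addFirst(item);
        }
    }
}
```

For example, new PartitionedPool<>(4, StringBuilder::new) spreads pooled StringBuilder objects over four subpools. Setting partitions to 1 gives the single shared pool of Figure 7.12; setting it to the number of threads approaches the thread-private extreme.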
Optimization 43: Share Nothing

In the previous section (Optimization 42) we established the validity of the principle calling for a reduction in the amount of computation (and hence CPU cycles) executed inside synchronized code. The most extreme form of this principle eliminates sharing altogether, and with it the synchronized code. This chapter introduces various techniques to minimize the time spent in synchronized code, but erasing that code altogether is even better. Your first question when faced with a synchronization hot spot should be: Can I restructure my design to eliminate the need for synchronization?

Let's look at a concrete example involving a Web-server implementation. In Chapter 11 we develop a multithreaded Web server. One type of Web server is a file server for HTML documents. A Web server that does not perform file caching (such as the early releases of the popular Apache server) will be slowed down by file I/O. The typical solution employed by many Web-server implementations is to cache HTML documents in a memory-resident cache managed by the server. Typically a single cache object is shared by all threads of the multithreaded server, and this shared cache is often where all server threads collide, competing for exclusive access. Because the cache is a dynamic object into and out of which files come and go, access to it must be synchronized. There are quite a few potential solutions to this contention hot spot:

- Private file caches. Give each thread a private file cache, independent of the other threads. The need for synchronization is removed along with the lock contention.
- Partial sharing. Find a middle ground between a single server-wide cache on one extreme and a cache per thread on the other. We discuss this further in Optimization 44.
- Read/write locks. The file cache is read-mostly; updates (writes) to the cache are relatively infrequent. See Optimization 45 for further details.

The scalability upside of eliminating shared resources is clear, but there's a downside as well. Providing each server thread with its own copy of the cache drastically increases the server's memory footprint. What if each cache grows to 10 MB and we have 500 active threads? That adds up to 5,000 MB, roughly 5 GB, of cache. Such a design choice has to be compensated for by restricting cache size, limiting the number of concurrent threads, or some combination of both. The share-nothing option is available in many multithreaded applications, and it does not always come bundled with a footprint penalty. It should find a place in your bag of tricks.
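A thread-private file cache can be sketched with ThreadLocal, which gives each thread its own map so that lookups and inserts need no synchronization. The sketch below also caps each private cache at a hypothetical MAX_ENTRIES_PER_THREAD entries using LinkedHashMap's least-recently-used eviction, one way of restricting cache size to contain the footprint penalty (note that it bounds the entry count, not the byte count). The class name and the loader function, which stands in for the actual file read, are assumptions rather than part of the original Web server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class PrivateFileCache {
    // Hypothetical per-thread bound, to cap the footprint cost of share-nothing.
    static final int MAX_ENTRIES_PER_THREAD = 100;

    // Each thread lazily gets its own access-ordered map; no other thread
    // ever touches it, so no synchronization is needed.
    private static final ThreadLocal<Map<String, byte[]>> CACHE =
            ThreadLocal.withInitial(() -> new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > MAX_ENTRIES_PER_THREAD; // evict least recently used
                }
            });

    // Returns the calling thread's cached copy, invoking `loader` on a miss.
    static byte[] get(String path, Function<String, byte[]> loader) {
        return CACHE.get().computeIfAbsent(path, loader);
    }
}
```

Because every map is private, a document requested by 500 threads may be loaded and stored 500 times; that duplication is exactly the footprint trade-off described above.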
Optimization 42: Code Motion

Splitting a sequential task into parallel subtasks is just the first step. Beyond that, we must structure our multithreaded code so that the subtasks can execute independently, minimizing friction among the threads that execute them. The most common source of friction among threads is contention for shared resources. When a thread acquires exclusive access to a shared resource, all other threads that want access to that resource must wait. We therefore must speed up execution inside synchronized blocks and methods and release shared resources as fast as possible.

Code motion is often associated with loop optimization [BEN82]: a computation whose value is constant across loop iterations should not be performed inside the loop; it ought to be computed once, before entering the loop. Similarly, a critical section should contain only critical computations, namely those that directly manipulate shared resources. All other computations ought to be performed outside the critical section.

We discuss an example of code motion below. Imagine a multithreaded application that must log some data to a shared file (a Web server must log every request, for example). We mimic such an application with the code below. The main program creates a number of parallel threads whose run() method logs data to the shared file. The main program also creates and closes the shared file itself. Each thread receives a reference to the output stream as a constructor argument:

    import java.io.*;

    class CodeMotion {
        public static void main(String args[]) {
            try {
                if (args.length != 2) {
                    System.out.println("Usage java CodeMotion
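Because the CodeMotion listing is cut off here, the following is a separate sketch of the same idea, not the author's program. It assumes the shared log is a java.io.PrintWriter (the original passes an output stream) and uses hypothetical method names. In logBefore the log line is built while the shared lock is held; in logAfter that string construction is moved out, leaving only the write in the critical section.

```java
import java.io.PrintWriter;

public class CodeMotionSketch {
    // Before code motion: message formatting runs while the shared lock is held,
    // so every other logging thread waits for it.
    static void logBefore(PrintWriter out, int threadId, int requestNo) {
        synchronized (out) {
            String line = "thread " + threadId + " request " + requestNo
                    + " at " + System.currentTimeMillis();
            out.println(line);
        }
    }

    // After code motion: only the write to the shared stream is in the critical section.
    static void logAfter(PrintWriter out, int threadId, int requestNo) {
        String line = "thread " + threadId + " request " + requestNo
                + " at " + System.currentTimeMillis();
        synchronized (out) {
            out.println(line);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PrintWriter out = new PrintWriter(System.out, true); // shared by all threads
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int r = 0; r < 3; r++) {
                    logAfter(out, id, r);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
    }
}
```

Both methods write identical lines; the difference is only how long each thread holds the lock, which matters once many threads log concurrently.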