This article explains the process of garbage collection, including how the JVM reclaims memory from the heap. It also covers JVM and Tomcat server settings, and the coding precautions that help make better use of heap memory.
Many programmers, Java developers especially, assume that they never need to worry about how memory is allocated and freed. The assumption is simple: create objects, use them, and Java will free the allocated memory through garbage collection. From this it is concluded that Java has solved one of the nasty problems that plague other programming languages: the dreaded memory leak. But is that true?
Enterprise applications written in the Java language involve complex object relationships and use large numbers of objects. Although the Java language automatically manages the memory associated with object life cycles, understanding the application's usage patterns for objects is important. In particular, verify the following:
- The application is not over-utilizing objects.
- The application is not leaking objects.
- The Java heap parameters are set properly to handle a given object usage pattern.
Understanding the effect of garbage collection is necessary to apply these management techniques.
The garbage collector first performs a task called marking. It traverses the application's object graph, starting with the root objects: the objects referenced from all active stack frames and from all static variables loaded into the system. Each object the garbage collector reaches is marked as in use and will not be deleted in the sweeping stage.
The sweeping stage is where the deletion of objects takes place. There are many ways to delete an object. The traditional C way was to mark the space as free and let the allocator use complex data structures to search memory for the required free space. This was later improved by a defragmenting system that compacted memory by moving objects closer to each other, removing fragments of free space and therefore making allocation much faster.
For the last trick to be possible a new idea was introduced in garbage collected languages: even though objects are represented by references, much like in C, they don’t really reference their real memory location. Instead, they refer to a location in a dictionary which keeps track of where the object is at any moment.
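The dictionary idea described above can be sketched with a toy model. This is only an illustration under stated assumptions: `HandleHeap`, `allocateAt`, and `compact` are hypothetical names invented for this example, and a real JVM's internal layout may differ considerably.

```java
import java.util.Arrays;

// Toy model of handle indirection (illustrative only, not JVM internals).
class HandleHeap {
    private final String[] memory;   // simulated memory slots; null means free
    private final int[] handleTable; // handle -> current slot index, -1 = unused
    private int nextHandle = 0;

    HandleHeap(int slots) {
        memory = new String[slots];
        handleTable = new int[slots];
        Arrays.fill(handleTable, -1);
    }

    // Stores a value in the given slot and returns a handle that stays stable.
    int allocateAt(int slot, String value) {
        memory[slot] = value;
        handleTable[nextHandle] = slot;
        return nextHandle++;
    }

    String get(int handle) { return memory[handleTable[handle]]; }

    int slotOf(int handle) { return handleTable[handle]; }

    // Slides live values to the front of memory. Only the handle table is
    // updated, so handles held by the "application" remain valid.
    void compact() {
        int free = 0;
        for (int slot = 0; slot < memory.length; slot++) {
            if (memory[slot] == null) continue;
            if (slot != free) {
                memory[free] = memory[slot];
                memory[slot] = null;
                for (int h = 0; h < nextHandle; h++) {
                    if (handleTable[h] == slot) handleTable[h] = free;
                }
            }
            free++;
        }
    }
}

class HandleHeapDemo {
    public static void main(String[] args) {
        HandleHeap heap = new HandleHeap(8);
        int order = heap.allocateAt(1, "order");
        int invoice = heap.allocateAt(4, "invoice");
        heap.compact();
        // Both handles still resolve to the same values after the move.
        System.out.println(heap.get(order) + " now in slot " + heap.slotOf(order));
        System.out.println(heap.get(invoice) + " now in slot " + heap.slotOf(invoice));
    }
}
```

The application only ever holds the integer handles, so the compactor is free to move values without touching application state.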
Fortunately for us, but unfortunately for these garbage collection algorithms, our servers and personal computers gained faster (and more) processors and bigger memory capacities. Compacting memory areas this large was often very taxing on the application, especially since the whole application had to freeze while the virtual memory map was being changed. Fortunately, some smart people improved those algorithms in three ways: concurrency, parallelization and generational collection.
Garbage Collection Algorithms
JDK 1.4.2 offers around six basic garbage collection strategies and more than a dozen command-line options for tuning and configuring them. All garbage collection algorithms share the same purpose: to identify memory blocks that are no longer reachable by the user program so that they can be reclaimed before they lead to OutOfMemory issues. The algorithms used for garbage collection are described below.
1 – Reference Counting:
Each object has an associated reference count that indicates the number of active references to it. Whenever a reference is modified, the count is updated. Once the count reaches zero, the object is garbage and its memory can be reclaimed.
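As a minimal sketch of the counting rule just described, the toy class below tracks a count by hand. `RefCounted`, `addRef`, and `release` are hypothetical names for illustration only; Java programs do not manage reference counts like this.

```java
// Toy model of reference counting (illustrative; not how the JVM tracks objects).
class RefCounted {
    final String name;
    private int refCount = 0;
    private boolean reclaimed = false;

    RefCounted(String name) { this.name = name; }

    // A new reference to this object was created.
    void addRef() { refCount++; }

    // A reference was dropped; at zero the object is garbage.
    void release() {
        if (refCount == 0) throw new IllegalStateException("no references to release");
        refCount--;
        if (refCount == 0) reclaimed = true; // memory could be recycled here
    }

    int refCount() { return refCount; }

    boolean isReclaimed() { return reclaimed; }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCounted buffer = new RefCounted("buffer");
        buffer.addRef();
        buffer.addRef();
        buffer.release();
        System.out.println(buffer.refCount() + " " + buffer.isReclaimed()); // 1 false
        buffer.release();
        System.out.println(buffer.refCount() + " " + buffer.isReclaimed()); // 0 true
    }
}
```

One known weakness of this scheme is that two objects referring to each other never reach a count of zero, even when nothing else refers to them.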
2 – Tracing Collectors:
Most standard garbage collectors do not use reference counting. Instead, they use some form of tracing algorithm, which traces all objects starting from the roots until every reachable object has been examined.
3 – Mark-Sweep collectors:
This is the most basic form of tracing collector. The collector visits each node reachable from the roots and marks it. Once no unvisited references remain, the marking is complete. The heap is then swept, and the objects that were not marked are reclaimed and returned to the free list.
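The two phases can be sketched over a toy object graph. This is an illustrative simulation only, assuming Java 8 or later; `ToyObject` and `ToyHeap` are hypothetical names, and a list of objects stands in for the real heap and free list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class ToyObject {
    final String name;
    final List<ToyObject> refs = new ArrayList<>(); // outgoing references
    boolean marked;

    ToyObject(String name) { this.name = name; }
}

class ToyHeap {
    final List<ToyObject> objects = new ArrayList<>(); // everything allocated
    final List<ToyObject> roots = new ArrayList<>();   // stand-in for stack frames and statics

    ToyObject allocate(String name) {
        ToyObject o = new ToyObject(name);
        objects.add(o);
        return o;
    }

    // Mark phase: visit everything reachable from the roots.
    void mark() {
        Deque<ToyObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            ToyObject o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    // Sweep phase: drop unmarked objects and reset marks for the next cycle.
    int sweep() {
        int reclaimed = 0;
        for (Iterator<ToyObject> it = objects.iterator(); it.hasNext(); ) {
            ToyObject o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    int collect() {
        mark();
        return sweep();
    }
}

class MarkSweepDemo {
    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        ToyObject a = heap.allocate("a");
        ToyObject b = heap.allocate("b");
        ToyObject c = heap.allocate("c");
        ToyObject d = heap.allocate("d");
        a.refs.add(b);      // a -> b, reachable from the root
        c.refs.add(d);      // c and d refer to each other
        d.refs.add(c);      // but nothing reachable refers to them
        heap.roots.add(a);
        System.out.println("reclaimed: " + heap.collect()); // reclaimed: 2
        System.out.println("live: " + heap.objects.size()); // live: 2
    }
}
```

Because reachability is decided from the roots, the mutually referencing pair `c` and `d` is reclaimed even though each still holds a reference to the other.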
4 – Copying Collectors:
In this case, the heap is divided into two equally sized semi-spaces: one holding the active data and one unused. Once the active space fills up, the live objects are copied from the active space to the unused space and the roles are flipped, so the previously unused space becomes the active one. The advantage is that the collector examines only live data. The drawback is the overhead of copying that data from one space to the other.
5 – Heap Compaction:
In a copying collector, the set of live objects can be compacted at the bottom of the heap. This improves locality of reference, eliminates heap fragmentation, and greatly reduces the cost of object allocation. There is no need to maintain free lists or look-aside lists, or to run best-fit or first-fit algorithms: allocating N bytes is as simple as adding N to the heap pointer.
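The "add N to the heap pointer" step can be sketched as follows. `BumpAllocator` is a hypothetical name for this illustration, and addresses are simulated as plain integers rather than real memory.

```java
// Toy bump-pointer allocator over a simulated, already compacted heap.
class BumpAllocator {
    private final int capacity;
    private int top = 0; // the heap pointer: address of the next free byte

    BumpAllocator(int capacity) { this.capacity = capacity; }

    // Returns the start address of the new block, or -1 when the space is
    // full (the point at which a real collector would run).
    int allocate(int nBytes) {
        if (nBytes <= 0) throw new IllegalArgumentException("nBytes must be positive");
        if (nBytes > capacity - top) return -1;
        int address = top;
        top += nBytes; // allocation is just a pointer increment
        return address;
    }

    int used() { return top; }
}

class BumpAllocatorDemo {
    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(32);
        System.out.println(heap.allocate(16)); // 0
        System.out.println(heap.allocate(8));  // 16
        System.out.println(heap.allocate(64)); // -1, not enough room
    }
}
```

No search through free blocks is needed, which is why allocation in a compacted heap is so cheap.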
6 – Mark-compact collectors:
The copying algorithm has excellent performance characteristics, but it has the drawback of requiring twice as much memory as a mark-sweep collector. The mark-compact algorithm combines mark-sweep and copying in a way that avoids this problem, at the cost of some increased collection complexity. Like mark-sweep, mark-compact is a two-phase process, where each live object is visited and marked in the marking phase. Then, marked objects are copied such that all the live objects are compacted at the bottom of the heap. If a complete compaction is performed at every collection, the resulting heap is similar to the result of a copying collector -- there is a clear demarcation between the active portion of the heap and the free area, so that allocation costs are comparable to a copying collector. Long-lived objects tend to accumulate at the bottom of the heap, so they are not copied repeatedly as they are in a copying collector.
The JDK uses all of these algorithms in some sense. Early JDKs used mark-sweep and mark-compact, while version 1.2 and later employ a hybrid called the generational approach. Here the heap is divided into multiple generations. Objects are created in the young generation, and those that meet certain criteria are promoted to an older generation. A different collection strategy can be used for each generation.
By default, the 1.4.1 JDK divides the heap into two sections, a young generation and an old generation. (Actually, there is also a third section, the permanent space, which is used for storing loaded class and method objects.) The young generation is divided into a creation space, often called Eden, and two survivor semi-spaces, using a copying collector.
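The generations described above can be inspected from a running program. The sketch below uses the standard `java.lang.management` API, which assumes Java 5 or later (so it is not available on the 1.4.x JDKs discussed here). `MemoryPoolLister` is a hypothetical class name, and the pool names printed depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolLister {
    // Names of the pools that make up the Java heap on this JVM.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + " [" + pool.getType() + "] used=" + usedKb + " KB");
        }
    }
}
```

On many HotSpot configurations the output includes separate pools for Eden, the survivor space, and the old generation, but the exact names are not guaranteed.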
Reasons for OutOfMemoryError errors
1. You are out of memory. Add more to your heap.
2. You are out of memory. The code is holding on to object references, so the GC cannot reclaim them. Use a profiler to debug this code.
3. You ran out of file descriptors. This can happen if the threshold is too low.
4. You have too many threads running. Some operating systems limit the number of threads a process may run. Refer to your OS documentation to raise this threshold.
5. If you have a lot of servlets or JSPs, you may need to increase your permanent generation. By default it is 64M. Quadrupling it with -XX:MaxPermSize=256m can be a good start.
6. Your OS limits the amount of memory your process may take.
7. The JVM has a bug. This has been known to happen with JVM 1.2 when using EJBs with another servlet engine.
8. On your platform, run java -X to list the non-standard JVM options. These may be very helpful.
Garbage Collection Tips and Memory Leaks in Coding Context
1 Small Objects:
Small objects are cheap to allocate, while large objects are allocated directly in the old generation, take longer to initialize and may cause fragmentation. It is generally better to allocate small immutable objects. Mutable objects will make your code more obscure at best, or fragment memory and confuse the GC at worst.
2 Non-Uniform Memory Access:
Keep your objects confined to a single thread as much as possible. This improves memory access performance. The basic idea of Non-Uniform Memory Access (NUMA) is to increase performance by allowing each processor to work with its own memory space.
3 Object Pools:
Allocating most objects is fast, so there is usually no need to pool them. Pools create problems of their own: an unused pooled object occupies memory for no reason, and fetching an object from a pool requires synchronization, which is slow. Pools remain worthwhile only for objects that are expensive to create and initialize, such as connections.
4 Finalizable Object:
When a finalizable object is allocated, it is marked as such. When the application has no more references to it, the GC enqueues it in the object finalization queue. The JVM has a thread dedicated to removing elements from this queue and calling the finalize method on them. To preserve the integrity of the object's data, however, the GC does not reclaim it and traverses its tree as if it were a live object. Only after the object's finalize method has been called can the object and the references it contains be reclaimed.
While the GC does a great job of removing unreachable objects, it does not protect against memory leaks caused by sloppy code that keeps references to objects no longer in use. The following list contains the common troublemakers and some solutions:
- Objects defined in a higher scope than necessary stay alive longer than expected. Always define objects in the lowest scope possible.
- Listeners on Observable objects that are not removed after their task is done stay alive, receive events, and consume processor and memory resources for no reason. Always make sure listeners are removed from their Observable class when they are no longer needed.
- Always use a finally clause when removing references to listeners or other objects from long-lived collections.
- Instances of an inner class hold a reference to their outer class instance. Be aware of this behavior, and if the inner class does not use the outer instance, declare it static.
- Objects kept in a Map should usually be removed from the Map when they are no longer needed, a step that is often forgotten. WeakHashMap holds its keys as weak references and should be considered for such metadata.
- The finalize() method can be extremely slow and delay the reclaiming of memory, or worse, resurrect the object being finalized.
- If Collection objects are used to store user-defined objects, set the reference to the collection (the root object) to null once you have finished with it. This makes all of that memory available for garbage collection.
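Two of the points above, removing listeners in a finally clause and using WeakHashMap for metadata, can be sketched as follows. This is a minimal illustration assuming Java 8 or later; `EventSource`, `LeakAvoidanceDemo`, and `runWithListener` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// A long-lived event source that holds strong references to its listeners.
class EventSource {
    private final List<Runnable> listeners = new ArrayList<>();

    void addListener(Runnable l) { listeners.add(l); }

    void removeListener(Runnable l) { listeners.remove(l); }

    int listenerCount() { return listeners.size(); }

    void fire() {
        // Iterate over a copy so listeners may unregister while being notified.
        for (Runnable l : new ArrayList<>(listeners)) l.run();
    }
}

class LeakAvoidanceDemo {
    // Registers a listener only for the duration of the task. The finally
    // clause guarantees removal even if the task throws.
    static int runWithListener(EventSource source, Runnable task) {
        int[] events = {0};
        Runnable listener = () -> events[0]++;
        source.addListener(listener);
        try {
            task.run();
        } finally {
            source.removeListener(listener);
        }
        return events[0];
    }

    public static void main(String[] args) {
        EventSource source = new EventSource();
        int seen = runWithListener(source, source::fire);
        System.out.println(seen + " event(s), " + source.listenerCount() + " listener(s) left");

        // Metadata keyed by an object: the entry does not keep the key alive.
        Map<Object, String> metadata = new WeakHashMap<>();
        Object key = new Object();
        metadata.put(key, "per-object metadata");
        System.out.println(metadata.containsKey(key)); // true while key is strongly held
        key = null; // the entry may now be dropped; when depends on the GC
    }
}
```

Note that a WeakHashMap entry disappears only after a collection has actually cleared the key, so code should not rely on the exact moment of removal.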
Tomcat Server Settings for Heap Usage
The heap size used by the Tomcat server can be changed by setting the environment variable CATALINA_OPTS in startup.sh. The environment variables below set the minimum and maximum heap limits for Tomcat and the JVM.
1 – CATALINA_OPTS
This variable sets the minimum and maximum heap memory that Tomcat uses. It is set in startup.sh on Linux and in startup.bat on Windows. The syntax for both is shown below.
For Windows:
set CATALINA_OPTS=-Xms256m -Xmx1024m
For UNIX:
export CATALINA_OPTS="-Xms256m -Xmx1024m"
2 – JAVA_OPTS
This variable sets the minimum and maximum heap memory that the JVM uses. It is also set in startup.sh on Linux and in startup.bat on Windows. The syntax for both environments is shown below.
For Windows:
set JAVA_OPTS=-Xms256m -Xmx1024m
For UNIX:
export JAVA_OPTS="-Xms256m -Xmx1024m"
In both of these settings, -Xms is the minimum (initial) heap size that Tomcat or the JVM will use, and -Xmx is the maximum heap size. These parameters also influence how often garbage collection runs, so the minimum and maximum should be chosen with that in mind. If the maximum is set too high, each collection may take longer because the GC has more heap to examine for unreachable objects, which can again cause memory issues. Be cautious when setting both limits.
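To confirm that the -Xms and -Xmx values actually reached the JVM, the limits can be read back at run time. The sketch below uses the standard `Runtime` API; `HeapSettingsCheck` and `toMegabytes` are hypothetical names, and the reported maximum is typically close to, but not always exactly, the -Xmx value.

```java
class HeapSettingsCheck {
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the upper bound the JVM will attempt to use.
        System.out.println("max heap:       " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory() is the heap currently reserved by the JVM.
        System.out.println("committed heap: " + toMegabytes(rt.totalMemory()) + " MB");
        // freeMemory() is the unused part of the committed heap.
        System.out.println("free heap:      " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

Running it as `java -Xms256m -Xmx1024m HeapSettingsCheck` should show a maximum near 1024 MB.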
Memory Profiling Tools
Various tools are available for profiling the memory used by Java programs. Heap profiling provides information about the memory allocation footprint of the application. These tools support the following tasks:
- Observe garbage collection cycles
- Observe memory utilization
- Observe CPU utilization
- Check object references
Tools and utilities such as jmap, jhat, the NetBeans profiler and JProbe can be used for these purposes. They can also read heap dump files and present the contents visually. JProbe can perform all of the tasks above and provides many charts that help developers analyze memory-related issues in Java code. Below is a graph that may be seen in JProbe.
The JProbe Memory Debugger allows developers to observe and record how an application is using memory as it runs. This, as with the Profiler, was surprisingly fast considering the overhead that is surely involved. A graph records memory usage (not unlike the Performance Monitor in Windows) at regular, user-selectable intervals. Additionally, the Memory Leak Doctor allows developers to take a more granular look at what is going on inside the application and helps to identify the key causes.
By following the tips given above, developers can avoid many memory-related issues. They can also use the available profiling tools, which help to identify memory leaks and improve performance.
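As a lightweight complement to the external tools above, garbage collection cycles can also be observed from inside the program. The sketch below uses the standard `java.lang.management` API and assumes Java 5 or later; `GcStats` is a hypothetical class name, and the collector names printed depend on the JVM and its settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Sum of collections reported by all collectors so far.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the count is undefined
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```

Logging these numbers at intervals gives a rough picture of collection frequency and cost, though it is no substitute for a full profiler.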