Mar 19, 2010

Hibernate - Performance Tuning

Overview

This article gives a brief introduction to Hibernate and an overview of how to tune its components, along with the strategies commonly used to tune applications that use Hibernate.

Hibernate is an object/relational mapping tool for Java environments. The term object/relational mapping (ORM) refers to the technique of mapping a data representation from an object model to a relational data model with a SQL-based schema.

Hibernate not only takes care of the mapping from Java classes to database tables (and from Java data types to SQL data types), but also provides data query and retrieval facilities and can significantly reduce development time otherwise spent with manual data handling in SQL and JDBC.

Hibernate offers various fetching strategies and caching mechanisms for tuning an application, as well as several ways of tuning the Java collections used with Hibernate. It also exposes metrics/statistics that can be used for performance tuning.

The following sections describe a few strategies for tuning Hibernate to match the application's needs.

  • Fetching Strategies
  • Caching
  • Lazy Fetching
  • N+1 query problem
  • Collection Frameworks
  • Monitoring performance

Fetching Strategies

A fetching strategy is the strategy Hibernate will use for retrieving associated objects if the application needs to navigate the association. Fetch strategies may be declared in the O/R mapping metadata, or overridden by a particular HQL or Criteria query.

Hibernate3 defines the following fetching strategies:

  • Join fetching - Hibernate retrieves the associated instance or collection in the same SELECT, using an OUTER JOIN.
  • Select fetching - a second SELECT is used to retrieve the associated entity or collection. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you actually access the association.
  • Subselect fetching - a second SELECT is used to retrieve the associated collections for all entities retrieved in a previous query or fetch. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you actually access the association.
  • Batch fetching - an optimization strategy for select fetching - Hibernate retrieves a batch of entity instances or collections in a single SELECT, by specifying a list of primary keys or foreign keys.

Hibernate also distinguishes between:

  • Immediate fetching - an association, collection or attribute is fetched immediately, when the owner is loaded.
  • Lazy collection fetching - a collection is fetched when the application invokes an operation upon that collection. (This is the default for collections.)
  • "Extra-lazy" collection fetching - individual elements of the collection are accessed from the database as needed. Hibernate tries not to fetch the whole collection into memory unless absolutely needed (suitable for very large collections)
  • Proxy fetching - a single-valued association is fetched when a method other than the identifier getter is invoked upon the associated object.
  • "No-proxy" fetching - a single-valued association is fetched when the instance variable is accessed. Compared to proxy fetching, this approach is less lazy (the association is fetched even when only the identifier is accessed) but more transparent, since no proxy is visible to the application. This approach requires buildtime bytecode instrumentation and is rarely necessary.
  • Lazy attribute fetching - an attribute or single valued association is fetched when the instance variable is accessed. This approach requires buildtime bytecode instrumentation and is rarely necessary.

We have two orthogonal notions here: when is the association fetched, and how is it fetched (what SQL is used). Don't confuse them! We use fetch to tune performance. We may use lazy to define a contract for what data is always available in any detached instance of a particular class.
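For example, a fetch strategy declared in the mapping can be overridden for a single query. The sketch below uses the Item/bids/seller names that appear later in this article, so treat them as illustrative; it eagerly join-fetches the bids collection for one query only, in both HQL and Criteria form:

// HQL: override the mapped (lazy) strategy with an eager join fetch for this query only.
List items = session.createQuery(
        "from Item item left join fetch item.bids where item.seller = :user")
    .setEntity("user", user)
    .list();

// Criteria equivalent: request an SQL outer join for the bids collection.
List sameItems = session.createCriteria(Item.class)
    .add( Expression.eq("seller", user) )
    .setFetchMode("bids", FetchMode.JOIN)
    .list();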

Solving the n+1 selects problem

The biggest performance killer in applications that persist objects to SQL databases is the n+1 selects problem. When you tune the performance of a Hibernate application, this problem is usually the first thing you'll need to address. It's normal (and recommended) to map almost all associations for lazy initialization.

This means you generally set all collections to lazy="true" and even change some of the one-to-one and many-to-one associations to not use outer joins by default. This is the only way to avoid retrieving all objects in the database in every transaction. Unfortunately, this decision exposes you to the n+1 selects problem.

It’s easy to understand this problem by considering a simple query that retrieves all Items for a particular user:

Iterator items = session.createCriteria(Item.class)
    .add( Expression.eq("item.seller", user) )
    .list()
    .iterator();



This query returns a list of items, where each collection of bids is an uninitialized collection wrapper. Suppose that we now wish to find the maximum bid for each item. The following code would be one way to do this:




List maxAmounts = new ArrayList();
while ( items.hasNext() ) {
    Item item = (Item) items.next();
    BigDecimal maxAmount = new BigDecimal("0");
    for ( Iterator b = item.getBids().iterator(); b.hasNext(); ) {
        Bid bid = (Bid) b.next();
        if ( bid.getAmount().compareTo(maxAmount) > 0 )
            maxAmount = bid.getAmount();
    }
    maxAmounts.add( new MaxAmount( item.getId(), maxAmount ) );
}



But there is a huge problem with this solution (aside from the fact that this would be much better executed in the database using aggregation functions):



Each time we access the collection of bids, Hibernate must fetch this lazy collection from the database for each item. If the initial query returns 20 items, the entire transaction requires 1 initial select that retrieves the items plus 20 additional selects to load the bids collections of each item. This might easily result in unacceptable latency in a system that accesses the database across a network. Usually you don't explicitly create such operations, because you should quickly see doing so is suboptimal.



However, the n+1 selects problem is often hidden in more complex application logic, and you may not recognize it by looking at a single routine.



Batch fetching



The first attempt to solve this problem might be to enable batch fetching. We change our mapping for the bids collection to look like this:




<set name="bids" lazy="true" inverse="true" batch-size="10"> 



With batch fetching enabled, Hibernate pre-fetches the next 10 collections when the first collection is accessed. This reduces the problem from n+1 selects to n/10 + 1 selects. For many applications, this may be sufficient to achieve acceptable latency. On the other hand, it also means that in some other transactions, collections are fetched unnecessarily. It isn't the best we can do in terms of reducing the number of round trips to the database.



Batch fetching of collections is particularly useful if you have a nested tree of items, i.e. the typical bill-of-materials pattern.



Eager fetching



We can try enabling eager fetching at the level of the mapping document:




<set name="bids" inverse="true" outer-join="true"> 



The outer-join attribute is available for collections and other associations. It forces Hibernate to load the association eagerly, using an SQL outer join. Note that, as previously mentioned, HQL queries ignore the outer-join attribute; but we might be using a criteria query.

This mapping avoids the problem as far as this transaction is concerned; we're now able to load all bids in the initial select. Unfortunately, any other transaction that retrieves items using get(), load(), or a criteria query will also retrieve all the bids at once. Retrieving unnecessary data imposes extra load on both the database server and the application server and may also reduce the concurrency of the system, creating too many unnecessary read locks at the database level.

Hence we consider eager fetching at the level of the mapping file to be almost always a bad approach. The outer-join attribute of collection mappings is arguably a misfeature of Hibernate (fortunately, it's disabled by default). Occasionally it makes sense to enable outer-join for a <many-to-one> or <one-to-one> association (the default is auto), but we'd never do this in the case of a collection.



HQL aggregation



A much, much better solution is to take advantage of HQL aggregation and perform the work of calculating the maximum bid on the database. Thus we avoid the problem:




String query = "select MaxAmount( item.id, max(bid.amount) )" + " from Item item join item.bids bid" + " where item.seller = :user group by item.id"; 

List maxAmounts = session.createQuery(query).setEntity("user", user).list();



Unfortunately, this isn’t a complete solution to the generic issue. In general, we may need to do more complex processing on the bids than merely calculating the maximum amount. We’d prefer to do this processing in the Java application.



Hibernate ‘on-delete=cascade’



Be careful when you use Hibernate's support for the database ON DELETE CASCADE constraint. If it is not configured properly, deletes in your application might be more costly than you think.



Let us take an example of setting on-delete="cascade":




<set name="children" inverse="true" cascade="all"> 

<key name="PARENT_ID" on-delete="cascade">

<one-to-many class="Child">

<set>



For a parent object associated with N child objects (cascade="all"), the setting on-delete="cascade" avoids issuing N deletes to the database if the parent is deleted. This setting only optimizes away the delete statements; all other semantics are preserved.
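As a quick sketch (the Parent/Child names and the parentId variable are illustrative, not taken from a real mapping), deleting such a parent then looks like this; the database's ON DELETE CASCADE constraint removes the child rows, so Hibernate does not issue one DELETE per child:

// With on-delete="cascade" on the key, only the parent DELETE is sent by Hibernate;
// the child rows are removed by the database constraint.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
Parent parent = (Parent) session.get(Parent.class, parentId);
session.delete(parent);
tx.commit();
session.close();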



The following is the procedure Hibernate performs for a cascade delete in the simplest scenario:




  1. Mark the entity for deletion.
  2. Iterate through all of the entity's associations that have cascade delete (single-valued or many-valued).
  3. If the association is single-valued, perform delete on the associated entity (by starting at step 1 for it).
  4. If the association is many-valued, perform delete on the collection by:
       • iterating all the collection elements, loading them from the database if necessary;
       • for each collection element, performing delete on the element (by starting at step 1 for it).
  5. If on-delete="cascade" is not set, issue a delete statement for this entity.



When you have a to-many association with cascade="all" as described in our example, Hibernate iterates the associated collection of size N, loading the elements from the database if necessary, and marks each collection element for deletion. Before actually issuing delete statements for these elements, Hibernate searches for any other cascades configured on them and performs those cascading actions accordingly.



Even with the setting on-delete=”cascade” on a specific lazy collection, cascade=”all” causes Hibernate to initialize and iterate the collection (because it wants to check if the elements in this collection also cascade delete to other associations). Therefore, when you are sure there are no other cascading actions configured on your child entity (the element in the on-delete=”cascade” collection), use the setting cascade=”save-update” instead of cascade=”all” to prevent Hibernate from performing the delete cascade checks and consequently avoid loading your lazy to-many association into memory.



If you have multiple layers of delete cascade, from parent to children and from child to grand children, consider the size of your collections and decide if you want to set up multiple layers of ON DELETE CASCADE constraints in database accordingly or preserve the configuration of cascade=”all” and on-delete=”cascade” to let Hibernate handle the delete cascade on the grand children.



Lazy fetching



By default, Hibernate3 uses lazy select fetching for collections and lazy proxy fetching for single-valued associations. These defaults make sense for almost all associations in almost all applications. However, lazy fetching poses one problem that you must be aware of. Access to a lazy association outside of the context of an open Hibernate session will result in an exception.



For example:




Session s = sessions.openSession();
Transaction tx = s.beginTransaction();

User u = (User) s.createQuery("from User u where u.name = :userName")
    .setString("userName", userName)
    .uniqueResult();
Map permissions = u.getPermissions();

tx.commit();
s.close();

Integer accessLevel = (Integer) permissions.get("accounts"); // Error!



Since the lazy collection was not initialized when the Session was closed, the collection will not be able to load its state. Hibernate does not support lazy initialization for detached objects. The fix is to move the code that reads from the collection to just before the transaction is committed. Alternatively, we could use a non-lazy collection or association, by specifying lazy="false" for the association mapping. However, it is intended that lazy initialization be used for almost all collections and associations. If you define too many non-lazy associations in your object model, Hibernate will end up needing to fetch the entire database into memory in every transaction!
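For instance, a corrected version of the example above simply reads the lazy collection while the Session is still open:

Session s = sessions.openSession();
Transaction tx = s.beginTransaction();

User u = (User) s.createQuery("from User u where u.name = :userName")
    .setString("userName", userName)
    .uniqueResult();
Map permissions = u.getPermissions();
Integer accessLevel = (Integer) permissions.get("accounts"); // OK: the Session is still open

tx.commit();
s.close();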



Initializing collections and proxies



A LazyInitializationException will be thrown by Hibernate if an uninitialized collection or proxy is accessed outside of the scope of the Session, i.e. when the entity owning the collection or holding the reference to the proxy is in the detached state.



Sometimes we need to ensure that a proxy or collection is initialized before closing the Session. Of course, we can always force initialization by calling item.getBids() or item.getBids().size(), for example. But that is confusing to readers of the code and is not convenient for generic code. The static methods Hibernate.initialize() and Hibernate.isInitialized() provide the application with a convenient way of working with lazily initialized collections or proxies. Hibernate.initialize(item) will force the initialization of a proxy, item, as long as its Session is still open. Hibernate.initialize(item.getBids()) has a similar effect for the collection of bids.
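A short sketch (itemId is an illustrative identifier value) of forcing initialization before the Session is closed:

Session s = sessions.openSession();
Transaction tx = s.beginTransaction();

Item item = (Item) s.load(Item.class, itemId); // returns an uninitialized proxy
Hibernate.initialize(item);                    // initialize the proxy itself
Hibernate.initialize(item.getBids());          // initialize the lazy bids collection

tx.commit();
s.close();
// item and item.getBids() can now be used safely in the detached state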



Another option is to keep the Session open until all needed collections and proxies have been loaded. In some application architectures, particularly where the code that accesses data using Hibernate and the code that uses that data live in different application layers or different physical processes, it can be a problem to ensure that the Session is open when a collection is initialized.



There are two basic ways to deal with this issue:




  • In a web-based application, a servlet filter can be used to close the Session only at the very end of a user request, once the rendering of the view is complete (the Open Session in View pattern; a minimal filter sketch follows this list). Of course, this places heavy demands on the correctness of the exception handling of your application infrastructure. It is vitally important that the Session is closed and the transaction ended before returning to the user, even when an exception occurs during rendering of the view.




  • In an application with a separate business tier, the business logic must "prepare" all collections that will be needed by the web tier before returning. This means that the business tier should load all the data and return all the data already initialized to the presentation/web tier that is required for a particular use case. Usually, the application calls Hibernate.initialize() for each collection that will be needed in the web tier (this call must occur before the session is closed) or retrieves the collection eagerly using a Hibernate query with a FETCH clause or a FetchMode.JOIN in Criteria. This is usually easier if you adopt the Command pattern instead of a Session Facade.




  • You may also attach a previously loaded object to a new Session with merge() or lock() before accessing uninitialized collections (or other proxies). Hibernate does not, and certainly should not, do this automatically, since it would introduce ad hoc transaction semantics.
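Returning to the first option above, the following is a minimal sketch of an Open Session in View servlet filter. It assumes a HibernateUtil helper that exposes the SessionFactory and binds the current Session to the request thread; those helper names are illustrative, not part of Hibernate:

import java.io.IOException;
import javax.servlet.*;
import org.hibernate.Session;
import org.hibernate.Transaction;

public class OpenSessionInViewFilter implements Filter {

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        Session session = HibernateUtil.getSessionFactory().openSession();
        HibernateUtil.bindToCurrentThread(session);  // assumed ThreadLocal helper
        Transaction tx = session.beginTransaction();
        try {
            chain.doFilter(request, response);       // the view is rendered while the Session is open
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();                           // end the transaction even when rendering fails
            throw e;
        } finally {
            HibernateUtil.unbindFromCurrentThread(); // assumed ThreadLocal helper
            session.close();                         // close only after the view has been rendered
        }
    }

    public void init(FilterConfig config) {}
    public void destroy() {}
}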



The Second Level Cache



A Hibernate Session is a transaction-level cache of persistent data. It is possible to configure a cluster-level or JVM-level (SessionFactory-level) cache on a class-by-class and collection-by-collection basis. You may even plug in a clustered cache. Be careful: caches are never aware of changes made to the persistent store by another application (though they may be configured to regularly expire cached data).



You have the option to tell Hibernate which caching implementation to use by specifying the name of a class that implements org.hibernate.cache.CacheProvider using the property hibernate.cache.provider_class. Hibernate comes bundled with a number of built-in integrations with open-source cache providers (listed below); additionally, you could implement your own and plug it in as outlined above.
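For example, to select the EHCache provider programmatically (assuming EHCache is on the classpath; the same property can equally be set in hibernate.cfg.xml or hibernate.properties):

// Select a second-level cache provider before building the SessionFactory.
Configuration cfg = new Configuration()
    .configure() // reads hibernate.cfg.xml
    .setProperty("hibernate.cache.provider_class", "org.hibernate.cache.EhCacheProvider");
SessionFactory sessionFactory = cfg.buildSessionFactory();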



Cache Providers

  • Hashtable (not intended for production use): org.hibernate.cache.HashtableCacheProvider; type: memory; query cache supported: yes
  • EHCache: org.hibernate.cache.EhCacheProvider; type: memory, disk; query cache supported: yes
  • OSCache: org.hibernate.cache.OSCacheProvider; type: memory, disk; query cache supported: yes
  • SwarmCache: org.hibernate.cache.SwarmCacheProvider; type: clustered (ip multicast); cluster safe: yes (clustered invalidation)
  • JBoss TreeCache: org.hibernate.cache.TreeCacheProvider; type: clustered (ip multicast), transactional; cluster safe: yes (replication); query cache supported: yes (clock sync required)



Cache mappings



The <cache> element of a class or collection mapping has the following form:




<cache 
usage="transactional|read-write|nonstrict-read-write|read-only" (1)

region="RegionName" (2)

include="all|non-lazy" (3)

/>



(1) Usage (required) specifies the caching strategy: transactional, read-write, nonstrict-read-write or read-only



(2) Region (optional, defaults to the class or collection role name) specifies the name of the second level cache region



(3) include (optional, defaults to all): non-lazy specifies that properties of an entity mapped with lazy="true" cannot be cached when attribute-level lazy fetching is enabled.



Alternatively (preferably?), you may specify <class-cache> and <collection-cache> elements in hibernate.cfg.xml.
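A programmatic alternative is to declare the strategies on the Configuration before building the SessionFactory; a sketch, with illustrative entity and collection-role names:

// Equivalent to <class-cache> and <collection-cache> entries in hibernate.cfg.xml.
Configuration cfg = new Configuration().configure();
cfg.setCacheConcurrencyStrategy("org.example.auction.Item", "read-write");
cfg.setCollectionCacheConcurrencyStrategy("org.example.auction.Item.bids", "read-write");
SessionFactory sessionFactory = cfg.buildSessionFactory();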




  • Strategy: read only



If your application needs to read but never modify instances of a persistent class, a read-only cache may be used. This is the simplest and best performing strategy. It's even perfectly safe for use in a cluster.




  • Strategy: read/write



If the application needs to update data, a read-write cache might be appropriate. This cache strategy should never be used if serializable transaction isolation level is required. If the cache is used in a JTA environment, you must specify the property hibernate.transaction.manager_lookup_class, naming a strategy for obtaining the JTA TransactionManager. In other environments, you should ensure that the transaction is completed when Session.close() or Session.disconnect() is called. If you wish to use this strategy in a cluster, you should ensure that the underlying cache implementation supports locking. The built-in cache providers do not.




  • Strategy: nonstrict read/write



If the application only occasionally needs to update data (i.e. if it is extremely unlikely that two transactions would try to update the same item simultaneously) and strict transaction isolation is not required, a nonstrict-read-write cache might be appropriate. If the cache is used in a JTA environment, you must specify hibernate.transaction.manager_lookup_class. In other environments, you should ensure that the transaction is completed when Session.close() or Session.disconnect() is called.




  • Strategy: transactional



The transactional cache strategy provides support for fully transactional cache providers such as JBoss TreeCache. Such a cache may only be used in a JTA environment and you must specify hibernate.transaction.manager_lookup_class.



Cache Concurrency Strategy Support

  • Hashtable (not intended for production use): read-only, nonstrict-read-write, read-write
  • EHCache: read-only, nonstrict-read-write, read-write
  • OSCache: read-only, nonstrict-read-write, read-write
  • SwarmCache: read-only, nonstrict-read-write
  • JBoss TreeCache: read-only, transactional



Managing the caches



Whenever you pass an object to save(), update() or saveOrUpdate(), and whenever you retrieve an object using load(), get(), list(), iterate() or scroll(), that object is added to the internal cache of the Session. When flush() is subsequently called, the state of that object will be synchronized with the database. If you do not want this synchronization to occur, or if you are processing a huge number of objects and need to manage memory efficiently, the evict() method may be used to remove the object and its collections from the first-level cache.




ScrollableResults items = sess.createQuery("from Item as item").scroll(); // a huge result set
while ( items.next() ) {
    Item item = (Item) items.get(0);
    doSomethingWithAItem(item);
    sess.evict(item);
}



The Session also provides a contains() method to determine if an instance belongs to the session cache. To completely evict all objects from the session cache, call Session.clear(). For the second-level cache, there are methods defined on SessionFactory for evicting the cached state of an instance, an entire class, a collection instance or an entire collection role.




sessionFactory.evict(Item.class, itemId);            // evict a particular Item
sessionFactory.evict(Item.class);                     // evict all Items
sessionFactory.evictCollection("Item.bids", itemId);  // evict a particular collection of bids
sessionFactory.evictCollection("Item.bids");          // evict all bid collections



The CacheMode controls how a particular session interacts with the second-level cache.



  • CacheMode.NORMAL - reads items from and writes items to the second-level cache.
  • CacheMode.GET - reads items from the second-level cache, but does not write to it except when updating data.
  • CacheMode.PUT - writes items to the second-level cache, but does not read from it.
  • CacheMode.REFRESH - writes items to the second-level cache, but does not read from it; bypasses the effect of hibernate.cache.use_minimal_puts, forcing a refresh of the second-level cache for all items read from the database.
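The cache mode is set per Session; a short sketch (sessionFactory and the Item entity are as in earlier examples):

// Read from the second-level cache for this Session, but don't add new entries to it.
Session session = sessionFactory.openSession();
session.setCacheMode(CacheMode.GET);
List items = session.createQuery("from Item").list();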



To browse the contents of a second-level or query cache region, use the Statistics API:




Map cacheEntries = sessionFactory.getStatistics()
    .getSecondLevelCacheStatistics(regionName)
    .getEntries();



You'll need to enable statistics and, optionally, force Hibernate to keep the cache entries in a more human-readable format:






hibernate.generate_statistics true



hibernate.cache.use_structured_entries true





The Query Cache



Query result sets may also be cached. This is only useful for queries that are run frequently with the same parameters. To use the query cache you must first enable it:






hibernate.cache.use_query_cache true





This setting causes the creation of two new cache regions - one holding cached query result sets (org.hibernate.cache.StandardQueryCache), the other holding timestamps of the most recent updates to queryable tables (org.hibernate.cache.UpdateTimestampsCache). Note that the query cache does not cache the state of the actual entities in the result set; it caches only identifier values and results of value type. So the query cache should always be used in conjunction with the second-level cache.



Most queries do not benefit from caching, so by default queries are not cached. To enable caching, call Query.setCacheable(true). This call allows the query to look for existing cache results or add its results to the cache when it is executed.
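For example, reusing the Item/Bid names from earlier (the cache region name is illustrative):

// Cache the results of this query; naming a region is optional.
List bids = session.createQuery("from Bid bid where bid.item = :item")
    .setEntity("item", item)
    .setCacheable(true)
    .setCacheRegion("query.bidsByItem") // optional custom cache region
    .list();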



Monitoring performance



Optimization is not much use without monitoring and access to performance numbers. Hibernate provides a full range of figures about its internal operations.



Statistics in Hibernate are available per SessionFactory.




  • Option 1: call sessionFactory.getStatistics() and read or display the Statistics yourself.


  • Option 2: through JMX.






Monitoring a SessionFactory



You can access SessionFactory metrics in two ways. Your first option is to call sessionFactory.getStatistics() and read or display the Statistics yourself.



Hibernate can also use JMX to publish metrics if you enable the StatisticsService MBean. You may enable a single MBean for all your SessionFactory instances, or one per factory. See the following code for minimal configuration examples:




// MBean service registration for a specific SessionFactory
Hashtable tb = new Hashtable();
tb.put("type", "statistics");
tb.put("sessionFactory", "myFinancialApp");
ObjectName on = new ObjectName("hibernate", tb);   // MBean object name
StatisticsService stats = new StatisticsService(); // MBean implementation
stats.setSessionFactory(sessionFactory);           // Bind the stats to a SessionFactory
server.registerMBean(stats, on);                   // Register the MBean on the server

// MBean service registration for all SessionFactorys
Hashtable tb = new Hashtable();
tb.put("type", "statistics");
tb.put("sessionFactory", "all");
ObjectName on = new ObjectName("hibernate", tb);   // MBean object name
StatisticsService stats = new StatisticsService(); // MBean implementation
server.registerMBean(stats, on);                   // Register the MBean on the server



In the first case, we retrieve and use the MBean directly. In the second case, we must give the JNDI name under which the session factory is bound before using it: hibernateStatsBean.setSessionFactoryJNDIName("my/JNDI/Name").



You can (de)activate the monitoring for a SessionFactory:




  • at configuration time: set hibernate.generate_statistics to true or false


  • at runtime: sf.getStatistics().setStatisticsEnabled(true) or hibernateStatsBean.setStatisticsEnabled(true)



Statistics can be reset programmatically using the clear() method. A summary can be sent to a logger (info level) using the logSummary() method.
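A short sketch putting these together (assuming a sessionFactory reference):

// Enable statistics at runtime, log a summary at info level, then reset the counters.
Statistics stats = sessionFactory.getStatistics();
stats.setStatisticsEnabled(true);
// ... run some work ...
stats.logSummary();
stats.clear();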



Metrics



Hibernate provides a number of metrics, from very basic to the specialized information only relevant in certain scenarios. All available counters are described in the Statistics interface API, in three categories:




  • Metrics related to the general Session usage, such as number of open sessions, retrieved JDBC connections, etc.


  • Metrics related to the entities, collections, queries, and caches as a whole (aka global metrics),


  • Detailed metrics related to a particular entity, collection, query or cache region.



For example, you can check the cache hit, miss, and put ratio of entities, collections and queries, and the average time a query needs. Be aware that the number of milliseconds is subject to approximation in Java: Hibernate is tied to the JVM's precision, and on some platforms this might even only be accurate to 10 seconds.



Simple getters are used to access the global metrics (i.e. not tied to a particular entity, collection, cache region, etc.). You can access the metrics of a particular entity, collection or cache region through its name, and through its HQL or SQL representation for queries. Please refer to the Statistics, EntityStatistics, CollectionStatistics, SecondLevelCacheStatistics, and QueryStatistics API Javadoc for more information. The following code shows a simple example:




Statistics stats = HibernateUtil.sessionFactory.getStatistics();
double queryCacheHitCount = stats.getQueryCacheHitCount();
double queryCacheMissCount = stats.getQueryCacheMissCount();
double queryCacheHitRatio =
    queryCacheHitCount / (queryCacheHitCount + queryCacheMissCount);
log.info("Query Hit ratio: " + queryCacheHitRatio);

EntityStatistics entityStats =
    stats.getEntityStatistics( Cat.class.getName() );
long changes = entityStats.getInsertCount()
    + entityStats.getUpdateCount()
    + entityStats.getDeleteCount();
log.info(Cat.class.getName() + " changed " + changes + " times");



To work on all entities, collections, queries and region caches, you can retrieve the list of names of entities, collections, queries and region caches with the following methods: getQueries(), getEntityNames(), getCollectionRoleNames(), and getSecondLevelCacheRegionNames().
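For example, a small sketch (assuming sessionFactory and log references as in the example above) that walks all known entities and reports how often each was loaded:

Statistics stats = sessionFactory.getStatistics();
for (String entityName : stats.getEntityNames()) {
    EntityStatistics entityStats = stats.getEntityStatistics(entityName);
    log.info(entityName + " loaded " + entityStats.getLoadCount() + " times");
}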



Conclusion



The above are only some of the ways in which a Hibernate application can be tuned. Bear in mind, however, that a poorly designed, poorly written application will usually have poor performance regardless of tuning. Performance must always be a key consideration throughout the stages of the application development life cycle, from design to deployment. It happens too often that performance takes a back seat to functionality, and problems are found later that are difficult to fix.

