Dec 3, 2009

Performance Optimization for Large I/O Operations

This article discusses the following topics:

  • Issues with exporting and importing huge files
  • Our approaches to resolving these issues
  • Advantages of the NIO package
  • Performance issues during serialization
  • Progressive rendering
  • Case studies with solutions

Issues while exporting or importing huge files

Issues faced during export or import of large data/files:

  • It takes a long time to export or import a huge data set or file.
  • OutOfMemoryError occurs during the file I/O operation or while copying to a location.
  • In some instances the application hangs. Hard disks are very good at reading and writing sizable chunks of data, but they are much less efficient when working with small blocks of data, so byte-at-a-time processing can stall the application.
  • Transaction timeouts.

Our Approach: Key Points

The basic rules for speeding up I/O performance are:

  • To maximize I/O performance, batch read and write operations.
  • Use buffered stream operations to achieve batched reads and writes, or use the Java NIO package.
  • Minimize accesses to the hard disk.
  • Minimize processing bytes and characters individually.
  • Minimize accesses to the underlying operating system.
  • A better understanding of the java.io and java.nio packages can lead to major improvements in I/O performance.

Let us look at some of the techniques to improve I/O performance:

  • A common problem is reading a large chunk of a file from disk and then processing it a byte or character at a time.
  • Use buffering to minimize accesses to the disk and to the underlying operating system, as shown below.

Without buffering: inefficient code

try
{
    File f = new File("myFile.txt");
    FileInputStream fis = new FileInputStream(f);
    int count = 0;
    int b;
    // Every read() call goes down to the operating system for one byte.
    while ((b = fis.read()) != -1)
    {
        if (b == '\n')
        {
            count++;
        }
    }
    // In production code, fis should be closed in a finally block.
    fis.close();
}
catch (IOException io)
{
    io.printStackTrace();
}


Note: fis.read() is a native method call to the underlying system.



With buffering: yields better performance



try
{
    File f = new File("myFile.txt");
    FileInputStream fis = new FileInputStream(f);
    BufferedInputStream bis = new BufferedInputStream(fis);
    int count = 0;
    int b;
    // read() is now served from the in-memory buffer most of the time.
    while ((b = bis.read()) != -1)
    {
        if (b == '\n')
        {
            count++;
        }
    }
    // In production code, bis should be closed in a finally block.
    bis.close();
}
catch (IOException io)
{
    io.printStackTrace();
}


Note: bis.read() takes the next byte from the input buffer and only rarely accesses the underlying operating system.




  • Instead of reading a byte or character at a time, the buffered code above can be improved further by reading one line at a time, as shown below:



FileReader fr = new FileReader(f);
BufferedReader br = new BufferedReader(fr);
int count = 0;
while (br.readLine() != null) count++;
br.close();



  • It is recommended to use a logging framework such as Log4j or Apache Commons Logging, which use buffering, rather than relying on the default behavior of System.out.println(...), for better performance.
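As a rough illustration, here is a minimal Log4j 1.x sketch; the class name and log messages are invented for the example, and BasicConfigurator is used only to keep it self-contained:

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Logger;

public class ExportTask
{
    private static final Logger LOG = Logger.getLogger(ExportTask.class);

    public static void main(String[] args)
    {
        // Quick console setup; a real application would configure appenders
        // (including buffered or asynchronous ones) via log4j.properties.
        BasicConfigurator.configure();
        LOG.info("Export started");
        LOG.debug("Processed 20 records"); // cheap when DEBUG is disabled
    }
}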



Use the NIO package




  • If you are using JDK 1.4 or later, use the NIO package, which offers performance-enhancing features such as buffers to hold data, memory mapping of files, and non-blocking I/O operations.


  • With traditional Java IO, copy code must read the data from the file system up into JVM memory and then push it back down to the file system.


  • Java NIO has the potential to really improve performance in a lot of areas; file copying is just one of them.


  • Provides the ability to monitor multiple I/O operations concurrently, also known as "multiplexing."


  • Multiplexed NIO is a technique that moves all the I/O work into a single thread that watches over many I/O operations executing concurrently; a sketch of that event loop follows.
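The sketch below shows the basic shape of that single-threaded loop; the port number and the accept/read handling are assumptions made for the example, not part of the original article:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class MultiplexedServer
{
    public static void main(String[] args) throws IOException
    {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(9000)); // illustrative port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (true)
        {
            // The single thread blocks here until some channel is ready.
            selector.select();
            Iterator it = selector.selectedKeys().iterator();
            while (it.hasNext())
            {
                SelectionKey key = (SelectionKey) it.next();
                it.remove();
                if (key.isAcceptable())
                {
                    // New connection: make it non-blocking and watch for reads.
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                }
                else if (key.isReadable())
                {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    if (client.read(buffer) == -1)
                    {
                        client.close(); // remote side is done
                    }
                    // Otherwise the bytes in the buffer would be processed here.
                }
            }
        }
    }
}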



NIO Example



Basic file-to-file copy algorithm implemented using NIO:



public static void copyFile(File sourceFile, File destFile) throws IOException
{
    if (!destFile.exists())
    {
        destFile.createNewFile();
    }
    FileChannel source = null;
    FileChannel destination = null;
    try
    {
        source = new FileInputStream(sourceFile).getChannel();
        destination = new FileOutputStream(destFile).getChannel();
        // Transfers the bytes channel-to-channel in a single call.
        destination.transferFrom(source, 0, source.size());
    }
    finally
    {
        if (source != null)
        {
            source.close();
        }
        if (destination != null)
        {
            destination.close();
        }
    }
}






  • Note that there is no reference to the buffering used or the implementation of the actual copy algorithm.


  • This is key to the potential performance advantages of this algorithm.



Serialization Example




  • The example below serializes a class and writes it to an output stream.

  • Serialization is a recursive process: writing one object also writes every object it references.

  • If your class has a JPanel, then all of the Swing UI widgets and any objects they reference are written as well. This is one thing to take care of during serialization.

  • To improve serialization, we need to take care over what is written when an object is serialized.

  • Use the transient keyword to exclude such fields when the class implements Serializable.

public class TestObject implements Serializable
{
    private int value;
    private String name;
    private Date timeStamp;
    private JPanel panel;

    public TestObject(int value)
    {
        this.value = value;
        name = "Object:" + value;
        timeStamp = new Date();
        // The panel drags the whole Swing widget graph into serialization.
        panel = new JPanel();
        panel.add(new JTextField());
        panel.add(new JButton("Help"));
        panel.add(new JLabel("This is a text label"));
    }
}

// Writing objects to a stream (Stopwatch is a simple elapsed-time utility):
Vector vector = new Vector();
for (int i = 0; i < 50; i++)
{
    vector.addElement(new TestObject(i));
}
Stopwatch timer = new Stopwatch().start();
try
{
    OutputStream file = new FileOutputStream("Out.test");
    OutputStream buffer = new BufferedOutputStream(file);
    ObjectOutputStream out = new ObjectOutputStream(buffer);
    out.writeObject(vector);
    out.close();
}
catch (Exception e)
{
    e.printStackTrace();
}
timer.stop();
System.out.println("elapsed = " + timer.getElapsedTime());

// Reading objects from the stream:
Stopwatch readTimer = new Stopwatch().start();
try
{
    InputStream file = new FileInputStream("Out.test");
    InputStream buffer = new BufferedInputStream(file);
    ObjectInputStream in = new ObjectInputStream(buffer);
    vector = (Vector) in.readObject();
    in.close();
}
catch (Exception e)
{
    e.printStackTrace();
}
readTimer.stop();
System.out.println("elapsed = " + readTimer.getElapsedTime());

// Improved serializable object: the transient fields are skipped during
// serialization and rebuilt by initTransients() when the object is read.
public class TestObjectTrans implements Serializable
{
    private int value;
    private transient String name;
    private Date timeStamp;
    private transient JPanel panel;

    public TestObjectTrans(int value)
    {
        this.value = value;
        timeStamp = new Date();
        initTransients();
    }

    public void initTransients()
    {
        name = "Object:" + value;
        panel = new JPanel();
        panel.add(new JTextField());
        panel.add(new JButton("Help"));
        panel.add(new JLabel("This is a text label"));
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException
    {
        in.defaultReadObject();
        initTransients();
    }
}


Case Study 1



Requirement:




  • Export the data to an export sheet when the export data link is clicked.


  • Data is retrieved from a table that has 65 columns and more than 80,000 records.



Solution:




  • To achieve the above requirement, read the records using readLine(), store them in a StringBuffer, and write them to an output stream using NIO.


  • Keep a counter that counts the number of records.


  • When the counter reaches 20, set the StringBuffer length to 0 to clear the memory and flush the output stream.


  • Clearing has the advantage of not holding the huge data set in memory, and the data is written to the output stream quickly; a sketch of this batching follows.
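A minimal sketch of this batching, under the assumption that the records are already available line by line in a source file (the method and file names are invented for the example):

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class BatchedExport
{
    // Copies records to the export file in batches of 20 so the whole
    // 80,000-record data set is never held in memory at once.
    public static void export(String sourceFile, String exportFile) throws IOException
    {
        BufferedReader reader = new BufferedReader(new FileReader(sourceFile));
        FileChannel out = new FileOutputStream(exportFile).getChannel();
        try
        {
            StringBuffer batch = new StringBuffer();
            int count = 0;
            String line;
            while ((line = reader.readLine()) != null)
            {
                batch.append(line).append('\n');
                if (++count == 20)
                {
                    // Flush the batch and clear the buffer.
                    out.write(ByteBuffer.wrap(batch.toString().getBytes()));
                    batch.setLength(0);
                    count = 0;
                }
            }
            if (batch.length() > 0)
            {
                out.write(ByteBuffer.wrap(batch.toString().getBytes()));
            }
        }
        finally
        {
            reader.close();
            out.close();
        }
    }
}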



Case Study 2



Requirement:




  • Read a 50 MB CSV file and insert the data into a database, based on a few processing instructions.



Solution:




  • Read the file from the remote location and copy it to a local location on the server using Java NIO, for faster retrieval. This should be a temporary file.


  • Use readLine() on the file input stream and process the data in sets of 50 by storing them in a StringBuffer.


  • Use a JDBC batch update to store the records in the database in batches of 50, as sketched after this list.


  • In case an exception occurs while executing a batch, add its 50 records to an exception list and process them one by one separately.


  • The logger will contain the error log for the records that were not inserted, along with the exception.
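A minimal sketch of the batch insert with the row-by-row fallback; the table name, column layout, and row representation are invented for the example:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;

public class CsvBatchLoader
{
    // Inserts one set of up to 50 parsed CSV rows; if the batch fails, the
    // rows are retried one by one so a single bad record is isolated.
    public static void insertBatch(Connection con, List rows) throws SQLException
    {
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO csv_data (col1, col2) VALUES (?, ?)");
        try
        {
            for (Iterator it = rows.iterator(); it.hasNext();)
            {
                String[] fields = (String[]) it.next();
                ps.setString(1, fields[0]);
                ps.setString(2, fields[1]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
        catch (SQLException batchFailure)
        {
            // Fall back to individual inserts and log the records that fail.
            ps.clearBatch();
            for (Iterator it = rows.iterator(); it.hasNext();)
            {
                String[] fields = (String[]) it.next();
                try
                {
                    ps.setString(1, fields[0]);
                    ps.setString(2, fields[1]);
                    ps.executeUpdate();
                }
                catch (SQLException single)
                {
                    System.err.println("Record failed: " + fields[0] + " - " + single);
                }
            }
        }
        finally
        {
            ps.close();
        }
    }
}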



Case Study 3



Requirement:




  • The user chooses to generate a 500-page PDF report by clicking a link or submitting a page.



Solution:




  • Since a 500-page report is very large, load the whole report in the background but show only around the first 10 to 20 pages on click or submit, so that the UI does not hang or sit idle until all 500 pages are ready.


  • As the user scrolls down the PDF, the remaining pages are loaded in parallel in the background.


  • Progressive rendering is the act of displaying each object as it is downloaded.


  • Both Internet Explorer and Firefox support progressive rendering; they differ in how they render tables.


  • When Internet Explorer renders a table, it downloads all the objects within the table before displaying it. This is required so that Internet Explorer can render the table with the correct width for each column.


  • Firefox renders all objects progressively regardless of whether they are in a table. That is to say, each object is displayed as soon as it is downloaded.


  • If an HTML page has to display a table with more than 1,000 records, make Ajax calls and load the data progressively in sets of 50 records; a server-side sketch follows.
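A minimal sketch of the server side of that Ajax call; the servlet name, the page request parameter, and the fetchRows() placeholder are invented for the example:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RecordPageServlet extends HttpServlet
{
    private static final int PAGE_SIZE = 50;

    // Returns one page of 50 table rows as an HTML fragment; the client-side
    // Ajax code appends them to the table and requests the next page as the
    // user scrolls.
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException
    {
        int page = Integer.parseInt(req.getParameter("page"));
        int first = page * PAGE_SIZE;
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        String[] rows = fetchRows(first, PAGE_SIZE);
        for (int i = 0; i < rows.length; i++)
        {
            out.println("<tr><td>" + rows[i] + "</td></tr>");
        }
    }

    // Placeholder for the real data access code, which would read only the
    // requested slice of records from the database.
    private String[] fetchRows(int first, int count)
    {
        return new String[0];
    }
}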

