Comparison of 2 Java Excel APIs on Large Workbooks/Worksheets
Comparison of Apache POI and JExcelApi
My final findings show the most I could write on my machine before getting java.lang.OutOfMemory errors. I have 4GB of ram on my box. Basically I created the ExcelHelper inteface and had a PoiHelper implementation and a JExcelApi implementation. I then could compare apples to apples by making a jexcelapi junit tests which extended the poihelper tests with a different factor method for the ExcelHelper reference. In my junit tests basically I had a loop of paged queries which grabbed 10K Firestorm JDBC Dtos at a time and wrote them to the Excel Workbook.
public class PoiRegressionTest extends TestCase {
private static final Logger log = Logger.getLogger(PoiRegressionTest.class);
public static Test suite() {
TestSuite suite = new TestSuite();
suite.addTest(new PoiRegressionTest("testWriteSmallWorkbook"));
suite.addTest(new PoiRegressionTest("testWriteLargeWorkbook"));
return new IntegrationTestSetup(suite);
}
public ExcelHelper getExcelHelper(String fileName) throws Exception
{
return new PoiHelper(fileName, "0");
}
public class JExcelApiRegressionTest extends PoiRegressionTest {
private static final Logger log = Logger
.getLogger(JExcelApiRegressionTest.class);
public static Test suite() {
TestSuite suite = new TestSuite();
suite.addTest(new PoiRegressionTest("testWriteSmallWorkbook"));
suite.addTest(new PoiRegressionTest("testWriteLargeWorkbook"));
return new IntegrationTestSetup(suite);
}
public ExcelHelper getExcelHelper(String fileName) throws Exception
{
return new JExcelApiHelper(fileName, "0");
}
Apache POI HSSF Api - 35 K rows of all 90+ columns of records.
JExcelApi - 150K rows of 25 columns of records. This is spread across mutliple worksheets in a single workbook.
In the end game analysis it seems like POI is a bit more feature rich than JExcelApi, but JExcelApi is slightly more memory efficient. However they both suffer from the same problems in that Microsoft OLE documents have upstream pointers to downstream data which doesn’t allow you to stream them. Thereby ulitmately making the memory of the machine the limiting factor since the whole document must be in memory while its being written.
Microsoft Documents are not linear. They have upstream pointers to downstream documents. This shows the excel format.
http://www.onjava.com/pub/a/onjava/2003/04/16/poi_excel.html
Bla.xls {
OLE2CDF headers
"Workbook" stream {
Workbook {
Static String Table Record..
Sheet names... and pointers <- this appears to be the issue
}
Sheet {
ROW
ROW
...
NUMBER RECORD (cell)
LABELSST Record (cell)
...
}
Sheet
}
}
... images, macros, etc.
Document Summary
Summary
Optimized PDFs are linear and therefore can be flushed like a stream and therefore memory of the box, as long as you page through the data, is not a limiting factor. I believe the problem for us however is that they want the data in Excel and PDF is not going to meet the requirements of the “accounting” folks.
