More HSSF dumbass

A month or two ago I wrote about a Java package for writing and reading excel documents named POI+HSSF, which stands for Poor Obfuscation Implementation + Horrible SpreadSheet Format. Read all about it.

Today, I had to write an importer for a bunch of data in an excel sheet, so I figured I should use what I know. Everything seemed to go find until I got to cells containing dates. My cell I wanted to read simple said '10/11', but it was a formatted date. Underneath, the data said 10/11/2007.

No Problem, I setup a SimpleDateFormat and treated the cell as a string.

public static final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("MM/dd");

public Date parseDate(HSSFCell cell)
{
try
{
String date = cell.getRichStringCellValue().getString();
return DATE_FORMAT(date);
}
catch(ParseException e)
{
throw new RuntimeException("Error parsing date.", e);
}
return null;
}


Unfortunately, this threw a NumberFormatException, stating that the cell had numeric contents. O_o?? what? I debugged my code, and found out the numeric value of cell was 39366.0. "Ok," I thought.. "I can see where POI may have thought it was 10/11, or 10/11/2007.. but those don't add up."

I called over a co-worker to show him this weird-ness, and one of us suddenly thought.. maybe it's number of days. I divided the number by 365 and sure enough.. it equated to 107.8.. which is about that many years since 1900. haha! success!

I changed my code to work with this...

public Date parseDate(HSSFCell cell)
{
try
{
double days = cell.getNumericCellValue(); //days since 1900
long millis = (long) days * 86400000; //days to millis
return new Date(millis);
}
catch(NumberFormatException e)
{
throw new RuntimeException("Error parsing date.", e);
}
return null;
}

Well, this is also incorrect.. My dates would come back with the correct day, but the year was 2077! What the... When using new Date(millis), the millis is the number of milliseconds from January 1st, 1970, not 1900. So, now I need to correct for that.
public Date parseDate(HSSFCell cell)
{
try
{
double days = cell.getNumericCellValue(); //days since 1900
long millis = (long) days * 86400000; //days to millis
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(millis); //days since 1970
cal.set(Calendar.YEAR, cal.get(Calendar.YEAR) - 70); //remove 70 years diff.

return cal.getTime();
}
catch(NumberFormatException e)
{
throw new RuntimeException("Error parsing date.", e);
}
return null;
}


After using Java's standard way of messing with dates, we've done it! Correctly written a function that should have already been part of the POI+HSSF sdk.

but wait!

I started getting a couple of NumberFormatExceptions with my importation. Apparently, a couple cells in the excel sheet ARE actual text fields and not dates. These fields simply had a date in the month/day format.. no year.

So... now I had to account for this...

public Date parseDate(HSSFCell cell)
{
try
{
double days = cell.getNumericCellValue(); //days since 1900
long millis = (long) days * 86400000; //days to millis
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(millis); //days since 1970
cal.set(Calendar.YEAR, cal.get(Calendar.YEAR) - 70); //remove 70 years diff.

return cal.getTime();
}
catch(NumberFormatException e)
{
try
{
String date = cell.getRichStringCellValue().getString();
Date dates = DATE_FORMAT.parse(date);

Calendar cal = Calendar.getInstance();
cal.setTime(dates);
cal.set(Calendar.YEAR, 2007); //assumes year is 2007
return cal.getTime();
}
catch(ParseException e2)
{
throw new RuntimeException("Error parsing date.", e2);
}
}
return null;
}


Ok.. looks good, right? wrong.. If the cell is empty, HSSF considers this as 0 for Numeric and "" for String values. So, I need to add in some checks so we don't parse an empty cell as a date, which would come back as January 1st, 1900. Also, a null check for cell, would be a good idea too.

This whole function has begun to really bulk up.. I guess that's why they don't have it in their SDK.

Who Cares?

Anyways, I thought it was rather interesting that Excel stores dates as days. I guess that means you can't store a date/time value in a single cell then. One cell for the date, the other cell for the time.

I find it also kinda funny that Microsoft and Java don't agree on when to start the day counter. I swear I've seen another program or language that counts days or time starting at 1940.