Flat File Parser

After a few projects where I had to parse legacy flat files, I decided enough was enough and wrote my own parser. It does exactly one thing efficiently: it converts lines from a flat file into Java objects. I wanted something thin that did just that, with no other frills, though now that I have it working a few frills may be in order. I have created a project at JavaForge where this tool will reside. If you find it useful, please drop a comment in the discussion forum on the JavaForge site: javaforge.com/project/2066.

The goal is to parse a flat file with either character-separated columns or fixed-length columns. The parser supports two ways of parsing a file. In the first, you are responsible for reading the file and passing each line that needs to be transformed to the transformer. The second is SAX-like: you register a listener, and the transformer calls it whenever it finds a record and also when it cannot resolve a record. Let’s run through the first approach first; at the end I will show you the SAX-like approach.

Let’s create a Java bean class to represent a record with space-separated columns.

As you can see, we use Java 5.0 annotations to mark our record format. By default the parser sets itself up to parse character-separated columns, and the delimiter is a space.

The attribute spaceEscapeCharacter indicates the character used to represent spaces within column data; the parser can replace it with a space before loading the value into your Java object. The recordIdValue attribute identifies the value of the key column. The transformer keeps an internal mapping from each key value to the Java bean class that represents it. By default the first column is the key column. You can change that with recordIdColumn for character-separated columns, or with recordStartIdColumn / recordEndIdColumn for fixed-length columns. The default column separator for character-separated records is a space; you can change it with columnSeparator.
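To make the attributes above concrete, here is a minimal sketch of how such annotations could be declared and read back. Only the attribute names (recordIdValue, recordIdColumn, columnSeparator, spaceEscapeCharacter) come from the text. The annotation names Record and Column, the attribute types, the zero-based column numbering, and the sample beans are assumptions made for illustration and may not match the actual library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class RecordAnnotationSketch {

    // Hypothetical class-level annotation. Attribute names follow the text;
    // the annotation name, types and defaults are assumptions.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Record {
        String recordIdValue();                  // value expected in the key column
        int recordIdColumn() default 0;          // first column is the key by default
        String columnSeparator() default " ";    // space-separated by default
        String spaceEscapeCharacter() default ""; // empty means "no escaping"
    }

    // Hypothetical field-level annotation giving the zero-based column index.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Column {
        int index();
    }

    // Space-separated record that relies on the defaults.
    @Record(recordIdValue = "CUST", spaceEscapeCharacter = "_")
    public static class Customer {
        @Column(index = 1) public String name;
        @Column(index = 2) public String city;
    }

    // Comma-separated record whose key sits in the second column.
    @Record(recordIdValue = "ORD", recordIdColumn = 1, columnSeparator = ",")
    public static class Order {
        @Column(index = 0) public String orderNumber;
        @Column(index = 2) public String amount;
    }

    // Reads the record format back through reflection.
    public static String describe(Class<?> type) {
        Record meta = type.getAnnotation(Record.class);
        return type.getSimpleName() + ": key '" + meta.recordIdValue()
                + "' in column " + meta.recordIdColumn()
                + ", separator '" + meta.columnSeparator() + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe(Customer.class));
        System.out.println(describe(Order.class));
    }
}
```

The RUNTIME retention policy is what lets a transformer discover the format through reflection when it is handed the bean classes.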

That’s enough on defining the file format. Now here is how to actually read it.

You get a transformer instance as shown above. Pass it an array of all the classes that represent your various records and that use the annotations defined above. Then hand it a line, and you get back a fully loaded bean from which to read your data. That’s all.
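The flow described here can be sketched roughly as follows. This is not the library's own code: the Transformer class, its parse method, and the Record and Column annotations are hypothetical stand-ins. For brevity the sketch scans the registered classes one by one instead of keeping a key-to-class map.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedTransformerSketch {

    // Hypothetical annotations; only the attribute names come from the text.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Record {
        String recordIdValue();
        int recordIdColumn() default 0;
        String columnSeparator() default " ";
        String spaceEscapeCharacter() default "";
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Column {
        int index(); // zero-based column position (an assumption)
    }

    @Record(recordIdValue = "CUST", spaceEscapeCharacter = "_")
    public static class Customer {
        @Column(index = 1) public String name;
        @Column(index = 2) public String city;
    }

    // Hypothetical transformer: one line in, one populated bean out.
    public static class Transformer {
        private final List<Class<?>> recordClasses;

        public Transformer(Class<?>... recordClasses) {
            this.recordClasses = Arrays.asList(recordClasses);
        }

        // Returns a populated bean, or null when no registered class matches.
        public Object parse(String line) {
            for (Class<?> type : recordClasses) {
                Record meta = type.getAnnotation(Record.class);
                if (meta == null) continue;
                String[] cols = line.split(Pattern.quote(meta.columnSeparator()));
                int keyCol = meta.recordIdColumn();
                if (keyCol >= cols.length || !meta.recordIdValue().equals(cols[keyCol])) continue;
                try {
                    Object bean = type.getDeclaredConstructor().newInstance();
                    for (Field field : type.getDeclaredFields()) {
                        Column col = field.getAnnotation(Column.class);
                        if (col == null || col.index() >= cols.length) continue;
                        String value = cols[col.index()];
                        // Turn the escape character back into real spaces.
                        if (!meta.spaceEscapeCharacter().isEmpty()) {
                            value = value.replace(meta.spaceEscapeCharacter(), " ");
                        }
                        field.setAccessible(true);
                        field.set(bean, value);
                    }
                    return bean;
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot populate " + type, e);
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Transformer transformer = new Transformer(Customer.class);
        Customer c = (Customer) transformer.parse("CUST John_Smith New_York");
        System.out.println(c.name + " / " + c.city); // prints "John Smith / New York"
    }
}
```

In this sketch a line whose key column matches no registered class yields null, so the caller decides how to handle unknown records.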

Now let’s see how you define the same for a fixed-length column record format. The parsing code above stays the same; the difference is in how you annotate your result bean class.

The parsing logic stays the same. Just give it the correct line of data.
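For fixed-length columns, a bean and its parsing could look roughly like the sketch below. The attribute names recordStartIdColumn and recordEndIdColumn come from the text. Everything else is assumed for illustration: the FixedRecord and FixedColumn annotations, the start and end attributes, and the convention that positions are zero-based with an exclusive end (as in String.substring).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class FixedLengthSketch {

    // Hypothetical class-level annotation for fixed-length records.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface FixedRecord {
        String recordIdValue();
        int recordStartIdColumn() default 0; // inclusive (assumption)
        int recordEndIdColumn();             // exclusive (assumption)
    }

    // Hypothetical field-level annotation giving the character range.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface FixedColumn {
        int start(); // inclusive
        int end();   // exclusive
    }

    @FixedRecord(recordIdValue = "CUST", recordStartIdColumn = 0, recordEndIdColumn = 4)
    public static class Customer {
        @FixedColumn(start = 4, end = 14) public String name;
        @FixedColumn(start = 14, end = 24) public String city;
    }

    // Returns a populated bean, or null when the key range does not match.
    public static <T> T parse(String line, Class<T> type) {
        FixedRecord meta = type.getAnnotation(FixedRecord.class);
        if (meta == null || line.length() < meta.recordEndIdColumn()) return null;
        String key = line.substring(meta.recordStartIdColumn(), meta.recordEndIdColumn());
        if (!meta.recordIdValue().equals(key)) return null;
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                FixedColumn col = field.getAnnotation(FixedColumn.class);
                if (col == null || col.end() > line.length()) continue;
                field.setAccessible(true);
                // Fixed-length fields are padded, so trim the padding.
                field.set(bean, line.substring(col.start(), col.end()).trim());
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + type, e);
        }
    }

    public static void main(String[] args) {
        // Build a 24-character line: 4-char key, 10-char name, 10-char city.
        String line = String.format("%-4s%-10s%-10s", "CUST", "Ann Lee", "Boston");
        Customer c = parse(line, Customer.class);
        System.out.println(c.name + " / " + c.city); // prints "Ann Lee / Boston"
    }
}
```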

Now I will show you the SAX-like parsing approach.
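As a rough illustration of the listener idea, the sketch below reads lines from a Reader and calls back for every resolved and unresolved record. The RecordListener interface, its method names, and the parse helper are hypothetical and are not taken from the library. The line-to-bean step is stood in for by a plain Function that returns null when it cannot resolve a line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ListenerParsingSketch {

    // Hypothetical callback interface, loosely modelled on SAX handlers.
    public interface RecordListener {
        void foundRecord(Object bean);          // a line was resolved to a bean
        void unresolvableRecord(String line);   // no record type matched the line
    }

    // Reads the source line by line and notifies the listener for each one.
    public static void parse(Reader source,
                             Function<String, Object> lineTransformer,
                             RecordListener listener) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Object bean = lineTransformer.apply(line);
                if (bean != null) {
                    listener.foundRecord(bean);
                } else {
                    listener.unresolvableRecord(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        final List<Object> found = new ArrayList<>();
        final List<String> rejected = new ArrayList<>();

        // Toy transformer: treats "CUST <name>" as a record and returns the name.
        Function<String, Object> toy =
                line -> line.startsWith("CUST ") ? line.substring(5) : null;

        parse(new StringReader("CUST Alice\nBOGUS line\nCUST Bob\n"), toy,
                new RecordListener() {
                    @Override public void foundRecord(Object bean) { found.add(bean); }
                    @Override public void unresolvableRecord(String line) { rejected.add(line); }
                });

        // prints "found=[Alice, Bob] rejected=[BOGUS line]"
        System.out.println("found=" + found + " rejected=" + rejected);
    }
}
```

With this shape the caller never touches the file-reading loop; it only reacts to events, which is the main appeal of the SAX-like style.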

I have this project located at: www.javaforge.com/proj/summary.do?proj_id=271