XML2Sequential package

SeqXML is a package that contains several classes to

parse a sequential stream and address the fields as name/value pairs
convert a sequential stream to XML (elements only)
convert an XML file to a sequential stream

In order to parse the stream, the parser must know the exact structure of the stream. This structure must be described with an interface definition.

This document describes

Installation
Introduction to SeqXML package
An example
Syntax of the interface definition
What to do
Known Bugs

Installation

First download the zip file from the sourceforge project page.

Installation is simple: add the jar file SeqXML.jar to your classpath, and verify that the Xerces parser (version 1.3.x or later) is on your classpath as well.

Test the installation by typing

java com.softwareag.benelux.test.CreateXMLStream <url to employee idl file>

The idl file is included in the zip file and must be placed in a directory that can be reached by your Web server.

The output of this test program should look as follows

Creating test data ...
de Grijs Rudolf 20-11-1961De Raaf 13 Culemborg 4312DJ 0345 54 41 57 rudolf.de_grijs@softwareag.com
Convert to XML ...
<?xml version="1.0" encoding="UTF-8"?>
<employee><lastname>de Grijs</lastname><firstname>Rudolf</firstname><birthdate>20-11-1961</birthdate><address><street>De Raaf 13</street><city>Culemborg</city><postalcode>4312DJ</postalcode></address><address><street></street><city></city><postalcode></postalcode></address><contact><phone>0345 54 41 57</phone><phone></phone><email>rudolf.de_grijs@softwareag.com</email></contact></employee>
And back again
de Grijs Rudolf 20-11-1961De Raaf 13 Culemborg 4312DJ 0345 54 41 57 rudolf.de_grijs@softwareag.com

Sequential streams have been around for decades; think of work files, or of a message that is passed between applications. Nowadays XML is the format of choice for exchanging messages between (remote) applications (SOAP is a good example) and for representation-independent content.

The SeqXML package connects the two: it lets you handle fixed-layout sequential streams and convert them to and from XML.

SeqXML has three major classes

ParseSeqStream
This class is used to parse a sequential stream and address the fields as a name/value pair. But it does not end here. You can also build a stream from scratch or modify an existing stream in the same consistent manner.
SeqToXml
This class converts a sequential stream to an XML stream. You will probably want to convert the end result to something else; in that case, pick your favorite XSLT processor to transform the XML stream.
XmlToSeq
Of course you also need to be able to convert your XML stream back to a sequential stream. That is exactly what XmlToSeq does.
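As an illustration of the transformation step mentioned under SeqToXml, the following sketch applies a stylesheet with the standard javax.xml.transform API of the JDK. It is not part of SeqXML: the class name XslTransformSketch and the stylesheet are made up for this example, and the XML input is passed as a string instead of the Document that SeqToXml produces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of SeqXML.
class XslTransformSketch {
    // Illustrative stylesheet: writes "<firstname> <lastname>" as plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/employee\">"
      + "<xsl:value-of select=\"firstname\"/><xsl:text> </xsl:text>"
      + "<xsl:value-of select=\"lastname\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the given XML text with STYLESHEET and returns the result.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

If you already hold a DOM Document, a javax.xml.transform.dom.DOMSource can be passed to transform() instead of the StreamSource.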

Current limitations of SeqXML

The current version of SeqXML can only parse character data, since it is the only format that can be handled in a platform-independent manner. Future versions of SeqXML might support Java native data types (todo).

Furthermore, the size of every field must be fixed, i.e. variable-length fields are not supported. Neither are variable-sized groups or vectors. As a consequence, the sequential stream as a whole has a fixed size (todo).

This does not hold for the XML stream, whose size may vary. However, the structure of the XML file must exactly match the interface definition (todo).

General

All three classes use an interface definition in order to parse the sequential data. This means that you must provide every instance with this interface definition. All three classes implement the IdlParser interface, which guarantees that they are initialized in a consistent manner.

ParseSeqStream

Class ParseSeqStream is used to handle sequential streams with a fixed layout. After you have initialized the class with the proper idl file, you must assign the sequential stream (or create an empty stream) using method createBuffer().

Next you can get the root of the parse tree with method getRoot(), which returns a FieldInterface (an instance of Member). From here you can address all fields in the sequential stream with the methods of FieldInterface.

The data structure of the parse tree

As mentioned in the previous paragraph, method getRoot returns the root element of the parsed structure. While the idl file is parsed, an in-memory representation is built in which every group is represented by class Member (a decorated HashMap) and every field is represented by class Field.

An instance of class Member can occur more than once, effectively creating a repeating group. The same can be done with a field (= array). A Member can contain other Members and Fields. In this way the data structure defines the layout of the sequential stream.

A Field instance defines a location in the sequential stream, i.e. if you have an instance of a Field then you can extract or assign a value. The next example illustrates this data structure.

1 LastName (A30) /* string of length 30
1 FirstName (A10)
1 Address
    2 Street (A30)
    2 City   (A30)
    2 PostalCode (A10)

With the above interface definition, the following in-memory representation is built:

Employee (Member)
    LastName (Field)
    FirstName (Field)
    Address (Member)
        Street (Field)
        City (Field)
        PostalCode (Field)

If you invoke getRoot(), you get a reference to "Employee". From there you can reach the other Members and Fields. As you can see, a Member is always an intermediate node.

Interface FieldInterface looks as follows (not complete)

public interface FieldInterface {
    FieldInterface getElement(String key) throws SeqXMLException ;
    FieldInterface[] getElements(String key) throws SeqXMLException ;

    void setValue(String newValue) throws SeqXMLException;
    String getValue() throws SeqXMLException;

}

Method getElement is used to get a Member or Field. Use method getElements if your data structure contains repeating groups and/or arrays. This interface does not contain an iterator; the only way to get to a Field is through an explicit path, e.g.

postcode = root.getElement("Address").getElement("PostalCode");

Assume there are two addresses and you need the second one (the delivery address). Since Address is then a repeating group, use getElements and index the result to get to the city field:

deliveryCity = root.getElements("Address")[1].getElement("City");

Whether you initialized the buffer with a value (createBuffer(String)) or created an empty buffer, you can read from and write to the buffer through a Field object, e.g.

    System.out.println("Delivery city is " + deliveryCity.getValue());
    // Modifying this value
    deliveryCity.setValue("Moscow");
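To make the data structure more concrete, here is a minimal, self-contained sketch of how a parse tree over a fixed-width buffer could work. It is not SeqXML's code: SketchNode, SketchField and SketchMember are hypothetical stand-ins for FieldInterface, Field and Member, the layout is hard-coded instead of read from an idl file, and the Address group is assumed to occur twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FieldInterface; not the SeqXML API.
interface SketchNode {
    SketchNode getElement(String key);
    SketchNode[] getElements(String key);
    String getValue();
    void setValue(String newValue);
}

// A leaf: a fixed-length window (offset, length) into the shared buffer.
class SketchField implements SketchNode {
    private final StringBuilder buffer;
    private final int offset;
    private final int length;

    SketchField(StringBuilder buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    public SketchNode getElement(String key) { throw new IllegalStateException("a field has no children"); }
    public SketchNode[] getElements(String key) { throw new IllegalStateException("a field has no children"); }

    public String getValue() {
        return buffer.substring(offset, offset + length).trim();
    }

    public void setValue(String newValue) {
        if (newValue.length() > length) throw new IllegalArgumentException("value longer than " + length);
        StringBuilder padded = new StringBuilder(newValue);
        while (padded.length() < length) padded.append(' ');   // pad with spaces to the fixed length
        buffer.replace(offset, offset + length, padded.toString());
    }
}

// An intermediate node: maps a name to one or more occurrences.
class SketchMember implements SketchNode {
    private final Map<String, List<SketchNode>> children = new LinkedHashMap<>();

    void add(String name, SketchNode child) {
        children.computeIfAbsent(name, k -> new ArrayList<>()).add(child);
    }

    public SketchNode getElement(String key) { return getElements(key)[0]; }

    public SketchNode[] getElements(String key) {
        List<SketchNode> found = children.get(key);
        if (found == null) throw new IllegalArgumentException("unknown element " + key);
        return found.toArray(new SketchNode[0]);
    }

    public String getValue() { throw new IllegalStateException("a group has no value"); }
    public void setValue(String newValue) { throw new IllegalStateException("a group has no value"); }
}

class ParseTreeSketch {
    // Layout: LastName (A30), FirstName (A10), Address (/2) { Street (A30), City (A30), PostalCode (A10) }
    static SketchMember build(StringBuilder buffer) {
        SketchMember root = new SketchMember();
        root.add("LastName", new SketchField(buffer, 0, 30));
        root.add("FirstName", new SketchField(buffer, 30, 10));
        int offset = 40;
        for (int i = 0; i < 2; i++) {
            SketchMember address = new SketchMember();
            address.add("Street", new SketchField(buffer, offset, 30));
            address.add("City", new SketchField(buffer, offset + 30, 30));
            address.add("PostalCode", new SketchField(buffer, offset + 60, 10));
            root.add("Address", address);
            offset += 70;
        }
        return root;
    }

    // An empty buffer of 40 + 2 * 70 = 180 spaces.
    static StringBuilder emptyBuffer() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 180; i++) sb.append(' ');
        return sb;
    }
}
```

With this sketch, ParseTreeSketch.build(buffer).getElements("Address")[1].getElement("City").setValue("Moscow") writes "Moscow", padded to 30 characters, at offset 140 of the buffer.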

SeqToXml

This class is used to convert a fixed sequential stream to XML. As with the other two classes, you first need to initialize this class with an interface definition.

Next you need to assign the (fixed) sequential stream and then you can convert this stream to XML:

seq2xml.setSeqStream(charstream);
Document doc = seq2xml.CreateXMLFile("employee");

In the last step you must provide a name for the root element. With the stream from the previous example, the result of the last step would be (after serializing Document doc):

<?xml version="1.0" encoding="utf-8"?>
<employee>
   <LastName>..</LastName>
   <FirstName>..</FirstName>
   <Address>
      <Street>...</Street>
      <City>Moscow</City>
      <PostalCode>...</PostalCode>
   </Address>
</employee>

The conversion is straightforward: every field is converted to an element, and the value of each field becomes the value of the corresponding element (actually the text node of that element).

Please note that there is an element for every field, even if the field contains no value (in a fixed stream, this means all spaces or zeroes).
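The conversion rule can be sketched with the DOM API of the JDK alone. This is an illustration under assumptions, not the SeqToXml implementation: the class SeqToXmlSketch, its flat three-field layout and the trimming of trailing spaces are all chosen for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of SeqXML.
class SeqToXmlSketch {
    // Assumed flat layout: LastName (A30), FirstName (A10), City (A30).
    static final String[] NAMES = {"LastName", "FirstName", "City"};
    static final int[] LENGTHS = {30, 10, 30};

    // Cuts the stream into fixed-length slices and creates one element per field.
    static Document toXml(String stream, String rootName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        int offset = 0;
        for (int i = 0; i < NAMES.length; i++) {
            String value = stream.substring(offset, offset + LENGTHS[i]).trim();
            Element element = doc.createElement(NAMES[i]);
            // A blank field still yields an element; its text is simply empty.
            element.appendChild(doc.createTextNode(value));
            root.appendChild(element);
            offset += LENGTHS[i];
        }
        return doc;
    }

    // Pads a value with trailing spaces to the given fixed length.
    static String pad(String value, int length) {
        return String.format("%-" + length + "s", value);
    }
}
```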

XmlToSeq

XmlToSeq is the inverse function of SeqToXml, i.e. XmlToSeq can convert a DOM document to a sequential stream.

After initializing XmlToSeq, you can convert an XML document to a sequential stream with one of

    xml2seq.ConvertToSeq(String)
    xml2seq.ConvertToSeq(org.w3c.dom.Document)
    xml2seq.ConvertToSeq(java.net.URL)

After you have successfully converted the document, you can get the result with

String result = xml2seq.getSeqStream();

If the structure of the XML document does not match the interface definition, ConvertToSeq(..) will fail.
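The reverse direction, including the failure on a structural mismatch, can be sketched in the same spirit. Again this is an assumption-laden illustration and not the XmlToSeq implementation: XmlToSeqSketch and its flat layout are hypothetical, and the choice to raise IllegalArgumentException is made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SeqXML.
class XmlToSeqSketch {
    // Assumed flat layout: LastName (A30), FirstName (A10), City (A30).
    static final String[] NAMES = {"LastName", "FirstName", "City"};
    static final int[] LENGTHS = {30, 10, 30};

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Walks the child elements in layout order and pads each value to its fixed length.
    static String toSeq(Document doc) {
        NodeList children = doc.getDocumentElement().getChildNodes();
        StringBuilder stream = new StringBuilder();
        int next = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;   // skip whitespace between elements
            if (next >= NAMES.length || !NAMES[next].equals(child.getNodeName())) {
                throw new IllegalArgumentException("unexpected element " + child.getNodeName());
            }
            String value = child.getTextContent();
            if (value.length() > LENGTHS[next]) {
                throw new IllegalArgumentException("value too long for " + NAMES[next]);
            }
            stream.append(value);
            for (int pad = value.length(); pad < LENGTHS[next]; pad++) stream.append(' ');
            next++;
        }
        if (next != NAMES.length) {
            throw new IllegalArgumentException("missing element " + NAMES[next]);
        }
        return stream.toString();
    }
}
```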

Use cases

The next example shows in one program how you can

build a sequential stream
convert this sequential stream to XML
convert the XML stream back to the original input (the stream built in the first step)

The following interface definition will be used (http://127.0.0.1/data/employee.idl)

1 lastname (A40)
1 firstname (A20)
1 birthdate (A10)
1 address (/2)     /* i.e. group occurs two times
   2 street (A30)
   2 city (A10)
   2 postalcode (A10)
1 contact
   2 phone (A20/2)
   2 email (A60)

// Requires org.w3c.dom.Document and the SeqXML classes to be imported.
String charstream = null;

try {
    // build sequential stream
    java.net.URL url = new java.net.URL("http://127.0.0.1/data/employee.idl");
    ParseSeqStream testdata = new ParseSeqStream();
    testdata.initialize(url);   // build in-memory representation
    testdata.createBuffer();    // create an empty buffer;
                                // pass a string if you want to read existing data
    IField root = testdata.getRoot();   // from the root you can reach all fields
    root.getElement("lastname").setValue("de Grijs");
    root.getElement("firstname").setValue("Rudolf");
    root.getElement("birthdate").setValue("20-11-1961");
    IField[] addresses = root.getElements("address");   // address is a group
    addresses[0].getElement("street").setValue("De Raaf 13");   // only assign first address
    addresses[0].getElement("city").setValue("Culemborg");
    addresses[0].getElement("postalcode").setValue("4312DJ");
    IField contact = root.getElement("contact");
    IField[] phones = contact.getElements("phone");   // phone is an array with two occurrences
    phones[0].setValue("0345 54 41 57");
    contact.getElement("email").setValue("rudolf.de_grijs@softwareag.com");
    System.out.println("Test data ...\n" + testdata);   // toString() writes out the result
    charstream = testdata.getSeqStream();

    // convert to XML
    SeqToXml seq2xml = new SeqToXml();
    seq2xml.initialize(url);            // initialize parser with interface definition
    seq2xml.setSeqStream(charstream);   // assign previous character stream
    Document doc = seq2xml.CreateXMLFile("employee");   // create XML document
    org.apache.xml.serialize.XMLSerializer ser = new org.apache.xml.serialize.XMLSerializer(
        System.out, new org.apache.xml.serialize.OutputFormat());
    ser.serialize(doc);   // writes the XML to System.out

    // And now we will convert the data back ...
    XmlToSeq xml2seq = new XmlToSeq();
    xml2seq.initialize(url);
    xml2seq.ConvertToSeq(doc);   // convert previous document back; result equals charstream

    System.out.println("And back again\n" + xml2seq.getSeqStream());
} catch (Exception exc) {
    System.out.println(exc);
}

This simple example shows you how easy it is to

convert fixed character streams to and from XML
manipulate the data itself

Syntax of the interface definition

The interface definition follows the syntax used by the EntireX SDK tools.

There are only a few differences:

the interface definition does not contain an enclosing group
limited support for data types, viz. A(lphanumeric) and N(umeric). The numeric type has been introduced for mainframe zoned data and only for positive numbers, since these numbers are compatible with the EBCDIC values.
no support for multi-dimensional structures, but you can simulate a multi-dimensional array by adding extra groups, e.g.

1 multidimens (/3,2)

can be rewritten to

1 multidimens (/3)
    2 secdimens (/2)

Furthermore class IField contains various methods to convert the value of a field to Java native data types.

The syntax

group-definition:: <level> <name> [(/<occurrence count>)]

field-definition:: <level> <name> (<type><length>[/<occurrence count>])

<level>:: {number+} the level is a number that corresponds to the nesting level of the group or field.

<name>:: any combination of letters and digits. Spaces (including tabs, linefeeds and carriage returns) are not allowed, since a space is used as delimiter.

<type>:: {A|N} A is alphanumeric and N is numeric.
<length>:: {number+|number+.number+}

As you can see, you can also specify a precision: the number before the decimal point defines the number of digits before the decimal point, and the number after it defines the number of digits after the decimal point.

<occurrence count>:: {number+}

number >= 0
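A single line of an interface definition can be picked apart with a regular expression. The sketch below is not the parser used by SeqXML; IdlLine and IdlLineSketch are hypothetical names. It assumes the slash notation seen in the examples (A20/2 and (/2)), treats everything from /* to the end of the line as a comment, and defaults the occurrence count to 1 when none is given.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One parsed line of an interface definition (hypothetical, not part of SeqXML).
class IdlLine {
    final int level;
    final String name;
    final String type;          // "A" or "N"; null for a group
    final int integerDigits;    // length, or digits before the decimal point; 0 for a group
    final int fractionDigits;   // digits after the decimal point; 0 if no precision is given
    final int occurs;           // 1 when no occurrence count is given (an assumption)

    IdlLine(int level, String name, String type, int integerDigits, int fractionDigits, int occurs) {
        this.level = level;
        this.name = name;
        this.type = type;
        this.integerDigits = integerDigits;
        this.fractionDigits = fractionDigits;
        this.occurs = occurs;
    }
}

class IdlLineSketch {
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+([^\\s(]+)"                      // level and name
      + "(?:\\s*\\((?:([AN])(\\d+)(?:\\.(\\d+))?)?"       // optional type and length[.precision]
      + "(?:/(\\d+))?\\))?"                               // optional /count, closing parenthesis
      + "\\s*(?:/\\*.*)?$");                              // optional trailing comment

    // Parses one line, or throws IllegalArgumentException if it does not fit the pattern.
    static IdlLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("cannot parse: " + line);
        int level = Integer.parseInt(m.group(1));
        int integerDigits = m.group(4) == null ? 0 : Integer.parseInt(m.group(4));
        int fractionDigits = m.group(5) == null ? 0 : Integer.parseInt(m.group(5));
        int occurs = m.group(6) == null ? 1 : Integer.parseInt(m.group(6));
        return new IdlLine(level, m.group(2), m.group(3), integerDigits, fractionDigits, occurs);
    }
}
```

For example, IdlLineSketch.parse("   2 phone (A20/2)") yields level 2, name phone, type A, length 20 and an occurrence count of 2, while a group line such as "1 address (/2)" yields a null type.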

What to do

This section describes what needs to be done

Support for Java data types.
An application that can visually create a custom XML layout and generate an XSL file to do the transformation.
Thoroughly testing this package
Changes to current code
- Change the current balanced line method of XmlToSeq using SAX parser
- Use a uniform algorithm to parse the sequential data

Known Bugs

Up till now there are no known bugs. But that can change if someone is willing to pick up item 3 from the previous section ;0).