This section describes the underlying ideas and concepts that would be useful for users to make effective use of GO Publisher. It would be helpful for you to familiarise yourself with the concepts in this section before configuring your first GO Publisher project.
Why Schema Translation?
When you supply data to someone else, the chances are that the other person is going to use the information in a different way than you do.
The data model in your database will have been designed to serve the business processes that you run. It will use the terminology you use and contain the information you need to know. When you hand the data over to someone else it needs to be in a form that they can understand. This means not just converting the format (from a database to an XML format) but also converting the structure of the data to suit their application. For example, you need to use terminology they will understand and pass the relevant information. It is useful to think about the translation as consisting of two processes: format conversion – the change from relational to XML encoding of the data, and model translation – the translation from your conceptual model to the data consumers conceptual model i.e. interpreting the data you understand so that it fits the consumers way of thinking about the world.
Different consumers of your data may need different data models. A rich data model containing many feature types and detailed attribution may be suitable for an expert user who wants to carry out complex analysis. The same information may well be of use to a more general user needing to provide context to other data or to illustrate a report, but the level of detail and technical jargon in the data model would inhibit them from using it. The more general user needs a simplified view of the information which suits their purposes.
If you need your data to conform to standards then those standards will provide the target model for your translation for you. If you are not bound by standards you need to agree the data model and format with the people you are giving your data to.
What is schema translation?
When you use GO Publisher to translate your data you will be translating from a data model implemented in a relational database to a model implemented in XML. This section will look at some of the common scenarios which occur when transforming data from one model to another, but before considering the translation scenarios it is worth looking at some of the characteristics of relational and XML data models.
Relational databases are used for the storage and querying of data whilst XML is used for exchanging data. The two paradigms therefore have some different characteristics;
The relational database paradigm is based on set theory. Each table in a relational model can be regarded as the set of all entities of a given type. The entities in a table are not inherently ordered, although entities can be retrieved in a specified order.
Relationships can be built into the database model with constraints added to enforce the integrity of those relationships. However, other relationships can also exist implicitly within the database and these can be discovered by querying the database using the same mechanism used to access the explicit relationships. The relational paradigm is therefore one in which there are an indefinite number of relations inherent within the database which may be found by querying.
There is essentially only one kind of relationship in the relational paradigm: the relational join. The relationships between entities are not inherently directional, so the two related entities do not have different roles in the relation.
The relational paradigm is used for the storage and retrieval of data by querying. The fact that relationships can be dynamically discovered by a relational database management system is therefore one of the most important characteristics of this paradigm.
XML data structure
XML data structure is a tree structure. The essential relationship between elements in XML is therefore the parent-child relationship where the parent contains the child element. The nodes of an XML tree are ordered. The other principal relationships between elements are therefore the next-sibling and previous-sibling relationships. All XML relationships are inherently directional, so in each relation the two elements always take different roles: parent and child, or next and previous.
Every XML document has one root element. This is the outermost tag that encloses all the rest of the documents content. The root element is therefore the only element in the document that does not have a parent element.
The possible relationships between elements are explicitly declared in the XML schema. It is not possible to discover new relationships.
XML is a medium for the transfer of data, especially via the internet. Because all relationships are declared explicitly in the schema it is easy to validate XML documents. This is extremely valuable in data exchange since the XML schema can form a part of the definition of a data exchange interface. Conformance to that interface can be tested by validation of XML documents against the schema and so errors in the data exchange are easily detected.
This section describes a number of scenarios that commonly occur in translating data from a relational model to an XML model. Each scenario is illustrated with an example showing a relational and XML representation of some data. In each example the conceptual model for both the relational and XML model is shown as a UML (Unified Modelling Language) class diagram.
In the most basic translation scenario each row in a relational database table becomes one element within XML. Each column within a row becomes a child element or attribute of the element corresponding to the row.
In this case the data structures within the relational database and XML can be regarded as equivalent since they could both be seen as encodings of the same conceptual model. This scenario represents a format translation with no effective translation between conceptual models.
This is an image showing Simple Translation (Figure 1).
Figure 1 shows how a simple Building entity could be represented in both a relational table and an XML model. The translation process must convert each row of the BUILDING table into a Building element in XML.
The simplicity of the model makes for a high degree of interoperability. Most GIS applications could easily connect to this database model or read the XML model. However, the simplicity of the model limits its expressiveness. Only basic information can be held in this model.
Grouping of columns
In this scenario a number of columns in the database model are grouped together into a complex element within the XML model.
This creates an entity in the XML model which is not expressed in the relational model. The database model remains simple and efficient, but the XML model is more expressive. This makes the XML model more suited to data exchange since it is easier to interpret.
This is an image showing Grouping of columns (Figure 2)
Figure 2 shows how a number of columns from a BUILDING table could be grouped to form an address object in the XML model. The conceptual model for the XML shows that the address element is optional: a building does not have to have an address.
In the relational model there is no explicit connection between the fields which make up the address. If, for example, the address of a feature is optional, this is represented in the database as each field being allowed to contain nulls. Rules specifying that all or none of the fields in the group must be present cannot be expressed simply in the database model. In the XML model the concept of the group is explicit, so such rules can be expressed easily.
It would be possible to create an explicit address entity in the relational model, but the choice of tables in the relational database is driven by performance considerations as well as data modelling ones. A small loss of clarity in the model is often traded off against gains in performance. For data exchange, clarity in the model is more important, so XML models are generally more normalised than storage models.
This example shows a single grouping of columns. In general groups can be nested within one another to form any number of parent-child relationships within the XML model from a single table in the relational model.
In relational models joins between tables represent bi-directional relationships. Relational modelling does not imply any significance in the direction of a relationship. Relationships between XML elements, on the other hand, are always directed: parent, child, previous-sibling, next-sibling etc.
In order to translate relational joins into XML element relationships we therefore need to choose a direction in which to represent the relationship in XML. The XML relationship we are most interested in for data translation purposes is the child-parent relationship, so in translating a relation we need to choose one table as the 'parent' and the other as 'child'.
As with the simple translation scenario each row in each table becomes an element in the XML model. Where elements are created for rows in the 'child' table these elements appear as children to the rows created for the 'parent' table.
In geography parlance the 'feature-attribute' relationship is the most common example of the XML 'parent-child' relation.
This image shows the Table join translation (Figure 3)
Figure 3 shows a table called BUILDING_EXTENSION which is joined to the BUILDING table via the BLDG_ID and ID columns. The conceptual models for the relational and XML models are very similar. The only difference is in the specialisation of the association. In the conceptual model for the relational model there is no need to specify the type or direction of the association. In the XML view we must specify the 'composition' association to show that buildings contain extensions.
Concatenation takes two or more values from the database and puts them together in a single value in the output XML, optionally with a separator between like a comma. In this example the town and the reference number concatenated together to form the uniqueReference number.
This is an image showing table colums concatenated together to form a single property value (Figure 4)
This pattern is not reversible i.e. It is not possible to query against this property.
Filtering is simply the process of using only certain rows from a table based on some conditions. In this example only the rows where the DESCRIPTIVE_GROUP column has the value "Building" are output as XML. The other rows are discarded.
This is an image showing filtering certain records to publish as XML. (Figure 5)
In GO Publisher filters are set up in the “SQL Filters” tab. The filter is expressed as a SQL fragment. This fragment is part of the WHERE clause of the SQL which will be used to extract data. The above example SQL Filter might look something like this DESCRIPTIVE_GROUP = 'Building'.
In fan-out each row from a table is translated into a different element in the output XML. The rules specifying which row goes to which element are specified as a series of filters. In this example the DESRCIPTIVE_GROUP column is used to decide which XML element to create.
This is an image showing fanout using filtering to output each row as a different element in the XML. (Figure 6)
In fan-in translation several different tables are mapped to the same XML element. The fact that these objects come from different tables in the database is therefore lost in the output XML. If the output XML schema has no way to represent these distinctions we can presume that this information is not required by any client applications i.e. we are holding more information about the classification of object in our database than is required in the XML.
For this pattern to be applied there must be a degree of commonality between the tables which are fanned in. In this example we can see that the tables have columns with the same names. In practice the column names could be different, but the meaning and usage of the columns must be common if they all correspond to the same element in the output XML.
This image shows multiple tables mapping to the same XML element. (Figure 7)
Constants allow us to insert a value into the output data which does not exist in the database by including a fixed value in the translation. This value can be set for each translation and so can vary from table to table.
This pattern has a number of uses:
To add metadata which is constant across the database e.g. if the schema requires the name of the source database to be included we can use a constant to apply our database name to all instances.
To add column metadata. For example, the XML schema may require units of measure. We are unlikely to store these in the database if we always use the same units.
To convert information held in the table name into attribute or element values. For example, in a fan-in translation the distinction between tables is lost, but a constant could be used to preserve this information in the output XML.
This example shows a fan-in mapping where a constant has been defined for each table to populate the “type” element.
This image shows constants used to populate the type element (Figure 8)
GO Publisher makes use of all the translations patterns mentioned on this page.
Now that you have a fair idea about the concepts behind schema transformation you may want to go ahead and read about GO Publisher Projects