Physical Database Organisation
Using default settings, GO Loader can automatically create tables and populate those tables that are appropriate for small datasets. However, if you are using larger datasets (> 5 million features), then some consideration and action is required to organise the physical characteristics of your database in order to be able to gain the best performance for your system. GO Loader provides some automation and assistance in this task. For medium sized databases, action may be necessary to create database Tablespaces, and assign tables and indexes to these tablespaces (see Tablespaces section). For very large datasets (> 40 million features), database Partitioning may be required – which is supported by GO Loader if you have purchased the Enterprise Edition of the product.
A Tablespace is a means of making a logical association from one or more tables or indexes to a physical database file. This means that it is possible to ensure that, for instance, database tables and indexes are stored on different physical disks or devices from each other, or that each individual table and index can be stored in separate individual files that may be placed on separate or shared devices as required. For performance reasons, it is best to ensure that, as far as possible, tables and indexes are placed on separate disks to each other. Where practical, database tables may be assigned to separate tablespaces placed on separate devices. For very large tables, the table may need to be partitioned (see partitioning section), and individual partitions applied to separate tablespaces.
Databases are created with a default set of tablespaces. Extra tablespaces may be created by logging in as a user with sufficient privileges for tablespace creation, using the Enterprise Manager console, or from SQL*Plus using the commands contained in the example scripts (see createTablespaces.sql in the SQL subdirectory of your GO Loader installation). Within GO Loader, a default tablespace may be assigned to all tables and indexes and/or each table and index can be assigned to specific tablespaces.
Partitioning is the process of dividing a single logical database table into a number of identical physical tables for the purpose of easing the management and potentially improving the performance of very large database tables. Partitioning does not need to affect the queries that user applications use to access data from the database table (the physical partitioning is hidden from the client and may be ignored in queries, but certain partition(s) can be indicated to improve query performance. To determine which physical database partition is used to place each individual database object being added to the partitioned table, one of the table's columns is designated as the partition key. Depending on the value of each record's partition key, the record will be placed into the appropriate partition. GO Loader uses the position of features to determine the value of the partition key for each record.
In order to obtain performance benefits from a partitioned database table, queries need to include information regarding the partition(s) to check for the required objects.
The Enterprise Edition of GO Loader supports the creation of database tables and indexes that are partitioned, even allowing individual partitions to be assigned to separate tablespaces. GO Loader Enterprise Edition creates partition keys that are based on the spatial location of the feature in relation to user definable partition maps. In this way, queries from the database that relate to a specific geographical area can always be satisfied by relating the query to just two (or more if the area of interest crosses a partition boundary) known partitions from the database.
The indexes of partitioned tables themselves may or may not be partitioned. For particularly large tables, the index size may become too large to be physically contained within a single tablespace, in which case the index must be partitioned. Each index for a particular partitioned table may be specified as partitioned or not partitioned: if it is partitioned, a default tablespace can be specified for the partitioned index, with individual partitions assigned their own tablespaces if required. GO Loader offers facilities for automatically assigning tablespace names for tables and indexes, depending on the partition. To use this facility, tablespaces need to be created in the database with the names DATA. INDX, DATAPn, INDXPn, where n=0, 1, etc up to the maximum number of partitions. A provided script (createTablespaces.sql) can be used for this purpose.
The choice of the number and extent of partitions and the geographical areas used to assign partition keys will depend on a number of factors including the size of the database tables, the expected distribution of the source data, the maximum data size allowed within individual tablespaces, the usual extent of geographical queries, etc. GO Loader allows for the creation of user defined partition maps within specially designated database tables. A provided script (createPartitions.sql) may be used to create the tables and it also illustrates the population of the tables to create a partition map. Users may create their own partition maps based on their own requirements which can be imported into GO Loader.
Tables (and associated indexes if required) may be partitioned according to a chosen partition map (it is possible to vary the partition map used for different tables) as required. When each feature is loaded to the database, the bounding rectangle of each feature is compared to the bounding rectangles of each partition within the specified partition map. The partition key is assigned to the partition it is wholly contained within, but if it overlaps between partitions it is placed in the overlap partition. Thus building A is stored in partition P1 of the feature table, whereas building B is stored in the overlap partition P0.
In this way, when queries are made against the partitioned table for a specified geographical area, it is only necessary to query the partition(s) overlapping with the area of interest, plus the overlap partition. Thus, in the diagram above, to find objects in the area bounded indicated by the dashed red line, partitions P0 and P1 only need be checked. These partitions may be identified programmatically by using a query against the GO Loader partition database table.
If the boundaries of a partition map are changed in the database, then the changed boundaries can be read into your projects. If you make such changes after loading the table, then you would need to reload the data in order for the partition keys of loaded data to correspond with the partition areas defined in the database table. If you have changed the overall number of partitions then you will need to reset the SQL, drop table(s) partitioned using the map, and recreate them so that they have the correct number of partitions. Therefore it is worthwhile planning your partitions. The source file data volumes will give a reasonable indication of the density of features to be expected within a particular area, but your data provider may be able to supply a feature density map if you require a particularly accurate/even distribution of features within your partitions. In general more even distribution of feature numbers within partitions will optimise performance, but uneven feature distribution within partitions is also completely acceptable, and administration may be simplified by using partitions based on a regular grid.
When GO Loader allocates partition keys to a feature, it does so based on one of the geometry properties of the feature. The geometry property used for the creation of cluster and partition keys is known as the primary geometry. The primary geometry of a feature type is irrelevant if spatial clustering and partitioning are not being used and the choice of primary geometry property is trivial for feature types that only have one geometry. However, if you are using partitioning or clustering and the feature types for your data contain more than one geometry property, you must consider which geometry to use as the primary geometry. The primary geometry property can be selected in the Translations tab.
When GO Loader calculates a cluster key and a partition key, it does so for a feature as a whole based on the primary geometry. The keys apply to the whole feature and so can be put in any record that forms part of the feature. This means that cluster keys and partition keys can be added to any table, whether or not that table contains the primary geometry. So, where joins are used to put properties of a feature into different tables, any or all of those tables can be given cluster or partition keys and all the records relating to one feature will be given the same cluster and partition key values.
Choosing the Primary Geometry
For features that have only one geometry, GO Loader will automatically choose that geometry as the primary geometry. For features that may have more than one geometry, GO Loader will choose one of the geometry properties arbitrarily. So, for example, if your data contains road features that can have both a road centre line geometry and a road casing area geometry, GO Loader will arbitrarily choose one of these geometries. If you plan to use spatial partitioning or clustering, you should check that the choice of primary geometry is appropriate and change it if necessary.
If a feature is read and has no value for the primary geometry property, then cluster and partition keys cannot be properly assigned and default values will be used. This will result in all features with no value for the primary geometry being placed in the default cluster and default partition. The primary geometry should therefore be one that normally contains a value. If the primary geometry property is often absent, then the clustering and partitioning of the database will be unbalanced and the performance and management benefits will be reduced.
In general, the primary geometry of a feature type should be the geometry property most often queried against as this will mean that the clustering and partitioning are optimised for the most common queries. If the geometry properties have a similar spatial extent e.g. a road centre line lies within the road casing and the edge of the road casing is always close to the centre line, then the clustering and partitioning will be very similar regardless of the choice of geometry property.
Example - Choosing a Primary Geometry
The feature type "Road" can have four geometry properties:
- centreLine a line representing the route along the road;
- roadCasing an optional polygon property representing the road surface which appears on approximately half the road features in the data;
- startPoint a point representing one end of the road;
- endPoint a point representing the other end of the road.
Our main application simply finds all roads within a radius of a point. To do this it will query against the centreLine geometry.
We should therefore use the centreLine geometry as the primary geometry since
- startPoint and endPoint do not reflect the full extent of the feature in our application. Clustering the features by these properties would place roads which are close together far apart on disk and in different partitions.
- roadCasing is only present in half the features. Using roadCasing would result in half the roads being placed in a single partition and a single cluster (the defaults). In effect, only the roads with a roadCasing value would be clustered and partitioned.
- centreLine is the most commonly queried geometry property. By clustering on this property, we will get the best reduction in query times for our application because we are matching the clustering to the queries. By partitioning by this property, we will be able to relate query extents to partitions and so can manage the database based on what areas the users want to query.