Loading and Unloading with Informix

 

Note: for simplicity, these facilities will be referred to herein as ‘loaders’.

 

The following individuals would rather slit their throats than admit that they had anything to do with this:  Jay Aymond, “Obnoxio The Clown”, Andrew Hamm, Art Kagel, Jonathan Leffler, Jack Parker (editor), David Williams (FAQ owner).

 

This is not meant to be a definitive document defining everything there is to know about the subject. This is meant to be a starting point on your road to understanding this topic and a roadmap of where to visit next on your journey of discovery. 

 

This section of the FAQ is organized into:

 

Load Matrix

Loader:          Load
Availability:    All versions
*Error Handling: N/A
*Performance:    1GB/HR
Interface:       SQL, all languages; not ODBC or JDBC
Pro:             Simple
Con:             Slow, long transactions, 2GB limit

Loader:          dbload
Availability:    Not 8.x
*Error Handling: Error log
*Performance:    1GB/HR
Interface:       Operating system
Pro:             Flexible
Con:             Slow, more complex to specify, 2GB limit

Loader:          Insert cursor
Availability:    All versions
*Error Handling: Programmatic
*Performance:    2GB/HR
Interface:       All languages
Pro:             Faster
Con:             More complex, requires programming, 2GB limit

Loader:          dbimport/dbexport
Availability:    Not 8.x
*Error Handling: N/A
*Performance:    1GB/HR
Interface:       Operating system
Pro:             Simple
Con:             Does not retain database info; 2GB limit per table; entire database only

Loader:          myexport/myimport
Availability:    All versions
*Error Handling: N/A
*Performance:    Dependent on SQLCMD
Interface:       Operating system
Pro:             Resolves dbimport issues
Con:             Does not lock the database on unloads; not distributed by Informix; 2GB limit

Loader:          SQLCMD, SQLRELOAD, SQLUNLOAD
Availability:    5.x, 7.x-9.x
*Error Handling: Depends on mode; can be programmed
*Performance:    1GB/HR
Interface:       Operating system
Pro:             Flexible (non-Informix) output formats
Con:             2GB file size limit; not certified on 64-bit; not distributed by Informix

Loader:          Perl DBI and DBD::Informix
Availability:    All versions
*Error Handling: Programmed
*Performance:    1-2GB/HR (similar to insert cursor, load, dbload, etc.)
Interface:       Perl (operating system)
Pro:             Extremely powerful data-massaging capability; *CAN* be programmed to a high degree
Con:             Performance does not exceed dbload, load, etc.; *MUST* be programmed; 2GB limit

Loader:          High Performance Loader/Unloader (HPL)
Availability:    7.x, 9.x
*Error Handling: Reject files
*Performance:    2GB/CPU/HR
Interface:       X Windows, operating system
Pro:             Fast; file size limited only by OS
Con:             Finicky, resource intensive

Loader:          Parallel Loader
Availability:    8.x
*Error Handling: Reject files
*Performance:    2GB/CPU/HR
Interface:       SQL
Pro:             Fast; file size limited only by OS
Con:             None

 

* Error Handling: All databases support violations tables, which can handle most errors; see the topic on this below.

* Performance is relative.  These load rates are intended to be compared to one another and to provide some concept of expected performance.

           

Load Issues:

 

2GB Limit

With the exception of the parallel loaders (HPL and Ploader) all of these utilities are subject to the 2GB limitation when dealing with output or input files.  You may get around this when writing to tape, but if you are dealing with that much data in a table it is highly recommended that you use one of the parallel loaders anyway.

 

Violations tables

These are tables which store rows that could not be inserted into the target table due to violations such as uniqueness or other constraints.  You create them with the SQL statement 'START VIOLATIONS TABLE FOR tablename' and enable filtering to them with the 'SET [CONSTRAINTS | INDEXES | TRIGGERS] ... FILTERING' statement.  See the SQL syntax guide for more complete information.  Two tables are created: one which stores the row in question and another which stores information on the error within the row.
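
As an illustration, here is a minimal sketch of the sequence, assuming a hypothetical table named customer with a unique constraint (the default violations and diagnostics tables would be customer_vio and customer_dia):

START VIOLATIONS TABLE FOR customer;
-- divert bad rows to customer_vio/customer_dia instead of aborting the statement
SET CONSTRAINTS FOR customer FILTERING;

-- ... run the load ...

-- restore normal checking, then review the rejected rows
SET CONSTRAINTS FOR customer ENABLED;
SELECT * FROM customer_vio;
STOP VIOLATIONS TABLE FOR customer;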

 

Light Appends and Scans

When doing traditional reads or writes to a table using the insert, update and select statements, the row in question is read from the table and inserted into a buffer.  When doing an insert the row to be inserted is placed into, potentially, the same buffer.  When performing scans of an entire table or large quantities of inserts this can flood the buffer with data, and implies substantial overhead to manage. 

 

A light scan avoids this by using its own set of buffer pools, which do not have the same associated overhead; light scans are typically 1-4 times faster than a traditional sequential scan.  To enable a light scan the table to be read must be larger than the buffer pool (note this implies that the optimizer knows this – don’t forget to update statistics), and the isolation mode must be Dirty Read, or else Repeatable Read or Committed Read with a shared table lock.  With version 7.x or 9.x there can be no VARCHAR columns in the table.
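
A minimal sketch of these pre-conditions, assuming a hypothetical table named sales that is larger than the buffer pool:

UPDATE STATISTICS LOW FOR TABLE sales;  -- so the optimizer knows how big the table is
SET ISOLATION TO DIRTY READ;            -- one of the isolation modes that permits light scans
SELECT * FROM sales;                    -- a full sequential read, now eligible for a light scan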

 

A light append writes to its own set of disk pages – again avoiding the buffer ‘wall’.  If the load is successful these pages are appended to the table.  This has some interesting implications:

 

    1 – when loading a raw table or running in express mode, none of your special SQL tricks will work - triggers, indices and referential constraints are not processed.  In fact you have to turn them off for light appends (which is what you would want to do anyway, and then rebuild them after the fact; a sketch follows this list).

    2 - Since new pages are appended, any existing free space within your table IS NOT RE-USED.  So if you delete all the rows from the table and then do an express load you will have dead space in your table - this will be re-used by standard inserts, but not by the express load.
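
A minimal sketch of the ‘turn them off, rebuild them afterwards’ approach from point 1 above, assuming a hypothetical table named customer:

-- before an express-mode load: disable what light appends cannot maintain
SET INDEXES FOR customer DISABLED;
SET TRIGGERS FOR customer DISABLED;
SET CONSTRAINTS FOR customer DISABLED;

-- ... run the express load ...

-- re-enabling rebuilds the indexes and re-checks the constraints
SET INDEXES FOR customer ENABLED;
SET TRIGGERS FOR customer ENABLED;
SET CONSTRAINTS FOR customer ENABLED;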

 

Table types

With the advent of the ‘raw’ table type, it should be noted that these are very effective for loading.  Not only are they the ideal target for a light append, but there is no logging associated with the table.  Once the load is complete and you wish to create indices or constraints, be sure to alter the table type to whatever you desire, typically standard except on XPS.  A table is turned into a raw table with the ‘alter table tablename type (raw);’ statement and altered back with ‘alter table tablename type (standard);’ or whatever table type you choose.  This may require a backup of the database or a reset of the backup flags before the ‘alter table’ is allowed.
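
A minimal sketch, assuming a hypothetical table named sales (the indexed column is illustrative):

ALTER TABLE sales TYPE (RAW);        -- non-logged; an ideal target for light appends
-- ... load the data ...
ALTER TABLE sales TYPE (STANDARD);   -- back to a logged table (may require a backup or a reset of the backup flags first)
CREATE INDEX ix_sales_cust ON sales (customer_num);  -- now rebuild indices and constraints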

 

 

Long Transactions

These are described in detail elsewhere in the FAQ (refer to Informix FAQ - Common Questions about the engine).  With respect to loading, these can be a real pain.  Essentially, a transaction which processes too much data will fill the logical logs.  The logs cannot be freed until the transaction completes, but the transaction cannot complete because it has run out of log space.  In the end the engine will either roll back the transaction or crash.  Hence it is wise to avoid running into this sort of situation.  Care should be taken that loads do not occur against logged tables.

 

Furthermore, even though a table is not logged, extent allocations for tables ARE logged.  If the table has been created with insufficient extents, the engine may have to continually allocate new extents for the table – a logged event which can also trigger a long transaction if enough of them occur.
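
A minimal sketch of pre-sizing the extents at creation time, using hypothetical names and sizes (EXTENT SIZE and NEXT SIZE are in kilobytes):

CREATE TABLE sales_load (
    customer_num  INTEGER,
    amount        MONEY(12,2)
) IN loaddbs
  EXTENT SIZE 200000
  NEXT SIZE 100000;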

 

Fragmentation

With 7.x and higher, the engine will generate a separate thread for reads and writes for each table fragment that resides in a separate dbspace.  Four fragments in a single dbspace will only get you one thread, but four fragments in four different dbspaces will cause the engine to use four threads to read or write the table.  This is a major factor in getting tables to scan or insert data more quickly: not only is there less data per disk, but you get more threads.
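
Extending the sketch above, fragmenting the table round-robin across several dbspaces (dbs1 through dbs4 are hypothetical) gives the engine one thread per fragment:

CREATE TABLE sales_load (
    customer_num  INTEGER,
    amount        MONEY(12,2)
) FRAGMENT BY ROUND ROBIN IN dbs1, dbs2, dbs3, dbs4;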

 

Blobs

BYTE blobs are unloaded in a hex format that uses two characters in the unload file for every byte in the data.  This makes handling large byte blobs expensive on disk space.  TEXT blobs are unloaded more or less verbatim, except that newlines, delimiters and backslashes are protected by another backslash -- the same as long character strings.

 

The way that BYTE and TEXT blobs are logged is different from regular data, but as long as you aren't deleting and inserting data in a single transaction, the load will proceed as long as you have enough free disk space in the blobspaces (or tablespaces) where the blobs will go.

 

[The rules for HPL may be different - I'd expect it to support loading BYTE blobs (in particular, but also TEXT blobs) directly from named files.  I don't know whether XPS supports BYTE and TEXT blobs or not; I suspect that some versions did not.]

Loaders:

There are several facilities to allow you to get data into and out of an Informix database.  Each one has some advantage over the others which keeps it from becoming totally obsolete.  Below you will find discussions of:

·        LOAD (UNLOAD)

·        DBLOAD

·        Insert Cursor

·        Dbimport/dbexport

·        Myimport/myexport

·        SQLCMD

·        Perl DBD

·        HPL

·        Ploader

Load

LOAD is the simplest and oldest of these.  It is invoked from within SQL as:

 

LOAD FROM filename INSERT INTO table;

 

There is a corresponding UNLOAD statement:

 

UNLOAD TO filename SELECT * FROM table WHERE [condition];

 

Load also has the ability to specify a delimiter and a limited ability to map your destination columns:

 

LOAD FROM filename DELIMITER '|' INSERT INTO table(column1, column2, column3);

 

LOAD is the easiest loader to script with; there is no setup required to make it work.  However, the entire contents of the load are handled as one transaction.  For a large file this may overflow the transaction logs on a logging database and run you into a long transaction.  LOAD has no error handling ability: if there is a bad record in the load file, the load will abort, leaving some data loaded and everything from the bad record onwards not loaded.

 

It is also not exactly the fastest loader around; however, if you include setup time in your load time calculation, LOAD is very effective for files under 1000 records.  Your mileage may vary, but you can expect a load rate of 1GB/hour with this facility.  Load rates mentioned here are relative (especially to one another).

 

There is also the issue that the LOAD and UNLOAD commands are not strictly SQL statements understood by the server in the sense that INSERT is an SQL statement understood by the server.  Indeed, neither LOAD nor UNLOAD is available as a preparable SQL statement, and neither is available in ESQL/C (and, I presume, not in ODBC or JDBC either).  They are available in ISQL and DB-Access, which recognize them explicitly and handle them on the client side, and in I4GL (D4GL and NewEra), which again handle them specially on the client side -- as a subroutine call.

 

LOAD and UNLOAD are documented in the SQL Syntax manual.

 

Back to Matrix

dbload

DBLOAD resolves the long transaction issue: records are committed at intervals specified by the user (the default is every 100 rows).  It is more complex in that it requires a command file to specify the load parameters.  This same command file provides much better mapping between source and destination columns.  dbload also has an error handling capability: each bad record is rejected to an error file along with the reason that record was rejected.

 

The other important feature of DB-Load is that it handles records with a fixed layout, possibly with no delimiters.  This uses a different notation to describe the data, of course.

 

This is documented in the Informix Migration Guide under the Utilities chapter (11) 

 

DBLOAD will run at a rate comparable to the LOAD statement.  (1GB/HR here)

 

Back to Matrix

Insert Cursor

INSERT CURSOR. Moderately more difficult to work with, especially since this requires a procedural language like 4GL or ESQL/C.  The insert cursor is faster than LOAD and DBLOAD because it opens a 'pipe' (if you will) directly to the table, the insert statement is parsed only once, and data either goes in or doesn't - error handling is up to the programmer.  I do not have a tasty simple example of one lying around; I'll work on that.

 

The insert cursor will be documented under the language where it is used.  It should have a separate topic since there is some work to using it: declaring the cursor (potentially describing it under ESQL/C), opening it, putting to it, flushing, and closing it.  This gives you the finest level of control over your load.

 

The hardest parts of using an INSERT cursor are the error handling and recovery and parsing the input data for subsequent use as parameters to the INSERT cursor.

 

The insert cursor is roughly twice as fast as dbload or load.  (2GB/HR here)

 

Back to Matrix

dbimport/dbexport

 

If you need to unload an entire database (note: not an instance), move it somewhere and re-import it, then the dbexport/dbimport pair is fairly nice.  'dbexport database' will create a subdirectory under your current location called database.exp and fill it with unload files named with the first 5 characters of the table name followed by the tabid.  It will also generate a schema for the database in that directory called database.sql.  There can be no active connections to the database during the export.  dbimport reads this directory and rebuilds the database.

 

These have the capability of writing directly to tape instead of disk.  dbimport also has the ability to specify the destination dbspace and logging mode of the new database - which is key, since dbimport is not smart enough by itself to recreate the database with the same logging mode as the database the export was read from.

 

Never let the schema be written to the tape with DB-Export.  Always, but always, ensure that the schema is written to the file.

 

These utilities are not available with Informix 8.X (XPS).  They are documented in Informix Migration Guide under the utilities chapter (11).

 

Back to Matrix

Myexport/myimport

Myexport/myimport is a set of Korn Shell (KSH) scripts that use the dbexport compatibility flag (-l) of the myschema utility (from the package utils2_ak) to create a schema, and Jonathan Leffler's excellent sqlunload/sqlreload utilities from the sqlcmd package (version 52 or later), to emulate most of the functionality of the dbexport and dbimport utilities with complete compatibility and with extended functionality.  Databases that have been dbexported can be myimported and vice-versa.

The extended functionality includes parallel unload in myexport, which speeds unloading of larger databases and helps reduce any potential problems caused by myexport NOT locking the database during the unload.  The advantage of not locking the database is that you can myexport while production activity is going on, so you can use myexport for periodic snapshots of the database to improve disaster and moron recovery.  The former of course protects against flaky disks, the latter against flaky users.  The downside of not locking the database is that you may get inconsistent data due to ongoing work, especially in non-logged databases.  The parallel option reduces the conflicts: with the -p flag myexport will export all tables in parallel using multiple copies of sqlunload.

Myimport has a feature that allows one to provide a list of dbspace name mappings, which it will use when recreating the database on a server where the dbspace names may be different from those on the source server.  Myimport is great for loading a copy of a production database onto a test machine.

Since myschema does not yet fully support 9.2x features, some 9.2x/9.3x databases may not be able to use myexport/myimport.  However, as myschema adds support for new features, myexport/myimport will be able to take advantage of these as well.  If only dbschema had the equivalent of the myschema -l flag, you could use that in the scripts.

 

The utils2_ak package can be downloaded from http://www.iiug.org/software.

 

Back to Matrix

SQLCMD, SQLRELOAD, SQLUNLOAD

 

The source code for the current version of the SQLCMD utility can be downloaded from the Software Archive of the IIUG web site at http://www.iiug.org/software.  As of 2001-07-01, the current version is 64 dated 2001-04-04.  SQLCMD is a general-purpose SQL Command Interpreter, and you need ClientSDK (or ESQL/C) and a C compiler to build it.

 

SQLCMD supports the UNLOAD command as specified by DB-Access.  SQLCMD supports the LOAD command as specified by DB-Access.  If the database is logged, the LOAD statement must be executed in a transaction.

 

SQLCMD also supports the RELOAD command, which is very similar to the LOAD command, but if the database is logged, it automatically applies a transaction around the load (unless one is already in progress), and it commits every N rows (where N is 1024 by default but can be changed by the -N txsize option), and so on.

 

SQLRELOAD is a convenient short hand for loading a single table:

 

sqlreload -d dbase -t table [-i datafile] [-N txsize]

 

This is equivalent to 'LOAD FROM "datafile" INSERT INTO table'.  If you don't specify the input file, the input comes from standard input. SQLRELOAD automatically takes care of transactions in a logged database.

 

SQLUNLOAD is a convenient short hand for unloading a single table:

 

sqlunload -d dbase -t table [-o datafile]

 

This is equivalent to 'UNLOAD TO "datafile" SELECT * FROM table'.  If you don't specify the output file, the output goes to standard output.

 

Note: sqlreload is a synonym for 'sqlcmd -R' and is implemented by a link to the sqlcmd executable.  Similarly, sqlunload is a synonym for 'sqlcmd -U', also implemented by a link to the sqlcmd executable.

 

Note that you can specify "/dev/stdin", "/dev/stdout" and "/dev/stderr" as file names in most contexts, even if the physical devices do not exist on your system (SQLCMD fakes them when necessary).  So, for example, omitting -o datafile from sqlunload is equivalent to specifying -o /dev/stdout.

 

Format Options

 

The overall output format can be specified by the -F option, which takes the values 'unload', 'select', 'fixed', 'csv', 'quote', or 'xml'.  The default is 'unload'.  The 'select' format omits the final delimiter from the data.  In fixed format, all fields except BYTE and TEXT blobs use a constant output width, with no separator between the fields.  In CSV (comma-separated values) output, each field that is not a pure number field is enclosed in double quotes, and the fields are separated by commas.  This can be used to export data for use in spreadsheets, etc.  The 'quote' format is similar to CSV except that the comma separator is replaced with the delimiter.  The 'xml' format encloses each record in <RECORD> and </RECORD> tags, the latter followed by a blank line; it separates each field with a newline and encloses each field in <fieldname> and </fieldname> tags.  You can change the RECORD tag by specifying -F xml=mytag.  Any less-than, greater-than or ampersand characters in the data are replaced with "&lt;", "&gt;" and "&amp;" entities instead.  Note that SQLCMD is not willing to load XML data, and there are no plans to upgrade it to handle the complexities of parsing XML.

 

For both loading and unloading, the delimiter can be controlled by the -D option or the DBDELIMITER environment variable (defaulting to pipe). Similarly, the escape character can be controlled by the -E option or the non-standard DBESCAPE environment variable (defaulting to backslash), and the quote character for CSV or quote format can be controlled by the -Q option or the non-standard DBQUOTE environment variable (defaulting to double quote).  When a newline, delimiter, escape or quote character appears in the data, it is preceded by the escape character.

 

The format for date fields is controlled by the command line option -A 'format' or the DBDATE environment variable, and uses the notation recognized by the rfmtdate() function -- dd for day number, ddd for day abbreviation, mm for month number, mmm for month abbreviation, yyyy for year (and yy for 2-digit year if you absolutely need it, but avoid using it if at all possible).

 

If the data to be loaded contains integer fields with leading zeroes, you will need to specify '-a 10' to prevent the values being interpreted as octal numbers.  You can also set '-a 16' to treat them as hexadecimals.  By default, the value is '-a 0' which treats 0xXXXX as a hexadecimal value, 0XXXX as an octal value, and anything else as a decimal.

 

During a reload operation, you can control the size of the implicit transactions with the -N option.

 

If you need column names in the first line of the output (often useful with CSV), add the -H option.  If you need the data types, add the -T option.  If both -H and -T are specified, the column names appear first, then the column types.

 

Note that all command line options can be overridden in scripts with suitable commands.  The 'context' command gives you the current settings of the working context for SQLCMD.

 

SQLCMD v64 accepts DBMONEY for formatting DECIMAL and MONEY values, but not for formatting FLOAT and SMALLFLOAT values.  This is a bug.

 

SQLUPLOAD

 

SQLUPLOAD is some alpha-quality software distributed as part of SQLCMD. It provides an 'update if present else insert' functionality, sometimes known as 'UPSERT'.  There are other ways to deal with UPSERT in the engine if you have the data loaded into some transient table (which might or might not be a temporary table), but SQLUPLOAD works without such workarounds.  See the man page provided with the software for more information.

 

UNLOAD FORMAT

 

The file unload.format distributed with SQLCMD describes the format of unloaded data in detail.

 

DBLDFMT

 

The DBLDFMT utility is also available from the IIUG web site.  It can take fixed format data and convert it into delimited data suitable for use with a LOAD operation.  It can do things with fixed format data that DB-Load cannot do, such as insert actual decimal points when the data is supplied with implicit decimal points.  It can also do a lot of date and time transformations.  However, it does not attempt to handle the actual loading of data; it is a filter program.  Also, it does not deal with variable length inputs, so the date and time transformations are of limited benefit -- they should probably be handled in a separate program, or a date/time transformer is needed for delimited data.

 

ISEXTRACT

 

The ISEXTRACT utility is also available from the IIUG web site.  It can extract data from C-ISAM files provided you know the data types and position of each column in the data file.  The output format is LOAD format.

 

Back to Matrix

Perl DBI and DBD::Informix

Often, the hardest part of data uploading is making sense of the source data. Perl can replace an ill-fitting combination of languages such as the COBOL of the source system, ftp transfers, 4GL, and scripts of shell, awk, and SQL.

 

The main Perl tool for Informix programming is a pair of modules called DBI and DBD::Informix. These modules allow direct, multiple connections to an Informix engine. Other modules in the DBI suite support connections to over a dozen major database engines, all accessed using the uniform interface provided by DBI.

 

There is a well-established public domain supply of modules, and for loading of data, you can find modules to read and write CSV files, compressed files, convert to and from EBCDIC, analyse strings to extract addresses and proper names, and many other processing tasks. Fixed records, BCD and binary information can all be read, analysed, converted and uploaded within one script.  Modules of this nature and the most recent version of Perl itself may be found at www.cpan.org.

 

Validations against existing database tables can be easily programmed. A very powerful regular expression engine is built-in to the language. The power of the transform and load process is only limited by your imagination and your prowess as a programmer.

 

SQL statements can be executed directly from strings or prepared as a cursor for repeated execution. With the ability to connect to multiple engines and transform complicated source data, Perl can greatly simplify your work. If the performance of the Perl database inserts is not sufficient (similar to SQL or 4GL) then you can still use Perl to generate your upload files, and it may be ideally used as a source for piped data to HPL.

 

Perl itself comes with online documentation and tutorials accessed through a command called perldoc. Nearly all modules come with good or adequate documentation, also accessed through the perldoc command.

 

Amongst copious books you've probably already seen lining the stores, there are the famous Camel and Llama books, "Programming Perl" and "Learning Perl". The first one is a mixed tutorial and reference for mid to advanced programmers.

 

The Cheetah book "Programming the Perl::DBI" is a worthy reference and tutorial for DBI programmers. All of those books are from O'Reilly Publishing.

 

Perl can be downloaded from http://www.cpan.org.

 

Back to Matrix

High Performance Loader/Unloader

(version 7.x and 9.x specific)

 

The High Performance Loader can take advantage of the full instance and filesystem, provided things have been set up properly.

 

HPL will initiate a separate thread for each component of your load: each separate table fragment gets its own insert or scan thread, and each separate load or unload 'file' gets its own read or write thread.  Conversion threads between the two sets can be controlled (to some degree) with a separate $PLCONFIG file.

 

In express mode an HPL load job will perform light appends.  When unloading, to get light scans, be sure to set options to dirty read.

 

HPL can also operate in 'deluxe' mode, in which it goes through the SQL buffers, maintains your indices and constraints, and fires triggers.  Of course this means that you've hobbled your load, but that may be desirable.

 

HPL can read from or write to files, tapes or pipes.  These are set up in devices, so you actually unload to a device.

 

Setup for HPL jobs is performed using a GUI which is fairly intuitive given the many complexities of the job setup.  For a load you are walked through the five pieces of the process: a device, which describes the file(s) to be loaded from; a format, which describes the contents of the file(s); filters, which describe data to be excluded (be warned: when using a filter, make sure you use the same datatype in describing the filter as exists in the file, or the load will spend needless cycles converting); a map, which describes how to map your input to the table; options, where you can specify the type of load; and the table to be loaded.  (For an unload it is the reverse: query, options, map, device.)

You can easily 'generate' a load/unload job to handle standard processes, although HPL has a lot of flexibility built in if you want to use it - the ability to read EBCDIC, not map fields to/from columns, or map one field to two places.  You can even link your own functions into the loader and perform custom conversions (I have not done this personally).

 

To initiate the GUI (and build the onpload database - the database name is under your control via the environment variable $DBONPLOAD) invoke ipload from an X-station near you (it has to be X).  Mind that you've set up your $DISPLAY, xhosted the server, and set your $INFORMIXSERVER to the tcp connection.

While you can run the load/unload jobs directly from the interface, I do not recommend it.  You get a log message every 10,000 rows, and the interface can only handle about 12 of these messages per second - it cannot keep up with the job.  So while your job may have finished in 30 minutes, it can take another 30 minutes for the log messages to stop scrolling on your tube - and guess what - the job ain't done until those log messages are complete.  Use the command line interface 'onpload' to run things from the command line; if you want to watch progress, 'tail -f' the log file.

 

Recently a utility called onpladm has been added to the package to allow you to maintain job definitions from the command line and avoid the GUI interface.  This removes the X-station requirement for HPL.

 

The bottleneck with HPL is generally NOT the CPU but disk I/O; hence, to make things run as fast as possible, it is useful to minimize this I/O.

 

Recommendations for optimum performance:  (unload)

1 - write to named pipes.  (Use 'mknod /tmp/pipe.1 p' or 'mkfifo /tmp/pipe.1' to create these.  Your device elements look like: 'cat > /tmp/pipe.1')

2 - have a second set of processes which read the pipes.  (gzip - > file.gz)

3 - spread your unload files over multiple devices.

4 - make sure you have dirty read set under options.  (to enable light scans).

 

(load)

1 - read from named pipes.  (Your device elements look like 'cat /tmp/pipe.1')

2 - feed these pipes with an external process (gunzip -c file.gz > /tmp/pipe.1)

(actually, I just set up the device as 'gunzip -c file.gz' - but to be consistent I tried to spell it out this way).

3 - fragment your table over multiple dbspaces (2 per CPU).  This is a major element of performance.  When you set up your dbspaces, do not try to have big, or deep, dbspaces; one 2GB chunk per dbspace is best.  Fragment your tables across dbspaces rather than trying to get 8GB into one dbspace.  You will get more write threads.

4 - When creating the table, your initial extent should be the minimum and the next extent should be your real first extent; alter the table later to correct it, as sketched below.  If you do not size the table properly, HPL may waste needless time in next-extent allocation.
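
A minimal sketch of point 4, using hypothetical names and sizes (in kilobytes):

-- small first extent, next-extent size matched to the expected load volume
CREATE TABLE sales_load (
    customer_num  INTEGER,
    amount        MONEY(12,2)
) IN loaddbs
  EXTENT SIZE 16
  NEXT SIZE 500000;

-- ... run the HPL load ...

-- then correct the next-extent size for normal operation
ALTER TABLE sales_load MODIFY NEXT SIZE 100000;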

 

 

With this facility you can expect load rates of at least 2GB/CPU/HR; for a 4-CPU box this means 8GB/hour, if you follow the directions above.  Unload rates should crank to around 4 or 5GB/CPU/HR.  This is an invaluable tool for re-organizing a table.  It is also a cute trick to get around the buffer overhead when you want to read from one table and insert into another: set up an unload job to a pipe, set up the load job to read from the pipe, and Bob's your uncle.

 

HPL is documented in its own manual called, appropriately enough, 'High Performance Loader'.

 

Con.

The interface is a real pain.  HPL can 'fall over' easily if you get too exotic.  It is better to drop and redefine a job than to try to debug it.  As Kerry Smith pointed out - it's like a race car, finicky but fast as h*ll.

 

It is highly recommended that you test an HPL job with 10 or 20 thousand rows before entrusting your data to it.  It is easy to get something wrong.

 

Back to Matrix

Parallel Loader

(8.x XPS specific)

 

Imagine fixing the problems with HPL.  Welcome to the XPS parallel loader.

 

For starters, there is no concept of a job with this loader.  In fact you're not really aware that it is a loader at all.  You create 'external' tables which can represent files, tapes or pipes.  You then treat these tables as though they were regular SQL tables.  A load is therefore an insert from the external table into the internal table, and an unload is the reverse.  This makes it incredibly easy to set up your load/unload processes, and lends itself well to scripting.

There is new syntax to support external table creation, and the table type is 'E'.  Beyond that there's not much to it: you can treat these 'tables' as though they were normal tables, perform joins with them or whatnot - although be advised that there are no statistics for these tables, so the optimizer will assume that there are 2 billion (max int) rows in the table.
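
A minimal sketch of the idea, assuming a hypothetical table named customer and a flat file /work/customer.unl (the exact DATAFILES string, including any coserver specification, and the FORMAT options depend on your setup; see the XPS documentation):

-- define an external table that describes the flat file
CREATE EXTERNAL TABLE ext_customer SAMEAS customer
USING (DATAFILES ("DISK:/work/customer.unl"), FORMAT "DELIMITED");

-- load: read through the external table into the real table
INSERT INTO customer SELECT * FROM ext_customer;

-- unload: the reverse direction
INSERT INTO ext_customer SELECT * FROM customer;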

 

As with HPL, you want to make sure that you fragment your table optimally as well as your output files or whatnot.  The load rate for this loader is roughly comparable to HPL - again the bottleneck is not CPU, but disk.  Since XPS servers tend to be larger than 7.x or 9.x servers, the load seems to run faster, but the actual load rate is the same - 2GB/CPU/HR or thereabouts.  With XPS you also have a 'raw' table type which optimally supports the light appends used with XPS.  Once you've loaded your table you can alter the table type to whatever you want (provided you've done a backup - or otherwise reset the backup flags).

 

Note that XPS does not have to be running ploader to handle light appends.  It will use these against any eligible table as needed.

 

The Parallel Loader supports the same features as HPL: EBCDIC data, express/deluxe load modes, and so on.  Mapping, mathematical calculations and so forth are handled within your SQL select/insert statement.

 

The Parallel loader is documented in chapter 8 of the XPS Admin Reference manual. 

 

Con.

Provided you don't do something silly, there aren't any real cons.  Make sure you have enough disk space to support an unload, make sure the files exist with the proper names for a load.  A load job can fail, but not as easily as with HPL. 

 

Back to Matrix

 

APPENDIX

Perl Examples:

UNLOAD:

As always, the hardest part is dealing with nulls and blobs!

use DBI;
$dbdelim = $ENV{DBDELIMITER} ? $ENV{DBDELIMITER} : "|";
$dbh = DBI->connect('dbi:Informix:dbase@server', $username, $password,
{RaiseError=>1});
$dbh->{ChopBlanks} = 1;
$sth = $dbh->prepare("SELECT * FROM Customers");
$sth->execute;

# Option 1: more or less Informix UNLOAD format but:
# -- all fields enclosed in single quotes
# -- no escapes of $dbdelim in data
# -- no escapes of embedded newlines
# -- byte blobs are not hex-encoded
$sth->dump_results(32767, "\n", $dbdelim); # see 'perldoc DBI' for the 4 optional arguments

# Option 2: works with no nulls in data; noisy if -w flag set
# -- no escapes of $dbdelim in data
# -- no escapes of embedded newlines
# -- byte blobs are not hex-encoded
$, = $dbdelim;
while ($ref = $sth->fetchrow_arrayref)
{
print @$ref, "\n";
}

# Option 3: maps nulls to empty strings; quiet if -w flag set
# -- no escapes of $dbdelim in data
# -- no escapes of embedded newlines
# -- byte blobs are not hex-encoded
$, = $dbdelim;
while ($ref = $sth->fetchrow_arrayref)
{
my(@row) = map { defined $_ ? $_ : "" } @$ref;
print @row, "\n";
}

# Option 4: UNLOAD format except for BYTE blobs
# -- escapes backslashes, newlines and $dbdelim in data
# -- byte blobs are not hex-encoded
$, = $dbdelim;
while ($ref = $sth->fetchrow_arrayref)
{
my(@row) = map { defined $_ ? $_ : "" } @$ref;
s/[\\$dbdelim\n]/\\$&/gm for @row;
print @row, "\n";
}

If there are BYTE blobs in the data, then you need to use the 'pack'
command with the H format for the BYTE blobs (only).  This is messier...

 

# Option 5: UNLOAD format with BYTE blobs hex-encoded
# -- escapes backslashes, newlines and $dbdelim in data
$, = $dbdelim;
# Generate list of column numbers of BYTE columns.
my(@byte);
my($nbyte) = 0;
my(@types) = (@{$sth->{ix_NativeTypeName}});
for (my($i) = 0; $i <= $#types; $i++)
{
$byte[$nbyte++] = $i if ($types[$i] eq "BYTE");
}
while ($ref = $sth->fetchrow_arrayref)
{
my(@row) = map { defined $_ ? $_ : "" } @$ref;
s/[\\$dbdelim\n]/\\$&/gm for @row;
for (my($i) = 0; $i < $nbyte; $i++)
{
$row[$byte[$i]] = unpack('H*', $row[$byte[$i]]);
}
print @row, "\n";
}

I'd hope that the longhand loops could be shortened, but haven't spent
the time working out how to do it.  Note the counter-intuitive use of
unpack for the UNLOAD operation; we'll need to use pack to convert from
the hex-string to the binary data.