Syntax: dsimport [options] <operation> filename
Note: Not supported for use with DocuShare Archive Server.
The dsimport command allows the bulk insertion of new site objects into a DocuShare repository or allows
the admin to update existing objects. Scenarios for this operation include moving a collection tree from one DocuShare server to another or inserting data gathered by a separate application, such as user accounts or large quantities of scanned documents, into an existing object.
The <filename> argument is a path to the exported file. The command imports the file according to its
exported format, XML or CSV.
Attempts to alter system resource objects, such as the admin account, are ignored in order to preserve the security of the server. If the quiet option is not given, a warning message is printed when system resource objects are encountered.
If any of the objects defined within <filename> are of type Document and the import operation updates the
object document or version properties, a Documents directory within the same directory as filename must
contain the object’s documents. An update to the document property occurs on all add and replace
operations as well as modify operations in which the object’s document field is given. The files within the
documents directory are paired to object definitions according to the following rules:
1. If a handle H is declared for the Rendition object, look for a file named H_xxx where xxx is an
integer starting with 0.
2. If the input XML comes from a previous DocuShare version, look for H_yyy where yyy is the
original filename.
3. If the document property of the object is defined with value D, look for a file named D.1.
4. If the document property of the object is defined with value D, look for a file named D.
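The pairing rules above can be sketched as a small lookup function. This is only an illustrative model, not dsimport's actual implementation: the function name and parameters are hypothetical, and it assumes the rules are tried in the listed order, that the first match wins, and that the lowest-numbered H_xxx file is preferred.

```python
import re


def pick_content_file(available, handle=None, document_value=None, original_name=None):
    """Return the first name in `available` that satisfies the pairing rules, or None.

    Assumption: rules are tried in the order listed in the text.
    """
    names = set(available)
    if handle:
        # Rule 1: H_xxx, where xxx is an integer (lowest number preferred here).
        pattern = re.compile(re.escape(handle) + r"_(\d+)")
        numbered = [(int(m.group(1)), n) for n in names if (m := pattern.fullmatch(n))]
        if numbered:
            return min(numbered)[1]
        # Rule 2: H_yyy, where yyy is the original file name (older exports).
        if original_name and f"{handle}_{original_name}" in names:
            return f"{handle}_{original_name}"
    if document_value:
        # Rule 3 (D.1) is tried before rule 4 (D).
        for candidate in (f"{document_value}.1", document_value):
            if candidate in names:
                return candidate
    return None


# Example: list the Documents directory yourself and pass the names in.
print(pick_content_file(["Rendition-5_0", "report.doc"], handle="Rendition-5"))
```

Passing the directory listing in as a plain list keeps the sketch easy to check by hand before running a real import.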
Note: Importing large volumes, in excess of 4000 objects or documents, may exceed the resources of the
server and the capacity of the Notification Service. See the Importing large volumes section of this Knowledge Base article.
Resume feature
If dsimport fails on a badly formed portion of the import file, a dump file is generated in the same directory as the input XML. If the input XML filename is Document-11.xml, the dump filename is Document-11.xml.dmp. After the problem in the import file is corrected, execute the same dsimport command again to resume the import from the point of failure. If the resume dump file is deleted, dsimport starts as if no objects had been imported previously.
There is no option to request a resume; the resume feature is enabled by default. To disable the resume feature,
specify the -f (force) option.
Example of dsimport resume
If dsimport add Document\Document.xml fails on Document-456, edit the import file to correct the flaw
in the Document-456 entry. The DocuShare administrator can enter the same command again and resume importing from the Document-456 entry.
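Before rerunning a failed import, it can help to check whether a resume dump file is present. The following is a minimal sketch built on the naming rule described above (import file name plus .dmp); the helper names are hypothetical, and how dsimport itself detects the dump file is not stated in this article.

```python
from pathlib import Path


def resume_dump_path(import_file):
    """Dump file name: the import file name with '.dmp' appended."""
    p = Path(import_file)
    return p.with_name(p.name + ".dmp")


def will_resume(import_file, force=False):
    """True if rerunning the same command is expected to resume.

    Assumption: resume happens when the dump file exists and -f is not given.
    """
    return not force and resume_dump_path(import_file).exists()


print(resume_dump_path("Document-11.xml"))
print(will_resume("Document-11.xml"))
```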
Options for dsimport
-a Import single objects one at a time.
Using this option, the user can avoid RMI errors, Message Queue full errors, and other associated problems during a large import.
If this option is on, dsimport checks the resume.dmp file under the import directory (the handle mapping dump file from a previous import stage). The resume.dmp file links all the separate imports together as if they were a single large import.
· If the file cannot be found, a warning message displays and the user is prompted to confirm continuing to import the object without the resume.dmp file. The user can override the warning to continue the import.
· If the file is found, it will load the previous handle mapping information and continue to import the current XML file.
-d Destination. Specifies a repository container object in which to place imported objects. In most cases, this is a Collection handle. The input object definition file defines a set of objects with some internal structure, which may be a flat list or a full containment hierarchy. By default, imported objects that have no parents declared are placed in the general repository space as orphaned objects. If a destination container object is specified, all objects in the imported file that have no parents are placed in the destination container object.
-f Force. If an imported object references another object that does not exist, either locally within the imported file or within the repository if used with -r, dsimport attempts to complete the object import by removing the offending references. Offending object references within a list, such as references to parents or children, are removed. Singleton references to a user, such as owner and modifiedBy, are set to the dsimport user (admin by default; see -u).
-h Help. Display command help text.
-i Single object import. Imports only a single specified object; dsimport skips all other objects.
-l Level. Writes messages of the specified level (debug, trace, info, warn, error, fatal) to the log file.
-n User or Group Name recognition.
User Name recognition specifies alternative behavior for the add operation when imported users have the same user name as existing repository users. By default, add assumes that all imported objects are new, unique objects.
If an imported user has the same user name as an existing repository user, an error is signaled.
-n use uses the existing repository user if a user name match is found. In this case, the imported user serves only as a reference within the imported file, and dsimport maps all references to the imported user onto the matching repository user. This option is most useful when adding new objects owned by existing users. -n must_exist specifies the opposite of the default add behavior and signals an error if any imported user object fails to match an existing repository user. You must specify an option value when using -n, such as "-n use" or "-n must_exist".
Group Name recognition specifies alternative behavior for the add operation when imported groups have the same group title and owner username as existing repository groups. Default behavior is to create a new group. -n use or -n must_exist uses the existing repository group if both group title and owner username match. This option is most useful when importing into the same repository the material was exported from, particularly for maintaining ACL lists.
-o Get handle mapping report.
-q Quiet. Runs the command in quiet mode. Suppresses all informational messages and user yes/no prompts. dsimport assumes Yes to all prompts, so proceed at your own risk.
-r Repository References. Allows external references to existing repository objects during an add operation. Normally, add requires all object references within the imported file to reference objects declared locally within the file, with the possible exception of Groups and Users (see option -n). If quiet mode (-q) is not set, each external reference is reported.
-t Thread. Maximum thread number. By default multi-thread importing is disabled. If set to an integer between 2 and 20, the multi-thread importing feature is enabled.
Note: The maximum thread number is 20. Numbers greater than 20 default to 20. Thread performance is hardware dependent.
As an estimate, a 12500 document import takes about 58 minutes; with 15 threads specified, the import takes about 53 minutes, an improvement of approximately 5% to 15% for 2 to 20 threads.
-u User. Run the command as the specified user. Requires a user handle as an argument. By default, all commands run as the admin user, which gives them unrestricted rights and makes admin the author of any edit operations performed by the command. For example, specifying -u User-12 causes the command to run as User-12 instead of admin.
-w Walker. Reports what operations would be performed without altering the repository. dsimport -w runs through all of the requested operations, reporting on the changes that would occur and any errors or warnings encountered, but does not alter any objects.
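When dsimport is driven from a script, the options above can be assembled into an argument list following the syntax dsimport [options] <operation> filename. The sketch below covers only a few of the options; the function and parameter names are hypothetical. It assumes that thread values below 2 mean the -t option should be omitted, and it caps values above 20 at 20 as described in the note for -t.

```python
def build_dsimport_argv(operation, filename, destination=None, threads=None,
                        user=None, name_mode=None, walk_only=False, quiet=False):
    """Assemble a dsimport argument list: dsimport [options] <operation> filename."""
    if operation not in ("add", "modify", "replace"):
        raise ValueError(f"unknown operation: {operation}")
    if name_mode not in (None, "use", "must_exist"):
        raise ValueError(f"unknown -n value: {name_mode}")
    argv = ["dsimport"]
    if destination:
        argv += ["-d", destination]          # container for parentless objects
    if name_mode:
        argv += ["-n", name_mode]            # -n always needs a value
    if threads is not None and threads >= 2:
        argv += ["-t", str(min(threads, 20))]  # values above 20 are treated as 20
    if user:
        argv += ["-u", user]                 # user handle, for example User-12
    if walk_only:
        argv.append("-w")                    # report only, no repository changes
    if quiet:
        argv.append("-q")
    argv += [operation, filename]
    return argv


print(" ".join(build_dsimport_argv("add", "User.xml", walk_only=True)))
```

Running the command once with walk_only=True and reviewing the report before the real run is one way to use the -w option safely.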
Operations for dsimport
add Add new objects in DocuShare for each object specified in the XML file. A new object is created in the repository for each object defined within the input file. Each object is given a new handle according to the standard rules of object creation. All references to the original handle within the input file are converted to use the new handle during the import process. An error response is
returned when a handle refers to an object that is not defined by a dsobject definition in the input file.
modify Modifies existing objects in DocuShare. Properties not given an explicit value are left unchanged with their current repository value. The use of repository handles in modify and replace is the sole exception to the object definition file’s self-contained requirement. For this exception, only the handle attribute is allowed to reference an existing repository object. No other references within the object definition file can be external to the file.
replace Replaces existing objects in a DocuShare database. For each object specified in the input file, the existing repository object having the same handle is replaced by the new input definition. Attempts to replace objects whose handles do not exist in the repository result in an error response. Replace differs from modify in that properties not given an explicit value are given their default value rather than left unchanged with their current repository value.
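The difference between modify and replace can be pictured with plain dictionaries. This is a conceptual model only, not how dsimport stores objects; the property names title and summary and the default values are hypothetical.

```python
def apply_modify(current, given):
    """modify: properties not given keep their current repository value."""
    result = dict(current)
    result.update(given)
    return result


def apply_replace(defaults, given):
    """replace: properties not given fall back to their default value."""
    result = dict(defaults)
    result.update(given)
    return result


current = {"title": "Old title", "summary": "Existing summary"}
defaults = {"title": "", "summary": ""}
given = {"title": "New title"}

print(apply_modify(current, given))    # summary is kept
print(apply_replace(defaults, given))  # summary is reset to its default
```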
Importing large volumes
Importing large volumes, in excess of 4000 objects or documents, may exceed the resources of the server and the capacity of the Notification Service. The Notification service is used to inform other DocuShare services of events that may require action; services such as Indexing and Subscriptions rely on the Notification service. The Notification service can be disabled to allow the large volume of objects to be imported.
To disable Notification service:
Caution: Users who subscribe to Subscription services will not receive notices when the
Notification Service is disabled during dsimport.
1. Open a command prompt window.
a. Change to the C:\Xerox\Docushare\bin directory.
b. Enter dsservice stop Notification to shut down the Notification service.
2. Complete the bulk importing of objects to DocuShare. When dsimport is completed, a re-index is
required. Search results will not be accurate until the server has completed re-indexing. See dsindex.
Enter dsindex index_all to re-index the server.
3. Enter dsservice start Notification to restart the Notification service.
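The procedure above can be written down as an ordered command plan. The sketch below only builds and prints the commands named in the steps; it does not run them, and the function name is hypothetical. The dsimport arguments shown are an example and should be replaced with those of the actual import.

```python
def large_import_commands(dsimport_args):
    """Return the commands for a large import, in the order given in the steps."""
    return [
        ["dsservice", "stop", "Notification"],   # step 1b
        ["dsimport", *dsimport_args],            # step 2: the bulk import
        ["dsindex", "index_all"],                # step 2: re-index afterwards
        ["dsservice", "start", "Notification"],  # step 3
    ]


# Print the plan so it can be reviewed or pasted into a command prompt
# opened in the DocuShare bin directory.
for argv in large_import_commands(["-d", "Collection-10", "add", "Document\\Document.xml"]):
    print(" ".join(argv))
```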
Examples using dsimport
dsimport -d Collection-10 add /tmp/Collection-47/Collection-47.xml
Imports a file into Collection-10.
Note: If an imported file does not immediately display in the Web UI, select Refresh from the
collection’s paging menu.
dsimport -n must_exist add /tmp/Collection-47/Collection-47.xml
Imports the same file, signaling an error if any imported user does not match an existing repository user.
dsimport add User.xml
Imports a set of new user accounts that have been defined external to DocuShare.
dsimport -n use -d Collection-12 -t 16 add Collection-10\Collection-10.xml
Imports the data into Collection-12 using up to 16 threads, reusing existing repository users whose user names match.
dsimport -d collection-A add Collection-B\Collection-B.xml
Imports all objects in Collection B into Collection A.
CSV/XML file import examples
dsimport -d Collection-10 add Document\Document.csv
Imports a CSV file.
General rules for all objects.
a. The field for the DocuShare host is optional.
b. If a field contains a comma, enclose the field in double quotes.
c. Handle is always a required field. It is used to specify the relationship between different
objects.
d. In some batch processing cases, if you do not want to specify the relationships between objects in dsimport, you
can use a constant handle value. For example, use Document-11 for all documents and
User-11 for all users.
e. Minimum required fields for a DocuShare Document, followed with an example.
Document handle, title, rendition_file
Document-11, MyTitle, "Hello, World.doc"
f. Minimum required fields for a DocuShare Collection, followed with an example.
Collection handle, title
Collection-11, "MyCollection-Hello, World"
g. Minimum required fields for a DocuShare User, followed with an example.
User handle, title, username, password, last_name
User-11, userTitle, myname, mypassword, userLastName
Note: If there is a required custom property in the schema without a default value assigned to it,
the property must be included in the import file.
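Rows that follow the quoting rule above can be generated with a standard CSV writer, which adds double quotes only to fields that contain a comma. This is a minimal sketch using Python's csv module; the helper name is hypothetical. Note two assumptions: the output has no space after each comma, unlike the examples above, and this article does not state whether dsimport expects a header line, so none is written here.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows to CSV text; fields containing commas are double-quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    # Document handle, title, rendition_file
    ["Document-11", "MyTitle", "Hello, World.doc"],
    ["Document-11", "Second title", "Notes.doc"],
]
print(to_csv(rows), end="")
```

Both rows reuse the constant handle Document-11, which follows rule d for batches where relationships between objects do not need to be expressed.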
Behavior nuances of dsimport
The dsimport command must analyze input data and convert it to valid repository objects. The combination of data, operations, and options creates a variety of behavioral conditions, as described below.
Adding, modifying, and replacing objects with required properties
When using the add and replace operations, every imported object must include a non-empty value for all
of its required properties. For most objects, the required property is displayname. When an optional property is omitted, it
is given the default value for that property.
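A simple pre-check can catch empty required properties before an add or replace is attempted. The sketch below is hypothetical: it treats each object as a dictionary of property values and uses displayname as the default required property, as mentioned above. The real set of required properties depends on the object type and the site schema.

```python
def missing_required(obj, required=("displayname",)):
    """Return the required property names that are absent or empty in `obj`."""
    missing = []
    for name in required:
        value = obj.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


print(missing_required({"displayname": "Quarterly report"}))
print(missing_required({"displayname": "   "}))
```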
The dsimport modify and replace operations also support changes to membership and containment.
For example:
1. Run dsexport Group-11.
2. Edit Group-11.xml to add User-14 to the member list of Group-11.
3. Execute dsimport modify Group-11\Group-11.xml. This adds User-14 as a member of
Group-11.
Users
User definitions are treated differently than other object types. User definitions are matched against the
current repository by user name rather than by their handle. In the -n use add condition, any imported
User whose user name matches the user name of an existing repository User is used solely as a
placeholder reference so that other object definitions in the imported file can reference that User object.
The repository User object is not altered. In the modify and replace conditions, the user name property is
sufficient to identify each User and a matching handle is unnecessary unless required for reference from
elsewhere in the input file.
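The user name matching behavior for the add operation can be summarized as a small decision function. This is an illustrative model of the rules described for -n, not dsimport's code; the function name and return values are hypothetical, and it assumes that with -n use an unmatched user is created as in the default add behavior.

```python
def resolve_imported_user(username, repository_usernames, name_mode=None):
    """Decide what add does with an imported user: 'create' or 'reuse'.

    name_mode is None (no -n option), 'use', or 'must_exist'.
    """
    exists = username in repository_usernames
    if name_mode is None:
        if exists:
            raise ValueError(f"user name already exists: {username}")
        return "create"
    if name_mode == "use":
        # Assumption: an unmatched user is added as a new user.
        return "reuse" if exists else "create"
    if name_mode == "must_exist":
        if not exists:
            raise ValueError(f"user name not found in repository: {username}")
        return "reuse"
    raise ValueError(f"unknown -n value: {name_mode}")


existing = {"jsmith", "admin"}
print(resolve_imported_user("jsmith", existing, "use"))
print(resolve_imported_user("newuser", existing))
```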