DSpace

From RSU WiKi

Information

DSpace is free, ready-to-use software: a widely used digital data management system that captures, stores, indexes, and redistributes documents. It is used effectively in many countries to build institutional and subject-based electronic archives and repositories of scholarly articles.

Installation on SLES

Installing prerequisite software

Oracle Java JDK 6

DSpace requires Oracle Java 6 (the standard SDK is fine; you don't need J2EE). DSpace does not currently support Java 7, because there is a known issue with Java 7 and Lucene/Solr, which DSpace uses for search and browse functionality. For more details, see this article on the Apache site: "WARNING: Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7", as well as Java bug report 7073868. Only Oracle's Java has been tested with each release and is known to work correctly; other flavors of Java may cause problems.

Apache Maven 2.2.x or higher (Java build tool)

DSpace 1.7.x required Maven 2.2.x, as it did not build properly with Maven 2.0.x or Maven 3.x. This was a known issue (see DS-788). DSpace 1.8.x resolved this issue, so DSpace now builds properly with Maven 2.2.x or above. Maven is necessary in the first stage of the build process to assemble the installation package for your DSpace instance. It gives you the flexibility to customize DSpace using the existing Maven projects found in the [dspace-source]/dspace/modules directory, or by adding your own Maven project to build the installation package for DSpace and apply any custom interface "overlay" changes.

Configuring a Proxy

You can configure a proxy to use for some or all of your HTTP requests in Maven 2.0. The username and password are only required if your proxy requires basic authentication. Later releases may support storing your passwords in a secured keystore; in the meantime, please ensure your settings.xml file (usually ${user.home}/.m2/settings.xml) is secured with permissions appropriate for your operating system.

Example:

<settings>
  .
  .
  <proxies>
   <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.somewhere.com</host>
      <port>8080</port>
      <username>proxyuser</username>
      <password>somepassword</password>
      <nonProxyHosts>www.google.com|*.somewhere.com</nonProxyHosts>
    </proxy>
  </proxies>
  .
  .
</settings>
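
One way to restrict the file on a Unix-like system is to make it readable and writable by its owner only. The sketch below assumes a POSIX shell; the function name secure_settings is made up for this example, and mode 600 is one reasonable choice rather than a Maven requirement.

```shell
# Hypothetical helper: restrict a Maven settings file to its owner.
secure_settings() {
    # $1 = path to settings.xml
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    chmod 600 "$1"   # owner may read and write; group and others get nothing
}
```

For the usual location this would be called as secure_settings "$HOME/.m2/settings.xml".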

Apache Ant 1.8 or later (Java build tool)

Apache Ant is still required for the second stage of the build process. It is used once the installation package has been constructed in [dspace-source]/dspace/target/dspace-<version>-build and still uses some of the familiar ant build targets found in the 1.4.x build process.

Relational Database (PostgreSQL or Oracle)

PostgreSQL 8.3 to 8.4

PostgreSQL can be downloaded from the following location: http://www.postgresql.org/ . It is highly recommended that you try to work with Postgres 8.4 or greater; however, 8.3 should still work. Unicode (specifically UTF-8) support must be enabled. This is enabled by default in 8.0+.

Once installed, you need to enable TCP/IP connections (DSpace uses JDBC). In postgresql.conf, uncomment the line starting:

listen_addresses = 'localhost'

Then tighten up security a bit by editing pg_hba.conf and adding this line:

host dspace dspace 127.0.0.1 255.255.255.255 md5

Then restart PostgreSQL.
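
Both settings can be checked from a shell before restarting PostgreSQL. This is only a sketch: the function names are invented here, the locations of postgresql.conf and pg_hba.conf vary by distribution, and the patterns assume the lines look like the ones quoted above.

```shell
# Hypothetical checks, not part of DSpace or PostgreSQL.

# Succeeds if listen_addresses is set on an uncommented line of the given file.
listen_addresses_enabled() {
    # $1 = path to postgresql.conf
    grep -Eq '^[[:space:]]*listen_addresses[[:space:]]*=' "$1"
}

# Succeeds if the given file already has a "host dspace dspace ..." rule.
has_dspace_rule() {
    # $1 = path to pg_hba.conf
    grep -Eq '^[[:space:]]*host[[:space:]]+dspace[[:space:]]+dspace[[:space:]]' "$1"
}
```

Each function returns a non-zero status when the setting is missing, so it can be combined with || to print a reminder.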

Oracle 10g or greater

Details on acquiring Oracle are available at the following location: http://www.oracle.com/database/.

You will need to create a database for DSpace. Make sure that the character set is one of the Unicode character sets. DSpace uses UTF-8 natively, and it is suggested that the Oracle database use the same character set. You will also need to create a user account for DSpace (e.g. dspace) and ensure that it has permissions to add and remove tables in the database. Refer to the Quick Installation for more details.

NOTE: If the database server is not on the same machine as DSpace, you must install the Oracle client on the DSpace server and point the tnsnames.ora and listener.ora files to the Oracle database server.

NOTE: DSpace uses sequences to generate unique object IDs — beware Oracle sequences, which are said to lose their values when doing a database export/import, say restoring from a backup. Be sure to run the script etc/update-sequences.sql after importing.

For people interested in switching from Postgres to Oracle, I know of no tools that would do this automatically. You will need to recreate the community, collection, and eperson structure in the Oracle system, and then use the item export and import tools to move your content over.

Servlet Engine (Apache Tomcat 5.5 or 6, Jetty, Caucho Resin or equivalent)

Apache Tomcat 5.5 or later. Tomcat can be downloaded from the following location: http://tomcat.apache.org. Note that DSpace will need to run as the same user as Tomcat, so you might want to install and run Tomcat as a user called 'dspace'. Set the environment variable TOMCAT_USER appropriately. You need to ensure that Tomcat has a) enough memory to run DSpace and b) uses UTF-8 as its default file encoding for international character support. So ensure in your startup scripts (etc) that the following environment variable is set:

JAVA_OPTS="-Xmx512M -Xms64M -Dfile.encoding=UTF-8"

Modifications in [tomcat]/conf/server.xml: You also need to alter Tomcat's default configuration to support searching and browsing of multi-byte UTF-8 correctly. Add the configuration option URIEncoding="UTF-8" to the <Connector> element in [tomcat]/conf/server.xml. For example, if you're using the default Tomcat config, it should read:

<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
<Connector port="8080"
           maxThreads="150"
           minSpareThreads="25"
           maxSpareThreads="75"
           enableLookups="false"
           redirectPort="8443"
           acceptCount="100"
           connectionTimeout="20000"
           disableUploadTimeout="true"
           URIEncoding="UTF-8"/>
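
A quick way to confirm the attribute is present is to search server.xml for a <Connector> element that carries it. The following is a rough sketch, not a real XML parser: connector_has_utf8 is a name invented for this example, and a commented-out connector would also match.

```shell
# Hypothetical check: does some <Connector ...> element set URIEncoding="UTF-8"?
connector_has_utf8() {
    # $1 = path to server.xml
    # Join the lines first, because the attributes of one element usually span several lines.
    tr '\n' ' ' < "$1" | grep -Eq '<Connector[^>]*URIEncoding="UTF-8"'
}
```

The function succeeds only when the attribute sits inside a <Connector> tag, so the same attribute on another element is not counted.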

You may change the port from 8080 by editing it in the file above, and by setting the variable CONNECTOR_PORT in server.xml.

Jetty or Caucho Resin

DSpace will also run on an equivalent servlet engine, such as Jetty (http://www.mortbay.org/jetty/index.html) or Caucho Resin (http://www.caucho.com/). Jetty and Resin are configured for correct handling of UTF-8 by default.

Perl (only required for [dspace]/bin/dspace-info.pl)

Installation Instructions

Overview of Install Options

With the advent of a new Apache Maven 2 based build architecture (first introduced in DSpace 1.5.x), you now have two options in how you may wish to install and manage your local installation of DSpace. If you've used DSpace 1.4.x, please recognize that the initial build procedure has changed to allow for more customization. You will find the later 'Ant based' stages of the installation procedure familiar. Maven is used to resolve the dependencies of DSpace online from the 'Maven Central Repository' server.

It is important to note that the strategies are identical in terms of the list of procedures required to complete the build process, the only difference being that the Source Release includes "more modules" that will be built given their presence in the distribution package.

  1. Binary Release (dspace-<version>-release.zip)
    1. This distribution will be adequate for most cases of running a DSpace instance. It is intended to be the quickest way to get DSpace installed and running while still allowing for customization of the themes and branding of your DSpace instance.
    2. This method allows you to customize DSpace configurations (in dspace.cfg) or user interfaces, using basic pre-built interface "overlays".
    3. It downloads "precompiled" libraries for the core dspace-api, supporting servlets, taglibraries, aspects and themes for dspace-xmlui, and other webservice/applications.
    4. This approach only exposes selected parts of the application for customization. All other modules are downloaded from the 'Maven Central Repository'. The directory structure for this release is the following:
      1. [dspace-source]
        1. dspace/ - DSpace 'build' and configuration module
  2. Source Release (dspace-<version>-src-release.zip)
    1. This method is recommended for those who wish to develop DSpace further or alter its underlying capabilities to a greater degree.
    2. It contains all dspace code for the core dspace-api, supporting servlets, taglibraries, aspects and themes for Manakin (dspace-xmlui), and other webservice/applications.
    3. Provides all the same capabilities as the binary release. The directory structure for this release is more detailed:
      1. [dspace-source]
        1. dspace/ - DSpace 'build' and configuration module
        2. dspace-api/ - Java API source module
        3. dspace-discovery - Discovery source module
        4. dspace-jspui/ - JSP-UI source module
        5. dspace-oai - OAI-PMH source module
        6. dspace-xmlui - XML-UI (Manakin) source module
        7. dspace-lni - Lightweight Network Interface source module
        8. dspace-stats - Statistics source module
        9. dspace-sword - SWORD (Simple Web-service Offering Repository Deposit) deposit service source module
        10. dspace-swordv2 - SWORDv2 source module
        11. dspace-sword-client - XMLUI client for SWORD
        12. pom.xml - DSpace Parent Project definition

Overview of DSpace Directories

Before beginning an installation, it is important to get a general understanding of the DSpace directories and the names by which they are generally referred. (Please attempt to use these below directory names when asking for help on the DSpace Mailing Lists, as it will help everyone better understand what directory you may be referring to.)

DSpace uses three separate directory trees. Although you don't need to know all the details of them in order to install DSpace, you do need to know they exist and also know how they're referred to in this document:

The installation directory, referred to as [dspace]. This is the location where DSpace is installed and runs from. It is the location defined in dspace.cfg as "dspace.dir", and it is where all the DSpace configuration files, command-line scripts, documentation and webapps will be installed.

The source directory, referred to as [dspace-source] . This is the location where the DSpace release distribution has been unzipped into. It usually has the name of the archive that you expanded such as dspace-<version>-release or dspace-<version>-src-release. Normally it is the directory where all of your "build" commands will be run.

The web deployment directory. This is the directory that contains your DSpace web application(s). In DSpace 1.5.x and above, this corresponds to [dspace]/webapps by default. However, if you are using Tomcat, you may decide to copy your DSpace web applications from [dspace]/webapps/ to [tomcat]/webapps/ (with [tomcat] being wherever you installed Tomcat, also known as $CATALINA_HOME).

For details on the contents of these separate directory trees, refer to directories.html. Note that the [dspace-source] and [dspace] directories are always separate!

Installation

This method gets you up and running with DSpace quickly and easily. It is identical in both the Default Release and Source Release distributions.

  • Create the DSpace user. This needs to be the same user that Tomcat (or Jetty etc.) will run as. e.g. as root run:

useradd -m dspace

  • Download the latest DSpace release. There are two versions available with each release of DSpace (dspace-1.x-release.xxx and dspace-1.x-src-release.xxx); you only need to choose one. If you want a copy of all underlying Java source code, you should download dspace-1.x-src-release.xxx. Within each version, you have a choice of compressed file format. Choose the one that best fits your environment.

  • Unpack the DSpace software. After downloading the software, choose one of the following methods to unpack it, based on the compressed file format:
    • Zip file. If you downloaded dspace-1.8-release.zip, do the following:

unzip dspace-1.8-release.zip

    • .gz file. If you downloaded dspace-1.8-release.tar.gz, do the following:

gunzip -c dspace-1.8-release.tar.gz | tar -xf -

    • .bz2 file. If you downloaded dspace-1.8-release.tar.bz2, do the following:

bunzip2 -c dspace-1.8-release.tar.bz2 | tar -xf -
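
The choice of command depends only on the file suffix, which can be expressed as a small shell case statement. The function name unpack_command is invented for this sketch; it only prints the command that would be run, so it can be tried without an archive present.

```shell
# Hypothetical helper: print the unpack command that matches an archive name.
unpack_command() {
    # $1 = archive file name
    case "$1" in
        *.zip)     printf 'unzip %s\n' "$1" ;;
        *.tar.gz)  printf 'gunzip -c %s | tar -xf -\n' "$1" ;;
        *.tar.bz2) printf 'bunzip2 -c %s | tar -xf -\n' "$1" ;;
        *)         printf 'unsupported archive: %s\n' "$1" >&2; return 1 ;;
    esac
}
```

For example, unpack_command dspace-1.8-release.tar.gz prints the gunzip pipeline for that file.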

For ease of reference, we will refer to the location of this unzipped version of the DSpace release as [dspace-source] in the remainder of these instructions. After unpacking the file, you may wish to change the ownership of the unpacked directory (for example dspace-1.8-release) to the 'dspace' user. (You may also need to change the group.)

  • Database Setup

Also see notes above

    • PostgreSQL:

A PostgreSQL JDBC driver is configured as part of the default DSpace build. You no longer need to copy any PostgreSQL jars to get PostgreSQL installed.

      • Create a dspace database user. This is entirely separate from the dspace operating-system user created above.
createuser -U postgres -d -A -P dspace

You will be prompted for the password of the PostgreSQL superuser (postgres). Then you'll be prompted (twice) for a password for the new dspace user.

      • Create a dspace database, owned by the dspace PostgreSQL user (you are still logged in as 'root'):
createdb -U dspace -E UNICODE dspace

You will be prompted for the password of the DSpace database user. (This isn't the same as the dspace user's UNIX password.)

    • Oracle:

Setting up DSpace to use Oracle is a bit different now. You will still need to get a copy of the Oracle JDBC driver, but instead of copying it into the lib directory you will need to install it into your local Maven repository. (You'll need to download it first from this location: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html.) Run the following command (all on one line):

mvn install:install-file -Dfile=ojdbc6.jar -DgroupId=com.oracle -DartifactId=ojdbc6 -Dversion=11.2.0.3 -Dpackaging=jar -DgeneratePom=true

You need to compile DSpace with an Oracle driver (ojdbc6.jar) corresponding to your Oracle version. Update the version in [dspace-source]/pom.xml, e.g.:

<dependency>
  <groupId>com.oracle</groupId>
  <artifactId>ojdbc6</artifactId>
  <version>11.2.0.3</version>
</dependency>

Create a database for DSpace. Make sure that the character set is one of the Unicode character sets. DSpace uses UTF-8 natively, and it is required that the Oracle database use the same character set. Create a user account for DSpace (e.g. dspace) and ensure that it has permissions to add and remove tables in the database. Edit the [dspace-source]/dspace/config/dspace.cfg database settings:

db.name = oracle
db.driver = oracle.jdbc.OracleDriver
db.url = jdbc:oracle:thin:@host:port/SID

Where SID is the SID of your database defined in tnsnames.ora; the default Oracle port is 1521. Alternatively, you can use a full SID definition, e.g.:

db.url = jdbc:oracle:thin:@(description=(address_list=(address=(protocol=TCP)(host=localhost)(port=1521)))(connect_data=(service_name=DSPACE)))

  • Also set the username and password of the database you created above:

db.username = your_oracle_username
db.password = your_oracle_password

  • Initial Configuration: Edit [dspace-source]/dspace/config/dspace.cfg, in particular you'll need to set these properties:
    • dspace.dir - must be set to the [dspace] (installation) directory.
    • dspace.url - complete URL of this server's DSpace home page.
    • dspace.hostname - fully-qualified domain name of web server.
    • dspace.name - "Proper" name of your server, e.g. "My Digital Library".
    • db.password - the database password you entered in the previous step.
    • mail.server - fully-qualified domain name of your outgoing mail server.
    • mail.from.address - the "From:" address to put on email sent by DSpace.
    • feedback.recipient - mailbox for feedback mail.
    • mail.admin - mailbox for DSpace site administrator.
    • alert.recipient - mailbox for server errors/alerts (not essential but very useful!)
    • registration.notify - mailbox for emails when new users register (optional)

You can interpolate the value of one configuration variable in the value of another one. For example, to set feedback.recipient to the same value as mail.admin, the line would look like:

feedback.recipient = ${mail.admin}

Refer to the General Configuration section for details and examples of the above.
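
To see what a property resolves to before building, the interpolation rule can be mimicked in a few lines of shell. This is an illustration only: cfg_get and cfg_resolve are names invented here, the sketch expands a single ${...} reference one level deep, and it does not reproduce every detail of how DSpace itself reads dspace.cfg.

```shell
# Hypothetical helpers for reading a properties-style file such as dspace.cfg.

# Print the value assigned to KEY in FILE (the last assignment wins).
cfg_get() {
    # $1 = file, $2 = key
    awk -v key="$2" '
        /^[ \t]*#/ { next }                     # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                   # not a key = value line
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim whitespace around the value
            if (k == key) val = v
        }
        END { print val }
    ' "$1"
}

# Print the value of KEY with the first ${other.key} reference expanded once.
cfg_resolve() {
    # $1 = file, $2 = key
    open='${'
    close='}'
    value=$(cfg_get "$1" "$2")
    case $value in
        *"$open"*"$close"*)
            ref=${value#*"$open"}       # drop everything up to and including "${"
            ref=${ref%%"$close"*}       # keep the name before the closing "}"
            prefix=${value%%"$open"*}   # text before the reference
            suffix=${value#*"$close"}   # text after the reference
            value=$prefix$(cfg_get "$1" "$ref")$suffix
            ;;
    esac
    printf '%s\n' "$value"
}
```

With the feedback.recipient line from the example, cfg_get returns the literal ${mail.admin}, while cfg_resolve returns whatever mail.admin is set to in the same file.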

  • DSpace Directory: Create the directory for the DSpace installation (i.e. [dspace]). As root (or a user with appropriate permissions), run:

mkdir [dspace]
chown dspace [dspace]

(Assuming the dspace UNIX username.)

  • Installation Package: As the dspace UNIX user, generate the DSpace installation package.

cd [dspace-source]/dspace/
mvn package

Defaults to PostgreSQL settings: without any extra arguments, the DSpace installation package is initialized for PostgreSQL. If you want to use Oracle instead, you should build the DSpace installation package as follows:

mvn -Ddb.name=oracle package

  • Build DSpace and Initialize Database: As the dspace UNIX user, initialize the DSpace database and install DSpace to [dspace]:

cd [dspace-source]/dspace/target/dspace-[version]-build
ant fresh_install

To see a complete list of build targets, run:

ant help

The most likely thing to go wrong here is the database connection. See the Common Problems Section.

  • Deploy Web Applications: You have two choices or techniques for having Tomcat/Jetty/Resin serve up your web applications:
    • Technique A. Simple and complete. You copy some (or all) of the DSpace web application(s) you wish to use from the [dspace]/webapps directory to the appropriate directory in your Tomcat/Jetty/Resin installation. For example, to copy all the web applications to Tomcat:

cp -R [dspace]/webapps/* [tomcat]/webapps

To copy only the jspui web application to Tomcat:

cp -R [dspace]/webapps/jspui [tomcat]/webapps

    • Technique B. Tell your Tomcat/Jetty/Resin installation where to find your DSpace web application(s). As an example, in the <Host> section of your [tomcat]/conf/server.xml you could add lines similar to the following (but replace [dspace] with your installation location):
<!-- Define the default virtual host
    Note:  XML Schema validation will not work with Xerces 2.2.
    -->
    <Host name="localhost"  appBase="[dspace]/webapps"
    ....

  • Administrator Account: Create an initial administrator account:

[dspace]/bin/dspace create-administrator

  • Initial Startup! Now the moment of truth! Start up (or restart) Tomcat/Jetty/Resin. Visit the base URL(s) of your server, depending on which DSpace web applications you want to use. You should see the DSpace home page. Congratulations! Base URLs of DSpace Web Applications:
    • JSP User Interface - (e.g.) http://dspace.myu.edu:8080/jspui
    • XML User Interface (aka. Manakin) - (e.g.) http://dspace.myu.edu:8080/xmlui
    • OAI-PMH Interface - (e.g.) http://dspace.myu.edu:8080/oai/request?verb=Identify (Should return an XML-based response)

In order to set up some communities and collections, you'll need to log in as your DSpace Administrator (which you created with create-administrator above) and access the administration UI in either the JSP or XML user interface.

Installation on Ubuntu Server

Install the server stack of Tomcat (web server) and PostgreSQL (database)

sudo apt-get install tasksel
sudo tasksel

Select the following packages

[*] LAMP server
[*] PostgreSQL database
[*] Tomcat Java server

Install the Compile / Build tools

sudo apt-get install ant maven2

Configure the Prerequisite Software

Create the database user (dspace)

sudo su postgres
createuser -U postgres -d -A -P dspace
exit

Allow the database user (dspace) to connect to the database

sudo vi /etc/postgresql/8.4/main/pg_hba.conf
# Add this line to the configuration: local all dspace md5

sudo service postgresql restart

Create the dspace database

createdb -U dspace -E UNICODE dspace

Configure Tomcat to know about the DSpace webapps.

sudo vi /etc/tomcat6/server.xml
# Insert the following chunk of text just above the closing </Host>

<Context path="/xmlui" docBase="/dspace/webapps/xmlui" allowLinking="true"/>
<Context path="/sword" docBase="/dspace/webapps/sword" allowLinking="true"/>
<Context path="/oai"   docBase="/dspace/webapps/oai"   allowLinking="true"/>
<Context path="/jspui" docBase="/dspace/webapps/jspui" allowLinking="true"/>
<Context path="/lni"   docBase="/dspace/webapps/lni"   allowLinking="true"/>
<Context path="/solr"  docBase="/dspace/webapps/solr"  allowLinking="true"/>

Download and Install DSpace

Create the [dspace] directory. The [dspace] directory is where the running dspace code will reside.

sudo mkdir /dspace

Download the Source Release

The source release allows you to customize every aspect of DSpace. This step downloads the compressed archive from SourceForge, and unpacks it in your current directory. The dspace-1.x.x-src-release directory is typically referred to as [dspace-source].

wget http://sourceforge.net/projects/dspace/files/DSpace%20Stable/1.7.2/dspace-1.7.2-src-release.tar.bz2
tar -xvjf dspace-1.7.2-src-release.tar.bz2

Installing Sun JDK 1.6.0 on Ubuntu 12.04 Server

Download Java

Oracle.com either hates wget or is storing a cookie saying "yes, this guy accepted the T's and C's."

Actually download Java

Easiest thing to do was just to download the .bin file onto my desktop and use SCP (WinSCP if you need) to move it to my server.

Install Java

I decided to install Java in /usr/local/java.

sudo mkdir /usr/local/java
sudo cp jdk-6u32-linux-x64.bin /usr/local/java
cd /usr/local/java
sudo ./jdk-6u32-linux-x64.bin
Unpacking...
Checksumming...
Extracting...
UnZipSFX 5.50 of 17 February 2002, by Info-ZIP (Zip-Bugs@lists.wku.edu).
creating: jdk1.6.0_32/

and so forth.

Set Environmental Variables

I don't hate myself, so I set a couple of environmental variables, notably PATH, JAVA_HOME, and JDK_HOME. The first one allows you to just type

javac foo.java
java foo

as opposed to writing it all out, and the latter two are requested by Hadoop. I set these in my /etc/profile file. Add the following to the end of your /etc/profile:

export JAVA_HOME=/usr/local/java/jdk1.6.0_32
export JDK_HOME=$JAVA_HOME
export PATH=$PATH:/usr/local/java/jdk1.6.0_32/bin

Finally, run

source /etc/profile

There. Now Java is installed for all intents and purposes.
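
A typo in one of these paths is easy to miss, so it can help to confirm that the directory really contains the JDK tools. A minimal sketch, assuming a POSIX shell; java_home_ok is a made-up name, and the check only looks for executable java and javac files.

```shell
# Hypothetical check: does the given directory look like a JDK home?
java_home_ok() {
    # $1 = candidate JAVA_HOME
    [ -n "$1" ] && [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}
```

After sourcing /etc/profile, java_home_ok "$JAVA_HOME" should succeed; if it fails, recheck the exported paths.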

Compile and Build DSpace

The source release that has been obtained is human readable source code, and must be compiled to machine code for the server to run it. "mvn package" compiles the source code, and "ant" will do all the work necessary to initialize the database with the DSpace schema, and copy all of the compiled machine code to a location where the web server can serve it. ant fresh_install will populate the dspace database and [dspace] directory with new information. This will overwrite any existing installation of DSpace that you may have. For upgrades the better command to use would be ant update, as it doesn't alter the database or modify your assetstore.

cd dspace-1.7.2-src-release
mvn -U package
cd dspace/target/dspace-1.7.2-build.dir
sudo ant fresh_install

Fix Tomcat permissions, and restart the Tomcat server

This guide follows the convention where the tomcat user will own all of the files in [dspace], so we have to change the owner of the files to tomcat6. Restarting tomcat will deploy the dspace webapps that are now ready to be viewed.

sudo chown tomcat6:tomcat6 /dspace -R
sudo service tomcat6 restart

LDAP Authentication

Configuration file: [dspace]/config/modules/authentication.cfg

Change the value of the line

plugin.sequence.org.dspace.authenticate.AuthenticationMethod

to the following:

plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.LDAPAuthentication


Configuration file: [dspace]/config/modules/authentication-LDAP.cfg

Change the value of the field

enable

to the following:

enable = true

Search Configuration

To make item search cover fields beyond those set by default, do the following:

  1. In dspace.cfg, find the section "Fields to Index for Search"
  2. Append a new index with the next sequential number, for example for the "Description" field:
webui.browse.index.6 = series:metadata:dc.description.*:text
  3. Save the file
  4. Stop Tomcat
  5. Run the command [dspace]/bin/dspace index-init
  6. Start Tomcat
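
Finding the next sequential number by eye is error-prone in a long dspace.cfg, so it can be computed. The helper below is a sketch with an invented name (next_index); it assumes the properties start at the beginning of a line and are numbered as in the example above.

```shell
# Hypothetical helper: print the next free number for a numbered property.
next_index() {
    # $1 = path to dspace.cfg, $2 = property prefix such as webui.browse.index
    awk -v prefix="$2." '
        index($0, prefix) == 1 {                  # the line starts with the prefix
            rest = substr($0, length(prefix) + 1)
            n = rest + 0                          # leading digits of "6 = ..." become 6
            if (n > max) max = n
        }
        END { print max + 1 }
    ' "$1"
}
```

Commented-out lines are ignored because they start with # rather than with the prefix; when no matching property exists, the function prints 1.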

Converting MARC Records to Dublin Core

https://docs.google.com/viewer?a=v&q=cache:kPzQrLoET-AJ:digital.library.adelaide.edu.au/dspace/bitstream/2440/14784/1/Importing%2520MARC%2520data%2520into%2520DSpace.doc+mark+to+dublin+core+conversion&hl=ru&gl=ru&pid=bl&srcid=ADGEESiTTLnNTJnCQY1kENarACCHzOPEVPbUn8HYzVXW4pJJydlptlESvH-NZa4lHopUf25V_H_EcqkD2_0qbOMIgD_a_ioWlSH4kWmCwv5bJEuzOj7JkwUQJU48uAseSxzpnoutGHqV&sig=AHIEtbRUAhKkg0677-3xp_mPZ9oxV94dmQ

Several scripts are used to convert a MARC file to Dublin Core. They require the following CPAN modules: MARC::File::USMARC, which provides the basic operations for converting MARC into an internal data structure, and MARC::Crosswalk::DublinCore, which uses that structure to convert the data to Dublin Core.

Before importing, several problems arising from DSpace's requirements must be addressed:

  1. the element and qualifier attributes must be lowercase;
  2. the data may contain special characters (&<>), which must be escaped;
  3. the author must use element="contributor" and qualifier="author";
  4. the qualifier "isPartOf" must become "ispartofseries";
  5. some elements supply a scheme rather than a qualifier; DSpace does not recognize the scheme attribute;
  6. the type element is technically correct but uninformative and should be replaced;
  7. the format element is used internally by DSpace to describe files and should be cleared;
  8. date elements are repeated, because MARC::Crosswalk::DublinCore extracts a date from every field and subfield it can.
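
Point 2 is the easiest to get wrong, because the ampersand has to be replaced before the angle brackets; otherwise the ampersands introduced by &lt; and &gt; would be escaped a second time. A small shell illustration of the ordering, using only sed (the function name xml_escape is invented here):

```shell
# Hypothetical filter: escape XML-reserved characters read from standard input.
xml_escape() {
    # The ampersand rule must come first so that the entities added by the
    # later rules are not escaped again.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

It is used as a pipeline stage, for example printf '%s\n' "$title" | xml_escape, where title is whatever text is about to be written into a <dcvalue> element.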

First script:

#!/usr/bin/perl -w

use strict;
use MARC::Record;
use MARC::Crosswalk::DublinCore;
use MARC::File::USMARC;

$/ = chr(29); # MARC record separator

print qq|<collection>\n|;

while (my $blob = <>) { # suck in one MARC record at a time

    print qq|<dublin_core>\n|;

    # convert the MARC to DC
    my $marc = MARC::Record->new_from_usmarc( $blob );
    my $crosswalk = MARC::Crosswalk::DublinCore->new( qualified => 1 );
    my $dc = $crosswalk->as_dublincore( $marc );

    # output the DC as XML
    for ( $dc->elements ) {

        # attribute values must be lower case; the content is left as it is
        my $element = lc( $_->name );
        my $qualifier = lc( $_->qualifier || '' );
        my $scheme = lc( $_->scheme || '' );
        my $content = $_->content;

        # escape reserved characters (ampersand first)
        $content =~ s/&/&amp;/gs;
        $content =~ s/</&lt;/gs;
        $content =~ s/>/&gt;/gs;

        # munge attributes for DSpace compatibility
        if ($element eq 'creator') {
            $element = 'contributor';
            $qualifier = 'author';
        }
        if ($element eq 'format') {
            $element = 'description';
            $qualifier = '';
        }
        if ($element eq 'language') {
            if ($scheme eq 'iso 639-2') {
                $qualifier = 'iso';
                $scheme = '';
            } else {
                $element = 'description';
                $qualifier = '';
            }
        }
        if ($qualifier eq 'ispartof') {
            $qualifier = 'ispartofseries';
        }

        printf qq| <dcvalue element="%s"|, $element;
        if ($qualifier) {
            printf qq| qualifier="%s"|, $qualifier;
        } elsif ($scheme) {
            # output scheme as qualifier, but never emit the attribute twice
            printf qq| qualifier="%s"|, $scheme;
        }
        printf qq| language="ru">%s</dcvalue>\n|, $content;
    }

    print qq|</dublin_core>\n|;

}

print qq|</collection>\n|;

exit;

Run the script as follows:

> ./marc2dc.pl marc.bib > collection.xml

Building the import structure

Having converted our metadata into Dublin Core, I then needed to use this to build the required directory structure. This is accomplished with a second Perl script, build.pl, which needs to be customised for each collection to be imported. The basic idea is:

  1. extract the Dublin Core record from the XML file;
  2. create the subdirectories;
  3. derive the document file name from the identifier;
  4. create the dublin_core.xml file;
  5. create the contents file;
  6. create symbolic links to the document files.

Second script:

#!/usr/bin/perl -w

use strict;

$/ = "</dublin_core>\n"; # record separator

my $what = 100001; # dummy id for when there's no file

while (<>) {

    # discard the top and bottom tags
    s/<collection>\n//;
    s/<\/collection>\n//;

    # skip the leftover chunk after the last record
    next unless /<dublin_core>/;

    # extract the file path from the identifier
    # use the file name as an id
    # note that identifier element is discarded!
    # (any value of the language attribute is accepted)
    my ($path, $id);
    if (s!<dcvalue element="identifier" qualifier="uri" language="[^"]*">http://.*/theses/(.*?)/([^/]+)\.pdf</dcvalue>\n!!s) {
        $path = $1;
        $id = $2;
    } else {
        $path = '';
        $id = $what++;
    }

    # let the operator know where we're up to
    print "$path/$id\n";

    # create the item directory
    mkdir "import/$id", 0755;

    # create the dublin_core.xml file
    open my $dc, '>', "import/$id/dublin_core.xml"
        or die "Cannot open dublin core for $id, $!\n";
    print $dc $_;
    close $dc;

    # assuming we have a file ...
    if ($path) {

        # ... create the contents file ...
        open my $out, '>', "import/$id/contents"
            or die "Cannot open contents for $id, $!\n";
        print $out "$id.pdf";
        close $out;

        # ... and create a symbolic link to the actual file
        symlink "/scratch/dspace/import/theses/$path/$id.pdf", "import/$id/$id.pdf";

    }

}

__END__

Run:

> mkdir import
> ./build.pl collection.xml

Importing Dublin Core metadata into DSpace

Metadata prepared for import must have the following structure:

archive_directory/
   item_000/
       dublin_core.xml         -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
       metadata_[prefix].xml   -- metadata in another schema, the prefix is the name of the schema as registered with the metadata registry
       contents                -- text file containing one line per filename
       file_1.doc              -- files to be added as bitstreams to the item
       file_2.pdf
   item_001/
       dublin_core.xml
       contents
       file_1.png
       ...
Before running the import, check that the resulting files conform to the Dublin Core format, since errors can occur during import. This applies especially to the values of the qualifier attribute of the <dcvalue> tag. The dublin_core.xml or metadata_[prefix].xml files must have the following format, where each metadata element has its own entry in a <dcvalue> tag. The <dcvalue> tag has three attributes:

  1. element - the Dublin Core element
  2. qualifier - the element's qualifier
  3. language - (optional) ISO language code for the element
<dublin_core>
   <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
   <dcvalue element="date" qualifier="issued">1990</dcvalue>
   <dcvalue element="title" qualifier="alternate" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>

Please compare the element="element_value" qualifier="qualifier_value" pairs with the values of these pairs in the DSpace metadata registry.

Adding items to a collection

To add items to a collection, you must supply the following information:

  1. eperson
  2. Collection ID (either Handle (e.g. 123456789/14) or Database ID (e.g. 2))
  3. The source directory where the items are located
  4. Mapfile. You do not have one yet, so you need to specify where it will be created (e.g. /Import/Col_14/mapfile)

On the command line, run:

[dspace]/bin/dspace import --add --eperson=joe@user.com --collection=CollectionID --source=items_dir --mapfile=mapfile

or, using the short form:

[dspace]/bin/dspace import -a -e joe@user.com -c CollectionID -s items_dir -m mapfile

The command loops over the items in the source directory, imports them, and generates a map file in which the structure of the items is saved. Keep this file, because it can later be used to move or delete the imported items.
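
The map file is plain text. Assuming each line holds an item directory name followed by the handle assigned to it, separated by a space (an assumption about the file layout, so check a generated file first), the handle of an imported item can be looked up with a short shell function. The name handle_for_item and the sample handle in the comment are invented for this sketch.

```shell
# Hypothetical lookup in a map file with lines like "item_000 123456789/15".
handle_for_item() {
    # $1 = mapfile, $2 = item directory name
    awk -v dir="$2" '
        $1 == dir { print $2; found = 1 }
        END { exit !found }      # non-zero exit status when the item is absent
    ' "$1"
}
```

The non-zero exit status makes it easy to detect items that were never imported.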


Usage notes

When attaching external files to an item, make sure the file is not empty; otherwise DSpace will not upload it.

To make a new metadata element appear when creating a new item (for example, in a drop-down list), do the following:

  • Open the file [dspace]/config/input-forms.xml
  • Find the <form-value-pairs> element in input-forms.xml.
  • Check the <value-pairs> for the Dublin Core elements you want to add.
  • Add or modify a <pair> element, where <displayed-value> contains the name shown on screen and <stored-value> contains the name of this element in DSpace:
<pair>
<displayed-value>Gov'tDoc#</displayed-value>
<stored-value>govdoc</stored-value>
</pair>
  • Restart Tomcat

Russian localization of the interface

To quickly translate the xmlui and jspui interfaces into Russian, open the file [DSpace_dir]\config\dspace.cfg, set the "xmlui.supported.locales" parameter to "ru, en" and the "default.language" parameter to "ru_RU". Also set the "default.locale" parameter to "ru".


After the Tomcat service is restarted, JSPUI will be sprinkled with fragments of Ukrainian, and XMLUI will not be translated at all.

To fix this, create the following files:

[TOMCAT-dir]\webapps\jspui\WEB-INF\classes\Messages_ru.properties
[TOMCAT-dir]\webapps\xmlui\i18n\messages_ru.xml

The contents of the files are available at this link

Translating form contents

To translate the contents of various forms, such as the item submission form, translate into Russian the words enclosed in <displayed-value></displayed-value> tags in the file [dspace_dir]/config/input-forms.xml

To change or translate the text on DSpace pages, edit the file

[dspace_dir]/webapps/xmlui/i18n/messages_ru.xml

Changes take effect after the file is saved and the page being changed is reloaded.

Item counter

To enable the item counter, do the following in dspace.cfg:

  • Set the following parameters to true
webui.strengths.show = true
webui.strengths.cache = true
  • Run
[dspace]/bin/dspace itemcounter

In this case the counter has to be updated manually. If it should update automatically, leave the second parameter at its default value.

Error java.io.IOException: No such file or directory

If the error java.io.IOException: No such file or directory occurs when adding an item to a collection, there is a permissions problem. The directories [dspace]/upload and [dspace]/assetstore need write permission for the Tomcat user:

chown -R tomcat /dspace/upload   
chown -R tomcat /dspace/assetstore 
