ReadDB is client-server software for storing and accessing mapped short reads.

Files

Running ReadDB

The ReadDB jar file contains both the server code and the client code. The following instructions describe how to run the server and the client. Because the jar file contains several main classes (one for the server and others for the client programs), you cannot run it with java -jar. Instead, add the jar file to your classpath or specify it on the command line with java -cp readdb.jar. You should also put the Picard SAMtools jar file on your classpath, since you'll need it to convert from SAM/BAM to ReadDB's format. If you've downloaded both files, you can add them to your classpath by running
export CLASSPATH=${HOME}/readdb-5-25-11.jar:${HOME}/sam-1.38.jar
(assuming you put the files in your home directory and used the default names).

Server Setup

First, create a directory for ReadDB that contains the following files:

users.txt

This file lists the usernames and passwords that can access the server, one line per user, in the format
username:password
The users.txt file is re-read each time a user attempts to authenticate, so you can modify it while the server is running.
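To make the users.txt format concrete, here is a minimal Java sketch of how such a file could be parsed and checked. The class UsersFile and its methods are hypothetical illustrations, not ReadDB's own code. The sketch assumes each line is split at its first colon and that surrounding whitespace is not significant.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ReadDB: parses users.txt lines of the form username:password.
final class UsersFile {
    private UsersFile() {}

    // Builds a username -> password map; blank lines are skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> users = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim(); // assumption: surrounding whitespace is not significant
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected username:password, got: " + raw);
            }
            // Split at the first colon only (assumption), so a password may itself contain ':'.
            users.put(line.substring(0, colon), line.substring(colon + 1));
        }
        return users;
    }

    // Re-reads the file on every call, mirroring how ReadDB re-reads users.txt on each login attempt.
    static boolean authenticate(Path usersTxt, String user, String password) throws IOException {
        Map<String, String> users = parse(Files.readAllLines(usersTxt, StandardCharsets.UTF_8));
        return password.equals(users.get(user)); // false for unknown users (get returns null)
    }
}
```

Re-reading on every call is what lets edits to the file take effect without a restart, at the cost of one small file read per login.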

groups.txt

ReadDB lets you create groups of users and assign rights to those groups rather than to individuals. This file contains one line per group, in the format
groupname: userone usertwo userthree
The special group "admin" is for users who can shut down the server, add users, and add users to groups. The groups file is read at startup, but there are API functions to add users to groups while the server is running.
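The groups.txt format can be read the same way. This Java sketch is a hypothetical illustration (GroupsFile is not a ReadDB class); it assumes members are separated by whitespace and that a group listed on two lines simply has its members merged.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of ReadDB: parses groups.txt lines such as
// "groupname: userone usertwo userthree".
final class GroupsFile {
    private GroupsFile() {}

    // Returns group name -> members. A group listed twice has its members merged (assumption).
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected groupname: members, got: " + raw);
            }
            String group = line.substring(0, colon).trim();
            Set<String> members = groups.computeIfAbsent(group, g -> new LinkedHashSet<>());
            for (String user : line.substring(colon + 1).trim().split("\\s+")) {
                if (!user.isEmpty()) { // split() of an empty member list yields one empty string
                    members.add(user);
                }
            }
        }
        return groups;
    }

    // Members of the special "admin" group may shut down the server and manage users and groups.
    static boolean isAdmin(Map<String, Set<String>> groups, String user) {
        return groups.getOrDefault("admin", Set.of()).contains(user);
    }
}
```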

defaultACL.txt

This is the default, initial ACL for new alignments. It looks like
read: userone usertwo public
write: userone usertwo
admin: userone usertwo
This file is re-read whenever an alignment is created and is used to seed the ACL for the new alignment. The admin line lists the users and groups who can change the ACL for that alignment.
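Because ACL entries can name either users or groups, a permission check has to look through group membership. The Java sketch below shows one plausible way such a check could work; AclSketch is hypothetical, not ReadDB's implementation. It treats public as an ordinary group name, since its exact meaning in ReadDB is not described here.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not ReadDB's implementation: evaluates an ACL seeded from
// defaultACL.txt lines such as "read: userone usertwo public".
final class AclSketch {
    private AclSketch() {}

    // Parses "right: principal principal ..." lines into right -> principals.
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> acl = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // this sketch skips blank or malformed lines
            }
            Set<String> principals =
                    acl.computeIfAbsent(line.substring(0, colon).trim(), r -> new LinkedHashSet<>());
            for (String p : line.substring(colon + 1).trim().split("\\s+")) {
                if (!p.isEmpty()) {
                    principals.add(p);
                }
            }
        }
        return acl;
    }

    // A user holds a right if listed directly, or if a group listed under that right contains the user.
    static boolean allowed(Map<String, Set<String>> acl, Map<String, Set<String>> groups,
                           String user, String right) {
        for (String principal : acl.getOrDefault(right, Set.of())) {
            if (principal.equals(user) || groups.getOrDefault(principal, Set.of()).contains(user)) {
                return true;
            }
        }
        return false;
    }
}
```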

Starting the Server

Once users.txt, groups.txt, and defaultACL.txt are in place (e.g., in "datadir"), you can start the server with
java -Xmx1G edu.mit.csail.cgs.projects.readdb.Server -M 400 -d datadir -p 52000 -C 400
The server logs to STDERR. I haven't done extensive testing to correlate the Java heap size with the number of cached files; a 3GB heap has been adequate for our usage with 400 files, so you may need a larger -Xmx than the 1G shown here. Don't be alarmed if top or other tools report high memory usage: ReadDB uses mmap to access data files, so the full size of each mapped file is counted in the process's virtual size even when the data isn't in RAM.
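Since the server expects all three configuration files in its data directory, a quick pre-flight check can catch a missing file before startup. This is a hypothetical Java sketch (DataDirCheck is not part of ReadDB) that only tests whether the files exist, not whether their contents are valid.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check, not part of ReadDB: reports which required
// configuration files are missing from a ReadDB data directory.
final class DataDirCheck {
    static final List<String> REQUIRED = List.of("users.txt", "groups.txt", "defaultACL.txt");

    private DataDirCheck() {}

    static List<String> missingFiles(Path dataDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(dataDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "datadir");
        List<String> missing = missingFiles(dir);
        if (missing.isEmpty()) {
            System.out.println("OK: " + dir + " has all required files");
        } else {
            System.err.println("Missing in " + dir + ": " + missing);
            System.exit(1);
        }
    }
}
```

With Java 11 or later it can be run directly from source, e.g. java DataDirCheck.java datadir, before launching the server.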

Client Setup

The client software will look for a ~/.readdb_passwd file that looks like
username=userone
passwd=useronepassword
hostname=readdb.csail.mit.edu
port=52000
Be sure to change the hostname and port to match your server.
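The ReadDB client reads this file itself; the Java sketch below only illustrates the key=value format. ReadDBPasswdFile is a hypothetical class. It relies on java.util.Properties, which also treats a backslash as an escape character and lines starting with # as comments, so a password containing a backslash would need care with this approach.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical reader for ~/.readdb_passwd, for illustration only.
final class ReadDBPasswdFile {
    final String username;
    final String passwd;
    final String hostname;
    final int port;

    private ReadDBPasswdFile(String username, String passwd, String hostname, int port) {
        this.username = username;
        this.passwd = passwd;
        this.hostname = hostname;
        this.port = port;
    }

    // Parses key=value lines with java.util.Properties.
    static ReadDBPasswdFile parse(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new ReadDBPasswdFile(
                require(p, "username"),
                require(p, "passwd"),
                require(p, "hostname"),
                Integer.parseInt(require(p, "port")));
    }

    // Loads ~/.readdb_passwd from the current user's home directory.
    static ReadDBPasswdFile load() throws IOException {
        Path file = Paths.get(System.getProperty("user.home"), ".readdb_passwd");
        try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return parse(r);
        }
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing key in .readdb_passwd: " + key);
        }
        return value.trim(); // drop trailing whitespace, which Properties keeps
    }
}
```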

Loading Data

You can load data by passing it on STDIN to
java edu.mit.csail.cgs.projects.readdb.ImportHits --align alignmentname
Lines for unpaired reads must be tab-delimited with the following fields:
  1. chromosome (integer)
  2. position
  3. strand
  4. length
  5. weight
Lines for paired reads must be tab-delimited with the following fields:
  1. chromosome for left read
  2. pos for left read
  3. strand for left read
  4. length for left read
  5. chromosome for right read
  6. pos for right read
  7. strand for right read
  8. length for right read
  9. weight (applies to the whole pair)
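If you generate ImportHits input from your own Java code, the two line layouts above can be produced with a small formatter. HitLines is a hypothetical helper, not part of ReadDB. It assumes strand is written as + or -, the notation used in Query region strings, and it passes the weight through as a float without assuming anything about its meaning.

```java
// Hypothetical helper, not part of ReadDB: formats tab-delimited input lines for ImportHits.
// Assumption: strand is written as '+' or '-'.
final class HitLines {
    private HitLines() {}

    // Unpaired read: chromosome, position, strand, length, weight.
    static String unpaired(int chrom, int pos, char strand, int length, float weight) {
        return String.join("\t",
                Integer.toString(chrom), Integer.toString(pos), checkStrand(strand),
                Integer.toString(length), Float.toString(weight));
    }

    // Paired read: left chrom/pos/strand/length, right chrom/pos/strand/length,
    // then a single weight that applies to the whole pair.
    static String paired(int leftChrom, int leftPos, char leftStrand, int leftLength,
                         int rightChrom, int rightPos, char rightStrand, int rightLength,
                         float weight) {
        return String.join("\t",
                Integer.toString(leftChrom), Integer.toString(leftPos),
                checkStrand(leftStrand), Integer.toString(leftLength),
                Integer.toString(rightChrom), Integer.toString(rightPos),
                checkStrand(rightStrand), Integer.toString(rightLength),
                Float.toString(weight));
    }

    private static String checkStrand(char s) {
        if (s != '+' && s != '-') {
            throw new IllegalArgumentException("strand must be '+' or '-', got: " + s);
        }
        return String.valueOf(s);
    }
}
```

Printing one such line per read to STDOUT and piping it into ImportHits --align alignmentname would load the reads.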
edu.mit.csail.cgs.projects.readdb.SAMToReadDB converts from SAM/BAM format to ReadDB's format, except that it does not convert string chromosome identifiers (e.g., chr1) to the numeric identifiers ReadDB needs. Feel free to modify it to suit your local convention for non-numeric chromosomes. For a simple test, I used
java -cp /tmp/readdb.jar edu.mit.csail.cgs.projects.readdb.SAMToReadDB < small.against_hg19.bam | egrep '^chr[[:digit:]]+[[:space:]]' | sed -e 's/^chr//' | java -cp /tmp/readdb.jar edu.mit.csail.cgs.projects.readdb.ImportHits --align sample 
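The egrep and sed stages in that pipeline keep only lines whose chromosome is chr followed by digits and then strip the chr prefix. Here is a Java sketch of the same filter; NumericChromFilter is hypothetical, not part of ReadDB. Like the shell version, it drops chrX, chrY, chrM, and similar names, which you would map to numbers according to your own convention.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical stdin filter equivalent to:
//   egrep '^chr[[:digit:]]+[[:space:]]' | sed -e 's/^chr//'
final class NumericChromFilter {
    // "chr", one or more ASCII digits, then whitespace (the field separator).
    private static final Pattern NUMERIC_CHR = Pattern.compile("chr\\d+\\s");

    private NumericChromFilter() {}

    // Returns the line without its "chr" prefix, or empty if the chromosome is not numeric.
    static Optional<String> convert(String line) {
        return NUMERIC_CHR.matcher(line).lookingAt()
                ? Optional.of(line.substring("chr".length()))
                : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            convert(line).ifPresent(System.out::println);
        }
    }
}
```

With Java 11 or later, java NumericChromFilter.java could take the place of the egrep and sed stages between SAMToReadDB and ImportHits.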
And then some tests to make sure it loaded:
java edu.mit.csail.cgs.projects.readdb.ReadDB getchroms sample

java edu.mit.csail.cgs.projects.readdb.ReadDB getcount sample 3

echo "1:0-10000000" | java edu.mit.csail.cgs.projects.readdb.Query --align sample
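Query reads one region per line on STDIN, as in the echo example above; the Command Line Queries section also shows a form with a trailing strand, such as 15:0-10000:+. The Java sketch below parses both forms. Region is a hypothetical class, not part of ReadDB, and it assumes the optional suffix is a strand written as + or -.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, not part of ReadDB, for region strings such as
// "4:1000000-2000000" or "15:0-10000:+". Assumption: the optional suffix is '+' or '-'.
final class Region {
    private static final Pattern FORMAT = Pattern.compile("([^:\\s]+):(\\d+)-(\\d+)(?::([+-]))?");

    final String chrom;
    final int start;
    final int end;
    final Character strand; // null when no strand is given

    private Region(String chrom, int start, int end, Character strand) {
        this.chrom = chrom;
        this.start = start;
        this.end = end;
        this.strand = strand;
    }

    static Region parse(String text) {
        Matcher m = FORMAT.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("bad region: " + text);
        }
        int start = Integer.parseInt(m.group(2));
        int end = Integer.parseInt(m.group(3));
        if (end < start) {
            throw new IllegalArgumentException("region end precedes start: " + text);
        }
        Character strand = (m.group(4) == null) ? null : Character.valueOf(m.group(4).charAt(0));
        return new Region(m.group(1), start, end, strand);
    }

    // Formats the region back into the same chrom:start-end[:strand] syntax.
    @Override
    public String toString() {
        return chrom + ":" + start + "-" + end + (strand == null ? "" : ":" + strand);
    }
}
```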

Command Line Queries

edu.mit.csail.cgs.projects.readdb.Query is the basic query class. It reads regions (e.g., "4:1000000-2000000" or "15:0-10000:+") on STDIN and writes either aligned-read information or a histogram to STDOUT. edu.mit.csail.cgs.projects.readdb.ReadDB provides additional query functionality: its first argument is the command, followed by any additional arguments. Commands are

Java and Perl API

Client.java and ReadDBClient.pm implement Java and Perl interfaces to ReadDB. Client.java contains the documentation and is the "official" version. ReadDBClient.pm mimics the Java version, doesn't contain method documentation, and receives less use and testing. Contact the authors if you're interested in using ReadDB with GBrowse or the UCSC genome browser. Some work has been done on GBrowse support, and UCSC support would definitely be of interest.

Documentation

The javadocs are included in the jar file and are also available online. Client.java contains the API docs for the Java client; ImportHits.java, ReadDB.java, and Query.java are the command-line clients.