While many of the programs in the source tree are useful only in very specialized contexts, some are generally useful. Here are brief descriptions of the generally useful programs, organized by category. Below these are longer descriptions of all programs, sorted alphabetically.

------ Code generators/program utilities ----------
autoSql - Generate database/tab file objects
autoXml - Generate XML parser
newProg - make a new C source skeleton. You'll want to customize this one
stringify - Convert file to C strings
subs - a utility to perform massive string substitutions on source

------ Genome Browser Database Related --------
bedCoverage - Analyse coverage by bed files. Bases can be covered more than once (unlike featureBits)
bedSort - Sort a .bed file by chrom,chromStart
featureBits - Correlate tables via bitmap projections.
getFeatDna - Get dna for a type of feature in browser database.
hgsql - Execute some sql code using passwords in .hg.conf, same as 'mysql -u user -ppassword -A'

--------Text crunching------------
catDir - concatenate files in directory to stdout. For those times when there are too many files for cat to handle.
countChars - Count the number of occurrences of a particular char
endsInLf - Check that last letter in files is end of line
fixCr - strip CRs (carriage returns) from ends of lines
randomLines - Pick out random lines from file
subChar - Substitute one character for another throughout a file.
toLower - Convert upper case to lower case in file. Leave other chars alone
toUpper - Convert lower case to upper case in file. Leave other chars alone
wordLine - chop up file by white space and output one word per line.

------ Number crunching----------
addCols - Sum columns in a text file.
aveCols - Average together columns
calc - Little command line calculator.
tableSum - Sum or average rows and columns to produce smaller table.

------ Fasta file utilities --------
faCmp - Compare two .fa files
faFrag - Extract a piece of DNA from a .fa file.
faNoise - Add noise to .fa file
faOneRecord - Extract a single record from a .FA file
faRc - Reverse complement a FA file
faSize - print total base count in fa files.
faSplit - Split an fa file into several files.
faTrans - Translate DNA .fa file to peptide

------- Blat and related utilities -------------
blat - Fast alignment of similar sequences. (license needed for commercial use)
psLayout - Fast program to find largish DNA overlaps. (commercial license)
pslFilter - filter out psl file
pslPretty - Convert PSL to human readable output
pslReps - Find repeats and best or near best in genome alignments.
pslSwap - Swap query and target in PSL

------- Other Sequence analysis ----------
ameme - Find motifs in DNA sequence. (aka improbizer)
correctEst - Correct ESTs by passing them through genome
gsBig - Run Genscan on big input and produce GTF files and other parsed output
trfBig - Run trf (tandem repeat finder) on big input and make bed file.
waba - Cross-species dna alignment. Handles large gaps (commercial license)

With all of these you can use 'stdin' or 'stdout' as file name arguments so that they fit into a pipe.

------- Full Alphabetical Listing --------
======================================================================
addCols - Add together columns
usage:
   addCols XXX
======================================================================
addFinFinf - Add list of finished clones to end of finf file.
usage:
   addFinFinf cloneList finished.finf
options:
   -inf - do same thing to sequence.inf file instead
======================================================================
addMhcClones - Add info on missing MHC files to Greg's stuff.
usage:
   addMhcClones ~/gs/otherIn/mhc/clones ~/gs
options:
======================================================================
agpCloneCheck - Check that we have all clones in an agp file (and the right version too)
usage:
   agpCloneCheck file.agp gsDir
======================================================================
agpCloneList - Make simple list of all clones in agp file to stdout
usage:
   agpCloneList file(s).agp
options:
   -ver print clone version as well as name.
======================================================================
agpNotInf - List clones in .agp file not in .inf file
usage:
   agpNotInf ctg_nt.agp sequence.inf
======================================================================
agpToFa - Convert a .agp file to a .fa file
usage:
   agpToFa in.agp agpSeq out.fa freezeDir
Where agpSeq matches a sequence name in in.agp and freezeDir is where the program looks for sequence. This is currently a fairly limited implementation. It only works on finished clones.
options:
   -simple - treat freezeDir as a simple directory full of .fa files. In this case .fa files must be named accession.fa and only have one record
   -subDirs=predraft,draft,fin,extras Explicitly specify comma separated list of freeze subdirectories.
======================================================================
agpToGl - Convert AGP file to GL file. Some fakery involved.
usage:
   agpToGl source.agp dest
options:
   -md=seq_contig.md - get list of clones to reverse from NCBI .md file and produce output parsed into contigs
======================================================================
agpVsMap - Plot clones in agp file vs. map coordinates
usage:
   agpVsMap XXX
======================================================================
ali2alx - produces an index file for each chromosome into an ali file.
usage:
   ali2alx in.ali alxDir
======================================================================
aliGlue - tell where a cDNA is located quickly.
usage:
   aliGlue genomeListFile otherListFile otherType ignore.ooc 5and3.pai outRoot
The genomeListFile is a list of .FA files containing genomic sequence, which should altogether be 5 megabases or less. Typically this file is a list of BACs. The otherListFile is a list of .FA files containing the sequences to compare against the genomic files. These may be of unlimited size. The otherType should be either 'mRNA' or 'genomic', and controls whether the gap penalties are mRNA style (introns ok) or not. Ignore.ooc is a file containing overrepresented 10-mers. 5and3.pai is a list of 5' and 3' ESTs in the 'other' sequence. You can use /dev/null here if there is no pairing info. outRoot specifies the base name of the three output files the program creates: outRoot.hit, outRoot.glu, and outRoot.ok, which contain the cDNA hits, the gluing cDNAs, and a sign that the program ended OK, respectively.
======================================================================
altGraph - do alt-splice clustering and generate constraints for genie and load database.
usage:
   altGraph load bacAccession(s)
      Loads clusters into database
   altGraph gff gffDir bacAccession(s)
      This produces a constraint gff file for genie for each bac (that's been preloaded into database)
   altGraph align geno.lst mrna.lst 10.ooc gffFile
      This will align each file in geno.lst to the mrna in mrna.lst and put the resulting constraints in gffFile. The files in geno.lst should be less than 5 megabases each. 10.ooc is an overused 10-mer file for the patSpace algorithm.
   altGraph psl mrna.psl genoFaDir gffFile
      This will take the alignments in mrna.psl against genomic .fa files in genoFaDir and produce an altGraph of them in gffFile
   altGraph extract id(s)
      Extracts altGraph of given ID from database
   altGraph view bacAccession(s)
      Just views clusters
======================================================================
altSplice - constructs altSplice graphs using information from alignments in the EST and mRNA databases. Initially only for chr22.
usage:
   altSplice
======================================================================
ameme - find common patterns in DNA
usage:
   ameme good=goodIn.fa [bad=badIn.fa] [numMotifs=2] [background=m1] [maxOcc=2] [motifOutput=fileName] [html=output.html] [gif=output.gif]
where goodIn.fa is a multi-sequence fa file containing instances of the motif you want to find, badIn.fa is a file containing similar sequences but lacking the motif, numMotifs is the number of motifs to scan for, background is m0, m1, or m2 for various levels of Markov models, maxOcc is the maximum occurrences of the motif you expect to find in a single sequence and motifOutput is the name of a file to store just the motifs in.
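The background option picks a Markov model of order 0, 1, or 2 for the non-motif sequence. As a rough illustration of what those orders mean (this is not ameme's code; the function names and the add-one pseudocount are assumptions), the Python sketch below trains an order-k Markov model on background sequences and scores a sequence under it. Order 0 uses plain base composition; order 1 conditions each base on the one before it.

```python
import math
from collections import Counter

BASES = "ACGT"

def train_markov(seqs, order):
    """Count how often each base follows each length-`order` context.
    order=0 uses the empty context, i.e. plain base composition."""
    counts = {}
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            ctx, base = seq[i - order:i], seq[i]
            if base in BASES:  # skip N's and other non-ACGT letters
                counts.setdefault(ctx, Counter())[base] += 1
    return counts

def log_prob(seq, counts, order, pseudo=1.0):
    """Natural-log probability of seq[order:] under the model.
    Pseudocounts (an assumption here) keep unseen contexts or bases
    from producing log(0)."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        if base not in BASES:
            continue
        c = counts.get(ctx, Counter())
        n = sum(c.values())
        total += math.log((c[base] + pseudo) / (n + len(BASES) * pseudo))
    return total
```

A motif finder would typically compare a candidate site's probability under its motif model against its probability under a background like this one. Higher orders capture dinucleotide effects such as CpG depletion, at the cost of needing more background sequence to estimate the counts.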
======================================================================
assessLibs - Make table that assesses the percentage of library that covers 5' and 3' ends
usage:
   assessLibs database refTrack
options:
   -chrom=chrNN - restrict this to chromosome NN
======================================================================
autoSql - create SQL and C code for permanently storing a structure in database and loading it back into memory based on a specification file
usage:
   autoSql specFile outRoot
This will create outRoot.sql, outRoot.c, and outRoot.h based on the contents of specFile
======================================================================
autoXml - Generate structures code and parser for XML file from DTD-like spec
usage:
   autoXml file.dtdx root
This will generate root.c, root.h
options:
   -textField=xxx what to name text between start/end tags. Default 'text'
   -comment=xxx Comment to appear at top of generated code files
   -picky Generate parser that rejects stuff it doesn't understand
   -main Put in a main routine that's a test harness
   -prefix=xxx Prefix to add to structure names. By default same as root
   -positive Don't write out optional attributes with negative values
======================================================================
aveCols - Average together columns
usage:
   aveCols file
File may be stdin.
======================================================================
axtAndBed - Intersect an axt with a bed file and output axt.
usage:
   axtAndBed in.axt in.bed out.axt
options:
   -xxx=XXX
======================================================================
axtBest - Remove second best alignments
usage:
   axtBest in.axt chrom out.axt
options:
   -winSize=N - Size of window, default 10000
   -minScore=N - Minimum score of alignments to consider. Default 1000
   -minOutSize=N - Minimum score of piece to output.
      Default 10
   -matrix=file.mat - override default scoring matrix
Alignments scoring over minScore (where each matching base counts about +100 in the default scoring scheme) are projected onto the target sequence. The score within each overlapping 1000 base window is calculated, and the best scoring alignments in each window are marked. Alignments that are never the best are thrown out. The best scoring alignment for each window is output, chopping up alignments if necessary.
======================================================================
axtCalcMatrix - Calculate substitution matrix and make indel histogram
usage:
   axtCalcMatrix file(s).axt
======================================================================
axtDropSelf - Drop alignments that just align same thing to itself
usage:
   axtDropSelf in.axt out.axt
options:
   -xxx=XXX
======================================================================
axtForEst - Generate file of mouse/human alignments corresponding to MGC ESTs
usage:
   axtForEst database axtDir output.axt
options:
   -chrom=chrN - restrict to a specific chromosome
   -track=track - Use a track other than est
   -lib=libWildCard (SQL format, where % is like * normally)
   -refSeq - Don't correct indels in EST, treat as refSeq
To get MGC ESTs for Dec 2001 human do:
   axtForEst mm2 ~/bz/axtBest mgcEst.axt -track=tightEst -lib=NIH_MGC%
======================================================================
axtIndex - build index of axt file
usage:
   axtIndex in.axt out.axt.ix
options:
   -xxx=XXX
======================================================================
axtPretty - Convert axt to more human readable format.
usage:
   axtPretty in.axt out.pretty
options:
   -line=N Size of line, default 70
======================================================================
axtQueryCount - Count bases covered on each query sequence
usage:
   axtQueryCount in.axt
options:
   -xxx=XXX
======================================================================
axtSort - Sort axt files
usage:
   axtSort in.axt out.axt
options:
   -query - Sort by query position, not target
======================================================================
axtSplitByTarget - Split a single axt file into one file per target
usage:
   axtSplitByTarget in.axt outDir
======================================================================
axtSwap - Swap source and query in an axt file
usage:
   axtSwap source.axt target.sizes query.sizes dest.axt
options:
   -xxx=XXX
======================================================================
axtToBed - Convert axt alignments to simple bed format
usage:
   axtToBed in.axt out.bed
options:
   -xxx=XXX
======================================================================
axtToMaf - Convert from axt to maf format
usage:
   axtToMaf in.axt tSizes qSizes out.maf
Where tSizes and qSizes are files that contain the sizes of the target and query sequences. Very often this will be a chrom.sizes file
======================================================================
axtToPsl - Convert axt to psl format
usage:
   axtToPsl in.axt tSizes qSizes out.psl
Where tSizes and qSizes are tab-delimited files with sequence name and size columns.
options:
   -xxx=XXX
======================================================================
bargeVsMap - Plot clones in barge vs. map coordinates
usage:
   bargeVsMap bargeFile infoFile output.gif
======================================================================
barger - make a clone map based on sequence overlap and sts positions.
usage:
   barger sequence.inf strictPairs loosePairs inSts outDir
where inClones is a list of clones, inPairs contains a list of overlapping clones, inSts contains sts positions and outDir is a directory where the various result files are written. (outDir will be created if it doesn't exist.)
======================================================================
bedCons - Look at conservation of a BED track vs. a reference (nonredundant) alignment track
usage:
   bedCons database refAliTrack bedTrack
Bed track can also be a bed file (ending in .bed suffix)
options:
   -chrom=chrom - restrict to a single chromosome.
======================================================================
bedCoverage - Analyse coverage by bed files - chromosome by chromosome and genome-wide.
usage:
   bedCoverage database bedFile
Note bed file must be sorted by chromosome
options:
   -restrict=restrict.bed Restrict to parts in restrict.bed
======================================================================
bedDown - Make stuff to find a BED format submission in a new version
usage:
   bedDown database table output.fa output.tab
======================================================================
bedSort - Sort a .bed file by chrom,chromStart
usage:
   bedSort in.bed out.bed
in.bed and out.bed may be the same.
======================================================================
bedUp - Load bed submissions after conversion back into new database.
usage:
   bedUp oldDb table newDb old.tab convert.psl missing
======================================================================
please use:
   bin param sequence
where param - name of file with parameters and sequence - name of file with sequence
======================================================================
binGood - convert text format alignment file to binary format
usage:
   binGood good.txt good.ali
======================================================================
blat - Standalone BLAT v. 15 fast sequence search command line tool
usage:
   blat database query [-ooc=11.ooc] output.psl
where:
   database is either a .fa file, a .nib file, or a list of .fa or .nib files
   query is similarly a .fa, .nib, or list of .fa or .nib files
   -ooc=11.ooc tells the program to load over-occurring 11-mers from an external file. This will increase the speed by a factor of 40 in many cases, but is not required
   output.psl is where to put the output.
options:
   -t=type Database type. Type is one of:
        dna - DNA sequence
        prot - protein sequence
        dnax - DNA sequence translated in six frames to protein
      The default is dna
   -q=type Query type. Type is one of:
        dna - DNA sequence
        rna - RNA sequence
        prot - protein sequence
        dnax - DNA sequence translated in six frames to protein
        rnax - DNA sequence translated in three frames to protein
      The default is dna
   -prot Synonymous with -t=prot -q=prot
   -ooc=N.ooc Use overused tile file N.ooc. N should correspond to the tileSize
   -tileSize=N sets the size of match that triggers an alignment. Usually between 8 and 12. Default is 11 for DNA and 5 for protein.
   -oneOff=N If set to 1 this allows one mismatch in tile and still triggers an alignment. Default is 0.
   -minMatch=N sets the number of tile matches. Usually set from 2 to 4. Default is 2 for nucleotide, 1 for protein.
   -minScore=N sets minimum score. This is twice the matches minus the mismatches minus some sort of gap penalty. Default is 30
   -minIdentity=N Sets minimum sequence identity (in percent).
      Default is 90 for nucleotide searches, 25 for protein or translated protein searches.
   -maxGap=N sets the size of maximum gap between tiles in a clump. Usually set from 0 to 3. Default is 2. Only relevant for minMatch > 1.
   -noHead suppress .psl header (so it's just a tab-separated file)
   -makeOoc=N.ooc Make overused tile file
   -repMatch=N sets the number of repetitions of a tile allowed before it is marked as overused. Typically this is 256 for tileSize 12, 1024 for tileSize 11, 4096 for tileSize 10. Default is 1024. Typically only comes into play with makeOoc
   -mask=type Mask out repeats. Alignments won't be started in masked region but may extend through it in nucleotide searches. Masked areas are ignored entirely in protein or translated searches. Types are:
        lower - mask out lower cased sequence
        upper - mask out upper cased sequence
        out - mask according to database.out RepeatMasker .out file
        file.out - mask database according to RepeatMasker file.out
   -qMask=type Mask out repeats in query sequence. Similar to -mask above but for query rather than target sequence.
   -minRepDivergence=NN - minimum percent divergence of repeats to allow them to be unmasked. Default is 15. Only relevant for masking using RepeatMasker .out files.
   -dots=N Output dot every N sequences to show program's progress
   -trimT Trim leading poly-T
   -noTrimA Don't trim trailing poly-A
   -trimHardA Remove poly-A tail from qSize as well as alignments in psl output
   -out=type Controls output file format. Type is one of:
        psl - Default.
              Tab separated format without actual sequence
        pslx - Tab separated format with sequence
        axt - blastz-associated axt format
        maf - multiz-associated maf format
        wublast - similar to wublast format
        blast - similar to NCBI blast format
======================================================================
blat2p - blat two proteins
usage:
   blat2p a.fa b.fa
options:
   -xxx=XXX
======================================================================
blatFilter - filter blat alignments somewhat
usage:
   blatFilter output.psl infile(s)
options:
   -xxx=XXX
======================================================================
borfBig - Run Victor Solovyev's bestorf repeatedly
usage:
   borfBig in.fa out.tab
options:
   -exe=borf - where exe file belongs
   -tmpFa=file - where to put temp .fa file
   -tmpOrf=file - where to put temp bestorf output file
======================================================================
calc - Little command line calculator
usage:
   calc this + that * theOther / (a + b)
======================================================================
catDir - concatenate files in directory to stdout. For those times when there are too many files for cat to handle.
usage:
   catDir dir(s)
options:
   -r Recurse into subdirectories
   -suffix=.suf This will restrict things to files ending in .suf
   '-wild=*.???' This will match wildcards.
   -nonz Prints file name of non-zero length files
======================================================================
catUncomment - Concatenate input removing lines that start with '#'
Output goes to stdout
usage:
   catUncomment file(s)
======================================================================
ccCp - copy a file to cluster.
usage:
   ccCp sourceFile destFile [hostList]
This will copy sourceFile to destFile for all machines in hostList
example:
   ccCp h.zip /var/tmp/h.zip newHosts
======================================================================
cdnaOff - creates sorted offset files that position cDNAs in chromosome.
usage:
   cdnaOff good.txt outputDir
======================================================================
cdnaOnOoJobs - make condor submission file for EST and mRNA alignments on draft assembly
usage:
   cdnaOnOoJobs ooDir conDir cdna(s)
This will create conDir and fill it up with a condor submission file and various log files. Please give full path names to both ooDir and conDir. cdna should be 'refseq', 'mrna', and/or 'est'
======================================================================
checkDbSync - Check that databases on different machines are in sync
usage:
   checkDbSync database user password referenceHost otherHost(s)
======================================================================
checkGoldDupes - Check gold files in assembly for duplicates
usage:
   checkGoldDupes ooDir goldFileName
options:
   -xxx=XXX
======================================================================
checkNt - Check that ctg_nt.agp, ctg_coords, and ctg.fa are consistent
usage:
   checkNt ctg_nt.agp ctg_coords ctg.fa
options:
   -xxx=XXX
======================================================================
checkYbr - Check NCBI assembly (aka Yellow Brick Road)
usage:
   checkYbr build.agp contig.fa seq_contig.md
options:
   -checkUs=ourDir - check that our NT*/NT*.fa under ourDir look right
======================================================================
checkableBorf - Convert borfBig orf-finder output to checkable form
usage:
   checkableBorf file.borf file.cds output.check
options:
   -xxx=XXX
======================================================================
chopFaLines - Read in FA file with long lines and rewrite it with shorter lines
usage:
   chopFaLines in.fa out.fa
======================================================================
clusterRna - Make clusters of mRNA and ESTs
usage:
   clusterRna database rnaOut.bed estOut.bed
options:
   -MGC=mgc.out - output MGC ESTs to sequence fully
   -chrom=chrN - work on chrN (default chr22)
   -rna=rnaTable - table to use for mRNA
   -est=estTable - table to use for ESTs
   -orient=orientTable - table to use for EST orientation
   -group=group.out - produce list of mRNA/EST in cluster
======================================================================
cmpMap - compare maps.
usage:
   cmpMap sequence.inf map.wu map.jk
======================================================================
correctEst - Correct ESTs by passing them through genome
usage:
   correctEst oldEst.fa ali.psl nibDir out.fa
The corrected sequence will be in upper case
options:
   -xxx=XXX
======================================================================
countChars - Count the number of occurrences of a particular char
usage:
   countChars char file(s)
Char can either be a two digit hexadecimal value or a single letter literal character
======================================================================
ctgFaToFa - Convert from one big file with all NT contigs to one contig per file.
usage:
   ctgFaToFa ctg.fa ctg_coords ntDir
======================================================================
ctgToChromFa - convert contig level fa files to chromosome level
usage:
   ctgToChromFa chromName inserts chromDir ordered.lst outFile
options:
   spacing=number - set spacing between contigs to number (default 200000)
   lift=file.lft - set spacing between contigs from lift file.
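ctgToChromFa lays contigs end to end with a fixed spacing (200000 by default) between them. The Python sketch below shows one way such a concatenation can work. It assumes the gap is filled with N's, that spacing goes only between contigs (not at the chromosome ends), and that output lines are 50 bases wide; none of these details are taken from ctgToChromFa itself, and the function names are hypothetical. It also records each contig's zero-based start in the chromosome, roughly the kind of offset information a .lft file records.

```python
def contigs_to_chrom(contigs, spacing=200000):
    """Join (name, sequence) contigs into one chromosome sequence.

    Inserts `spacing` N's between consecutive contigs (assumption: none at
    the ends) and returns (sequence, [(name, zero_based_start), ...])."""
    parts, offsets, pos = [], [], 0
    for i, (name, seq) in enumerate(contigs):
        if i > 0:
            parts.append("N" * spacing)
            pos += spacing
        offsets.append((name, pos))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def format_fa(name, seq, line_len=50):
    """Render one FASTA record with `line_len` bases per line."""
    lines = [">" + name]
    lines += [seq[i:i + line_len] for i in range(0, len(seq), line_len)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    chrom, offs = contigs_to_chrom([("ctg1", "ACGT"), ("ctg2", "GG")], spacing=3)
    print(offs)  # [('ctg1', 0), ('ctg2', 7)]
    print(format_fa("chr1", chrom, line_len=4), end="")
```

Keeping the offsets alongside the sequence means coordinates on a contig can later be shifted into chromosome coordinates by adding the contig's start.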
======================================================================
dataSim - Simulate system where data is dynamically distributed
usage:
   dataSim machine aCount bCount
options:
   -xxx=XXX
======================================================================
dnaMotifFind - Locate preexisting motifs in DNA sequence
usage:
   dnaMotifFind motifFile sequence.fa output.tab
options:
   -markov=level Level of Markov background model - 0, 1, or 2
   -background=seq.fa Sequence to use for background model
   -threshold=N significance threshold (ln based, default 8.0)
   -rc Include reverse complement
======================================================================
dupeFoo - Do some duplication analysis
usage:
   dupeFoo dupe.psl dupe.fa region.out
======================================================================
eisenInput - Create input for Eisen-style cluster program
usage:
   eisenInput database output.txt
======================================================================
emblMatrixToMotif - Convert transfac matrix in EMBL format to dnaMotif
usage:
   emblMatrixToMotif in.embl out.tab
options:
   -org=human get only human (or mouse or whatever).
======================================================================
endsInLf - Check that last letter in files is end of line
usage:
   endsInLf file(s)
options:
   -zeroOk
======================================================================
estLibStats - Calculate some stats on EST libraries given file from polyInfo
usage:
   estLibStats database eiInfo.bed output
options:
   -xxx=XXX
======================================================================
This program aligns cDNA with genomic sequence.
Usage:
   exonAli named output cdnaName(s)
   exonAli in output listFile
   exonAli all output faFile ntDir
   exonAli starting output faFile ntDir startingIx [count]
   exonAli resume output faFile ntDir
======================================================================
expToRna - Make a little two column table that associates rnaClusters with expression info
usage:
   expToRna database rnaTable expTable output.tab
options:
   -chrom=chr22 - restrict to a chromosome (default is whole genome)
======================================================================
faCmp - Compare two .fa files
usage:
   faCmp a.fa b.fa
======================================================================
faFilterN - Get rid of sequences with too many N's
usage:
   faFilterN in.fa out.fa maxPercentN
options:
   -out=in.fa.out
   -uniq=self.psl
======================================================================
faFrag - Extract a piece of DNA from a .fa file.
usage:
   faFrag in.fa start end out.fa
======================================================================
faNcbiToUcsc - Convert FA file from NCBI to UCSC format.
usage:
   faNcbiToUcsc inFile outFile
options:
   -split - split into separate files
   -ntLast - look for NT_ on last bit
   -wordBefore=xx The word before the accession, default 'gb'
   -wordIx=N The word (starting at zero) the accession is in
======================================================================
faNoise - Add noise to .fa file
usage:
   faNoise inName outName transitionPpt transversionPpt insertPpt deletePpt chimeraPpt
options:
   -upper - output in upper case
======================================================================
faOneRecord - Extract a single record from a .FA file
usage:
   faOneRecord in.fa recordName
======================================================================
faRc - Reverse complement a FA file
usage:
   faRc in.fa out.fa
In.fa and out.fa may be the same file.
options:
   -keepName - keep name identical (don't prepend RC)
======================================================================
faSize - print total base count in fa files.
usage:
   faSize file(s).fa
Command flags:
   detailed=on outputs name and size of each record
======================================================================
faSplit - Split an fa file into several files.
usage:
   faSplit how input.fa count outRoot
where how is either 'base', 'sequence', 'size', or 'about'. Files split by sequence will be broken at the nearest fa record boundary, while those split by base will be broken at any base. Files broken by size will be broken every count bases.
Examples:
   faSplit sequence estAll.fa 100 est
This will break up estAll.fa into 100 files (numbered est001.fa est002.fa, ... est100.fa). Files will only be broken at fa record boundaries
   faSplit base chr1.fa 10 1_
This will break up chr1.fa into 10 files
   faSplit size input.fa 2000 outRoot
This breaks up input.fa into 2000 base chunks
   faSplit about est.fa 20000 outRoot
This will break up est.fa into files of about 20000 bytes each.
Options:
   -maxN=N - Suppress pieces with more than maxN n's. Only used with size. default is size-1 (only suppresses pieces that are all N).
   -oneFile - Put output in one file. Only used with size
   -out=outFile Get masking from outfile. Only used with size.
   -lift=file.lft Put info on how to reconstruct sequence from pieces in file.lft.
      Only used with size
======================================================================
faToNib - Convert from .fa to .nib format
usage:
   faToNib [options] in.fa out.nib
options:
   -softMask - create nib that soft-masks lower case sequence
      Note gfServer/gfClient don't know about this yet
   -hardMask - create nib that hard-masks lower case sequence
======================================================================
faTrans - Translate DNA .fa file to peptide
usage:
   faTrans in.fa out.fa
options:
   -stop stop at first stop codon (otherwise puts in Z for stop codons)
   -offset=N start at a particular offset.
======================================================================
fakeFinContigs - Fake up contigs for a finished chromosome
usage:
   fakeFinContigs fin.agp fin.fa finDir rootName finFaDir ooVer
This will scan fin.agp for gaps, and create contigs in finDir for each section between gaps
Example:
   fakeFinContigs chr20.agp chr20.fa . ctg20fin ~/gs/fin/fa 101
======================================================================
fakeOut - fake a RepeatMasker .out file based on the N's in .fa file
usage:
   fakeOut x.fa.masked x.fa.out
======================================================================
featureBits - Correlate tables via bitmap projections.
usage:
   featureBits database table(s)
This will return the number of bits in all the tables ANDed together.
options:
   -bed=output.bed Put intersection into bed format
   -fa=output.fa Put sequence in intersection into .fa file
   -faMerge For fa output merge overlapping features.
   -minSize=N Minimum size to output (default 1)
   -chrom=chrN Restrict to one chromosome
You can include a '!' before a table name to negate it.
Some table names can be followed by modifiers such as:
   :exon:N Break into exons and add N to each end of each exon
   :cds Break into coding exons
   :intron:N Break into introns, remove N from each end
   :upstream:N Consider the region of N bases before region
   :end:N Consider the region of N bases after region
======================================================================
ffaToFa - convert Greg Schuler .ffa fasta files to UCSC .fa fasta files
usage:
   ffaToFa file.ffa faDir trans
where ffaDir is directory full of .ffa files, faDir is where you want to put the corresponding .fa files, trans is a table that translates from one name to the other and cloneSizes is a file that lists the size of each clone. If you put 'stdin' for file.ffa, it will read from standard input.
======================================================================
findContigsWithClones - find contigs that contain clones
usage:
   findContigsWithClones cloneList ooDir fileName
example:
   findContigsWithClones chr22.lst oo.29 gold.99
======================================================================
fixCr - strip CRs (carriage returns) from ends of lines
======================================================================
flagMhcClones - Look for clones Stephan wants in MHC.
usage:
   flagMhcClones mhcClones.txt gs.N
======================================================================
fqToQa - convert from fq format with one big file to format with one file per clone.
usage:
   fqToQa infile.fq outDir qaInfo
This will put the quality scores from infile.fq into a series of .qa files in outDir, one file per clone. Info from the '>' lines from the .fq file will be stored in the qaInfo file
======================================================================
fqToQac - convert from fq format with one big file to compressed format with one file per clone.
usage:
   fqToQac [infile.fq] outDir
This will put the quality scores from infile.fq into a series of .qac files in outDir, one file per clone.
The .qac files can be uncompressed with qacToQa.
======================================================================
fragPart - get part of a fragment's sequence
usage:
   fragPart acc.frag [strand] [start] [end]
examples:
   fragPart AC000001.4_3 + 100 200
This gets between bases 100 and 200 of fragment 4_3 in AC000001.fa somewhere....
======================================================================
usage:
   freen output
======================================================================
g2gOverlap - Look at clone-clone overlap.
usage:
   g2gOverlap source.psl output [maxBad%] [maxTail] [minUniq] [minMatch] [minFragSize]
This would put the overlaps in source.psl into out.long in the form of tab-delimited lines:
The optional parameters have the following functions and defaults
   maxBad% (1) - Maximum percentage of mismatches and inserts
   maxTail (200) - Maximum non-aligning section on end of fragment
   minUnique (50) - Minimum number of non-repeat-masked matching bases
   minMatch (100) - Minimum number of matching bases
   minFragSize (200) - Minimum size of fragments
======================================================================
g2gSeqOverlap - make a big .fa file with overlap sequence.
usage:
   g2gSeqOverlap pairFile g2g.psl output.fa
======================================================================
gapper - try to find unsequenced gaps in the draft human genome by looking at BAC end pair alignments between clones and between fingerprint contigs.
usage:
   gapper mapFile pairFile bacEnds.psl ooDir ooVersion outRoot
This will look at the WashU style fingerprint map file in mapFile, the list of BAC end pairs in pairFile and the bacEnd/clone alignments in bacEnds.psl, analyze them, and put the results in a series of files named outRoot.xxx, where xxx includes:
   .bep - pairs of clones connected by a single bac end
======================================================================
gb2cdi - convert GenBank (GB) files to .fa and cDna Info (CDI) file.
usage:
   gb2cdi file(s).gb file.fa file.cdiFile
======================================================================
gbOneAcc - retrieve one or a few records from a GenBank flat file.
usage:
   gbOneAcc gbFile acc(s)
The output will be printed to standard out
======================================================================
gbToFaRa - Convert GenBank flat format file to an fa file containing the sequence data, an ra file containing other relevant info and a ta file containing summary statistics.
usage:
   gbToFaRa filterFile faFile raFile taFile genBankFile(s)
where filterFile is definition of which records and fields to use or the word null if you want no filtering.
options:
   byOrganism=outputDir - Make separate files for each organism
======================================================================
gbtofa - converts from GenBank to fa format.
usage:
   gbtofa in.gb out.fa
======================================================================
gcForBed - Calculate g/c percentage and other stats for regions covered by bed
usage:
   gcForBed in.bed nibDir
options:
   -xxx=XXX
======================================================================
gensub2 - Generate condor submission file from template and two file lists
usage:
   gensub2