How to run BioMAJ?
Definitions
- Bank: Any file or set of files located on a remote server. In BioMAJ, a bank corresponds to a <bank>.properties file, which describes all properties of the bank (server, protocol, location, processes, etc.).
- Process: Any transformation (script) applied before or after the bank files are downloaded.
- Workflow: how does it work? Here.
BioMAJ needs to know:
- its configuration parameters in the global.properties file: How to create a global.properties file?
- which database to update with the bank.properties files: How to create a bank.properties file?
- which processes (alignment, indexing, …) to apply to the database, using the process files: How to create a process script?
Directory structure
How will banks be stored?
- /db/data
  - /offlinedir
  - /bank_1
    - /version_1
    - /version_2
    - current => version_2: current is a symlink to the version in use
  - /bank_2
    - /version_1
    - /version_2
    - current => version_2
  - …
- <rootdir>/conf: configuration files (<bank>.properties)
- <rootdir>/lock: bank locks (to avoid errors during download, processing…)
- <rootdir>/process: where processes are stored (they can also be on the PATH); every process can be called by BioMAJ or by a wrapper.
- <rootdir>/cache
- <rootdir>/log: where to store the logs of each bank and process (/bank/version/execution)
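The layout above can be reproduced by hand to see how the pieces fit together. The following is a minimal sketch, not something BioMAJ requires you to run: it builds the tree under a temporary directory, using the placeholder names bank_1, version_1 and version_2 from the listing. Placing db/data under the same root as conf, lock, process, cache and log is an assumption made for the demo.

```shell
#!/bin/sh
# Sketch only: recreate the documented layout in a throwaway directory.
# bank_1, version_1 and version_2 are the placeholder names from the listing.
# Putting db/data under the same root as conf/, lock/, ... is an assumption.
root=$(mktemp -d)
data="$root/db/data"

# Data tree: an offline directory plus one directory per bank version.
mkdir -p "$data/offlinedir" "$data/bank_1/version_1" "$data/bank_1/version_2"

# "current" is a relative symlink to the version in use.
ln -s version_2 "$data/bank_1/current"

# Working directories that sit next to the data tree.
for d in conf lock process cache log; do
    mkdir -p "$root/$d"
done

# Show where "current" points.
readlink "$data/bank_1/current"
```

Because the link is relative, the whole bank directory can be moved without breaking current.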
First, the pre-processes are applied; then the files are downloaded and uncompressed. Files can be selected with the local.files variable. The final files are stored in the flat/ directory, and the post-processes are then applied to them.
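The selection step can be pictured as a filter over the downloaded file names. This is a minimal sketch, assuming the selection is expressed as a regular expression. The file names and the pattern below are invented for illustration and are not taken from a real bank.

```shell
#!/bin/sh
# Sketch only: filter file names with a regular expression,
# as a stand-in for the selection step. Names and pattern are hypothetical.
downloaded="alu.n
alu.a
README
alu.n.md5"

pattern='^alu\.(a|n)$'

# Keep only the names that match the pattern.
selected=$(printf '%s\n' "$downloaded" | grep -E "$pattern")
printf '%s\n' "$selected"
```

Anchoring the pattern with ^ and $ keeps side files such as checksums out of the selection.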
Some generalities
- Only one remote location per bank (it is not possible to mix protocols)
- All execution logs are written to the log directory, organized per bank/version/execution, including per-process logs.
- If a workflow step fails, the update stops. At the next update, the workflow restarts at the failed stage.
- The bank is usable when the entire workflow has been successfully completed.
- In case of failure, only the files whose download was incomplete or failed are downloaded again.
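The "stop on failure, restart at the failed stage" behaviour can be illustrated with a small resumable loop. This sketch is not BioMAJ's code. The step names and the state file are hypothetical, chosen only to mirror the stages described above.

```shell
#!/bin/sh
# Sketch only: a resumable sequence of steps. This is not BioMAJ's code;
# the step names and the state file are hypothetical.
state=$(mktemp)          # remembers the step that failed last time
steps="preprocess download uncompress postprocess"

run_workflow() {
    # $1: name of a step that should fail (empty for none), for the demo.
    fail_at=$1
    resume_from=$(cat "$state")
    skipping=${resume_from:+yes}
    for step in $steps; do
        # Skip steps that already succeeded in the previous run.
        if [ -n "$skipping" ] && [ "$step" != "$resume_from" ]; then
            continue
        fi
        skipping=
        echo "running $step"
        if [ "$step" = "$fail_at" ]; then
            echo "$step" > "$state"   # record where to restart
            return 1
        fi
    done
    : > "$state"                      # everything succeeded: clear the state
}

# First run fails at the download step; the second run resumes from there.
run_workflow download || echo "update stopped"
second_run=$(run_workflow "")
echo "$second_run"
```

The second run skips preprocess and starts again at download, which is the step that failed.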
Then you can start to use the BioMAJ client:
The global.properties file is mandatory. If it is not specified with --config, BioMAJ looks for global.properties in the current directory or at the path given by the BIOMAJ_CONF environment variable (export BIOMAJ_CONF=/xx/yy/global.properties).
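The lookup described above can be sketched as a small shell function. This is an illustration only: the function name find_global_properties is hypothetical, and the order shown (explicit path, then current directory, then BIOMAJ_CONF) is an assumption, since the text does not state which location wins.

```shell
#!/bin/sh
# Sketch only: one way to resolve the configuration file path.
# The order "current directory first, then BIOMAJ_CONF" is an assumption.
find_global_properties() {
    # $1: explicit path, as would be passed with --config (may be empty).
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -f ./global.properties ]; then
        echo "./global.properties"
    elif [ -n "${BIOMAJ_CONF:-}" ]; then
        echo "$BIOMAJ_CONF"
    else
        echo "global.properties not found" >&2
        return 1
    fi
}

# Demo in an empty directory, so only BIOMAJ_CONF can match.
cd "$(mktemp -d)" || exit 1
export BIOMAJ_CONF=/xx/yy/global.properties
resolved=$(find_global_properties "")
echo "$resolved"
```

Exporting BIOMAJ_CONF once, for instance in a shell profile, avoids repeating the configuration path on every command line.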
Need help?
biomaj-cli.py -h
How to check a bank status?
biomaj-cli.py --config global.properties --status --bank alu
How to check if your bank file is OK?
biomaj-cli.py --config global.properties --check --bank alu
How to update a bank?
biomaj-cli.py --config global.properties --bank alu --update
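The three commands above can be chained so that a bank is updated only when its file passes the check. This is a minimal sketch that uses only the options already shown. It assumes biomaj-cli.py returns a non-zero exit status when a step fails, which is not confirmed here. The run helper and the DRY_RUN variable are hypothetical, and by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: check a bank file, update the bank if the check passes,
# then print its status. "alu" is the example bank name used above.
# Assumption: biomaj-cli.py exits non-zero when a step fails.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
config=global.properties
bank=alu

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

update_bank() {
    run biomaj-cli.py --config "$config" --check --bank "$bank" &&
    run biomaj-cli.py --config "$config" --bank "$bank" --update &&
    run biomaj-cli.py --config "$config" --status --bank "$bank"
}

plan=$(update_bank)
echo "$plan"
```

Running the script with DRY_RUN=0 executes the commands for real. Keeping the dry run as the default makes it safe to try on a machine where BioMAJ is not yet configured.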
How to publish a bank and what is a published bank?
Publishing a bank creates a symbolic link named current that points to the specified release. This lets users access the bank through a stable path (/../mybank/current). You can publish at update time or later, with the --publish or --publish-version options. One and only one release can be published for each bank.
biomaj-cli.py --config global.properties --bank alu --publish
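On disk, the effect of publishing amounts to moving the current link from one release to another. The sketch below imitates that effect in a temporary directory. It is not how BioMAJ is implemented, and the directory names mybank, version_1 and version_2 are placeholders.

```shell
#!/bin/sh
# Sketch only: what switching the "current" link amounts to on disk.
# This imitates the effect of publishing; it is not BioMAJ's implementation.
bankdir=$(mktemp -d)/mybank
mkdir -p "$bankdir/version_1" "$bankdir/version_2"

# Initially version_1 is the published release.
ln -s version_1 "$bankdir/current"

# Publish version_2: replace the link in place.
# -n keeps ln from following the existing link into version_1.
ln -sfn version_2 "$bankdir/current"

published=$(readlink "$bankdir/current")
echo "$published"
```

Since current is a single link, only one release can be reachable through it at a time.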
See more options.
How to run BioMAJ with docker? Here.
How does it work? Here.