Global properties

The global.properties file is mandatory. You can use the example given at the end of this page, or the one in the BioMAJ GitHub repository, as a starting point. If no configuration file is specified, BioMAJ looks for global.properties in the current directory or at the path given by the BIOMAJ_CONF environment variable (export BIOMAJ_CONF=/xx/yy/global.properties).

The main configuration, shared by all banks, is in the global.properties file. It can also be superseded by an optional file in the user's home directory, ~/.biomaj.cfg.
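
The lookup and override rules described above can be sketched in Python with the standard configparser module. This is only a minimal sketch, not BioMAJ's actual loader: the function names are hypothetical, and checking BIOMAJ_CONF before the current directory is an assumption, since the text does not say which location wins.

```python
import configparser
import os


def find_global_properties(explicit_path=None, environ=None, cwd="."):
    """Return the path of the global.properties file to load, or None."""
    environ = os.environ if environ is None else environ
    if explicit_path:
        # A path given explicitly always wins.
        return explicit_path
    # Assumption: BIOMAJ_CONF is checked before the current directory.
    env_path = environ.get("BIOMAJ_CONF")
    if env_path and os.path.isfile(env_path):
        return env_path
    local_path = os.path.join(cwd, "global.properties")
    return local_path if os.path.isfile(local_path) else None


def load_config(global_path, user_path=None):
    """Read global.properties, then let the optional user file override it."""
    if user_path is None:
        user_path = os.path.expanduser("~/.biomaj.cfg")
    parser = configparser.ConfigParser()
    # read() silently skips missing files; later files override earlier ones.
    parser.read([global_path, user_path])
    return parser
```

Because configparser merges files in the order they are read, any key repeated in the user file replaces the value from global.properties, while keys the user file does not mention keep their global value.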

To get started, copy the example file into your BioMAJ directory under the name "global.properties". Do not forget to set root.dir to match your installation.

Mandatory parameters

BioMAJ needs to know:

  • [GENERAL] : header of the global.properties file
  • root.dir : path to directory that contains all files of BioMAJ (database, log, properties files)
  • conf.dir=%(root.dir)s/conf: directory for all bank.properties files
  • log.dir=%(root.dir)s/log: directory of all log files for each bank update
  • process.dir=%(root.dir)s/process: directory to store all process files
  • cache.dir=%(root.dir)s/cache: directory for cache files
  • lock.dir=%(root.dir)s/lock: directory where the bank lock files are stored
  • data.dir=%(root.dir)s/db: root directory where all databases are stored
  • db.url=mongodb://localhost:27017: connection URL of the MongoDB server
  • db.name=biomaj: name of the MongoDB database

If your data is not stored under a single directory hierarchy, you can override data.dir in the properties file of an individual bank, for example:
data.dir=/var/lib/biomaj
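
The %(root.dir)s notation used in the mandatory parameters is the interpolation syntax of Python's configparser module. The sketch below shows how the derived directories resolve and how a file could be checked for the mandatory keys listed above. The root path /srv/biomaj and the helper missing_mandatory are illustrative assumptions, not part of BioMAJ.

```python
import configparser

# Mandatory keys as listed above (the [GENERAL] header is checked separately).
MANDATORY_KEYS = [
    "root.dir", "conf.dir", "log.dir", "process.dir", "cache.dir",
    "lock.dir", "data.dir", "db.url", "db.name",
]

# Hypothetical minimal file; /srv/biomaj is a placeholder path.
SAMPLE = """
[GENERAL]
root.dir=/srv/biomaj
conf.dir=%(root.dir)s/conf
log.dir=%(root.dir)s/log
process.dir=%(root.dir)s/process
cache.dir=%(root.dir)s/cache
lock.dir=%(root.dir)s/lock
data.dir=%(root.dir)s/db
db.url=mongodb://localhost:27017
db.name=biomaj
"""


def missing_mandatory(parser):
    """Return the mandatory keys absent from the [GENERAL] section."""
    if not parser.has_section("GENERAL"):
        return list(MANDATORY_KEYS)
    return [k for k in MANDATORY_KEYS if not parser.has_option("GENERAL", k)]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
print(missing_mandatory(parser))          # []
print(parser.get("GENERAL", "conf.dir"))  # /srv/biomaj/conf
```

Changing root.dir is therefore enough to relocate every directory that is defined relative to it.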

Optional parameters:

Reporting

  • mail.from: sender’s email address
  • mail.smtp.host: SMTP server address
  • mail.admin: list of email addresses to send reports to

Options

  • use_ldap=0: LDAP authentication
  • use_elastic=1: use ElasticSearch to index/search
  • historic.logfile.level=DEBUG: level of information written to the historic log file (DEBUG, INFO, WARN, ERR)
  • bank.num.threads=4: number of threads for bank management
  • files.num.threads=4: number of threads for downloading
  • visibility.default=public: default access level for banks
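
Since these parameters are optional, code that reads them needs a fallback for each one. The following is a minimal sketch using configparser typed getters. The fallback values simply reuse the values shown in the list above and are not documented BioMAJ defaults, and splitting mail.admin on commas is an assumption about the list format.

```python
import configparser

# Hypothetical file that sets only some of the optional parameters.
SAMPLE = """
[GENERAL]
use_ldap=0
use_elastic=1
historic.logfile.level=DEBUG
files.num.threads=8
mail.admin=alice@example.org, bob@example.org
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
general = parser["GENERAL"]

# getboolean() accepts 0/1, yes/no, true/false and on/off.
use_ldap = general.getboolean("use_ldap", fallback=False)
use_elastic = general.getboolean("use_elastic", fallback=False)

# Missing keys fall back to the value passed as `fallback`.
bank_threads = general.getint("bank.num.threads", fallback=4)
file_threads = general.getint("files.num.threads", fallback=4)
visibility = general.get("visibility.default", fallback="public")

# Assumption: addresses in mail.admin are separated by commas.
admins = [a.strip() for a in general.get("mail.admin", fallback="").split(",") if a.strip()]

print(use_ldap, use_elastic, bank_threads, file_threads, visibility)
print(admins)
```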

Parsing of HTTP server

It is possible to extract the following information about remote files and directories from a URL:

  • Date
  • Name
  • Size

Regular expressions used by default in BioMAJ. The http.group.* values give the number of the capture group that holds each field:

http.parse.dir.line=<img[\s]+src="[\S]+"[\s]+alt="\[DIR\]"[\s]*/?>[\s]*<a[\s]+href="([\S]+)/"[\s]*>.*([\d]{2}-[\w\d]{2,5}-[\d]{4}\s[\d]{2}:[\d]{2})

http.parse.file.line=<img[\s]+src="[\S]+"[\s]+alt="\[[\s]+\]"[\s]*/?>[\s]<a[\s]+href="([\S]+)".*([\d]{2}-[\w\d]{2,5}-[\d]{4}\s[\d]{2}:[\d]{2})[\s]+([\d\.]+[MKG]{0,1})

http.group.dir.name=1
http.group.dir.date=2
http.group.file.name=1
http.group.file.date=2
http.group.file.size=3
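
To see what these expressions capture, they can be applied to lines of an HTML directory index. The sketch below uses Python's re module with the default patterns and group numbers shown above. The two sample lines are made up to resemble an Apache-style listing, and parse_listing_line is a hypothetical helper, not a BioMAJ function.

```python
import re

# Default patterns and group numbers, copied from the configuration above.
CONFIG = {
    "http.parse.dir.line": r'<img[\s]+src="[\S]+"[\s]+alt="\[DIR\]"[\s]*/?>[\s]*<a[\s]+href="([\S]+)/"[\s]*>.*([\d]{2}-[\w\d]{2,5}-[\d]{4}\s[\d]{2}:[\d]{2})',
    "http.parse.file.line": r'<img[\s]+src="[\S]+"[\s]+alt="\[[\s]+\]"[\s]*/?>[\s]<a[\s]+href="([\S]+)".*([\d]{2}-[\w\d]{2,5}-[\d]{4}\s[\d]{2}:[\d]{2})[\s]+([\d\.]+[MKG]{0,1})',
    "http.group.dir.name": "1",
    "http.group.dir.date": "2",
    "http.group.file.name": "1",
    "http.group.file.date": "2",
    "http.group.file.size": "3",
}

# Made-up listing lines in the style of an Apache directory index.
DIR_LINE = '<img src="/icons/folder.gif" alt="[DIR]"> <a href="release_1/">release_1/</a>   12-Mar-2015 10:32    -'
FILE_LINE = '<img src="/icons/compressed.gif" alt="[   ]"> <a href="data.fasta.gz">data.fasta.gz</a>   12-Mar-2015 10:32  1.2M'


def parse_listing_line(line):
    """Return a dict describing a directory or file entry, or None."""
    match = re.search(CONFIG["http.parse.dir.line"], line)
    if match:
        return {
            "type": "dir",
            "name": match.group(int(CONFIG["http.group.dir.name"])),
            "date": match.group(int(CONFIG["http.group.dir.date"])),
        }
    match = re.search(CONFIG["http.parse.file.line"], line)
    if match:
        return {
            "type": "file",
            "name": match.group(int(CONFIG["http.group.file.name"])),
            "date": match.group(int(CONFIG["http.group.file.date"])),
            "size": match.group(int(CONFIG["http.group.file.size"])),
        }
    return None


print(parse_listing_line(DIR_LINE))
print(parse_listing_line(FILE_LINE))
```

If a server formats its index differently (for example with the date before the link), the patterns and the http.group.* numbers have to be adapted together so that each number still points at the right capture group.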

More options here.

How to modify and use your own global.properties with Docker?

In the Docker version, BioMAJ uses a standard global.properties file that cannot be modified directly. You can still use your own customized global.properties file by mounting it in every service of the docker-compose.yml file, as follows:

volumes:
    - <path_to_global_file>:/etc/biomaj/global.properties

For example in the biomaj-public-proxy service of the docker-compose.yml file:

biomaj-public-proxy:
    image: osallou/biomaj-proxy
    volumes:
        - ${BIOMAJ_DIR}/proxy/public:/proxy:ro
        - ${BIOMAJ_DIR}/biomaj:/var/lib/biomaj/data
        - <path_to_global_file>:/etc/biomaj/global.properties 
    ports:
        - "5000:80"
    depends_on:
        - biomaj-consulv

Remember to add it to each service: biomaj-mongo, biomaj-redis, biomaj-elasticsearch, biomaj-download-message, etc.
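
Because the mount has to be repeated in every service, it is easy to forget one. The sketch below is one possible way to check a compose file that has already been parsed into a Python dictionary (for instance with a YAML library). The helper services_missing_mount, the host path /opt/biomaj/global.properties and the image name used for biomaj-mongo are illustrative assumptions.

```python
MOUNT_TARGET = "/etc/biomaj/global.properties"


def services_missing_mount(compose, target=MOUNT_TARGET):
    """Return the names of services that do not mount `target`."""
    # Assumption: services sit under "services", or at top level in old files.
    services = compose.get("services", compose)
    missing = []
    for name, service in services.items():
        if not isinstance(service, dict):
            continue  # skip entries such as a top-level "version" string
        volumes = service.get("volumes") or []
        # Short volume syntax is "source:target[:mode]".
        mounted = any(
            isinstance(volume, str) and target in volume.split(":")
            for volume in volumes
        )
        if not mounted:
            missing.append(name)
    return missing


# Hypothetical parsed compose content with one service left unmounted.
compose = {
    "biomaj-public-proxy": {
        "image": "osallou/biomaj-proxy",
        "volumes": [
            "${BIOMAJ_DIR}/proxy/public:/proxy:ro",
            "/opt/biomaj/global.properties:/etc/biomaj/global.properties",
        ],
    },
    "biomaj-mongo": {"image": "mongo"},
}

print(services_missing_mount(compose))  # ['biomaj-mongo']
```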

Example of a global properties file:

[GENERAL]
test=1
root.dir=/<path>/<to>/<biomaj>/<file>
conf.dir=%(root.dir)s/conf
log.dir=%(root.dir)s/log
process.dir=%(root.dir)s/process
cache.dir=%(root.dir)s/cache
# Directory where the bank lock files are stored
lock.dir=%(root.dir)s/lock 
data.dir=%(root.dir)s/db


db.url=mongodb://localhost:27017
db.name=biomaj_test

use_ldap=1
ldap.host=localhost
ldap.port=389
ldap.dn=nodomain

# Use ElasticSearch for index/search capabilities
use_elastic=0
#Comma separated list of elasticsearch nodes host1,host2:port2
elastic_nodes=localhost
elastic_index=biomaj_test

celery.queue=biomaj
celery.broker=mongodb://localhost:27017/biomaj_celery

# Get directory stats (can be time consuming depending on number of files etc...)
data.stats=1

# List of user admin (linux user id, comma separated)
admin=

# Auto publish on updates (no publish flag needed, can be overridden in the bank property file)
auto_publish=0

########################
# Global properties file

 

#To override these settings for a specific database go to its
#properties file and uncomment or add the specific line you want
#to override.

#----------------
# Mail Configuration
#---------------
#Uncomment these lines if you want to receive mail when the workflow is finished

#mail.smtp.host=
mail.admin=
mail.from=
#mail.user=
#mail.password=
#mail.tls=
#---------------------
#Proxy authentication
#---------------------
#proxyHost=
#proxyPort=
#proxyUser=
#proxyPassword=

#Number of threads for processes
bank.num.threads=2

#Number of threads to use for downloading
files.num.threads=4

#to keep more than one release increase this value
keep.old.version=0

#----------------------
# Release configuration
#----------------------
release.separator=_

#The historic log file is generated in log/
#define level information for output : DEBUG,INFO,WARN,ERR
historic.logfile.level=DEBUG

#http.parse.dir.line=<a[\s]+href="([\S]+)/".*alt="\[DIR\]">.*([\d]{2}-[\w\d]{2,5}-[\d]{4}\s[\d]{2}:[\d]{2})
http.parse.dir.line=<img[\s]+src="[\S]+"[\s]+alt="\[DIR\]"[\s]*/?>[\s]*<a[\s]+href="([\S]+)/"[\s]*>.*([\d]{2}-[\w\d]{2,5}-[\d]{4}\s[\d]{2}:[\d]{2})
http.parse.file.line=<img[\s]+src="[\S]+"[\s]+alt="\[[\s]+\]"[\s]*/?>[\s]<a[\s]+href="([\S]+)".*([\d]{2}-[\w\d]{2,5}-[\d]{4}\s[\d]{2}:[\d]{2})[\s]+([\d\.]+[MKG]{0,1})

http.group.dir.name=1
http.group.dir.date=2
http.group.file.name=1
http.group.file.date=2
http.group.file.size=3

 

# Bank default access
visibility.default=public

 

[loggers]
keys = root, biomaj

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = INFO
handlers = console

[logger_biomaj]
level = DEBUG
handlers = console
qualname = biomaj
propagate=0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = DEBUG
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s][%(threadName)s] %(message)s
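
The [loggers], [handlers] and [formatters] sections at the end of the example follow the file format read by Python's logging.config.fileConfig. The sketch below loads just those sections from a temporary file to show their effect. It assumes the file is consumed through fileConfig, which this page does not state explicitly, and the log message is made up.

```python
import logging
import logging.config
import os
import tempfile

# Logging sections copied from the example file above.
LOGGING_SECTIONS = """
[loggers]
keys = root, biomaj

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = INFO
handlers = console

[logger_biomaj]
level = DEBUG
handlers = console
qualname = biomaj
propagate=0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = DEBUG
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s][%(threadName)s] %(message)s
"""

# fileConfig() expects a file, so write the sections to a temporary one.
with tempfile.NamedTemporaryFile("w", suffix=".properties", delete=False) as handle:
    handle.write(LOGGING_SECTIONS)
    path = handle.name

try:
    logging.config.fileConfig(path, disable_existing_loggers=False)
finally:
    os.remove(path)

log = logging.getLogger("biomaj")
log.debug("bank update started")  # written to stderr in the generic format
```

With this configuration the biomaj logger emits DEBUG messages on the console while the root logger stays at INFO, and propagate=0 keeps biomaj messages from being printed a second time by the root handler.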