WHAT IS BIOMAJ?

BioMAJ¹ (BIOlogie Mise A Jour) is databank management software. Its purpose is to keep any set of data up to date. It automates every step of dataset management (update monitoring, download, checking, processing, etc.). The software supports the management of many sequence databanks on a single site. With BioMAJ you can quickly develop new workflows to manage new public datasets.

Why use BioMAJ?

Many technologies, including genomics, transcriptomics, metagenomics and proteomics, now generate massive amounts of data. In most cases, analyzing these data consists of comparing newly produced data with a reference set of information available in databanks.

There are currently many general databanks (such as GenBank, DDBJ and EMBL) as well as specialized databanks for each scientific domain. For biologists and bioinformaticians, the challenge is to keep all their databanks up to date with the latest available versions of genomes, transcriptomes and annotations. BioMAJ helps scientists with this time-consuming task.

To simplify databank management, we have developed BioMAJ. BioMAJ is a workflow engine dedicated to data synchronization and processing. Its purpose is to track and retrieve databank updates. It can also pre-process or post-process each downloaded bank. This tool saves a considerable amount of time: you no longer have to remember when to check a databank for updates or keep track of which version you hold locally.

Technical definition

BioMAJ is an application designed to help maintain a data warehouse and is intended to assist the data administrator. BioMAJ can manage large volumes of similar data, locally consolidate distributed sources available on the network via protocols such as FTP, HTTP and rsync, and then automate more or less complex processing chains on these data. The application automates the update cycle and facilitates supervision of the catalogue of managed databanks while providing a history of maintenance operations. It helps maintain, on a local platform, the main biological databanks provided by the international scientific community. BioMAJ's field of application is nonetheless wider than this and could be extended to any area that manages massive distributed data.

Common usages

The most obvious use is the management of public databases for a bioinformatics facility. Remote databanks (GenBank, for example) are downloaded and processed (BLAST indexing, EMBOSS indexing, …) before being made available to users. Once all processing steps have completed successfully, the bank is copied into a dedicated release directory for production. With cron, updates can be run at regular intervals, and data are downloaded only if a change is detected (based on the version number).
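The update cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BioMAJ's actual implementation or API: the function `update_bank`, the `current_version.txt` marker file and the `release_<version>` directory layout are hypothetical names chosen for the example, and the download and processing steps are passed in as plain callables.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def update_bank(
    bank_dir: Path,
    remote_version: str,
    download: Callable[[Path], None],
    processes: Iterable[Callable[[Path], None]],
) -> bool:
    """Run one update cycle; return True if a new release was published.

    All names and the on-disk layout are hypothetical, for illustration only.
    """
    version_file = bank_dir / "current_version.txt"
    local_version = version_file.read_text().strip() if version_file.exists() else None
    if local_version == remote_version:
        return False  # no change detected upstream: skip the download

    bank_dir.mkdir(parents=True, exist_ok=True)
    # Work in a staging directory so a failed step never touches production.
    staging = Path(tempfile.mkdtemp(prefix="staging_", dir=bank_dir))
    try:
        download(staging)
        for process in processes:
            process(staging)  # e.g. an indexing step; an exception aborts the cycle
        # Every step succeeded: copy the bank into its release directory.
        release_dir = bank_dir / f"release_{remote_version}"
        shutil.copytree(staging, release_dir)
        version_file.write_text(remote_version)
    finally:
        shutil.rmtree(staging, ignore_errors=True)
    return True
```

Scheduling such a function from cron would give the periodic behaviour described above: a run in which the remote version is unchanged returns immediately without downloading anything, and a run in which any step fails leaves the previous release untouched.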

Because any script can be applied to downloaded data, a second usage follows: the generation of derived databanks. BioMAJ is used to update and process several databases on our site: GAG, AMyDB, etc.
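As an illustration of what such a script might look like, the sketch below builds a derived bank by keeping only the records of a FASTA file whose header line contains a given keyword. The function name `derive_fasta_subset`, the FASTA input and the keyword filter are assumptions made for this example; the text does not describe how GAG or AMyDB are actually produced.

```python
from pathlib import Path


def derive_fasta_subset(source: Path, target: Path, keyword: str) -> int:
    """Write to `target` the FASTA records of `source` whose header contains
    `keyword` (case-insensitive); return the number of records kept.

    Hypothetical post-processing step, for illustration only.
    """
    kept = 0
    keep_record = False
    with source.open() as src, target.open("w") as dst:
        for line in src:
            if line.startswith(">"):
                # A header line starts a new record: decide whether to keep it.
                keep_record = keyword.lower() in line.lower()
                if keep_record:
                    kept += 1
            if keep_record:
                dst.write(line)  # header and sequence lines of a kept record
    return kept
```

A script of this kind would run after the download step, reading the freshly retrieved bank and writing the derived bank alongside it.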

¹ Filangi O, Beausse Y, Assi A, et al. BioMAJ: a flexible framework for databanks synchronization and processing. Bioinformatics. 2008;24(16):1823-1825.