Galaxy workflows are scientific data processing pipelines for performing and reproducing data analyses. They are complex and difficult to assemble from the suite of 2,000+ tools available in Galaxy, especially for new Galaxy users. To make creating workflows easier, faster and less error-prone, a recommendation system was developed that predicts which tools could follow the tools already placed in a workflow. The system analyses the complete set of workflows available on Galaxy’s European server and uses a deep learning approach to build a tool prediction model.
Workflows are directed acyclic graphs. To create the predictive model, sequences (paths) of tools are extracted from these graphs and learned by a deep learning model (a gated recurrent neural network). The hyperparameters of the model are optimised using Bayesian optimisation. The usage frequency of tools is integrated into the model so that tools which have not been used recently do not appear in the set of possible tools. This is achieved by learning the usage of each tool over time with a support vector regression model.
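The path extraction step can be illustrated with a minimal sketch. It assumes a workflow is given as a plain dictionary mapping each tool to the tools that consume its output; the function names, the example workflow and the splitting of each path into (preceding tools, next tool) pairs are illustrative assumptions, not the project's actual code or data format.

```python
from typing import Dict, List, Tuple


def extract_tool_paths(workflow: Dict[str, List[str]]) -> List[List[str]]:
    """Return every root-to-leaf tool sequence in a workflow DAG.

    `workflow` maps each tool to the tools that consume its output
    (a hypothetical representation chosen for this sketch).
    """
    # Collect all tools, including those that appear only as successors.
    tools = set(workflow)
    for successors in workflow.values():
        tools.update(successors)

    # Roots are tools that no other tool feeds into.
    has_parent = {t for successors in workflow.values() for t in successors}
    roots = sorted(tools - has_parent)

    paths: List[List[str]] = []

    def walk(tool: str, path: List[str]) -> None:
        path = path + [tool]
        successors = workflow.get(tool, [])
        if not successors:
            # A tool without successors ends the path.
            paths.append(path)
            return
        for next_tool in successors:
            walk(next_tool, path)

    for root in roots:
        walk(root, [])
    return paths


def training_pairs(paths: List[List[str]]) -> List[Tuple[List[str], str]]:
    """Split each path into (preceding tools, next tool) pairs.

    This is one plausible way to frame next-tool prediction; the
    original system may prepare its training data differently.
    """
    pairs = []
    for path in paths:
        for i in range(1, len(path)):
            pairs.append((path[:i], path[i]))
    return pairs


# Illustrative workflow: one input feeding two branches.
example_workflow = {
    "upload": ["fastqc", "trimmomatic"],
    "trimmomatic": ["bowtie2"],
    "bowtie2": ["samtools_sort"],
}

if __name__ == "__main__":
    for prefix, target in training_pairs(extract_tool_paths(example_workflow)):
        print(" -> ".join(prefix), "=>", target)
```

A sequence model such as a gated recurrent network would then be trained to map each sequence of preceding tools to the tool that follows it.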
An API was developed to predict tools, and a user interface visualises the predictions. It can be used in the Galaxy workflow editor (it is not yet publicly available; the complete code is located here). The API can also be used for multiple user interface integrations. With the tool recommendation system, a user does not need to search the tool box for tools when creating a workflow. The possible tools are listed in the “recommended tools” modal popup.