Configuration variables

Cloud provider configuration

GCP project

GCP project ID to use.

Also supported via the environment variable: ELB_GCP_PROJECT.

[cloud-provider]
gcp-project = my-gcp-project

GCP region

Name of the GCP region to use. Recommended value: us-east4.

Also supported via the environment variable: ELB_GCP_REGION.

[cloud-provider]
gcp-region = us-east4

GCP zone

Name of the GCP zone to use. Recommended value: us-east4-b.

Also supported via the environment variable: ELB_GCP_ZONE.

[cloud-provider]
gcp-zone = us-east4-b
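
The snippets above each repeat the [cloud-provider] header for readability. The following is a minimal sketch of how the three GCP settings might sit under a single header and be read back. It assumes the file is parsed with Python's standard configparser module, which this section does not state; CONFIG_TEXT is a hypothetical file body.

```python
import configparser

# Hypothetical file body combining the three GCP snippets under one header.
CONFIG_TEXT = """
[cloud-provider]
gcp-project = my-gcp-project
gcp-region = us-east4
gcp-zone = us-east4-b
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Collect the section into a plain dictionary for inspection.
cloud = dict(parser["cloud-provider"])
for key, value in cloud.items():
    print(f"{key} = {value}")
```

Note that the recommended zone (us-east4-b) lies inside the recommended region (us-east4).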

AWS Region

Name of the AWS region to use. Recommended value: us-east-1.

For background information about AWS regions, please see the AWS documentation.

[cloud-provider]
aws-region = us-east-1

AWS Subnet

Optional: Name of the AWS subnet to use; must exist in the chosen region.

  • Default: None

  • Values: String

[cloud-provider]
aws-subnet = subnet-SOME-RANDOM-STRING

AWS Security Group

Optional: Name of the AWS security group to use; must exist in the chosen region.

  • Default: None

  • Values: String

[cloud-provider]
aws-security-group = sg-SOME-RANDOM-STRING

AWS Key Pair

Optional: Name of the AWS key pair to use to log in to EC2 instances; must exist in the chosen region.

  • Default: None

  • Values: String

[cloud-provider]
aws-key-pair = my-aws-key-name

Cluster configuration

Cluster name

Name of the GKE cluster created.

  • Default: ${USER}-elastic-blast.

  • Values: String

Also supported via the environment variable: ELB_CLUSTER_NAME.

[cluster]
name = my-cluster

Number of worker nodes

Number of worker nodes (of the configured machine type) to start in the Kubernetes cluster.

  • Default: 1

  • Values: Positive integer

[cluster]
num-nodes = 4

Use preemptible nodes

Use preemptible nodes in the Kubernetes cluster.

Note: GCP reboots preemptible nodes after 24 hours. In most cases this is harmless: Kubernetes restarts the node and resubmits the search (i.e., batch) that was interrupted, and batches that have already been processed are not lost. A problem arises only if a single batch takes longer than 24 hours. We expect the overwhelming majority of ElasticBLAST searches to take at most several hours, so this should rarely matter.

  • Default: no

  • Values: Any string. Set to yes to enable.

[cluster]
use-preemptible = yes

Machine type

Type of GCP machine to start as a Kubernetes worker.

NOTE: The machine’s available RAM must be at least as large as the BLASTDB specified by BLAST database.

We recommend that you choose a machine with at least 50% more RAM than the BLASTDB size.

[cluster]
machine-type = n1-standard-32
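
The RAM guidance above can be worked through as simple arithmetic. This is a sketch only: the 80 GB database size is hypothetical, the function names are illustrative, and the 120 GB figure is the memory GCP lists for n1-standard-32.

```python
def ram_is_sufficient(machine_ram_gb: float, blastdb_size_gb: float) -> bool:
    """Hard requirement: RAM must be at least as large as the BLASTDB."""
    return machine_ram_gb >= blastdb_size_gb


def recommended_ram_gb(blastdb_size_gb: float, headroom: float = 0.5) -> float:
    """Recommendation: BLASTDB size plus at least 50% headroom."""
    return blastdb_size_gb * (1.0 + headroom)


# Hypothetical database size, for illustration only.
db_size_gb = 80.0
# Memory GCP lists for the n1-standard-32 machine type.
machine_ram_gb = 120.0

print(ram_is_sufficient(machine_ram_gb, db_size_gb))       # True
print(recommended_ram_gb(db_size_gb))                      # 120.0
print(machine_ram_gb >= recommended_ram_gb(db_size_gb))    # True
```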

Number of CPUs

Number of CPUs to use per BLAST execution in a Kubernetes job.

Must be less than the number of CPUs for the chosen machine type.

For smaller BLAST databases (e.g., swissprot, pdbnt), a smaller value (e.g., 4) results in faster runtimes. For nt, experiment with values of 15 and/or 30.

  • Default: 30

  • Values: Positive integer

[cluster]
num-cpus = 30
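
A minimal sketch of the constraint above, assuming the CPU count of the machine type is known to the caller. The function name is hypothetical; 32 is the vCPU count of n1-standard-32.

```python
def validate_num_cpus(num_cpus: int, machine_cpus: int) -> int:
    """Check that num-cpus is a positive integer strictly below the machine's CPU count."""
    if num_cpus < 1:
        raise ValueError("num-cpus must be a positive integer")
    if num_cpus >= machine_cpus:
        raise ValueError(
            f"num-cpus ({num_cpus}) must be less than the machine's CPUs ({machine_cpus})"
        )
    return num_cpus


# n1-standard-32 provides 32 vCPUs, so the default of 30 passes the check.
print(validate_num_cpus(30, machine_cpus=32))  # 30
```

With these numbers, a value of 32 would be rejected, since the setting must stay below the machine's CPU count.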

Persistent disk size

Size of the persistent disk attached to the cluster. This should be large enough to store the BLAST database, query sequence data and the BLAST results.

Format as <number> immediately followed by G for gigabytes or M for megabytes.

Note: Disks smaller than 1000G result in performance degradation.

  • Default: 3000G

  • Values: String

[cluster]
pd-size = 1000G
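
The size format described above (a number immediately followed by G or M) can be checked with a short pattern. This is an illustrative sketch with a hypothetical function name, not ElasticBLAST's own parser.

```python
import re
from typing import Tuple

# A run of digits immediately followed by a single unit letter, G or M.
_PD_SIZE = re.compile(r"(\d+)([GM])")


def parse_pd_size(value: str) -> Tuple[int, str]:
    """Split a pd-size string such as '3000G' into (number, unit)."""
    match = _PD_SIZE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid pd-size: {value!r}")
    return int(match.group(1)), match.group(2)


print(parse_pd_size("3000G"))  # (3000, 'G')
print(parse_pd_size("500M"))   # (500, 'M')
```

A value such as "1000 G" fails the check, because the unit must follow the number with no space in between.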

Minimum number of nodes

Experimental.

Specifies the minimum number of nodes in the Kubernetes cluster, enabling auto-scaling.

Requires Maximum number of nodes.

  • Default: None

  • Values: Positive integer

[cluster]
min-nodes = 1

Maximum number of nodes

Experimental.

Specifies the maximum number of nodes in the Kubernetes cluster, enabling auto-scaling.

Requires Minimum number of nodes.

  • Default: None

  • Values: Positive integer

[cluster]
max-nodes = 1
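
Because min-nodes and max-nodes require each other, a configuration check might look like the sketch below. The function name is hypothetical, and the rule that min-nodes may not exceed max-nodes is an assumption not stated in this section.

```python
from typing import Optional, Tuple


def validate_autoscaling(
    min_nodes: Optional[int], max_nodes: Optional[int]
) -> Optional[Tuple[int, int]]:
    """Return (min, max) when auto-scaling is configured, or None when it is not."""
    # Each setting requires the other: both set, or both unset.
    if (min_nodes is None) != (max_nodes is None):
        raise ValueError("min-nodes and max-nodes must be set together")
    if min_nodes is None or max_nodes is None:
        return None
    if min_nodes < 1 or max_nodes < 1:
        raise ValueError("min-nodes and max-nodes must be positive integers")
    # Assumption: the minimum should not exceed the maximum.
    if min_nodes > max_nodes:
        raise ValueError("min-nodes must not exceed max-nodes")
    return min_nodes, max_nodes


print(validate_autoscaling(1, 4))        # (1, 4)
print(validate_autoscaling(None, None))  # None
```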

Cloud resource labels

Specifies the labels to attach to cloud resources created by ElasticBLAST.

  • Default: cluster-name={cluster_name},client-hostname={hostname},created={create_date},owner={username},project=elastic-blast,creator={username},program={blast_program},db={db}

  • Values: String of key-value pairs separated by commas. See GCP documentation on labels for details.

[cluster]
labels = key1=value1,key2=value2
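
The labels value is a comma-separated string of key=value pairs. A minimal sketch of splitting it into a dictionary follows; the function name is hypothetical, and GCP's own restrictions on label keys and values are not checked here.

```python
from typing import Dict


def parse_labels(value: str) -> Dict[str, str]:
    """Split 'key1=value1,key2=value2' into a dictionary."""
    labels: Dict[str, str] = {}
    for pair in value.split(","):
        key, separator, label_value = pair.partition("=")
        key = key.strip()
        if not separator or not key:
            raise ValueError(f"malformed label: {pair!r}")
        labels[key] = label_value.strip()
    return labels


print(parse_labels("key1=value1,key2=value2"))
# {'key1': 'value1', 'key2': 'value2'}
```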

BLAST configuration options

BLAST program

BLAST program to run.

  • Default: blastn

  • Values: One of: blastp, blastn, megablast, blastx, tblastn, tblastx, psiblast, rpsblast, rpstblastn

NOTE: Currently, only blastn is supported on AWS.

[blast]
program = blastp

BLAST options

BLAST options to customize BLAST invocation.

Note: the default output format in ElasticBLAST is 11 (BLAST archive).

If you do not specify an output format (with -outfmt), you can use blast_formatter to format the results in any desired output format.

Below, we have specified “-outfmt 7” for the BLAST tabular format and requested blastp-fast mode.

[blast]
options = -task blastp-fast -outfmt 7

BLAST database

BLAST database name to search. To search a database provided in the cloud by the NCBI, use the database name. To search your own custom database, upload the database files to a cloud storage bucket and provide the bucket’s uniform resource identifier (URI) plus the database name (see example below).

  • Default: None

  • Values: String. Run the command below to get a list of available options:

update_blastdb.pl --source gcp --showall pretty

Sample BLAST database configuration:

[blast]
db = nr

Sample custom BLAST database configuration:

[blast]
db = gs://my-database-bucket/mydatabase
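
The two forms of the db value (an NCBI-provided database name, or a bucket URI plus database name) can be told apart by the presence of a URI scheme. This is an illustrative sketch with a hypothetical function name; it does not verify that the database actually exists.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def classify_db(db: str) -> Tuple[str, Optional[str], str]:
    """Return (kind, bucket, database name) for a db setting."""
    parsed = urlparse(db)
    if parsed.scheme:
        # Custom database: the bucket is the URI host, the name is the path.
        return "custom", parsed.netloc, parsed.path.lstrip("/")
    # No scheme: treat the value as an NCBI-provided database name.
    return "ncbi", None, db


print(classify_db("nr"))
# ('ncbi', None, 'nr')
print(classify_db("gs://my-database-bucket/mydatabase"))
# ('custom', 'my-database-bucket', 'mydatabase')
```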

Batch length

Number of bases/residues per query batch.

NOTE: The appropriate value depends on the BLAST program.

  • Default: Auto-configured for supported programs.

  • Values: Positive integer

Also supported via the environment variable: ELB_BATCH_LEN.

[blast]
batch-len = 10000

Input/output configuration options

Query sequence data

Query sequence data for BLAST.

Can be provided as a local path or GCS bucket URI to a single file/tarball.

  • Default: None

  • Values: String

[blast]
queries = /home/${USER}/blast-queries.tar.gz

Results bucket

URI of the GCS or AWS S3 bucket in which to save the output from ElasticBLAST.

Note: This bucket must exist prior to invoking ElasticBLAST and it must include the gs:// or s3:// prefix.

  • Default: gs://${USER}-test for GCP; s3://elasticblast-${USER} for AWS.

  • Values: String

Also supported via the environment variable: ELB_RESULTS_BUCKET.

[blast]
results-bucket = ${YOUR_RESULTS_BUCKET}
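
The prefix requirement noted above can be checked before a search is submitted. The sketch below uses a hypothetical function name and inspects only the form of the URI; whether the bucket exists must be confirmed with the cloud provider's own tools.

```python
ALLOWED_PREFIXES = ("gs://", "s3://")


def validate_results_bucket(uri: str) -> str:
    """Check that a results bucket URI carries the gs:// or s3:// prefix."""
    if not uri.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"results-bucket must start with gs:// or s3://: {uri!r}")
    # Reject a bare prefix with no bucket name after it.
    if uri in ALLOWED_PREFIXES:
        raise ValueError("results-bucket is missing a bucket name")
    return uri


# Hypothetical bucket name, for illustration only.
print(validate_results_bucket("gs://my-results-bucket"))
```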

Timeout configuration options

BLAST timeout

Timeout in minutes after which Kubernetes will terminate a single BLAST job (i.e., the job that corresponds to one query batch).

  • Default: 10080 (1 week)

  • Values: Positive integer

[timeouts]
blast-k8s-job = 10080
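
Timeout values are given in minutes. As a quick check of the arithmetic, the sketch below converts the default to a duration; the helper name is hypothetical.

```python
from datetime import timedelta


def timeout_from_minutes(minutes: int) -> timedelta:
    """Convert a timeout expressed in minutes into a timedelta."""
    if minutes < 1:
        raise ValueError("timeout must be a positive integer number of minutes")
    return timedelta(minutes=minutes)


blast_timeout = timeout_from_minutes(10080)
print(blast_timeout)                        # 7 days, 0:00:00
print(blast_timeout == timedelta(weeks=1))  # True
```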

BLASTDB initialization timeout

Timeout in minutes to wait for the persistent disk to be initialized with the selected BLAST database.

  • Default: 45

  • Values: Positive integer

[timeouts]
init-pv = 45

Developer configuration options

ELB_DONT_DELETE_SETUP_JOBS

Set via an environment variable.

  • Default: Disabled

  • Values: Any string. Set to any value to enable.

Do not delete the Kubernetes setup jobs after they complete.

ELB_PAUSE_AFTER_INIT_PV

Set via an environment variable.

  • Default: 120

  • Values: Positive integer.

Time in seconds to wait after the persistent volume is initialized, to prevent mount errors in BLAST Kubernetes jobs.