Configuration variables¶
Cloud provider configuration¶
GCP project¶
GCP project ID to use.
Default: None
Values: String, see Identifying projects
Also supported via the environment variable ELB_GCP_PROJECT.
[cloud-provider]
gcp-project = my-gcp-project
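Several settings on this page can come either from the configuration file or from an environment variable. The sketch below shows one way a script might resolve such a setting with Python's standard configparser module. The helper `resolve_setting` is hypothetical and not part of ElasticBLAST, and letting the environment variable win over the file is an assumption: this page does not state which source takes precedence.

```python
import configparser
import os


def resolve_setting(config, section, key, env_var, environ=None):
    """Return the environment value if set, else the config-file value, else None.

    ASSUMPTION: the environment variable takes precedence over the file.
    The documentation does not state the actual precedence.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(env_var)
    if value:
        return value
    return config.get(section, key, fallback=None)


config = configparser.ConfigParser()
config.read_string("[cloud-provider]\ngcp-project = my-gcp-project\n")

# No environment variable set: the value comes from the file.
from_file = resolve_setting(
    config, "cloud-provider", "gcp-project", "ELB_GCP_PROJECT", environ={}
)
# Environment variable set: under the stated assumption it wins.
from_env = resolve_setting(
    config, "cloud-provider", "gcp-project", "ELB_GCP_PROJECT",
    environ={"ELB_GCP_PROJECT": "other-project"},
)
print(from_file, from_env)
```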
GCP region¶
Name of the GCP region to use. Recommended value: us-east4.
Default: None
Values: String, see GCP region/zone documentation
Also supported via the environment variable ELB_GCP_REGION.
[cloud-provider]
gcp-region = us-east4
GCP zone¶
Name of the GCP zone to use. Recommended value: us-east4-b.
Default: None
Values: String, see GCP region/zone documentation
Also supported via the environment variable ELB_GCP_ZONE.
[cloud-provider]
gcp-zone = us-east4-b
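GCP zone names consist of a region name followed by a one-letter suffix (us-east4-b lies in us-east4), so the configured zone can be checked against the configured region before launching anything. A minimal sketch, assuming the settings live in a [cloud-provider] section as shown above; the helper `zone_matches_region` is hypothetical and not part of ElasticBLAST.

```python
import configparser


def zone_matches_region(region: str, zone: str) -> bool:
    """Return True if `zone` has the form '<region>-<letter>'."""
    prefix, sep, suffix = zone.rpartition("-")
    return (
        sep == "-"
        and prefix == region
        and len(suffix) == 1
        and suffix.isalpha()
    )


config = configparser.ConfigParser()
config.read_string("""
[cloud-provider]
gcp-region = us-east4
gcp-zone = us-east4-b
""")

section = config["cloud-provider"]
zone_ok = zone_matches_region(section["gcp-region"], section["gcp-zone"])
print(zone_ok)
```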
AWS Region¶
Name of the AWS region to use. Recommended value: us-east-1.
Default: None
Values: String. Any region that supports AWS Batch; see the AWS documentation for details.
For background information about AWS regions, please see the AWS documentation.
[cloud-provider]
aws-region = us-east-1
AWS Subnet¶
Optional: Name of the AWS subnet to use; must exist in the chosen region.
Default: None
Values: String
[cloud-provider]
aws-subnet = subnet-SOME-RANDOM-STRING
Cluster configuration¶
Cluster name¶
Name of the GKE cluster created.
Default: ${USER}-elastic-blast
Values: String
Also supported via the environment variable ELB_CLUSTER_NAME.
[cluster]
name = my-cluster
Number of worker nodes¶
Number of worker nodes (of the configured machine type) to start in the Kubernetes cluster.
Default: 1
Values: Positive integer
[cluster]
num-nodes = 4
Use preemptible nodes¶
Use preemptible nodes in the Kubernetes cluster.
Note: Preemptible nodes are rebooted by GCP after 24 hours. This is fine in most cases, because Kubernetes restarts the node and resubmits the search (i.e., batch) that was interrupted. Batches that have already been processed are not lost. A problem arises only if a single batch takes longer than 24 hours. We expect the overwhelming majority of ElasticBLAST searches to take at most several hours, so this should not be an issue.
Default: no
Values: Any string. Set to yes to enable.
[cluster]
use-preemptible = yes
Machine type¶
Type of GCP machine to start as a Kubernetes worker.
NOTE: The machine’s available RAM must be at least as large as the BLASTDB specified by BLAST database.
We recommend that you choose a machine with at least 50% more RAM than the BLASTDB size.
Default: n1-standard-32
Values: String, see GCP machine types
[cluster]
machine-type = n1-standard-32
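The RAM guidance above is simple arithmetic: the hard requirement is RAM of at least the BLASTDB size, and the recommendation adds 50% on top. The sketch below works through it; the function names and the 60 GB database size are illustrative assumptions, and the 120 GB figure for n1-standard-32 is taken from GCP's published machine type specifications rather than from this page.

```python
def recommended_ram_gb(blastdb_size_gb: float, headroom: float = 0.5) -> float:
    """RAM recommended for a database: its size plus 50% headroom by default."""
    return blastdb_size_gb * (1.0 + headroom)


def ram_is_sufficient(machine_ram_gb: float, blastdb_size_gb: float) -> bool:
    """Hard requirement: machine RAM is at least the BLASTDB size."""
    return machine_ram_gb >= blastdb_size_gb


# Hypothetical 60 GB database on an n1-standard-32 (120 GB RAM per GCP specs).
db_size_gb = 60.0
machine_ram_gb = 120.0

print(recommended_ram_gb(db_size_gb))                     # recommended RAM in GB
print(ram_is_sufficient(machine_ram_gb, db_size_gb))      # meets the hard requirement?
print(machine_ram_gb >= recommended_ram_gb(db_size_gb))   # meets the recommendation?
```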
Number of CPUs¶
Number of CPUs to use per BLAST execution in a Kubernetes job.
Must be less than the number of CPUs for the chosen machine type.
For smaller BLAST databases (e.g., swissprot, pdbnt), a smaller value (e.g., 4) results in faster runtimes. For nt, experiment using values of 15 and/or 30.
Default: 30
Values: Positive integer
[cluster]
num-cpus = 30
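The constraint that num-cpus stay below the machine's CPU count can be checked up front. A rough sketch: it assumes the machine type name ends in its vCPU count, which holds for names such as n1-standard-32 but not for every GCP machine type, and both helper functions are hypothetical.

```python
import re


def vcpus_from_machine_type(machine_type: str) -> int:
    """Read the vCPU count from a name like 'n1-standard-32'.

    ASSUMPTION: the name ends in '-<vCPU count>'. Types that do not
    follow this pattern need a lookup table instead.
    """
    match = re.search(r"-(\d+)$", machine_type)
    if match is None:
        raise ValueError(f"cannot infer vCPU count from {machine_type!r}")
    return int(match.group(1))


def num_cpus_is_valid(num_cpus: int, machine_type: str) -> bool:
    """num-cpus must be positive and less than the machine's CPU count."""
    return 0 < num_cpus < vcpus_from_machine_type(machine_type)


print(num_cpus_is_valid(30, "n1-standard-32"))
print(num_cpus_is_valid(32, "n1-standard-32"))
```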
Persistent disk size¶
Size of the persistent disk attached to the cluster. This should be large enough to store the BLAST database, query sequence data and the BLAST results.
Format as <number> immediately followed by G for gigabytes, M for megabytes.
Note: Disks smaller than 1000G result in performance degradation.
Default: 3000G
Values: String
[cluster]
pd-size = 1000G
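The size format described above (a number immediately followed by G or M) is easy to validate before submitting a search. A minimal sketch: the helper `size_to_gigabytes` is hypothetical, and treating 1G as 1024M is an assumption, since this page does not say whether the units are decimal or binary.

```python
import re

# A number (optionally with a decimal part) immediately followed by G or M.
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)([GM])$")


def size_to_gigabytes(value: str) -> float:
    """Convert a size string such as '1000G' or '512M' to gigabytes.

    ASSUMPTION: 1G equals 1024M.
    """
    match = _SIZE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"invalid size {value!r}: expected e.g. '1000G' or '512M'")
    number, unit = float(match.group(1)), match.group(2)
    return number if unit == "G" else number / 1024


pd_size_gb = size_to_gigabytes("1000G")
if pd_size_gb < 1000:
    print("warning: disks smaller than 1000G degrade performance")
print(pd_size_gb)
```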
Minimum number of nodes¶
Experimental.
Specifies the minimum number of nodes in the Kubernetes cluster and enables auto-scaling.
Requires Maximum number of nodes to be set as well.
Default: None
Values: Positive integer
[cluster]
min-nodes = 1
Maximum number of nodes¶
Experimental.
Specifies the maximum number of nodes in the Kubernetes cluster and enables auto-scaling.
Requires Minimum number of nodes to be set as well.
Default: None
Values: Positive integer
[cluster]
max-nodes = 1
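Because min-nodes and max-nodes each require the other, a configuration check can reject a half-specified auto-scaling range. The sketch below is illustrative only: `autoscaling_range` is a hypothetical helper, and requiring min-nodes to be no greater than max-nodes is an assumption this page does not state explicitly.

```python
def autoscaling_range(cluster):
    """Return (min_nodes, max_nodes), or None if auto-scaling is not configured.

    `cluster` is any mapping of [cluster] settings to string values.
    """
    min_nodes = cluster.get("min-nodes")
    max_nodes = cluster.get("max-nodes")
    if min_nodes is None and max_nodes is None:
        return None
    if min_nodes is None or max_nodes is None:
        raise ValueError("min-nodes and max-nodes must be set together")
    low, high = int(min_nodes), int(max_nodes)
    # ASSUMPTION: both are positive and min-nodes does not exceed max-nodes.
    if low < 1 or high < low:
        raise ValueError("expected 1 <= min-nodes <= max-nodes")
    return low, high


print(autoscaling_range({"min-nodes": "1", "max-nodes": "4"}))
print(autoscaling_range({"num-nodes": "4"}))
```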
Cloud resource labels¶
Specifies the labels to attach to cloud resources created by ElasticBLAST.
Default: cluster-name={cluster_name},client-hostname={hostname},created={create_date},owner={username},project=elastic-blast,creator={username},program={blast_program},db={db}
Values: String of key-value pairs separated by commas. See GCP documentation on labels for details.
[cluster]
labels = key1=value1,key2=value2
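The labels value is a comma-separated list of key=value pairs, which can be split into a dictionary for inspection. A small sketch with a hypothetical helper `parse_labels`; it checks only the overall shape and does not enforce the provider's rules on label characters or length.

```python
def parse_labels(labels: str) -> dict:
    """Split 'key1=value1,key2=value2' into a dictionary."""
    result = {}
    for pair in labels.split(","):
        key, sep, value = pair.strip().partition("=")
        if not sep or not key:
            raise ValueError(f"malformed label {pair!r}: expected key=value")
        result[key] = value
    return result


print(parse_labels("key1=value1,key2=value2"))
```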
BLAST configuration options¶
BLAST program¶
BLAST program to run.
Default: blastn
Values: One of: blastp, blastn, megablast, blastx, tblastn, tblastx, psiblast, rpsblast, rpstblastn
NOTE: Currently only blastn is supported in AWS.
[blast]
program = blastp
BLAST options¶
BLAST options to customize BLAST invocation.
Note: the default output format in ElasticBLAST is 11 (BLAST archive).
If you do not specify an output format (with -outfmt), you can use blast_formatter to format the results in any desired output format.
Below, we have specified “-outfmt 7” for the BLAST tabular format and requested blastp-fast mode.
Default: None
Values: String, see BLAST+ options
[blast]
options = -task blastp-fast -outfmt 7
BLAST database¶
BLAST database name to search. To search a database provided in the cloud by the NCBI, use the database name. To search your own custom database, upload the database files to a cloud storage bucket and provide the bucket’s uniform resource identifier (URI) plus the database name (see example below).
Default: None
Values: String. Run the command below to get a list of available options:
update_blastdb.pl --source gcp --showall pretty
[blast]
db = nr
[blast]
db = gs://my-database-bucket/mydatabase
Batch length¶
Number of bases/residues per query batch.
NOTE: The appropriate value depends on the BLAST program.
Default: Auto-configured for supported programs.
Values: Positive integer
Also supported via the environment variable ELB_BATCH_LEN.
[blast]
batch-len = 10000
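To illustrate what a batch length means in practice, the sketch below groups query sequences so that each batch holds at most roughly batch-len letters. This is an assumption made for illustration, not a description of how ElasticBLAST actually splits queries; the function `split_into_batches` and the sample sequences are hypothetical.

```python
def split_into_batches(sequences, batch_len):
    """Group (id, sequence) pairs into batches of at most `batch_len` letters.

    A sequence longer than `batch_len` gets a batch of its own.
    ASSUMPTION: sequences are kept whole and in their original order.
    """
    batches, current, current_len = [], [], 0
    for seq_id, seq in sequences:
        # Start a new batch if adding this sequence would exceed the limit.
        if current and current_len + len(seq) > batch_len:
            batches.append(current)
            current, current_len = [], 0
        current.append((seq_id, seq))
        current_len += len(seq)
    if current:
        batches.append(current)
    return batches


queries = [("q1", "A" * 6000), ("q2", "C" * 5000), ("q3", "G" * 3000)]
batches = split_into_batches(queries, batch_len=10000)
batch_ids = [[seq_id for seq_id, _ in batch] for batch in batches]
print(batch_ids)
```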
Memory request for BLAST search¶
Minimum amount of RAM to allocate to a BLAST search.
Format as <number> immediately followed by G for gigabytes, M for megabytes.
Must be less than available RAM for the chosen machine type.
Default: Auto-configured based on database choice. Minimal value is 0.5G.
Values: String
See also:
[blast]
mem-request = 95G
Memory limit for BLAST search¶
Maximum amount of RAM that a BLAST search can use.
Format as <number> immediately followed by G for gigabytes, M for megabytes.
Must be less than available RAM for the chosen machine type.
Default: Auto-configured based on database choice. Maximal value is 0.95 of the RAM available in the machine type.
Values: String
See also:
[blast]
mem-limit = 115G
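The memory request and limit can be sanity-checked together against the machine's RAM. The sketch below reads the minimal value (0.5G) and maximal value (0.95 of machine RAM) quoted above as bounds on user-supplied settings, which is an assumption. It also relies on the general Kubernetes rule that a request may not exceed its limit. The function name and the RAM figures are illustrative.

```python
MIN_REQUEST_GB = 0.5        # minimal value quoted for mem-request
MAX_LIMIT_FRACTION = 0.95   # maximal fraction of machine RAM quoted for mem-limit


def memory_settings_are_valid(mem_request_gb, mem_limit_gb, machine_ram_gb):
    """Check 0.5G <= request <= limit <= 0.95 * machine RAM (all in GB)."""
    return (
        MIN_REQUEST_GB
        <= mem_request_gb
        <= mem_limit_gb
        <= MAX_LIMIT_FRACTION * machine_ram_gb
    )


# Hypothetical machines with 128 GB and 100 GB of RAM.
print(memory_settings_are_valid(95, 115, 128))
print(memory_settings_are_valid(95, 115, 100))
```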
Input/output configuration options¶
Query sequence data¶
Query sequence data for BLAST.
Can be provided as a local path or GCS bucket URI to a single file/tarball.
Default: None
Values: String
[blast]
queries = /home/${USER}/blast-queries.tar.gz
Results bucket¶
GCS or AWS S3 bucket URI in which to save the output from ElasticBLAST.
Note: This bucket must exist prior to invoking ElasticBLAST, and the URI must include the gs:// or s3:// prefix.
Default: gs://${USER}-test for GCP; s3://elasticblast-${USER} for AWS.
Values: String
Also supported via the environment variable ELB_RESULTS_BUCKET.
[blast]
results-bucket = ${YOUR_RESULTS_BUCKET}
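Since the results bucket must carry a gs:// or s3:// prefix, the prefix can be validated, and the provider inferred, before anything is launched. A minimal sketch with a hypothetical helper `bucket_provider`; it checks only the URI's form and does not verify that the bucket exists.

```python
def bucket_provider(uri: str) -> str:
    """Return 'gcp' for gs:// URIs and 'aws' for s3:// URIs."""
    prefixes = {"gs://": "gcp", "s3://": "aws"}
    for prefix, provider in prefixes.items():
        # Require a non-empty bucket name after the prefix.
        if uri.startswith(prefix) and len(uri) > len(prefix):
            return provider
    raise ValueError(f"results bucket {uri!r} must start with gs:// or s3://")


print(bucket_provider("gs://my-results-bucket"))
print(bucket_provider("s3://elasticblast-results"))
```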
Timeout configuration options¶
BLAST timeout¶
Timeout in minutes after which Kubernetes will terminate a single BLAST job (i.e., the job that corresponds to one of the query batches).
Default: 10080 (1 week)
Values: Positive integer
[timeouts]
blast-k8s-job = 10080
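Timeouts in this section are given in minutes, so it can help to convert them when reasoning about run times. The short sketch below confirms that the default of 10080 minutes is exactly one week; the helper `timeout_from_minutes` is hypothetical.

```python
from datetime import timedelta


def timeout_from_minutes(minutes: int) -> timedelta:
    """Convert a timeout given in minutes to a timedelta."""
    if minutes <= 0:
        raise ValueError("timeout must be a positive integer")
    return timedelta(minutes=minutes)


blast_timeout = timeout_from_minutes(10080)
print(blast_timeout == timedelta(weeks=1))
```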
BLASTDB initialization timeout¶
Timeout in minutes to wait for the persistent disk to be initialized with the selected BLAST database.
Default: 45
Values: Positive integer
[timeouts]
init-pv = 45
Developer configuration options¶
ELB_DONT_DELETE_SETUP_JOBS¶
Set via an environment variable.
Default: Disabled
Values: Any string. Set to any value to enable.
Do not delete the Kubernetes setup jobs after they complete.
ELB_PAUSE_AFTER_INIT_PV¶
Set via an environment variable.
Default: 120
Values: Positive integer.
Time in seconds to wait after the persistent volume is initialized, to prevent mount errors in BLAST Kubernetes jobs.