Variants Batch

The VariantsBatch component of the Genomcore API allows you to store and query genetic variant observations.

Get more information about the Variant API here

Create

Create variant observations. The following code loads an .env file, starts the logging module and initializes the SDK. It assumes that the user already has a list of observations (a list of dictionaries), for example built from a CSV file with the necessary columns, and sends them to the Variant API in chunks. The chunk size is set with the chunksize argument (for example, 300 observations per chunk; the short example below uses 15).

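import logging
import os

from dotenv import load_dotenv
from genomcore.client import GenomcoreApiClient

# Configure logging and load TOKEN / REFRESH_TOKEN from the .env file
logging.basicConfig(level=logging.INFO)
load_dotenv()

api = GenomcoreApiClient(
    token=os.getenv("TOKEN"),
    refresh_token=os.getenv("REFRESH_TOKEN")
)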

single_observation={
  "collection":"kafka",
  "Position": {
    "VCF_Genome": "hg38",
    "VCF_Chr": "chr1",
    "VCF_Position": 1013466
  },
  "Genotype": {
    "VCF_Allele_REF": "T",
    "VCF_Allele_ALT": "TA",
    "INFO_CALC_Genotype": "T>TA/TA",
    "VCF_FORMAT_GT": "1|1",
    "VCF_FORMAT_GQ": 99,
    "VCF_FORMAT_PL": "2246,156,0",
    "VCF_FORMAT_PS": 1013466,
    "VCF_INFO_AC": "2",
    "VCF_INFO_AF": "1",
    "VCF_INFO_AN": 2,
    "INFO_CSQ_Allele": "A",
    "INFO_CSQ_ZYG": "HOM",
    "INFO_CSQ_ALLELE_NUM": "1"
  },
  "Calling_statistics": {
    "VCF_Quality": 2232.03,
    "VCF_Filter": "PASS",
    "VCF_FORMAT_DP": 52,
    "INFO_CALC_Read_depth_REF": 0,
    "INFO_CALC_Read_depth_ALT": 52,
    "INFO_CALC_Read_depth": 52,
    "INFO_CALC_Read_percentage_REF": 0,
    "INFO_CALC_Read_percentage_ALT": 100,
    "VCF_INFO_DP": 60,
    "VCF_INFO_MQ": 60
  },
  "origin": "GERMLINE",
  "type": "SNV/INDEL",
  "Feature": {
    "INFO_CSQ_ENSConsequence": "upstream_gene_variant",
    "INFO_CSQ_ENSFeature_type": "Transcript",
    "INFO_CSQ_ENSFeature": "ENST00000649529",
    "INFO_CSQ_ENSCANONICAL": "YES",
    "INFO_CSQ_PUBMED": "25741868",
    "INFO_CSQ_Consequence": "upstream_gene_variant",
    "INFO_CSQ_IMPACT": "MODIFIER",
    "INFO_CSQ_Feature_type": "Transcript",
    "INFO_CSQ_Feature": "NM_005101.4",
    "INFO_CSQ_BIOTYPE": "protein_coding",
    "INFO_CSQ_DISTANCE": "30",
    "INFO_CSQ_STRAND": "1"
  },
  "Gene": {
    "INFO_CSQ_SYMBOL": "ISG15",
    "INFO_CSQ_Gene": "ENSG00000187608",
    "INFO_CSQ_SYMBOL_SOURCE": "HGNC",
    "INFO_CSQ_HGNC_ID": "HGNC:4053"
  },
  "External_databases": {
    "INFO_CSQ_SWISSPROT": "P05161.238",
    "INFO_CSQ_UNIPARC": "UPI0000048D70"
  },
  "Clinical_significance": {
    "INFO_CSQ_GENE_PHENO": "1",
    "INFO_IV_ACMG": "Benign"
  },
  "Population_AF": {
    "INFO_CSQ_1KG_ALL_AF": "0.8544",
    "INFO_CSQ_1KG_AFR_AF": 0.5575,
    "INFO_CSQ_1KG_AMR_AF": 0.9366,
    "INFO_CSQ_1KG_EAS_AF": 0.9464,
    "INFO_CSQ_1KG_EUR_AF": 0.9791,
    "INFO_CSQ_1KG_SAS_AF": 0.9744,
    "INFO_CSQ_gnomADe_AF": 0.959,
    "INFO_CSQ_gnomADe_AFR_AF": 0.5708,
    "INFO_CSQ_gnomADe_AMR_AF": 0.9506,
    "INFO_CSQ_gnomADe_ASJ_AF": 0.9217,
    "INFO_CSQ_gnomADe_EAS_AF": 0.9667,
    "INFO_CSQ_gnomADe_FIN_AF": 0.985,
    "INFO_CSQ_gnomADe_NFE_AF": 0.9743,
    "INFO_CSQ_gnomADe_SAS_AF": 0.9567,
    "INFO_CSQ_gnomADg_AF": 0.8635,
    "INFO_CSQ_gnomADg_AFR_AF": 0.5936,
    "INFO_CSQ_gnomADg_AMI_AF": 0.9945,
    "INFO_CSQ_gnomADg_AMR_AF": 0.931,
    "INFO_CSQ_gnomADg_ASJ_AF": 0.928,
    "INFO_CSQ_gnomADg_EAS_AF": 0.9509,
    "INFO_CSQ_gnomADg_FIN_AF": 0.9859,
    "INFO_CSQ_gnomADg_MID_AF": 0.9048,
    "INFO_CSQ_gnomADg_NFE_AF": 0.9741,
    "INFO_CSQ_gnomADg_SAS_AF": 0.9621
  },
  "Existing": {
    "INFO_CSQ_PHENO": "1",
    "INFO_CSQ_dbSNP_RS": "rs3841266"
  },
  "uri": "chr1::1013466.0::T::TA::NM_005101.4"
}

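# Build a list of 11 identical observations for illustration
observations = [single_observation.copy() for _ in range(11)]

# Send the observations in chunks of up to 15 and wait until every chunk has finished
total = api.kafka.create_observations_batch(observations=observations, chunksize=15)

# Alternatively, send everything as a single batch, without chunking or waiting
total = api.kafka.create_one_observations_batch(observations=observations)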
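In practice the list of observations would usually come from a file rather than from copies of one dictionary. The following is a minimal sketch of building it from CSV with the standard library. The column names (chrom, pos, ref, alt, transcript), the sample rows and the helpers row_to_observation and observations_from_csv are hypothetical, and the uri layout only loosely mimics the example above.

```python
import csv
import io

# Hypothetical CSV content; a real file would be opened with open(path, newline="")
CSV_TEXT = """chrom,pos,ref,alt,transcript
chr1,1013466,T,TA,NM_005101.4
chr1,1013541,T,C,NM_005101.4
"""


def row_to_observation(row, collection="kafka"):
    """Map one CSV row (hypothetical columns) to an observation dictionary."""
    pos = int(row["pos"])
    return {
        "collection": collection,
        "origin": "GERMLINE",
        "type": "SNV/INDEL",
        "Position": {
            "VCF_Genome": "hg38",
            "VCF_Chr": row["chrom"],
            "VCF_Position": pos,
        },
        "Genotype": {
            "VCF_Allele_REF": row["ref"],
            "VCF_Allele_ALT": row["alt"],
        },
        # The uri is not checked by the API; this layout is only an illustration
        "uri": f'{row["chrom"]}::{pos}::{row["ref"]}::{row["alt"]}::{row["transcript"]}',
    }


def observations_from_csv(handle):
    """Read every row of an open CSV handle into a list of observations."""
    return [row_to_observation(row) for row in csv.DictReader(handle)]


observations = observations_from_csv(io.StringIO(CSV_TEXT))
print(len(observations), observations[0]["uri"])
```

The resulting list can then be passed as the observations argument of create_observations_batch.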

Warning

The required fields for a variant observation are uri, origin, type and collection.

Variants of multiple types can be uploaded in the same request. The available types are “SNV/INDEL”, “CNV” and “SV”.

The values for collection and uri are not checked by the API. The user must provide these values at their own discretion.
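Because collection and uri are not checked by the API, a small client-side check before uploading may help catch incomplete observations early. The sketch below is not the API's own validation; the helper check_observation and the sample dictionaries are hypothetical, and only the field names and types listed above are taken from this page.

```python
# Required fields and available types as listed in the warning above
REQUIRED_FIELDS = ("uri", "origin", "type", "collection")
ALLOWED_TYPES = {"SNV/INDEL", "CNV", "SV"}


def check_observation(observation):
    """Return a list of problems found in one observation (empty if none)."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if name not in observation
    ]
    if "type" in observation and observation["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {observation['type']}")
    return problems


# Hypothetical observations used only to exercise the check
good = {
    "uri": "chr1::1013466.0::T::TA::NM_005101.4",
    "origin": "GERMLINE",
    "type": "SNV/INDEL",
    "collection": "kafka",
}
bad = {"uri": "x", "type": "INDEL"}

print(check_observation(good))
print(check_observation(bad))
```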

Warning

The variant observation template is flexible and defined on a per-project basis. The complete list of fields can be obtained using the method get_template().

Note

The method create_observations_batch waits until all chunks have finished.

Note

The method create_one_observations_batch does not split the observations into chunks and does not wait for the batch to finish.
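To make the effect of chunksize concrete, the sketch below splits a list into consecutive chunks. It illustrates the general idea only and is not the SDK's internal implementation; the helper split_into_chunks and the placeholder observations are hypothetical.

```python
def split_into_chunks(items, chunksize):
    """Split a list into consecutive chunks of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


# Placeholder observations, standing in for real observation dictionaries
observations = [{"uri": f"obs-{n}"} for n in range(11)]

# With 11 observations and chunksize=15, everything fits in a single chunk
print([len(chunk) for chunk in split_into_chunks(observations, 15)])

# A smaller chunksize produces several chunks, the last one possibly shorter
print([len(chunk) for chunk in split_into_chunks(observations, 4)])
```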

Status

This example queries the status of an observation batch by its batch ID.

from genomcore.client import GenomcoreApiClient

api = GenomcoreApiClient(token="A_VALID_TOKEN", refresh_token="A_VALID_REFRESH_TOKEN")

status=api.kafka.get_status(observationBatchId='67ffd46f709f4a3c230dfe02')

Note

The query returns a dict with the following format:

{
  'id': '67ffd46f709f4a3c230dfe02',
  'projectId': 550,
  'observationsTotal': <total number of observations in the batch>,
  'observationsCreatedTotal': <number of successfully created observations>,
  'observationsValidationFailedTotal': <number of observations that failed validation>,
  'observationsValidationFailed': <list of observations that failed validation, with detailed errors>,
  'observationsFailedTotal': <number of observations that failed to create>,
  'observationsFailed': <list of observations that failed, with their error messages>,
  'createdAt': '2025-04-16T16:01:51.276Z',
  'updatedAt': '2025-04-16T16:01:51.344Z',
  'status': <string with the status: COMPLETED, FAILED or PENDING>
}
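A status dict of this shape can be reduced to a short summary before deciding what to do next. The following is a rough sketch: the helper summarize_status and the counts in the example dict are hypothetical, and only the key names and status values are taken from the format above.

```python
def summarize_status(status):
    """Condense a batch status dict into a few values that are easy to act on."""
    total = status.get("observationsTotal", 0)
    created = status.get("observationsCreatedTotal", 0)
    invalid = status.get("observationsValidationFailedTotal", 0)
    failed = status.get("observationsFailedTotal", 0)
    state = status.get("status")
    return {
        # PENDING means the batch is still being processed
        "finished": state in ("COMPLETED", "FAILED"),
        "all_created": state == "COMPLETED" and created == total,
        "problems": invalid + failed,
    }


# Hypothetical status with made-up counts, shaped like the format above
example_status = {
    "id": "67ffd46f709f4a3c230dfe02",
    "projectId": 550,
    "observationsTotal": 11,
    "observationsCreatedTotal": 9,
    "observationsValidationFailedTotal": 2,
    "observationsValidationFailed": [],
    "observationsFailedTotal": 0,
    "observationsFailed": [],
    "status": "COMPLETED",
}

print(summarize_status(example_status))
```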

Wait

This method waits until all observations have finished processing.

from genomcore.client import GenomcoreApiClient

api = GenomcoreApiClient(token="A_VALID_TOKEN", refresh_token="A_VALID_REFRESH_TOKEN")

status=api.kafka.wait()
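How wait() works internally is not described here. As an illustration of what waiting for a batch can look like, the sketch below polls a status function until the batch is no longer PENDING. The helper wait_for_batch, its parameters and the fake status function are hypothetical; with a real client the status function could wrap api.kafka.get_status.

```python
import time


def wait_for_batch(get_status, batch_id, interval=2.0, max_attempts=30):
    """Poll `get_status(batch_id)` until the batch leaves the PENDING state."""
    for _ in range(max_attempts):
        status = get_status(batch_id)
        if status["status"] != "PENDING":
            return status
        time.sleep(interval)
    raise TimeoutError(f"batch {batch_id} still pending after {max_attempts} attempts")


# Fake status function that reports PENDING twice and then COMPLETED
_responses = iter(["PENDING", "PENDING", "COMPLETED"])


def fake_get_status(batch_id):
    return {"id": batch_id, "status": next(_responses)}


final = wait_for_batch(fake_get_status, "67ffd46f709f4a3c230dfe02", interval=0)
print(final["status"])
```

With the SDK, get_status could be passed as lambda batch_id: api.kafka.get_status(observationBatchId=batch_id).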