Variants Batch¶
The VariantsBatch component of the Genomcore API allows you to store and query genetic variant observations.
Get more information about the Variant API here
Create¶
Create variant observations. The following code loads an .env file, starts the SDK, builds a list of observations (a list of dictionaries) and sends it to the Variant API in chunks.
The observations can also be built from a CSV file with the necessary columns. Large uploads are typically sent in chunks of, for example, 300 observations.
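import os

from dotenv import load_dotenv
from genomcore.client import GenomcoreApiClient

# Load TOKEN and REFRESH_TOKEN from a local .env file into the environment
load_dotenv()
api = GenomcoreApiClient(
    token=os.getenv("TOKEN"),
    refresh_token=os.getenv("REFRESH_TOKEN"),
)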
single_observation={
"collection":"kafka",
"Position": {
"VCF_Genome": "hg38",
"VCF_Chr": "chr1",
"VCF_Position": 1013466
},
"Genotype": {
"VCF_Allele_REF": "T",
"VCF_Allele_ALT": "TA",
"INFO_CALC_Genotype": "T>TA/TA",
"VCF_FORMAT_GT": "1|1",
"VCF_FORMAT_GQ": 99,
"VCF_FORMAT_PL": "2246,156,0",
"VCF_FORMAT_PS": 1013466,
"VCF_INFO_AC": "2",
"VCF_INFO_AF": "1",
"VCF_INFO_AN": 2,
"INFO_CSQ_Allele": "A",
"INFO_CSQ_ZYG": "HOM",
"INFO_CSQ_ALLELE_NUM": "1"
},
"Calling_statistics": {
"VCF_Quality": 2232.03,
"VCF_Filter": "PASS",
"VCF_FORMAT_DP": 52,
"INFO_CALC_Read_depth_REF": 0,
"INFO_CALC_Read_depth_ALT": 52,
"INFO_CALC_Read_depth": 52,
"INFO_CALC_Read_percentage_REF": 0,
"INFO_CALC_Read_percentage_ALT": 100,
"VCF_INFO_DP": 60,
"VCF_INFO_MQ": 60
},
"origin": "GERMLINE",
"type": "SNV/INDEL",
"Feature": {
"INFO_CSQ_ENSConsequence": "upstream_gene_variant",
"INFO_CSQ_ENSFeature_type": "Transcript",
"INFO_CSQ_ENSFeature": "ENST00000649529",
"INFO_CSQ_ENSCANONICAL": "YES",
"INFO_CSQ_PUBMED": "25741868",
"INFO_CSQ_Consequence": "upstream_gene_variant",
"INFO_CSQ_IMPACT": "MODIFIER",
"INFO_CSQ_Feature_type": "Transcript",
"INFO_CSQ_Feature": "NM_005101.4",
"INFO_CSQ_BIOTYPE": "protein_coding",
"INFO_CSQ_DISTANCE": "30",
"INFO_CSQ_STRAND": "1"
},
"Gene": {
"INFO_CSQ_SYMBOL": "ISG15",
"INFO_CSQ_Gene": "ENSG00000187608",
"INFO_CSQ_SYMBOL_SOURCE": "HGNC",
"INFO_CSQ_HGNC_ID": "HGNC:4053"
},
"External_databases": {
"INFO_CSQ_SWISSPROT": "P05161.238",
"INFO_CSQ_UNIPARC": "UPI0000048D70"
},
"Clinical_significance": {
"INFO_CSQ_GENE_PHENO": "1",
"INFO_IV_ACMG": "Benign"
},
"Population_AF": {
"INFO_CSQ_1KG_ALL_AF": "0.8544",
"INFO_CSQ_1KG_AFR_AF": 0.5575,
"INFO_CSQ_1KG_AMR_AF": 0.9366,
"INFO_CSQ_1KG_EAS_AF": 0.9464,
"INFO_CSQ_1KG_EUR_AF": 0.9791,
"INFO_CSQ_1KG_SAS_AF": 0.9744,
"INFO_CSQ_gnomADe_AF": 0.959,
"INFO_CSQ_gnomADe_AFR_AF": 0.5708,
"INFO_CSQ_gnomADe_AMR_AF": 0.9506,
"INFO_CSQ_gnomADe_ASJ_AF": 0.9217,
"INFO_CSQ_gnomADe_EAS_AF": 0.9667,
"INFO_CSQ_gnomADe_FIN_AF": 0.985,
"INFO_CSQ_gnomADe_NFE_AF": 0.9743,
"INFO_CSQ_gnomADe_SAS_AF": 0.9567,
"INFO_CSQ_gnomADg_AF": 0.8635,
"INFO_CSQ_gnomADg_AFR_AF": 0.5936,
"INFO_CSQ_gnomADg_AMI_AF": 0.9945,
"INFO_CSQ_gnomADg_AMR_AF": 0.931,
"INFO_CSQ_gnomADg_ASJ_AF": 0.928,
"INFO_CSQ_gnomADg_EAS_AF": 0.9509,
"INFO_CSQ_gnomADg_FIN_AF": 0.9859,
"INFO_CSQ_gnomADg_MID_AF": 0.9048,
"INFO_CSQ_gnomADg_NFE_AF": 0.9741,
"INFO_CSQ_gnomADg_SAS_AF": 0.9621
},
"Existing": {
"INFO_CSQ_PHENO": "1",
"INFO_CSQ_dbSNP_RS": "rs3841266"
},
"uri": "chr1::1013466.0::T::TA::NM_005101.4"
}
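# Build a list of 11 observations (shallow copies of the same example)
observations = [single_observation.copy() for _ in range(11)]
# Send in chunks (chunksize=15) and wait until every chunk has finished
total = api.kafka.create_observations_batch(observations=observations, chunksize=15)
# Alternative: send everything as one batch, without chunking or waiting
total = api.kafka.create_one_observations_batch(observations=observations)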
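When the observations come from a CSV file rather than an in-memory list, the list of dictionaries can be assembled with the standard csv module. The following is a minimal sketch, not part of the SDK: the column names (chrom, pos, ref, alt, transcript), the helper names and the sample rows are hypothetical, and the uri only loosely follows the pattern of the example above.

```python
import csv
import io

# Hypothetical CSV layout: these column names are assumptions for illustration.
SAMPLE_CSV = """chrom,pos,ref,alt,transcript
chr1,1013466,T,TA,NM_005101.4
chr1,1014143,C,T,NM_005101.4
"""


def row_to_observation(row, collection="kafka"):
    """Build one observation dict from a CSV row (hypothetical helper)."""
    return {
        "collection": collection,
        "origin": "GERMLINE",
        "type": "SNV/INDEL",
        "Position": {
            "VCF_Genome": "hg38",
            "VCF_Chr": row["chrom"],
            "VCF_Position": int(row["pos"]),
        },
        "Genotype": {
            "VCF_Allele_REF": row["ref"],
            "VCF_Allele_ALT": row["alt"],
        },
        # The uri is user-defined; this pattern is only an assumption.
        "uri": f'{row["chrom"]}::{row["pos"]}::{row["ref"]}::{row["alt"]}::{row["transcript"]}',
    }


def observations_from_csv(handle):
    """Read every row of an open CSV file into a list of observation dicts."""
    return [row_to_observation(row) for row in csv.DictReader(handle)]


# io.StringIO stands in for open("variants.csv", newline="")
observations = observations_from_csv(io.StringIO(SAMPLE_CSV))
```

The resulting list can then be passed as the observations argument in the same way as the list built above.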
Warning
The required fields for variant observations are uri, origin, type and collection.
Variants of multiple types can be uploaded in the same request. Available types are: “SNV/INDEL”, “CNV”, “SV”
The values for collection and uri are not checked by the API. The user must provide these values at their own discretion.
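Because collection and uri are not checked by the API, a local pre-flight check can catch incomplete observations before they are sent. The sketch below is an assumption about how such a check might look; find_invalid_observations is a hypothetical helper, not an SDK method, and it only covers the required fields and type values listed in this warning.

```python
REQUIRED_FIELDS = ("uri", "origin", "type", "collection")
ALLOWED_TYPES = {"SNV/INDEL", "CNV", "SV"}


def find_invalid_observations(observations):
    """Return (index, reason) pairs for observations that look incomplete."""
    problems = []
    for index, obs in enumerate(observations):
        # A field counts as missing when it is absent or empty.
        missing = [field for field in REQUIRED_FIELDS if not obs.get(field)]
        if missing:
            problems.append((index, "missing fields: " + ", ".join(missing)))
        elif obs["type"] not in ALLOWED_TYPES:
            problems.append((index, f"unknown type: {obs['type']}"))
    return problems
```

An empty result means every observation carries the four required fields and a known type. It does not guarantee that the project template will accept the remaining fields.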
Warning
The variant observation template is flexible and defined on a per-project basis. The complete list of fields can be obtained using the method get_template().
Note
The method create_observations_batch waits until all chunks have finished.
Note
The method create_one_observations_batch does not split the observations into chunks and does not wait for the batch to finish.
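To make the effect of chunksize concrete, the sketch below shows one plausible way a list is split into consecutive chunks. It illustrates the general technique only; split_into_chunks is a hypothetical helper and the SDK's internal splitting may differ.

```python
def split_into_chunks(items, chunksize):
    """Split a list into consecutive slices of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    return [items[start:start + chunksize] for start in range(0, len(items), chunksize)]


items = list(range(11))
# 11 items with chunksize=15 fit in a single chunk
sizes_single = [len(chunk) for chunk in split_into_chunks(items, 15)]  # [11]
# 11 items with chunksize=4 give three chunks, the last one shorter
sizes_multi = [len(chunk) for chunk in split_into_chunks(items, 4)]    # [4, 4, 3]
```

With the 11 observations and chunksize=15 used in the example above, this kind of splitting would produce a single chunk.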
Status¶
This example queries the status of an observation batch by its batch ID.
from genomcore.client import GenomcoreApiClient
api = GenomcoreApiClient(token="A_VALID_TOKEN", refresh_token="A_VALID_REFRESH_TOKEN")
status=api.kafka.get_status(observationBatchId='67ffd46f709f4a3c230dfe02')
Note
The query returns a dict with the following format:
{
    'id': '67ffd46f709f4a3c230dfe02',
    'projectId': 550,
    'observationsTotal': <Total number of observations in the batch>,
    'observationsCreatedTotal': <Number of successfully created observations>,
    'observationsValidationFailedTotal': <Number of observations that failed validation>,
    'observationsValidationFailed': <List of observations that failed validation with detailed errors>,
    'observationsFailedTotal': <Number of observations that failed to create>,
    'observationsFailed': <List of observations that failed with their error messages>,
    'createdAt': '2025-04-16T16:01:51.276Z',
    'updatedAt': '2025-04-16T16:01:51.344Z',
    'status': <String with the status: COMPLETED, FAILED, PENDING>
}
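The counters in this dict can be condensed into a short summary for logging or reporting. The following is a minimal sketch: summarize_batch_status is a hypothetical helper, and the counts in example_status are invented for illustration rather than taken from a real batch.

```python
def summarize_batch_status(status):
    """Condense the status dict into a few counters (hypothetical helper)."""
    total = status.get("observationsTotal", 0)
    created = status.get("observationsCreatedTotal", 0)
    invalid = status.get("observationsValidationFailedTotal", 0)
    failed = status.get("observationsFailedTotal", 0)
    return {
        "state": status.get("status"),
        "created": created,
        "rejected": invalid + failed,
        # Observations not yet counted as created, invalid or failed
        "unaccounted": total - created - invalid - failed,
    }


# Illustrative values only
example_status = {
    "id": "67ffd46f709f4a3c230dfe02",
    "observationsTotal": 11,
    "observationsCreatedTotal": 9,
    "observationsValidationFailedTotal": 1,
    "observationsFailedTotal": 1,
    "status": "COMPLETED",
}
summary = summarize_batch_status(example_status)
```

The lists observationsValidationFailed and observationsFailed hold the per-observation error details when rejected is greater than zero.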
Wait¶
This method waits until all observations have finished.
from genomcore.client import GenomcoreApiClient
api = GenomcoreApiClient(token="A_VALID_TOKEN", refresh_token="A_VALID_REFRESH_TOKEN")
status=api.kafka.wait()
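Where more control over timing is needed than wait() offers, the status of a single batch can be polled explicitly. The sketch below is an assumption about how such a loop might be written, not a description of how wait() works internally. poll_until_done is a hypothetical helper that takes any callable returning the status dict, and it treats COMPLETED and FAILED as terminal states.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def poll_until_done(fetch_status, interval=2.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until the batch reaches a terminal state.

    Raises TimeoutError if the batch is still pending once `timeout` seconds
    have elapsed.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("batch still pending after timeout")
        sleep(interval)


# Possible usage with the client shown above (batch_id is a placeholder):
# final = poll_until_done(lambda: api.kafka.get_status(observationBatchId=batch_id))
```

Passing the sleep and clock functions as parameters keeps the loop easy to test without real delays.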