HOWTO_Import_external_data.Rmd
COSMIC datasets can no longer be download without a registered account. Moreover their consistent size makes it more suitable as an external source to be downloaded seperately. Here we provide the instructure to download and format the file for usage with the TMBleR package.
Cosmic data is required for one of the mutation filters that removes known “cancer” mutation. This is achieved by setting remove.cancer = TRUE
when calling applyFilter()
. If you are planning to use the remove.cancer
filter, please follow the intructions to retrieve the COSMIC data and import it into TMBleR.
Login to COSMIC at https://cancer.sanger.ac.uk/cosmic/download
Select as Genome Version from the menu GRCh37 or GRCh38, according to the reference genome used to generate the input VCF file
Download the CosmicCodingMuts.vcf.gz
file, containing all coding COSMIC mutations.
Extract the .gz file:
If genome GRCh37
gzcat CosmicCodingMuts.vcf.gz > CosmicCodingMuts_hg19.vcf
If genome GRCh38
gzcat CosmicCodingMuts.vcf.gz > CosmicCodingMuts_hg38.vcf
Put extracted file in the data-raw folder that you find within this package (path_To_PackageInstallation/data-raw)
Run the function readCOSMIC()
to convert the COSMIC file from .vcf to .rda and load it into memory.
formatCOSMIC( input_file = "~/Downloads/CosmicCodingMuts.vcf"
, "hg19"
, output_file = "~/Downloads/COSMIC_hg19")
load the file into the memory using the load()
function
load("~/Downloads/COSMIC_hg19")
COSMIC_hg19 should now appear in the environment ls()
.