[rat-forum] Announcing change to RGD GENES ftp file
Kowalski, George
gkowalski at mcw.edu
Tue May 8 15:00:39 CDT 2007
All,
Because of internal database changes, and inconsistencies found while
trying to read the rat GENES file into Biomart, RGD has stopped updating
this file effective immediately. This GENES file is currently located
at:
ftp://rgd.mcw.edu/pub/data_release/GENES
This file will be available at the site above for at least two months,
at which time it will be removed.
We are now generating three new files:
ftp://rgd.mcw.edu/pub/data_release/GENES_RAT
ftp://rgd.mcw.edu/pub/data_release/GENES_HUMAN
ftp://rgd.mcw.edu/pub/data_release/GENES_MOUSE
The format of these files is similar to that of the GENES file, except
for a number of small changes. These changes are:
1) Split the start and stop positions into separate fields. The old
data were not machine readable if one of these values was not present.
Removed:
START_POS, STOP_POS fields
and the data have been moved into the fields:
START_POS_31 STOP_POS_31 START_POS_34 STOP_POS_34
2) We've added the fields: STRAND_31 and STRAND_34 containing the
strand information.
3) The SWISSPROT_ID field has been renamed to UNIPROT_ID.
4) Changed the following field names for consistency; now they don't
contain spaces (for human readability):
The "ENTREZ GENE" field is now ENTREZ_GENE
The "GDB ID" field is now GDB_ID
5) Removed the following fields, as we are generating separate files for
Human and Mouse:
MOUSE_HOMOLOG_RGD_ID
MOUSE_HOMOLOG_SYMBOL
MOUSE_HOMOLOG_NAME
MOUSE_CHROMOSOME
MGD_ID
HUMAN_HOMOLOG_RGD_ID
HUMAN_HOMOLOG_SYMBOL
HUMAN_HOMOLOG_NAME
HUMAN_CHROMOSOME
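For anyone updating a parser that was written against the old GENES
header, the changes above can be sketched as a small header migration.
This is only an illustration: the function and constant names are
hypothetical, the relative order of the six position and strand fields
follows the field list in this announcement, and nothing here converts
the position data itself.

```python
# Illustrative sketch only; all names below are hypothetical helpers,
# not part of any RGD tool.

# Renames announced in changes 3 and 4.
RENAMED = {
    "SWISSPROT_ID": "UNIPROT_ID",
    "ENTREZ GENE": "ENTREZ_GENE",
    "GDB ID": "GDB_ID",
}

# Fields dropped in change 5.
REMOVED = {
    "MOUSE_HOMOLOG_RGD_ID", "MOUSE_HOMOLOG_SYMBOL", "MOUSE_HOMOLOG_NAME",
    "MOUSE_CHROMOSOME", "MGD_ID",
    "HUMAN_HOMOLOG_RGD_ID", "HUMAN_HOMOLOG_SYMBOL", "HUMAN_HOMOLOG_NAME",
    "HUMAN_CHROMOSOME",
}

# Changes 1 and 2: START_POS and STOP_POS give way to these six fields.
POSITION_FIELDS = [
    "START_POS_31", "STOP_POS_31", "STRAND_31",
    "START_POS_34", "STOP_POS_34", "STRAND_34",
]


def migrate_header(old_fields):
    """Return the new-style field names for an old GENES header."""
    new_fields = []
    for name in old_fields:
        if name == "START_POS":
            # The six new fields take the place of the old pair.
            new_fields.extend(POSITION_FIELDS)
        elif name == "STOP_POS" or name in REMOVED:
            continue
        else:
            new_fields.append(RENAMED.get(name, name))
    return new_fields


print(migrate_header(["SYMBOL", "START_POS", "STOP_POS", "ENTREZ GENE", "MGD_ID"]))
```

Fields that were neither renamed nor removed pass through unchanged, so
existing code that looks them up by name should keep working.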
Below is a complete list of fields in the new files and in the current
file.
Fields in the new files:
GENE_RGD_ID
SYMBOL
NAME
GENE_DESC
CHROMOSOME
FISH_BAND
START_POS_31
STOP_POS_31
STRAND_31
START_POS_34
STOP_POS_34
STRAND_34
CURATED_REF_RGD_ID
CURATED_REF_PUBMED_ID
UNCURATED_PUBMED_ID
RATMAP_ID
ENTREZ_GENE
UNIPROT_ID
RHDB_ID
UNCURATED_REF_MEDLINE_ID
GENBANK_NUCLEOTIDE
TIGR_ID
GENBANK_PROTEIN
UNIGENE_ID
GDB_ID
SSLP_RGD_ID
SSLP_SYMBOL
ALIAS_VALUE
ALIAS_TYPES
QTL_RGD_ID
QTL_SYMBOL
NOMENCLATURE_STATUS
SPLICE_RGD_ID
SPLICE_SYMBOL
GENE_TYPE
ENSEMBL_ID
Fields in the current file (GENES):
GENE_RGD_ID
SYMBOL
NAME
GENE_DESC
CHROMOSOME
FISH_BAND
START_POS
STOP_POS
CURATED_REF_RGD_ID
CURATED_REF_PUBMED_ID
UNCURATED_PUBMED_ID
RATMAP_ID
ENTREZ GENE
SWISSPROT_ID
RHDB_ID
UNCURATED_REF_MEDLINE_ID
GENBANK_NUCLEOTIDE
TIGR_ID
GENBANK_PROTEIN
UNIGENE_ID
MOUSE_HOMOLOG_RGD_ID
MOUSE_HOMOLOG_SYMBOL
MOUSE_HOMOLOG_NAME
MOUSE_CHROMOSOME
MGD_ID
HUMAN_HOMOLOG_RGD_ID
HUMAN_HOMOLOG_SYMBOL
HUMAN_HOMOLOG_NAME
HUMAN_CHROMOSOME
GDB ID
SSLP_RGD_ID
SSLP_SYMBOL
ALIAS_VALUE
ALIAS_TYPES
QTL_RGD_ID
QTL_SYMBOL
NOMENCLATURE_STATUS
SPLICE_RGD_ID
SPLICE_SYMBOL
GENE_TYPE
ENSEMBL_ID
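If you want a quick check that a downloaded file carries the new field
names rather than the old ones, something like the following sketch may
help. It assumes the first line of the file is a tab-delimited header;
this announcement does not state the delimiter, so adjust it to match
the actual file. The names check_header and NEW_FIELDS are made up for
this example.

```python
# Illustrative sketch only; check_header and NEW_FIELDS are hypothetical.

# Field names for the new files, as listed in this announcement.
NEW_FIELDS = [
    "GENE_RGD_ID", "SYMBOL", "NAME", "GENE_DESC", "CHROMOSOME", "FISH_BAND",
    "START_POS_31", "STOP_POS_31", "STRAND_31",
    "START_POS_34", "STOP_POS_34", "STRAND_34",
    "CURATED_REF_RGD_ID", "CURATED_REF_PUBMED_ID", "UNCURATED_PUBMED_ID",
    "RATMAP_ID", "ENTREZ_GENE", "UNIPROT_ID", "RHDB_ID",
    "UNCURATED_REF_MEDLINE_ID", "GENBANK_NUCLEOTIDE", "TIGR_ID",
    "GENBANK_PROTEIN", "UNIGENE_ID", "GDB_ID", "SSLP_RGD_ID", "SSLP_SYMBOL",
    "ALIAS_VALUE", "ALIAS_TYPES", "QTL_RGD_ID", "QTL_SYMBOL",
    "NOMENCLATURE_STATUS", "SPLICE_RGD_ID", "SPLICE_SYMBOL", "GENE_TYPE",
    "ENSEMBL_ID",
]


def check_header(header_line, delimiter="\t"):
    """Compare a header line with the announced field list.

    Returns (missing, unexpected) as two lists of field names.
    The tab delimiter is an assumption, not something RGD has stated.
    """
    found = [name.strip() for name in header_line.rstrip("\r\n").split(delimiter)]
    missing = [name for name in NEW_FIELDS if name not in found]
    unexpected = [name for name in found if name not in NEW_FIELDS]
    return missing, unexpected


# An old-style name shows up as unexpected.
missing, unexpected = check_header("GENE_RGD_ID\tSYMBOL\tSWISSPROT_ID\n")
print("unexpected:", unexpected)
```

A header that matches the list exactly yields two empty lists; any old
name such as SWISSPROT_ID or "GDB ID" is reported as unexpected.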
We will be updating these new extracts weekly. Please contact me if you
have any questions regarding these files. I am also on the rat-forum
email list.
George Kowalski
Medical College of Wisconsin - Project Lead RGD Database
Human and Molecular Genetics Center
414.456.5746 gkowalski at mcw.edu