
Mouse Atlas SAGE Library Help

Classification

Library Name
Each library has a five-character identifier that is unique within the Mouse Atlas of Gene Expression project.
Species
The species for the Mouse Atlas project is always Mus musculus, the mouse.
Strain
The Mouse Atlas project primarily uses C57BL/6J as the mouse strain. For expression studies on mutant mice, the strain sometimes differs (for example, CD1).
Mutation
Most mice have no known mutations and are listed as Wild Type. A select set of libraries in the project were constructed from mutant mice.
Location
The region from which the tissue was taken.
Sex
Either Male or Female. In many cases the embryos used are too immature for their sex to be determined; in such cases this field is set to Unknown.
Age
The age of the mice used. In general, an effort has been made to collect mice whose stage of development is also appropriate for their age, as outlined by the Theiler Stages.
Developmental Stage
For embryonic mice, the Theiler Stage categorizations are used, as detailed on the EMAP embryonic mouse anatomical nomenclature web site.
Organism Condition
Details any special conditions of the mice used. For example, for the three adult mammary gland samples this field reads Lactation, Involution or Pregnancy.

Sample Collection

Lab
The name of the lab that performed the dissection and collection of the mouse tissue.
Number of Animals
For embryonic mice, or for very small regions of tissue, several mice were pooled during dissection to gather enough RNA for the SAGE protocol. This field gives the number of mice in the pool.

Library Construction

SAGE Protocol
Three variations of the SAGE protocol have been used for the Mouse Atlas libraries. SAGE is the original protocol, sometimes called ShortSAGE; it uses an enzyme that produces tags 10 base pairs long. LongSAGE uses an enzyme that produces tags 17 base pairs long. SAGELite-LongSAGE is the same as LongSAGE except for an extra RNA amplification step, which allows the starting amount of RNA to be very small, as low as 50 nanograms. The Mouse Atlas project has primarily used LongSAGE and SAGELite-LongSAGE; only a few of the earliest libraries in the project were constructed as normal SAGE libraries. When comparing expression levels between libraries, take the SAGE protocol into account: each protocol carries its own bias in tag levels, which makes a comparison between, say, a LongSAGE and a SAGELite-LongSAGE library undesirable.
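As a rough illustration of the comparison rule above, the following Python sketch refuses to compare libraries built with different protocols, even when they share a tag length. The names `TAG_LENGTH`, `comparable` and `tag_has_expected_length` are hypothetical, and the length check assumes tag strings are given without the CATG site.

```python
# Hypothetical helper: tag lengths in base pairs for each protocol,
# as stated on this help page (assumed to exclude the CATG site).
TAG_LENGTH = {
    "SAGE": 10,
    "LongSAGE": 17,
    "SAGELite-LongSAGE": 17,
}


def comparable(protocol_a: str, protocol_b: str) -> bool:
    """Return True only when two libraries share the same SAGE protocol.

    LongSAGE and SAGELite-LongSAGE have the same tag length, but each
    protocol has its own bias, so they are treated as not comparable.
    """
    for protocol in (protocol_a, protocol_b):
        if protocol not in TAG_LENGTH:
            raise ValueError(f"unknown SAGE protocol: {protocol!r}")
    return protocol_a == protocol_b


def tag_has_expected_length(tag: str, protocol: str) -> bool:
    """Check that a tag sequence has the length expected for its protocol."""
    return len(tag) == TAG_LENGTH[protocol]
```

Keying the check on the protocol name rather than on tag length is deliberate: a length-based rule would wrongly treat LongSAGE and SAGELite-LongSAGE libraries as interchangeable.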
Starting RNA
The amount of RNA extracted from the tissue sample, always given in nanograms.

Tag Extraction

Total Raw Tags (no quality cut-off)
The total number of tags in the library, with no filtering and no sequence quality cut-off applied.
Total Raw Tags at Q95
Q95 is short for Quality 95: a tag meets this cut-off when there is at least a 95% chance that none of its base calls are in error. Quality is calculated as the product of (1 - p) over every base in the tag, where p is the error probability given by that base's Phred quality score (p = 10^(-Phred/10)). This calculation includes the quality scores of the CATG cloning site. For example, for a four-base sequence:
Sequence:    A     G     G     G
Phred:      20    30    30    40
p:        1E-2  1E-3  1E-3  1E-4
1-p:       .99  .999  .999 .9999

Quality: .99 × .999 × .999 × .9999 = .9879 (98.8%)
The SAGE pipeline used to extract tags is written in Python; the tag quality is calculated as follows:
>>> # sample Phred scores for a LongSAGE tag
>>> scores = [ 10, 20, 20, 21, 30, 21, 32, 21, 30, 21, 32, 30, 25, 32, 30, 25, 32, 20, 26, 27 ]
>>> total = 1.0
>>> for score in scores:
...     # multiply in the probability that this base call is correct
...     total *= 1.0 - 10 ** (-score / 10.0)
... 
>>> print("%.12g" % total)  # 12 significant digits
0.831282972145
  
Total Raw Tags at Q99
The total number of tags at the Quality 99 (Q99) cut-off, that is, tags with at least a 99% chance that none of their base calls are in error.
Unique Tags
The number of distinct tag sequences seen across the library. Because this count is derived from the raw tags, with no quality cut-off, it is somewhat higher than a count done on the Q95 or Q99 tag sets.
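The four tag statistics above can be sketched together in Python. This is an illustration, not the SAGE pipeline's own code: the input format (one `(sequence, phred_scores)` pair per tag occurrence) and the function names `tag_quality` and `summarize_tags` are hypothetical, and it assumes a tag passes a cut-off when its quality is greater than or equal to it.

```python
from typing import Dict, Iterable, Sequence, Tuple


def tag_quality(phred_scores: Sequence[int]) -> float:
    """Probability that every base call in the tag is correct.

    Each Phred score Q gives an error probability p = 10 ** (-Q / 10);
    the tag quality is the product of (1 - p) over all bases.
    """
    quality = 1.0
    for q in phred_scores:
        quality *= 1.0 - 10 ** (-q / 10.0)
    return quality


def summarize_tags(tags: Iterable[Tuple[str, Sequence[int]]]) -> Dict[str, int]:
    """Count raw tags, tags at Q95 and Q99, and unique raw tag sequences.

    Assumption: a tag passes a cut-off when its quality is >= the cut-off.
    Unique tags are counted on the raw tags, with no quality cut-off.
    """
    raw = q95 = q99 = 0
    unique = set()
    for sequence, scores in tags:
        raw += 1
        unique.add(sequence)
        quality = tag_quality(scores)
        if quality >= 0.95:
            q95 += 1
        if quality >= 0.99:
            q99 += 1
    return {"raw": raw, "q95": q95, "q99": q99, "unique": len(unique)}


# Hypothetical toy input: qualities are about 0.9996, 0.9879 and 0.873.
example = [
    ("AAAA", [40, 40, 40, 40]),
    ("AAAA", [20, 30, 30, 40]),
    ("CCCC", [10, 20, 20, 20]),
]
print(summarize_tags(example))  # {'raw': 3, 'q95': 2, 'q99': 1, 'unique': 2}
```

The toy input shows why the unique count is higher than one taken on a filtered set: "CCCC" fails both cut-offs but still counts as a distinct raw tag.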

Data

Library Metadata
This is a simple XML file that includes all of the data about the SAGE library. There is no formal DTD for this file. Each library XML file can be accessed at the following URL:
  http://www.mouseatlas.org/data/mouse/libraries/<Library Name>/library_metadata.xml
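Because there is no formal DTD, a reader of this file cannot rely on fixed element names. The following Python sketch builds the metadata URL from a library name and flattens whatever leaf elements the XML contains into a dictionary. The function names, the library name "ABCDE" and the element names in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

BASE_URL = "http://www.mouseatlas.org/data/mouse/libraries"


def metadata_url(library_name: str) -> str:
    """Build the library_metadata.xml URL for a five-character library name."""
    return f"{BASE_URL}/{library_name}/library_metadata.xml"


def flatten_metadata(xml_data: Union[str, bytes]) -> Dict[str, str]:
    """Map each leaf element's tag name to its stripped text.

    No element names are assumed, since the file has no formal DTD.
    If a tag name occurs more than once, the last occurrence wins.
    """
    root = ET.fromstring(xml_data)
    result = {}
    for element in root.iter():
        # Keep only elements with no children and some non-blank text.
        if len(element) == 0 and element.text and element.text.strip():
            result[element.tag] = element.text.strip()
    return result


# Hypothetical element names, for illustration only.
sample = "<library><name>ABCDE</name><details><strain>C57BL/6J</strain></details></library>"
print(flatten_metadata(sample))  # {'name': 'ABCDE', 'strain': 'C57BL/6J'}
```

To read a live file, the raw bytes from `urllib.request.urlopen(metadata_url(name)).read()` could be passed straight to `flatten_metadata`; this needs network access and assumes the server still serves the file at that address.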
  
Tag Counts at Quality 0.95 cut-off
This is a tab-separated file of the tag counts of all tags in the SAGE library at a quality cut-off of Q95. This file is available at the following URL:
  http://www.mouseatlas.org/data/mouse/libraries/<Library Name>/plain_text/tagcounts_<Library Name>_Q95.txt
  
Tag Counts at Quality 0.99 cut-off
This is a tab-separated file of the tag counts of all tags in the SAGE library at a quality cut-off of Q99. This file is available at the following URL:
  http://www.mouseatlas.org/data/mouse/libraries/<Library Name>/plain_text/tagcounts_<Library Name>_Q99.txt
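A minimal Python sketch for working with the two tag-count files, assuming each line holds the tag sequence in the first column and its count in the second (the column order is an assumption, as is the presence of any header line). The names `tagcounts_url` and `read_tagcounts` are hypothetical.

```python
import csv
import io
from typing import Dict

BASE_URL = "http://www.mouseatlas.org/data/mouse/libraries"


def tagcounts_url(library_name: str, cutoff: str) -> str:
    """URL of the tag-count file for the 'Q95' or 'Q99' cut-off."""
    if cutoff not in ("Q95", "Q99"):
        raise ValueError("cutoff must be 'Q95' or 'Q99'")
    return (f"{BASE_URL}/{library_name}/plain_text/"
            f"tagcounts_{library_name}_{cutoff}.txt")


def read_tagcounts(text: str) -> Dict[str, int]:
    """Parse tab-separated 'sequence<TAB>count' lines into a dict.

    Assumes column 1 is the tag sequence and column 2 the count.
    Blank lines and lines whose count is not an integer (for example
    a header) are skipped.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # blank or truncated line
        try:
            counts[row[0]] = int(row[1])
        except ValueError:
            continue  # header or malformed line
    return counts
```

Parsing into a dictionary keyed by tag sequence makes it straightforward to look up the same tag in two libraries, keeping in mind the protocol caveat given under SAGE Protocol.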
  
Clustered Tag Counts and P-Value
This is a tab-separated file of the tag sequence, tag count, and tag p-value for all tags after tag clustering has been performed on the library. See the Tag Clustering page for an explanation of tag clustering and the methodology used. This file is available at the following URL:
  http://www.mouseatlas.org/data/mouse/libraries/<Library Name>/plain_text/tagclustered_<Library Name>.txt
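A minimal Python sketch for loading the clustered file, assuming the three columns appear in the order listed above (sequence, count, p-value). The names `clustered_url`, `ClusteredTag`, `read_clustered` and `top_tags` are hypothetical; how the p-value should be interpreted is described on the Tag Clustering page, so the sketch only ranks tags by count and leaves any p-value threshold to the reader.

```python
import csv
import io
from typing import List, NamedTuple

BASE_URL = "http://www.mouseatlas.org/data/mouse/libraries"


class ClusteredTag(NamedTuple):
    sequence: str
    count: int
    p_value: float


def clustered_url(library_name: str) -> str:
    """URL of the clustered tag-count file for a library."""
    return f"{BASE_URL}/{library_name}/plain_text/tagclustered_{library_name}.txt"


def read_clustered(text: str) -> List[ClusteredTag]:
    """Parse 'sequence<TAB>count<TAB>p-value' rows (column order assumed).

    Blank lines and rows that do not parse as numbers are skipped.
    """
    tags = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 3:
            continue
        try:
            tags.append(ClusteredTag(row[0], int(row[1]), float(row[2])))
        except ValueError:
            continue  # header or malformed line
    return tags


def top_tags(tags: List[ClusteredTag], n: int) -> List[ClusteredTag]:
    """Return the n tags with the highest counts."""
    return sorted(tags, key=lambda t: t.count, reverse=True)[:n]
```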