Cancer registry information

Cancer registry data is amalgamated from multiple sources. Data structures may differ slightly between sources, so researchers are advised to treat the merged data with care. In particular, the following anomalies have been noted; work to address them is planned for the coming months.

Morphology (histology and behaviour codes)

The dataset presents histology and behaviour codes as two separate fields (e.g. histology in Field 40011 = "8120" and behaviour in Field 40012 = "3"). However, ICD-O assigns meanings to the combined morphology code (e.g. "8120/3"). While Data-Coding 38 provides meanings for histology codes, these generally describe the malignant version (where applicable) and may not reflect the actual diagnosis. For example, Data-Coding 38 interprets histology code "8120" as "Transitional cell carcinoma", while the full morphology code in ICD-O has four different meanings depending on the behaviour code attached:

Researchers should always refer to the ICD-O-3 (third revision) meaning of the combined morphology code rather than relying on Data-Coding 38 alone.
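As a minimal sketch of that advice in Python, the two fields can be joined into a single morphology string before any lookup is made, so that interpretation is always keyed on the full code. The function name `combine_morphology` is hypothetical, and the inputs are assumed to be the raw values of Field 40011 and Field 40012.

```python
def combine_morphology(histology, behaviour):
    """Join a histology code (e.g. from Field 40011) and a behaviour code
    (e.g. from Field 40012) into an ICD-O-3 morphology string such as "8120/3".

    Returns None if either part is missing, because neither field should be
    interpreted on its own.
    """
    if histology is None or behaviour is None:
        return None
    histology = str(histology).strip()
    behaviour = str(behaviour).strip()
    if not histology or not behaviour:
        return None
    # The combined code, not the histology code alone, is what ICD-O defines.
    return f"{histology}/{behaviour}"
```

For example, `combine_morphology("8120", "3")` gives `"8120/3"`, which should then be looked up in ICD-O-3 rather than reading "8120" through Data-Coding 38.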

Please note that ICD-O directs coders to use the appropriate behaviour code even where the resulting combined term does not appear in the ICD-O guide; for example, a code of "9000/2" could be used for "Brenner tumor, in situ" if such an entity were to be identified, even though this code does not appear in the code lists.
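One way to handle such codes in analysis is sketched below: use the published term when the combined code is listed, and otherwise fall back to the histology name plus the standard ICD-O-3 behaviour label. The lookup tables `LISTED_TERMS` and `HISTOLOGY_NAMES` are hypothetical, illustrative stand-ins for the full ICD-O-3 code list; only the behaviour labels follow the ICD-O-3 behaviour digits.

```python
# Standard ICD-O-3 behaviour codes (the digit after the slash).
BEHAVIOUR_LABELS = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic",
}

# Hypothetical, illustrative tables; a real analysis would load the full
# ICD-O-3 list of combined morphology codes.
LISTED_TERMS = {"8120/3": "Transitional cell carcinoma, NOS"}
HISTOLOGY_NAMES = {"8120": "Transitional cell carcinoma", "9000": "Brenner tumor"}


def describe_morphology(code):
    """Describe an ICD-O-3 morphology code such as "9000/2".

    Listed combined codes return their published term; valid but unlisted
    combinations return the histology name plus the behaviour label.
    """
    histology, sep, behaviour = code.partition("/")
    if not sep or behaviour not in BEHAVIOUR_LABELS:
        raise ValueError(f"not a valid morphology code: {code!r}")
    if code in LISTED_TERMS:
        return LISTED_TERMS[code]
    base = HISTOLOGY_NAMES.get(histology, f"histology {histology}")
    return f"{base}, {BEHAVIOUR_LABELS[behaviour]} (combined term not listed in ICD-O)"
```

With these tables, `describe_morphology("9000/2")` returns a description beginning "Brenner tumor, in situ" rather than rejecting the code, consistent with the coding rule above.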

Redacted morphology codes

In some cases, errors in either the histology or the behaviour code were noted during processing. Because the two fields are best interpreted together as a single morphology code (see above), a behaviour code has been redacted wherever the accompanying histology code was missing or invalid, and vice versa.
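The redaction rule can be sketched as follows, assuming for illustration that a valid histology code is four digits and a valid behaviour code is one of the ICD-O-3 behaviour digits; the validity checks actually applied during processing are not described here. The sketch also assumes that the invalid value itself is dropped, so a pair is either kept whole or blanked whole. All function names are hypothetical.

```python
import re

# Assumed validity rules for this sketch only.
VALID_BEHAVIOURS = {"0", "1", "2", "3", "6", "9"}
HISTOLOGY_PATTERN = re.compile(r"\d{4}")


def is_valid_histology(code):
    return code is not None and HISTOLOGY_PATTERN.fullmatch(str(code)) is not None


def is_valid_behaviour(code):
    return code is not None and str(code) in VALID_BEHAVIOURS


def redact_pair(histology, behaviour):
    """Return (histology, behaviour), blanking both halves (None) if either
    half is missing or invalid, since neither is interpretable alone."""
    if is_valid_histology(histology) and is_valid_behaviour(behaviour):
        return histology, behaviour
    return None, None
```

Under these assumptions, a record with histology "8120" but no behaviour code ends up with neither field populated, which matches what researchers should expect to see in the redacted data.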

Multiple cancer records

As the data is amalgamated from different providers, caution must be exercised when interpreting multiple cancer records for the same participant. In some cases, multiple records reflect genuinely distinct diagnoses, but some may be (pseudo)duplicates of the same diagnosis reported by different providers. While exact duplicates are excluded, two records may still describe the same diagnosis with minor variations in date, morphology, etc. A diagnosis may also have been recorded with an ICD-9 code by one provider and then converted to ICD-10 by another, resulting in duplication.
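A minimal Python sketch of screening for such pseudoduplicates is shown below. It flags pairs of records for the same participant that share a site code and histology code and whose diagnosis dates fall within a tolerance. The `CancerRecord` type, the 90-day tolerance and the matching criteria are hypothetical choices for illustration, and site codes are assumed to have already been harmonised to ICD-10 (mapping ICD-9 codes is outside this sketch).

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class CancerRecord:
    participant_id: int
    diagnosis_date: date
    site_code: str   # assumed harmonised, e.g. a 3-character ICD-10 code "C67"
    histology: str   # e.g. "8120"


def possible_pseudoduplicates(records, max_days=90):
    """Yield pairs of records for the same participant with the same site and
    histology codes and diagnosis dates no more than max_days apart.

    max_days and the matching fields are illustrative assumptions only.
    """
    for a, b in combinations(records, 2):
        if a.participant_id != b.participant_id:
            continue
        if a.site_code != b.site_code or a.histology != b.histology:
            continue
        if abs((a.diagnosis_date - b.diagnosis_date).days) <= max_days:
            yield a, b
```

Flagged pairs are candidates for manual review rather than confirmed duplicates; two records 19 days apart with the same codes may still, in rare cases, be distinct tumours.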

In particular, the record counts in Field 40009 should not be treated as the number of distinct cancer diagnoses per participant, since they simply count records, which may include pseudoduplicates as described above.
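To compare record counts with an estimate of distinct diagnoses, one hedged approach is to collapse records that share a (site, histology) key and fall close together in time. The sketch below, with the hypothetical function `estimate_distinct_diagnoses` and an illustrative 90-day window, chains records in date order and starts a new diagnosis only when the gap to the previous record in the same group exceeds the window.

```python
from collections import defaultdict
from datetime import date


def estimate_distinct_diagnoses(records, max_days=90):
    """Estimate distinct diagnoses per participant.

    records: iterable of (participant_id, diagnosis_date, site_code, histology)
    tuples. Records sharing participant, site and histology are collapsed
    when each is within max_days of the previous one (an assumed rule).
    """
    groups = defaultdict(list)
    for pid, when, site, histology in records:
        groups[(pid, site, histology)].append(when)

    counts = defaultdict(int)
    for (pid, _site, _histology), dates in groups.items():
        dates.sort()
        counts[pid] += 1  # the earliest record in a group starts a diagnosis
        for earlier, later in zip(dates, dates[1:]):
            if (later - earlier).days > max_days:
                counts[pid] += 1  # gap too large: treat as a new diagnosis
    return dict(counts)
```

For a participant with three records, two of which are 19 days apart with identical codes, this estimate would be 2 while the record count would be 3. Any such estimate depends heavily on the chosen window and matching fields, so it is best reported alongside the raw record counts.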