Update of HES Data - September 2019
Changes to the structure of the data
The structure of the Hospital Inpatient Data, as accessed via the Data Portal, has been changed in the latest Showcase Update.
Researchers requiring access to the record-level HES data can now select data-fields on Showcase that give access to complete tables rather than specific columns within those tables.
Researchers who have already been approved to access the record-level HES data will retain access to all HES data they have previously been approved to access, but may be able to view additional columns within the HES tables.
Details of the new structure can be found in the Hospital Inpatient Data Dictionary (Resource 141140) in Category 2000.
There is, however, almost no new data available (see below), so researchers who have already downloaded Hospital Inpatient data are unlikely to find it worthwhile to download it again this time. The changes we have made are in preparation for a more substantial update in the future.
The Data Portal has also been modified to make downloading complete tables easier (via the Table Download tab). Please see the Accessing UK Biobank Data guide for details.
The main changes that have been made are summarised below:
A small amount of data that was not released in March due to concerns about the data linkage has been reinstated or added.
The record_id field has been removed. Instead the separate records for a particular participant are distinguished using the new ins_index (instance index). Hence, each record is now uniquely identified by the combination of the eid and ins_index.
The tables available through the Data Portal have been restructured. In summary:
Fields related only to maternity or administrative aspects of psychiatry have been split into separate tables. General information about the admission, including dates, is still contained in the main hesin table.
All diagnoses are now in the table hesin_diag (including both ICD-9 and ICD-10 codings). Previously, the main/primary diagnosis appeared in the main hesin table and secondary diagnoses were split into other child tables. The external cause ICD-10 code has also been placed in this table. The different types of diagnoses/causes are distinguished by a new field called level, which takes the value 1 for a main diagnosis, 2 for a secondary diagnosis and 3 for an external cause.
The operation/procedure codes are similarly now all contained in a single table hesin_oper, with main operations given level=1 and secondary operations level=2.
The changes to the structure outlined above mean joins between tables will be necessary to link (for example) diagnoses to date information (the former appearing in the hesin_diag table and the latter in the main hesin table). Tables are joined by matching on the combination of eid and ins_index. Examples of using SQL to produce such joins are included in the updated Data Access Guide which can be found on the Accessing Your Data page.
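As a rough illustration of such a join (not a substitute for the examples in the Data Access Guide), the sketch below builds two tiny in-memory tables with Python's sqlite3 module and links main diagnoses to episode dates. The column names epistart and diag_icd10 are assumptions that should be checked against the Data Dictionary, and all rows are invented for illustration.

```python
import sqlite3

# In-memory database with invented rows; only eid, ins_index and level
# are named in this note, the other column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hesin (eid INTEGER, ins_index INTEGER, epistart TEXT,
                        PRIMARY KEY (eid, ins_index));
    CREATE TABLE hesin_diag (eid INTEGER, ins_index INTEGER,
                             level INTEGER, diag_icd10 TEXT);
    INSERT INTO hesin VALUES
        (1000001, 0, '2010-03-01'),
        (1000001, 1, '2012-07-15');
    INSERT INTO hesin_diag VALUES
        (1000001, 0, 1, 'I210'),
        (1000001, 0, 2, 'E119'),
        (1000001, 1, 1, 'J189');
""")

# Join on the combination of eid and ins_index, keeping only
# main diagnoses (level = 1).
main_diagnoses = conn.execute("""
    SELECT d.eid, d.ins_index, d.diag_icd10, h.epistart
    FROM hesin_diag AS d
    JOIN hesin AS h
      ON h.eid = d.eid AND h.ins_index = d.ins_index
    WHERE d.level = 1
    ORDER BY d.eid, d.ins_index
""").fetchall()

for row in main_diagnoses:
    print(row)
```

Matching on eid alone would pair every diagnosis with every record for that participant, so both key columns are needed in the join condition.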
The date of birth field for newborns (dobbaby) is no longer generally available. A new table hesin_delivery now contains data related to newborns, but does not include this restricted field. The table hesin_birth, which is a restricted version of hesin_delivery and does contain the dobbaby field, will only be made available to researchers on request where justification is provided.
Note that joining a record in hesin_delivery with the episode/admission date information in the main hesin table (via the eid and ins_index fields) will, in the vast majority of cases, provide a good approximation of the date of birth of a child.
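A minimal sketch of this approximation is given below, again using sqlite3 with invented rows. It assumes the episode start date is held in a hesin column called epistart and that hesin_delivery carries the sexbaby field; both should be verified against the Data Dictionary. A LEFT JOIN is used so that delivery records with no matching hesin row remain visible with a missing date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- epistart and sexbaby placement are assumptions; rows are invented.
    CREATE TABLE hesin (eid INTEGER, ins_index INTEGER, epistart TEXT);
    CREATE TABLE hesin_delivery (eid INTEGER, ins_index INTEGER, sexbaby TEXT);
    INSERT INTO hesin VALUES
        (1000002, 0, '2001-05-20'),
        (1000002, 1, '2004-11-02');
    INSERT INTO hesin_delivery VALUES
        (1000002, 1, '2'),
        (1000003, 0, '1');
""")

# Use the episode start date as a stand-in for the child's date of birth.
approx_births = conn.execute("""
    SELECT b.eid, b.ins_index, h.epistart AS approx_birth_date
    FROM hesin_delivery AS b
    LEFT JOIN hesin AS h
      ON h.eid = b.eid AND h.ins_index = b.ins_index
    ORDER BY b.eid, b.ins_index
""").fetchall()

for row in approx_births:
    print(row)
```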
Some issues we have noted in the data related to birth records
A number of issues have come to our attention regarding birth records (on hesin_delivery and hesin_birth).
NHS Digital have informed us that the neocare field should only have a value other than "None" in cases where the patient concerned is a newborn child. Hence, this field is not relevant for inpatient admissions relating to a UK Biobank participant. As a result, the neocare field has been removed.
We became aware of a number of cases where obviously incorrect information had been entered into the dobbaby and sexbaby fields by the data provider. We have blanked out this erroneous data in the affected records. In the majority of cases all other hesin_birth data is coded as "unknown" (usually a 9, X or 9999), and we suspect that many of them do not correspond to genuine births.
There are a number of cases where surplus birth records exist in hesin_delivery/hesin_birth which do not appear to correspond to real births. For example, the birth of a single child is reported in the numbaby field but six birth records exist, all showing a child of the same sex and birth weight, suggesting these are all repetitions of the details for the single child that was actually born. In other cases, all information in the five birth records after the first is “unknown”.
We hope to correct the latter two anomalies in the next release. For the moment we recommend that researchers proceed with caution when working with these records, and use other supporting information to determine whether a particular hesin_birth record genuinely indicates a real birth. Such information includes whether other relevant fields have known values, the value of the epitype field, the ICD-10 codes for the episode and information contained within nearby or coincident hospital episodes.
These records can also be cross-referenced against information on self-reported births contained in fields in category 100069.
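One of the checks described above, spotting episodes with more birth records than numbaby reports, could be sketched as follows. This is only a heuristic under stated assumptions: it supposes each birth record is a separate row keyed by eid and ins_index, that numbaby is coded "1" to "6" when known, and it uses a hypothetical birth_weight value; the rows are invented.

```python
from collections import defaultdict

# Invented rows: (eid, ins_index, numbaby, sexbaby, birth_weight).
# birth_weight is a hypothetical name, not taken from the Data Dictionary.
rows = [
    (1000004, 0, "1", "1", 3400),
    (1000004, 0, "1", "1", 3400),
    (1000004, 0, "1", "1", 3400),
    (1000005, 2, "2", "1", 2900),
    (1000005, 2, "2", "2", 2750),
    (1000006, 1, "X", "9", 9999),
]

KNOWN_NUMBABY = {"1", "2", "3", "4", "5", "6"}  # assumed coding


def flag_surplus_birth_records(rows):
    """Return (eid, ins_index, n_records, all_identical) for episodes
    with more birth records than the reported number of babies."""
    by_episode = defaultdict(list)
    for eid, ins_index, numbaby, sexbaby, birth_weight in rows:
        by_episode[(eid, ins_index)].append((numbaby, sexbaby, birth_weight))

    flagged = []
    for (eid, ins_index), records in sorted(by_episode.items()):
        numbaby = records[0][0]
        if numbaby not in KNOWN_NUMBABY:
            continue  # unknown count: nothing to compare against
        if len(records) > int(numbaby):
            all_identical = len(set(records)) == 1
            flagged.append((eid, ins_index, len(records), all_identical))
    return flagged


flagged = flag_surplus_birth_records(rows)
print(flagged)
```

A flagged episode is only a candidate for closer inspection; the supporting information listed above is still needed to decide whether a record reflects a real birth.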