Avan Data Portal

Use this portal to explore available datasets, search variables, and build a CSV-ready selection list.

Secrecy

All data are subject to the General Data Protection Regulation (GDPR) and are covered by secrecy in accordance with §7 of the Offentlighets- och sekretessförordningen (OSF) and/or Chapter 24, §8 of the Offentlighets- och sekretesslagen (OSL; Public Access to Information and Secrecy Act). Access to the data is strictly regulated, and any request for disclosure is subject to a documented secrecy assessment (sekretessprövning) before approval.

How to use the Avan Data Portal

The Avan Data Portal provides an overview of the variables available in the Avan data lake. You can browse variables, read their descriptions, select the ones you need, and export your final selection as a CSV file. Please ensure that all selected variables align with the project description in your approved ethics application.

The portal currently includes variables from several national and regional data sources. Additional datasets may be added in future updates as the PREDICT infrastructure continues to grow.

Instructions

  • Select variables by clicking them; each chosen variable is added to your selection list.
  • When your selection is complete, click the "Download CSV" button to save it as a CSV file on your computer.
  • Include the CSV file with your data application submitted to research.predict@umu.se.

Important

This version of the portal is a temporary, simplified release. Your selection is only saved for the current session. If you leave, close, or refresh the page, you will need to redo your selections.


If you prefer to work offline and avoid the risk of losing your selections, you can download an Excel file containing all available variables and complete your variable selection locally on your computer.

  • Download the Excel version of the variable list here: Download Excel file
  • In the Excel file, mark your selected variables by adding an “x” in the “Selection” column.
  • When your selection is complete, attach the edited Excel file to your data application.
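
Before attaching the file, it can help to list what you have actually marked. The following is a minimal sketch, not part of the portal: it assumes the sheet has been saved as CSV (or otherwise loaded as rows of column-name/value pairs), and the column name "Variable" is a hypothetical stand-in for whatever the variable-name column is called in the real file. Only the "Selection" column and the "x" mark come from the instructions above.

```python
import csv
import io


def selected_variables(rows, name_column="Variable", mark_column="Selection"):
    """Return the names of all rows whose selection cell contains an 'x'."""
    chosen = []
    for row in rows:
        # Tolerate empty cells, stray spaces, and an upper-case "X".
        mark = (row.get(mark_column) or "").strip().lower()
        if mark == "x":
            chosen.append(row[name_column])
    return chosen


# Stand-in for the variable list saved as CSV; the variable names are made up.
sample = io.StringIO(
    "Variable,Selection\n"
    "example_var_a,x\n"
    "example_var_b,\n"
    "example_var_c, X \n"
)
print(selected_variables(csv.DictReader(sample)))
# prints ['example_var_a', 'example_var_c']
```

Comparing this list with the project description in your approved ethics application is an easy final check before submission.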

National Cancer Register

Data extract from the National Board of Health and Welfare (Socialstyrelsen)

This dataset is an extract from the National Cancer Register, covering the period from 1958‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort who have relevant register records within this time span.

National Cause of Death Register

Data extract from the National Board of Health and Welfare (Socialstyrelsen)

This dataset is an extract from the National Cause of Death Register, covering the period from 1958‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort who have relevant register records within this time span.

  • Official site:
    https://www.socialstyrelsen.se/statistik-och-data/register/dodsorsaksregistret/

  • Date of extraction: 2025‑02‑24

  • Notes:

    • The official variable names differ from those used internally within Avan. These original register names are shown under Source variable name in the table below.
    • In addition to the variables originating from the register, Avan has enriched the dataset by deriving several convenience variables, such as clear‑text translations of medical code values at different hierarchical levels.
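
To give a rough idea of what hierarchical clear-text variables can look like, here is a small sketch. It is not Avan's actual derivation: it assumes ICD-10 coded values, the function name is hypothetical, and the lookup tables hold only two entries each for illustration.

```python
# Illustrative lookup tables only; a real derivation would need complete code lists.
CATEGORY_TEXT = {
    "C50": "Malignant neoplasm of breast",
    "I21": "Acute myocardial infarction",
}
# Keyed by first letter for brevity. This shortcut works for C and I but not in
# general, since some letters (for example D and H) span more than one chapter.
CHAPTER_TEXT = {
    "C": "Neoplasms",
    "I": "Diseases of the circulatory system",
}


def describe_icd10(code):
    """Return clear-text labels for one ICD-10 code at two hierarchical levels."""
    # Accept codes written with or without the dot, in any letter case.
    normalized = code.replace(".", "").strip().upper()
    category = normalized[:3]  # three-character category, e.g. "C50"
    return {
        "code": normalized,
        "category": category,
        "category_text": CATEGORY_TEXT.get(category, "unknown"),
        "chapter_text": CHAPTER_TEXT.get(category[:1], "unknown"),
    }


print(describe_icd10("c50.9"))
```

The variable descriptions in the table below remain the authoritative source for which derived variables exist and how they are defined.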

National In‑Patient Register

Data extract from the National Board of Health and Welfare (Socialstyrelsen)

This dataset is an extract from the National In‑Patient Register (Slutenvård), covering the period from 1987‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort with relevant register records within this time span.

National Out‑Patient Register

Data extract from the National Board of Health and Welfare (Socialstyrelsen)

This dataset is an extract from the National Out‑Patient Register (Öppenvård), covering the period from 2001‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort with relevant register records within this time span.

National Prescribed Drug Register

Data extract from the National Board of Health and Welfare (Socialstyrelsen)

This dataset is an extract from the National Prescribed Drug Register (Läkemedelsregistret), covering the period from 2005‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort who have recorded dispensings within this time span.

NSHDS Questionnaires

NSHDS questionnaire responses, blood measurements, and anthropometric data on all participants in the PREDICT cohort

This dataset is an extract from the Northern Sweden Health and Disease Study (NSHDS). It includes all individuals in the PREDICT cohort who have corresponding NSHDS records.

  • Official site:
    https://www.umu.se/en/brs/provsamlingar-och-register/nshds/

  • Date of extraction: 2024‑02‑01

  • Notes:

    • The official NSHDS variable names differ from those used internally within Avan. These original names are shown under Source variable name in the table below.
    • This data extract includes only a subset of NSHDS variables (VIP and MONICA). If you wish to order NSHDS variables not included here, please refer to the NSHDS access information at the link above.
    • Further descriptions of NSHDS:
      - Cohort profile (2025)
      - Descriptive statistics

Important

NSHDS maintains its own variable order forms (in Swedish and English), which provide detailed variable descriptions and context and are widely used in NSHDS‑related projects. To order variables from this dataset, you may choose either the selection tool below or the official NSHDS order form, depending on your preference.

  • If you use the selection tool, all chosen variables across data sources are included in a single CSV file, which you submit with your application.
  • If you use the official NSHDS order form, please submit both the generated CSV file and the completed order form: