Use this portal to explore available datasets, search variables, and build a CSV-ready selection list.
Secrecy
All data are subject to the General Data Protection Regulation (GDPR) and are covered by secrecy in accordance with §7 of the Offentlighets- och sekretessförordningen (OSF; Public Access to Information and Secrecy Ordinance) and/or Chapter 24, §8 of the Offentlighets- och sekretesslagen (OSL; Public Access to Information and Secrecy Act). Access to the data is strictly regulated, and any request for disclosure is subject to a documented secrecy assessment (sekretessprövning) before approval.
How to use the Avan Data Portal
The Avan Data Portal provides an overview of the variables available in the Avan data lake. You can browse variables, read their descriptions, select the ones you need, and export your final selection as a CSV file. Please ensure that all selected variables align with the project description in your approved ethics application.
The portal currently includes variables from several national and regional data sources. Additional datasets may be added in future updates as the PREDICT infrastructure continues to grow.
Instructions
Select variables by clicking on them; each chosen variable is added to your selection list.
When your selection is complete, export it with the "Download CSV" button, which downloads the list as a CSV file to your computer.
This version of the portal is a temporary, simplified release. Your selection is saved only for the current session; if you leave, close, or refresh the page, you will need to redo your selections.
If you prefer to work offline, and to avoid the risk of losing your selections, you can download an Excel file containing all available variables and complete your variable selection locally on your computer.
In the Excel file, mark each selected variable by entering an “x” in the “Selection” column.
When your selection is complete, attach the edited Excel file to your data application.
National Board of Health and Welfare
National Cancer Register
Data extract from the National Board of Health and Welfare (Socialstyrelsen)
This dataset is an extract from the National Cancer Register, covering the period 1958‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort who have relevant register records within this time span.
The official variable names differ from those used internally within Avan. These original register names are shown under Source variable name in the table below.
In addition to the variables originating from the register, Avan has enriched the dataset by deriving several convenience variables, such as clear‑text translations of medical code values at different hierarchical levels.
15 variables

| Label | Description | Variable name | Source variable name | Type | Categories |
|---|---|---|---|---|---|
| Date of birth | Extracted from pin | birthdate | birthdate | date | demography |
| Sex | Derived from pin | sex | sex | character | demography |
| Year of diagnosis | Calendar year of diagnosis | diagnosis_year | AR | integer | diagnosis, cancer_diagnosis |
| Date of diagnosis | Where the complete date is unavailable (`diagnosis_date_original`), it has been extrapolated by letting it default to… | diagnosis_date | DIADAT_complete | date | diagnosis, cancer_diagnosis |
| Age at diagnosis | Derived from `diagnosis_date` and `birthdate`. | age_diagnosis | age_diagnosis | double | demography |
| Tumour location according to ICD-O-3 | Tumor location according to ICD-O-3. This is a four-character topographical code (except for the three-character code… | icdo3_code | ICDO3 | character | diagnosis, cancer_diagnosis |
| Tumour location according to ICD-O-2 | Tumor location according to ICD-O-2. This is a four-character topographical code (except for the three-character code… | icdo2_code | ICDO10 | character | diagnosis, cancer_diagnosis |
| Tumour location according to ICD-9 | Tumor location according to ICD-9-CM | icd9_code | ICD9 | character | diagnosis, cancer_diagnosis |
| Tumour location according to ICD-7 | Tumor location according to ICD-7 | icd7_code | ICD7 | character | diagnosis, cancer_diagnosis |
| Morphological diagnosis according to ICD-O-3 | Tumor morphological diagnosis according to ICD-O-3 | snomed3_code | SNOMED3 | character | diagnosis, cancer_diagnosis |
| Morphological diagnosis according to ICD-O-2 | Tumor morphological diagnosis according to ICD-O-2 | snomed2_code | SNOMEDO10 | character | diagnosis, cancer_diagnosis |
| Morphological diagnosis according to C24.1 | Tumor morphological diagnosis according to C24.1 | c24_code | PAD | character | diagnosis, cancer_diagnosis |
| Malignant or benign tumour | Indicates whether a tumor is malignant or benign. Further documentation is provided by the National Board of Health and… | malignancy_status | BEN | character | diagnosis, cancer_diagnosis |
| Tumour location according to ICD-9 (3-chr) | Tumor location according to ICD-9-CM, at three-character level. Derived from `icd9_code` through truncation. | icd9_3chr_code | ICD9_3chr_code | character | diagnosis, cancer_diagnosis |
| Tumour location according to ICD-9 (3-chr) | Tumor location according to ICD-9-CM, at three-character level. Derived from `icd9_3chr_code` through in-house… | icd9_3chr_text | ICD9_3chr_text | character | diagnosis, cancer_diagnosis |
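The table states only that `age_diagnosis` is derived from `diagnosis_date` and `birthdate`; the same pattern recurs for `age_death`, `admission_age`, `age_visit`, and `age_dispensing` in the other registers. If you want to recompute or sanity-check these values, the sketch below shows two common conventions. The helper names are hypothetical, and the 365.25-day year is an assumption: Avan's exact formula is not documented on this page.

```python
from datetime import date


def age_in_years(birthdate: date, event_date: date) -> float:
    """Decimal age at an event (hypothetical helper).

    Assumption: elapsed days divided by 365.25. Avan's exact formula for
    age_diagnosis, age_death, etc. is not documented on this page.
    """
    if event_date < birthdate:
        raise ValueError("event_date precedes birthdate")
    return (event_date - birthdate).days / 365.25


def completed_years(birthdate: date, event_date: date) -> int:
    """Age in completed years at the event date."""
    years = event_date.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred in event_date's year.
    if (event_date.month, event_date.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


# Example with made-up dates (not register data).
born = date(1950, 6, 15)
diagnosed = date(2000, 6, 14)
print(completed_years(born, diagnosed))  # 49
print(round(age_in_years(born, diagnosed), 2))
```

The two conventions can disagree around birthdays, so check which one matches the delivered `double` values before mixing recomputed and delivered ages.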
National Cause of Death Register
Data extract from the National Board of Health and Welfare (Socialstyrelsen)
This dataset is an extract from the National Cause of Death Register, covering the period from 1958‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort who have relevant register records within this time span.
The official variable names differ from those used internally within Avan. These original register names are shown under Source variable name in the table below.
In addition to the variables originating from the register, Avan has enriched the dataset by deriving several convenience variables, such as clear‑text translations of medical code values at different hierarchical levels.
14 variables

| Label | Description | Variable name | Source variable name | Type | Categories |
|---|---|---|---|---|---|
| Date of birth | Extracted from pin | birthdate | birthdate | date | demography |
| Sex | Derived from pin | sex | sex | character | demography |
| Date of death (00 represents missing month and/or day) | '00' presumably represents a missing month and/or day. Consequently, this column cannot be handled as a date type. | death_date_original | DODSDAT | character | death |
| Date of death | Where an exact date is unavailable (`death_date_original`), it has been extrapolated by letting missing months default… | death_date | DODSDAT_valid | date | death, demography |
| Age at death | Derived from `death_date` and `birthdate`. | age_death | age_death | double | death |
| Year of death | Calendar year of death | death_year | AR | double | death |
| ICD-code version | ICD version used for coding causes of death in each record (e.g., ICD-9 or ICD-10) | icd_version | ICD | character | medical_code |
| Underlying cause of death | ICD code | cod_underlying_code | ULORSAK | character | death, medical_code |
| Underlying cause of death | Text description of `cod_underlying_code`, based on in-house translation using resources from the National Board of… | cod_underlying_text | ULORSAK_text | character | death, medical_code |
| Underlying cause of death | Three-character ICD code, derived by truncating `cod_underlying_code`. | cod_underlying_3chr_code | ULORSAK_3chr_code | character | death, medical_code |
| Underlying cause of death | Text description of `cod_underlying_3chr_code`, based on in-house translation using resources from the National Board… | cod_underlying_3chr_text | ULORSAK_3chr_text | character | death, medical_code |
| Underlying cause of death | Top-level ICD interval code, derived from `cod_underlying_3chr_code` using in-house mapping, based on resources from… | cod_underlying_int_code | ULORSAK_int_code | character | death, medical_code |
| Underlying cause of death | Text description of `cod_underlying_int_code`, based on in-house translation using resources from the National Board of… | cod_underlying_int_text | ULORSAK_int_text | character | death, medical_code |
| Contributing cause(s) of death | ICD-coded contributing causes across up to 17 fields. | contributing_cod | MORSAK1-MORSAK17 | | death, medical_code |
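Because `death_date_original` can contain `00` in place of a month or day, it cannot be parsed directly as a date. A minimal sketch for splitting such values and flagging which components are missing follows. It assumes an eight-character `YYYYMMDD` layout, which is not stated on this page, and the helper names are hypothetical. It deliberately does not impute anything; the extrapolated value is already provided as `death_date`.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int]  # None when the month is recorded as '00'
    day: Optional[int]    # None when the day is recorded as '00'


def parse_partial_date(value: str) -> PartialDate:
    """Split a YYYYMMDD string in which '00' marks an unknown month or day.

    Assumption: death_date_original is stored as an 8-character YYYYMMDD
    string. No imputation is done here.
    """
    value = value.strip()
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    year, month, day = int(value[:4]), int(value[4:6]), int(value[6:8])
    if month > 12 or day > 31:
        raise ValueError(f"month or day out of range in {value!r}")
    # 0 is falsy, so '00' components become None.
    return PartialDate(year, month or None, day or None)


def is_complete(p: PartialDate) -> bool:
    """True if both month and day are known."""
    return p.month is not None and p.day is not None


p = parse_partial_date("19871200")
print(p)  # PartialDate(year=1987, month=12, day=None)
print(is_complete(p))  # False
```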
National In-Patient Register
Data extract from the National Board of Health and Welfare (Socialstyrelsen)
This dataset is an extract from the National In‑Patient Register (Slutenvård), covering the period from 1987‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort with relevant register records within this time span.
The official variable names differ from those used internally within Avan. These original register names are shown under Source variable name in the table below.
In addition to the variables originating from the register, Avan has enriched the dataset by deriving several convenience variables, such as clear‑text translations of medical code values at different hierarchical levels.
21 variables

| Label | Description | Variable name | Source variable name | Type | Categories |
|---|---|---|---|---|---|
| Date of birth | Extracted from pin | birthdate | birthdate | date | demography |
| Sex | Derived from pin | sex | sex | character | demography |
| Date of admission | Year of introduction: 1964 | admission_date | INDATUM | date | care_episode |
| Date of discharge | Year of introduction: 1964 | discharge_date | UTDATUM | date | care_episode |
| Year of discharge | Year of introduction: 1964 | discharge_year | AR | double | care_episode |
| Age at admission | Derived from `admission_date` and `birthdate`. | admission_age | age_admission | double | demography |
| Age at discharge | Derived from `discharge_date` and `birthdate`. | discharge_age | age_discharge | double | demography |
| ICD-code version | Inferred from `diagnosis` using: 1. Year (<1997 = ICD-9), 2. Strict pattern matching, 3. Relaxed pattern matching, 4… | icd_version | HDIA_icd_v | character | medical_code |
| ICD-code validation | Best-effort validation of `diagnosis` ICD code values using custom in-house methods. | icd_validation | check_icd_info | character | medical_code |
| Main diagnosis (ICD) | ICD code. Includes discontinued codes, format errors, and values not found in official ICD-9/10-SE lookup assets. Year… | diagnosis | HDIA | character | medical_code, diagnosis |
| Main diagnosis (ICD) | Text description of `diagnosis`, based on in-house translation using resources from the National Board of Health and… | diagnosis_text | HDIA_text | character | medical_code, diagnosis |
| Main diagnosis (ICD, 3-chr) | Three-character ICD code, derived by truncating `diagnosis`. | diagnosis_3chr_code | HDIA_3chr_code | character | medical_code, diagnosis |
| Main diagnosis (ICD, 3-chr) | Text description of `diagnosis_3chr_code`, based on in-house translation using resources from the National Board of… | diagnosis_3chr_text | HDIA_3chr_text | character | medical_code, diagnosis |
| Main diagnosis (ICD, interval) | Top-level ICD interval code, derived from `diagnosis` using in-house mapping, based on resources from the National… | diagnosis_int_code | HDIA_int_code | character | medical_code, diagnosis |
| Main diagnosis (ICD, interval) | Text description of `diagnosis_int_code`, based on in-house translation using resources from the National Board of… | diagnosis_int_text | HDIA_int_text | character | medical_code, diagnosis |
| Diagnosis | ICD code. The order of `diagnosis_nn` variables is not systematically meaningful across the dataset. Year of… | diagnosis_all | DIA1-DIA30 | | diagnosis, medical_code |
| External cause | ICD code. Year of introduction: 1964. | external_cause_all | EKOD1-EKOD5 | | diagnosis, medical_code |
| Procedure codes (KVA, space-delimited list) | Multiple procedure codes, space-delimited, using KOP codes (1964-1996) or KVA codes (post-1996). Includes errors, such… | procedures | OP | character | medical_code, procedure |
| Number of procedures recorded | Number of reported procedures; may exceed 30. Derived from `procedures`. Replaces the variable `OP_ANT` provided by the… | procedures_count | OP_count | double | medical_code, procedure |
| Date of procedure | Dates of the procedures listed under "Procedure codes" | procedure_date_all | OPD1-OPD29 | | medical_code, procedure |
| Is reporting complete | Indicates whether the care episode has been finalized by the healthcare provider. The value `final` means reporting is… | report_complete | slutrapporterad | character | care_episode |
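`procedures_count` is described as derived from the space-delimited `procedures` field. Assuming it is simply the number of whitespace-separated tokens (an assumption; how Avan treats the malformed entries that the description says may occur is not stated here), the derivation can be sketched as follows. The helper names are hypothetical and the example codes are placeholders, not real KOP or KVA codes.

```python
from typing import List, Optional


def split_procedures(procedures: Optional[str]) -> List[str]:
    """Split the space-delimited `procedures` (OP) field into individual codes.

    Assumptions: codes are separated by whitespace, and a missing or empty
    value means that no procedures were recorded.
    """
    if not procedures:
        return []
    # str.split() with no argument also collapses repeated spaces.
    return procedures.split()


def count_procedures(procedures: Optional[str]) -> int:
    """Number of codes in `procedures`, comparable in spirit to `procedures_count`."""
    return len(split_procedures(procedures))


print(count_procedures("CODE1 CODE2  CODE3"))  # 3
print(count_procedures(None))  # 0
```

Splitting the field is also the natural first step if you want to search episodes for a particular procedure code rather than only count them.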
National Out-Patient Register
Data extract from the National Board of Health and Welfare (Socialstyrelsen)
This dataset is an extract from the National Out‑Patient Register (Öppenvård), covering the period from 2001‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort with relevant register records within this time span.
The official variable names differ from those used internally within Avan. These original register names are shown under Source variable name in the table below.
In addition to the variables originating from the register, Avan has enriched the dataset by deriving several convenience variables, such as clear‑text translations of medical code values at different hierarchical levels.
18 variables

| Label | Description | Variable name | Source variable name | Type | Categories |
|---|---|---|---|---|---|
| Date of birth | Extracted from pin | birthdate | birthdate | date | demography |
| Sex | Derived from pin | sex | sex | character | demography |
| Date of visit | Year of introduction: 2001 | visit_date | INDATUM | date | care_episode |
| Year of visit | Year of introduction: 2001 | visit_year | AR | double | care_episode |
| Age at visit | Derived from `visit_date` and `birthdate`. | age_visit | age_visit | double | demography |
| ICD-code version | Inferred from context; all records are dated 2001 or later. | icd_version | HDIA_icd_v | character | medical_code |
| ICD-code validation | Best-effort validation of `diagnosis` ICD code values using custom in-house methods. | icd_validation | check_icd_info | character | medical_code |
| Main diagnosis (ICD) | ICD code. Includes discontinued codes and other values not found in official ICD-10-SE lookup assets. Year of… | diagnosis | HDIA | character | medical_code, diagnosis |
| Main diagnosis (ICD) | Text description of `diagnosis`, based on in-house translation using resources from the National Board of Health and… | diagnosis_text | HDIA_text | character | medical_code, diagnosis |
| Main diagnosis (ICD, 3-chr) | Three-character ICD code, derived by truncating `diagnosis`. | diagnosis_3chr_code | HDIA_3chr_code | character | medical_code, diagnosis |
| Main diagnosis (ICD, 3-chr) | Text description of `diagnosis_3chr_code`, based on in-house translation using resources from the National Board of… | diagnosis_3chr_text | HDIA_3chr_text | character | medical_code, diagnosis |
| Main diagnosis (ICD, interval) | Top-level ICD interval code, derived from `diagnosis` using in-house mapping, based on resources from the National… | diagnosis_int_code | HDIA_int_code | character | medical_code, diagnosis |
| Main diagnosis (ICD, interval) | Text description of `diagnosis_int_code`, based on in-house translation using resources from the National Board of… | diagnosis_int_text | HDIA_int_text | character | medical_code, diagnosis |
| Diagnosis | ICD code. The order of `diagnosis_nn` variables is not systematically meaningful across the dataset. Year of… | diagnosis_all | DIA1-DIA30 | | diagnosis, medical_code |
| External cause | ICD code. Year of introduction: 1964. | external_cause_all | EKOD1-EKOD5 | | diagnosis, medical_code |
| Procedure codes (KVA, space-delimited list) | Multiple procedure codes, space-delimited, using KVA codes (post-1996). Year of introduction: 2001. | procedures | OP | character | medical_code, procedure |
| Number of procedures recorded | Number of reported procedures; may exceed 30. Derived from `procedures`. Replaces the variable `OP_ANT` provided by the… | procedures_count | OP_count | double | medical_code, procedure |
| Is reporting complete | Indicates whether the care episode has been finalized by the healthcare provider. The value `final` means reporting is… | report_complete | slutrapporterad | character | care_episode |
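The three-character variables (`diagnosis_3chr_code` here, and likewise `cod_underlying_3chr_code` and `icd9_3chr_code` in the other registers) are described as derived by truncation. If you want the same grouping for other code fields, such as the secondary diagnoses in `diagnosis_all`, the sketch below is one way to do it. It assumes that dots and surrounding whitespace are stripped and letters upper-cased before truncating; Avan's exact normalization is not documented here, and `icd_3chr` is a hypothetical helper name.

```python
from typing import Optional


def icd_3chr(code: Optional[str]) -> Optional[str]:
    """Return the three-character category of an ICD code.

    Assumptions: dots and surrounding whitespace are removed and letters are
    upper-cased before truncation; codes shorter than three characters
    (after cleaning) are treated as unusable and returned as None.
    """
    if code is None:
        return None
    cleaned = code.strip().replace(".", "").upper()
    if len(cleaned) < 3:
        return None
    return cleaned[:3]


print(icd_3chr("I21.9"))  # I21
print(icd_3chr("E119"))  # E11
```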
National Prescribed Drug Register
Data extract from the National Board of Health and Welfare (Socialstyrelsen)
This dataset is an extract from the National Prescribed Drug Register (Läkemedelsregistret), covering the period 2005‑01‑01 to 2022‑03‑31. It includes all individuals in the PREDICT cohort who have recorded dispensings during this time span.
The official variable names differ from those used internally within Avan. These original register names are shown under Source variable name in the table below.
In addition to the variables originating from the register, Avan has enriched the dataset by deriving several convenience variables, such as clear‑text translations of medical code values at different hierarchical levels.
Negative dispensing counts:
The register uses negative values of ANTAL (`dispensed_package_count`) to represent corrections to earlier erroneous records. To help users identify such corrections, Avan provides a set of convenience variables, the Candidate Corresponding Erroneous Post (CCEP) variables (`ccep`, `ccep_datediff`, and `ccep_filter`), derived using in‑house algorithms.
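The idea behind a CCEP can be illustrated with a deliberately simple, hypothetical rule: for a negative post, take the most recent positive post on or before the same date, for the same person and ATC code, whose package count equals the negated value. This is not Avan's in-house algorithm, whose matching criteria are not described on this page; `post_id` and `person_id` are hypothetical field names. The sketch only shows what a matched CCEP and a `ccep_datediff`-style day difference represent.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Dispensing:
    post_id: str              # hypothetical row identifier
    person_id: str            # hypothetical pseudonymised person key
    atc_code: str
    dispensing_date: date
    dispensed_package_count: float


def match_candidate(neg: Dispensing, posts: List[Dispensing]) -> Optional[Tuple[str, int]]:
    """Hypothetical CCEP rule (NOT Avan's algorithm).

    Picks the most recent positive post on or before the negative post's date
    with the same person, ATC code and absolute package count. Returns
    (post_id, days between the two posts), or None if nothing matches.
    """
    if neg.dispensed_package_count >= 0:
        raise ValueError("expected a negative dispensing post")
    candidates = [
        p for p in posts
        if p.person_id == neg.person_id
        and p.atc_code == neg.atc_code
        and p.dispensing_date <= neg.dispensing_date
        and math.isclose(p.dispensed_package_count, -neg.dispensed_package_count)
    ]
    if not candidates:
        return None
    # Latest date wins; ties keep the first post in list order.
    best = max(candidates, key=lambda p: p.dispensing_date)
    return best.post_id, (neg.dispensing_date - best.dispensing_date).days


# Made-up example data.
posts = [
    Dispensing("p1", "A", "C10AA01", date(2020, 1, 10), 1.0),
    Dispensing("p2", "A", "C10AA01", date(2020, 3, 1), 1.0),
    Dispensing("p3", "A", "C10AA01", date(2020, 3, 5), -1.0),
]
print(match_candidate(posts[2], posts))  # ('p2', 4)
```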
14 variables

| Label | Description | Variable name | Source variable name | Type | Categories |
|---|---|---|---|---|---|
| Date of birth | Extracted from pin | birthdate | birthdate | date | demography |
| Sex | Derived from pin | sex | sex | character | demography |
| Anatomical Therapeutic Chemical (ATC 7-chr) | ATC code according to the World Health Organization (WHO) (7 characters; 5th level: chemical substance). This is a… | atc_code | ATC | character | medical_code, drug_dispensing, atc |
| Chemical substance (ATC 7-chr) | Text description of `atc_code`, based on in-house translation using the ATC index from the WHO Collaborating Centre for… | atc_text | ATC_text | character | medical_code, drug_dispensing, atc |
| Anatomical Therapeutic Chemical (ATC 3-chr) | ATC code according to the World Health Organization (WHO) (3 characters; 2nd level: therapeutic subgroup). This is a… | atc_3chr_code | ATC_3chr_code | character | medical_code, drug_dispensing, atc |
| Therapeutic subgroup (ATC 3-chr) | Text description of `atc_3chr_code`, based on in-house translation using the ATC index from the WHO Collaborating… | atc_3chr_text | ATC_3chr_text | character | medical_code, drug_dispensing, atc |
| Date of drug dispensing | Date of drug dispensing | dispensing_date | EDATUM | date | drug_dispensing |
| Age at drug dispensing | Age at drug dispensing | age_dispensing | age_dispensing | double | drug_dispensing |
| Number of packages dispensed | Number of packages dispensed to the patient. For dose-dispensed medications, this value may be a decimal. Negative… | dispensed_package_count | ANTAL | double | drug_dispensing |
| Year of drug dispensing | Year of drug dispensing | dispensing_year | ar | double | drug_dispensing |
| Defined Daily Dose | Defined Daily Doses (DDD) per package. Indicates how many Defined Daily Doses the package contains (number of treatment… | package_ddd | forpddd | double | drug_dispensing |
| Candidate Corresponding Erroneous Post | Identifier of the Candidate Corresponding Erroneous Post (CCEP) for negative dispensing events. For each negative… | ccep | ccep | character | drug_dispensing |
| NA | Number of days between the negative dispensing event and its matched CCEP. This value allows users to assess the… | ccep_datediff | ccep_datediff | double | drug_dispensing |
| NA | Flag indicating whether a post has been identified as a CCEP. All negative `dispensed_package_count` entries and… | ccep_filter | ccep_filter | factor | drug_dispensing |
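Since `package_ddd` gives the number of DDDs per package and `dispensed_package_count` the number of packages, the DDDs dispensed in a single post are their product. The minimal sketch below sums DDDs per person. It assumes corrections should simply net out (negative counts give negative DDD totals) and that posts without a `package_ddd` value are skipped; `person_id` and both helper names are hypothetical. If you would rather exclude corrections and their matched posts, filtering on `ccep_filter` first is the alternative the table points to.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def dispensed_ddd(package_ddd: float, dispensed_package_count: float) -> float:
    """DDDs in one post: DDDs per package times number of packages.

    A negative package count (a correction) gives a negative result.
    """
    return package_ddd * dispensed_package_count


def total_ddd_by_person(
    rows: Iterable[Tuple[str, Optional[float], float]],
) -> Dict[str, float]:
    """Sum DDDs per person from (person_id, package_ddd, dispensed_package_count).

    person_id is a hypothetical key. Assumptions: posts with no package_ddd
    are skipped, and corrections simply net out against earlier posts.
    """
    totals: Dict[str, float] = defaultdict(float)
    for person_id, package_ddd, count in rows:
        if package_ddd is None:
            continue
        totals[person_id] += dispensed_ddd(package_ddd, count)
    return dict(totals)


# Made-up example rows.
rows = [("A", 30.0, 2.0), ("A", 30.0, -1.0), ("B", 28.0, 1.0), ("B", None, 1.0)]
print(total_ddd_by_person(rows))  # {'A': 30.0, 'B': 28.0}
```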
Västerbotten Intervention Programme (VIP)
NSHDS Questionnaires
NSHDS Questionnaire Responses, Blood Measurements, and Anthropometric Data on All Participants in the PREDICT Cohort.
This dataset is an extract from the Northern Sweden Health and Disease Study (NSHDS). It includes all individuals in the PREDICT cohort who have corresponding NSHDS records.
The official NSHDS variable names differ from those used internally within Avan. These original names are shown under Source variable name in the table below.
This data extract includes only a subset of NSHDS variables (VIP and MONICA). If you wish to order NSHDS variables not included here, please refer to the NSHDS access information at the link above.
NSHDS maintains its own variable order forms (in Swedish and English), which provide detailed variable descriptions and context and are widely used in NSHDS‑related projects. To order variables from this dataset, you may choose either the selection tool below or the official NSHDS order form, depending on your preference.
If you use the selection tool, all chosen variables across data sources are included in a single CSV file, which you submit with your application.
If you use the official NSHDS order form, please submit both the generated CSV file and the completed order form: