Data documentation - 3rd IMDC

This section provides detailed documentation of the data available for the 3rd IMDC, including table descriptions and other essential information for data analysis and modelling.

The data are hosted on an FTP server. There are several ways to access an FTP server; two are described here:

1 - Using FileZilla

  1. Download the FileZilla app from the official website: https://filezilla-project.org/.
  2. Open the application, enter info.dengue.mat.br in the Host field, and click the button to connect.

If a message appears when connecting, just click OK.

  3. Open the data_imdc_2026 folder to view the available datasets.

  4. To download all the datasets in the folder (described in detail in this document), right-click the folder and choose the Download option.

2 - Using FTPWeb

  1. Open https://www.ftpweb.com.br/index2.php and fill in server = info.dengue.mat.br, user = anonymous, and password = anonymous@domain.com.

  2. Open the data_imdc_2026 folder to view the available datasets and download them.
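The same download can also be scripted. The sketch below uses Python's standard ftplib with the host and folder named above and an anonymous login; the helper name download_folder is hypothetical, and it assumes the folder contains only files (no subfolders).

```python
from ftplib import FTP
from pathlib import Path

HOST = "info.dengue.mat.br"     # host given in the steps above
REMOTE_DIR = "data_imdc_2026"   # folder given in the steps above


def download_folder(ftp, remote_dir, local_dir):
    """Download every entry of remote_dir into local_dir.

    Assumes the remote folder is flat (files only, no subfolders).
    Returns the list of local paths written.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    written = []
    for name in ftp.nlst():
        target = local_dir / name
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        written.append(target)
    return written


# Usage (needs network access):
# with FTP(HOST) as ftp:
#     ftp.login()  # no arguments means an anonymous login
#     download_folder(ftp, REMOTE_DIR, "data_imdc_2026")
```

Passing the connection as an argument keeps the helper independent of how the login is done.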

Inside the data_imdc_2026 folder, there are the following files:

  • Population: datasus_population_2001_2025.csv.gz,
  • Environmental: environ_vars.csv.gz,
  • Ocean temperature indicators: enso.csv.gz, iod.csv.gz, pdo.csv.gz,
  • Shapefile of the cities: shape_muni.gpkg,
  • Shapefile of the regional health divisions: shape_regional_health.gpkg,
  • Shapefile of the macroregional health divisions: shape_macroregional_health.gpkg,
  • Link between each city and its health region and macroregion: map_regional_health.csv,
  • Weekly time series of dengue cases: dengue.csv.gz,
  • Weekly time series of chikungunya cases: chikungunya.csv.gz,
  • Weekly time series of climatic variables: climate.csv.gz,
  • Monthly time series of climate variable forecasts: climate_forecast.csv.gz.

Each of these datasets is described in detail below.

Dengue data

Period: epiweek 201001 to epiweek 202610 [1].

Aggregation: Dengue cases aggregated by the epidemiological week of dengue symptom onset and by municipality.

File: dengue.csv.gz.

Sources: SINAN and IBGE, organized by Infodengue.

Note: Data from the state of ES may contain inaccuracies due to issues in the reporting process.

Table 1. Description of the columns in dengue.csv.gz

| Column name | Type | Description |
| --- | --- | --- |
| date | YYYY-MM-DD | First day of the epiweek (Sunday). |
| epiweek | int (YYYYWW) | Epidemiological week, defined by the date of symptom onset. |
| geocode | int | IBGE’s municipality code. |
| casos | int | Number of cases per week, classified as probable dengue cases [2]. This column is equivalent to the column casprov in the infodengue table available in the Mosqlimate API. |
| regional_geocode [3] | int | Health district code. |
| macroregional_geocode [3] | int | Health macroregion code. |
| uf | str | Federative Unit (state). |
| uf_code | int | Two-digit code associated with the Federative Unit (state). This value is required to submit predictions to the Mosqlimate platform. |
| target_city | bool | Boolean flag indicating whether the row corresponds to one of the cities selected for the additional challenge in the 3rd IMDC edition. |
| train_1 | bool | Data for the first training (pre-season 22/23). |
| train_2 | bool | Data for the second training (pre-season 23/24). |
| train_3 | bool | Data for the third training (pre-season 24/25). |
| train_4 | bool | Data for the fourth training (pre-season 25/26). |
| target_1 | bool | Data for the first validation (season 22/23). |
| target_2 | bool | Data for the second validation (season 23/24). |
| target_3 | bool | Data for the third validation (season 24/25). |
| target_4 | bool | Data for the fourth validation (season 25/26). This flag does not yet extend to epiweek 202640 because those weeks have not been reported. Please still submit forecasts for the whole period (EW 41 2025 to EW 40 2026): the data will be reported by the end of the challenge, and the forecasts can then be evaluated. |
| disease | str | Filled with the value dengue. |
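The train_k and target_k flags can be used to cut the series into training and validation sets. The sketch below reads the gzip-compressed CSV with the standard library and selects the rows of one round; the helper names are hypothetical, and the text encoding of the boolean flags ("True"/"False" or "1"/"0") is an assumption that should be checked against the actual file.

```python
import csv
import gzip


def as_bool(value):
    # Assumption: flags are stored as text such as "True"/"False" or "1"/"0".
    return str(value).strip().lower() in {"true", "1"}


def read_rows(path):
    """Read a gzip-compressed CSV into a list of dicts (all values as text)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def split_round(rows, k):
    """Return (train, target) rows for training/validation round k (1 to 4)."""
    train = [r for r in rows if as_bool(r[f"train_{k}"])]
    target = [r for r in rows if as_bool(r[f"target_{k}"])]
    return train, target


# Usage:
# rows = read_rows("dengue.csv.gz")
# train, target = split_round(rows, 1)
```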

Chikungunya data

Period: epiweek 201401 to epiweek 202610 [1].

Aggregation: Chikungunya cases aggregated by the epidemiological week of chikungunya symptom onset and by municipality.

File: chikungunya.csv.gz.

Sources: SINAN and IBGE, organized by Infodengue.

Note: Data from the state of ES may contain inaccuracies due to issues in the reporting process.

Table 1. Description of the columns in chikungunya.csv.gz

| Column name | Type | Description |
| --- | --- | --- |
| date | YYYY-MM-DD | First day of the epiweek (Sunday). |
| epiweek | int (YYYYWW) | Epidemiological week, defined by the date of symptom onset. |
| geocode | int | IBGE’s municipality code. |
| casos | int | Number of cases per week, classified as probable chikungunya cases [2]. This column is equivalent to the column casprov in the infodengue table available in the Mosqlimate API. |
| regional_geocode [3] | int | Health district code. |
| macroregional_geocode [3] | int | Health macroregion code. |
| uf | str | Federative Unit (state). |
| uf_code | int | Two-digit code associated with the Federative Unit (state). This value is required to submit predictions to the Mosqlimate platform. |
| target_city | bool | Boolean flag indicating whether the row corresponds to one of the cities selected for the additional challenge in the 3rd IMDC edition. |
| train_1 | bool | Data for the first training (pre-season 22/23). |
| train_2 | bool | Data for the second training (pre-season 23/24). |
| train_3 | bool | Data for the third training (pre-season 24/25). |
| train_4 | bool | Data for the fourth training (pre-season 25/26). |
| target_1 | bool | Data for the first validation (season 22/23). |
| target_2 | bool | Data for the second validation (season 23/24). |
| target_3 | bool | Data for the third validation (season 24/25). |
| target_4 | bool | Data for the fourth validation (season 25/26). This flag does not yet extend to epiweek 202640 because those weeks have not been reported. Please still submit forecasts for the whole period (EW 41 2025 to EW 40 2026): the data will be reported by the end of the challenge, and the forecasts can then be evaluated. |
| disease | str | Filled with the value chikungunya. |
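Because epiweek is stored as an integer YYYYWW, the forecast window (EW 41 2025 to EW 40 2026) can be selected with plain integer comparisons. The helpers below are a hypothetical sketch; applying the same EW 41 to EW 40 window to the earlier seasons is an assumption based on the period stated for target_4.

```python
def split_epiweek(epiweek):
    """Split an integer YYYYWW into (year, week)."""
    return divmod(int(epiweek), 100)


def in_season(epiweek, start_year):
    """True if epiweek lies between EW 41 of start_year and EW 40 of the next year."""
    first = start_year * 100 + 41
    last = (start_year + 1) * 100 + 40
    return first <= int(epiweek) <= last


# Usage: keep only the rows of the 25/26 season.
# season_rows = [r for r in rows if in_season(r["epiweek"], 2025)]
```

The comparison works because the week number always occupies the last two digits, so YYYYWW values sort chronologically.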

Climate — reanalysis

Reanalysis of hourly data from ERA5, summarized by week by the Mosqlimate project.

Period: epiweek 199952 to epiweek 202611 [4].

Aggregation: temperature, humidity, and precipitation, originally by hour, were first aggregated by day (min, max, mean), and these daily measures were aggregated by epidemiological week (mean).

File: climate.csv.gz.

Sources: Copernicus ERA5, organized by Mosqlimate.

Table 2. Description of the columns of climate.csv.gz. The daily values of these variables are available in the mosqlimate API. *Atmospheric pressures are given as if the place were at sea level.

| Column name | Type | Description |
| --- | --- | --- |
| date | YYYY-MM-DD | First day of the epiweek (Sunday). |
| epiweek | int (YYYYWW) | Epidemiological week. |
| geocode | int | IBGE’s municipality code. |
| temp_min | float (°C) | Minimum temperature. |
| temp_med | float (°C) | Mean temperature. |
| temp_max | float (°C) | Maximum temperature. |
| precip_min | float (mm/h) | Minimum precipitation rate. |
| precip_med | float (mm/h) | Average precipitation rate. |
| precip_max | float (mm/h) | Maximum precipitation rate. |
| precip_tot | float (mm) | Total precipitation. |
| pressure_min | float (atm) | Minimum daily sea level atmospheric pressure*. |
| pressure_med | float (atm) | Average atmospheric pressure*. |
| pressure_max | float (atm) | Maximum atmospheric pressure*. |
| rel_humid_min | float (%) | Minimum relative humidity. |
| rel_humid_med | float (%) | Average relative humidity. |
| rel_humid_max | float (%) | Maximum relative humidity. |
| thermal_range | float (°C) | Difference between the daily maximum and minimum temperature, averaged by week. |
| rainy_days | int | Number of days in the week for which precip_tot > 0.03. |
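The two-stage aggregation described above (hourly to daily min/mean/max, then a weekly mean of the daily values) can be illustrated on temperature, together with thermal_range and rainy_days. This is only a sketch of the stated procedure, not the Mosqlimate code; the input layout (one dict per day with an hourly "temp" list and a daily "precip_tot") is hypothetical.

```python
from statistics import mean


def daily_summary(hourly):
    """Summarize one day's hourly values as (min, mean, max)."""
    return min(hourly), mean(hourly), max(hourly)


def weekly_summary(days):
    """Aggregate one epiweek from per-day dicts.

    Each dict holds the hourly series "temp" (°C) and the daily total
    "precip_tot" (hypothetical input layout).
    """
    daily = [daily_summary(d["temp"]) for d in days]
    return {
        "temp_min": mean(lo for lo, _, _ in daily),   # weekly mean of daily minima
        "temp_med": mean(av for _, av, _ in daily),   # weekly mean of daily means
        "temp_max": mean(hi for _, _, hi in daily),   # weekly mean of daily maxima
        "thermal_range": mean(hi - lo for lo, _, hi in daily),
        "rainy_days": sum(d["precip_tot"] > 0.03 for d in days),
    }
```

Note that under this procedure temp_min is the weekly average of the daily minima, not the lowest hourly value of the week.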

Climate Forecast

Seasonal forecasts (up to six months ahead) of climate variables from Copernicus, generated with System 51 of ECMWF.

Period: January 2010–March 2026.

File: climate_forecast.csv.gz.

Sources: Copernicus.

Table 3. Description of the columns of climate_forecast.csv.gz.

| Column name | Type | Description |
| --- | --- | --- |
| geocode | int | IBGE’s municipality code. |
| reference_month | YYYY-MM-DD | Reference month. |
| forecast_months_ahead | int | The number of months into the future, relative to the reference month, for which the forecast is made. |
| temp_med | float (°C) | Mean temperature. |
| precip_tot | float (mm) | Total precipitation. |
| rel_humid_med | float (%) | Average relative humidity. |
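To line a forecast up with observed data, it helps to convert reference_month and forecast_months_ahead into the month being forecast. The sketch below assumes the target month is simply the reference month shifted by forecast_months_ahead; whether the lead counts from 0 or 1 is not stated here and should be checked against the file. The function name is hypothetical.

```python
from datetime import date


def target_month(reference_month, months_ahead):
    """Return the first day of the month months_ahead after reference_month."""
    index = reference_month.year * 12 + (reference_month.month - 1) + months_ahead
    year, month0 = divmod(index, 12)
    return date(year, month0 + 1, 1)


# Usage:
# target_month(date.fromisoformat("2025-11-01"), 3)
```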

Ocean temperature and level oscillations

Period: 1993-01-04 — 2026-03-10 (weekly).

File: ocean_climate_oscillations.csv.gz.

Sources: https://sealevel.jpl.nasa.gov/.

Table 4. Description of the columns of ocean_climate_oscillations.csv.gz.

| Column name | Type | Description |
| --- | --- | --- |
| date | YYYY-MM-DD | Week (starting on Monday). |
| enso | float | The El Niño-Southern Oscillation is a climate pattern in the Pacific Ocean that has two phases: El Niño and La Niña. In a normal year, the trade winds in the Pacific Ocean blow westward along the Equator and push warm surface waters toward Australia and Indonesia. On the other side of the Pacific Ocean, nutrient-rich cold waters come up off the coast of Central and South America, creating favorable conditions for fishing. During an El Niño event, the trade winds weaken, the warm, nutrient-poor waters are no longer pushed by the winds, and sea level rises in the eastern tropical Pacific and falls in the western tropical Pacific. La Niña is the opposite phase of El Niño, with warm water piling up in the western Pacific and colder water in the eastern Pacific. This causes a higher sea level in the western tropical Pacific and a lower sea level in the eastern tropical Pacific. |
| iod | float | The Indian Ocean Dipole is a climate pattern affecting the Indian Ocean. During a positive phase, warm waters are pushed to the western part of the Indian Ocean, while cold deep waters are brought up to the surface in the eastern Indian Ocean. This pattern is reversed during the negative phase of the IOD. |
| pdo | float | The Pacific Decadal Oscillation (PDO) is a long-term (10-20 year) oscillation of the Pacific Ocean in response to changes in the atmosphere. During a warm (positive) phase, the response of the ocean to low atmospheric pressure over the Aleutian Islands causes ocean currents to bring warm waters to the eastern Pacific Ocean and along the coast of North America, and cool nutrient-rich waters to the western Pacific Ocean. This leads to higher sea levels along the coastlines of the Northeast Pacific. During a cool (negative) phase, the eastern Pacific Ocean becomes cooler and the western Pacific Ocean becomes warmer. This leads to lower sea levels along the coastlines of the Northeast Pacific. |
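The weeks in this table start on Monday, whereas the epiweeks in the case and climate tables start on Sunday. One possible way to align them, shown here as an assumption rather than a prescribed method, is to map each date to the Sunday on or before it and then join on the date column of the other tables. The function name is hypothetical.

```python
from datetime import date, timedelta


def epiweek_start(day):
    """Return the Sunday on or before day.

    date.weekday() numbers Monday as 0 and Sunday as 6, so a Monday
    moves back one day and a Sunday stays unchanged.
    """
    return day - timedelta(days=(day.weekday() + 1) % 7)


# Usage:
# epiweek_start(date.fromisoformat("1993-01-04"))
```

With this choice, a Monday-start week is attributed to the epiweek that began one day earlier, so six of its seven days fall inside that epiweek.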

Environmental data

Environmental characteristics of the municipalities. Other variables can be aggregated as necessary.

Period: 2010 (koppen) and 2024 (biome).

File: environ_vars.csv.gz.

Sources: IBGE, Embrapa.

Table 5. Description of the columns of environ_vars.csv.gz.

| Column name | Type | Description |
| --- | --- | --- |
| geocode | int | IBGE’s municipality code. |
| uf_code | int | IBGE’s state code. |
| koppen | str | Main climate type (Köppen classification). |
| biome | str | Main biome type. |

Demographic data

Table 6. Geometry of cities in shape_muni.gpkg (source = IBGE).

| Column name | Type | Description |
| --- | --- | --- |
| geocode | int | IBGE’s municipality code. |
| geocode_name | str | Municipality name. |
| uf | str | Two-letter state name. |
| uf_code | int | IBGE’s state code. |
| geometry | geometry | Municipality geometry. |

Table 7. Geometry of the regional health divisions in shape_regional_health.gpkg (source = DATASUS).

| Column name | Type | Description |
| --- | --- | --- |
| regional_geocode | int | Regional health code. |
| regional_name | str | Regional health name. |
| uf_code | int | IBGE’s state code. |
| geometry | geometry | Regional health geometry. |

Table 8. Geometry of the macroregional health divisions in shape_macroregional_health.gpkg (source = DATASUS).

| Column name | Type | Description |
| --- | --- | --- |
| macroregional_geocode | int | Macroregional health code. |
| macroregional_name | str | Macroregional health name. |
| uf | str | Two-letter state name. |
| uf_code | int | IBGE’s state code. |
| geometry | geometry | Macroregional health geometry. |

Table 9. Link between each city and its regional and macroregional health center in map_regional_health.csv (source = IBGE).

| Column name | Type | Description |
| --- | --- | --- |
| macroregion_code | int | Macroregion code (1 - Norte, 2 - Nordeste, 3 - Sudeste, 4 - Sul, 5 - Centro-Oeste). |
| macroregion_name | str | Macroregion name. |
| uf_code | int | IBGE’s state code. |
| uf | str | Two-letter state name. |
| uf_name | str | State name. |
| macroregional_geocode | int | Macroregional health code. |
| macroregional_name | str | Macroregional health name. |
| regional_geocode | int | Regional health code. |
| regional_name | str | Regional health name. |
| geocode | int | IBGE’s municipality code. |
| geocode_name | str | Municipality name. |
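Tables that carry only geocode, such as climate.csv.gz, can be attached to a health division through this mapping. The sketch below averages one variable over the municipalities of each division per epiweek; the function name is hypothetical, and the unweighted mean across municipalities is an illustrative choice (area or population weighting may be more appropriate).

```python
from collections import defaultdict
from statistics import mean


def regional_mean(rows, map_rows, value="temp_med", level="regional_geocode"):
    """Average `value` over the municipalities of each health division, per epiweek.

    level is "regional_geocode" or "macroregional_geocode".
    Keys of the result are (epiweek, division code), both as text.
    """
    lookup = {str(r["geocode"]): str(r[level]) for r in map_rows}
    groups = defaultdict(list)
    for row in rows:
        key = (str(row["epiweek"]), lookup[str(row["geocode"])])
        groups[key].append(float(row[value]))
    return {key: mean(values) for key, values in groups.items()}
```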

Table 10. Population by city and year (2001-2025) in datasus_population_2001_2025.csv.gz (source: SVS).

| Column name | Type | Description |
| --- | --- | --- |
| geocode | int | IBGE’s municipality code. |
| year | int | Year (YYYY). |
| population | int | Population of the city. |
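A common use of the population table is to convert weekly case counts into incidence per 100,000 inhabitants. The sketch below joins case rows to population on geocode and year; the function name is hypothetical, and taking the year from the first four digits of epiweek is an assumption.

```python
def incidence_per_100k(case_rows, pop_rows):
    """Return (epiweek, geocode, incidence per 100,000 inhabitants) tuples."""
    population = {
        (str(r["geocode"]), int(r["year"])): int(r["population"]) for r in pop_rows
    }
    out = []
    for row in case_rows:
        year = int(row["epiweek"]) // 100  # assumption: use the epiweek's year
        pop = population.get((str(row["geocode"]), year))
        if pop:  # skip rows without a matching, non-zero population
            rate = int(row["casos"]) * 100_000 / pop
            out.append((int(row["epiweek"]), str(row["geocode"]), rate))
    return out
```

Since the population file ends in 2025, rows from 2026 epiweeks are skipped by this sketch; reusing the latest available year is one possible fallback.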

Additional datasets


  1. Note that the last weeks are subject to update as cases are still being reported. This data will be updated before the submission of the 2026 forecasts.

  2. Case definition: Probable cases = Suspected cases - Discarded cases.

  3. Regional and Macroregional are the subdivisions used by the Ministry of Health.

  4. This data will be updated before the submission of the 2026 forecasts.
