Import FluSight forecast

The import_flusight_csv function uses several helper functions to convert FluSight-specific formatted CSV forecasts. The file name should contain the MMWR epidemiological week assigned to the forecast (e.g. “EW42”) followed by a dash (“-”), the team name (e.g. “Hist_Avg”) followed by a dash (“-”), and the submission date as an ISO data (YYYY-MM-DD, e.g. “2018-10-29”).

This import process creates a special, embedded “predx_df” data frame (a tbl_df, “tibble” object) with a row for every individual prediction (defined by target, location, and prediction type), a column predx_class that defines the class of each prediction, and a column predx that is a list of predx objects. In the process of importation, all predictions are validated (e.g. binned predictions must sum to 1.0) or (hopefully useful) errors messages are returned in the predx column.

For this example, we use a submission with only national-level forecasts.

Verify expected predictions are included

The verify_expected function can be used to check that expected targets are included in the predx_df. For example, the national-level FluSight forecasts include forecasts for seven targets and 11 locations. The percentage forecasts (e.g. “Season peak percentage”, “1 wk ahead”) are classified as BinLwr predictions and the week targets (“Season onset”, “Season peak week”) are classifed as BinCat predictions. This set can be specified in a list with one list for each combination of required characteristics. For example, the target “Season peak percentage”, should have a prediction for each location and each predx bin as specified in the first element of flusight_ilinet_expected(), below.

list(
  list(
    target = c("Season peak percentage", "1 wk ahead", "2 wk ahead", 
      "3 wk ahead", "4 wk ahead"),
    location = c("HHS Region 1", "HHS Region 10", "HHS Region 2", "HHS Region 3",
        "HHS Region 4", "HHS Region 5", "HHS Region 6", "HHS Region 7",
        "HHS Region 8", "HHS Region 9", "US National"),
    predx_class = "BinLwr",
    lwr = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 
      1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 
      2.6, 2.7, 2.8, 2.9, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 
      3.9, 4, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.1, 
      5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6, 6.1, 6.2, 6.3, 6.4, 
      6.5, 6.6, 6.7, 6.8, 6.9, 7, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 
      7.8, 7.9, 8, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9, 
      9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10, 10.1, 10.2, 
      10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11, 11.1, 11.2, 11.3, 
      11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12, 12.1, 12.2, 12.3, 12.4, 
      12.5, 12.6, 12.7, 12.8, 12.9, 13)
  ),
  list(
    target = "Season peak week",
    location = c("HHS Region 1", "HHS Region 10", "HHS Region 2", "HHS Region 3",
        "HHS Region 4", "HHS Region 5", "HHS Region 6", "HHS Region 7",
        "HHS Region 8", "HHS Region 9", "US National"),
    predx_class = "BinCat",
    cat = c("40", "41", "42", "43", "44", "45", "46", "47", "48", "49", 
      "50", "51", "52", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
      "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20")
    ),
  list(
    target = "Season onset",
    location = c("HHS Region 1", "HHS Region 10", "HHS Region 2", "HHS Region 3",
        "HHS Region 4", "HHS Region 5", "HHS Region 6", "HHS Region 7",
        "HHS Region 8", "HHS Region 9", "US National"),
    predx_class = "BinCat",
    cat = c("40", "41", "42", "43", "44", "45", "46", "47", "48", "49", 
      "50", "51", "52", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
      "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
      "none")
    ),
  list(
    target = c("Season peak percentage", "1 wk ahead", "2 wk ahead", 
      "3 wk ahead", "4 wk ahead", "Season onset", "Season peak week"),
    location = c("HHS Region 1", "HHS Region 10", "HHS Region 2", "HHS Region 3",
        "HHS Region 4", "HHS Region 5", "HHS Region 6", "HHS Region 7",
        "HHS Region 8", "HHS Region 9", "US National"),
    predx_class = "Point"
    )

This specific expected list, available with flusight_ilinet_expected, is included in the predx package along with flusight_state_ilinet_expected (for state ILINet forecasts) and flusight_hospitalization_expected (for hospitalization forecasts). Other verification lists can be made in this format and all can be used with the function verify_expected to validate that all expected predictions are included. The function prints missing and additional predictions, but those can also be returned as a data frame by including the argument return_df = TRUE.

For this example, we limit the expected forecasts to the national level.

national_expected <- list(
  list(
    target = c("Season peak percentage", "1 wk ahead", "2 wk ahead", 
      "3 wk ahead", "4 wk ahead"),
    location = "US National",
    predx_class = "BinLwr",
    lwr = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 
      1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 
      2.6, 2.7, 2.8, 2.9, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 
      3.9, 4, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.1, 
      5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6, 6.1, 6.2, 6.3, 6.4, 
      6.5, 6.6, 6.7, 6.8, 6.9, 7, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 
      7.8, 7.9, 8, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9, 
      9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10, 10.1, 10.2, 
      10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11, 11.1, 11.2, 11.3, 
      11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12, 12.1, 12.2, 12.3, 12.4, 
      12.5, 12.6, 12.7, 12.8, 12.9, 13)
  ),
  list(
    target = "Season peak week",
    location = "US National",
    predx_class = "BinCat",
    cat = c("40", "41", "42", "43", "44", "45", "46", "47", "48", "49", 
      "50", "51", "52", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
      "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20")
    ),
  list(
    target = "Season onset",
    location = "US National",
    predx_class = "BinCat",
    cat = c("40", "41", "42", "43", "44", "45", "46", "47", "48", "49", 
      "50", "51", "52", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
      "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
      "none")
    ),
  list(
    target = c("Season peak percentage", "1 wk ahead", "2 wk ahead", 
      "3 wk ahead", "4 wk ahead", "Season onset", "Season peak week"),
    location = "US National",
    predx_class = "Point"
    )
)

verify_expected(fcast, national_expected)
#> All expected predictions found.

Export as predx JSON (reduced size), predx CSV, or FluSight CSV

To save space and facilitate sharing, transfer, and storage of predx_df objects, they can be exported as JSON objects using the function export_json. A file name can be supplied as an argument to store this as a file instead of returning an object as shown for the peak week predictions in fcast below.

peak_week_json <- export_json(fcast[13:14, ])
peak_week_json
#> [{"location":"US National","target":"Season peak week","unit":"week","team":"Hist-Avg","mmwr_week":42,"submission_date":"2018-10-29","predx_class":"Point","predx":{"point":5}},{"location":"US National","target":"Season peak week","unit":"week","team":"Hist-Avg","mmwr_week":42,"submission_date":"2018-10-29","predx_class":"BinCat","predx":{"cat":["40","41","42","43","44","45","46","47","48","49","50","51","52","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"],"prob":[2.71765295790689e-018,1.05098154210838e-017,5.62064673333472e-018,1.10633017835077e-018,6.14565616231136e-018,3.20292319880939e-015,2.73138424595409e-011,4.81833260313942e-008,1.78568393149462e-005,0.00141898430676559,0.0250993223739668,0.10365445494028,0.103891022982423,0.0292895523076136,0.0196608506730169,0.0349911259046295,0.0912038352325736,0.159595360227116,0.17640031092917,0.121576720796766,0.0435668625182061,0.0268396648526282,0.0348099949396376,0.0214596885860514,0.00441747999649527,0.000344280326810563,0.000251797579413183,0.000251797579413183,0.000251797579413183,0.000251797579413183,0.000251797579413183,0.000251797579413183,0.000251797579413183]}}]
jsonlite::prettify(peak_week_json)
#> [
#>     {
#>         "location": "US National",
#>         "target": "Season peak week",
#>         "unit": "week",
#>         "team": "Hist-Avg",
#>         "mmwr_week": 42,
#>         "submission_date": "2018-10-29",
#>         "predx_class": "Point",
#>         "predx": {
#>             "point": 5
#>         }
#>     },
#>     {
#>         "location": "US National",
#>         "target": "Season peak week",
#>         "unit": "week",
#>         "team": "Hist-Avg",
#>         "mmwr_week": 42,
#>         "submission_date": "2018-10-29",
#>         "predx_class": "BinCat",
#>         "predx": {
#>             "cat": [
#>                 "40",
#>                 "41",
#>                 "42",
#>                 "43",
#>                 "44",
#>                 "45",
#>                 "46",
#>                 "47",
#>                 "48",
#>                 "49",
#>                 "50",
#>                 "51",
#>                 "52",
#>                 "1",
#>                 "2",
#>                 "3",
#>                 "4",
#>                 "5",
#>                 "6",
#>                 "7",
#>                 "8",
#>                 "9",
#>                 "10",
#>                 "11",
#>                 "12",
#>                 "13",
#>                 "14",
#>                 "15",
#>                 "16",
#>                 "17",
#>                 "18",
#>                 "19",
#>                 "20"
#>             ],
#>             "prob": [
#>                 2.71765295790689e-018,
#>                 1.05098154210838e-017,
#>                 5.62064673333472e-018,
#>                 1.10633017835077e-018,
#>                 6.14565616231136e-018,
#>                 3.20292319880939e-015,
#>                 2.73138424595409e-011,
#>                 4.81833260313942e-008,
#>                 1.78568393149462e-005,
#>                 0.00141898430676559,
#>                 0.0250993223739668,
#>                 0.10365445494028,
#>                 0.103891022982423,
#>                 0.0292895523076136,
#>                 0.0196608506730169,
#>                 0.0349911259046295,
#>                 0.0912038352325736,
#>                 0.159595360227116,
#>                 0.17640031092917,
#>                 0.121576720796766,
#>                 0.0435668625182061,
#>                 0.0268396648526282,
#>                 0.0348099949396376,
#>                 0.0214596885860514,
#>                 0.00441747999649527,
#>                 0.000344280326810563,
#>                 0.000251797579413183,
#>                 0.000251797579413183,
#>                 0.000251797579413183,
#>                 0.000251797579413183,
#>                 0.000251797579413183,
#>                 0.000251797579413183,
#>                 0.000251797579413183
#>             ]
#>         }
#>     }
#> ]
#> 
json_tempfile = tempfile()
export_json(fcast, filename = json_tempfile, overwrite = T)

Alternatively, predx_df objects can be exported as predx CSV files or FluSight-formatted CSV files (those used for submissions on the FluSight website). These CSV formats differ in several ways:

  1. The predx CSV contains additional columns for MMWR epidemic week, team, and submission date. In the FluSight CSV these are included in the file name only.
  2. The predx CSV includes point predictions as individual rows whereas the FluSight CSV includes them in an additional column.
  3. The FluSight CSV will allow upload on the EPI webpage.

These differences are the results of making predx JSON and CSV formats more generic than the specific FluSight implementation.

Import predx JSON or predx CSV

In addition to importing CSV files in the FluSight format (as described at the beginning of this vignette), predx can be used to import predx JSON or predx CSV files.

Use with the FluSight package

The FluSight package https://github.com/jarad/FluSight includes numerous functions to work with forecasts, such as scoring and visualization. predx FluSight forecast can be converted to the format used by the FluSight package using to_flusight_pkg_format.

This can then be used for scoring. Note that in this example, fcast only contains national level forecasts.