Preliminary EDA for Landsat 8 Surface Temp Data

Chase Dawson

Surface Temperature Data

This section examines median surface temperature. The Landsat 8 dataset consists of scenes. A scene is an image of part of the Earth's surface, stored as a raster. From those scenes, we can compute zonal statistics for a set of geometries, such as county, tract, or block group boundaries. Computing zonal statistics at the block level has been difficult, so blocks are excluded for now.
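To make "zonal statistics" concrete, here is a minimal sketch of a zonal median in plain Python. It is not the code used for this analysis. It assumes the geometries have already been rasterized onto the same grid as the temperature values, so each pixel carries a zone id; the function name `zonal_median` and the toy grids are hypothetical. A real workflow would use a raster library rather than nested lists.

```python
import math
from collections import defaultdict
from statistics import median


def zonal_median(values, zones):
    """Return {zone_id: median of valid pixels} for two equally shaped 2D grids.

    values: rows of pixel values, with float("nan") marking missing pixels.
    zones:  rows of zone ids (e.g. a rasterized county layer); None means
            the pixel falls outside every geometry.
    """
    pixels_by_zone = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None:
                continue
            # Looking up the bucket registers the zone even if every
            # pixel in it turns out to be missing.
            bucket = pixels_by_zone[zone]
            if not math.isnan(value):
                bucket.append(value)
    # A zone with no valid pixels gets None instead of a number.
    return {
        zone: (median(pixels) if pixels else None)
        for zone, pixels in pixels_by_zone.items()
    }


# Toy 2 x 3 raster (made-up values) with three zones: A, B, and C.
values = [
    [70.0, 72.0, float("nan")],
    [71.0, 90.0, float("nan")],
]
zones = [
    ["A", "A", "B"],
    ["A", "C", "B"],
]
result = zonal_median(values, zones)
```

In this toy example, zone B has no valid pixels at all, so its median is `None`. That corresponds to a geometry that is fully missing data for a scene.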

Each scene is captured at a specific date and time, so there isn't a single data source we can use for surface temperature. In this exploration, we iterate over all scenes and all spatial units and produce choropleth plots of the median surface temperature in degrees Fahrenheit. This means there are a LOT of plots!
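Since the plots report temperatures in degrees Fahrenheit, a unit conversion happens somewhere along the way. The sketch below assumes the zonal medians start out in Kelvin, which is an assumption on my part and not something stated here; the function name is hypothetical.

```python
def kelvin_to_fahrenheit(kelvin):
    """Convert a temperature from Kelvin to degrees Fahrenheit."""
    return (kelvin - 273.15) * 9.0 / 5.0 + 32.0


# Assumed input: a median surface temperature of 300 K (made-up value),
# which comes out to about 80.33 degrees Fahrenheit.
median_f = kelvin_to_fahrenheit(300.0)
```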

Each plot is titled with the format "filename basename spatial_unit date", where filename is the name of the file, basename is the name of the region (either "cville" for Charlottesville or "easternShore" for the Eastern Shore), spatial_unit is the spatial unit at which the zonal stats were computed (counties, tracts, or block groups), and date is the date when the data was acquired (that is, when the satellite imaged that region).
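As an illustration of the title format, the sketch below assembles a title from its four parts. The function name, the file name `scene_a`, and the date and its format are all hypothetical placeholders; only the field order and the basename and spatial unit values come from the description above.

```python
def plot_title(filename, basename, spatial_unit, date):
    """Join the four title fields with single spaces, in the documented order."""
    return f"{filename} {basename} {spatial_unit} {date}"


# Hypothetical file name and date; "cville" and "tracts" are values named above.
title = plot_title("scene_a", "cville", "tracts", "2020-07-04")
```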

Missingness

If a geographic region (a region in this sense means one geometry, so one county, tract, or block group) is missing all of its data, it is grayed out. I explore missingness in more detail later on.

Charlottesville Region Temperature Data

Charlottesville City Temperature Data

Eastern Shore Region Temperature Data

Missing Data

Since we're looking at several scenes, each of which is an image of the Earth at a specific date and time, some regions aren't fully covered by a single scene. This happens particularly with the Charlottesville region, where no scene fully covers the region. Because of this, I thought it was important to explore the missingness of the data for each file.

In the plots above, some regions might have partially missing data, which is hard to show visually. I grayed out regions that had no data, but wasn't able to show which regions were missing a significant share of their data. Below, I explore missingness in the surface temperature data.

Charlottesville Region Missingness

The first plot below demonstrates how a single scene doesn't fully cover the Charlottesville region. The red boxes represent the spatial coverage of two different scenes and the blue polygons represent the county lines for the Charlottesville region.
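The coverage problem can also be checked numerically. The sketch below treats each scene footprint and the region as axis-aligned bounding boxes, which is a simplification: actual scene footprints are not guaranteed to be axis-aligned rectangles. The function name and all coordinates are made up for illustration.

```python
def bbox_covers(scene, region):
    """True if the scene box contains the whole region box.

    Boxes are (min_x, min_y, max_x, max_y) tuples in the same coordinate system.
    """
    return (
        scene[0] <= region[0]
        and scene[1] <= region[1]
        and scene[2] >= region[2]
        and scene[3] >= region[3]
    )


# Made-up coordinates: two overlapping scenes that each cover only part of the region.
region = (0.0, 0.0, 10.0, 10.0)
scene_west = (-5.0, -2.0, 6.0, 12.0)
scene_east = (4.0, -2.0, 15.0, 12.0)

covered_by_one_scene = any(bbox_covers(s, region) for s in (scene_west, scene_east))
```

Here `covered_by_one_scene` is `False`: each scene reaches past one side of the region but stops short of the other, which is the situation described above for Charlottesville.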

For each scene, the figures below show the proportion of missing data in each geometry. As you can see, some scenes miss certain regions entirely.
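One way to define the proportion of missing data is the share of a geometry's pixels that have no value in a given scene. The sketch below computes that in plain Python; it is an assumed definition, not necessarily the one used for the figures. As before, it assumes the geometries have been rasterized onto the scene's grid, and the function name and toy grids are hypothetical.

```python
import math
from collections import defaultdict


def missing_proportion(values, zones):
    """Return {zone_id: fraction of that zone's pixels that are NaN}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None:  # pixel lies outside every geometry
                continue
            total[zone] += 1
            if math.isnan(value):
                missing[zone] += 1
    return {zone: missing[zone] / total[zone] for zone in total}


# Toy 2 x 3 raster (made-up values): zone A is partly missing, zone B fully missing.
nan = float("nan")
values = [
    [70.0, nan, nan],
    [71.0, 90.0, nan],
]
zones = [
    ["A", "A", "B"],
    ["A", "C", "B"],
]
proportions = missing_proportion(values, zones)
```

A proportion of 1.0 (zone B here) corresponds to a region that is grayed out in the temperature plots, while values strictly between 0 and 1 (zone A here) are the partially missing regions that those plots cannot show.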

Eastern Shore Missing Data

With the Eastern Shore, some scenes cover the whole region, so missingness is less of a problem. It is surprising, however, that some data is still missing. This might come from gaps in the underlying data itself rather than from incomplete scene coverage.