The Topography of Searching, part 2: Making the contours.
Digital Collections Catalyst 2021
Brett Tweedie shares his inspiration for The Topography of Searching.
As mentioned in the previous post, I played around with a bunch of ideas (some better than others) about how best to present the data (leaves, trees, hills, islands, rivers etc.), but in the end I settled on a topographic map. For this to work as intended, though, I needed a way to actually convert the search data (a period in time and a quantity) into topographic lines. Just so I knew I could deliver something at least vaguely along the lines of what I’d promised, I wanted some sort of fallback in case I couldn’t get a decent topographic map together, so I started out playing around with a few other ways to present the points (I was ready to call it ‘The Topology of Searching’ if needed and hope that no one noticed…). One of those was a mesh-like landscape built with Three.js. After a bit of playing around, I had something that I reckon could work at a pinch, and so with a fallback in place and panic subsiding, I thought I'd now have a crack at generating a proper topographic map.
To get the data into a grid, the values simply get wrapped, with zero values added to complete the rectangle where there is no data for those months. I also added zero values around the whole dataset to give less abrupt peaks at the edges. I played around with whether to smooth the monthly values by taking a 2, 3, or 4 month average, but in the end I left the values as they were. They are normalised, though, so that by default they show the ratio of the number of times the search term appeared to the total number of searches that month.
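As a rough illustration of the wrapping, zero-filling, and normalising described above, here is a minimal TypeScript sketch. The names (`MonthlyCount`, `buildGrid`) and the input shape are assumptions made up for the example, and the one-row-per-year layout is just one way to do the wrapping; this is not the code behind the project.

```typescript
// Hypothetical input shape: one record per month that has data.
interface MonthlyCount {
  year: number;
  month: number; // 1 = January ... 12 = December
  termCount: number; // searches containing the term that month
  totalCount: number; // all searches that month
}

// Wrap monthly values into rows of twelve (one row per year), fill missing
// months with zero, then surround the whole grid with a border of zeros.
function buildGrid(records: MonthlyCount[]): number[][] {
  if (records.length === 0) return [];
  const years = records.map((r) => r.year);
  const firstYear = Math.min(...years);
  const lastYear = Math.max(...years);

  // Normalised value: term searches as a ratio of all searches that month.
  const ratios = new Map<string, number>();
  for (const r of records) {
    const ratio = r.totalCount > 0 ? r.termCount / r.totalCount : 0;
    ratios.set(`${r.year}-${r.month}`, ratio);
  }

  const width = 12 + 2; // twelve months plus a zero column on each side
  const grid: number[][] = [new Array<number>(width).fill(0)]; // top border
  for (let year = firstYear; year <= lastYear; year++) {
    const row: number[] = [0]; // left border
    for (let month = 1; month <= 12; month++) {
      row.push(ratios.get(`${year}-${month}`) ?? 0);
    }
    row.push(0); // right border
    grid.push(row);
  }
  grid.push(new Array<number>(width).fill(0)); // bottom border
  return grid;
}
```

Flattening the rows of a grid like this gives the kind of width-by-height array of values that a contour generator can work from.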
I then took the multipolygons and split them into individual polygons. In the GeoJSON that d3 produces, there’s just one multipolygon for each level, whereas for the way I had in mind to mark the peaks (with a dot), I needed each polygon to be separate so I could find its centroid (I also wanted them separate from an interactive point of view). Again, there’s probably a better way of doing it, but I just marked the peaks by running through all the polygons from the bottom and checking whether the centroid of the level above was within the bounding box of the current polygon. If not, that polygon was a peak and was marked accordingly with the dot and its height. This was only done if the peak was greater than 25% of the maximum height; otherwise, for some terms you ended up with way too many dots, as every peak was marked, not just those that were significantly higher than the surrounding terrain. There was also a bit of mucking around to place the dots and labels for the peaks at the right stacking level in the SVG, so that when viewed in 3D they’d be properly obscured by any taller peaks in front as the map was rotated.
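The splitting and peak-marking steps might look something like the sketch below. It is a guess at the general shape rather than the project's code: the `ContourLevel` type is modelled on the GeoJSON MultiPolygon objects (each with a `value`) that d3-contour returns, every other name is hypothetical, and "the centroid of the level above" is read here as the centroid of any polygon in the next level up.

```typescript
type Position = [number, number];
type Ring = Position[];
type PolygonRings = Ring[]; // first ring is the outer boundary, any others are holes

// Modelled on d3-contour output: one MultiPolygon per contour level.
interface ContourLevel {
  type: "MultiPolygon";
  value: number;
  coordinates: PolygonRings[];
}

interface LevelPolygon {
  value: number;
  outer: Ring;
  isPeak: boolean;
}

// Split each level's single MultiPolygon into standalone polygons.
function splitLevels(levels: ContourLevel[]): LevelPolygon[] {
  const polygons: LevelPolygon[] = [];
  for (const level of levels) {
    for (const rings of level.coordinates) {
      polygons.push({ value: level.value, outer: rings[0] ?? [], isPeak: false });
    }
  }
  return polygons;
}

// Area-weighted centroid of a ring (shoelace formula).
function ringCentroid(ring: Ring): Position {
  const first = ring[0];
  if (first === undefined) return [0, 0];
  let twiceArea = 0;
  let cx = 0;
  let cy = 0;
  ring.forEach((p, i) => {
    const q = ring[(i + 1) % ring.length] ?? p;
    const cross = p[0] * q[1] - q[0] * p[1];
    twiceArea += cross;
    cx += (p[0] + q[0]) * cross;
    cy += (p[1] + q[1]) * cross;
  });
  if (twiceArea === 0) return first; // degenerate ring with no area
  return [cx / (3 * twiceArea), cy / (3 * twiceArea)];
}

// True if the point lies inside the ring's axis-aligned bounding box.
function inBoundingBox(point: Position, ring: Ring): boolean {
  const xs = ring.map((p) => p[0]);
  const ys = ring.map((p) => p[1]);
  return (
    point[0] >= Math.min(...xs) && point[0] <= Math.max(...xs) &&
    point[1] >= Math.min(...ys) && point[1] <= Math.max(...ys)
  );
}

// A polygon is a peak if nothing in the next level up sits over it and it is
// taller than minFraction (25% by default) of the highest contour value.
function markPeaks(polygons: LevelPolygon[], minFraction = 0.25): LevelPolygon[] {
  const maxValue = polygons.reduce((m, p) => Math.max(m, p.value), 0);
  return polygons.map((p) => {
    const higher = polygons.filter((q) => q.value > p.value);
    const nextValue = higher.reduce((m, q) => Math.min(m, q.value), Infinity);
    const coveredFromAbove = higher.some(
      (q) => q.value === nextValue && inBoundingBox(ringCentroid(q.outer), p.outer)
    );
    return { ...p, isPeak: !coveredFromAbove && p.value > minFraction * maxValue };
  });
}
```

A bounding-box test is cheap but approximate: for an L-shaped contour it can report a neighbouring hill as sitting on top, which a proper point-in-polygon test would avoid.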
I also wanted to have contour height markers, like those you see on proper maps (the numbers next to some of the contour lines that show the height of that contour), and had a bit of a play at placing them along every n-th line. But between working out the direction of the slopes across multiple contours, matching the distance along the contour from the origin of the curve to where the height should be placed, and finding the right direction to rotate the text to match the slope of the curve at that point, my code was getting quite unwieldy and wasn’t producing particularly good results, so I gave up (again, there’s probably a better way of doing it).
The other issue was how to scale it all. Originally I just wanted to have a single scale for the map, but the number of results varied so greatly depending on what was searched for that this proved problematic. For example, some searches might only have a few results, such as a mere 7 for the classic children’s series ‘Meg and Mog’ (I loved them as a kid, and am a bit surprised they’re so low to be honest), while others have almost 115,000 results, for something like ‘history’ (and even more for words like ‘the’). Any scale that catered for a smaller number of results wouldn’t work well for a large number, and vice versa. I looked at using a log scale, and although that worked for showing the differences between results, it didn't work so well for showing the differences within a set of results, which I thought was the more interesting aspect. So in the end I went with a custom scale that uses the maximum height of the data as the maximum height of the map, in order to best show how a set of search results changed over the period.
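To make the scaling trade-off a little more concrete, here is a small sketch of the two options. The function names and the idea of evenly spaced levels are assumptions for illustration only; the project's actual custom scale may well differ.

```typescript
// Evenly spaced contour thresholds from just above zero up to the data's own
// maximum, so each map fills its full height range whatever the raw counts.
function thresholdsFromMax(values: number[], levels: number): number[] {
  const max = values.reduce((m, v) => Math.max(m, v), 0);
  if (max <= 0 || levels < 1) return [];
  return Array.from({ length: levels }, (_, i) => ((i + 1) * max) / levels);
}

// Height as a fraction of the tallest value in the same result set.
function relativeHeight(value: number, max: number): number {
  return max > 0 ? value / max : 0;
}

// For comparison, a log scale: good at fitting 7 and 115,000 on one map,
// but it flattens the month-to-month differences within a single search.
function logHeight(value: number): number {
  return Math.log10(value + 1);
}
```

With a max-based scale, a month with half as many searches as the busiest month sits at half the height of the map, whether the busiest month had 7 searches or 115,000. On the log scale, the same halving only lowers the point by a small, fixed step.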
After that I had a play around with a bunch of different colour palettes for the map. Some came from looking at the old topographic maps the library has, as well as trying to match the tone I’d cobbled together for my initial submission, and, um, some just ‘cos I thought they looked nice (I’m a sucker for pastel colours). In the early versions I was actually using unnatural colours such as pinks and purples (see below), but in the end, I went with something more natural, in order to make it (hopefully) easier to understand the height aspect (especially in the 2D view).
As for how best to show the maps online, I was initially unsure whether to generate the map via the HTML canvas tag or as an SVG, but basically I'm just more comfortable with SVGs so went with that. Plus I wanted to make the generated maps downloadable as an image, and I figured that working with vector images, such as SVGs, in other applications is much easier than working with pixel-based images, such as the PNGs or JPEGs generated via canvas (and with an SVG, you can still convert it to a PNG, which I’ve added as an option, but with canvas, you can’t go the other way).
I also thought that by using SVGs I could take advantage of CSS’s 3D transforms to generate an isometric view of the map, and after a bit of playing around (and dredging up some high school maths again) I got the look I was after. There were a bunch of other things I looked at, such as whether the map should be zoomable (it probably should have been, but I ran out of time), and how best to show it on mobile/smaller screens.
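For anyone curious about the high school maths involved, the sketch below shows one way an isometric-style view can be worked out. It assumes an orthographic view made by rotating the flat map (as CSS `rotateZ` does) and then tilting it back (as CSS `rotateX` does). The 45 and 60 degree defaults and all the names are assumptions for the example, not the values used on the site.

```typescript
interface Point3D {
  x: number;
  y: number;
  z: number; // contour height above the flat map
}

interface ScreenPoint {
  x: number;
  y: number;
}

const toRadians = (degrees: number): number => (degrees * Math.PI) / 180;

// Rotate the point about the vertical axis, then tilt the map back.
// Screen y points down, so a taller contour ends up further up the screen.
function projectPoint(p: Point3D, rotationDeg = 45, tiltDeg = 60): ScreenPoint {
  const a = toRadians(rotationDeg);
  const t = toRadians(tiltDeg);
  const rx = p.x * Math.cos(a) - p.y * Math.sin(a);
  const ry = p.x * Math.sin(a) + p.y * Math.cos(a);
  return { x: rx, y: ry * Math.cos(t) - p.z * Math.sin(t) };
}

// The equivalent CSS transform value for the flat (z = 0) layer.
function cssTransform(rotationDeg = 45, tiltDeg = 60): string {
  return `rotateX(${tiltDeg}deg) rotateZ(${rotationDeg}deg)`;
}
```

Working out the vertical offset per contour level this way means each level can be nudged up the screen by `z * sin(tilt)`, which is handy when deciding the order in which layers and labels should be stacked.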
In the end, the contours are generated based on monthly, rather than daily, counts (although if you download the JSON, it has a daily breakdown). This was partly for the sake of speed, both to generate and download the data, as I was terrified that I might end up making the interface I wanted, only for it to take ages to load. But it was also so the grid would work in both directions: left to right (January to December of each year), as well as top to bottom (the same month through the years, such as April 2012, April 2013, April 2014 etc.). Also, I just realised, looking back on the initial data, that the library has the searches bundled by month, so thinking about it now, perhaps that was also one of the unconscious reasons why I was thinking in monthly formats as well…
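If you did want to go from the daily breakdown to the monthly counts yourself, a minimal sketch might look like this. The `DailyCount` shape (an ISO date string plus a count) is purely an assumption, as the format of the downloadable JSON isn't described here, and the function names are hypothetical.

```typescript
// Hypothetical shape for one day's entry in the downloaded data.
interface DailyCount {
  date: string; // assumed ISO format, "YYYY-MM-DD"
  count: number;
}

// Sum daily counts into monthly totals keyed by "YYYY-MM".
function toMonthlyTotals(days: DailyCount[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const day of days) {
    const key = day.date.slice(0, 7);
    totals.set(key, (totals.get(key) ?? 0) + day.count);
  }
  return totals;
}

// Read the grid "top to bottom": the same month in each successive year.
function sameMonthAcrossYears(
  totals: Map<string, number>,
  month: number,
  firstYear: number,
  lastYear: number
): number[] {
  const mm = month < 10 ? `0${month}` : `${month}`;
  const column: number[] = [];
  for (let year = firstYear; year <= lastYear; year++) {
    column.push(totals.get(`${year}-${mm}`) ?? 0);
  }
  return column;
}
```

For example, `sameMonthAcrossYears(totals, 4, 2012, 2014)` would return the April 2012, April 2013, and April 2014 totals in order, with zero for any April that has no data.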
After having a play around with code, the next step was to test it with actual data, and I was always a bit scared at this stage in case a) it didn’t work, or worse (if I’m honest), b) it looked terrible. Along the way I posted a few of the images to Twitter - the first four looking at the topographies of searches containing the words ‘bushfire’, ‘cyclone’, ‘flood’, and ‘drought’ respectively - but with my 260 followers, there wasn’t much feedback beyond the occasional polite like from supportive friends (thanks Cass!).
Some results were interesting because of what they showed on the map, such as clear peaks at particular points in time, whereas others were interesting not so much for the topography they produced, but rather because of the results they returned. Admittedly some were neither, but that’s to be expected, as, um, not all data is inherently interesting.
The (probably rather obvious) point of all this is that - at least for me - any datavis project involves potential trade-offs between the idea, technologies, and time/effort, and it’s been interesting going back over the old sketches and designs and seeing how things have changed over time.