Tuesday, June 28, 2016

Under the Big Top # 26: “A Watershed Moment”

(Personal reflections inspired by Who songs)

Song: “Guitar and Pen”
Album: Who Are You
Release Date: August, 1978

What’s the most complex thing you have ever done?  I’m not talking difficult per se, à la hiking the Appalachian Trail or running a marathon or dealing with an illness or a family crisis.  I’m talking about something that was complicated and took deep thought and innovation; something out of the box, cutting edge, multi-tiered, hard to mentally tackle.  This entry is centered on this concept because this week’s Big Top entry, “Guitar and Pen”, off of the transcendent Who Are You album, is for me one of the most complex songs Pete Townshend ever wrote and up there with the most complex the Who has ever performed.

Ok, well obviously I’m not going to get any responses to enter into this write-up (although I would love to hear your stories) so I’ll proceed with one of my own complicated endeavors.  I kicked the tires on a few thoughts, including a model ship I built when I was a teenager; a handful of essays in college; and designing and erecting a unique tree house in our back yard here in Pepperell.  I finally settled on a program I wrote way back in 1991 while in my early years with the US Geological Survey (USGS).  This will take some effort to explain in layman’s terms, but here goes….

One of the scientists in the Massachusetts office I worked in at the time, Kernell Ries, a longstanding colleague and friend, who I continue to collaborate with to this day, came to me with a request for my services.  Kernell is a surface-water hydrologist who excels at developing regression equations that predict flows in rivers and streams using long-term records from fixed USGS gaging-station locations, along with landscape and atmospheric characteristics of the region of interest.  The resulting flow statistics are used for a plethora of reasons, from designing bridges and culverts, to regulating development, to water consumption, to constituent time-of-travel predictions (think hazardous spills) to habitat studies.  The list goes on.

I had been hired by the USGS several years earlier for my GIS skills.  Whenever I am asked what GIS is, my short definition is ‘computerized smart maps’ which I hope to explain more through further describing my role in the project I am writing about here.  In the late 80s and early 90s, GIS technology was still in its formative years, nowhere near the vision that later played out with GPS and Google Maps and a myriad of other geographic-centered analysis and use cases.  However, it was a time of rapid innovative discovery, with new cutting-edge ideas being explored and hatched on a daily basis by those who knew the software. Such is the environment for any technology when it begins to flourish.   

A vast majority of GIS map data comes in either vector or raster format.  Vector data is often brought into the computer world through digitization as points, lines, and polygons.  Examples include dams and sampling locations as points; rivers and roads as lines; and lakes and land parcels as polygons.  Linear data can additionally be built as geometric networks in order to trace routes, distances and other information from one point to another such as from home to work on a road network, or up/downstream on a river network. 

Raster data comes in the form of a grid lattice with every square pixel storing a data value.  Examples include 3-dimensional terrain maps of land-surface elevation values and landscape maps of forested, agricultural and urban land.  Digital raster maps can be overlain with one another to perform complex map-algebraic computations that help detect natural and manmade patterns and trends.  

Most GIS specialists excel in either the vector world or the raster world, but rarely both.  I was fortunate enough to be in a position to have to work with each of these environments for a handful of projects leading up to 1991.  This combination skillset would prove to be a complementary and powerful one for Kernell’s needs and ultimately for the rest of my professional career to date.

The most important characteristic that needs to be computed for regression-based flow equations is drainage area.  This is the area of land that contributes riverine and overland flow to a user-specified point location on a river or stream.  It is also referred to as a watershed or basin.  In the days before GIS, this characteristic needed to be measured using an instrument called a planimeter by tracing along the pre-sketched ridge line (basin boundary) of interest on a topographic map.  In the mid-80s several bright minds at USGS’s Earth Resources Observation and Science (EROS) Data Center figured out a masterful way to automate this process using derivative raster data from terrain maps called Digital Elevation Models (DEMs).  In a nutshell, a grid of flow direction values can be derived from a DEM, which can then be used to compute the ridge line of a watershed.
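For readers who think in code, the DEM-to-watershed idea can be sketched in a few lines of Python. This is a toy illustration only, not the EROS implementation: it assumes an eight-neighbor steepest-descent rule (often called D8), which the description above does not spell out, and the names `flow_direction`, `watershed`, and the tiny `dem` grid are all made up for the example.

```python
# Toy sketch (assumed D8-style rule, hypothetical names): derive a flow
# direction for every DEM cell, then gather the cells draining to an outlet.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_direction(dem):
    """Map each (row, col) to its steepest downhill neighbor, or None for a pit."""
    rows, cols = len(dem), len(dem[0])
    downstream = {}
    for r in range(rows):
        for c in range(cols):
            best, best_drop = None, 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbors are farther away, so scale the drop.
                    distance = (dr * dr + dc * dc) ** 0.5
                    drop = (dem[r][c] - dem[nr][nc]) / distance
                    if drop > best_drop:
                        best, best_drop = (nr, nc), drop
            downstream[(r, c)] = best
    return downstream


def watershed(downstream, outlet):
    """Return the set of cells whose downhill path reaches the outlet cell."""
    # Invert the downstream links so we can walk uphill from the outlet.
    upstream = {}
    for cell, target in downstream.items():
        if target is not None:
            upstream.setdefault(target, []).append(cell)
    basin, stack = {outlet}, [outlet]
    while stack:
        cell = stack.pop()
        for up in upstream.get(cell, []):
            if up not in basin:
                basin.add(up)
                stack.append(up)
    return basin


# Made-up elevations: a ridge of 9s (third column) separates two small valleys.
dem = [
    [9, 8, 9, 5],
    [8, 6, 9, 4],
    [7, 5, 9, 3],
]
directions = flow_direction(dem)
west_basin = watershed(directions, (2, 1))  # click point in the west valley
east_basin = watershed(directions, (2, 3))  # click point in the east valley
```

The drainage area is then just the number of cells in the basin multiplied by the area of one cell. On coarse or flat terrain many neighbors have nearly equal drops, which is one way to see why results in flat areas can be erratic.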

One big problem however was that the DEMs in the early 90s were pretty coarse, often producing sketchy results, particularly in flat areas.  One consequence was that ‘synthetic’ streams in these areas, which could also be derived from the DEM, would often plot hundreds of feet from their true location, and/or merge and meander together incorrectly.  The erratic results did not give Kernell a lot of confidence, he being a meticulous planimeter delineator back in the day.  At one point Kernell pulled out a handful of USGS topographic quadrangles from our map drawers, each of which included a Mylar overlay of delineated sub-basin boundaries, adding up to several thousand across the State of Massachusetts (which was the study area for his project).  Kernell wanted to see if I could figure out a way for these boundaries to be used in place of the DEM-derived boundaries in all locations where the two were hypothetically coincident along the ridge lines.  I was aware that these boundaries had been digitized into a GIS datalayer by a colleague at the State’s GIS clearinghouse (now called MassGIS, which was formed as a result of a three-year cooperative program between the State and our office:  I was initially hired by USGS to play a role in the last year of that co-op) and I had knowledge of and access to this data, having worked with it some for several projects already.  Yet, seeing those overlays, worthy of display in an art gallery, and in their original hand-delineated state, was both impressive and inspiring.

I also had access to digital stream-line data and other vector hydrography, digitized by the USGS mapping centers at around this time as Digital Line Graphs (DLGs).  In the two years prior, I had been involved in fleshing out this data, creating ‘centerlines’ through lakes and wide polygonal rivers by devising Euclidean distance formulas in the raster world to complete the connectivity of dendritic river networks in the vector world, from the multitude of small headwater streams all the way down to the major tributaries and ultimately the ocean (think the silhouette of a tree from twig to branch to limb to trunk to ground). 

Around the same time I started messing around with an Australian-source GIS program called ANUDEM (later renamed TopoGrid in the ArcGIS software and now called TopoToRaster), which forces a DEM to honor the locations of digitized vector streams, thereby improving the spatial accuracy of the elevation values near the streams, and in turn getting the synthetic streams more in line with reality.  It did not quite reach the even more precise horizontal alignment that would come later with a deep-trenching raster procedure called ‘burning’, developed by students at the University of Texas; that procedure rendered the output DEM useless as an elevation surface, but it got the desired spatial alignment with the vector DLG streams for derivatives like flow direction.

Finally I was getting fairly proficient at programming in a GIS macro language called AML, which opened up all sorts of mind-expanding concepts for me related to computation and automation; functions, directives and variable settings being the cornerstone to this fascinating new world.  It was this combination of knowledge that got my brain spinning with ideas on how to help Kernell.  My plan:  Automate a way where I could get basins delineated from any click point on a river or stream by using both the raster DEM and the more accurate vector sub-basin boundary datalayer for Massachusetts.

Since just before the time I started working at USGS, the agency had initiated the first in a series of blanket purchase agreements with ESRI, one of the largest developers of GIS software in the world.  Today, ESRI’s flagship product is ArcGIS.  In the 80’s and 90’s it was better known as ArcInfo.  The ‘Coverage’ was the vector product of that time, and there were some interesting features of that product which have unfortunately disappeared with the newer ‘geodatabase’ model (at least without considerable effort to replicate).  One of them was the ability to build a single Coverage as both a line and polygon layer.  This allowed for a relationship that could be utilized between the two feature types: Depending on which typically-random direction a line was facing, there was built-in coding that determined whether an adjacent polygon was to the left or right of that line using the related line attribute table.  This coding was rarely applied in analysis, but now I had a good reason to use it.  I added two additional integer attributes to the line attribute table: ‘LOpen’ and ‘ROpen’ (left and right open).  If there was no flow relationship between the adjacent polygons I left these at the default zero values.  However, if there was a flow relationship up and downstream, I would give one or the other a code of 1 to ‘open up’ flow, depending on the line feature’s digitized direction, when it got ‘tapped’ in the iterative stair-step program procedure I had now begun devising in AML.

Ok, I had a coding scheme, now I had to figure out a way to automate it.  I tracked down a great “DO UNTIL” looping piece of code from a USGS colleague and tinkered with it.  This code would loop through a process over and over until it ran out of options, at which time it would move on.  I modified this code to work with my line attribute coding scheme.  Polygons would be collected systematically through this line-polygon relationship until all upstream polygons had been found. 
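The stair-step collection of upstream polygons can be sketched in Python roughly as follows. This is not the original AML, and the exact meaning of the two flags is an assumption made for illustration: here `lopen = 1` is taken to mean the polygon on the left of the line drains into the polygon on the right, and `ropen = 1` the reverse. The `Arc` class, `collect_upstream`, and the polygon numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """One boundary line with the polygons on either side of it."""
    left_poly: int
    right_poly: int
    lopen: int = 0  # assumed meaning: 1 = left polygon drains into right polygon
    ropen: int = 0  # assumed meaning: 1 = right polygon drains into left polygon


def collect_upstream(arcs, start_poly):
    """Repeat passes over the arcs until no new upstream polygon is found."""
    collected = {start_poly}
    while True:  # stands in for a DO ... UNTIL loop: the body runs at least once
        newly_found = set()
        for arc in arcs:
            # An 'open' arc lets us step from a collected polygon to the
            # polygon that drains into it across that arc.
            if arc.lopen == 1 and arc.right_poly in collected:
                newly_found.add(arc.left_poly)
            if arc.ropen == 1 and arc.left_poly in collected:
                newly_found.add(arc.right_poly)
        newly_found -= collected
        collected |= newly_found
        if not newly_found:  # ran out of options, so move on
            break
    return collected


# Made-up layout: 3 drains into 2, 2 and 4 drain into 1, and 5 is a
# neighboring basin with no flow relationship to 1.
arcs = [
    Arc(left_poly=2, right_poly=1, lopen=1),
    Arc(left_poly=2, right_poly=3, ropen=1),
    Arc(left_poly=1, right_poly=4, ropen=1),
    Arc(left_poly=5, right_poly=1),  # closed: both flags left at zero
    Arc(left_poly=3, right_poly=4),  # closed ridge between two tributaries
]
upstream_of_1 = collect_upstream(arcs, 1)
upstream_of_2 = collect_upstream(arcs, 2)
```

Each pass climbs one 'step' farther up the basin, which is why the loop needs as many passes as there are tiers of sub-basins above the starting polygon, plus one final pass that finds nothing new.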

At this point I turned my focus to the click point and its immediate contributing area upstream.  This was the chunk of territory that had to rely on the DEM raster process devised by EROS and enhanced with ANUDEM, and since it almost always fell inside an existing vector basin polygon, I had to figure a way to only use the raster-derived watershed (by this time converted to vector) up to the points where it intersected the accurate existing vector basin boundary.  I devised how to do this by narrowing the analysis window to the basin polygon that the click point fell within.  After a few steps of finding a way to cut the new boundary at the first intersections (both sides) I had my final piece of real estate.  Combining this sub-piece with the first upstream basin polygon (or polygons) was a fairly difficult concept to work through and took some trial and error, but once I got it with some proximity commands I had my raster/vector merger.  From there I used the iterative DO UNTIL loop, until all upstream polygons had been collected, which I then proceeded to dissolve of internal boundaries to get the final basin boundary polygon from a click-point (hence my dubbing the program “ONEBASIN”).

Obviously this process needed a visual component for the user which I set up by first prompting the user for a latitude/longitude coordinate.  The initial map view consisted of the vector stream and ANUDEM raster (synthetic) stream, as well as the lat/long point, with instruction to click a point on the synthetic stream that best represented the equivalent location of the vector stream and the point (for the entire procedure, see the attached jpeg files from my scanned slides).  From there the program took off, showing the user the progress along the way.  The attached selection of slides from the talk I gave that fall in Portland, Maine at the annual Northeast Arc Users Conference (NEARC) show how it all worked.  Subsequent programs would take this boundary and use it as a cookie-cutter for other basin characteristics (e.g. forested area, urban area, precipitation, channel slope, mean basin elevation, soils, total length of streams, area of sand and gravel deposits, etc.).  The most significant ones would be added to a final program to run Kernell’s regression equations to get flow statistics.
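The ‘cookie-cutter’ step can be pictured with a small Python sketch. It assumes the basin has already been reduced to a set of raster cells and that elevation and land cover are stored as simple grids; the function name, the grids, and the cell size are invented for the example and are not taken from the original programs.

```python
def basin_characteristics(basin_cells, elevation, land_cover, cell_area_km2):
    """Summarize raster layers using only the cells inside the basin."""
    count = len(basin_cells)
    forest_cells = sum(1 for r, c in basin_cells if land_cover[r][c] == "forest")
    mean_elevation = sum(elevation[r][c] for r, c in basin_cells) / count
    return {
        "drainage_area_km2": count * cell_area_km2,
        "percent_forest": 100.0 * forest_cells / count,
        "mean_elevation": mean_elevation,
    }


# Made-up inputs: the basin covers the four cells in the two left columns.
basin = {(0, 0), (0, 1), (1, 0), (1, 1)}
elevation = [
    [10, 20, 99],
    [30, 40, 99],
]
land_cover = [
    ["forest", "urban", "urban"],
    ["forest", "forest", "urban"],
]
stats = basin_characteristics(basin, elevation, land_cover, cell_area_km2=0.25)
```

Cells outside the basin (the third column here) never enter the sums, which is the whole point of the cookie-cutter. Values like these would then be the inputs to a regression equation.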

Over the ensuing years Kernell and I would collaborate to bring this automation to the web, first as a state-wide product and later, after USGS Headquarters got wind of it all, as a National product for other State offices to utilize (with a significant amount of training from us).  That product is called StreamStats, and for many years, it stood head and shoulders above all other GIS web sites in terms of its advanced use of GIS functionality on the web.  Thousands of users hit the site daily.  StreamStats has been a tremendous success story for us.  Some of what I did had to be modified for the newer software, but that early “ONEBASIN” code still stands; a proud moment as a trailblazing procedure where I took a complex and disparate set of data and ideas, and customized it all to work for a very specific and revolutionary purpose.

“Guitar and Pen” is one of my all-time favorite Who songs (for more on “Guitar and Pen” and the album it was recorded on, see Under the Big Top # 10 “A Who Album Review: Who Are You”).  Thank goodness Keith Moon was still around for this one as for one last time Pete Townshend had all that brilliant one-of-a-kind potency of Who tools at his disposal (Moon, Entwistle, Daltrey and himself), making the complex, multi-textured structure of this tune possible.  Like the program I wrote, I believe Townshend had to work at it one piece at a time.  Complex ideas typically come together that way.  I’m thankful for that clairvoyant handful of times where multi-tiered schemes came together for me in life, if only to give me a tiny bit of insight into the world of a genius.


