How Can Cities Identify and Map the Issues That Cause People to Be the Most Upset?
Using data from Open311 systems and natural language processing, we can see where people are most upset and what issues they're most upset about.
Recent years have seen the broad adoption of web-based Open311 systems that connect the public with local government. Residents use Open311 to report problems like potholes, speeding, and graffiti. Each report is received by a back-end application, where the work can be assigned to the department responsible for fixing the problem. Along with the issue type, a requester generally includes a free-text description of what they see. Using natural language processing (NLP), we can analyze this description and assign it a sentiment. The NLP service used below labels each text as positive, negative, neutral, or mixed, and returns a confidence score for each of those labels.
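To place requests on a map, it helps to reduce each sentiment result to a single number. The sketch below assumes one convention, Positive minus Negative, which is a hypothetical choice for this walkthrough rather than something the service computes. The example dictionary mirrors the shape of a Comprehend DetectSentiment response, but its scores are illustrative values, not real output.

```python
def summarize_sentiment(response):
    """Reduce a DetectSentiment-style response to (label, signed_score).

    signed_score = Positive - Negative is a hypothetical convention that puts
    every record on one scale, from -1 (negative) to +1 (positive).
    """
    scores = response["SentimentScore"]
    signed_score = scores["Positive"] - scores["Negative"]
    return response["Sentiment"], round(signed_score, 3)


# Illustrative response in the shape Comprehend returns (scores are made up).
example = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {"Positive": 0.02, "Negative": 0.91, "Neutral": 0.05, "Mixed": 0.02},
}
print(summarize_sentiment(example))  # ('NEGATIVE', -0.89)
```

Keeping the label alongside the number lets you filter on the label (for example, only NEGATIVE requests) while still using the number for shading.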
Our goal is to combine the location of each request in a sample of Open311 data with NLP to create a heat map of overall sentiment by area. We can also see which types of issues people are most upset about, or track negativity from month to month. With this information, cities can work to reduce the number of complaints they receive, and residents get to live in a better city.
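A heat map needs sentiment aggregated by area. One simple approach, assumed here rather than taken from the original workflow, is to bin each request into a fixed latitude/longitude grid and average a per-record score within each cell. The `(lat, lon, score)` input format and the 0.01-degree cell size are hypothetical choices.

```python
import math
from collections import defaultdict


def grid_mean_sentiment(records, cell_deg=0.01):
    """Average a per-record score within square lat/long grid cells.

    records: iterable of (lat, lon, score) tuples.
    cell_deg: hypothetical cell size in degrees (0.01 degrees of latitude is about 1.1 km).
    Returns {(row, col): mean_score}, where (row, col) indexes the grid cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, score in records:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[cell] += score
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def cell_center(cell, cell_deg=0.01):
    """Return the (lat, lon) center of a grid cell, for placing it on a map."""
    row, col = cell
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)
```

Averaging per cell rather than summing keeps busy neighborhoods from looking negative just because they file more requests; most mapping tools can also build a heat map directly from the points if you prefer that.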
Here's a flow diagram to show how this might work:
(1) We upload a '.csv' table that contains the raw data from an Open311 system to an S3 bucket. Many cities now have open data policies; the City of San Diego, for example, publishes a good data set.
(2) Once the file is uploaded, a Lambda function is triggered to process the data in our bucket. The Lambda function calls Comprehend to add a sentiment value to each record in our table.
(3) Comprehend is an NLP web service that can detect sentiment, find key phrases, and pick out entities from unstructured text.
(4) Once Lambda has finished processing the file, the result is uploaded to a 'Processed' bucket on S3.
(5) From there we upload to ArcGIS Online or configure a web map to point to the file on S3.
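Steps (2) through (4) boil down to reading the raw CSV, scoring each record's text, and writing the table back out with extra columns. The sketch below assumes the customer text lives in a column named `description` and appends hypothetical `sentiment` and `negative_score` columns. The Comprehend call is passed in as a function, so the CSV handling can be tried without AWS credentials.

```python
import csv
import io


def annotate_csv(csv_text, detect, text_column="description"):
    """Return csv_text with sentiment columns appended to every row.

    detect takes a string and returns a dict shaped like a Comprehend
    DetectSentiment response. Inside Lambda it could be:
        lambda t: comprehend.detect_sentiment(Text=t, LanguageCode="en")
    text_column is a hypothetical name; use your export's description column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["sentiment", "negative_score"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        result = detect(row[text_column])
        row["sentiment"] = result["Sentiment"]
        row["negative_score"] = f'{result["SentimentScore"]["Negative"]:.4f}'
        writer.writerow(row)
    return out.getvalue()
```

Injecting `detect` also makes it easy to swap in `batch_detect_sentiment` later without touching the CSV code.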
Once we have extracted data from the Open311 system, we import it into an S3 bucket. S3 is a cheap, highly available, and scalable storage option for data; if you're new to S3, it's worth checking out. Before uploading to S3, make sure each record has a location component (lat/long) and customer input. In the San Diego data set, the original customer input has been scrubbed, most likely to keep communications private, so I wrote some inputs of my own and used those instead. Our analysis will differ from the real world, but it still shows how mapping sentiment can be useful.
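A quick pre-upload check can drop records that are missing either piece. This is a minimal sketch; the column names `lat`, `lng`, and `description` are hypothetical placeholders for whatever your export uses.

```python
import csv
import io


def usable_rows(csv_text, lat_col="lat", lon_col="lng", text_col="description"):
    """Keep rows with valid coordinates and non-empty customer input."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row.get(lat_col) or "")
            lon = float(row.get(lon_col) or "")
        except ValueError:
            continue  # missing or non-numeric coordinates
        text = (row.get(text_col) or "").strip()
        # NaN fails both range checks, so it is dropped too.
        if text and -90 <= lat <= 90 and -180 <= lon <= 180:
            kept.append(row)
    return kept
```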
Below are the phrases I came up with; I had a little fun seeing which variants of a phrase produced the most negative output. The score shown is Comprehend's confidence that a statement is negative: the higher the score, the more confidently the statement is classified as negative.
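Comparing phrase variants like this is easy to script. The sketch below assumes a `detect` function that returns a DetectSentiment-shaped dictionary (for example, a thin wrapper around `comprehend.detect_sentiment`) and ranks phrases by their Negative score.

```python
def rank_by_negativity(phrases, detect):
    """Return (negative_score, phrase) pairs sorted from most to least negative.

    The Negative score is Comprehend's confidence that the text is negative.
    """
    scored = [(detect(p)["SentimentScore"]["Negative"], p) for p in phrases]
    return sorted(scored, reverse=True)
```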
Comprehend and Lambda
Now on to Lambda, where the bulk of our processing happens. Lambda runs custom code on a serverless architecture, meaning you don't have to deploy, scale, or manage any servers; if you want to learn more, it's worth checking out. Lambda functions can run at a specified time interval or be triggered when new data is available (e.g., when a file is added to a bucket).
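When Lambda is triggered by an upload, the event it receives lists the bucket and object key of each new file. The sketch below assumes an S3 put-event trigger on the raw bucket and a hypothetical output bucket named `my-processed-bucket`. `add_sentiment` is a placeholder for the Comprehend step, and the S3 client is injectable so the handler can be exercised without AWS.

```python
import urllib.parse

PROCESSED_BUCKET = "my-processed-bucket"  # hypothetical output bucket


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification.

    Keys arrive URL-encoded (a space becomes '+'), so they are decoded here.
    """
    return [
        (rec["s3"]["bucket"]["name"], urllib.parse.unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def add_sentiment(csv_text):
    """Placeholder for the Comprehend step; returns the input unchanged."""
    return csv_text


def lambda_handler(event, context, s3=None):
    if s3 is None:  # in Lambda, create a real client; tests can pass a fake
        import boto3
        s3 = boto3.client("s3")
    written = []
    for bucket, key in s3_objects_from_event(event):
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        s3.put_object(Bucket=PROCESSED_BUCKET, Key=key, Body=add_sentiment(raw).encode("utf-8"))
        written.append(key)
    return {"processed": written}
```

Writing to a separate bucket, rather than back into the bucket that triggers the function, keeps the output from re-triggering Lambda in a loop.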
Using Python in the Lambda function, call the Comprehend API and pass the customer input from each record to the 'detect_sentiment' function. This returns a dictionary containing the sentiment for that record. You can also use 'batch_detect_sentiment', which accepts up to 25 documents per call (each document must be fewer than 5,000 bytes of UTF-8 text). After all your data has been processed, write the resulting table to S3. Below is a code snippet that shows how easy it is to call Comprehend. According to AWS, the service is continuously trained on multiple data sources, including Amazon product descriptions and consumer reviews, one of the most extensive natural language data sets in the world!
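```python
import json

import boto3

# Create a Comprehend client; replace the Region with the one you use.
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')

text = "It is raining today in Seattle"

print('Calling DetectSentiment')
# The response holds the overall label ('Sentiment') and a score per label ('SentimentScore').
response = comprehend.detect_sentiment(Text=text, LanguageCode='en')
print(json.dumps(response, sort_keys=True, indent=4))
print('End of DetectSentiment\n')
```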
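For a full table, `batch_detect_sentiment` cuts the number of API calls. The sketch below splits the texts into groups of 25 and trims any text over the byte limit; trimming long descriptions is an assumed policy (you could skip those records instead). `comprehend` is expected to be a boto3 Comprehend client such as the one created above, and failed documents are left as `None` because they come back in `ErrorList` rather than `ResultList`.

```python
MAX_DOCS_PER_CALL = 25   # batch_detect_sentiment document limit per call
MAX_DOC_BYTES = 4999     # each document must be fewer than 5,000 UTF-8 bytes


def truncate_utf8(text, max_bytes=MAX_DOC_BYTES):
    """Trim text to at most max_bytes of UTF-8 without splitting a character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def batch_sentiments(texts, comprehend, language="en"):
    """Return one Sentiment label per text, or None where Comprehend reported an error."""
    labels = [None] * len(texts)
    for start in range(0, len(texts), MAX_DOCS_PER_CALL):
        chunk = [truncate_utf8(t) for t in texts[start:start + MAX_DOCS_PER_CALL]]
        response = comprehend.batch_detect_sentiment(TextList=chunk, LanguageCode=language)
        # Index is relative to this chunk, so offset it by the chunk's start.
        for item in response["ResultList"]:
            labels[start + item["Index"]] = item["Sentiment"]
    return labels
```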
You now have a table of 311 requests with a sentiment value for each record. From here, you have a few options: upload it to an online mapping application to visualize and display the data, or make the file public on S3 (read-only, of course). With the second option, you can configure your web map to point to that public file, and any time the file is updated, so is your map.
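If you publish the processed file for a web map to read, GeoJSON is one widely supported alternative to CSV. The sketch below converts processed rows into a GeoJSON FeatureCollection; GeoJSON orders coordinates as [longitude, latitude]. The `lat`/`lng` column names are hypothetical, and whether you serve CSV or GeoJSON depends on what your mapping tool accepts.

```python
import json


def to_geojson(rows, lat_col="lat", lon_col="lng"):
    """Convert dict rows with lat/long columns into a GeoJSON FeatureCollection string."""
    features = []
    for row in rows:
        # Everything except the coordinates becomes a feature property (e.g. sentiment).
        props = {k: v for k, v in row.items() if k not in (lat_col, lon_col)}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [float(row[lon_col]), float(row[lat_col])]},
            "properties": props,
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```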
I chose to upload my processed 311 requests to ArcGIS Online. This platform offers a free tier that allows you to upload data, customize web maps, and even make web apps! Below you can see an interactive map of the final analysis and click on individual requests.
We've covered how cloud services and mapping applications can be used together to map public sentiment. We used some artificial inputs to generate sentiment, but there are interesting questions you can start asking about the data. What types of issues were people most upset about? Were there any hot spots of negativity? Why were those particular locations a hot spot? It would be useful to see how specific issues can be addressed differently. If people are generally upset about potholes, how can they be prevented from occurring in the first place?
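A first pass at "what are people most upset about?" is to compute, for each issue type, the share of requests labeled NEGATIVE. The sketch assumes processed records are dictionaries with hypothetical `issue_type` and `sentiment` keys; passing a month column as `key` gives the month-to-month view instead.

```python
from collections import Counter


def negative_share(records, key="issue_type"):
    """Return (share_negative, group) pairs, highest share first.

    key names the column to group by, e.g. an issue type or a month column.
    """
    totals = Counter()
    negatives = Counter()
    for rec in records:
        totals[rec[key]] += 1
        if rec["sentiment"] == "NEGATIVE":
            negatives[rec[key]] += 1
    return sorted(((negatives[g] / totals[g], g) for g in totals), reverse=True)
```

Shares for issue types with only a handful of requests are noisy, so it may be worth reporting the request counts alongside them.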
A critical function of any city is maintaining its substantial financial investment in infrastructure. The public plays a crucial role in reporting issues and keeping a city running smoothly. Local governments should start leveraging the cloud more aggressively to improve efficiency and meet the demands of smart cities.