Tiffany Ong

Data Visualisation Exploration and Learning

I’ve been wanting to work on a data visualisation project for a while. I’ve been seeing more and more projects from others that had their own individual style, and I really wanted to try my hand at it to see what I could come up with. I had already dabbled with a few small pieces but nothing very polished. I also knew it would take me time to find my own style and reach a standard that I would be proud of.

I’ve been immersing myself in data visualisation podcasts, since they were an accessible way to take in information around my tight schedule and I could listen whilst doing household chores. I had taken in quite a lot of information, felt inspired and motivated, and had found designers whose work I liked. I was ready to do something but never knew where to start. Then I stumbled on a data visualisation course from Federica Fragapane, a designer whose work I found beautiful. There were a few other data visualisation courses on the site and I watched all the introduction videos. Even though Federica Fragapane's course was the one I really wanted, I felt I needed a simpler one to ease me in. I chose the course from Sonja Kuijpers as I was curious about her process and also liked her style. I started the project using my 10% allowance at King’s Digital Lab, and did most of it (approximately 80%) in my personal time as I didn’t want to lose momentum.

Selecting a Data Source from Children's Literature

The course assignment started with choosing a book that you had a personal or strong connection to. I didn’t have one, so I went to the Project Gutenberg site as recommended, a resource of free eBooks that are out of copyright (country-dependent). Since I knew the data would be collected manually, I looked at children’s books as I knew they were usually a lot shorter.

A few made the shortlist, but Red Riding Hood by Lydia Very, published in 1863, really captured my attention because of the unique shape of the book and its illustrations. Everything about it fit my criteria: there was not much text and not many pages, I was familiar with the story (or versions of it), and I liked the illustrations and the language of the writing.

I realised Project Gutenberg didn’t have a scanned version of the book. It was just the background with HTML text on top. I searched for original copies of the book to get a better idea of what it really looked like. WorthPoint had a lot of examples but not full-page versions. The best version I found was on the Internet Archive.

original book pages cut in a shape of a girl with text and illustrations laid out in a row — Scanned pdf pages taken from the Internet Archive that I laid out side by side in Photoshop.

Data Collection and Visualisation Planning

I followed the course assignment, which was to count the characters in each chapter, although in my case it was each page. I then looked at what other data could be gathered and jotted it down. The data collection was manual and slow. The process was a nice change from my usual work and quite therapeutic, but I was still very glad I had chosen a short book. I then typed it all up in Excel, as I knew it would get messy and more calculations were needed. I used Word to get the word and character counts of the text.
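For anyone who would rather script the text counts than use Word, the per-page word and character counts could be sketched roughly as below. This is not what I did for the project; the function names and the placeholder page texts are hypothetical, and the exact numbers may differ slightly from Word's depending on how spaces and punctuation are counted.

```python
def count_page(text: str) -> dict:
    """Count words, all characters and letters in one page of text."""
    words = text.split()  # split on any whitespace
    letters = sum(1 for ch in text if ch.isalpha())  # letters only, no spaces or punctuation
    return {"words": len(words), "characters": len(text), "letters": letters}


def summarise_pages(pages: list[str]) -> list[dict]:
    """Return one row of counts per page, numbered from 1."""
    return [{"page": number, **count_page(text)}
            for number, text in enumerate(pages, start=1)]


if __name__ == "__main__":
    # Placeholder strings only; the real page texts would be pasted in here.
    example_pages = ["First page text goes here.", "Second page text."]
    for row in summarise_pages(example_pages):
        print(row)
```

Each row can then be copied into a spreadsheet alongside the manually collected data.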

With most of the data at hand, I started sketching out how I could visualise it. I had it in mind that I wanted the visualisation to reflect the story as much as possible. I also knew I wanted to use simple geometric shapes. I had used them in previous projects and liked their simplicity and look. Also, if I had to code it up (in CSS), it would be a lot easier.

Crafting a Geometric Storyscape

I started with the characters, knowing they were the main feature, and made a symbol for each. I thought it would be nice to create a scene with the data, and making each page a tree seemed a perfect way to represent the woods and forest.

photographs of the same sketchbook opened up to three different pages with data table and black fountain pen line drawings using geometric shapes — Initial data collection, notes and sketches in sketchbook

The course used RAWGraphs to turn the data from a spreadsheet into SVG shapes which could be downloaded and adapted in Illustrator. I had read about it and tried, unsuccessfully, to plot something with it before, so I was glad to have finally figured out how to use it.

screenshot of a bubble chart in RAWgraph with word characters plotted on the vertical y axis and 16 pages on the horizontal x axis — Screenshot of plotting total text characters (letters) on 16 pages on a bubble chart in RAWgraph

I got to a point where I had everything I wanted except the layout. My plan was to code it up once the layout was complete, to give it a bit of interactivity and make it responsive, so I didn’t want to complicate the layout. But it just looked too stiff.

data visualised with geometric shapes making a scene of green circle top trees, triangle top houses, multi shaped characters laid in a linear format with legend in boxes below on white background — First version

I decided to try putting it in a circle, but that spaced the trees out too much and also didn’t make sense since the story wasn’t cyclical. A semicircle seemed to work, though, so I moved a few elements around to make it fit.
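I arranged the semicircle by hand in Illustrator, but if the layout were coded up later, the tree positions could be calculated along these lines. This is only a sketch: the function name, the radius and the centre point are assumptions chosen for illustration, and the only number taken from the project is the 16 pages.

```python
import math


def semicircle_positions(n: int, radius: float,
                         cx: float = 0.0, cy: float = 0.0) -> list[tuple[float, float]]:
    """Space n points evenly along the upper half of a circle, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [(cx, cy + radius)]  # a single point sits at the top of the arc
    positions = []
    for i in range(n):
        # The angle runs from pi (far left) down to 0 (far right).
        angle = math.pi - i * math.pi / (n - 1)
        x = cx + radius * math.cos(angle)
        # y grows upwards here; for SVG or CSS, where y grows downwards, use cy - ...
        y = cy + radius * math.sin(angle)
        positions.append((x, y))
    return positions


if __name__ == "__main__":
    # One tree per page: 16 pages, with an arbitrary radius of 100 units.
    for page, (x, y) in enumerate(semicircle_positions(16, 100.0), start=1):
        print(f"page {page}: x={x:.1f}, y={y:.1f}")
```

Because the points start at one end and finish at the other, the reading order stays linear, which suits a story that isn't cyclical.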

Seeking Feedback and Critique

The next stage of the assignment was to share it for feedback. I never like asking for feedback, not for fear of bad feedback but for fear of no feedback. There’s nothing more disappointing than putting your work and yourself out there, asking for feedback and getting nothing back, which I've experienced a few times, so I had a bit of apprehension. But I had a course forum and my work colleagues, and I really needed to get on with it, so out it went.

data visualised with geometric shapes making a scene of green circle top trees, triangle top houses, multi shaped characters laid in around a green semicircle with some text in circles and legend in one box on white background — Semi-circle layout

Only a couple of colleagues replied, but they gave good feedback, which was much appreciated. They pointed out things I had missed and suggested sentiment analysis. The course leader gave design suggestions, which was what I really needed as I didn’t have a design community that I could easily turn to, and later another person from the forum gave more feedback. One comment was about placing my name in a different location, although I didn't actually have my name on the poster. That made me realise something important that wasn't clear: the original author could be misrepresented and not credited accurately.

Exploring Text Sentiment Analysis and Sentiment Scoring Challenges

Text sentiment analysis was something I had never worked on before. It means categorising a word or section of text as negative, positive or neutral. More advanced methods put the sentiment on a numeric scale, such as negative 40%, or -0.4. I searched for free online tools that would let me do this easily and, although there were lots available, not many fit my requirements, which were:

Free
No signup
Gave a sentiment metric
Could analyse individual words and sections of text
No coding needed

After quite a bit of searching and testing, I finally found one that did it all. MonkeyLearn had a demo tool into which I could paste the text of a page, as well as the individual words within it.

Text box with paragraph of a page within on left with button to submit, results showing sentiment type and its percentage on the right — Screenshot of sentiment analyzer tool on monkeylearn

Refining Visual Elements

It would have been too slow and tedious to type in every word and record its result, so I picked a few words from each page that I thought would have a high negative or positive value and listed them in an Excel sheet. I then picked out the top 20 of the positive and negative words, including the repeated ones, but excluded the very positive word “mother” (according to MonkeyLearn’s model) as it was repeated 9 times. I found the results for quite a few words from the MonkeyLearn model a bit questionable, but it was quite a well-known sentiment analysis tool and the only one that fit all my requirements, and I had spent so long getting the sentiment scores that I couldn’t bear to redo it all, so I continued.
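I did this selection by sorting in Excel, but the same step could be sketched in code as below. The function name is hypothetical and the scores in the example are invented purely for illustration; they are not the values MonkeyLearn returned. The sketch also assumes a separate top 20 for each polarity, with scores running from -1 (most negative) to 1 (most positive).

```python
def top_words(scored_words, n=20, exclude=()):
    """Return the n most positive and n most negative (word, score) pairs.

    Repeated words are kept, matching the choice to include repeats.
    """
    excluded = {word.lower() for word in exclude}
    kept = [(word, score) for word, score in scored_words
            if word.lower() not in excluded]
    # Highest scores first for the positive list, lowest first for the negative list.
    positive = sorted((p for p in kept if p[1] > 0), key=lambda p: p[1], reverse=True)[:n]
    negative = sorted((p for p in kept if p[1] < 0), key=lambda p: p[1])[:n]
    return positive, negative


if __name__ == "__main__":
    # Invented example scores, not real results from any sentiment model.
    sample = [("mother", 0.9), ("kind", 0.6), ("wolf", -0.7),
              ("wicked", -0.8), ("kind", 0.6), ("door", 0.0)]
    positive, negative = top_words(sample, n=2, exclude=["mother"])
    print(positive)
    print(negative)
```

Neutral words with a score of exactly 0 are left out of both lists.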

I marked the page sentiment over each tree. Initially I wanted to use a more cloud-like representation, but it looked too much, so I opted for a thin curved line, either solid or dotted. The circles in the trees representing the top 20 negative and positive words were marked. This came from a suggestion by my colleague Arianna Ciula, who mentioned that the circles could represent the sentiment analysis. I realised it could be represented better and tried changing the circles to a sunburst, but that changed the look too much, so I shelved the idea for the next project.

I usually despise word clouds but thought it was quite appropriate to use them as actual clouds in the sky of the scene. They could give a rough sense of the most used words if someone wanted to look for them, and also add some texture to the image. I used a free online tool from Jason Davies that gave several options and allowed an SVG download. The legend got even trickier with the additional information, and I don’t know if many people would have the patience to read everything in it, but hopefully it is useful for anyone who does. I then worked through the rest of the suggestions from the feedback. A darker background was one of them, a great suggestion that took me quite a while to complete. I still don’t think I have it right, but I had exhausted almost every combination I could think of.

Embracing Imperfections as a Learning Journey

I spent far longer than expected on this project and I’m not completely happy with many aspects of the final piece. There are so many things I could improve, like legibility, colours, fonts, layout... the list could go on. But this was something I needed to do to take a step forward, and I learnt a lot from it.