Sentiment Analysis in the Real and Research Worlds

Sentiment analysis is an important tool for anyone interested in studying large datasets of text. Like most topics in this course, James set out to highlight some of the underlying functionality that generates sentiment analysis output.

First, it is interesting to note that part-of-speech tagging, lemmatization, and rated word lists, elements essential to sentiment analysis, are direct descendants of the work linguists have been doing for many years. The field of computational linguistics interfaces quite nicely with the work being done in digital humanities particularly for this reason. This, again, is a moment of innately interdisciplinary work.
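To make that linguistic machinery a little more concrete, here is a toy illustration of my own (not from James's session): crude suffix-stripping, the rough ancestor of real lemmatization. Real lemmatizers add dictionaries and part-of-speech information precisely because bare rules like these fail (note "runn" below).

```python
# A toy, rule-based suffix stripper -- NOT a real lemmatizer, just a
# sketch of the kind of linguistic rules underneath one.
def naive_lemma(word):
    # Try a few common English suffixes, longest first.
    for suffix, repl in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + repl
    return word

# "running" comes out as "runn": exactly the failure that motivates
# dictionary-backed lemmatizers with part-of-speech awareness.
print([naive_lemma(w) for w in ["stories", "running", "played", "cats"]])
# → ['story', 'runn', 'play', 'cat']
```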

Sentiment analysis is also hyper-reliant on the human component. Interpretation of data is brought center stage in this type of research, as we cannot determine the accuracy of the output without interpreting the results. For the Mets vs. Royals example that James gives in his slides, there is contextual information regarding the World Series, the stakes of the game, the date, etc., that comes into play in understanding the data. Having accurate results is an important step, but the researcher's interpretation is fundamentally important. For computer scientists and computational linguists to create more accurate sentiment analysis, digital humanities scholars and the like must utilize and improve upon the output.

From a professional standpoint, I have come across sentiment analysis in brand management platforms. One such example is Trackur, which allows a brand to see positive, negative, and neutral mentions of its products/services. Essentially, the platform aggregates mentions of specific keywords, allowing the brand to see the sentiment associated with them. The announcement for this feature includes an important caveat: “It’s my philosophy that only a human can accurately navigate the nuances of the human language and understand with 100% accuracy whether a tweet, update, or posts is positive or negative.” The kicker is that this is an expensive platform, which speaks to James’s frustrations with access to data. The value of sentiment analysis in the workplace, like that of any KPI, is being able to tell a valuable story with it, strategize next steps, and create active plans for improvement.

In my research in particular, sentiment analysis will have value as I begin to work with the textual elements of a large set of picturebooks. Looking at a corpus of, say, Dr. Seuss books, it would be interesting to see sentiment analysis across the entirety of his texts. Lemmatization poses potential problems considering how much invented language is used, but that’s a problem we can tackle with a Seuss-specific dictionary.
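As a rough sketch of how a Seuss-specific dictionary might work, here is a minimal lexicon-based scorer. Every word, score, and name below is invented for illustration; it is not a real sentiment lexicon or any particular tool's API.

```python
# Minimal lexicon-based sentiment scoring: average the ratings of any
# words found in the lexicon. All entries here are hypothetical.
BASE_LEXICON = {"good": 1.0, "happy": 2.0, "sad": -2.0, "terrible": -3.0}

# Invented Seussian vocabulary would be invisible to a standard lexicon,
# so a custom dictionary can be layered on top.
SEUSS_LEXICON = {"grinchy": -2.0, "zizzer-zazzer-zuzz": 0.5}

def score(text, extra_lexicon=None):
    lexicon = dict(BASE_LEXICON)
    if extra_lexicon:
        lexicon.update(extra_lexicon)  # custom entries override the base
    words = text.lower().split()
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

print(score("a grinchy terrible day", SEUSS_LEXICON))  # → -2.5
```

Without the extra dictionary, "grinchy" would simply be skipped, which is exactly the invented-language problem described above.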

Slides and Datasets for Sharing

Hello all,

Here’s a link to the slides and datasets that I’ll be sharing in class during my session on sentiment analysis. Feel free to look them over at your convenience.

Copyright, pros and cons?

Copyright is an interesting topic when we compare China and the U.S.

Several years ago, almost all music and movies in China were free: we could download any song from the internet and watch movies online at no cost. The most interesting part is that most Chinese people never thought it might be a problem for anyone.

Nowadays, copyright is becoming a hotter and hotter topic in China. It is becoming more and more difficult for us to find free music or watch free movies, which I think is a good thing. Because copyright was not protected in China before, it was really hard for musicians and writers to make money. There was so much low-quality music, and so many low-quality books, sold in the market; I suspect it's because more and more good artists and writers quit when they could not survive. Now that copyright is taken more seriously in China, it is actually easier for us to find high-quality music, books, and so on.

Of course, you can argue that selling books can steer authors toward popular topics and away from their expertise. But I think that's a different question. We pay for food, drinks, haircuts, and all kinds of services and products in the world; how can we assume others' knowledge products (like music and books) should be free? (Academic papers are a different case.)

Mother Goose is a much less interesting Dr. Seuss.

Access barriers that emerge due to copyright restrictions are all too real for anyone studying source materials published after the 1930s. The issue is interdisciplinary: it exists for scholars working in the social sciences, natural sciences, arts, and humanities.

Digital humanities scholars in particular face this issue of access and discuss it widely. Based on how difficult it was to find readings on this topic and digital humanities specifically, however, it seems that many of these discussions go on face to face, in forums and at conferences, and not in published DH outlets.

It is both odd and intriguing that more DH scholars aren't loudly talking about how they deal with copyright issues in their scholarly research. Why do we vocalize the need for Open Access, Open Source, Open everything, but think about copyright only behind closed doors?

For my own research, I have been pigeonholed away from my intended source materials because of copyright issues.

When I first conceived of an exploration of text and images in picturebooks, I intended to work with the already digitized copies of Dr. Seuss's many published stories. There is something academically interesting happening in Seuss's invented language and invented worlds, but the Seuss-iverse is off limits. Random House refuses to play nicely with a lowly scholar like me interested in applying computer vision and natural language processing to one of its most commercially lucrative and recognizable brands.

In attempting to get access, it was made very clear that it was a better use of my time to find alternative source material. The librarians I spoke with pulled me in different directions. The Random House representatives were unresponsive or vague. No one wanted to spend much time helping me actually get permission to do my research. So I didn't. We turned to Mother Goose. It was public domain. It was accessible. It was less interesting.

It was not only frustrating but discouraging.

My History as a Pirate

I could probably make this post around 10,000 words, but I'll try not to. This is the topic I am most familiar with and something I'm actually very interested in. If you'd like, you can read 8,000 words I've written on the topic here.

I illegally download music, games, books, and movies regularly. Sometimes I do this even when I am able to access them for free…


If There Are More Roads, Is the Trip Easier?

The library servers are currently down, so I can’t access Heron’s article about accessibility in MUDs. What a bummer, that article seems like it’d be right up my alley.


The nice thing about new technology is that it gives us options. The more ways we have to access something, the greater the possibility that everyone will be able to. The more roads that lead somewhere, the easier it is to get there… right?

The blind cannot read newspapers, just as the deaf cannot listen to the radio; fortunately, the news can be found in either medium. At the same time, one is not a substitute for the other. Listening to the news and reading the news are two different experiences for a multitude of reasons (tone of delivery, individual writer/anchor opinion, conversation, delivery, breadth covered, etc.).

Digital media should be (but isn't always) a step toward accessibility. If you need another reason to believe in TEI, look no further than the features it brings toward accessibility. With the right interface and/or software pairing, an encoded text can become fully searchable using voice commands. Yet at the same time, a complicated interface can make something less accessible, even to those with no disabilities. Williams's article about universal design spoke to this: there is no catch-all solution that will work for every individual: “This scenario caused me to reevaluate my understanding of what it means to be disabled, as she clearly was using abilities that I did not—and still do not—have: I had not trained myself to be able to process auditory information as efficiently as she could.”
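As a small illustration of the point, not drawn from Williams: structured markup lets software pull out exactly the units a voice interface needs. The fragment below is an invented TEI-style simplification, not real TEI output, and `find_paragraphs` is a hypothetical helper.

```python
# Searching a tiny TEI-style fragment the way a voice command might.
import xml.etree.ElementTree as ET

doc = """<text>
  <body>
    <p n="1">The old mill stood by the river.</p>
    <p n="2">Nobody had crossed the bridge in years.</p>
  </body>
</text>"""

root = ET.fromstring(doc)

def find_paragraphs(root, keyword):
    """Return the numbers of paragraphs containing the keyword,
    as a spoken query like 'find bridge' might."""
    return [p.get("n") for p in root.iter("p")
            if keyword.lower() in (p.text or "").lower()]

print(find_paragraphs(root, "bridge"))  # → ['2']
```

An unencoded scan of the same page gives the software nothing to latch onto; the markup is what makes the voice interface possible.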

Why DH?

I always thought of DH as an inclusive field. It combines information technology and the humanities to provide entirely new perspectives on the way we think. It is creative and wide-ranging. Or so I thought, until I read Michael James’s paper and Williams’s article.

Williams pointed out that the digital humanities have been focused on creating, organizing, presenting, and preserving digital information for future use, but that scholars fail to consider other possible needs, namely the needs of people with disabilities. For example, how could we make information presented in audio and video accessible to the deaf? I have to admit that I had never thought about these possible research areas in the digital humanities before. Heron demonstrates a case study of accessibility improvements in multiplayer text games, which is also a topic that had been beyond my imagination.

All of this leads us back to the question that we have been asking, talking about, and trying to answer: why digital humanities?

What can be included in the digital humanities? What can DHers do besides digitizing information and exploring the knowledge behind it?

Canonical Fact-Checked

Digital Humanities prides itself on inclusiveness. That is a canonical fact of DH. The problem is, it’s not as inclusive as we want to believe.

In “Disability, Universal Design, and the Digital Humanities,” George H. Williams reminds us that, “many of the otherwise most valuable digital resources are useless for people who are—for example—deaf or hard of hearing, as well as for people who are blind, have low vision, or have difficulty distinguishing particular colors.” The development of tools, archives, and methodologies does not go far enough to be inclusive to those with disabilities.

The problem here is that, as Williams draws on Rosemarie Garland-Thomson to point out, disabilities are often overlooked mainly because of “cultural rules about what bodies should be or do.” Williams's article argues, rightly, for the adoption of a “universal design approach.” The complexity and lifespan of many of these tools take center stage in their development. Time and money are minimal to nonexistent on most of these projects, and even those that are funded and have hefty timelines focus their user/information design on a specific intended audience. Visually impaired, or otherwise physically disabled, people are in general not included in a user-experience persona. When coming up with the expected audience to determine what kinds of paths, questions, and interactions they will have with a tool, theory, or methodology, we, with bias, overlook this part of the population.

This issue is particularly pertinent both in our immediate class and for the DH program in general. Visually impaired people are among the many at the Graduate Center creating and interacting with DH theories and methods.

An article published in the Guardian about a year ago, “Video games which open the door for the blind to play,” describes how visually impaired gamers have emerged onto the scene as a result of “great sound design” and the introduction of audio-only adventures.

A potential DH project could be the creation of a conversation-based archive: an uber-Siri-like platform that would allow you to query the archive by asking conversational questions. It would be an audio-only learning journey into a database of documents.

Access for the Disabled – Compared to My CV Research

Going through Williams’s paper, I am struck by how close these issues are to the ones we face with assistive tech for visually impaired, and now more generally disabled, people. While we are primarily concerned with getting about and not hurting ourselves, the issues are nonetheless similar.

How do we interpret what is sensed, in particular visually and audibly?


It occurs to me (if Williams does not state it) that the answer is a three-fold solution:

  1. Texts should be stored in a very easily read format; TEI seems to approach that.
  2. Depending on the presentation required, specific formatting of the underlying text is then applied.
  3. The output is thus customized to the user.
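A toy sketch of those three steps, with an invented TEI-style fragment standing in for the stored text (this is my simplification, not real TEI or any real rendering tool):

```python
# One stored source (step 1), presentation-specific formatting (step 2),
# output customized to the user (step 3).
import xml.etree.ElementTree as ET

stored = '<p>The storm <emph>finally</emph> broke.</p>'  # step 1

def render(xml_text, mode):
    """Format the same underlying text for a given presentation."""
    elem = ET.fromstring(xml_text)
    parts = [elem.text or ""]
    for child in elem:
        if mode == "screen_reader":
            # Announce emphasis verbally for audio output.
            parts.append(f"(emphasis) {child.text}")
        else:
            # Uppercase emphasized words for a large-print display.
            parts.append(child.text.upper())
        parts.append(child.tail or "")
    return "".join(parts)

print(render(stored, "screen_reader"))  # → The storm (emphasis) finally broke.
print(render(stored, "display"))        # → The storm FINALLY broke.
```

One source, two very different outputs: that is the whole argument for storing texts in a rich, easily read format first.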


Now, the issue is what we do about a video newscast. Well, if closed captioning can be applied, or transcripts generated, then, if auto-TEI processing were available, the rest would be, as they say, a piece of cake.


I can attest to problems with web-page presentation: having to use magnification often means I miss out on parts of a page which the ‘regular’ user sees, such as processing buttons!

Oh well!

Computer generated music? Musicians have the worst luck.

It’s hard to follow a video of R.E.M. playing in the major scale, but here goes…

I know most of this week’s readings were about MEI, but “AI Methods for Algorithmic Composition: A Survey, a Critical View and Future Prospects” is just too amazing not to spend a post reflecting on.

Music is and has been a major piece of my life. I’ve been performing, writing, and recording music for as long as I’ve been able to stand tall enough to reach the keys on my piano.

The ability to produce music algorithmically is one of the most incredible, but expected, things I have ever heard.

The impact of digital technology on the way music is composed is obvious to anyone who has turned on any major radio station since Y2K. We’ve gone fully digital. In recent recording sessions, I’ve worked with software that generates natural-sounding 3-, 5-, or 7-part vocal harmonies from a single recording. It uses something people call math. I’ve seen my own sound waves go from awkward vocal-booth captures to manipulable graphical waveforms. In a few clicks I’ve been given intensely perfect pitch. Automation can be added to detect and quantize drumbeats so everything is perfectly aligned within a system of rules set in a digital audio workstation.
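As a rough illustration of the quantization step (the hit times and tempo below are invented, and this is a simplification of what a real DAW does), snapping sloppy timing to a grid is essentially rounding:

```python
# Snap recorded hit times (in seconds) to the nearest grid line.
def quantize(times, bpm=120, subdivision=4):
    """Quantize each timestamp to a 1/subdivision-of-a-beat grid."""
    grid = 60.0 / bpm / subdivision   # grid spacing: 0.125 s at 120 bpm
    return [round(t / grid) * grid for t in times]

recorded = [0.02, 0.49, 1.13, 1.51]   # slightly sloppy human timing
print(quantize(recorded))             # → [0.0, 0.5, 1.125, 1.5]
```

Real DAWs add onset detection and partial-strength quantization on top, but the core "system of rules" is this simple.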

It is only right and natural for music, like art and text, to have representation in the digital humanities community. Like James, I simply hadn’t considered it.

This brings me to my favorite problem. Papadopoulos notes, “Probably the most difficult task is to incorporate in our systems the concept of creativity” (5).

Can we develop computers powerful enough to develop computational creativity?

Can computers develop something that is aesthetically pleasing enough to trick us into thinking it’s so beautiful a human must have made it?
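For a taste of one classic technique the survey covers, here is a minimal first-order Markov chain over pitches; the transition table and note choices are invented for illustration, not taken from the paper.

```python
# First-order Markov composition: each note depends only on the last one.
import random

# Hypothetical transition table: which pitches may follow which.
TRANSITIONS = {
    "C": ["E", "G", "C"],
    "E": ["G", "C"],
    "G": ["C", "E", "G"],
}

def compose(start="C", length=8, seed=42):
    """Walk the chain from a starting pitch, producing a short melody."""
    rng = random.Random(seed)  # fixed seed makes the walk reproducible
    melody = [start]
    while len(melody) < length:
        melody.append(rng.choice(TRANSITIONS[melody[-1]]))
    return melody

print(compose())
```

Whether a richer version of this counts as "creativity" is exactly the question Papadopoulos raises; the chain only ever recombines what its table already contains.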

Personally, I think so. I am not threatened by a computer's ability to write music. It will be different from the things I conceive of. It will be valuable in its own right.

I want to bring David Cope to the table here. From his website, we learn: “David Cope is Dickerson Emeriti Professor at the University of California at Santa Cruz where he teaches theory and composition, and Honorary Professor of Computer Science at Xiamen University (China). He also teaches regularly in the annual Workshop in Algorithmic Computer Music (WACM) held in June-July at UC Santa Cruz.”

Here we have a computer scientist and musical composition professor operating with the same hands and mind.

Needless to say, I am fascinated by this.