The Rating Rabbit Hole

Note: This was written in August of 2015, before cafebedouin.org existed. I rediscovered it recently and thought the point is still a valid one and worth sharing.

tl;dr: Algorithms have a bias toward the status quo and present a threat to our cultural production. (2,600 words)

Shortly after the start of the Afghanistan war, Osama Bin Laden fled the city of Kandahar. An Afghan family picking through the abandoned Bin Laden property found a collection of 1,500 mix tapes comprising “songs, sermons and intimate conversations” he used for his brand of extremist propaganda. In places like Afghanistan, mix tapes are good vehicles for propaganda because they are not subject to censorship and can be easily duplicated. Cassette players are an important medium in places where there is a dearth of other entertainment. The local cassette shop owner who bought the Bin Laden tapes from the family was convinced by a CNN cameraman to keep the collection together, and the tapes eventually found their way to Flagg Miller, an expert in Arabic literature and culture at the University of California, Davis. He subsequently spent 10 years listening to the tapes and writing a book about them, entitled “The Audacious Ascetic”.

The idea that the cassette player could be a powerful modern medium for propaganda is intriguing. So, I went to GoodReads to mark “The Audacious Ascetic” as a book I’d like to read when it comes out. While doing that, I noticed that GoodReads already had one rating for the book, for one star. [Note: It currently has a rating of 2.57 out of 5, with 7 ratings in. The only written review gives it a 5.]

While it is possible that someone got hold of a review copy, evaluated the book and found it to be worthy of a single star, it seems unlikely. Given that the book is not going to be released for two more months, the rating was more likely based not on the book’s merits but on an initial impression of it and on how well it conformed to the worldview and the personal/cultural identity of the person rating it. This possibility made me wonder. I have come to rely on rating systems like GoodReads. What exactly is being rated? What larger implications does using these rating systems have for our society, our culture and ourselves?

In our house, we often use a minimum of an All Critics >80% Tomatometer score from Rotten Tomatoes to determine whether we will watch a movie that we are unfamiliar with. I rationalize this approach with a variation of Linus’s Law, i.e., “Given a large enough audience and critic base, almost everything ratable will be characterized quickly and its quality will become obvious.”

In Rotten Tomatoes, there are different ratings for top critics, all critics and for the general audience. Consider the ratings for the top movies for last weekend (August 14-16, 2015):

Title	Top Critics	All Critics	Audience
1. Straight Outta Compton	78%	89%	96%
2. Mission Impossible: Rogue Nation	95%	93%	91%
3. The Man From Uncle	47%	67%	80%
4. Fantastic Four	5%	8%	22%
5. The Gift	96%	93%	81%
6. Ant-Man	72%	79%	90%
7. Vacation	25%	28%	57%
8. Minions	29%	54%	53%
9. Ricki and the Flash	51%	62%	54%
10. Trainwreck	83%	86%	75%

Looking over this table, we might hypothesize that if “Top Critics” represent some approximation of an objective baseline, then maybe movies based on established franchises (e.g., Fantastic Four, Vacation, and The Man From Uncle) get an audience bump of at least +15%, even when they are of average quality or worse. On the other end, unsettling movies like The Gift seem to take at least a -15% hit, even if they are excellent, because the content is either unfamiliar or challenging.
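To make that hypothesis easier to check, here is a minimal sketch that subtracts the Top Critics score from the Audience score for each film in the table. The scores are copied from the table above; the names `SCORES` and `audience_gap` are my own, and treating Top Critics as the baseline is the essay's assumption, not an established method.

```python
# Scores copied from the table above: (top critics, all critics, audience), in percent.
SCORES = {
    "Straight Outta Compton": (78, 89, 96),
    "Mission Impossible: Rogue Nation": (95, 93, 91),
    "The Man From Uncle": (47, 67, 80),
    "Fantastic Four": (5, 8, 22),
    "The Gift": (96, 93, 81),
    "Ant-Man": (72, 79, 90),
    "Vacation": (25, 28, 57),
    "Minions": (29, 54, 53),
    "Ricki and the Flash": (51, 62, 54),
    "Trainwreck": (83, 86, 75),
}


def audience_gap(scores):
    """Return {title: audience score minus top-critics score}, in percentage points."""
    return {
        title: audience - top
        for title, (top, _all_critics, audience) in scores.items()
    }


gaps = audience_gap(SCORES)

# Print the films from the biggest audience bump to the biggest audience hit.
for title, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{title:35s} {gap:+d}")
```

With only ten films from a single weekend, the gaps are suggestive at best. They are enough to frame the hypothesis, not to confirm it.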

Further, the Audience rating is hiding an important detail. There aren’t just “Top Critics”; there is also a “Top Audience”. Frequent movie-goers, defined as people who go to a movie theater at least once a month, make up only 11% of the U.S. population, but they account for over 50% of movie ticket sales. The average American, in contrast, watches fewer than four movies a year in the theater.

If there are ~520 films released in the United States every year and if we suppose frequent movie-goers average no more than one film per week, they are watching no more than 10% of the movies released in a given year. How do they decide what to watch, showing-to-showing? If someone needs to evaluate 40 new movies every month, then they are going to rely largely on reviews. Using a meta-review site like Rotten Tomatoes that aggregates reviews combined with reading or listening to a particular trusted critic or two is a sensible approach to choosing. However, this approach is overkill for someone watching only four films a year. The selection process for the general audience is largely driven by popularity, and known franchises are more popular.
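The arithmetic behind those figures, as a rough sketch. The release count and viewing rates are the essay's round numbers, and the function and constant names are mine.

```python
FILMS_RELEASED_PER_YEAR = 520  # the essay's approximate figure for U.S. releases
WEEKS_PER_YEAR = 52


def share_of_releases_seen(films_seen_per_year, films_released=FILMS_RELEASED_PER_YEAR):
    """Fraction of a year's releases that a viewer actually sees."""
    return films_seen_per_year / films_released


# A frequent movie-goer at the assumed ceiling of one film per week.
frequent_share = share_of_releases_seen(WEEKS_PER_YEAR)

# An average viewer at roughly four films per year.
casual_share = share_of_releases_seen(4)

# New releases competing for attention in a typical month.
new_films_per_month = FILMS_RELEASED_PER_YEAR / 12

print(f"Frequent movie-goer sees about {frequent_share:.0%} of releases")
print(f"Average viewer sees about {casual_share:.1%} of releases")
print(f"About {new_films_per_month:.0f} new films arrive each month")
```

Even the most dedicated viewer skips nine releases out of ten, which is why some filter, whether critics, aggregators or plain popularity, ends up doing most of the choosing.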

What impact does this have on film production? Primarily, the blockbuster becomes most important. These films see ever-increasing budgets and their content focuses on spectacle, established franchises and storylines that can translate across multiple geographic markets and with the broadest audience appeal possible, even if the quality is mediocre. Second, films not targeted to a large general audience need a defined audience demographic and have to balance expenses against an anticipated return. In order to turn a profit, this often means looking at the tastes of frequent movie-goers and making films that appeal to them. It also means creating niche movies to pull in niche audiences.

A recent Vox article stated that Universal made more money this year than any movie studio ever. How did they do it? Consider the top 12 films they released thus far, in order of gross revenue: Jurassic World (re-boot), Furious 7 (seventh installment), Minions (spin-off from Despicable Me), Pitch Perfect 2 (sequel), Fifty Shades of Grey (best-selling book), Trainwreck (niche celebrity), Ted 2 (sequel), Straight Outta Compton (biopic), The Boy Next Door (niche celebrity), Unfriended (genre), Seventh Son (genre), and Blackhat (genre).

The top five films were existing franchises. Trainwreck probably got greenlighted according to whatever logic is behind the Saturday Night Live comic pipeline, and while Trainwreck is more original than most SNL-derived films, the studio was banking on the popularity of Amy Schumer and the built-in audience that watches her show. Straight Outta Compton could be Exhibit A for a film that targets a specific audience while also showing a lot of general crossover appeal. Want to take bets that Straight Outta Compton is going to provide a template for future biopic films? Jennifer Lopez, the star of The Boy Next Door, isn’t really a bankable star, but maybe has enough of a fan base to push the film over the line to more likely than not profitable. The remainder are conventional genre pictures with unsurprisingly poor performance at the box office.

Netflix provides another interesting example. They invest in a series like House of Cards in order to appeal to a wide audience and differentiate Netflix from other streaming services. Comedy specials are for a niche audience. Netflix has produced dozens. They are fairly inexpensive to produce, and there’s probably significant overlap between people who would go to a comedy show and those regularly going to movie theaters.

The Universal and Netflix examples show that there is a content continuum that moves from expensive blockbusters with high returns to niches with a modest expense/profit profile to original concepts, genre films and art house gambles that often lose money. Unique films with a new or an alternative vision do not have predictable audience appeal, which means studios have to pay for multiple failures out of a few successes. If a studio has $100 million to invest, it’s clear from looking at Universal’s list what content pays. Comfortable and familiar content for predictable audiences means money. Capitalism works best when you have a consistent, reproducible product, and if you grow up on a diet of Twinkies, you’re going to prefer Twinkies.

But, what about the “long tail”, or the idea that given enough of these niches over time, there will be a diversity of content and voices? The problem with the “long tail” is there is rarely enough profit in it for any but a small minority to make a living making content for it. Imagine trying to find funding for the modern equivalent of a cult movie like “The Rocky Horror Picture Show”. Will the argument be that over decades, streaming, merchandising and other sales will provide a sizable return? Or, will it be viewed as disposable content that will never find an audience, but maybe interest investors in the producer’s next project or lay the groundwork for a successful Kickstarter campaign?

Movie studios are corporations. Why would anyone believe that a corporation would want to invest in films with profitability measured in decades rather than films that make a profit in a few years? Try to name an example of that ever happening. Consequently, film studios are not good at developing alternative viewpoints or funding the creation of challenging works of art. The market value of “the alternative” is only realized when it becomes the mainstream or solidifies into a predictable, profitable niche. Easier to bet on “Jay and Silent Bob Strike Back” than the original “Clerks”.

While a few people like Kevin Smith (director of “Clerks”, “Chasing Amy” and “Dogma”) or Shane Carruth (director of “Primer” and “Upstream Color”, both movies I would highly recommend) show it is possible to achieve a level of success in the “long tail”, their films are often made outside the traditional studio systems, television networks or even emerging production and distribution channels. The “long tail” may largely be a process of creating a portfolio and carving out a niche audience that can be pitched to investors. Creating content for these audiences is perhaps easier in an era where films can be shot on a consumer grade “smart phone” and uploaded to YouTube. But, then, what happens when the “long tail” gets longer, and film production moves from a hundred films or even several hundred films a year to a market of millions? What does that look like? The book market might provide some insight.

In the United States, ~300,000 books are published every year. According to Pew Research, Americans read, on average, 5 books a year. Best seller lists indicate that the books with the most sales top out at around a million copies. What does a popular book look like compared to most of the others? According to BookScan numbers, Hillary Clinton’s “Hard Choices” was No. 20 on the Non-Fiction best seller list and it sold 260,814 units. How many books does the No. 10,000 spot sell?

One indication is that it is typical for self-publishers to print somewhere between 7,500 and 10,000 volumes for around $16,000. If you sell your book for $5, then you need to sell 3,200 books to cover production costs. If you sell it for $25, then it’s only 640. Most books, if they have an audience at all, will fit a small niche. In GoodReads, they have few ratings, often 100 or fewer, and sales of a few thousand, at most. You don’t need to be an accountant to figure out that there’s not a lot of room to make a living writing books, and the people doing it seem to be doing it on the Kevin Smith and Shane Carruth model. They are funding the “long tail” with jobs, second mortgages and credit card debt.
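The break-even figures above come from a single division. A minimal sketch, assuming the $16,000 print run is the only cost and that the author keeps the full cover price. Both are simplifications, and `break_even_copies` is a name I made up for illustration.

```python
import math


def break_even_copies(print_run_cost, price_per_copy):
    """Copies that must sell for revenue to cover the print-run cost.

    Ignores retailer cuts, shipping, marketing and taxes, so real
    break-even points will be higher than what this returns.
    """
    return math.ceil(print_run_cost / price_per_copy)


PRINT_RUN_COST = 16_000  # dollars, the essay's figure for a 7,500 to 10,000 copy run

for price in (5, 25):
    copies = break_even_copies(PRINT_RUN_COST, price)
    print(f"At ${price} per copy: {copies:,} copies to break even")
```

Covering the printer’s bill is not the same as earning a living. Every copy up to the break-even point pays for paper, not for the author’s time.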

Most writers are living in the tail, and the deeper into the tail you go, the less the question is about the quality of the work and the more it is about whether the reader (and rater) is the intended audience. For example, there is no other source to learn about the Bin Laden tapes. The quality of professor Flagg Miller’s prose is largely irrelevant. As long as it meets a certain minimum standard, it’s going to be good enough. And once you get to that point, raters are making judgments on the work based on how well it conforms to their worldview or whether they find it interesting, rather than an objective evaluation of the work. How do you untangle the objective judgments about the work from subjective judgments about how well the rater conforms to the niche audience of the work, particularly in an environment where 300,000 new works are made every year, each with relatively small audiences?

One of the primitive ways we check our compatibility with a niche is through word-of-mouth and the development of genres and sub-cultures. If I say something like: “György Pálfi’s ‘Free Fall’ is my Discordian film pick for 2014,” people who self-identify as Discordian will know that it’s probably going to get weird in a way they might like. To everyone else, it won’t mean anything. In the technology space, there is a similar thing going on with linked recommendations: “People who bought this book also bought X”. On the more sophisticated end, algorithms are creating taste profiles and making predictions based on pattern matching preferences, and this is where the problem with ratings really starts down a troublesome path.

If Netflix is using algorithms both to make recommendations to you and to look through datasets of your watching behavior to determine what is profitable to produce, then at some point, you have to start wondering when a feedback loop will come into play. For example, Netflix thinks I like movies featuring “Strong Female Leads”, but is this my preference? Or, is this because I almost always watch movies with my wife? What happens when Netflix sees a broader pattern of interest in content with “Strong Female Leads” and produces “Grace And Frankie”? Now, whose interests are Netflix’s recommendations serving? Is it win/win, or is there a subtle interplay? If something is produced based on a collective viewership dataset, how can that not feed right back into the recommendation predictions I am receiving?

Then, there’s the dynamic I get into by looking at the predicted results and comparing them to how I felt about a particular film. I find myself trying to game my own ratings of Netflix titles, so it shows me more options of what I think I want to watch rather than more of what I actually do watch. I may have a thing for Gladiator or Clint Eastwood movies, but it doesn’t mean I want endless exercises in the genre. More to the point, how can I coax the algorithm to help me find content that challenges me and develops my interests rather than recommending things that are great for the person I am today? I am not sure it can.

Rating systems are imperfect. Authors are paying for positive reviews on Amazon, which is understandable when you get your mind around the razor thin margins writers live on. There are examples of people clearly gaming the systems for fun, profit and in the pursuit of various agendas – such as our one star rater for “The Audacious Ascetic”. But, I think the thing that concerns me most as I think about the various rating sites I use is how much influence they have over what is created. In some sense, rating systems are a way of rating ourselves, and it changes both our cultural landscape and our very selves. It is tempting to see meta-ratings and reviews as a ticket to some strange Wonderland of “The Best” content, products or whatever. But, these ratings may serve as a chrysalis of stasis, trapping us in a cocoon of the generic, the popular, the profitable, and the established average with its +15% bump with a sprinkling of easily defined niches, celebrity vehicles and genre exercises. What, besides the “long tail”, will emerge from this environmental envelopment and this narrowing of our vision? What will it mean to the kind of people we will become? I don’t know, but on balance, I don’t think the path leads to more interesting, flourishing lives. Caveat evaluator.
