Researchers at the University of Pennsylvania and other institutions had subjects read sets of 20 Tweets and predict each writer's gender, age, political orientation, and education level based on the words the writer used. Subjects' guesses were fairly accurate: they guessed right 76% of the time on gender, 69% on whether the person was older or younger than 24, and 82% on liberal versus conservative. They were right in only 46% of cases, however, when predicting whether the Tweet-writers had no bachelor's degree, a bachelor's degree, or an advanced degree.
Based on the results, the researchers were able to create word clouds of the words commonly associated with each demographic, as well as the words that led to false predictions.
"Overall, stereotypes were generally correct but were exaggerated and thus led to inaccurate conclusions in practice," study author Daniel Preotiuc-Pietro wrote in an email.
Below, (A) shows words associated with female writers; (B) shows words associated with male writers; (C) shows words that led subjects to incorrectly guess female; (D) shows words that led subjects to incorrectly guess male. Word size indicates how likely words were to fall into each group.
Tweets about love, friendship, and family were stereotyped as female. Those about sports and politics were stereotyped as male.
There's some truth to that (again, the study found 76% accuracy in gender predictions), but subjects tended to exaggerate these and other stereotypes and ended up with false conclusions. For instance, people exaggerated the likelihood that Tweets about tech would come from a man.
"Almost every woman who posted about technology was inaccurately believed to be a man," lead author Jordan Carpenter said.
Likewise, subjects exaggerated the association between men and words like "news," "research," and "Ebola," and the link between women and words like "love," "beautiful," and "today."
Below are the word clouds for age. Note that egocentric words are associated with youth, though many of those stereotypes are wrong. Also, just because someone Tweets about Snapchat doesn't mean they are young.
Below are the word clouds for political orientation, the category where stereotypes were most accurate. Sure enough, people who Tweet about #wakeupamerica tend to be conservative.
Note that Tweets about sports and family were often falsely associated with political orientation, suggesting that subjects defaulted to gender biases when lacking other information and exaggerated the links between gender and political orientation.
This research is useful for bringing to light stereotypes that people might not be willing to admit to if asked outright.
"The important next step is making people aware of the inaccuracy of these stereotypes and why they lead to bad conclusions," Preotiuc-Pietro said. "If we can educate people about the ways these beliefs can steer them wrong, it will make people more socially accurate both online and off."