Because a name alone is not enough to infer a user’s gender, we also analyzed the profile description. Based on the findings of Argamon et al. (2006), we wrote a Java program that uses a Bayesian network with weighted word frequencies and part-of-speech information to estimate the gender of an author. Most profile descriptions were in English or Spanish; the four users whose descriptions were in another language were removed from the data set. Inspired by the work of Gaucher, Friesen, and Kay (2011), we built our own lists of masculine and feminine words for English and Spanish, based on gender stereotypes and social roles. The feminine list contains words such as “wife”, “mother”, “daughter”, “girl”, “lovely”, and “delicate”. The masculine list contains words such as “husband”, “brother”, “son”, “father”, “boy”, and “player”. Our English lists contain 67 masculine words and 75 feminine words. The Spanish lists contain fewer words because in Spanish the ending of an adjective indicates its gender. We then counted the frequency of masculine and feminine words in each profile description, using the word lists for the language of that description.
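
The word-counting step can be illustrated with a minimal Java sketch. It is not the program used in the study: the class name `GenderWordCounter`, the method `countGenderWords`, and the tokenization rule (lowercasing and splitting on non-letter characters) are assumptions made for illustration, and the two sets hold only the example words quoted above rather than the full lists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class GenderWordCounter {

    // Illustrative excerpts only: the example words quoted in the text,
    // not the full English lists (67 masculine, 75 feminine words).
    public static final Set<String> EN_FEMININE = new HashSet<>(Arrays.asList(
            "wife", "mother", "daughter", "girl", "lovely", "delicate"));
    public static final Set<String> EN_MASCULINE = new HashSet<>(Arrays.asList(
            "husband", "brother", "son", "father", "boy", "player"));

    /**
     * Counts list words in one profile description.
     * Returns a two-element array: {masculineCount, feminineCount}.
     * The caller passes the word lists that match the description's language.
     */
    public static int[] countGenderWords(String description,
                                         Set<String> masculine,
                                         Set<String> feminine) {
        int masculineCount = 0;
        int feminineCount = 0;
        // Assumed tokenization: lowercase, then split on any run of
        // non-letter characters (\p{L} keeps accented Spanish letters intact).
        String[] tokens = description.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        for (String token : tokens) {
            if (masculine.contains(token)) {
                masculineCount++;
            } else if (feminine.contains(token)) {
                feminineCount++;
            }
        }
        return new int[] {masculineCount, feminineCount};
    }

    public static void main(String[] args) {
        int[] counts = countGenderWords(
                "Proud father, husband and football player. Lovely wife!",
                EN_MASCULINE, EN_FEMININE);
        // Prints: masculine=3, feminine=2
        System.out.println("masculine=" + counts[0] + ", feminine=" + counts[1]);
    }
}
```

In this sketch the language choice is left to the caller, who supplies the English or Spanish pair of lists. How the two counts are weighted or combined with the other evidence is not specified here, so the sketch stops at the raw frequencies.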