Scoping Review of Methods and Annotated Datasets Used to Predict Gender and Age of Twitter Users

Karen O’Connor; Su Golder; Davy Weissenbacher; Ari Klein; Arjun Magge; Graciela Gonzalez-Hernandez

doi:10.1101/2022.12.06.22283170

Abstract

Real World Data (RWD) has been identified as a key information source in health and social science research. An important and readily available source of RWD is social media. Identifying the gender and age of the authors of social media posts is necessary for assessing the representativeness of the sample by these key demographics and enables researchers to study subgroups and disparities. However, deciphering the age and gender of social media users can be challenging. We present a scoping review of the literature and summarize the automated methods used to predict age and gender of Twitter users. We used a systematic search method to identify relevant literature, of which 74 met our inclusion criteria. We found that although methods to extract age and gender evolved over time to utilize deep neural networks, many still relied on more traditional machine learning methods. Gender prediction has achieved higher reported performance, while age prediction performance lags, particularly for more granular age groups. However, the heterogeneous nature of the studies and the lack of consistent performance measures made it impossible to quantitatively synthesize results. We found evidence that data bias is a prevalent problem and discuss suggestions to minimize it for future studies.

1. Introduction

Real World Data (RWD)¹ such as social media data has been increasingly recognized as a valuable resource for gaining knowledge and insight for a variety of health-related research topics including disease surveillance ^2,3, pharmacovigilance ^4,5, and mental health ^6,7. It can also be used for the identification of cohorts for potential recruitment into traditional studies ^8,9. In short, social media can readily provide abundant personal health information in real-time.

The use of data from social media platforms such as Twitter, however, presents some inherent limitations for health-related research in that certain demographic information is not explicitly available through the Application Program Interface (API) ¹⁰. Knowing the demographics of users included in a study, including their age or gender, is important in health research. This information can be incorporated in analyses to identify disparities across demographic groups, to ensure inclusion of underrepresented groups and to elicit insights into age and gender differences in disease presentation or treatment response ^11,12. Furthermore, given that Twitter users tend to be younger than the general population ¹³, knowing the specific demographic makeup of a cohort allows the researcher to report or make necessary adjustments to account for such bias.

Predicting demographic data is complex and challenging. A user’s profile does not necessarily include such information, and researchers have used other features available in the data, such as names, the content of the tweets, or the individual’s network. In this study, we present a scoping review of methods published since 2017 for determining the age and/or gender of Twitter users. We chose to focus our review on studies that use Twitter because the terms of use for this platform are well understood by both users and researchers, it includes an API, and the data is abundant for health-related research ¹⁴.

While studies to predict Twitter users’ gender began as early as 2011 ^13,15–18 and detecting the age of Twitter users has been addressed since 2013 ^19–21, it is only since 2017 that the language processing community shifted its methods away from hand-crafted rules and represented the documents with dense vectors to train deep neural networks ^22,23, resulting in a noticeable increase of performance for many applications. We sought to examine if these increases in performance were evident in the methods used for the prediction of the age and gender of Twitter users.

2. Methods

We conducted the study following the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) ²⁴ methodology. The PRISMA-ScR checklist is available in the Supplemental Information (SI) (Table S1). We searched several databases to identify research on the prediction of Twitter users’ age, gender, or both. We developed a detailed search strategy protocol. Our database search strategy combines three facets: facet one includes terms related to Twitter, facet two consists of terms for age or gender, and facet three consists of terms for methods of prediction such as machine learning. The search strategy was translated as appropriate for each database. The detailed search strategy is available in the SI (Table S2). The machine learning term facet was expanded using terms from related reviews by Hinds and Joinson ²⁵ and Abubakar ²⁶. The search criteria were limited to peer-reviewed journals, conference proceedings, books, and theses.
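The three-facet structure described above can be illustrated with a minimal sketch. The terms below are hypothetical placeholders chosen for illustration only; the actual terms used in the review are those listed in SI Table S2, and each database has its own query syntax.

```python
# Hypothetical facet terms for illustration; the real terms are in SI Table S2.
FACETS = {
    "platform": ["twitter", "tweet*"],
    "demographic": ["age", "gender", "sex"],
    "method": ["machine learning", "classif*", "predict*"],
}


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(facets):
    """OR the terms within each facet, then AND the facets together."""
    groups = [
        "(" + " OR ".join(quote(term) for term in terms) + ")"
        for terms in facets.values()
    ]
    return " AND ".join(groups)


query = build_query(FACETS)
print(query)
```

A record is retrieved only if it matches at least one term from every facet, which is why broadening a single facet (as was done for the machine learning facet) widens recall without dropping the Twitter or demographic constraints.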

The following databases were searched with a publication date range of 2017 or later (Figure 1).

Figure 1:

List of databases searched for reviews

Citations were exported to a shared Endnote library for deduplication. Using the PICOS ²⁷ framework, we developed a list of inclusion and exclusion criteria (details below), and two screeners from the research team screened the results independently, with disputes on criteria discussed and a consensus decision reached. The first 50 records from both ACL and Google Scholar were screened separately using these same methods.

2.1 Inclusion and Exclusion Criteria

Population – P: Any Twitter data on Twitter users such as posts, profile details, photos or avatars. We excluded studies evaluating extraction from other types of social media.

Intervention – I: Methods to infer or predict gender or age demographic data of Twitter users. Articles that use machine learning, natural language processing, human in the loop or other computationally assisted methods to predict the gender or the age of users were included. Studies that contained no computational methods were excluded.

Comparator – C: Any or none.

Outcome – O: Gender or age prediction.

Study Design – S: Any type of peer-reviewed study reporting on the methods used to extract gender or age. Such information must be the primary focus of the study or reported in enough detail to be reproducible. Discussion papers, commentaries, and letters were excluded.

For reasons outlined in the introduction, we restricted the date of our search to only include publications from 2017 and beyond. No language restrictions were applied to the inclusion criteria; however, financial and logistical constraints did not enable translation from all languages.

2.2 Data Extraction

For each included article we extracted the following data: the year of publication, publication type (journal, conference paper, thesis), demographic extracted or predicted (gender, age, or both), the language of tweets, the size of the dataset, the collection method for the dataset, details of prediction models, features used in models (posts, profile, pictures), the performance of said models, the name of any software used for extraction, the measures used to evaluate methods and results of any evaluation, and the availability of data and/or code. The included articles were distributed amongst the authors to extract the data. The extracted data was validated by another author (KO).

2.3 Quality Assessment

This is a scoping review; thus no quality assessment is required.

2.4 Data Analysis

We summarized the performance stated in the papers. However, it is impossible to directly compare the approaches as the reported training data, validation methods, and performance metrics varied.

3. Results

Our database searches resulted in 981 records which were retrieved and entered into an Endnote library, where duplicates were removed, leaving 684 records for sifting.

From the abstract review, 172 references were deemed potentially relevant by either one of the independent sifters (SG and KO). The full-text of these articles was screened independently and disagreements discussed, resulting in 74 ^28–101 that met our inclusion criteria and 98 excluded (Figure 2).

Figure 2:

PRISMA flow diagram of included studies.

3.1 Characteristics of Included Studies

In the 74 studies (SI Tables S3 and S4), the majority (n=42, 57.3%) focused on predicting the gender of the individual, 24 (32.0%) explored predicting both gender and age, and 8 (10.7%) focused solely on predicting age. Most of the studies were published in conference proceedings (n = 44, 58.7%), followed by journal articles (n = 28, 37.3%), theses (n = 2, 2.7%) and a book chapter (n = 1, 1.3%).

In 42 studies, developing methods to infer or predict the age and/or gender of Twitter users was the primary purpose of the study. In the remaining papers (n = 32), the identification of demographic characteristics of Twitter users was secondary. Within this last group, 9 studies developed ad hoc methods to determine age and/or gender, while the others used open-source models (n = 13) or off-the-shelf software (n=10).

3.2 Studies developing methods for gender and age prediction

3.2.1 Gender

Forty-four studies developed ad hoc methods to predict the Twitter user’s gender. Of these, 32 predicted only gender ^{28,29,31,33,36,37,47,48,50,51,54,55,58,60,61,64,65,68,71,72,75,81,83–86,90,92,94,96,100,101} and 12 predicted gender along with the user’s age ^{34,44,49,59,62,66,80,80,87,89,91,95}.

Most studies approached the problem of gender prediction as a binary classification task, predicting for an account the labels male or female, while three ^72,91,98 added the classification of organization/brand.

We found that approaches to predict gender covered multiple languages, including English ^{31,61,62,71,72,94,96}, German ⁵⁵, Slovenian ⁸⁵, Italian ²⁸, Japanese ⁶⁸, Arabic/Egyptian ^36,37,58, French, Dutch, Portuguese, Spanish and a multilingual study including 28 languages/dialects ⁹¹.

3.2.1.1 Datasets

For the training and validation of the approaches for gender detection, some studies used previously created annotated corpora, while others collected data directly from Twitter. Among the 19 studies that used previously annotated data sets, nine ^{34,36,37,47,49,65,66,75,100} used corpora from the PAN-CLEF author profiling tasks ^102–108, while ten studies ^{51,54,62,64,72,83,84,94,96,101} relied on data sets from other studies ^92,109–115.

For studies that collected their own data, different components of the Twitter accounts were used. These components were used either for the purpose of manually or semi-automatically validating the gender of a user or for the purpose of computing features describing the user to train a classifier (SI Table S5). Despite data limitations from the Twitter API, it was the main source of data collection, with 18 studies ^{28,30,31,33,48,50,51,55,58,62,68,71,85–87,95,96,100} collecting data either as a random sample from the Twitter Streaming API or based on keywords or geographic location from the Twitter Search API. One study ⁶¹ collected data using a scraping tool, three ^59,91,92 used a random sample from a collection of 10% of tweets from 2014-2017 or the Twitter archive, and one did not specify its data source ⁴⁴.

The studies that created a labeled dataset (SI Table S6) to train and test, or to validate the performance of the system, determined the gender of the users using multiple components of their Twitter accounts (SI Table S5). Eleven studies labeled the data through manual annotation where the annotators determined the gender using profile pictures ^31,33, user names ⁵⁰, profiles ⁶⁸, or a combination of these ^{55,61,71,85,87,89,95}. Eleven studies automatically, or semi-automatically, labeled their data sets through the detection of self-reports or gender-identifying terms (e.g., mother, son, uncle, etc.) ^{48,59,87,89,91,96}, the user’s name ^28,86,92 or declarations on other linked social media ^95,96. Three studies created their labeled datasets by using the accounts of famous social media influencers ⁵⁷ or other unspecified collections of users whose gender is known ^28,43. Of the 24 studies, only 8 reported data availability, most of them ‘by request’; only 2 have working links to the whole corpus (SI Table S6).

3.2.1.2 Non-personal accounts

A Twitter account may not be authored by, or represent, a single person. There are organization or company accounts, as well as bot accounts. A bot is an automatic, or semi-automatic, user account. Some bot accounts identify themselves as such and may be used to automatically amplify news or tweets related to certain topics. Others may emulate human accounts and may be used with more malicious intent to sow discord, manipulate public opinion or spread misinformation. Nine of the included studies ^{28,55,71,72,75,82,83,85,91} removed non-personal organization accounts when they manually annotated their collections. Other studies implemented heuristics to explicitly detect and remove non-personal accounts ^{28,29,38,50,60,86,92,101}, bots ⁷⁷ or both ^58,116. Others used previously annotated datasets that either consisted of only personal accounts or labeled non-personal accounts, which were then removed, or collected their datasets based on self-reports. The remaining studies provided no details of how, or if, these accounts were removed (SI Table S5).

3.2.1.3 Features and Models

The data labeled with users’ gender was used in the reviewed studies to build and evaluate classification models based on features describing the text in tweets (e.g., n-grams, word embeddings, hashtags, URLs) ^{36,37,44,47–50,54,58,61,65,66,71,75,83,88,92,100}, the users’ profile metadata (e.g., user names, bio, followers, users followed) ^{28,30,31,51,59,64,91,94,101}, a combination of their profile metadata and tweets ^{31,33,55,62,72,86,87,89,96}, or images ^{31,59,87,91,95}. One study from Japan included the user’s geographic information under the assumption that, culturally, a person of a certain demographic is more likely to frequent specific places ⁶⁸.

Among the systems using hand-crafted features (n = 25, 56.8%), most achieved their best results using support vector machines (SVM) ^{28,33,44,51,61,64,65,83–85,92,95,117}, while others utilized logistic regression ^66,86,89, Naïve Bayes ^30,71, random forests ⁵⁹, bag of trees ⁴⁹, XGBoost ⁶⁸, or an ensemble of three supervised approaches ⁵⁸ (Table 1). Others used deep learning methods (n = 15, 34.1%) such as deep neural networks (DNN), convolutional neural networks (CNNs), feed forward neural networks (FFNN) or recurrent neural networks (RNN) ^{34,47,50,54,72,94,100}, BiLSTM ³⁷, gated recurrent units (GRU) ³⁶, graph recursive neural networks (GRNN) ⁶² and multi-modal deep learning networks ^87,91. One study created a meta-classifier ensemble classifying users based on the predictions of multiple individual classifiers ⁹⁶, including SVM, BERT and two existing models ^91,118. Another created a deep neural network for learning with label proportion (LLP), a semi-supervised approach ³¹. Results for the best-performing deep learning model as reported in each study are included in Table 2. Studies that employed lexical matching (n=4, 9.1%) of the user’s name to a curated names dictionary ^29,60,80,81 to determine gender reported no validation or performance metrics.

Table 1: Top reported system performance for studies inferring gender of Twitter users using traditional ML methods.

Result metrics are reflected here as reported in the original publications and are not comparable to each other.

Table 2: Top reported system performance for studies inferring gender of Twitter users using deep learning ML methods.

Result metrics are reflected here as reported in the original publications and are not comparable to each other.

3.2.1.4 Performance

Performance results from traditional machine learning and deep learning methods cannot be meaningfully compared, as they are evaluated against corpora that differ in language and size and are reported with non-standardized metrics. However, looking at the overall results in terms of F1-score, the studies using deep learning reported a narrower and higher range of performance (0.84 to 0.96) than the studies using traditional ML methods (0.64 to 0.93).
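As an illustration of why non-standardized reporting metrics hinder comparison, the sketch below computes accuracy and F1-score from the same confusion counts. The counts are hypothetical and are not taken from any included study; the point is only that the two metrics can diverge substantially on an imbalanced sample.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical confusion counts for 1,000 users, with "female" as the
# positive class (300 female users, 700 male users).
tp, fp, fn, tn = 200, 50, 100, 650

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

With these assumed counts, accuracy is 0.85 while the F1-score for the minority class is about 0.73, so a study reporting only accuracy and a study reporting only F1 could describe the same system quite differently.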

3.2.2 Age

Among the studies that developed ad hoc methods, 19 sought to predict the Twitter user’s age, of which 7 predicted only age ^{32,43,45,52,69,73,74}. All but one of the studies ⁵⁹ approached the detection of Twitter users’ age as automatic classification of predefined age groups. The number of age groups varies across the studies (Table 3), with the ages categorized into two ^{32,52,62,89,95}, three ^{30,45,69,73,74,80,87}, four ^49,91 or more ^34,43,44,66 groups. The range of ages within the groups also varies across the studies; for example, across the five studies that take a binary classification approach, Guimaraes et al. ⁵² used 13-19 and 20+ as the two age groups, Volkova et al. ⁸⁹ and Kim et al. ⁶² used 18-23 and 25+, Xiang et al. ⁹⁵ used 30 or below and above 30, while Ardehaly and Culotta ³² used less than 25 and 25+. Except for two studies that did not report the language of the tweets used ^30,52, all studies used English language tweets. Eight of the studies extended their systems to include additional languages including Spanish ^34,43,66,89, Dutch ^66,73,74, Filipino ⁴⁴, and multiple languages ⁹¹.

Table 3: Top reported system performance for studies inferring age of Twitter users using traditional ML methods.

Result metrics are reflected here as reported in the original publications and are not comparable to each other. Reviews are ordered by number of classification groups.
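The effect of differing age groupings can be made concrete with a small sketch that applies the four binary schemes named in the text to the same age. The function and dictionary names are hypothetical, and returning no label for an age that falls outside both groups is an assumption about how such gaps would be handled, not something reported by the studies.

```python
def scheme_13_19_vs_20_plus(age):
    """Groups 13-19 and 20+; ages under 13 are not covered."""
    if 13 <= age <= 19:
        return "13-19"
    if age >= 20:
        return "20+"
    return None


def scheme_18_23_vs_25_plus(age):
    """Groups 18-23 and 25+; age 24 and ages under 18 are not covered."""
    if 18 <= age <= 23:
        return "18-23"
    if age >= 25:
        return "25+"
    return None


def scheme_30_split(age):
    """Groups 30 or below and above 30."""
    return "<=30" if age <= 30 else ">30"


def scheme_25_split(age):
    """Groups less than 25 and 25+."""
    return "<25" if age < 25 else "25+"


SCHEMES = {
    "13-19 vs 20+": scheme_13_19_vs_20_plus,
    "18-23 vs 25+": scheme_18_23_vs_25_plus,
    "<=30 vs >30": scheme_30_split,
    "<25 vs 25+": scheme_25_split,
}


def label_all(age):
    """Label one age under every scheme to show how the groupings diverge."""
    return {name: scheme(age) for name, scheme in SCHEMES.items()}


for age in (17, 24, 31):
    print(age, label_all(age))
```

A 24-year-old user, for example, falls in the older group under one scheme, in the younger group under two others, and in neither group under the 18-23 versus 25+ scheme, which is one reason results from these binary classifiers cannot be pooled directly.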

3.2.2.1 Datasets

While most of the studies that developed new algorithms prepared new data sets to evaluate the algorithms with data retrieved directly using Twitter’s API ^{30,32,45,52,69,87} or using other sources or methods ^43,59,91 (SI Table S4), several used data sets made available from others’ studies to train and/or evaluate their algorithms. Two ^73,74 combined data sets from ^19,21,69; Kim et al. ⁶² used the dataset from Volkova et al. ¹¹⁹, while three ^34,49,66 used data sets that were created for the PAN-CLEF author profiling shared tasks ^103–105. The studies that prepared new data sets (SI Table S6) labeled users’ age groups by (semi-)automatically searching for (1) tweets that self-report birthday announcements or age ^{32,59,69,87,89,91}, (2) tweets in which a user is wished a happy birthday ⁶⁹, (3) profiles that self-report age ^43,45,87,91, (4) profiles that mention age-related keywords (e.g., grandparent) ^45,91, (5) manual annotation based on images or profile metadata ^91,95,119, or (6) by subjectively perceiving age groups based on the content of individual tweets ⁵². In one study ³⁰, a mixture of self-reported data and individuals with known demographic information was used to label the data. Similar to the studies on gender, the reported availability of the corpora is scarce. Only 5 studies reported that their datasets are available: 2 by request, 1 provided a link to the whole dataset and 2 provided a link to a sample of the corpus (SI Table S6).

3.2.2.2 Features and Models

The studies used the labeled age groups to evaluate classification models based on features of the users’ profile metadata (e.g., user names, bio, followers, users followed) ^{30,32,43,59,91}, a combination of their profile metadata and tweets (e.g., n-grams, word embeddings, hashtags, URLs) ^{52,62,69,73,74,87,89}, tweet texts only ^44,45,49,66 or images ^59,87,91,95.

For automatic classification, the majority of studies (12/19, 63.2%) used traditional supervised machine learning methods including logistic regression ^{30,45,66,69,89}, Bayesian probabilistic inference ⁴³, random forests ⁵⁹, bag of trees ⁴⁹, support vector machines (SVM) ^44,95 or a semi-supervised approach, learning from label proportion (LLP) ³². Others used deep learning methods (7/19, 36.8%) such as convolutional neural networks (CNNs) ^34,52,73,74, graph recursive neural networks (GRNN) ⁶² and multi-modal deep learning networks ^87,91. Results of the best-performing systems for each study are reported in Tables 3 and 4. One study ⁸⁰ classified age based on a previously developed age lexicon and did not report any performance metrics.

Table 4: Top reported system performance for studies inferring age of Twitter users using deep learning ML methods.

Result metrics are reflected here as reported in the original publications and are not comparable to each other. Reviews are ordered by number of classification groups.

3.2.2.3 Performance

Given the variation in classification (e.g., different age groupings, different number of classification categories) and reported performance metrics, it is difficult to assess performance differences between studies using traditional machine learning and those using deep learning or neural networks. However, binary and ternary classification show slightly better reported performance than classification into four or more groups.

3.3 Studies using previously developed methods

Within our included studies, there were 23 for which the detection of gender or age was secondary to their research and previously developed methods were used to detect the demographic information for their cohort. Of the 23, 13 used open-source models, and ten used off-the-shelf software.

3.3.1 Open-Source Models

Three of the studies ^53,78,79 drew upon an extant model ¹²⁰ that employs a predictive lexicon for multi-class classification of age groups or gender for their applications. None of these studies created a validation corpus to assess the performance of the system, which was originally reported as 89.9% accuracy for gender and a Pearson correlation coefficient of 0.84 for age. One study ⁹⁷ utilized the same text-based model ¹²⁰ and an image model ¹²¹ to find the age and gender of their cohort. When tested against their gold standard corpus of self-reports from profile descriptions, they found the imaging model performed best for gender (accuracy = 90 – 92%), while textual features gave the best results for age (accuracy = 60%). Three studies ^57,70,93 used Demographer ^118,122,123 for gender predictions, with one ⁷⁰ evaluating the performance against a set of users who had self-reported their gender in a survey, finding an F1-score of 0.869 for women and 0.770 for men. Two studies ^40,41 used their ensemble classifier of previously developed models, with a reported accuracy of 0.83 and F1-score of 0.83 ¹⁰¹. Two studies ^46,99 used M3 ⁹¹ to detect gender and age, with one validating the performance using a manually labeled dataset and finding 95.9% accuracy and an F1-score of 0.957 for gender, and 77.6% accuracy and an F1-score of 0.731 for age. One ³⁵ used DEX ¹²⁴ for age and gender detection, which reported a validation error of 3.96 years for age and an 88% accuracy for gender. One study ⁷⁷ used the rOpenSci “gender” package; no assessment of the performance was reported.

3.3.2 Off-the-shelf software

Of the 10 studies that used off-the-shelf software, Face++ was the most common, being used by 6 studies ^{42,56,67,76,88,98}. The remaining studies used DemographicsPro ^38,39, Microsoft Face API ⁶³ and RapidMiner ⁸².

In four of the studies ^67,76,82,88 no validation of performance was carried out, and a further two simply reported that DemographicsPro requires 95% confidence ^38,39. Others compared to manual annotation and identified accuracy for age using Face++ at 82.8% ⁵⁶ or 68% for strict age groups or 83% if the age groupings were relaxed ⁴². The performance for age using Microsoft Face API was measured at 0.895 Gwet’s AC ⁶³, when compared to manually labeled datasets.

For gender, studies that measured accuracy against their own gold standard set of users recorded accuracies of 94.4% ⁵⁶ and 88% ⁴² using Face++. Other studies ^67,76,88 cited the confidence level reported by Face++ for gender prediction of 95% ± 0.015.

Only one study ⁹⁸ went beyond manual annotation to create a gold standard and used multiple search techniques to manually verify age and sex, including LinkedIn profiles, electoral roll listings, personal websites, Twitter descriptions, and Twitter profile images. In it, Face++ accuracy for age was reported as 40.4% and sex 44.8% (with valid images age 32.5% and gender 87.7%) and crowdsourcing annotation accuracy for age was 60.8% and gender 86.4% (with valid images for age 56.1% and gender 93.9%).

4. Discussion

In this review, we aimed to provide an overview of recent machine learning methods being used to predict the gender and age of Twitter users. Our review indicates that the identification of gender has received more attention. However, despite the popularity of both tasks, no accepted standards for research (data collection and evaluation) have emerged, resulting in a large number of heterogeneous studies which are difficult to compare. In consequence, it is difficult to conclude where the state-of-the-art stands for these tasks. Data collection practices can severely bias the datasets, thus resulting in correspondingly biased automatic methods. We note below some recommendations to avoid the noted biases, summarized in Figure 3.

Figure 3.

Summary of recommendations for best practice.

Determining demographic information is an important task to address in order to fully realize the potential advantages of using social media data such as that from Twitter in health-related research. In the US, the National Institutes of Health (NIH) has committed to include women participants in clinical studies and to include sex as a biological variable, finding that the disaggregation of data by sex will allow for sex-based comparisons. A recent review ¹²⁵ found this disaggregation in the development of machine learning models led to the discovery of sex-based differences that improved model performance for sex-specific cohorts. Age is also important as age can affect the course and progression of disease ¹²⁵, or the effects of medication ¹²⁶. Given the significance of this information, it is important that accurate and reproducible models be developed. One way to ensure the reproducibility of the models is for researchers to make available all data and code, including annotation guidelines. In addition to model performance, studies that create annotated corpora should report annotator agreement measures in order to assess the quality of the corpus. Few of the included studies made available their data or code (SI Tables S2, S3 and S6).
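One commonly used annotator agreement measure for two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The following is a minimal sketch of that calculation; the labels are hypothetical, and Cohen's kappa is offered only as an example of such a measure, not as the measure used by any particular included study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical gender labels from two annotators for ten accounts.
annotator_1 = ["F", "F", "M", "M", "F", "M", "M", "F", "M", "M"]
annotator_2 = ["F", "M", "M", "M", "F", "M", "F", "F", "M", "M"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa={kappa:.3f}")
```

Here the annotators agree on 8 of 10 accounts (raw agreement 0.80), but because agreement of 0.52 would be expected by chance given their label frequencies, kappa is about 0.58.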

A particular difficulty when comparing different systems comes from a lack of a ‘gold standard’ labeled corpus to compare the systems against. Some studies created their own corpora, collecting data randomly or based on keywords relevant to their studies. Others reused datasets from prior studies or from shared tasks. Although outside the scope of this review, there have been shared tasks which aim to advance research through competition, focusing on gender and age prediction. A longstanding shared task focused on author profiling has been hosted at the PAN workshop at the Conference and Labs of the Evaluation Forum (CLEF) ^102–108. More recently, Social Media Mining for Health (SMM4H) 2022 included 2 tasks for age detection, releasing new annotated corpora for the tasks ¹²⁷. Several researchers reported utilizing the corpora from these shared tasks. Testing and reporting performance metrics against these publicly available data sets, without alteration, would provide a comparable metric of different approaches. However, while reusing annotated corpora provides quick access to labeled data, it does have some limitations, including data loss over time as users delete their Tweets, which not only reduces the size of the data but also can result in a data imbalance of the corpus.

4.1 Gender Prediction

Almost all included studies approached the gender prediction task as a binary classification task, identifying a user as either male or female. We note that, even when focusing on binary gender classification, which is the prevalent approach, the task of gender prediction on Twitter could be better characterized as a multinomial classification task: given a user account, the classifier should return male, female, or “non-personal”. The last label (non-personal) could account for Twitter users representing organizations or bots. While some studies attempted to identify and exclude non-personal accounts as a preprocessing step, others developed their systems using previously annotated datasets which were exclusively labeled as male or female users or removed non-personal accounts during annotation before training and testing. It is unknown how well these systems would perform when extended to unseen data which may contain non-personal accounts.

Excluding non-personal accounts, the ratio of males to females in the training dataset is also important, as it should mimic the natural distribution of Twitter users, estimated to be 31.5% females and 68.5% males as of January 2021. However, some authors biased their collections by using unconventional methods of collection or using datasets that were artificially balanced. The most conventional method to collect a set of Twitter accounts is to query from the Twitter API any tweet mentioning functional words without semantic meaning such as “of”, “the”, or “and”. Whereas collecting Twitter users using functional or neutral keywords, a given language, or geographic areas resulted in a male/female ratio close to the ratio naturally observed on Twitter, other choices resulted in collections with different ratios. Such changes in ratio could have improved (or degraded) the training of the authors’ classifiers and biased their evaluations, which would then not reflect the performance of their approach on a random sample of users taken from Twitter.
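The effect of an artificially balanced evaluation set can be sketched with simple arithmetic: overall accuracy is the class-prior-weighted average of per-class recall, so the same classifier yields different accuracy under different male/female ratios. The per-class recall values below are hypothetical; the 31.5%/68.5% split is the estimate quoted above.

```python
# Estimated natural distribution of Twitter users quoted in the text.
NATURAL_PRIOR = {"female": 0.315, "male": 0.685}
# An artificially balanced evaluation set.
BALANCED_PRIOR = {"female": 0.5, "male": 0.5}


def expected_accuracy(recall_by_class, prior):
    """Accuracy as the prior-weighted average of per-class recall."""
    total = sum(prior.values())
    return sum(prior[label] / total * recall_by_class[label] for label in prior)


# Hypothetical per-class recall of a gender classifier.
recall_by_class = {"female": 0.90, "male": 0.80}

balanced_accuracy = expected_accuracy(recall_by_class, BALANCED_PRIOR)
natural_accuracy = expected_accuracy(recall_by_class, NATURAL_PRIOR)

print(f"balanced set: {balanced_accuracy:.4f}")
print(f"natural ratio: {natural_accuracy:.4f}")
```

Under these assumed recall values, the classifier scores 0.85 on the balanced set but about 0.83 at the natural ratio; the gap grows as the per-class recalls diverge, and it would reverse if the classifier were stronger on the majority class.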

All studies treated gender as a binary determination of male or female. While some referenced the limitation of this approach, they opted to use these designations given the need to align their data with outside resources such as US census or social security administration data. We note that gender, unlike biological sex, is a social construct that is not necessarily binary, and it has been shown to influence a person’s use of healthcare, interactions, therapeutics response, disease perceptions and decision making ¹²⁸. This underscores the importance of expanding the efforts of classification beyond binary to improve accuracy and avoid misinterpreting results.

4.2 Age Prediction

The age prediction task generally had lower performance than gender prediction. This was true for studies that developed their own models as well as those using open source or off-the-shelf software. This may be because most approached it as a multiclass classification task. The proxies used, such as language, names, networks or images, may have limited predictive value for age. Additionally, the demographics of Twitter users mean that any data set will be inherently imbalanced, providing few training examples for age groups in the tail ends of the distribution. This data imbalance may lead to too few instances of the minority classes to effectively train the classifier. For models that classified based on images, this poor performance for age may be unsurprising given that it can be difficult for humans to discern age from a single image. In addition, photos may be subject to photo editing or enhancing, or not be a recent photograph of the user. Due to a lack of error analysis reports in the included studies, it is difficult to determine the source of the classification difficulty for age.

Performance aside, the fact that the number and range of the age groups vary across the studies suggests that a classification approach is not generalizable to all research applications. Identifying exact age, rather than age groups, can generalize to applications that do not align with the predefined groupings of binary or multi-class models; however, using high-precision rules to extract self-reports of exact age from users’ profile metadata has been shown not to scale. As we worked on this study, we noted that none of the systems reviewed opted for extracting exact age. To test the feasibility and utility of a generalizable system that extracts exact age from tweets in a user’s timeline using deep learning methods, we developed a classification and extraction pipeline, REPORTage, using the RoBERTa-Large model and a rule-based extraction model¹²⁹. The system was trained and tested on 11,000 annotated tweets. The classification of tweets mentioning an age achieved an F1-score of 0.93. The extraction of age from these tweets achieved an F1-score of 0.86. From a collection of 245,947 users, an age was extracted for 54% using REPORTage. A shared task for the classification task was run at the SMM4H 2022 workshop, in which we released the annotated dataset. We did not include our approach in the scoping review as there were no comparable systems published before the release of the approach as part of the SMM4H 2022 shared task.
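
As an illustration of what rule-based extraction of a self-reported exact age can look like, the sketch below applies two regular expressions to tweet text. It is not the rule set used in our pipeline: the patterns, the function name `extract_age`, and the plausibility bounds are assumptions chosen for the example, and a practical system would need many more patterns plus a classifier to reject false positives.

```python
import re
from typing import Optional

# Hypothetical high-precision patterns for first-person age statements.
AGE_PATTERNS = [
    # "I'm 25 years old", "I am 30 yo", "im 19 y/o"
    re.compile(r"\bi\s*(?:am|'m|’m|m)\s+(\d{1,3})\s*(?:years?\s+old|yrs?\s+old|y/?o)\b", re.IGNORECASE),
    # "I turned 41", "I just turned 41"
    re.compile(r"\bi\s+(?:just\s+)?turned\s+(\d{1,3})\b", re.IGNORECASE),
]


def extract_age(tweet: str, min_age: int = 13, max_age: int = 110) -> Optional[int]:
    """Return the first plausible self-reported age in `tweet`, or None.

    `min_age` and `max_age` are assumed plausibility bounds, not values
    taken from the reviewed studies.
    """
    for pattern in AGE_PATTERNS:
        match = pattern.search(tweet)
        if match:
            age = int(match.group(1))
            if min_age <= age <= max_age:
                return age
    return None


examples = [
    "I'm 25 years old and still love cartoons",
    "I just turned 41!",
    "My son is 5 years old today",
    "i'm 100% sure about this",
]
for text in examples:
    print(repr(text), "->", extract_age(text))
```

The third and fourth examples show why first-person anchoring and a unit such as "years old" matter: ages of other people and unrelated numbers should not be attributed to the user.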

4.3 Potential bias of differing methods

Using names to distinguish between genders has limitations that may introduce bias, particularly if the names used for training do not represent the ethnic diversity of the population; some cultures also have more unisex names than others, and these cannot be used to distinguish gender. Uncertainty can be high, as for many users gender cannot be classified by name: Sloan 2013¹³⁰ estimated that 52% will be unclassified using this method. However, studies have suggested that those who are classified may be classified relatively accurately, given that data from UK Twitter demonstrate a high level of agreement with UK census data ¹³¹. Furthermore, used alone, this heuristic removes most organization accounts but may still label some, such as PAUL_BAKERY, as a person.
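
The behaviour described above can be sketched with a simple name lookup. The name counts, the 0.95 confidence threshold, and the function name `gender_from_name` are all hypothetical; in practice the counts would come from a resource such as census or social security name lists.

```python
from typing import Dict, Tuple

# Hypothetical (female, male) counts per first name.
NAME_COUNTS: Dict[str, Tuple[int, int]] = {
    "paul": (20, 9980),
    "emma": (9950, 50),
    "alex": (4000, 6000),  # unisex: no confident decision possible
}


def gender_from_name(
    display_name: str,
    name_counts: Dict[str, Tuple[int, int]],
    min_confidence: float = 0.95,
) -> str:
    """Label a user 'female', 'male', or 'unclassified' from the first name token."""
    tokens = display_name.replace("_", " ").split()
    if not tokens:
        return "unclassified"
    counts = name_counts.get(tokens[0].lower())
    if counts is None:
        return "unclassified"  # name not covered by the resource
    female, male = counts
    total = female + male
    if total == 0:
        return "unclassified"
    female_share = female / total
    if female_share >= min_confidence:
        return "female"
    if 1.0 - female_share >= min_confidence:
        return "male"
    return "unclassified"  # unisex name


for name in ["Emma Jones", "Alex Kim", "xX_gamer_Xx", "PAUL_BAKERY"]:
    print(name, "->", gender_from_name(name, NAME_COUNTS))
```

Note that `PAUL_BAKERY` is labelled as a male person here, which reproduces the organization-account error mentioned above, while unisex and uncovered names fall into the unclassified group.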

Relying on self-declarations may be prone to bias as well. For example, younger people are more likely to state their age than older adults, as age may be more important to them, and gender pronouns may be more likely to be declared by those in particular occupations or of a particular social class. Self-declared data may also be biased by culture, background, social class, or country of origin or residence.

Using users’ profile images is challenging for gender and age identification. Not all Twitter users provide a picture of themselves, with many opting for pictures of their pets, objects, children, scenery or even celebrities. Even pictures of the users themselves can be problematic if the quality is poor, the picture contains more than one face, or, particularly for inferring age, the picture is not recent. A comparison of systems using images to infer demographics ¹³² measured not only the accuracy of identifying age and gender but also the share of images in which a face was detected, finding that only around 30% of Twitter users had a single detectable face.

Methods used in the studies to filter out organizations included removing accounts with a large number of followers ⁵⁰ or searching explicitly for organizations by looking up user name terms linked to economic activities, such as restaurant or hotel ²⁸. These methods remove accounts that do not represent a single user; however, they do not remove bots. While one study created a classifier to detect bots, in many of the studies bot filtering was limited to bots identified during manual annotation or by simple heuristics, or was absent altogether (SI Table S5).
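
A minimal sketch of the two organization heuristics is given below. The follower threshold, the term list beyond "restaurant" and "hotel", and the `Account` structure are assumptions made for illustration and do not correspond to the settings of any particular study.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet

# "restaurant" and "hotel" come from the text above; "bakery" is a hypothetical addition.
ORG_TERMS: FrozenSet[str] = frozenset({"restaurant", "hotel", "bakery"})
MAX_FOLLOWERS = 10_000  # hypothetical cut-off for "large number of followers"


@dataclass
class Account:
    screen_name: str
    display_name: str
    followers: int


def looks_like_organization(
    account: Account,
    max_followers: int = MAX_FOLLOWERS,
    org_terms: FrozenSet[str] = ORG_TERMS,
) -> bool:
    """Flag accounts with very many followers or an economic-activity term in the name.

    This only targets organizations; it makes no attempt to detect bots.
    """
    if account.followers > max_followers:
        return True
    text = f"{account.screen_name} {account.display_name}".lower()
    # Split on anything that is not an ASCII letter (a simplification).
    words = re.split(r"[^a-z]+", text)
    return any(word in org_terms for word in words)


accounts = [
    Account("PAUL_BAKERY", "Paul's Bakery", 120),
    Account("bignews", "Big News", 2_000_000),
    Account("paul_b", "Paul B", 120),
]
personal_accounts = [a for a in accounts if not looks_like_organization(a)]
print([a.screen_name for a in personal_accounts])
```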

4.4 Validation of age and gender proxies

Where age or gender is estimated, validation exercises are needed in which the estimates are compared to a ‘gold standard data set’ to establish accuracy levels. For example, one study ⁹⁸ that used off-the-shelf software also created a manually annotated gold standard dataset for measuring accuracy. This study found that although the accuracy of crowdsourcing was higher than that of the software, it was only around 60% for age. This calls into question the use of manual annotations alone as a gold standard.

The most reliable way of generating a ‘gold standard’ is to obtain this information directly from the user. This may be in the form of direct correspondence with the user, such as messaging via social media, or the other way around: requesting Twitter handles in surveys that collect demographic data. Other methods for validation, such as manual extraction, may be less rigorous. However, these can be improved by using multiple independent annotators and experienced teams.
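
When several independent annotators are used, their agreement can be quantified before the labels are treated as a gold standard. The sketch below computes Cohen's kappa for two annotators in plain Python; the labels are invented for the example, and Cohen's kappa is our choice of agreement measure rather than one prescribed by the reviewed studies.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical gender annotations of eight accounts by two annotators.
annotator_1 = ["f", "f", "m", "m", "f", "m", "m", "m"]
annotator_2 = ["f", "m", "m", "m", "f", "m", "f", "m"]

print(f"kappa: {cohen_kappa(annotator_1, annotator_2):.3f}")
```

In this invented example the annotators agree on six of eight accounts, yet kappa is well below that raw agreement because much of it would be expected by chance.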

External validation of the model is also a vital step to assess how the model will perform on unseen data. In a validation on a second dataset, Yang et al. ⁹⁶ found that performance dropped on all but two of their models, stressing the importance of benchmarking existing systems on a targeted corpus. This step is equally important when using existing systems, so that a range of expected performance can be reported and used in any analysis of the output.

4.5 Limitations

It is unlikely that we have identified all studies using off-the-shelf software, as we did not search for specific named software; however, part of our remit was to identify the array of software used. We did not limit our inclusion to studies that developed their own software and have therefore included studies that used proprietary software. These software products do not publish their methodologies; therefore, we are unable to directly compare these approaches to others.

We also included studies for which the extraction of age and gender was needed for the primary focus of their study. These studies either used proprietary software, previously developed methods or developed limited methods to infer the demographic information. In general, these studies did not report the performance of their inference methods on their datasets. While some reported the original performance metrics of the methods used, the assumption cannot be made that these methods will perform similarly across all data.

5. Conclusions

The extraction of demographic data such as age and gender is an important step in increasing the value and application of social media data. Many methods are reported in the literature, with differing degrees of success. While we sought to explore whether deep learning approaches would advance the performance for these tasks, as they have been shown to do for other NLP tasks, many of the included studies utilized traditional machine learning methods. Though only explored by a handful of studies, deep learning methods appear to perform well for the prediction of a user’s gender or age. However, direct comparison of methods is impossible at this point. This highlights the need for recently developed, publicly available gold-standard corpora, such as those released for shared tasks like SMM4H or PAN@CLEF, to provide unbiased data and baseline metrics against which different approaches can be compared.

Data Availability

The search strategy and extracted data on included studies are available in the Supplementary Data.

Author Contributions

SG, KO and GG devised the study and identified data for extraction. SG created and executed the search strategy and created the initial draft of the manuscript. SG and KO were responsible for study selection. All authors were responsible for data extraction, summarization and discussion. KO synthesized all data and created all tables. All authors commented on and edited the manuscript. KO provided the final version of this manuscript.

All authors contributed to the final draft of the manuscript.

Competing interests

The author(s) declare no competing interests.

Funding

This work was supported by the National Institutes of Health (NIH) National Library of Medicine under Grant number NIH NLM R01LM011176. The NIH National Library of Medicine funded this research but was not involved in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

References

1.
FDA. Real-World Evidence. FDA https://www.fda.gov/science-research/science-and-research-special-topics/real-world-evidence (2020).
2.
Alessa, A. & Faezipour, M. A review of influenza detection and prediction through social networking sites. Theor. Biol. Med. Model. 15, (2018).
3.
Bisanzio, D. et al. Use of Twitter social media activity as a proxy for human mobility to predict the spatiotemporal spread of COVID-19 at global scale. Geospatial Health 15, 15 (2020).
4.
Magge, A. et al. DeepADEMiner: A Deep Learning Pharmacovigilance Pipeline for Extraction and Normalization of Adverse Drug Effect Mentions on Twitter. medRxiv 2020.12.15.20248229 (2020) doi:10.1101/2020.12.15.20248229.
5.
Nikfarjam, A., Sarker, A., O’Connor, K., Ginn, R. & Gonzalez, G. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. J. Am. Med. Inform. Assoc. JAMIA 22, 671–681 (2015).
6.
Guntuku, S. C. et al. Tracking Mental Health and Symptom Mentions on Twitter During COVID-19. J. Gen. Intern. Med. 35, 2798–2800 (2020).
7.
Ma, L. & Wang, Y. Constructing a Semantic Graph with Depression Symptoms Extraction from Twitter. in 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) 1–5 (2019). doi:10.1109/CIBCB.2019.8791452.
8.
Bauermeister, J. A. et al. Innovative recruitment using online networks: Lessons learned from an online study of alcohol and other drug use utilizing a web-based, Respondent-Driven Sampling (webRDS) strategy. J. Stud. Alcohol Drugs 73, 834–838 (2012).
9.
Weissenbacher, D. et al. Automatic Cohort Determination from Twitter for HIV Prevention amongst Ethnic Minorities. SocArXiv (2021) doi:10.31235/osf.io/qx7s2.
10.
Twitter. Twitter API. Twitter API https://developer.twitter.com/en/docs/twitter-api (2021).
11.
Brady, E., Nielsen, M. W., Andersen, J. P. & Oertelt-Prigione, S. Lack of consideration of sex and gender in COVID-19 clinical studies. Nat. Commun. 12, 4015 (2021).
12.
Tannenbaum, C., Ellis, R. P., Eyssel, F., Zou, J. & Schiebinger, L. Sex and gender analysis improves science and engineering. Nature 575, 137–146 (2019).
13.
Mislove, A., Lehmann, S., Ahn, Y.-Y., Onnela, J.-P. & Rosenquist, J. N. Understanding the Demographics of Twitter Users. in yongyeol.com vol. 5(1) 554–557 (AAAI, 2011).
14.
Bour, C. et al. The Use of Social Media for Health Research Purposes: Scoping Review. J. Med. Internet Res. 23, e25736 (2021).
15.
Fink, C., Kopecky, J. & Morawski, M. Inferring Gender from the Content of Tweets: A Region Specific Example. in Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media 4 (2012).
16.
Alowibdi, J. S., Buy, U. A. & Yu, P. Language Independent Gender Classification on Twitter. (Ieee, 2013).
17.
Chen, L., Qian, T. Y., Zhu, P. S. & You, Z. N. Learning User Embedding Representation for Gender Prediction. in 2016 Ieee 28th International Conference on Tools with Artificial Intelligence (eds. Bourbakis, N., Esposito, A., Mali, A. & Alamaniotis, M.) 263–269 (Ieee, 2016). doi:10.1109/ictai.2016.45.
18.
Culotta, A., Ravi, N. K. & Cutler, J. Predicting the Demographics of Twitter Users from Website Traffic Data. (Assoc Advancement Artificial Intelligence, 2015).
19.
Sloan, L., Morgan, J., Burnap, P. & Williams, M. Who tweets? Deriving the demographic characteristics of age, occupation and social class from twitter user meta-data. PLoS ONE Electron. Resour. 10, e0115545 (2015).
20.
Oktay, H., Fırat, A. & Ertem, Z. Demographic Breakdown of Twitter Users: An analysis based on names. in pdfs.semanticscholar.org (2014).
21.
Nguyen, D., Gravel, R., Trieschnigg, D. & Meder, T. ‘How Old Do You Think I Am?’: A Study of Language and Age in Twitter. Seventh International AAAI Conference on Weblogs and Social Media http://www.aaai.org (2013).
22.
Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. ArXiv Prepr. ArXiv13013781 (2013).
23.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed Representations of Words and Phrases and their Compositionality. ArXiv13104546 Cs Stat (2013).
24.
Tricco, A. C. et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 169, 467–473 (2018).
25.
Hinds, J. & Joinson, A. N. What demographic attributes do our digital footprints reveal? A systematic review. PLoS ONE 13, e0207112–e0207112 (2018).
26.
Umar, A., Bashir, S. A., Abdullahi, M. B. & Adebayo, O. S. Comparative Study of Various Machine Learning Algorithms for Tweet Classification. (2019).
27.
Amir-Behghadami, M. & Janati, A. Population, Intervention, Comparison, Outcomes and Study (PICOS) design as a framework to formulate eligibility criteria in systematic reviews. Emerg. Med. J. 37, 387–387 (2020).
28.
Alessandra, R., Gentile, M. M. & Bianco, D. M. Who Tweets in Italian? Demographic Characteristics of Twitter Users. in New Statistical Developments in Data Science (eds. Petrucci, A., Racioppi, F. & Verde, R.) vol. 288 329–344 (Springer International Publishing, 2019).
29.
Alfayez, A. et al. Understanding Gendered Spaces Using Social Media Data. in Social Computing and Social Media. Applications and Analytics (ed. Meiselwitz, G.) vol. 10283 338–356 (Springer International Publishing, 2017).
30.
Arafat, T. A., Budi, I., Mahendra, R. & Salehah, D. A. Demographic Analysis of Candidates Supporter in Twitter During Indonesian Presidential Election 2019. in 2020 International Conference on ICT for Smart Society (ICISS) 1–6 (2020). doi:10.1109/ICISS50791.2020.9307598.
31.
Ardehaly, E. M. & Culotta, A. Co-training for Demographic Classification Using Deep Learning from Label Proportions. in 2017 17th Ieee International Conference on Data Mining Workshops (eds. Gottumukkala, R., et al.) 1017–1024 (Ieee, 2017). doi:10.1109/icdmw.2017.144.
32.
Ardehaly, E. M. & Culotta, A. Learning from noisy label proportions for classifying online social data. Soc. Netw. Anal. Min. 8, (2018).
33.
Baxevanakis, S., Gavras, S., Mouratidis, D. & Kermanidis, K. A machine learning approach for gender identification of Greek tweet authors. in Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments (2020).
34.
Bayot, R. K. & Goncalves, T. Age and Gender Classification of Tweets Using Convolutional Neural Networks. in Machine Learning, Optimization, and Big Data, Mod 2017 (eds. Nicosia, G., Pardalos, P., Giuffrida, G. & Umeton, R.) vol. 10710 337–348 (Springer International Publishing Ag, 2018).
35.
Brandt, J. et al. Identifying social media user demographics and topic diversity with computational social science: a case study of a major international policy forum. J. Comput. Soc. Sci. 3, 167–188 (2020).
36.
Bsir, B. & Zrigui, M. Enhancing deep learning gender identification with gated recurrent units architecture in social text. Comput. Sist. 22, 757–766 (2018).
37.
Bsir, B. & Zrigui, M. Document Model with Attention Bidirectional Recurrent Network for Gender Identification. in Advances in Computational Intelligence, Iwann 2019, Pt I (eds. Rojas, I., Joya, G. & Catala, A.) vol. 11506 621–631 (Springer International Publishing Ag, 2019).
38.
Cavazos-Rehg, P. A., Zewdie, K., Krauss, M. J. & Sowles, S. J. ‘No High Like a Brownie High’: A Content Analysis of Edible Marijuana Tweets. Am. J. Health Promot. AJHP 32, 880–886 (2018).
39.
Cavazos-Rehg, P. A. et al. ‘I just want to be skinny.’: A content analysis of tweets expressing eating disorder symptoms. PloS One 14, e0207506 (2019).
40.
Cesare, N., Dwivedi, P., Nguyen, Q. C. & Nsoesie, E. O. Use of social media, search queries, and demographic data to assess obesity prevalence in the United States. Palgrave Commun. 5, 1–9 (2019).
41.
Cesare, N., Nguyen, Q. C., Grant, C. & Nsoesie, E. O. Social media captures demographic and regional physical activity. BMJ Open Sport Exerc. Med. 5, e000567 (2019).
42.
Chakraborty, A. et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations. 10.
43.
Chamberlain, B. P., Humby, C. & Deisenroth, M. P. Probabilistic Inference of Twitter Users’ Age Based on What They Follow. in Machine Learning and Knowledge Discovery in Databases, Ecml Pkdd 2017, Pt Iii (eds. Altun, Y., et al.) vol. 10536 191–203 (Springer International Publishing Ag, 2017).
44.
Cheng, J., Fernandez, A., Quindoza, R., Tan, S. & Cheng, C. A Model for Age and Gender Profiling of Social Media Accounts Based on Post Contents. springerprofessional.de (2018).
45.
Cornelisse, J. & Pillai, R. G. Age Inference on Twitter using SAGE and TF-IGM. in Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval 24–30 (Association for Computing Machinery, 2020). doi:10.1145/3443279.3443300.
46.
Duong, V., Luo, J., Pham, P., Yang, T. & Wang, Y. The Ivory Tower Lost: How College Students Respond Differently than the General Public to the COVID-19 Pandemic. in 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 126–130 (2020). doi:10.1109/ASONAM49781.2020.9381379.
47.
ElSayed, S. & Farouk, M. Gender identification for Egyptian Arabic dialect in twitter using deep learning models. Egypt. Inform. J. 21, 159–167 (2020).
48.
Emmery, C., Chrupała, G. & Daelemans, W. Simple Queries as Distant Labels for Predicting Gender on Twitter. 50–55 https://github.com/facebookresearch/ (2017).
49.
Garcia-Guzman, R. et al. Trend-Based Categories Recommendations and Age-Gender Prediction for Pinterest and Twitter Users. Appl. Sci. 10, 5957 (2020).
50.
Geng, L., Zhang, K., Wei, X. Z. & Feng, X. Soft Biometrics in Online Social Networks: A Case Study on Twitter User Gender Recognition. in 2017 Ieee Winter Conference on Applications of Computer Vision Workshops 1–8 (Ieee, 2017). doi:10.1109/wacvw.2017.8.
51.
Giannakopoulos, O., Kalatzis, N., Roussaki, I. & Papavassiliou, S. Gender recognition based on social networks for multimedia production. (Ieee, 2018).
52.
Guimaraes, R. G., Rosa, R. L., De Gaetano, D., Rodriguez, D. Z. & Bressan, G. Age Groups Classification in Social Network Using Deep Learning. IEEE Access 5, 10805–10816 (2017).
53.
Hasanuzzaman, M., Dias, G. & Way, A. Demographic Word Embeddings for Racism Detection on Twitter. in Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 926–936 (Asian Federation of Natural Language Processing, 2017).
54.
Hashempour, R. A Deep Learning Approach to Language-independent Gender Prediction on Twitter. in Proceedings of the 2019 Workshop on Widening NLP 92–94 (2019).
55.
Hirt, R., Kuhl, N. & Satzger, G. Cognitive computing for customer profiling: meta classification for gender prediction. Electron. Mark. 29, 93–106 (2019).
56.
Huang, X., Xing, L., Dernoncourt, F. & Paul, M. J. Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition. 11–16 (2020).
57.
Huang, X. et al. Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013-2017. BMJ Open 9, e024018 (01 15).
58.
Hussein, S., Farouk, M. & Hemayed, E. Gender identification of egyptian dialect in twitter. Egypt. Inform. J. 20, 109–116 (2019).
59.
Jurgens, D., Tsvetkov, Y. & Jurafsky, D. Writer Profiling Without the Writer’s Text. in Social Informatics (eds. Ciampaglia, G. L., Mashhadi, A. & Yasseri, T.) 537–558 (Springer International Publishing, 2017). doi:10.1007/978-3-319-67256-4_43.
60.
Kang, Y., Wang, Y., Zhang, D. & Zhou, L. The public’s opinions on a new school meals policy for childhood obesity prevention in the U.S.: A social media analytics approach. Int. J. Med. Inf. 103, 83–88 (7).
61.
Khandelwal, A., Swami, S., Akhtar, S. S. & Shrivastava, M. Gender Prediction in English-Hindi Code-Mixed Social Media Content: Corpus and Baseline System. Comput. Sist. 22, (2018).
62.
Kim, S. M., Xu, Q., Qu, L., Wan, S. & Paris, C. Demographic Inference on Twitter using Recursive Neural Networks. in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 471–477 (Association for Computational Linguistics, 2017). doi:10.18653/v1/P17-2075.
63.
Kostakos, P., Pandya, A., Kyriakouli, O. & Oussalah, M. Inferring Demographic data of Marginalized Users in Twitter with Computer Vision APIs. in 2018 European Intelligence and Security Informatics Conference (ed. Brynielsson, J.) 81–84 (Ieee, 2018). doi:10.1109/eisic.2018.00022.
64.
Ljubešić, N., Fišer, D. & Erjavec, T. Language-independent Gender Prediction on Twitter. in Proceedings of the Second Workshop on NLP and Computational Social Science 1–6 (Association for Computational Linguistics, 2017). doi:10.18653/v1/W17-2901.
65.
López-Monroy, A. P., González, F. A. & Solorio, T. Early author profiling on Twitter using profile features with multi-resolution. Expert Syst. Appl. 140, 112909 (2020).
66.
Markov, I., Gómez-Adorno, H., Posadas-Durán, J.-P., Sidorov, G. & Gelbukh, A. Author Profiling with Doc2vec Neural Network-Based Document Embeddings. in Advances in Soft Computing (eds. Pichardo-Lagunas, O. & Miranda-Jiménez, S.) vol. 10062 117–131 (Springer International Publishing, 2017).
67.
Messias, J., Vikatos, P. & Benevenuto, F. White, Man, and Highly Followed: Gender and Race Inequalities in Twitter. (Assoc Computing Machinery, 2017). doi:10.1145/3106426.3106472.
68.
Miura, R. et al. Predicting User Gender on Social Media Sites Using Geographical Information. (Assoc Computing Machinery, 2018). doi:10.1145/3281375.3281383.
69.
Morgan-Lopez, A. A., Kim, A. E., Chew, R. F. & Ruddle, P. Predicting age groups of Twitter users based on language and metadata features. PLoS ONE Electron. Resour. 12, e0183537 (2017).
70.
Mueller, A., Wood-Doughty, Z., Amir, S., Dredze, M. & Nobles, A. L. Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement. Proc. ACM Hum.-Comput. Interact. 5, 107:1-107:28 (2021).
71.
Mukherjee, S. & Bala, P. K. Gender classification of microblog text based on authorial style. Inf. Syst. E-Bus. Manag. 15, 117–138 (2017).
72.
Oyasor, J., Raborife, M. & Ranchod, P. Sentiment Analysis as an Indicator to Evaluate Gender disparity on Sexual Violence Tweets in South Africa. in 2020 International SAUPEC/RobMech/PRASA Conference 1–6 (2020). doi:10.1109/SAUPEC/RobMech/PRASA48453.2020.9040955.
73.
Pandya, A. et al. On the use of URLs and hashtags in age prediction of Twitter users. (Ieee, 2018). doi:10.1109/iri.2018.00017.
74.
Pandya, A., Oussalah, M., Monachesi, P. & Kostakos, P. On the use of distributed semantics of tweet metadata for user age prediction. Future Gener. Comput. Syst.-Int. J. Escience 102, 437–452 (2020).
75.
Pizarro, J. Profiling Bots and Fake News Spreaders at PAN’19 and PAN’20: Bots and Gender Profiling 2019, Profiling Fake News Spreaders on Twitter 2020. in 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) 626–630 (2020). doi:10.1109/DSAA49011.2020.00088.
76.
Reis, J. C. S., Kwak, H., An, J., Messias, J. & Benevenuto, F. Demographics of News Sharing in the U.S. Twittersphere. in Proceedings of the 28th ACM Conference on Hypertext and Social Media 195–204 (Association for Computing Machinery, 2017). doi:10.1145/3078714.3078734.
77.
Serfass, D. G. Assessing situations on social media: Temporal, demographic, and personality influences on situation experience. Diss. Abstr. Int. Sect. B Sci. Eng. 78, No Pagination Specified (2017).
78.
Stevens, R. et al. Association Between HIV-Related Tweets and HIV Incidence in the United States: Infodemiology Study. J. Med. Internet Res. 22, e17196 (2020).
79.
Stevens, R. C. et al. Exploring Substance Use Tweets of Youth in the United States: Mixed Methods Study. JMIR Public Health Surveill. 6, e16191 (2020).
80.
Swain, S. & Seeja, K. R. TWEESENT: A Web Application on Sentiment Analysis. in Smart Innovations in Communication and Computational Sciences (eds. Tiwari, S., Trivedi, M. C., Mishra, K. K., Misra, A. K. & Kumar, K. K.) 393–400 (Springer, 2019). doi:10.1007/978-981-13-2414-7_36.
81.
Thelwall, M. & Thelwall, S. Covid-19 tweeting in English: Gender differences. El Prof. Inf. 29, 1–7 (2020).
82.
Udayakumar, S., Senadeera, D. C., Yamunarani, S. & Cheon, N. J. Demographics Analysis of Twitter Users who Tweeted on Psychological Articles and Tweets Analysis. in Inns Conference on Big Data and Deep Learning (eds. Ozawa, S., Tan, A. H., Angelov, P. P., Roy, A. & Pratama, M.) vol. 144 96–104 (Elsevier Science Bv, 2018).
83.
Van Der Goot, R., Ljubešić, N., Matroos, I., Nissim, M. & Plank, B. Bleaching text: Abstract features for cross-lingual gender prediction. in ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) vol. 2 383–389 (2018).
84.
Vashisth, P. & Meehan, K. Gender Classification using Twitter Text Data. in 2020 31st Irish Signals and Systems Conference (ISSC) 1–6 (2020). doi:10.1109/ISSC49989.2020.9180161.
85.
Verhoeven, B., Škrjanec, I. & Pollak, S. Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style. in Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing 119–125 (Association for Computational Linguistics, 2017). doi:10.18653/v1/W17-1418.
86.
Vicente, M., Batista, F. & Carvalho, J. P. Gender detection of twitter users based on multiple information sources. in Studies in Computational Intelligence vol. 794 39–54 (Springer Verlag, 2019).
87.
Vijayaraghavan, P., Vosoughi, S. & Roy, D. Twitter demographic classification using deep multi-modal multi-task learning. in ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) vol. 2 478–483 (2017).
88.
Vikatos, P., Messias, J., Miranda, M. & Benevenuto, F. Linguistic diversities of demographic groups in Twitter. in HT 2017 - Proceedings of the 28th ACM Conference on Hypertext and Social Media 275–284 (Association for Computing Machinery, Inc, 2017). doi:10.1145/3078714.3078742.
89.
Volkova, S. Predicting demographics and affect in social networks. Diss. Abstr. Int. Sect. B Sci. Eng. 78, No Pagination Specified (2017).
90.
Wang, Y., Feng, Y. & Luo, J. B. Gender Politics in the 2016 US Presidential Election: A Computer Vision Approach. in Social, Cultural, and Behavioral Modeling (eds. Lee, D., Lin, Y. R., Osgood, N. & Thomson, R.) vol. 10354 35–45 (Springer International Publishing Ag, 2017).
91.
Wang, Z. et al. Demographic inference and representative population estimates from multilingual social media data. in The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019 vol. 12 2056–2067 (Association for Computing Machinery, Inc, 2019).
92.
Wong, S. C., Teh, P. L. & Cheng, C.-B. How Different Genders Use Profanity on Twitter? in 1–9 (Association for Computing Machinery, 2020). doi:10.1145/3388142.3388145.
93.
Wood-Doughty, Z., Smith, M., Broniatowski, D. & Dredze, M. How Does Twitter User Behavior Vary Across Demographic Groups? in 83–89 (2017). doi:10.18653/v1/w17-2912.
94.
Wood-Doughty, Z., Andrews, N., Marvin, R. & Dredze, M. Predicting Twitter User Demographics from Names Alone. in Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media 105–111 (ACL, 2018). doi:10.18653/v1/w18-1114.
95.
Xiang, L., Sang, J. & Xu, C. Demographic Attribute Inference from Social Multimedia Behaviors: A Cross-OSN Approach. in MultiMedia Modeling (eds. Amsaleg, L., Guðmundsson, G. Þ., Gurrin, C., Jónsson, B. Þ. & Satoh, S.) 515–526 (Springer International Publishing, 2017). doi:10.1007/978-3-319-51811-4_42.
96.
Yang, Y.-C., Al-Garadi, M. A., Love, J. S., Perrone, J. & Sarker, A. Automatic gender detection in Twitter profiles for health-related cohort studies. JAMIA Open 4, ooab042 (2021).
97.
Yazdavar, A. H. et al. Multimodal mental health analysis in social media. PLOS ONE 15, e0226248 (2020).
98.
Yildiz, D., Munson, J., Vitali, A., Tinati, R. & Holland, J. A. Using Twitter data for demographic research. Demogr. Res. 37, (1514).
99.
Zhang, C., Xu, S., Li, Z. & Hu, S. Understanding Concerns, Sentiments, and Disparities Among Population Groups During the COVID-19 Pandemic Via Twitter Data Mining: Large-scale Cross-sectional Study. J. Med. Internet Res. 23, e26482 (2021).
100.
Zhao, Y. et al. Mining Twitter to Assess the Determinants of Health Behavior towards Palliative Care in the United States. http://medrxiv.org/lookup/doi/10.1101/2020.03.26.20038372 (2020) doi:10.1101/2020.03.26.20038372.
101.
Cesare, N., Grant, C., Hawkins, J. B., Brownstein, J. S. & Nsoesie, E. O. Demographics in Social Media Data for Public Health Research: Does it matter? in (Bloomberg, 2017). doi:10.48550/arXiv.1710.11048.
102.
Rangel, F., Rosso, P., Koppel, M., Stamatatos, E. & Inches, G. Overview of the Author Profiling Task at PAN 2013. in CLEF 2013 Labs and Workshops (2013).
103.
Rangel, F. et al. Overview of the 2nd author profiling task at pan 2014. in CLEF 2014 Evaluation Labs and Workshop Working Notes Papers, Sheffield, UK, 2014 1–30 (2014).
104.
Rangel, F. et al. Overview of the 3rd Author Profiling Task at PAN 2015. 40 (2015).
105.
Rangel, F. et al. Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations. in Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings/Balog, Krisztian [edit.]; et al. 750–784 (2016).
106.
Rangel, F., Rosso, P., Potthast, M. & Stein, B. Overview of the 5th author profiling task at pan 2017: Gender and language variety identification in twitter. Work. Notes Pap. CLEF 1613–0073 (2017).
107.
Rangel, F., Rosso, P., Montes-y-Gómez, M., Potthast, M. & Stein, B. Overview of the 6th author profiling task at pan 2018: multimodal gender identification in twitter. Work. Notes Pap. CLEF 1–38 (2018).
108.
Rangel, F. & Rosso, P. Overview of the 7th author profiling task at PAN 2019: bots and gender profiling in twitter. in Working Notes Papers of the CLEF 2019 Evaluation Labs Volume 2380 of CEUR Workshop (2019).
109.
Burger, J. D., Henderson, J., Kim, G. & Zarrella, G. Discriminating gender on Twitter. in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing 1301–1309 (Association for Computational Linguistics, 2011).
110.
Volkova, S. & Yarowsky, D. Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales. http://www.cs.jhu.edu/.
111.
Volkova, S., Wilson, T. & Yarowsky, D. Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media. 13.
112.
Liu, W. & Ruths, D. What’s in a Name? Using First Names as Features for Gender Inference in Twitter. (2013).
113.
Plank, B. & Hovy, D. Personality Traits on Twitter—or—How to Get 1,500 Personality Tests in a Week. in Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis 92–98 (Association for Computational Linguistics, 2015). doi:10.18653/v1/W15-2913.
114.
Verhoeven, B., Daelemans, W. & Plank, B. TWISTY: a Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling. http://www.clips.uantwerpen.be/.
115.
Gender Classification Dataset. https://www.kaggle.com/datasets/cashutosh/gender-classification-dataset.
116.
Radford, J. Piloting A Theory-based Approach to Inferring Gender in Big Data. in 2017 IEEE International Conference on Big Data (eds. Nie, J. Y., et al.) 4824–4826 (IEEE, 2017).
117.
Pizarro, J. Using N-grams to detect Bots on Twitter: Notebook for PAN at CLEF 2019. Noteb. PAN CLEF 2019 18, ix (2014).
118.
Knowles, R., Carroll, J. & Dredze, M. Demographer: Extremely Simple Name Demographics. in Proceedings of the First Workshop on NLP and Computational Social Science 108–113 (Association for Computational Linguistics, 2016). doi:10.18653/v1/W16-5614.
119.
Volkova, S., Coppersmith, G. & Van Durme, B. Inferring User Political Preferences from Streaming Communications. in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 186–196 (Association for Computational Linguistics, 2014). doi:10.3115/v1/P14-1018.
120.
Sap, M. et al. Developing Age and Gender Predictive Lexica over Social Media. 1146–1151 http://www.wwbp.org/data.html (2014).
121.
Zhou, E., Fan, H., Cao, Z., Jiang, Y. & Yin, Q. Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade. in 2013 IEEE International Conference on Computer Vision Workshops 386–391 (2013). doi:10.1109/ICCVW.2013.58.
122.
Wood-Doughty, Z., Andrews, N., Marvin, R. & Dredze, M. Predicting Twitter User Demographics from Names Alone. in Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media 105–111 (Association for Computational Linguistics, 2018). doi:10.18653/v1/W18-1114.
123.
Wood-Doughty, Z., Xu, P., Liu, X. & Dredze, M. Using Noisy Self-Reports to Predict Twitter User Demographics. (2020).
124.
Rothe, R., Timofte, R. & Van Gool, L. Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks. Int. J. Comput. Vis. 126, 144–157 (2018).
125.
Geifman, N., Cohen, R. & Rubin, E. Redefining meaningful age groups in the context of disease. Age 35, 2357–2366 (2013).
126.
Sera, L. C. & McPherson, M. L. Pharmacokinetics and pharmacodynamic changes associated with aging and implications for drug therapy. Clin. Geriatr. Med. 28, 273–286 (2012).
127.
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task. (Association for Computational Linguistics, 2022).
128.
Mauvais-Jarvis, F. et al. Sex and gender: modifiers of health, disease, and medicine. The Lancet 396, 565–582 (2020).
129.
Klein, A. Z., Magge, A. & Gonzalez-Hernandez, G. ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets. PLOS ONE 17, e0262087 (2022).
130.
Sloan, L. et al. Knowing the Tweeters: Deriving Sociologically Relevant Demographics from Twitter. Sociol. Res. Online 18, 74–84 (2013).
131.
Sloan, L. Who tweets in the United Kingdom? Profiling the Twitter population using the British social attitudes survey 2015. Soc. Media Soc. 3, (2017).
132.
Jung, S., An, J., Kwak, H., Salminen, J. & Jansen, B. J. Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race. in Twelfth International AAAI Conference on Web and Social Media (2018).

Posted December 06, 2022.

Subject Area

Health Informatics

[1] 1.↵
FDA. Real-World Evidence. FDA https://www.fda.gov/science-research/science-and-research-special-topics/real-world-evidence (2020).

[2] 2.↵
Alessa, A. & Faezipour, M. A review of influenza detection and prediction through social networking sites. Theor. Biol. Med. Model. 15, (2018).

[3] 3.↵
Bisanzio, D. et al. Use of Twitter social media activity as a proxy for human mobility to predict the spatiotemporal spread of COVID-19 at global scale. Geospatial Health 15, 15 (2020).
OpenUrl

[4] 4.↵
Magge, A. et al. DeepADEMiner: A Deep Learning Pharmacovigilance Pipeline for Extraction and Normalization of Adverse Drug Effect Mentions on Twitter. medRxiv 2020.12.15.20248229 (2020) doi:10.1101/2020.12.15.20248229.
OpenUrl Abstract/FREE Full Text

[5] 5.↵
Nikfarjam, A., Sarker, A., O’Connor, K., Ginn, R. & Gonzalez, G. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. J. Am. Med. Inform. Assoc. JAMIA 22, 671–681 (2015).
OpenUrl

[6] 6.↵
Guntuku, S. C. et al. Tracking Mental Health and Symptom Mentions on Twitter During COVID-19. J. Gen. Intern. Med. 35, 2798–2800 (2020).
OpenUrl

[7] 7.↵
Ma, L. & Wang, Y. Constructing a Semantic Graph with Depression Symptoms Extraction from Twitter. in 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) 1–5 (2019). doi:10.1109/CIBCB.2019.8791452.
OpenUrl CrossRef

[8] 8.↵
Bauermeister, J. A. et al. Innovative recruitment using online networks: Lessons learned from an online study of alcohol and other drug use utilizing a web-based, Respondent-Driven Sampling (webRDS) strategy. J. Stud. Alcohol Drugs 73, 834–838 (2012).
OpenUrl CrossRef PubMed

[9] 9.↵
Weissenbacher, D. et al. Automatic Cohort Determination from Twitter for HIV Prevention amongst Ethnic Minorities. SocArXiv (2021) doi:10.31235/osf.io/qx7s2.
OpenUrl CrossRef

[10] 10.↵
Twitter. Twitter API. Twitter API https://developer.twitter.com/en/docs/twitter-api (2021).

[11] 11.↵
Brady, E., Nielsen, M. W., Andersen, J. P. & Oertelt-Prigione, S. Lack of consideration of sex and gender in COVID-19 clinical studies. Nat. Commun. 12, 4015 (2021).
OpenUrl CrossRef PubMed

[12] 12.↵
Tannenbaum, C., Ellis, R. P., Eyssel, F., Zou, J. & Schiebinger, L. Sex and gender analysis improves science and engineering. Nature 575, 137–146 (2019).
OpenUrl CrossRef PubMed

[13] 13.↵
Mislove, A., Lehmann, S., Ahn, Y.-Y., Onnela, J.-P. & Rosenquist, J. N. Understanding the Demographics of Twitter Users. in yongyeol.com vol. 5(1) 554–557 (AAAI, 2011).
OpenUrl

[14] 14.↵
Bour, C. et al. The Use of Social Media for Health Research Purposes: Scoping Review. J. Med. Internet Res. 23, e25736 (2021).
OpenUrl PubMed

[15] 15.↵
Fink, C., Kopecky, J. & Morawski, M. Inferring Gender from the Content of Tweets: A Region Specific Example. in Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media 4 (2012).

[16] 16.
Alowibdi, J. S., Buy, U. A. & Yu, P. Language Independent Gender Classification on Twitter. (Ieee, 2013).

[17] 17.
Bourbakis, N.,
Esposito, A.,
Mali, A. &
Alamaniotis, M
Chen, L., Qian, T. Y., Zhu, P. S. & You, Z. N. Learning User Embedding Representation for Gender Prediction. in 2016 Ieee 28th International Conference on Tools with Artificial Intelligence (eds. Bourbakis, N., Esposito, A., Mali, A. & Alamaniotis, M.) 263–269 (Ieee, 2016). doi:10.1109/ictai.2016.45.
OpenUrl CrossRef

[18] Bourbakis, N.,

[19] Esposito, A.,

[20] Mali, A. &

[21] Alamaniotis, M

[22] 18.↵
Culotta, A., Ravi, N. K., Cutler, J., & Aaai. Predicting the Demographics of Twitter Users from Website Traffic Data. (Assoc Advancement Artificial Intelligence, 2015).

[23] 19.↵
Sloan, L., Morgan, J., Burnap, P. & Williams, M. Who tweets? Deriving the demographic characteristics of age, occupation and social class from twitter user meta-data. PLoS ONE Electron. Resour. 10, e0115545 (2015).
OpenUrl

[24] 20.
Oktay, H., Fırat, A. & Ertem, Z. Demographic Breakdown of Twitter Users: An analysis based on names. in pdfs.semanticscholar.org (2014).

[25] 21.↵
Nguyen, D., Gravel, R., Trieschnigg, D. & Meder, T. ‘How Old Do You Think I Am?’: A Study of Language and Age in Twitter. Seventh International AAAI Conference on Weblogs and Social Media http://www.aaai.org (2013).

[26] 22.↵
Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. ArXiv Prepr. ArXiv13013781 (2013).

[27] 23.↵
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed Representations of Words and Phrases and their Compositionality. ArXiv13104546 Cs Stat (2013).

[28] 24.↵
Tricco, A. C. et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 169, 467–473 (2018).
OpenUrl CrossRef PubMed

[29] 25.↵
Hinds, J. & Joinson, A. N. What demographic attributes do our digital footprints reveal? A systematic review. PLoS ONE 13, e0207112–e0207112 (2018).
OpenUrl

[30] 26.↵
Umar, A., Bashir, S. A., Abdullahi, M. B. & Adebayo, O. S. Comparative Study of Various Machine Learning Algorithms for Tweet Classification. (2019).

[31] 27.↵
Amir-Behghadami, M. & Janati, A. Population, Intervention, Comparison, Outcomes and Study (PICOS) design as a framework to formulate eligibility criteria in systematic reviews. Emerg. Med. J. 37, 387–387 (2020).
OpenUrl FREE Full Text

[32] 28.↵
Petrucci, A.,
Racioppi, F. &
Verde, R
Alessandra, R., Gentile, M. M. & Bianco, D. M. Who Tweets in Italian? Demographic Characteristics of Twitter Users. in New Statistical Developments in Data Science (eds. Petrucci, A., Racioppi, F. & Verde, R.) vol. 288 329–344 (Springer International Publishing, 2019).
OpenUrl

[33] Petrucci, A.,

[34] Racioppi, F. &

[35] Verde, R

[36] 29.↵
Meiselwitz, G.)
Alfayez, A. et al. Understanding Gendered Spaces Using Social Media Data. in Social Computing and Social Media. Applications and Analytics (ed. Meiselwitz, G.) vol. 10283 338–356 (Springer International Publishing, 2017).
OpenUrl

[37] Meiselwitz, G.)

[38] 30.↵
Arafat, T. A., Budi, I., Mahendra, R. & Salehah, D. A. Demographic Analysis of Candidates Supporter in Twitter During Indonesian Presidential Election 2019. in 2020 International Conference on ICT for Smart Society (ICISS) 1–6 (2020). doi:10.1109/ICISS50791.2020.9307598.
OpenUrl CrossRef

[39] 31.↵
Gottumukkala, R., et al.
Ardehaly, E. M. & Culotta, A. Co-training for Demographic Classification Using Deep Learning from Label Proportions. in 2017 17th Ieee International Conference on Data Mining Workshops (eds. Gottumukkala, R., et al.) 1017–1024 (Ieee, 2017). doi:10.1109/icdmw.2017.144.
OpenUrl CrossRef

[40] Gottumukkala, R., et al.

[41] 32.↵
Ardehaly, E. M. & Culotta, A. Learning from noisy label proportions for classifying online social data. Soc. Netw. Anal. Min. 8, (2018).

[42] 33.↵
Baxevanakis, S., Gavras, S., Mouratidis, D. & Kermanidis, K. A machine learning approach for gender identification of Greek tweet authors | Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments. in PETRA Proceedings (2020).

[43] 34.↵
Nicosia, G.,
Pardalos, P.,
Giuffrida, G. &
Umeton, R
Bayot, R. K. & Goncalves, T. Age and Gender Classification of Tweets Using Convolutional Neural Networks. in Machine Learning, Optimization, and Big Data, Mod 2017 (eds. Nicosia, G., Pardalos, P., Giuffrida, G. & Umeton, R.) vol. 10710 337– 348 (Springer International Publishing Ag, 2018).
OpenUrl

[44] Nicosia, G.,

[45] Pardalos, P.,

[46] Giuffrida, G. &

[47] Umeton, R

[48] 35.↵
Brandt, J. et al. Identifying social media user demographics and topic diversity with computational social science: a case study of a major international policy forum. J. Comput. Soc. Sci. 3, 167–188 (2020).
OpenUrl

[49] 36.↵
Bsir, B. & Zrigui, M. Enhancing deep learning gender identification with gated recurrent units architecture in social text. Comput. Sist. 22, 757–766 (2018).
OpenUrl

[50] 37.↵
Rojas, I.,
Joya, G. &
Catala, A
Bsir, B. & Zrigui, M. Document Model with Attention Bidirectional Recurrent Network for Gender Identification. in Advances in Computational Intelligence, Iwann 2019, Pt I (eds. Rojas, I., Joya, G. & Catala, A.) vol. 11506 621–631 (Springer International Publishing Ag, 2019).
OpenUrl

[51] Rojas, I.,

[52] Joya, G. &

[53] Catala, A

[54] 38.↵
Cavazos-Rehg, P. A., Zewdie, K., Krauss, M. J. & Sowles, S. J. ‘No High Like a Brownie High’: A Content Analysis of Edible Marijuana Tweets. Am. J. Health Promot. AJHP 32, 880–886 (2018).
OpenUrl

[55] 39.↵
Cavazos-Rehg, P. A. et al. ‘I just want to be skinny.’: A content analysis of tweets expressing eating disorder symptoms. PloS One 14, e0207506 (2019).
OpenUrl

[56] 40.↵
Cesare, N., Dwivedi, P., Nguyen, Q. C. & Nsoesie, E. O. Use of social media, search queries, and demographic data to assess obesity prevalence in the United States. Palgrave Commun. 5, 1–9 (2019).
OpenUrl

[57] 41.↵
Cesare, N., Nguyen, Q. C., Grant, C. & Nsoesie, E. O. Social media captures demographic and regional physical activity. BMJ Open Sport Exerc. Med. 5, e000567 (2019).
OpenUrl Abstract/FREE Full Text

[58] 42.↵
Chakraborty, A. et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations. 10.

[59] 43.↵
Altun, Y., et al.
Chamberlain, B. P., Humby, C. & Deisenroth, M. P. Probabilistic Inference of Twitter Users’ Age Based on What They Follow. in Machine Learning and Knowledge Discovery in Databases, Ecml Pkdd 2017, Pt Iii (eds. Altun, Y., et al.) vol. 10536 191– 203 (Springer International Publishing Ag, 2017).
OpenUrl

[60] Altun, Y., et al.

[61] 44.↵
Cheng, J., Fernandez, A., Quindoza, R., Tan, S. & Cheng, C. A Model for Age and Gender Profiling of Social Media Accounts Based on Post Contents. springerprofessional.de (2018).

[62] 45.↵
Cornelisse, J. & Pillai, R. G. Age Inference on Twitter using SAGE and TF-IGM. in Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval 24–30 (Association for Computing Machinery, 2020). doi:10.1145/3443279.3443300.
OpenUrl CrossRef

[63] 46.↵
Duong, V., Luo, J., Pham, P., Yang, T. & Wang, Y. The Ivory Tower Lost: How College Students Respond Differently than the General Public to the COVID-19 Pandemic. in 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 126–130 (2020). doi:10.1109/ASONAM49781.2020.9381379.
OpenUrl CrossRef

[64] 47.↵
ElSayed, S. & Farouk, M. Gender identification for Egyptian Arabic dialect in twitter using deep learning models. Egypt. Inform. J. 21, 159–167 (2020).
OpenUrl

[65] 48.↵
Emmery, C., Chrupała, G. & Daelemans, W. Simple Queries as Distant Labels for Predicting Gender on Twitter. 50–55 https://github.com/facebookresearch/ (2017).

[66] 49.↵
Garcia-Guzman, R. et al. Trend-Based Categories Recommendations and Age-Gender Prediction for Pinterest and Twitter Users. Appl. Sci. 10, 5957 (2020).
OpenUrl

[67] 50.↵
Geng, L., Zhang, K., Wei, X. Z., Feng, X., & Ieee. Soft Biometrics in Online Social Networks: A Case Study on Twitter User Gender Recognition. in 2017 Ieee Winter Conference on Applications of Computer Vision Workshops 1–8 (Ieee, 2017). doi:10.1109/wacvw.2017.8.
OpenUrl CrossRef

[68] 51.↵
Giannakopoulos, O., Kalatzis, N., Roussaki, I., Papavassiliou, S., & Ieee. Gender recognition based on social networks for multimedia production. (Ieee, 2018).

[69] 52.↵
Guimaraes, R. G., Rosa, R. L., De Gaetano, D., Rodriguez, D. Z. & Bressan, G. Age Groups Classification in Social Network Using Deep Learning. IEEE Access 5, 10805– 10816 (2017).
OpenUrl

[70] 53.↵
Hasanuzzaman, M., Dias, G. & Way, A. Demographic Word Embeddings for Racism Detection on Twitter. in Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 926–936 (Asian Federation of Natural Language Processing, 2017).

[71] 54.↵
Hashempour, R. A Deep Learning Approach to Language-independent Gender Prediction on Twitter. in Proceedings of the 2019 Workshop on Widening NLP 92–94 (2019).

[72] 55.↵
Hirt, R., Kuhl, N. & Satzger, G. Cognitive computing for customer profiling: meta classification for gender prediction. Electron. Mark. 29, 93–106 (2019).
OpenUrl

[73] 56.↵
Huang, X., Xing, L., Dernoncourt, F. & Paul, M. J. Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition. 11–16 (2020).

[74] 57.↵
Huang, X. et al. Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013-2017. BMJ Open 9, e024018 (01 15).

[75] 58.↵
Hussein, S., Farouk, M. & Hemayed, E. Gender identification of egyptian dialect in twitter. Egypt. Inform. J. 20, 109–116 (2019).
OpenUrl

[76] 59.↵
Ciampaglia, G. L.,
Mashhadi, A. &
Yasseri, T
Jurgens, D., Tsvetkov, Y. & Jurafsky, D. Writer Profiling Without the Writer’s Text. in Social Informatics (eds. Ciampaglia, G. L., Mashhadi, A. & Yasseri, T.) 537–558 (Springer International Publishing, 2017). doi:10.1007/978-3-319-67256-4_43.
OpenUrl CrossRef

[77] Ciampaglia, G. L.,

[78] Mashhadi, A. &

[79] Yasseri, T

[80] 60.↵
Kang, Y., Wang, Y., Zhang, D. & Zhou, L. The public’s opinions on a new school meals policy for childhood obesity prevention in the U.S.: A social media analytics approach. Int. J. Med. Inf. 103, 83–88 (7).

[81] 61.↵
Khandelwal, A., Swami, S., Akhtar, S. S. & Shrivastava, M. Gender Prediction in English-Hindi Code-Mixed Social Media ContentL: Corpus and Baseline System. Comput. Sistimas 22, (2018).

[82] 62.↵
Kim, S. M., Xu, Q., Qu, L., Wan, S. & Paris, C. Demographic Inference on Twitter using Recursive Neural Networks. in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 471–477 (Association for Computational Linguistics, 2017). doi:10.18653/v1/P17-2075.
OpenUrl CrossRef

[83] 63.↵
Brynielsson, J.
Kostakos, P., Pandya, A., Kyriakouli, O. & Oussalah, M. Inferring Demographic data of Marginalized Users in Twitter with Computer Vision APIs. in 2018 European Intelligence and Security Informatics Conference (ed. Brynielsson, J.) 81–84 (Ieee, 2018). doi:10.1109/eisic.2018.00022.
OpenUrl CrossRef

[84] Brynielsson, J.

[85] 64.↵
Ljubešić, N., Fišer, D. & Erjavec, T. Language-independent Gender Prediction on Twitter. in Proceedings of the Second Workshop on NLP and Computational Social Science 1–6 (Association for Computational Linguistics, 2017). doi:10.18653/v1/W17-2901.
OpenUrl CrossRef

[86] 65.↵
López-Monroy, A. P., González, F. A. & Solorio, T. Early author profiling on Twitter using profile features with multi-resolution. Expert Syst. Appl. 140, 112909 (2020).
OpenUrl

[87] 66.↵
Pichardo-Lagunas, O. &
Miranda-Jiménez, S.
Markov, I., Gómez-Adorno, H., Posadas-Durán, J.-P., Sidorov, G. & Gelbukh, A. Author Profiling with Doc2vec Neural Network-Based Document Embeddings. in Advances in Soft Computing (eds. Pichardo-Lagunas, O. & Miranda-Jiménez, S.) vol. 10062 117–131 (Springer International Publishing, 2017).
OpenUrl

[88] Pichardo-Lagunas, O. &

[89] Miranda-Jiménez, S.

[90] 67.↵
Messias, J., Vikatos, P., Benevenuto, F., & Acm. White, Man, and Highly Followed: Gender and Race Inequalities in Twitter. (Assoc Computing Machinery, 2017). doi:10.1145/3106426.3106472.
OpenUrl CrossRef

[91] 68.↵
Miura, R. et al. Predicting User Gender on Social Media Sites Using Geographical Information. (Assoc Computing Machinery, 2018). doi:10.1145/3281375.3281383.
OpenUrl CrossRef

[92] 69.↵
Morgan-Lopez, A. A., Kim, A. E., Chew, R. F. & Ruddle, P. Predicting age groups of Twitter users based on language and metadata features. PLoS ONE Electron. Resour. 12, e0183537 (2017).
OpenUrl

[93] 70.↵
Mueller, A., Wood-Doughty, Z., Amir, S., Dredze, M. & Nobles, A. L. Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement. Proc. ACM Hum.-Comput. Interact. 5, 107:1-107:28 (2021).
OpenUrl

[94] 71.↵
Mukherjee, S. & Bala, P. K. Gender classification of microblog text based on authorial style. Inf. Syst. E-Bus. Manag. 15, 117–138 (2017).
OpenUrl

[95] 72.↵
Oyasor, J., Raborife, M. & Ranchod, P. Sentiment Analysis as an Indicator to Evaluate Gender disparity on Sexual Violence Tweets in South Africa. in 2020 International SAUPEC/RobMech/PRASA Conference 1–6 (2020). doi:10.1109/SAUPEC/RobMech/PRASA48453.2020.9040955.
OpenUrl CrossRef

[96] 73.↵
Pandya, A. et al. On the use of URLs and hashtags in age prediction of Twitter users. (Ieee, 2018). doi:10.1109/iri.2018.00017.
OpenUrl CrossRef

[97] 74.↵
Pandya, A., Oussalah, M., Monachesi, P. & Kostakos, P. On the use of distributed semantics of tweet metadata for user age prediction. Future Gener. Comput. Syst.-Int. J. Escience 102, 437–452 (2020).
OpenUrl

[98] 75.↵
Pizarro, J. Profiling Bots and Fake News Spreaders at PAN’19 and PAN’20L: Bots and Gender Profiling 2019, Profiling Fake News Spreaders on Twitter 2020. in 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) 626–630 (2020). doi:10.1109/DSAA49011.2020.00088.
OpenUrl CrossRef

[99] 76.↵
Reis, J. C. S., Kwak, H., An, J., Messias, J. & Benevenuto, F. Demographics of News Sharing in the U.S. Twittersphere. in Proceedings of the 28th ACM Conference on Hypertext and Social Media 195–204 (Association for Computing Machinery, 2017). doi:10.1145/3078714.3078734.
OpenUrl CrossRef

[100] 77.↵
Serfass, D. G. Assessing situations on social media: Temporal, demographic, and personality influences on situation experience. Diss. Abstr. Int. Sect. B Sci. Eng. 78, No Pagination Specified (2017).

[101] 78.↵
Stevens, R. et al. Association Between HIV-Related Tweets and HIV Incidence in the United States: Infodemiology Study. J. Med. Internet Res. 22, e17196 (2020).
OpenUrl

[102] 79.↵
Stevens, R. C. et al. Exploring Substance Use Tweets of Youth in the United States: Mixed Methods Study. JMIR Public Health Surveill. 6, e16191 (2020).
OpenUrl

[103] 80.↵
Tiwari, S.,
Trivedi, M. C.,
Mishra, K. K.,
Misra, A. K. &
Kumar, K. K
Swain, S. & Seeja, K. R. TWEESENT: A Web Application on Sentiment Analysis. in Smart Innovations in Communication and Computational Sciences (eds. Tiwari, S., Trivedi, M. C., Mishra, K. K., Misra, A. K. & Kumar, K. K.) 393–400 (Springer, 2019). doi:10.1007/978-981-13-2414-7_36.
OpenUrl CrossRef

[104] Tiwari, S.,

[105] Trivedi, M. C.,

[106] Mishra, K. K.,

[107] Misra, A. K. &

[108] Kumar, K. K

[109] 81.↵
Thelwall, M. & Thelwall, S. Covid-19 tweeting in English: Gender differences. El Prof. Inf. Mayjun2020 Vol 29 Issue 3 P1–7 7p (2020).
OpenUrl

[110] 82.↵
Ozawa, S.,
Tan, A. H.,
Angelov, P. P.,
Roy, A. &
Pratama, M
Udayakumar, S., Senadeera, D. C., Yamunarani, S. & Cheon, N. J. Demographics Analysis of Twitter Users who Tweeted on Psychological Articles and Tweets Analysis. in Inns Conference on Big Data and Deep Learning (eds. Ozawa, S., Tan, A. H., Angelov, P. P., Roy, A. & Pratama, M.) vol. 144 96–104 (Elsevier Science Bv, 2018).
OpenUrl

[111] Ozawa, S.,

[112] Tan, A. H.,

[113] Angelov, P. P.,

[114] Roy, A. &

[115] Pratama, M

[116] 83.↵
Van Der Goot, R., Ljubeić, N., Matroos, I., Nissim, M. & Plank, B. Bleaching text: Abstract features for cross-lingual gender prediction. in ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) vol. 2 383–389 (2018).
OpenUrl

[117] 84.↵
Vashisth, P. & Meehan, K. Gender Classification using Twitter Text Data. in 2020 31st Irish Signals and Systems Conference (ISSC) 1–6 (2020). doi:10.1109/ISSC49989.2020.9180161.
OpenUrl CrossRef

[118] 85.↵
Verhoeven, B., Škrjanec, I. & Pollak, S. Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style. in Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing 119–125 (Association for Computational Linguistics, 2017). doi:10.18653/v1/W17-1418.
OpenUrl CrossRef

[119] 86.↵
Vicente, M., Batista, F. & Carvalho, J. P. Gender detection of twitter users based on multiple information sources. in Studies in Computational Intelligence vol. 794 39–54 (Springer Verlag, 2019).
OpenUrl

[120] 87.↵
Vijayaraghavan, P., Vosoughi, S. & Roy, D. Twitter demographic classification using deep multi-modal multi-task learning. in ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) vol. 2 478–483 (2017).
OpenUrl

[121] 88.↵
Vikatos, P., Messias, J., Miranda, M. & Benevenuto, F. Linguistic diversities of demographic groups in Twitter. in HT 2017 - Proceedings of the 28th ACM Conference on Hypertext and Social Media 275–284 (Association for Computing Machinery, Inc, 2017). doi:10.1145/3078714.3078742.
OpenUrl CrossRef

[122] 89.↵
Volkova, S. Predicting demographics and affect in social networks. Diss. Abstr. Int. Sect. B Sci. Eng. 78, No Pagination Specified (2017).

[123] 90.↵
Lee, D.,
Lin, Y. R.,
Osgood, N. &
Thomson, R
Wang, Y., Feng, Y. & Luo, J. B. Gender Politics in the 2016 US Presidential Election: A Computer Vision Approach. in Social, Cultural, and Behavioral Modeling (eds. Lee, D., Lin, Y. R., Osgood, N. & Thomson, R.) vol. 10354 35–45 (Springer International Publishing Ag, 2017).
OpenUrl

[124] Lee, D.,

[125] Lin, Y. R.,

[126] Osgood, N. &

[127] Thomson, R

[128] 91.↵
Wang, Z. et al. Demographic inference and representative population estimates from multilingual social media data. in The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019 vol. 12 2056–2067 (Association for Computing Machinery, Inc, 2019).
OpenUrl

[129] 92.↵
Wong, S. C., Teh, P. L. & Cheng, C.-B. How Different Genders Use Profanity on Twitter? in 1–9 (Association for Computing Machinery, 2020). doi:10.1145/3388142.3388145.
OpenUrl CrossRef

[130] 93.↵
Wood-Doughty, Z., Smith, M., Broniatowski, D. & Dredze, M. How Does Twitter User Behavior Vary Across Demographic Groups? in 83–89 (2017). doi:10.18653/v1/w17-2912.
OpenUrl CrossRef

[131] 94.↵
Wood-Doughty, Z., Andrews, N., Marvin, R. & Dredze, M. Predicting Twitter User Demographics from Names Alone. in Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media 105–111 (ACL, 2018). doi:10.18653/v1/w18-1114.
OpenUrl CrossRef

[132] 95.↵
Amsaleg, L.,
Guðmundsson, G. Þ.,
Gurrin, C.,
Jónsson, B. Þ. &
Satoh, S
Xiang, L., Sang, J. & Xu, C. Demographic Attribute Inference from Social Multimedia Behaviors: A Cross-OSN Approach. in MultiMedia Modeling (eds. Amsaleg, L., Guðmundsson, G. Þ., Gurrin, C., Jónsson, B. Þ. & Satoh, S.) 515–526 (Springer International Publishing, 2017). doi:10.1007/978-3-319-51811-4_42.
OpenUrl CrossRef

[133] Amsaleg, L.,

[134] Guðmundsson, G. Þ.,

[135] Gurrin, C.,

[136] Jónsson, B. Þ. &

[137] Satoh, S

[138] 96.↵
Yang, Y.-C., Al-Garadi, M. A., Love, J. S., Perrone, J. & Sarker, A. Automatic gender detection in Twitter profiles for health-related cohort studies. JAMIA Open 4, ooab042 (2021).
OpenUrl

[139] 97.↵
Yazdavar, A. H. et al. Multimodal mental health analysis in social media. PLOS ONE 15, e0226248 (2020).
OpenUrl

[140] 98.↵
Yildiz, D., Munson, J., Vitali, A., Tinati, R. & Holland, J. A. Using Twitter data for demographic research. Demogr. Res. 37, (1514).

[141] 99.↵
Zhang, C., Xu, S., Li, Z. & Hu, S. Understanding Concerns, Sentiments, and Disparities Among Population Groups During the COVID-19 Pandemic Via Twitter Data Mining: Large-scale Cross-sectional Study. J. Med. Internet Res. 23, e26482 (2021).
OpenUrl

[142] 100.↵
Zhao, Y. et al. Mining Twitter to Assess the Determinants of Health Behavior towards Palliative Care in the United States. http://medrxiv.org/lookup/doi/10.1101/2020.03.26.20038372 (2020) xdoi:10.1101/2020.03.26.20038372.
OpenUrl Abstract/FREE Full Text

[143] 101.↵
Cesare, N., Grant, C., Hawkins, J. B., Brownstein, J. S. & Nsoesie, E. O. Demographics in Social Media Data for Public Health Research: Does it matter? in (Bloomberg, 2017). doi:10.48550/arXiv.1710.11048.
OpenUrl CrossRef

[144] 102.↵
Rangel, F., Rosso, P., Moshe Koppel, M., Stamatatos, E. & Inches, G. Overview of the Author Profiling Task at PAN 2013. in CLEF 2013 Labs and Workshops (2013).

[145] 103.↵
Rangel, F. et al. Overview of the 2nd author profiling task at pan 2014. in CLEF 2014 Evaluation Labs and Workshop Working Notes Papers, Sheffield, UK, 2014 1–30 (2014).

[146] 104.
Rangel, F. et al. Overview of the 3rd Author Profiling Task at PAN 2015. 40 (2015).

[147] 105.↵
Rangel, F. et al. Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations. in Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings/Balog, Krisztian [edit.]; et al. 750–784 (2016).

[148] 106.
Rangel, F., Rosso, P., Potthast, M. & Stein, B. Overview of the 5th author profiling task at pan 2017: Gender and language variety identification in twitter. Work. Notes Pap. CLEF 1613–0073 (2017).

[149] 107.
Rangel, F., Rosso, P., Montes-y-Gómez, M., Potthast, M. & Stein, B. Overview of the 6th author profiling task at pan 2018: multimodal gender identification in twitter. Work. Notes Pap. CLEF 1–38 (2018).

[150] 108.↵
Rangel, F. & Rosso, P. Overview of the 7th author profiling task at PAN 2019: bots and gender profiling in Twitter. in Working Notes Papers of the CLEF 2019 Evaluation Labs Volume 2380 of CEUR Workshop (2019).

[151] 109.↵
Burger, J. D., Henderson, J., Kim, G. & Zarrella, G. Discriminating gender on Twitter. in 1301–1309 (Association for Computational Linguistics, 2011).

[152] 110.
Volkova, S. & Yarowsky, D. Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales. http://www.cs.jhu.edu/.

[153] 111.
Volkova, S., Wilson, T. & Yarowsky, D. Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media. 13.

[154] 112.
Liu, W. & Ruths, D. What’s in a Name? Using First Names as Features for Gender Inference in Twitter. (2013).

[155] 113.
Plank, B. & Hovy, D. Personality Traits on Twitter—or—How to Get 1,500 Personality Tests in a Week. in Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis 92–98 (Association for Computational Linguistics, 2015). doi:10.18653/v1/W15-2913.

[156] 114.
Verhoeven, B., Daelemans, W. & Plank, B. TWISTY: a Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling. http://www.clips.uantwerpen.be/.

[157] 115.↵
Gender Classification Dataset. https://www.kaggle.com/datasets/cashutosh/gender-classification-dataset.

[158] 116.↵
Radford, J. Piloting A Theory-based Approach to Inferring Gender in Big Data. in 2017 IEEE International Conference on Big Data (eds. Nie, J. Y. et al.) 4824–4826 (IEEE, 2017).

[160] 117.↵
Pizarro, J. Using N-grams to detect Bots on Twitter: Notebook for PAN at CLEF 2019. Noteb. PAN CLEF 2019 18, ix (2014).

[161] 118.↵
Knowles, R., Carroll, J. & Dredze, M. Demographer: Extremely Simple Name Demographics. in Proceedings of the First Workshop on NLP and Computational Social Science 108–113 (Association for Computational Linguistics, 2016). doi:10.18653/v1/W16-5614.

[162] 119.↵
Volkova, S., Coppersmith, G. & Van Durme, B. Inferring User Political Preferences from Streaming Communications. in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 186–196 (Association for Computational Linguistics, 2014). doi:10.3115/v1/P14-1018.

[163] 120.↵
Sap, M. et al. Developing Age and Gender Predictive Lexica over Social Media. 1146–1151 http://www.wwbp.org/data.html (2014).

[164] 121.↵
Zhou, E., Fan, H., Cao, Z., Jiang, Y. & Yin, Q. Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade. in 2013 IEEE International Conference on Computer Vision Workshops 386–391 (2013). doi:10.1109/ICCVW.2013.58.

[165] 122.↵
Wood-Doughty, Z., Andrews, N., Marvin, R. & Dredze, M. Predicting Twitter User Demographics from Names Alone. in Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media 105–111 (Association for Computational Linguistics, 2018). doi:10.18653/v1/W18-1114.

[166] 123.↵
Wood-Doughty, Z., Xu, P., Liu, X. & Dredze, M. Using Noisy Self-Reports to Predict Twitter User Demographics. (2020).

[167] 124.↵
Rothe, R., Timofte, R. & Van Gool, L. Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks. Int. J. Comput. Vis. 126, 144–157 (2018).

[168] 125.↵
Geifman, N., Cohen, R. & Rubin, E. Redefining meaningful age groups in the context of disease. Age 35, 2357–2366 (2013).

[169] 126.↵
Sera, L. C. & McPherson, M. L. Pharmacokinetics and pharmacodynamic changes associated with aging and implications for drug therapy. Clin. Geriatr. Med. 28, 273–286 (2012).

[170] 127.↵
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task. (Association for Computational Linguistics, 2022).

[171] 128.↵
Mauvais-Jarvis, F. et al. Sex and gender: modifiers of health, disease, and medicine. The Lancet 396, 565–582 (2020).

[172] 129.↵
Klein, A. Z., Magge, A. & Gonzalez-Hernandez, G. ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets. PLOS ONE 17, e0262087 (2022).

[173] 130.↵
Sloan, L. et al. Knowing the Tweeters: Deriving Sociologically Relevant Demographics from Twitter. Sociol. Res. Online 18, 74–84 (2013).

[174] 131.↵
Sloan, L. Who tweets in the United Kingdom? Profiling the Twitter population using the British social attitudes survey 2015. Soc. Media Soc. 3, (2017).

[175] 132.↵
Jung, S., An, J., Kwak, H., Salminen, J. & Jansen, B. J. Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race. in Twelfth International AAAI Conference on Web and Social Media (2018).
