Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles
Ashwin Rajadesingan, Ramaswami Mahalingam, David Jurgens

TL;DR
This study uses large-scale analysis of matrimonial profiles to measure caste attitudes in India, revealing generational differences, social status influences, and diaspora variations in openness to intercaste marriage.
Contribution
Introduces a novel large-scale indicator based on matrimonial profiles to empirically measure caste attitudes across generations and geographies.
Findings
Younger generations are more open to intercaste marriage.
Attitudes are influenced by social status beyond caste.
Diaspora shows significantly less openness to intercaste marriage.
| | Self posted | Family posted | Total |
| Male profiles | 175,611 | 50,245 | 225,856 |
| Female profiles | 39,039 | 48,413 | 87,452 |
| Total | 214,650 | 98,658 | 313,308 |
| Label | Caste Category | Pop. % | Data % |
| Brahmin | Brahmin | 5.2 | 16.62 |
| OFC | Other Forward Castes | 22.9 | 27.43 |
| OBC | Other Backward Castes | 40.5 | 39.38 |
| SC | Scheduled Castes | 21.2 | 10.84 |
| ST | Scheduled Tribes | 8.6 | 1.74 |
| Intercaste | Intercaste | - | 0.17 |
| Other | Other | 1.3 | 3.82 |
| Variable | Description |
| Gender | Male, Female |
| Age | z-score of years-old |
| Account age | log(number of days on site) |
| First Marriage | Yes, No |
| In a Large City | Yes, No |
| Education | High school, College, Postgraduate |
| Income | Above Median, Below Median, Not Answered |
| Parental Employment | Both, Father, Mother, Neither, Not Answered |
| Affluence | Upper, Upper Middle Class, Lower-middle or Middle Class, Not Answered |
| Caste | Brahmin, OFC, OBC, SC, ST, Intercaste, Other |
| Categories | Examples |
| family, profession, location | Looking for someone from a good family with a settled job in Delhi |
| caste, education | Seeking a suitable match of the same caste, ideally with a graduate degree |
| personality | Hoping for a like-minded partner who is warm, funny and supportive |
| education, physical attributes | Looking for a well-educated bride who is tall, fair and good-looking |
| caste, location | Looking for a match within the same community in the US or is willing to relocate |
| Category | Example words |
| Age | years, young, younger |
| Attractiveness | beautiful, handsome, pretty |
| Caste | community, brahmin, iyer |
| Education | well-educated, degree, IIT |
| Family | parents, mom, families |
| Finances | earning, financially, career |
| Health | smoking, non-smoker, teetotaler |
| Location | USA, mumbai, relocate |
| Personality | loyalty, honesty, funny |
| Physical attributes | slim, tall, fair |
| Profession | profession, work, job |
Smart, Responsible, and Upper Caste Only:
Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles
Ashwin Rajadesingan (1), Ramaswami Mahalingam (2), David Jurgens (1)
(1) School of Information, (2) Department of Psychology
{arajades,ramawasi,jurgens}@umich.edu
University of Michigan, Ann Arbor
Abstract
Discriminatory caste attitudes currently stigmatize millions of Indians, subjecting individuals to prejudice in all aspects of life. Governmental incentives and societal movements have attempted to counter these attitudes, yet accurate measurements of public opinions on caste are not yet available for understanding whether progress is being made. Here, we introduce a novel approach to measure public attitudes of caste through an indicator variable: openness to intercaste marriage. Using a massive dataset of over 313K profiles from a major Indian matrimonial site, we precisely quantify public attitudes, along with differences between generations and between Indian residents and diaspora. We show that younger generations are more open to intercaste marriage, yet attitudes are based on a complex function of social status beyond their own caste. In examining the desired qualities in a spouse, we find that individuals open to intercaste marriage are more individualistic in the qualities they desire, rather than favoring family-related qualities, which mirrors larger societal trends away from collectivism. Finally, we show that attitudes in diaspora are significantly less open, suggesting a bi-cultural model of integration. Our research provides the first empirical evidence identifying how various intersections of identity shape attitudes toward intercaste marriage in India and among the Indian diaspora in the US.
1 Introduction
Traditionally, marriages in India are considered as a union of two families. Typically, parents initiate and mediate the search for a spouse within kinship networks and ensure compatibility based on factors such as caste, education, affluence, horoscope and physical characteristics. These “arranged” marriages are usually endogamous with the bride and groom belonging to the same caste, reinforcing caste lines (?). However, recent studies show a decline in parental control in the matchmaking process, with more individuals reporting that they chose their spouse independently or at least jointly with their parents (?; ?). Coupled with a greater openness to marry outside their caste, this increase in agency has the potential to alter the social fabric of the society, especially pertaining to caste relations. In this work, using data from over 313K profiles from a major Indian matrimonial site, we study social inclusion by examining partner preferences and factors that affect openness to intercaste marriage in India and the Indian diaspora in the US.
Previous research on intercaste marriage primarily relies on rich ethnographies (?; ?), qualitative interviews (?), surveys (?; ?) and audit studies (?; ?). Surveys depend on self-reported openness, which may carry a social desirability bias (?), and the measured incidence of intercaste marriages may differ from actual openness. In our work, we instead use data from online matrimonial profiles, where users specify their openness to intercaste marriage when they register. As profiles contain rich demographic information, we can identify how various intersections of identity shape attitudes toward intercaste marriage.
In Indian matrimonial sites, either individuals or their family can create a matrimonial profile, allowing us to study generational differences in spousal preferences, and by proxy, caste attitudes. Further, given these websites’ popularity among the Indian diaspora, we use openness to intercaste marriage to analyze how intercaste relations differ among diaspora in the US who live as a minority in a fundamentally different cultural context. These online profiles provide a unique real-time estimate of changing societal attitudes on caste and, because of their intensely personal nature, accurately reflect social preferences.
Our work provides the following three main contributions. First, we find younger generations are substantially more open to intercaste marriage when controlling for demographics and background (§4). However, the demographic factors associated with openness are highly similar across generations, with the exception that higher education in children shows increased openness when compared with their family. Moreover, we find clear geographic trends in acceptance of intercaste marriage, with South Indian states being less open. Second, we show that individuals who are open to intercaste marriage are more likely to desire a spouse based on individual traits, whereas those not open to intercaste marriage emphasize family; these trends mirror the shift in Indian culture from collectivist to more individualist (§5). Third, Indian diaspora show a lower acceptance of intercaste marriage compared to Indians in India (§6). This result aligns with current theory (?) which contends that some diasporic individuals, because of their minority status, essentialize the notion of caste as a way of maintaining cultural identity, which results in resistance to intercaste marriage.
2 Caste System
Caste is a form of social hierarchy which assigns status to individuals based on birth. Originally based on the Hindu varna system which grouped people based on their occupation, caste has evolved into a form of social identity based on birth which decides hierarchy and social order in society (?). The caste system divides people into groups and is organized based on perceived notions of “purity” and “pollution” (?). Brahmins hold the highest rank in this caste hierarchy, Other Forward Castes (OFC) are non-Brahmin upper castes, Other Backward Castes (OBC) are castes considered to be lower economically and socially, and Scheduled Castes (SC or Dalits) and Scheduled Tribes (ST) are historically disadvantaged groups. The prejudice propagated by caste through centuries denied individuals in lower caste groups dignity and self-respect (?) as well as access to education (?), job opportunities (?), religious institutions (?) and public resources such as water (?). Despite increased awareness (?), government policies (?) and the work of social reformers (?), caste-based discrimination is still prevalent today. This form of discrimination greatly reduces social and economic mobility among these groups and increases inequality in society (?; ?). Caste discrimination is not limited to India: the Dalit diaspora in the US also faces similar verbal, physical and workplace discrimination, with one in two Dalit respondents reporting that they fear being “outed” (?).
One of the primary institutions that reinforces caste lines is arranged marriage (?), which is usually endogamous. Only 5% of marriages in India are intercaste marriages where the bride and groom belong to different caste groups (?). Couples engaging in intercaste marriage face social seclusion (?), loss of family support (?) and violence (?) even amounting to murder (?). Nevertheless, there is increasing support for intercaste marriages through NGO networks (?), activists (?) and government schemes that provide monetary support to encourage intercaste marriage. Therefore, precise measurement of changing attitudes on caste is urgently needed to highlight where increased pressure and incentives can be used to ultimately bring about an equitable society.
3 Matrimonial Profile Data
The search for a spouse has given rise to multiple online matrimonial sites that host millions of profiles. Crucially, unlike dating websites, matrimonial sites are specifically tailored for marriage and include language prohibiting their use for casual relationships. Like most matchmaking websites, individuals provide detailed information including photos, a free-form self description section, family details, age, location, caste, education level, income and current job. However, unlike Western matchmaking platforms, these sites allow others (parents, siblings, other relatives and friends) to post on behalf of a person, thereby facilitating the traditional arranged marriage process in an online setting (?). Critically, the platform directly asks individuals to indicate their openness to intercaste marriage, which we use as a proxy for their attitude toward caste.
Strong evidence indicates that the preferences expressed in these online matrimonial profiles accurately reflect offline personal attitudes in the Indian social context. Matrimonial sites in Indian communities facilitate, rather than replace, the traditional arranged marriage process, aiding the family or individual in selecting a spouse (?; ?). These matrimonial sites help make connections that would have been traditionally made by extended family members but are potentially more difficult to make in modernized India due to geographic mobility (?). Indeed, user interviews indicate that these sites provide individuals with greater agency over the desired traits in a spouse, compared to the traditional parent-driven selection process, and therefore the expressed preferences more closely mirror individual preferences (?).
Data Collection
Data was collected from Shaadi.com, a leading Indian matrimonial site, by creating a profile to access public profile information available on the website. This experimenter-created profile included minimal information using, whenever possible, default profile attributes and site-provided text content. Profiles included in the study were gathered by querying using the website’s profile search functionality with different combinations of age, gender, caste, and height. (Footnote 1: No private data, e.g., communication, IP addresses, contact details or other user-hidden data, was collected.) Each such query returned a maximum of 600 profiles. While the data collected is not considered as human subjects research by our Institutional Review Board (IRB), efforts to remove personally-identifiable information were taken. All occurrences of individuals’ names in the free-form self description were removed. Unique identifiers associated with each profile such as names and usernames (after being used for de-duplication) were removed. We discuss in detail the ethics behind collecting this dataset and performing research on it in Section 7.
In total, through this method, we collected 411,292 profiles. Most profiles collected were in English (99.2%), and the rest were filtered out using langdetect. Further, to guard against survival bias, where older profiles are potentially systematically different in attributes compared to newer profiles, we restrict our analysis to the profiles created from 2017 onward. Also, as our topic of interest is openness to intercaste marriage in India and among the Indian diaspora in the US, we restrict the profiles to those within the two countries. We filter out friend-posted profiles (about 1% of profiles) and collectively refer to profiles posted by a parent (53.5%), sibling (43.3%), or other relatives (3.2%) as family-posted profiles. We note that although a sibling may have posted the profile, parents are typically heavily involved throughout the arrangement process on these sites (?). In total, this filtering resulted in 313,308 profiles as shown in Table 1.
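The filtering steps above can be sketched as a per-profile predicate. This is only an illustration under stated assumptions: the record layout and the field names (`lang`, `created`, `country`, `posted_by`) are hypothetical, and the language tag is assumed to have been computed beforehand (for instance with langdetect).

```python
from datetime import date

# Hypothetical record layout; field names are illustrative, not the site's schema.
KEEP_COUNTRIES = {"India", "USA"}
FAMILY_POSTERS = {"parent", "sibling", "relative"}

def keep_profile(p):
    """Apply the four filters described in the text to one profile dict."""
    if p["lang"] != "en":                    # English-only (tag assumed precomputed)
        return False
    if p["created"] < date(2017, 1, 1):      # profiles created from 2017 onward
        return False
    if p["country"] not in KEEP_COUNTRIES:   # India and the US only
        return False
    if p["posted_by"] == "friend":           # friend-posted profiles are dropped
        return False
    return True

def poster_group(p):
    """Collapse parent, sibling and other relatives into one 'family' group."""
    return "family" if p["posted_by"] in FAMILY_POSTERS else "self"
```

Keeping the filter as a single predicate makes it easy to count how many profiles each condition removes, which is useful when reconciling the 411,292 collected profiles with the 313,308 retained ones.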
Caste is predominantly a Hindu phenomenon, with some scholars arguing that it is even intrinsic to Hinduism (for debate, see ? (?) and ? (?)). Therefore, we focus our research on individuals self-identifying as Hindu, which is the social context in which the effects of caste identity are strongest and most salient. The profiles matched 352 castes, which are mapped to one of the groups shown in Table 2 using the caste classification in the nationally representative Indian Human Development Survey (IHDS) (?). Estimating the proportion of individuals belonging to different caste categories in India is highly contentious (?), with the results on caste populations from the Socio Economic and Caste Census of 2011 (SECC) still yet to be released. Therefore, the population estimates for different caste categories in Table 2 are not definitive and must be treated accordingly. Broadly, relative to the available estimates of their national populations, SC and ST populations are underrepresented in our dataset, and Brahmin and OFC populations are over-represented, perhaps due to systemic economic disadvantages and differential levels of access to internet (?; ?). The results from our analysis are also more likely to reflect urban Indians’ attitudes on caste because of poor infrastructure and low Internet penetration in rural areas (?).
4 Openness to Intercaste Marriage
Caste remains a critical identity variable in present-day India (?; ?; ?). The caste of a potential spouse matters for many individuals, to the point of ruling out potential mates on the basis of this variable alone (?). What personal features influence whether an individual is open to intercaste marriage? Prior studies have suggested that increased openness is associated with older age and increased education (?), being in an urban area (?), and being a member of a lower-status caste (?). Yet, accurate measurement of these factors is significantly hindered by (i) cultural norms that vary substantially across states (?; ?; ?) and (ii) the social desirability response bias where individuals report the more socially-acceptable option (increased openness) while privately maintaining a different attitude (?; ?). Here, we mitigate both issues to analyze intercaste acceptability through large-scale analysis of the features of matrimonial profiles to ask two questions: (i) to what degree do caste attitudes differ between generations and (ii) what demographic factors are associated with increased openness to intercaste marriage.
Data
The website allows direct measurement of openness to intercaste marriage through its registration process where new users are instructed to indicate whether caste is an important factor for their spousal choice: “Not particular about my partner’s Caste/Sect (Caste No Bar).” Notably, this option defaults to being not open to intercaste marriage, underscoring the societal bias against intercaste marriage. While survey responses can suffer from the social desirability response bias, the direct impact of this choice on the person (i.e., marrying that potential spouse) suggests that public and private sentiments are fully aligned. Indeed, an individual’s choice directly affects how they are categorized and to whom they are visible in the websites. Further, this choice is a blanket choice towards all members of other castes and not a response to one particular person, making it correspond more closely to a general attitude.
These matrimonial profiles include multiple demographic attributes that are selected from a fixed set of options, shown in Table 3. Affluence is a self-reported measure of socioeconomic status; income is reported in ranges or as “decline-to-report,” which we separate into three categories: (i) below median, (ii) above median, and (iii) unknown. Although similar to affluence, income also serves as a measure of financial independence, which is closely linked to intercaste marriage. To control for the possible effects of how long an account has been shown on the site, we include the log of the number of days, with accounts older than two years being excluded from the study altogether. To focus solely on the attitudes of Indian residents, in this analysis, we only consider the 298,400 India-based profiles, of which 68.74% are self-posted.
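A minimal sketch of the three numeric transformations listed in Table 3 follows. The helper names are hypothetical, and several details are assumptions rather than choices stated in the text: the population standard deviation for the z-score, the natural logarithm for account age, a single numeric income value standing in for the reported range, and ties at the median counted as below median.

```python
import math
from statistics import mean, pstdev

def zscores(values):
    """Standardize ages; population standard deviation is an assumption."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def log_account_age(days):
    """Log of days on site; natural log is an assumption, and days must be > 0."""
    return math.log(days)

def income_bucket(income, median):
    """Three-way income coding; None stands for 'decline-to-report'."""
    if income is None:
        return "Not Answered"
    # Assumption: a value equal to the median is counted as below median.
    return "Above Median" if income > median else "Below Median"
```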
Methods
Inter-generational differences in acceptance of intercaste marriage are tested through two methods. First, we construct control and treatment cohorts of individuals to test for differences in openness using one-to-one almost exact matching (?), which is analogous to techniques like propensity score matching except that it uses direct category alignment. Individuals who self-post are randomly paired with an individual with identical demographics (based on Table 3) whose family had posted their profile. To increase the percentage of individuals with counterparts, we match individuals against others with a maximum age difference of 5 years and account age difference of 90 days. Owing to our large dataset size, 76.9% of family-posted profiles are matched with a corresponding self-posted profile (versus 46.2% with same-age matching, though we note that results described later are nearly identical). Due to almost exact matching, the difference in openness is expected to be due to the identity of the profile’s author, i.e., the person or their family.
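One way to picture the matching step is the greedy sketch below: profiles are bucketed on the categorical attributes, and each family-posted profile is paired at random with an unused self-posted profile from the same bucket that falls within the age and account-age tolerances. The field names are hypothetical, and the greedy random pairing is an assumption; the text does not say how ties or pairing order are handled.

```python
import random
from collections import defaultdict

# Hypothetical field names for the categorical attributes of Table 3.
EXACT_KEYS = ("gender", "first_marriage", "large_city", "education",
              "income", "parental_employment", "affluence", "caste")

def match_profiles(family, self_posted, max_age_gap=5, max_account_gap=90, seed=0):
    """Greedy one-to-one matching: exact on categorical keys, within tolerance
    on age (years) and account age (days). Returns (family, self) pairs."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for s in self_posted:
        buckets[tuple(s[k] for k in EXACT_KEYS)].append(s)
    order = list(family)
    rng.shuffle(order)                      # avoid order effects in greedy pairing
    pairs = []
    for f in order:
        pool = buckets[tuple(f[k] for k in EXACT_KEYS)]
        candidates = [i for i, s in enumerate(pool)
                      if abs(s["age"] - f["age"]) <= max_age_gap
                      and abs(s["account_days"] - f["account_days"]) <= max_account_gap]
        if candidates:
            # pop() removes the chosen profile, so each one is used at most once
            pairs.append((f, pool.pop(rng.choice(candidates))))
    return pairs
```

Openness rates can then be compared across the two sides of the returned pairs. A greedy pass does not necessarily maximize the number of matches, so the match rate from a sketch like this could differ from the reported 76.9%.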
Second, we analyze the demographic factors contributing to increased openness by constructing identical logistic regression models for both self- and family-posted profiles using the variables in Table 3. Past work on caste has shown large regional differences (?), and therefore, we include a random effect for the state that the user resides in. To identify changes in the influence of the interest variables in self-arranged and family-arranged model of marriages, we compare log odds coefficients of the two models using the following Z test (?):

$$Z = \frac{\beta_1 - \beta_2}{\sqrt{SE_{\beta_1}^2 + SE_{\beta_2}^2}}$$

where $\beta_1$ and $\beta_2$ are coefficients of interest variables from both models, and $SE_{\beta_1}$ and $SE_{\beta_2}$ are their standard errors.
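As a worked illustration of comparing one coefficient across the two models, the sketch below assumes the usual form of this test: the difference between the two coefficients divided by the square root of the sum of their squared standard errors, referred to a standard normal distribution. The function names are hypothetical, and the numbers in the usage note are made up for arithmetic only.

```python
import math

def coef_z(b1, se1, b2, se2):
    """Z statistic for the difference between two independent coefficients."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

def two_sided_p(z):
    """Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For instance, `coef_z(0.5, 0.3, 0.1, 0.4)` gives 0.4 / 0.5 = 0.8, which is far from conventional significance thresholds.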
Results and Discussion
Individuals are far more open to intercaste marriage than family members who write profiles for their counterparts with near identical demographics (23.94% vs 20.82%), as shown in Figure 1. Given that parents are heavily involved when accounts are posted by a family member, this result strongly suggests a substantial inter-generational difference in caste attitudes. However, the current data cannot fully rule out selection effects to establish age as the only cause of this difference; individuals who opt to write their own profiles could also be more open than their counterparts having a profile written for them, thus introducing the potential for selection bias to affect the magnitude of the difference. Nevertheless, our result points to a substantial attitude difference in the two populations.
The demographic factors associated with openness to intercaste marriage, shown in Figure 2, suggest that a lower social status, as a function of income, education, and affluence, is associated with less openness to intercaste marriage. Also, the coefficients for affluence and parents’ employment are statistically indistinguishable across the two models, suggesting caste attitudes associated with these demographics are robust across generations. However, caste itself deviates from the trend of higher status being more open. Notably, the Brahmin caste is least open to intercaste marriage, with an effect that dwarfs other demographic attributes. Together, these results identify how various intersections of identity influence attitudes toward caste, showing that demographic variables in addition to the caste hierarchy play an important role, and pointing to the need for holistic strategies for changing attitudes, not just those based on caste.
Comparing both models, we identify one important difference between self- and family-posted profiles related to education and openness. Higher education levels of the individual are positively correlated with openness in self-created profiles, whereas a person’s education level does not have a significant effect when the family creates their profile. This result suggests that family members of more-educated individuals are less supportive of their relative being in an intercaste marriage. This infantilization of adults results in moral policing and, consequently, less openness to intercaste marriage (?). However, as marriages move closer to individual choice, higher levels of education may help break caste barriers.
Our result showing that the education of the self-poster plays a significant role in their openness directly contrasts with the result of ? (?), who found that only the education of the husband’s mother mattered but not the education of either spouse. We attribute this difference to the fact that our analysis is able to separate out who is initiating the search, though the family is likely to be heavily involved in the process for both (?, p. 352). However, our result on the impact of the mother’s employment closely mirrors their result on the mother’s education (both associated with increases in openness), given that employment and education are associated (?; ?). Though our analyses are on slightly different data (preferences on matrimonial profiles vs. married couples), our finding presents an important insight for its implications on policy, suggesting that initiatives increasing education can potentially reduce the importance of caste in spouse selection.
Examining trends across caste identities, the relative differences in openness between castes were consistent across the models, with individuals identifying as intercaste being most open to intercaste marriage, in spite of likely being witness to and victim of their parents’ ostracism by society (?). Further, we find that in both models, openness to intercaste marriage follows the same order: Intercaste > OFC > SC > OBC > ST > Brahmin. Brahmins, who occupy the highest rung in the social hierarchy, are least open to intercaste marriage. Scheduled tribes are also resistant to intercaste marriage, which runs contrary to previous anthropological evidence (?). Considering the small sample size and the recent Hinduization of tribal identity (?), this phenomenon should be explored further with more qualitative research.
Regional variations
Given known cultural differences across states (?), when controlling for all other demographic factors, to what degree do states vary in their openness? To answer this question, the states’ random effect coefficients from the self-posted model are mapped and shown in Figure 3 (left), which reveals strong regional trends: Northern states are substantially more open to intercaste marriage than Southern states. (Footnote 2: The map derived from family-posted accounts is highly similar and not shown here.) Further, considering that the Southern states consistently rank higher in gender equity (?) and socioeconomic levels (?), variables that are associated with higher openness in our model, we would expect the states to rank higher in openness to intercaste marriage as well. This trend is likely due to strong cultural norms around cross-cousin marriages (?) and has also been observed in previous nationally representative studies (?). Thus, our results suggest that while the prima facie expectations of openness in Southern states might be higher, when controlling for their demographic and social differences, our results show the opposite cultural norms. Compared to nationally representative percentages of actual intercaste marriages shown in Figure 3 (right), we also find substantial differences between the estimated national trends of openness and actual rates of intercaste marriage. Our result provides critical information on intercaste attitudes as relatively little information on openness to intercaste marriage is available; instead, most research relies on actual incidences of intercaste marriages, which we show differs substantially and could skew conclusions related to social inclusion.
5 Attitude Shifts in Desirability
Questions about modernization and its relation to cultural norms have long been contentiously debated among scholars (?): Do traditional norms persist or evolve in the face of modernization? Increased economic independence and changing family dynamics in a collectivist society have been argued to lead to a more individualistic society (?). In India, the effect of globalization and urbanization has been associated with an evolution of marriage and family roles (?). Given evidence of inter-generational differences in caste attitudes, does increased openness to intercaste marriage among self-posting individuals point to a shift in the attributes that are desired in a spouse, reflecting a broader shift away from caste as an important identity variable? Here, we examine profiles’ text content to test for systematic shifts in spouse-seeking behavior.
Data
Many profiles include at least one statement about their desired qualities in a spouse due to a prompt from the website to include a description of who they are looking for. We use a regular expression to extract statements about the desired attributes of the spouse as follows. Profiles are parsed into sentences and then filtered such that a sentence must contain a partner word and a search verb; here, search-related verbs are seek, need, search, look, looking, want, prefer, desire, expect and wish; and partner-related words include person, hubby, spouse, individual, soul mate, soulmate, partner, someone, some one, alliance, match, prospective and prospect. To streamline profile creation, the website provides auto-generated descriptions that users can copy and use, e.g., “I am seeking someone who will be a great partner in my journey of life.” To remove possible effects from auto-generated text, we filter out from the analysis any sentence repeated verbatim over 10 times. Through this approach we identified sentences containing partner preferences from 55,520 profiles, with a median length of 73 characters. Table 4 lists paraphrased examples of extracted sentences.
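The extraction procedure above can be sketched roughly as follows. The word lists are taken from the text; everything else is an assumption made for illustration, including the naive punctuation-based sentence splitter, the choice to accept inflected verb forms such as "seeking" via a `\w*` suffix, and the function names.

```python
import re
from collections import Counter

# Word lists copied from the text; accepting verb inflections via \w* is an assumption.
SEARCH_RE = re.compile(
    r"\b(?:seek|need|search|look|looking|want|prefer|desire|expect|wish)\w*", re.I)
PARTNER_RE = re.compile(
    r"\b(?:person|hubby|spouse|individual|soul mate|soulmate|partner|someone|"
    r"some one|alliance|match|prospective|prospect)\b", re.I)

def split_sentences(text):
    """Naive splitter on ., ! and ?; a real pipeline would likely use an NLP tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def preference_sentences(profiles, max_repeats=10):
    """Per profile, keep sentences with a search verb and a partner word, then
    drop any sentence occurring verbatim more than max_repeats times overall."""
    per_profile = [
        [s for s in split_sentences(text)
         if SEARCH_RE.search(s) and PARTNER_RE.search(s)]
        for text in profiles
    ]
    counts = Counter(s for sents in per_profile for s in sents)
    return [[s for s in sents if counts[s] <= max_repeats] for sents in per_profile]
```

Counting repeats across the whole corpus, rather than per profile, is what lets the filter catch site-provided template sentences that many users copy unchanged.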
Methods
To test for differences in spousal preference, we mirror prior work in testing for mentions of specific categories such as appearance (?). Past work in economics on dowry determinants suggested that education, height, and older men were preferred (?). Other evolutionary psychology work suggests that parents look for good character, family background, similar social status, wealth, health and chastity (?). Work on partner preferences in India highlighted the importance of kindness and understanding, health, mutual attraction, education and intelligence for males and females (?). Men preferred young, physically attractive and housework-oriented partners while women preferred attributes such as high social status, education, income and ambition (?; ?). Using these theory-based themes as reference, two annotators independently classified profile words into theory-driven categories. Manual coding was done to account for emerging themes such as “location” which were not predicted by past research. After the first round of coding, discussions on word classifications and new themes were conducted before recoding, reaching high agreement with a Krippendorff’s α of 0.91. Categories and example words are shown in Table 5. These words are matched to the extracted preference statements in self-posted profiles.
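Matching category words to preference statements can be pictured with the small sketch below. The lexicon holds only the example words from Table 5 for four of the categories, since the full annotated word lists are not given here; the tokenizer and the function names are assumptions for illustration.

```python
import re

# Example words from Table 5 only; the full annotated lexicon is not reproduced here.
LEXICON = {
    "Family": {"parents", "mom", "families"},
    "Personality": {"loyalty", "honesty", "funny"},
    "Caste": {"community", "brahmin", "iyer"},
    "Profession": {"profession", "work", "job"},
}

def categories_in(sentence):
    """Return the set of categories with at least one lexicon word in the sentence."""
    tokens = set(re.findall(r"[a-z\-]+", sentence.lower()))
    return {cat for cat, words in LEXICON.items() if tokens & words}

def mention_rate(sentences, category):
    """Fraction of sentences that mention the given category at least once."""
    hits = sum(category in categories_in(s) for s in sentences)
    return hits / len(sentences)
```

Computing `mention_rate` separately for statements from profiles that are open and not open to intercaste marriage yields the kind of per-category comparison described in the results.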
Results and Discussion
Both individuals who are open to intercaste marriage and those who are not have highly similar preferences in the attributes they look for in a spouse, as shown in Figure 4. This similarity implies that individuals open to intercaste marriage are no less choosy in their spousal choice. However, the relative mentions of Profession, Personality, Caste and Family significantly differed between the groups (p < 0.01), with the largest differences in the two most frequently mentioned categories, Family and Personality. Specifically, we find that individuals open to intercaste marriage use a higher frequency of words related to the potential spouse’s personal qualities, whereas individuals who are not open use words more related to the social context of the spouse (caste and family). That individuals state their preferences in terms of personal qualities rather than caste or family background is a step forward in caste attitudes. A lower emphasis on family background of the prospective spouse suggests a turn towards a more individualistic concept of marriage, moving away from the traditional notion of marriage being a union of two families. Our large-scale data-driven results indicating a shift from family to personality provide further grounding to the qualitative findings from in-person interviews of individuals’ preferences when seeking spouses on these sites (?).
6 Openness to Caste in Diaspora
Economic and social mobility have led to substantial populations of Indians living abroad, both as immigrants and as citizens. Assimilation and modernization theories suggest that when immigrants move to more individualistic societies, they adopt individualism and the family plays a smaller role in individuals’ lives (?; ?). However, the bi-cultural integration model of acculturation (?) contends that migrants retain cultural values in private settings while inculcating the host culture in professional settings. Using over 14,000 matrimonial profiles of the Indian diaspora living in the US, we assess whether attitudes and demographic factors associated with openness persist across cultural settings.
This study is rooted in a series of anthropological investigations of whether immigrants carry caste to their adopted lands. Studies on the Indian diaspora in the Caribbean islands report a decline and elimination of caste consciousness (?). However, similar research in the US demonstrates that caste and casteism continue to be prevalent among the Indian diaspora (?; ?). A crucial difference between the groups in the Caribbean and the US is that the former were low-caste indentured workers with little cultural capital (?), while immigrants in the US are mostly highly skilled professionals such as engineers and doctors. Thus, understanding factors that influence intercaste marriage in the US provides insight into how traditional caste boundaries are erased or reinforced in a relatively privileged minority community within a larger western context.
Data and Methods
We analyze data from 14,908 matrimonial profiles created by US-based users. First, we compare US-based individuals to their Indian counterparts on openness to intercaste marriage. Then, we examine how demographic factors influence their openness to intercaste marriage. US and Indian profiles are compared using the one-to-one almost exact matching technique outlined in Section 4, matching users based on all demographic attributes in Table 3. Of the US-based individuals, 89.6% are matched with a random counterpart in India with identical demographics.
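The matching step can be illustrated with a minimal sketch. It implements plain one-to-one exact matching only; whatever relaxation makes the technique in Section 4 "almost" exact is not reproduced here, and the field names (`age`, `caste`, `id`) are hypothetical placeholders for the attributes in Table 3.

```python
import random
from collections import defaultdict


def match_exact(us_profiles, india_profiles, keys, seed=0):
    """Pair each US profile with a random, unused India profile whose
    values for every attribute in `keys` are identical."""
    rng = random.Random(seed)

    # Group India-based profiles by their demographic signature.
    pool = defaultdict(list)
    for profile in india_profiles:
        pool[tuple(profile[k] for k in keys)].append(profile)
    for candidates in pool.values():
        rng.shuffle(candidates)  # makes the chosen counterpart random

    pairs = []
    for profile in us_profiles:
        candidates = pool.get(tuple(profile[k] for k in keys))
        if candidates:
            # pop() removes the counterpart, keeping the matching one-to-one.
            pairs.append((profile, candidates.pop()))
    return pairs


# Hypothetical toy profiles.
us = [
    {"id": "u1", "age": 30, "caste": "OBC"},
    {"id": "u2", "age": 30, "caste": "OBC"},
    {"id": "u3", "age": 45, "caste": "Brahmin"},
]
india = [
    {"id": "i1", "age": 30, "caste": "OBC"},
    {"id": "i2", "age": 25, "caste": "OBC"},
]
pairs = match_exact(us, india, keys=["age", "caste"])
match_rate = len(pairs) / len(us)
```

US profiles with no identical counterpart left in the pool are dropped, which is why the match rate can fall below 100%.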
We repeat a similar setup as in Section 4, fitting a random effects logistic regression model with openness as the dependent variable. Here, we include the US state as a random effect and include an additional fixed effect for whether the person was raised in the US. Because fewer profiles are available for this analysis, we train a single model using both self- and family-posted profiles and include a fixed effect for whether the profile is self-posted. Because the results from Section 4 showed that the factors affecting openness in self- and family-posted profiles are similar, we do not expect that using a single model will affect validity.
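One way to write down the model described above is as a random-intercept logistic regression. The notation is our own illustration and is not taken from the paper: $y_{is}$ is the openness indicator for profile $i$ in US state $s$, $\mathbf{x}_{is}$ collects the demographic covariates carried over from Section 4, and the normal distribution for the state effect is the conventional assumption rather than something the text states.

```latex
\operatorname{logit}\,\Pr(y_{is} = 1)
  = \beta_0
  + \boldsymbol{\beta}^{\top}\mathbf{x}_{is}
  + \gamma_{1}\,\mathrm{USraised}_{is}
  + \gamma_{2}\,\mathrm{SelfPosted}_{is}
  + u_{s},
\qquad u_{s} \sim \mathcal{N}(0, \sigma_u^{2})
```

Here $\gamma_1$ and $\gamma_2$ are the two additional fixed effects and $u_s$ is the per-state random intercept, so a positive $\gamma_1$ would correspond to higher openness among US-raised individuals.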
Results and Discussion
Comparing the matched profiles in Figure 5, we find that individuals in the US are much less open to intercaste marriage than individuals with identical demographics in India (13.69% vs 22.91%). This result supports theory from social psychology regarding South Asian immigrants that suggests immigrants construct an essentialist notion of ethnic identity for self protection, which valorizes their culture and, in this circumstance, preserves caste salience (?).
The demographic aspects associated with increased openness to intercaste marriage in US diaspora, shown in Figure 6, largely mirror those of India-based profiles (cf. Figure 2). This result matches the expectations of the bi-cultural integration model of acculturation (?) in which caste preferences are maintained in the more private setting of Indian-specific matrimonial sites, relative to the larger cultural (American) attitudes about caste. However, compared with Indian immigrants, US-raised Indians are more open to intercaste marriage, which supports the modernization theory that individuals will adopt aspects of the surrounding environment. Modernization theory notwithstanding, we find that the positive openness effect from being US raised is dwarfed by the resistance based on status as a function of income and the caste hierarchy—underscoring the importance of recognizing intersectional identity in understanding caste attitudes.
Similar to trends in India, the demographics of self-posting, age, residing in a large city, more education, and higher income—all signals of higher social status—are positively linked to openness. Contrary to trends in Section 4, individuals are more open to intercaste marriage during their first marriage, which should be more deeply explored. We speculate that individuals looking to get remarried may have married outside caste or ethnicity in the previous marriage, and are looking to seek in-group affinity within their own community. However, more data on their previous marriage is required to make definitive conclusions.
Openness and Opportunity
Indian diaspora have settled throughout the US within varying sizes of communities. Prior studies on mate selection have found that with increased access to potential spouses, individuals become more selective (?). However, prior sociological studies have also shown that urbanness is associated with increased tolerance for others of different races and backgrounds (?). Given increased access to a larger pool of potential spouses who may also belong to the same caste, are individuals less open to intercaste marriage (i.e., more selective)?
To test the hypothesized relationship between openness and opportunity for marriage, we map the locations for all profiles and compute the number of profiles within a 50-mile radius as a proxy for potential matrimonial opportunity. The number of profiles correlates highly with the nearby urban population (correlation of 0.90) and therefore also serves as a proxy for urbanness. A logistic regression is then fit to predict a profile’s openness from the number of profiles. The results, shown in Figure 7, reveal an increase in openness as the number of potential spouses increases. This result suggests that increasingly urban settings are associated with less discriminatory attitudes towards caste. However, we note that this evaluation is insufficient to establish causality; indeed, the effect could be due to a selection bias where more open individuals move towards urban areas. Nevertheless, the results do indicate that despite increased opportunities to marry within their own caste, individuals in larger cities increasingly choose to be open to intercaste marriages.
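The opportunity proxy can be sketched as a neighbor count over geocoded profiles. This is an illustrative sketch, not the authors’ pipeline: it assumes each profile has already been mapped to a (latitude, longitude) pair, uses great-circle (haversine) distance, and excludes the profile itself from its own count; the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def neighbor_counts(coords, radius_miles=50.0):
    """For each profile, count the other profiles within `radius_miles`."""
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if haversine_miles(*coords[i], *coords[j]) <= radius_miles:
                counts[i] += 1
                counts[j] += 1
    return counts


# Hypothetical profile locations: two nearby points and one distant point.
coords = [(40.0, -75.0), (40.1, -75.0), (45.0, -75.0)]
counts = neighbor_counts(coords)
```

The pairwise loop is quadratic in the number of profiles, so at the scale of the full dataset a spatial index would be a more practical choice. The resulting counts would then serve as the single predictor in the logistic regression on openness.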
7 Ethical Considerations
The use of matrimonial profile data in research warrants a discussion of the ethical implications and decisions made. The approach taken in this study is especially influenced by recent work around ethical research using social media data (?; ?; ?). In particular, precautions were taken to ensure the design, decision-making, data collection, and handling were done in an ethical manner according to recommended best practices and guidelines (?; ?; ?). Here, we discuss three key aspects: (1) whether this data is public, (2) data and privacy protection practices, and (3) the overall risk-benefit trade-off in analyzing the data.
Public or Private
Research using public information on the internet is typically not considered human subjects research (?). The IRB at the authors’ home institution of the University of Michigan also verbally confirmed that they do not consider work on this data to be human subjects research because all the information analyzed is public. However, the IRB is primarily concerned with legal and institutional regulations, and therefore, we consider the ethical implications of considering this data to be public. In their discussion of the process for deciding whether using particular data is ethical, ? (?) highlight a case study on data from a dating app that is a close analog to our study. Here and in the case study, individuals posting the data have the expectation that strangers will be viewing their profiles, and therefore we consider the data to be public in nature. However, as users do not expect their profiles to be publicly indexed and archived outside the site, releasing the data may violate their contextual expectations of privacy (?). Also, previous releases of anonymized data have been de-anonymized, putting users at possible risk (?). Therefore, we report only paraphrased examples and, though it limits reproducibility, opt not to release the data.
Data Privacy and Protection
Although we consider the data to be public, substantial precautions were taken during the data collection process to ensure that no personally identifiable information was retained. We collect only the minimal data required to study the relationship of personal attributes associated with caste discrimination (e.g., education, caste identity). This process intentionally excludes data such as photos, contact details, their interactions with other users, or users’ listed preference in other partner attributes, apart from openness to intercaste marriage. Once the data collection process was finished, all unique identifiers associated with each profile such as names and usernames (after being used for de-duplication) were removed. The fully-anonymized profiles are then stored on an encrypted hard disk accessible only to the researchers involved in the study. Although our data collection process required the creation of a new account, no communication or contact was made with any users on the site before, during, or after collection.
Risk-Benefit Analysis
In critically examining this project, we aim to minimize potential harm while maximizing the societal benefits of our research towards understanding casteism. Casteism is a serious and widespread form of social discrimination that results in major emotional, financial, and, at times, physical harm to those experiencing it, as detailed in Section 2. Our research provides valuable insights into where efforts to mitigate these discriminatory attitudes might be best placed; for example, geographic findings (Figure 3) show a stark difference in geographic trends in our data when compared to the limited data on rates of intercaste marriage, which point to where governmental and NGO efforts might be better addressed. As another example, while ? (?) found no association between discriminatory attitudes and education levels of the married couple, our regression analysis (Figure 2) shows that increased education is in fact associated with decreased discriminatory attitudes in self-posted profiles. Our research now identifies education as a potential long-term intervention strategy to change cultural attitudes in certain situations.
The largest risk from this research is loss of individuals’ privacy. As a result, we have attempted to mitigate privacy concerns by anonymizing the collected data and by our decision to not share the data. We report only general trends—indeed, these general trends are what are most critical for directing efforts towards removing the scourge of casteism.
Finally, while the website’s Terms of Service (TOS) prohibit crawling, our choice to disregard these terms was again motivated by a harm-benefit analysis. Our crawler issued limited queries that were appropriately spaced to have minimal impact on the site and our data privacy practices minimize the potential privacy risks to users. Our research from the resulting data presents an unprecedented view into caste attitudes that is otherwise difficult to measure and thus provides a substantial benefit towards understanding caste attitudes and their prevalence. This harm-benefit reasoning is analogous to those used by audit studies studying discrimination on platforms whose TOS prohibit such activities (?; ?).
8 Conclusion
Caste-based discriminatory attitudes currently stigmatize millions of individuals, denying them equal access to employment, education, and even basic human rights. Governmental incentives and social movements have attempted to counter these attitudes, yet accurate measurements of public opinions on caste are unavailable despite being essential for understanding whether progress is being made. Here, we introduce a novel approach to measuring public attitudes of caste through an indicator variable: openness to intercaste marriage. Using a dataset of over 313K profiles from a major Indian matrimonial site, we precisely quantify attitudes on intercaste marriage, along with differences between generations and between Indian residents and diaspora.
Our work provides the following three main contributions towards understanding attitudes towards caste. First, we show attitudes are changing between generations, with younger individuals being more open to intercaste marriage and, in a holistic analysis of identity, show that lower social status as a function of multiple factors (e.g. income, education) is predictive of decreased openness to intercaste marriage. Further, we provide the first large-scale measurement of openness to intercaste marriage, showing that attitudes are not well aligned with incidences of intercaste marriage. Second, we uncover signs of cultural shift towards individualism by examining the desired qualities in a spouse: as attitudes become more liberal towards intercaste marriage, less emphasis is on the familial aspects of a spouse and, instead, more emphasis is on their individual personality. Finally, we find that Indian diaspora are substantially less open to intercaste marriage. While some selection bias exists for diasporic individuals seeking a spouse on an Indian matrimonial site, even when controlling for where they grew up, our results support the bi-cultural theory of integration (?) where individuals adopt the norms of the host country in public settings while maintaining their own culture’s norms in private settings. Our research provides the first empirical evidence identifying how various intersections of identity shape attitudes toward intercaste marriage in India and among the Indian diaspora in the US, where caste hierarchy, transnational social location, social status, and other demographic variables all play an important role.
Acknowledgements
The authors thank Ceren Budak, Paul Resnick, and the Computational Social Science seminar at UMSI for their helpful feedback on earlier drafts of this work.
References
- [Adur and Narayan 2017] Adur, S. M., and Narayan, A. 2017. Stories of dalit diaspora: Migration, life narratives, and caste in the us. Biography 40(1):244–264.
- [Agrawal, Agrawal, and Aggarwal 1991] Agrawal, S. P.; Agrawal, S.; and Aggarwal, J. 1991. Educational and Social Uplift of Backward Classes: At what Cost and How?: Mandal Commission and After, volume 26. Concept Publishing Company.
- [Agrawal 2015] Agrawal, A. 2015. Cyber-matchmaking among indians: Re-arranging marriage and doing ‘kin work’. South Asian Popular Culture 13(1):15–30.
- [Ahuja and Ostermann 2016] Ahuja, A., and Ostermann, S. L. 2016. Crossing caste boundaries in the modern indian marriage market. Studies in Comparative International Development 51(3):365–387.
- [Allendorf and Pandian 2016] Allendorf, K., and Pandian, R. K. 2016. The decline of arranged marriage? marital change and continuity in india. Population and development review 42(3):435–464.
- [Allendorf 2013] Allendorf, K. 2013. Schemas of marital change: From arranged marriages to eloping for love. Journal of Marriage and Family 75(2):453–469.
- [Ambedkar 2004] Ambedkar, B. R. 2004. Castes in india: Their mechanism, genesis and development. Readings in Indian Government And Politics Class, Caste, Gender 131–53.
- [Ambedkar 2014] Ambedkar, B. R. 2014. Annihilation of caste: The annotated critical edition. Verso Books.
