
Finding Articles in Other Languages

Finding articles in specific languages is very similar to finding articles in English, but with some key changes:

  • Use search terms in your target language. For example, to find articles about climate change in Spanish, try cambio climático, calentamiento global, contaminación del aire, or gases de invernadero.
  • Limit your search results to your target language using the database's filters, if it has them. These filters might appear at the beginning of your search, on the advanced search page, or on the results page. See the examples below. The example on the left is from JSTOR, showing the language filter on its advanced search page. The example on the right is from OneSearch, showing the language filter, which is available in the All Filters menu on the results page.

Language limits in JSTOR

NOTE: Our databases don't translate articles. These search tips can only find articles that were originally written in your target language.

In addition to OneSearch (see below), we recommend the following databases for finding articles in foreign languages.

Partially peer reviewed. Some full-text content.

Free Web Resource

Partially peer reviewed. Some full-text content. Open access.

Peer reviewed. Some full-text content.


To use OneSearch, go to the UVU Fulton Library Homepage link below. OneSearch is the main search box on the page. Enter search terms into the box, then press Enter or click the magnifying glass. Once your search runs, you can filter your results by clicking the All Filters button that appears below the search box.

  • UVU Fulton Library Homepage: The library's main page. Access OneSearch, help options, library hours, and more.

Library Help

  • Call: 801.863.8840
  • Text: 801.290.8123
  • In-Person Help
  • Email a Librarian
  • Make an Appointment
  • Last Updated: Apr 30, 2024 1:44 PM
  • URL: https://uvu.libguides.com/languages-guide


  • Review Article
  • Open access
  • Published: 08 March 2023

Changing perceptions of language in sociolinguistics

  • Jiayu Wang
  • Guangyu Jin
  • Wenhua Li

Humanities and Social Sciences Communications volume 10, Article number: 91 (2023)


This paper traces the changing perceptions of language in sociolinguistics. These perceptions are reviewed in terms of language in its verbal forms, and language in, vis-à-vis language as, a multimodal construct. In reviewing these changing perceptions, this paper examines different concepts and approaches in sociolinguistics. By reviewing these trends of thought and their applications, this article intends to shed light on ontological issues such as what constitutes language and where its place is in multimodal practices in sociolinguistics. Expanding the ontology of language from verbal resources toward various multimodal constructs has enabled sociolinguists to pursue meaning-making, indexicalities and social variation in their most authentic state. Language in a multimodal construct presupposes boundaries and distinctions between various modes, while language as a multimodal construct sees language itself as multimodal; the latter focuses on social constructs, social meaning and language as a force in social change rather than on the combination or orchestration of various modes in communication. Language as a multimodal construct has become the dominant trend in contemporary sociolinguistic studies.


Introduction

This article reviews a range of sociolinguistic concepts and their applications in multimodal studies, in relation to how language has been conceptualized in sociolinguistics. While there are reviews of specific areas of research in sociolinguistics, including prosody and sociolinguistic variation (Holliday, 2021), language and masculinities (Lawson, 2020), and language change across the lifespan (Sankoff, 2018), few reviews have set out to delineate the most fundamental ontological questions in sociolinguistic studies: what is language, and what constitutes it? How do sociolinguists perceive language in relation to other semiotic resources that are part and parcel of social meaning-making and social interaction? Relevant discussions are scattered, appearing mainly in passing in the introductory sections of various sociolinguistic works, such as Blommaert (1999), García and Li (2014) and Makoni and Pennycook (2005). However, no review article has yet dealt systematically with the changing perceptions of language in sociolinguistic studies.

These issues are worth pursuing because, although sociolinguistics studies language, no review has addressed what exactly constitutes language, especially in relation to a wider range of semiotic resources. The review is made even more imperative by the fact that, in an increasingly globalized and high-tech world, linguistic practices are complicated by the super-diversity of ethnic fluidity, communication technologies, and globalized cross-cultural art.

Centring on the ontological perception of language in sociolinguistics, this article consists of five sections. Following this introduction, the second section reviews traditional (socio)linguistic perceptions of language as written or spoken signs or symbols that people use to communicate or interact with each other. The third section reviews representative sociolinguistic approaches that place language in multimodal settings involving the relationship between language and other semiotic resources. These approaches are categorized as conceptualizations of “language in multimodal construct” and “language as multimodal construct”. Both share the feature that language is not researched merely in terms of written and spoken signs and symbols; instead, it is probed either (1) in relation to its multimodal contexts and (re)contextualization (language in multimodal construct), or (2) in terms of its own materiality and spatiality and its linguistic representations of multimodality, for instance, social (inter)action and “smellscapes” (Pennycook and Otsuji, 2015a), which are in turn conflated with linguistic features (language as multimodal construct). The fourth and fifth sections present a critical reflection and a conclusion, respectively.

Language as written and spoken signs and symbols

What constitutes language(s)? Saussure (1916) distinguishes between langue and parole. The former refers to the abstract, systematic rules and conventions of the signifying system, while the latter represents language in daily use. Chomsky (1965) draws a comparable distinction between competence (corresponding to langue) and performance (corresponding to parole). Chomsky (1965) assumes that performance is bound up with “grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of this language in actual performance” (Chomsky, 1965, pp. 3–4). He argues that linguistics should study the competence of “an ideal speaker-listener, in a completely homogeneous speech-community, who knows its (the speech community’s) language perfectly” (in brackets original). His conception of ideal language rules out the “imperfections” arising from the influence of social or pragmatic dimensions in real language use. This can be seen as a conception of language as innate human competence. By contrast, constructionists have argued that language cannot be separated from the societal and social domain, that social reality is constructed through language (Berger and Luckmann, 1966), and that linguistics should take social dimensions into account, as exemplified by Halliday’s Systemic Functional Linguistics. These approaches to language studies, nevertheless, pay little attention to ontological questions such as what constitutes language, whether languages can be separated from each other, and whether there are different conceptions of language(s).

Sociolinguistics, whose point of departure is an interdisciplinary attempt to be either a sociology of linguistic issues or a linguistics of sociological issues, faces an ambivalence over whether it should be sociologically oriented (that is, more explanatory) or linguistically oriented (that is, more descriptive) (Cameron, 1990). There are also contentions over whether more attention should be paid to epistemically linguistic minutiae (as in conversation analysis, or CA) or to the macro-social interpretation of ideology not necessarily dependent on the evident orientation of the participants (as in critical discourse analysis, or CDA), as debated in Blommaert (2005) and Schegloff (1992, 1998a/1998b, 1999). As such, sociolinguists, more than linguists in other disciplines, are concerned with the ontology of language regarding its nature and its relation to broader social structures. Such concerns can, firstly, justify the identity of sociolinguistics as a branch of sociology, of linguistics, or, more broadly, of anthropology. They can also delineate the contours of macro vis-à-vis micro research subjects: are languages seen as separate systems, as inseparable but relatively fixed systems, or as an integrated construction in relation to their social dimensions of power, ideology and hegemony?

Such ontological concerns are important because they may engender different approaches to research. For instance, variational sociolinguistics is concerned with linguistic differences within a language (a standard language vis-à-vis its dialectal variations) and examines how these differences are linked to social aspects of linguistic practices, such as gender and social status. These differences within a given language may be placed in the changing situations of various language communities or areas (e.g., Labov, 1963, 1966), or in contextualized pragmatic situations (Agha, 2003; Eckert, 2008). Assumptions of separable or separate languages are well encapsulated in works on language ideology and linguistic differentiation, such as the studies by Kroskrity (1998) and Irvine and Gal (2000), as well as in considerable other work on bilingualism or multilingualism. These works treat language as belonging to different standard systems (e.g., English, French, German, and so on) and pursue research by “enumerating” these categories. In other words, these standard language systems are seen as having clear boundaries between them, and language can be researched by attributing different linguistic resources to (one of) these systems. The stance that languages are inseparable problematizes such enumeration by questioning its explanatory potential for linguistic practices. In pedagogical contexts, transnational students are found using language features beyond the boundaries of language systems (Creese and Blackledge, 2010; Lewis et al., 2012). In the context of youth or urban culture, assumptions linking language and ethnicity are only loosely fixed (Maher, 2005; Woolard, 1999). In some globalized contexts, new communication technologies as well as globalization itself are changing the traditional power structures in linguistic practices (Jacquemet, 2005; Jørgensen, 2008; Jørgensen et al., 2011).
Furthermore, Makoni and Pennycook (2005), by advocating the disinvention of languages, problematize the “historical amnesia” (Makoni and Pennycook, 2005, p. 149) of bi- and multilingualism studies and their tradition of enumerating languages, which reduces sociolinguistics to at best a “pluralization of monolingualism” (Makoni and Pennycook, 2005, p. 148). However, this does not mean that languages cannot be probed as standard categories. The stance is more intricate: on the one hand, it problematizes the separation of languages, as language is characterized by fluidity in multi-ethnic settings; on the other hand, it assumes the fixity of the relationship between a given (standard) language and its corresponding identity, ethnicity, and other societal factors (Otsuji and Pennycook, 2010). Fluidity and fixity, however, are not mutually exclusive binary attributes; they coexist and influence each other in real-life linguistic practices. By the same token, Blackledge and Creese (2010) and Martin-Jones et al. (2012) hold a dynamic view of language and identity: while language functions as “heritage” (see Blackledge and Creese, 2010, pp. 164–180) and as a means of positioning or maintaining national identity, this bond frequently loosens, as it is always contested, resisted and “disinvented” (Makoni and Pennycook, 2005). Table 1 illustrates three kinds of sociolinguistic conceptualizations of language.

The above discussion briefly delineates how contemporary sociolinguistic studies attempt to capture the complex ways in which the notion of language is construed, resisted or reinvented in and through practices. Most of these approaches rest on the traditional assumption that language consists of written and spoken signs and symbols in their verbal forms. Other forms of resources are generally seen as the contexts in which these verbal signs and symbols occur. They are contextual facets that contribute to the ideological and sociological corollaries of language use, but they are not seen as ontological components of language. Later developments, which integrate multimodal studies into sociolinguistics, show differing stances regarding the ontology of language, as discussed in the next section.

Language in vis-à-vis as multimodal construct

Jewitt (2013, p. 141) defines multimodality as “an inter-disciplinary approach that understands communication and representation to be more than about language”. This can be seen as a definition oriented toward social semiotics, in which different semiotic resources are seen as various modes of representation or communication through semiosis. For a sociolinguistic version of the definition, we prefer to interpret multimodality in terms of language in, vis-à-vis language as, a multimodal construct. By using the word “construct”, we would like to point out that multimodality and multimodal conventions enter into sociolinguistic studies because they are socially constructed; that is, sociolinguists research these multimodal dimensions because they are semiotic resources and practices constructed by social subjects with power, manipulation and ideology. They are not neutral resources by which people communicate information or by which meaning-making, or semiosis, is realized. Instead, they are a social construct that constitutes the type of Foucauldian knowledge in which sociological power and ideology lie at the core. In this sense, the notions, frameworks, and approaches discussed below are socially critical in nature and are predominantly related to socially constructed ideologies such as hegemony, power, and identity. As Makoni and Pennycook (2005) note, languages have historically been “invented” by dominant (colonial) groups through classification and naming; they are not neutral practices but are constructed and invested with ideologies, power and inequality. Sociolinguistics thus needs a historically critical perspective. In fact, since its birth, sociolinguistics has been a discipline focusing on language use in relation to socially critical issues, such as gender, race, class and politics.
This focus dates back at least to Labov’s (1963, 1966) ethnographical research on varieties of English on the island of Martha’s Vineyard, Massachusetts, and in New York City, in which sound change and phonetic features were studied in relation to ethnicity, social stratification and class. Agha (2003) and Eckert (2008) also probe phonetic features and regional variation in relation to ethnicity and social and economic status.

In fact, the above-mentioned concerns of sociolinguistics are also consistent with CDA (see Wang and Jin, 2022; Wang and Yang, 2022), especially multimodal critical discourse analysis (MCDA), which also contributes to the research trend of studying language in multimodality. Kress and van Leeuwen (1996) postulate a visual grammar based on systemic functional grammar. Machin (2016), Machin and Mayr (2012) and other scholars have also applied MCDA to various types of discourse. Semiotic resources other than language are analysed to reveal the social construction of power, ideology, and inequality in relation to verbal resources (Wang, 2014, 2016a, 2016b). Language in multimodal construct in sociolinguistics is quite similar to the social semiotic and critical discourse approaches to multimodality: language is seen as one type of resource among other, non-language resources (visual, aural, embodied, and spatial) in the meaning-making process. The difference is that sociolinguistic approaches to language in multimodality focus much more on social interaction, power and ideology, and their research frequently includes ethnographical data and observations. Language as a multimodal construct, by contrast, sees language as a more integral part of multimodal resources, and vice versa; the boundaries between language and non-language are seen as less distinct. These two trends are discussed below.

Language in multimodal construct

Placing language studies in a multimodal construct is not a new practice in sociolinguistics. Agha (2003, p. 29) analyses the Bainbridge cartoon, treating accent not as an “object of metasemiotic scrutiny” but as an integral element in “the social perils of improper demeanour in many sign modalities” such as dress, posture, gait and gesture. His discussion demonstrates how language studies can be embedded in a larger multimodal scope: language is contextualized by the peripheral, multimodal, paralinguistic sign systems around it. In Eckert (2008, p. 25), the process of “bricolage” (Hebdige, 1984), in which “individual resources can be interpreted and combined with other resources to construct a more complex meaningful entity”, is linked to the style and language variation that reflect social meaning. She gives examples of how the clothing of students at Palo Alto High School affords them certain styles for conveying social meaning. The research of Eckert (2001), Coupland (2003, 2007) and other scholars represents “third-wave” sociolinguistics, which views the use of variation in terms of personal and social styles (Eckert, 2012). Language and other semiotic resources together constitute a stylistic complex that makes social meaning and constructs social styles and identities. Goodwin (2007) extensively incorporates multimodal interaction into his examination of participation, stance and affect in a “homework” interaction between a father and his daughter, where gaze, gesture, and the spatial environment are taken into account. Goodwin’s research is partly premised on Bourdieu’s (1991, pp. 81–89) association of bodily hexis with habitus, a notion that is itself multimodal. The deployment of different bodily modes in different contexts of participation (such as homework, archaeology, and surgery) depends on the conventions of various social practices, or their respective habitus.

Research on language in multimodal construct shares some common ground with the social semiotic approach to multimodality. First, in communication, different modes of resources or semiotic types convey social meaning and embed ideology. Second, these resources consist of language and “non-language”: the former being the written or spoken signs and symbols that social actors use to communicate, and the latter being the visual, aural, or embodied resources in which language is situated. Third, meaning-making is achieved through the orchestration of these resources.

Unlike social semiotic approaches, language in multimodal construct, as a sociological and sociolinguistic approach with anthropological concerns, is usually based on ethnographical observations of social interaction. Language is seen as one component of social interactional discourse; other semiotic modes are also important resources through which language use is contextualized. More specifically, language in multimodal construct treats language as one type of semiotic resource placed in multimodal contexts in the following respects:

First, meaning made through other resources is seen as an “add-on” to that made through language. In other words, language indexes social meaning and ideology in collaboration with other types of resources. An example is Agha’s (2003) analysis of the Bainbridge cartoon, in which clothes, demeanour, and even body shape work together with accent to convey register and social status. Second, language as one type of social meaning-making resource can be conceptualized in relation to the meaning-making processes of other resources. For example, the process of “bricolage” is probed in relation to variation, its indexed styles, and social categorization in terms of “gender and adolescence” (Eckert, 2008, p. 458). This concept is used to offer clues as to how “the differential use of variables constituted distinct styles associated with different communities of practice” (Eckert, 2008, p. 458). Third, language is one of the communicative modes in social interactional discourse. It does not necessarily take the central role, because other types of resources, such as gestures, gaze, and the environment where these actions take place, jointly constitute the social meaning-making process. This is best encapsulated in Goodwin’s (2007) analysis of the “homework” interaction between a father and his daughter. In this quite mundane interactional discourse, the father uses different embodied actions to negotiate different moral and affective stances with his daughter. Conversation as a linguistic resource plays a role in the interaction, while embodied actions are key factors in shaping these stances.

Language as a multimodal construct

A slightly different approach to language in multimodal contexts is to view language as a multimodal construct, either by considering language as autonomously constituting the semiotic texture (e.g., in “text art”, where text is also seen as picture) or by considering some traditionally assumed extra-linguistic modes as special forms or dimensions of language. This trend of research includes recent studies on language in space, social interactional multimodal discourse analysis, and new concepts or conceptualizations of language in society, as discussed below.

Language in space: semiotic landscape, place semiotics, and discourse geography

Jaworski and Thurlow (2010) review the notion of spatialization, that is, the semiotics and discursivity of space (Jaworski and Thurlow, 2010), and the extension of the notion of the linguistic landscape. In doing so, they frame the concept of the semiotic landscape as encapsulating how written discourse interacts with other multimodal discursive resources, with blurred boundaries between them.

In their view, space is “not only physically but also socially constructed, which necessarily shifts absolutist notions of space towards more communicative or discursive conceptualizations” (Jaworski and Thurlow, 2010, p. 7). Sociological research on space is thus more oriented toward spatialization, “the different processes by which space comes to be represented, organized and experienced” (Jaworski and Thurlow, 2010, p. 6). This spatialization, as represented discursively, is intrinsically multimodal:

Echoing the sentiments of Kress and van Leeuwen quoted at the start of this chapter, Markus and Cameron argue that ‘[b]uildings themselves are not representations’ (p. 15), but ways of organizing space for their users; in other words, the way buildings are used and the way people using them relate to one another, is largely dependent on the spoken, written and pictorial texts about these buildings… Architecture and language (spoken and written) may then form an even more complex, multi-layered landscape (or cityscape) combining built environment, writing, images, as well as other semiotic modes, such as speech, music, photography, and movement… (Jaworski and Thurlow, 2010, pp. 19–20)

The “spatial turn” (Jaworski and Thurlow, 2010, p. 6) in sociolinguistics thus adds the analytical dimensions of multimodal resources to the traditional concept of the linguistic landscape. Written language conveys social meaning and ideologies in its own right, while also being situated in materiality (the materials it is written on) and spatiality (the places where it appears). The concept of the semiotic landscape thereby blurs the traditional boundary between language and non-language.

Unlike social semiotic approaches to multimodality, researchers of the semiotic landscape pay predominant attention to the “metalinguistic or metadiscursive nature of ideologies” (Jaworski and Thurlow, 2010, p. 11). In Kallen’s words, the concept of the semiotic landscape starts from the assumption that “signage is indexical of more than the ostensive message of the sign” (Kallen, 2010, p. 41); signage indexes ideologies that are embedded in, or indicated by, different types of space or spatiality: city centres, tourist sites, districts and so on. Less interest is invested in the process of semiosis, that is, how different modes of signs are orchestrated to communicate information, which is one of the primary endeavours of social semiotics (Li and Wang, 2022; Wang, 2014, 2019; Wang and Li, 2022). As such, in ethnographical studies or data analysis, language, materiality, and spatiality are usually seen as interwoven, with no distinct boundaries between them; or at least, boundary-marking is not the primary concern of semiotic landscape research.

In the same vein, Scollon and Scollon (2003, p. 2) coin the term “geosemiotics” (or “place semiotics”), defined as “the study of the social meaning of signs and discourses and of our actions in the material world”. Their research objects are signs in public places. The conceptual framework of geosemiotics sees language as a multimodal construct in the following respects. First, verbal language is analysed using social semiotic approaches to visuals: code preference (that is, which language is presented as the “primary” language) on signs or buildings is analysed using Kress and van Leeuwen’s (1996, p. 208) conception of compositional meaning as indexed by different positions in pictures. Second, language is seen as multimodal in itself: language on signs or buildings is analysed in terms of its multimodal inscription (see Scollon and Scollon, 2003, pp. 129–142), which includes fonts, letterform, material quality, layering and state changes. Third, emplacement in geosemiotics (meaning-making through positioning signs in different places), similar to Jaworski and Thurlow’s (2010) approach to the semiotic landscape, is predominantly concerned with spatiality and metalinguistic or metadiscursive ideology, rather than with the interaction and orchestration of different modes (language vis-à-vis non-language) in semiosis.

Similar to the concepts of semiotic landscape and place semiotics, Gu (2009, 2012) postulates the frameworks of four-borne discourse and discourse geography. Drawing on Blommaert’s (2005, p. 2) view of discourse as “language-in-action”, Gu analyses language and activities along social actors’ trajectories of time and space in land-borne situated discourse (LBSD), a type of discourse categorized by Gu (2009) according to the types of spatiality that serve as carriers and places of discourse. In Gu’s (2012) conceptualizations, language and discourse are metaphorically spatialized: language is seen in terms of the place where it occurs, and multimodality is evaluated in terms of space (Gu, 2009). Although it is debatable to what extent Gu (2009) sees language as a conflation of modes or semiotic attributes, his work draws only an ambivalent boundary between language and “non-language”. Also, in “spatializing” language as discourse geography, it represents language and discourse through a PLACE or SPACE metaphor that is itself multimodal. In addition, it analyses translation between modes, for instance, the “modalization” of written language into visuals and sounds; visuals are in turn seen as forms of “modalized” language, and vice versa. As such, Gu (2009) also represents the “spatial turn” in sociolinguistics, which can be seen as part of the research trend that regards language as a multimodal construct.

In general, the trend to spatialize language and discourse (the “spatial turn”), with concepts or frameworks such as semiotic landscape, place semiotics, and discourse geography, treats language as a multimodal construct in two respects. First, it focuses on metalinguistic or metadiscursive ideologies that are embedded in different modes of signs or symbols; Gu’s research, in addition, metaphorically theorizes social interaction through multimodality. Second, it posits that language itself is multimodal or modalizable in meaning-making. Written language has multimodal dimensions, such as facets of its inscription including fonts, letterform, material quality, layering and state changes (Scollon and Scollon, 2003). Different forms of language are multimodal in terms of spatiality: they can be naturally multimodal and aural-visual, for instance in televised discourse, and written language can also be “modalized” into visuals (Gu, 2009, p. 11). Overall, language is considered either as signs in a spatialized system or as actions in trajectories of activities. It is an integral part of a multimodal construct, in which other modes (visual, gesture, action, and so on) are not peripheral or auxiliary but frequently also count as linguistic resources, for instance, the visual resources in text art.

Multimodal studies from the social interactional perspective

Some sociolinguistic approaches to multimodality combine social interactional sociolinguistics (Goffman, 1959, 1963, 1974), the social semiotic approach to multimodality (Kress and van Leeuwen, 1996), and intercultural communication (Wertsch, 1998). We summarize these approaches as multimodal studies from the social interactional perspective; they include mediated discourse analysis (Scollon and Scollon, 2003) and multimodal interaction analysis (Norris, 2004), the latter having grown out of the former.

Multimodal studies from the social interactional perspective focus on people’s daily actions and interactions, and on the environments and technologies with(in) which they take place. This line of research sees discourse as (embedded in) social interaction and investigates social action through the multimodal resources used in daily interaction, such as gestures, postures, and language (see Jones and Norris, 2005). In Norris’s (2004) framework for multimodal interaction analysis, the units of analysis form a layered, hierarchical system of actions. Lower-level actions include a spoken utterance, a gesture, or a posture. Higher-level actions consist of chains of lower-level actions. Norris (2004) also coins the term “modal density” to refer to the complexity of the modes a social actor uses to produce higher-level actions.

The focus on hierarchical levels of action and the concept of “modal density” prompt reflection on what constitutes a mode and what constitutes language. In multimodal interaction analysis, language is seen as one type of lower-level action among the various embodied resources at interactants’ disposal. These embodied resources are treated as different modes, such as gesture, gaze, and proxemics. Arguably, however, gestures and gazes in Norris (2004) are also seen as forms of language in interaction. Furthermore, Norris (2004) and her other works methodologically treat spoken language as a multimodal construct: pitch and intonation are visualized through varying fonts in a wave-shaped annotation, alongside the policeman’s gestures, as shown in Fig. 1.

Fig. 1: The policeman’s spoken language is treated as a multimodal construct where the pitches and intonation are visualized through various fonts in the wave-shaped annotation, along with his gestures.

Like other sociolinguistic approaches to multimodality, multimodal studies from the social interactional perspective target the meta-modal or metadiscursive facets of ideology. They do so from the bottom up, examining general social categories such as power, dominance and ideology through people’s daily (inter)actions. This line of research focuses on the basic units of action in daily interaction. Its conception of mode and language is oriented toward seeing language as multimodal, and its methodological treatment of language reflects this orientation. Multimodal studies from the social interactional perspective thus aim to reveal the ideology and power embedded in language as action. Overall, they perceive language as a multimodal construct in social (inter)action.

Metrolingualism, heteroglossia, polylanguaging and multimodality

In the second section of the paper, we mentioned work on related notions such as metrolingualism and polylanguaging. In this section, we review the latest applications of metrolingualism in multimodal analysis. We also discuss why other related notions and approaches encapsulate the conceptualization of language as a multimodal construct.

Metrolingualism is a concept postulated by Otsuji and Pennycook (2010), originally referring to “creative linguistic conditions across space and borders of culture, history and politics, as a way to move beyond current terms such as multilingualism and multiculturalism” (Otsuji and Pennycook, 2010, p. 244). Their later works (Pennycook and Otsuji, 2014, 2015a, 2015b) develop the concept and reformulate it as a broader notion encompassing the everyday language use in the city and linguistic landscapes in urban settings.

In Pennycook and Otsuji (2014, 2015b), metrolingualism involves the practice of “metrolingual multitasking” (Pennycook and Otsuji, 2015b, p. 15), in which “linguistic resources, everyday tasks and social space are intertwined” (Pennycook and Otsuji, 2015b, p. 15). Metrolingualism is thus concerned not only with the mixed use of linguistic resources from different languages. It is also concerned with how language use is embedded in broader multimodal practices: the (embodied) actions accompanying or included in the metrolingual process, the (changing) spaces or places where these actions and language use take place, and the objects in the environment. Pennycook and Otsuji (2015b) include an olfactory mode in their analysis of metrolingual practices in cities. Smell is represented through linguistic or pictorial signs in the city and suburbs, constituting “smellscapes” in relation to social activities, ethnicity, gender and race. Metrolingual smellscapes are represented through the conflation of several elements:
- written and visual signs and symbols (e.g., street signs)
- social activities (e.g., buying and selling, riding a bus)
- objects (e.g., spices)
- places or spaces (e.g., suburban markets, coffee shops, buses and trains)

The conventional distinction between language and non-language matters less here, or is not at issue at all. Smells have to be represented through language or visuals, and resources other than languages are also conceptualized as metrolingual.

In this regard, language in Pennycook and Otsuji’s (2014, 2015a, 2015b) conception of metrolingualism is seen as integrated into different types of activities and actions. It is also spatialized, in the sense that metrolingual practice involves the organization of space, the relationship between “locution and location” (Pennycook and Otsuji, 2015b, p. 84), and the (historical) layers of cities (Pennycook and Otsuji, 2015b, p. 140). As discussed in earlier sections, such spatialization is intrinsically multimodal.

In relation to metrolingualism, Jaworski (2014) briefly reviews the history of art and writing and selects the art form of “text art” as his research subject. Drawing on the notion of metrolingualism, he sees such works as “metrolingual art”, in which language interacts with other modes or is seen as part of the visual mode. He suggests that it would be useful to “extend the range of semiotic features amenable to metrolingual usage to include whole multimodal resources” (Jaworski, 2014, p. 151). The multimodal representations in text art are realized in two ways. One is the mixing, meshing and queering of linguistic features. The other is the works’ relation to a “melange of styles, genres, content, and materiality” (Jaworski, 2014, p. 151).

In this regard, the multimodal affordances (Kress, 2010; Jewitt, 2009) realized through three channels are an integral part of metrolingualism:
- materiality (e.g., the paper, cloth, or walls on which language is written)
- media (e.g., soundtracks, video, moving images)
- styles (e.g., fonts, letterforms, layering such as add-ons or decorations)

He further proposes aligning the concept of heteroglossia with metrolingualism, so as “to extend the idea of metrolingualism beyond ‘hybrid and multilingual’ speaker practices (Otsuji and Pennycook, 2010, p. 244) and move towards a more ‘generic’ view of metrolingualism as a form of heteroglossia” (Jaworski, 2014, p. 152). In this way, his approach relates the subject positions taken by the producers of text art to their social orientation. That orientation is defined by their alignment with regard to power, domination, hegemony, and ideology in the broader social realm.
This is also in line with Bailey’s discussion of heteroglossia: “(a) heteroglossia can encompass socially meaningful forms in both bilingual and monolingual talk; (b) it can account for the multiple meanings and readings of forms that are possible, depending on one’s subject position, and (c) it can connect historical power hierarchies to the meanings and valences of particular forms in the here-and-now” (Bailey, 2007, pp. 266–267; also quoted in Jaworski, 2014, p. 153). Overall, Jaworski (2014) shows how metrolingualism and heteroglossia can be used to analyse features of language and their place in multimodal constructs. He also discusses how notions similar to metrolingualism may relate to multimodality, in that they stress “the importance of linguistic features (rather than discrete languages) as resources for speakers to achieve their communicative aims” (Jaworski, 2014, p. 138).

Apart from metrolingualism and heteroglossia, Jaworski (2014) touches upon the relationship between polylanguaging and multimodality but does not elaborate on it. Jørgensen (2008) demonstrates how polylanguaging concerns the use of language features in the language practices of adolescents in superdiverse societies. Some of these language features “would be difficult to categorize in any given language” (Jørgensen et al., 2011, p. 25); that is, they do not belong to any standard language system (e.g., English, Chinese, German). In addition, emoticons are frequently used in communication via social networking software. If some of these language features do not belong to any given language, it is difficult to say whether they count as language at all. Attention to features of language hence blurs the boundary between language and other semiotic resources. Admittedly, these features can be seen as linguistic (lexical, morphemic or phonemic) units that still belong to language, but they are frequently used in multimodal meaning-making. Below we use Jørgensen et al.’s (2011, p. 26) example (Fig. 2) to illustrate this.

Fig. 2: The “majority boy” makes use of resources from the minority’s language (the word “shark”).

Jørgensen et al.’s analysis of this example focuses on the “majority boy” using the word “shark”, a loan word from Arabic. As a member of the majority, he is using the minority’s language, to which he is not entitled. The interaction shows that “both interlocutors are aware of the norm and react accordingly” (Jørgensen et al., 2011, p. 25). Accordingly, they note that one feature of polylanguaging is “the use of resources associated with different ‘languages’ even when the speaker knows very little of these” (Jørgensen et al., 2011, p. 25).

What also deserves attention, though Jørgensen et al. (2011) do not discuss it, is the interlocutors’ creative use of these features in polylanguaging. The word “shark” is written as a prolonged “shaarkkk” for phonetic and visual effect. This creative configuration of the feature “shark” draws other interlocutors’ attention to the polylanguaging practice. The emoticon “:D” that follows it signals that the speaker knows he is violating the “normal” rules; that is, he is using minority language features to which he is not entitled. The repeated words “cough, cough”, followed by the emoticon “:D”, show the same awareness.

Polylanguaging, as formulated by Jørgensen et al. (2011), departs from the multilingual tradition of enumerating languages. It focuses instead on language features that may not belong to any given language. In this sense, emoticons and creative configurations of words can also be seen as language features: features used creatively by a virtual community of (young) netizens in communication. These features are multimodal in two respects.

First, they visualize the polylanguaging practice by creating new forms of words, for instance the prolonged “shaarkkk”. This creation is itself a process of polylanguaging. It draws on features of common language, that is, language in people’s daily (non-cyber) life, to create a new cyber-language used by members of a virtual community.

Second, these language features draw on multimodal resources of embodiment. For example, emoticons use letters and punctuation marks (language features from people’s daily written language) to represent different facial expressions and emotions. The repeated “cough, cough” also takes on an embodied stance, as “a reference to a cliché way of expressing doubt or scepticism” (Jørgensen et al., 2011, p. 27). It shows that the interlocutors are aware that the majority boy is using the minority’s language, to which he is not entitled. This embodied stance hence indexes the polylanguaging practice.

To summarize, polylanguaging entails seeing language as a multimodal construct. Interlocutors creatively adapt language features in daily communication (face-to-face or written communication not involving the internet). They also use embodied language features when polylanguaging in online communication.

Discussion and a critical reflection

In the sections “Language as written and spoken signs and symbols” and “Language in vis-à-vis as multimodal construct” above, we delineated the ontological perceptions of language in sociolinguistics: language as spoken and written signs and symbols, and language in vis-à-vis as a multimodal construct. In teasing out these various approaches, we find that language in sociolinguistics has undergone several stages of development. Language as spoken and written signs and symbols has been pursued in variational sociolinguistics, in bi- and multilingualism research, and in the latest theoretical and conceptual trends that do not see languages as separate and separable systems or codes. Predominantly, however, language in sociolinguistics has been placed in nuanced and complicated relationships with other semiotic resources.

Research on language in multimodal constructs treats language and non-language resources as different modes, or types of resources. These modes have boundaries, and researchers examine how they combine with each other in meaning-making. Language itself is a distinctive mode, interdependent with but different from other modes.

Research on language as a multimodal construct sees language itself as multimodal, in three ways:
- Language is spatialized, that is, examined in relation to the various spatialities and materialities where it appears.
- In the social interactional approach to multimodality, language is embodied and embedded in a layered, hierarchical system of modes (including gesture, posture, and intonation) in social interaction.
- In the latest concepts built on languaging, languages are regarded as “inventions” (Makoni and Pennycook, 2005) and as cross- and trans-cultural practice, rather than as separable and enumerable codes or systems.

Language is entangled and integrated with objects (for instance, signage and the materiality on which it appears). It is also involved in multitasking with embodied resources (gestures, talking, and simultaneously doing other things).

Expanding the ontology of language from verbal resources toward various multimodal constructs has enabled sociolinguists to pursue meaning-making, indexicalities and social variation in their most authentic state. Language itself is multimodal, though it cannot be denied that language and other modes do have boundaries and distinctions (even if not always). Whenever a language is spoken, stress, intonation, and paralinguistic resources are all integrated into it. Focusing on language per se has generated fruitful outcomes in sociolinguistic studies. Placing language among multi-semiotic resources, however, has renewed the field and become the dominant trend in contemporary sociolinguistics. Both perspectives, language in and language as multimodal constructs, have captured the complex ways in which language interacts with multiple subjects, materiality, objects and spatiality.

Yet the latest research in sociolinguistics increasingly sees language itself as an intricate multimodal construct. This view is encapsulated by new concepts and theories including translanguaging, metrolingualism, and polylanguaging, developed in contexts of globalization, migration, multi-ethnicity, and new communication technologies. Language is no longer seen only as separable codes and systems spoken or written by different groups of people. It entails a wider range of communicative repertoires, including embodied meaning-making, objects, and the environments in which written or spoken signs are placed. It may hence be speculated that sociolinguistics will be increasingly less concerned with the boundaries between language and non-language resources. Instead, it will focus more on social constructs, social meaning, and language as a force in social change.
The enumerating and separating approach to language and multimodality has generated various outcomes, especially in grammar-oriented social semiotic research and MCDA. This approach delineates inter-semiotic boundaries and examines how modes of communication are combined in meaning-making. However, contemporary sociolinguistic studies have greatly expanded their scope beyond the discursive, grammatical, and communicative. The three research paradigms regarding language as a multimodal construct reviewed in “Language as multimodal construct” have proved to be feasible approaches to language in social interaction, geosemiotics, and language use in ethnographic and multi-ethnic settings. In this regard, the ontology of language in sociolinguistics may be perceived in terms of the sociological and societal facets of multimodal constructs. This contrasts with seeing language as placed among a multitude of semiotic types, or as verbal resources per se. A critical reflection on the ontology of language is one of the prerequisites for innovation in contemporary linguistics, and it is also the objective of this comprehensive review.

As the above discussion shows, there are several versions of the perception of language in sociolinguistics.

First, perceptions of language as a written or verbal system are moving, or have moved, away from the enumerating traditions of bi- or multilingualism. They now tend to see language as an inseparable entity with both fixity and fluidity. In other words, new approaches in sociolinguistics see language as comprising different features, repertoires, or resources, rather than different or discrete standard languages such as English, French, German and so on. The negotiation, construction, or attribution of ethnicity, identity, power and ideology through language has also taken on a more dynamic and diverse character.

Second, some sociolinguistic research places language within multimodal constructs, where language is contextualized by other semiotic resources that are seen as “non-language”. More research, however, sees language as a multimodal construct. Language, be it written or spoken, is multimodal in itself, as it comprises multimodal elements such as type, font, materiality, intonation, embodied representations and so on. It is also activated (seen as actions or activities) or spatialized in the approaches discussed earlier: mediated discourse analysis, multimodal interaction analysis, geosemiotics, semiotic landscape, and metrolingualism.

Third, these changing perceptions of language result from researchers’ innovative efforts to view language from different perspectives. More importantly, they arise from the fact that language itself changes as society changes. As mentioned at the beginning, the world has become increasingly globalized, and communication technologies have fundamentally changed the ways people interact with each other.
Linguistic practices are complicated by the super-diversity of ethnic fluidity (e.g., the diversity of ethnic groups and the ever-present changes in ethnic structure), communications technologies, and globalized cross-cultural art.

In sum, it can be argued that contemporary sociolinguistics has become increasingly concerned with languaging (trans-, poly-, metro-, pluri-, and so on), rather than languages as a type of (static and fixed) verbal resource with demarcated boundaries separating them from other multimodal resources. Language is multimodal; it is embedded in or represents social activities, places or spaces, objects, and smells. Language in society belongs to and constitutes the “semiotic assemblage” (Pennycook, 2017) that can be better analysed holistically so as to reach an understanding of “how different trajectories of people, semiotic resources and objects meet at particular moments and places” (Pennycook, 2017, p. 269). At a fundamental level of sociolinguistic ontology, this trend of research reflects the changing ways in which sociolinguists come to understand what language is and how it should be understood as part of a more general range of semiotic practices.

Agha A (2003) The social life of cultural value. Language Commun 23(3–4):231–273

Berger P, Luckmann T (1966) The social construction of reality: a treatise in the sociology of knowledge. Doubleday, New York

Blackledge A, Creese A (2010) Multilingualism: a critical perspective. Continuum, London

Blommaert J (Ed.) (1999) Language ideological debates, vol. 2. Walter de Gruyter, Berlin

Blommaert J (2005) Discourse: a critical introduction. Cambridge University Press, Cambridge

Bourdieu P (1991) Language and symbolic power [Thompson JB (ed and introd)] (trans: Raymond G, Adamson M). Polity Press/Blackwell, Cambridge

Bailey B (2007) Heteroglossia and boundaries. In: Heller M (Ed.) Bilingualism: a social approach. Palgrave Macmillan, New York, pp. 257–274

Cameron D (1990) Demythologizing sociolinguistics: why language does not reflect society. In: Joseph J, Taylor T (eds) Ideologies of language. Routledge, London, pp. 79–93

Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge, Massachusetts

Coupland N (2003) Sociolinguistic authenticities. J Sociolinguist 7(3):417–431

Coupland N (2007) Style: language variation and identity. Cambridge University Press, Cambridge

Creese A, Blackledge A (2010) Translanguaging in the bilingual classroom: a pedagogy for learning and teaching? Mod Language J 94:103–115

Eckert P, Rickford JR (Eds.) (2001) Style and sociolinguistic variation. Cambridge University Press, Cambridge

Eckert P (2008) Variation and the indexical field. J Sociolinguist 12(4):453–476

Eckert P (2012) Three waves of variation study: the emergence of meaning in the study of sociolinguistic variation. Annu Rev Anthropol 41(1):87–100

García O, Li W (2014) Translanguaging: language, bilingualism and education. Palgrave Macmillan, London

Goffman E (1959) The presentation of self in everyday life. Doubleday, New York, NY

Goffman E (1963) Behavior in public places. Free Press, New York, NY

Goffman E (1974) Frame analysis. Harper & Row, New York, NY

Goodwin C (2007) Participation, stance, and affect in the organization of activities. Discourse Soc 18(1):53–73

Gu Y (2009) Four-borne discourses: towards language as a multi-dimensional city of history. In: Li W, Cook V (eds.) Linguistics in the real world. Continuum, London, pp. 98–121

Gu Y (2012) Discourse geography. In: Gee JP, Hanford M (eds.) The Routledge handbook of discourse analysis. Routledge, London, pp. 541–557

Hebdige D (1984) Framing the youth ‘problem’: the construction of troublesome adolescence. In: Garms-Homolová V, Hoerning EM, Schaeffer D (eds.) Intergenerational relationships. C. J. Hogrefe, Lewiston, NY, pp. 184–195

Holliday N (2021) Prosody and sociolinguistic variation in American Englishes. Annu Rev Linguist 7:55–68

Irvine JT, Gal S (2000) Language ideology and linguistic differentiation. In: Kroskrity PV (ed.) Regimes of language: ideologies, polities, and identities. School of American Research Press, Santa Fe, pp. 35–84

Jaworski A (2014) Metrolingual art: multilingualism and heteroglossia. Int J Biling 18(2):134–158

Jaworski A, Thurlow C (eds.) (2010) Semiotic landscapes: language, image, space. Continuum, New York

Jewitt C (2009) Different approaches to multimodality. In: Jewitt C (ed) The Routledge handbook of multimodal analysis. Routledge, Abingdon, pp. 28–39

Jewitt C (2013) Multimodality and digital technologies in the classroom. In: de Saint-Georges I, Weber J (eds) Multilingualism and multimodality: current challenges for educational studies. Sense Publishing, Boston, pp. 141–152

Jørgensen JN (2008) Poly-lingual languaging around and among children and adolescents. Int J Multiling 5(3):161–176

Jørgensen JN, Karrebæk MS, Madsen LM, Møller JS (2011) Polylanguaging in superdiversity. Diversities 13(2):23–37

Jacquemet M (2005) Transidiomatic practices: language and power in the age of globalization. Language Commun 25:257–277

Jones R, Norris S (2005) Discourse as action/discourse in action. In: Norris S, Jones R (eds) Discourse in action: introducing mediated discourse analysis. Routledge, London, pp. 1–3

Kallen J (2010) Changing landscapes: language, space and policy in the Dublin linguistic landscape. In: Jaworski A, Thurlow C (eds) Semiotic landscapes: language, image, space. Continuum, New York, pp. 41–58

Kress GR (2010) Multimodality: a social semiotic approach to contemporary communication. Routledge, London

Kress GR, van Leeuwen T (1996) Reading Images: the grammar of graphic design. Routledge, London

Kroskrity PV (1998) Arizona Tewa Kiva speech as a manifestation of linguistic ideology. In: Schieffelin BB, Woolard KA, Kroskrity P (eds) Language ideologies: practice and theory. Oxford University Press, New York, pp. 103–122

Labov W (1963) The social motivation of a sound change. Word 19(3):273–309

Labov W (1966) Hypercorrection by the lower middle class as a factor in linguistic change. Sociolinguistics 1966:84–113

Lawson R (2020) Language and masculinities: history, development, and future. Annu Rev Linguist 6(1):409–434

Lewis WG, Jones B, Baker C (2012) Translanguaging: origins and development from school to street and beyond. Educ Res Eval 18(7):641–654

Li W, Wang J (2022) Chronotopic identities in contemporary Chinese poetry calligraphy. Poznan Stud Contemp Linguist 58(4):861–884

Machin D (2016) The need for a social and affordance-driven multimodal critical discourse studies. Discourse Soc 27(3):322–334

Machin D, Mayr A (2012) How to do critical discourse analysis: a multimodal introduction. Sage, London

Maher J (2005) Metroethnicity, language, and the principle of Cool. Int J Sociol Language 11:83–102

Makoni S, Pennycook A (2005) Disinventing and (re)constituting languages. Crit Inq Language Stud 2(3):137–156

Martin-Jones M, Blackledge A, Creese A (eds) (2012) The Routledge handbook of multilingualism. Routledge, London

Norris S (2004) Analyzing multimodal interaction: a methodological framework. Routledge, London

Otsuji E, Pennycook A (2010) Metrolingualism: fixity, fluidity and language in flux. Int J Multiling 7:240–254

Pennycook A (2017) Translanguaging and semiotic assemblages. Int J Multiling 14(3):1–14

Pennycook A, Otsuji E (2014) Metrolingual multitasking and spatial repertoires: ‘Pizza mo two minutes coming’. J Socioling 18(2):161–184

Pennycook A, Otsuji E (2015a) Making scents of the landscape. Linguist Landsc 1(3):191–212

Pennycook A, Otsuji E (2015b) Metrolingualism. Language in the city. Routledge, New York

Sankoff G (2018) Language change across the lifespan. Annu Rev Linguist 4:297–316

Schegloff EA (1992) In another context. In: Duranti A, Goodwin C (eds) Rethinking context: language as an interactive phenomenon. Cambridge University Press, Cambridge, pp. 191–227

Schegloff EA (1998a) Positioning and interpretative repertoires: conversation analysis and poststructuralism in dialogue: reply to Wetherell. Discourse Soc 9(3):413–416

Schegloff EA (1998b) Reply to Wetherell. Discourse Soc 9(3):457–460

Schegloff EA (1999) ‘Schegloff’s texts’ as ‘Billig’s data’: a critical reply. Discourse Soc 10(4):558–572

Scollon R, Scollon S (2003) Discourses in place: language in the material world. Routledge, New York

Saussure F (1916) Course in general linguistics. Duckworth, London

Wang J (2014) Criticising images: critical discourse analysis of visual semiosis in picture news. Crit Arts 28(2):264–286

Wang J (2016a) Multimodal narratives in SIA’s “Singapore Girl” TV advertisements: from branding with femininity to branding with provenance and authenticity? Soc Semiot 26(2):208–225

Wang J (2016b) A new political and communication agenda for political discourse analysis: critical reflections on critical discourse analysis and political discourse analysis. Int J Commun 10:19

Wang J (2019) Stereotyping in representing the “Chinese Dream” in news reports by CNN and BBC. Semiotica 2019(226):29–48

Wang J, Jin G (2022) Critical discourse analysis in China: history and new developments. In: Aronoff M, Chen Y, Cutler C (eds) Oxford Research Encyclopedia of Linguistics. Oxford University Press. https://doi.org/10.1093/acrefore/9780199384655.013.909

Wang J, Li W (2022) Situating affect in Chinese mediated soundscapes of suona. Soc Semiot. https://doi.org/10.1080/10350330.2022.2139171

Wang J, Yang M (2022) Interpersonal-function topoi in Chinese central government’s work report (2020) as epidemic (counter-) crisis discourse. J Language Politics. https://doi.org/10.1075/jlp.22022.wan

Wertsch JV (1998) Voices of the mind: a sociocultural approach to mediated action. Harvard University Press, Cambridge, MA

Woolard K (1999) Simultaneity and bivalency as strategies in bilingualism. J Linguist Anthropol 8(1):3–29

Acknowledgements

Our thanks are extended to Dr. William Dezheng Feng for his constructive advice on the earlier drafts of the paper. This work is supported by the National Social Science Foundation of China (Project No. 18CYY050); the Foreign Language Education Foundation of China (Project No. ZGWYJYJJ11A030); and the Self-Determined Research Funds of CCNU from MOE for basic research and operation (Project No. CCNU20TD008).

Author information

Authors and affiliations

Central China Normal University, Wuhan, China

Jiayu Wang, Guangyu Jin & Wenhua Li

Inner Mongolia Agricultural University, Hohhot, China

Guangyu Jin


Contributions

All three authors contributed to the conception and design of the study. JW mainly drafted the work. GJ revised it critically for important intellectual content. WL made major intellectual contributions to the Chinese versions of the paper (unpublished); her ideas and points are integrated into the final version of this paper. All three authors are corresponding authors responsible for the final approval of the version to be published.

Corresponding authors

Correspondence to Jiayu Wang , Guangyu Jin or Wenhua Li .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Wang, J., Jin, G. & Li, W. Changing perceptions of language in sociolinguistics. Humanit Soc Sci Commun 10, 91 (2023). https://doi.org/10.1057/s41599-023-01574-5


Received: 12 September 2022

Accepted: 20 February 2023

Published: 08 March 2023

DOI: https://doi.org/10.1057/s41599-023-01574-5


The Hidden Bias of Science’s Universal Language

The vast majority of scientific papers today are published in English. What gets lost when other languages get left out?


Newton’s Principia Mathematica was written in Latin; Einstein’s first influential papers were written in German; Marie Curie’s work was published in French. Yet today, most scientific research around the world is published in a single language, English.

Since the middle of the last century, things have shifted in the global scientific community. English is now so prevalent that in some non-English speaking countries, like Germany, France, and Spain, English-language academic papers outnumber publications in the country’s own language several times over. In the Netherlands, one of the more extreme examples, this ratio is an astonishing 40 to 1.

A 2012 study from the scientific-research publication Research Trends examined articles collected by SCOPUS, the world’s largest database for peer-reviewed journals. To qualify for inclusion in SCOPUS, a journal published in a language other than English must at the very least include English abstracts; of the more than 21,000 articles from 239 countries currently in the database, the study found that 80 percent were written entirely in English. Zeroing in on eight countries that produce a high number of scientific journals, the study also found that the ratio of English to non-English articles in the past few years had increased or remained stable in all but one.

This gulf between English and other languages means that non-English articles, when they get written at all, may reach a more limited audience. On SCImago Journal Rank (a system that ranks scientific journals by prestige, based on the citations their articles receive elsewhere), all of the top 50 journals are published in English and originate from either the U.S. or the U.K.

In short, scientists who want to produce influential, globally recognized work most likely need to publish in English—which means they’ll also likely have to attend English-language conferences, read English-language papers, and have English-language discussions. In a 2005 case study of Korean scientists living in the U.K., the researcher Kumju Hwang, then at the University of Leeds, wrote: “The reason that [non-native English-speaking scientists] have to use English, at a cost of extra time and effort, is closely related to their continued efforts to be recognized as having internationally compatible quality and to gain the highest possible reputation.”

It wasn’t always this way. As the science historian Michael Gordin explained in Aeon earlier this year, from the 15th through the 17th century, scientists typically conducted their work in two languages: their native tongue when discussing their work in conversation, and Latin in their written work or when corresponding with scientists outside their home country.

“Since Latin was no specific nation’s native tongue, and scholars all across European and Arabic societies could make equal use of it, no one ‘owned’ the language. For these reasons, Latin became a fitting vehicle for claims about universal nature,” Gordin wrote. “But everyone in this conversation was polyglot, choosing the language to suit the audience. When writing to international chemists, Swedes used Latin; when conversing with mining engineers, they opted for Swedish.”

As the scientific revolution progressed through the 17th and 18th centuries, Gordin continued, Latin began to fall out of favor as the scientific language of choice:

Galileo Galilei published his discovery of the moons of Jupiter in the Latin Sidereus Nuncius of 1610, but his later major works were in Italian. As he aimed for a more local audience for patronage and support, he switched languages. Newton’s Principia (1687) appeared in Latin, but his Opticks of 1704 was English (Latin translation 1706).

But as this shift made it more difficult for scientists to understand work done outside of their home countries, the scientific community began to slowly consolidate its languages again. By the early 19th century, just three—French, English, and German—accounted for the bulk of scientists’ communication and published research; by the second half of the 20th century, only English remained dominant as the U.S. strengthened its place in the world, and its influence in the global scientific community has continued to increase ever since.

As a consequence, the scientific vocabularies of many languages have failed to keep pace with new developments and discoveries. In many languages, the words “quark” and “chromosome,” for example, are simply transliterated from English. In a 2007 paper, the University of Melbourne linguist Joe Lo Bianco described the phenomenon of “domain collapse,” or “the progressive deterioration of competence in [a language] in high-level discourses.” In other words, as a language stops adapting to changes in a given field, it can eventually cease to be an effective means of communication in certain contexts altogether.

In many countries, college-level science education is now conducted in English—partially because studying science in English is good preparation for a future scientific career, and partially because the necessary words often don’t exist in any other language. A 2014 report from the University of Oxford found that the use of English as the primary language of education in non-English speaking countries is on the rise, a phenomenon more prevalent in higher education but also increasingly present in primary and secondary schools.

But even with English-language science education around the world, non-native speakers are still often at a disadvantage.

“Processing the content of the lectures in a different language required a big energetic investment, and a whole lot more concentration than I am used to in my own language,” said Monseratt Lopez, a McGill University biophysicist originally from Mexico.

“I was also shy to communicate with researchers, from fear of not understanding quite well what they were saying,” she added. “Reading a research paper would take me a whole day or two as opposed to a couple of hours.”

Sean Perera, a researcher in science communication from the Australian National University, described the current situation this way: “The English language plays a dominant role, one could even call it a hegemony … As a consequence, minimal room or no room at all is allowed to communicators of other languages to participate in science in their own voice—they are compelled to translate their ideas into English.”

In practice, this attitude selects for only a very specific way of looking at the world, one that can make it easy to discount other types of information as nothing more than folklore. But knowledge that isn’t produced via traditional academic research methods can still have scientific value. Indigenous tribes in Indonesia, for example, knew from their oral histories how to recognize the signs of an impending earthquake, enabling them to flee to higher ground before the 2004 tsunami hit. Similarly, the Luritja people of central Australia have passed down an ancient legend of a deadly “fire devil” crashing from the sun to the Earth, which, geologists now believe, describes a meteorite that landed around 4,700 years ago.

“It is all part of a growing recognition that Indigenous knowledge has a lot to offer the scientific community,” the BBC wrote in an article describing the Luritja story. “But there is a problem—indigenous languages are dying off at an alarming rate, making it increasingly difficult for scientists and other experts to benefit from such knowledge.”

Science’s language bias, in other words, extends beyond what’s printed on the page of a research paper. As Perera explained it, so long as English remains the gatekeeper to scientific discourse, shoehorning scientists of other cultural backgrounds into a single language comes with “the great cost of losing their unique ways of communicating ideas.”

“They gradually lose their own voice,” he said—and over time, other ways of understanding the world can simply fade away.


Languages of the world

  • William R. Leben, Department of Linguistics, Stanford University
  • https://doi.org/10.1093/acrefore/9780199384655.013.349
  • Published online: 26 February 2018

About 7,000 languages are spoken around the world today. The actual number depends on where the line is drawn between language and dialect, an arbitrary decision, because languages are always in flux. But specialists applying a reasonably uniform criterion across the globe count well over 2,000 languages each in Asia and Africa, while Europe has just shy of 300. In between are the Pacific region, with over 1,300 languages, and the Americas, with just over 1,000. Languages spoken natively by over a million speakers number around 250, but the vast majority have very few speakers. Something like half are thought likely to disappear over the next few decades, as speakers of endangered languages turn to more widely spoken ones.

The languages of the world are grouped into perhaps 430 language families, based on their origin, as determined by comparing similarities among languages and deducing how they evolved from earlier ones. As with languages, there’s quite a lot of disagreement about the number of language families, reflecting our meager knowledge of many present-day languages and even sparser knowledge of their history. The figure 430 comes from Glottolog.org, which actually lists them all. While the world’s language families may well go back to a smaller number of original languages, even to a single mother tongue, scholars disagree on how far back current methods permit us to trace the history of languages.

While it is normal for languages to borrow from other languages, occasionally a totally new language is created by mixing elements of two distinct languages to such a degree that we would not want to identify one of the source languages as the mother tongue. This is what led to the development of Media Lengua, a language of Ecuador formed through contact among speakers of Spanish and speakers of Quechua. In this language, practically all the word stems are from Spanish, while all of the endings are from Quechua. Just a handful of languages have come into being in this way, but less extreme forms of language mixture have resulted in over a hundred pidgins and creoles currently spoken in many parts of the world. Most arose during Europe’s colonial era, when European colonists used their language to communicate with local inhabitants, who in turn blended vocabulary from the European language with grammar largely from their native language.

Also among the languages of the world are about 300 sign languages used mainly in communicating among and with the deaf. The structure of sign languages typically has little historical connection to the structure of nearby spoken languages.

Some languages have been constructed expressly, often by a single individual, to meet communication demands among speakers with no common language. Esperanto, designed to serve as a universal language and used as a second language by some two million, according to some estimates, is the prime example, but it is only one among several hundred would-be international auxiliary languages.

This essay surveys the languages of the world continent by continent, ending with descriptions of sign languages and of pidgins and creoles. A set of references grouped by section appears at the very end. The main source for data on language classification, numbers of languages, and speakers is the 19th edition of Ethnologue (see Resources), except where a different source is cited.

  • language family
  • language history
  • language classification
  • sign language

1.1 Indo-European

Most of Europe’s languages belong to the Indo-European family, which has the following branches: Celtic, Germanic, Italic, Greek, Albanian, Balto-Slavic, Armenian, Indo-Iranian, Anatolian, and Tocharian.

1.1.1 Celtic

Celtic, which extended across much of Europe as far east as present-day Turkey 2,000 years ago, has undergone gradual contraction since the ascendance of the Romans in Europe, and with the spread of English and French the Celtic languages have long been confined to parts of Britain, Ireland, and western France. The two main branches of modern Celtic are Brythonic and Goidelic. In the Brythonic branch are Welsh, Cornish, and Breton; the Goidelic branch includes Irish, Scottish Gaelic, and Manx.

Gaulish, a third branch, went extinct but has recently undergone restoration attempts, as have Manx and Cornish, which also were extinct. In fact, all present-day Celtic languages have seen revitalization efforts. This is happening even with Welsh, which is hardly an endangered language with 562,000 speakers in the 2011 census. Currently, Wales has school programs aimed at getting a greater proportion of ethnic Welsh, who number nearly 2,400,000, to learn to speak the language. The same is happening with Breton, spoken by over 200,000 in Brittany in northwestern France, but “no longer exclusively, predominately, or even commonly used by the population in any city, town, or village in Brittany,” according to Adkins (2013). As in Wales, school programs in Brittany since at least the 1970s have aimed to get young people speaking a variety of their ethnic tongue.

1.1.2 Germanic

Germanic’s two surviving branches, North and West, were once grouped into a superbranch called Northwest Germanic, which was in turn paired with the Gothic branch. Gothic went extinct largely in the Middle Ages, though isolated traces of Crimean Gothic remained until the late 18th century. The North Germanic languages are Swedish, Danish, Norwegian, Icelandic, and Faroese. West Germanic includes English, German, Dutch, and Frisian. Two of these are paired with a sister language that is also spoken by significant numbers of people: Dutch with Afrikaans, and German with Yiddish.

1.1.3 Italic

This is the ancestral branch of the modern Romance languages, all descended from a colloquial form of Latin. About 2,500 years ago, the Italic branch included not just Latin but also Oscan, Umbrian, and Faliscan, but these languages have no modern descendants. The modern descendants of Latin include French, Catalan, Spanish, Portuguese, Italian, Romanian, Sardinian, Romansch, Ladin, Friulian, Occitan, and Judeo-Spanish.

1.1.4 Greek and Albanian

Modern Greek is the only descendant of this branch, also called Hellenic. Albanian, similarly, is the only descendant of the Albanian branch.

1.1.5 Balto-Slavic

This group has Baltic and Slavic subbranches. The official languages of the Baltic countries Lithuania and Latvia make up the Baltic subbranch. Slavic has three divisions: Eastern (Russian, Ukrainian, and Belarusian), Southern (Serbo-Croatian, Macedonian, Slovenian, and Bulgarian), and Western (Polish, Czech, Slovak, and Sorbian).

1.1.6 Indo-Iranian

The languages of this branch are spoken in Asia. See section 3.1.2.

1.1.7 Armenian

Armenian is here considered a language of Europe, though a good case could be made for including it in Asia. Like Greek and Albanian, the Armenian branch has just one language, with a major division between Eastern and Western dialects. The standard language of Armenia is in the Eastern Armenian group, which also includes the dialects of Armenian communities in Iran, Russia, Georgia, and their environs. Texts from Armenian Cilicia from the 11th to the 14th centuries ce are the first to show a differentiated Western dialect. Many dialects of Western Armenian were obliterated by the Armenian genocide, but the Western Armenian standard and its dialects are found in Turkey (especially Istanbul), the Levant, and émigré communities in the West. Armenian is of special interest to linguists because of retentions from Indo-European, notably all seven of its noun cases and the irregular retention of initial laryngeals.

1.1.8 Anatolian and Tocharian

The languages of these branches were spoken in Asia. See section 3.1.1.

1.2 Uralic

Three important languages in this family are Finnish, Estonian, and Hungarian. These three are traditionally grouped into a branch called Finno-Ugric. But while Finnish and Estonian are closely related members of the Finnic branch of Uralic, Hungarian’s membership in a sister branch to Finnic is under challenge; Ethnologue has dropped Finno-Ugric from its listing and now casts Hungarian as a separate Uralic entity. See Salminen (2002) for arguments. The remaining languages of Uralic are smaller ones found in northern parts of Europe and Asia.

1.3 Caucasus Area

The area of the Caucasus Mountains and its environs between the Caspian and Black Seas includes Georgia, Armenia, and Azerbaijan and parts of neighboring countries. This relatively small region may have up to around 40 highly diverse languages, falling into three families: Nakho-Dagestanian, Abkhaz-Adyghean, and Kartvelian. The most important Nakho-Dagestanian language is Chechen. Abkhaz-Adyghean is made up of Abkhaz and Adyghe and is best known among linguists for systems with 60 or more contrasting consonants but very few vowels. The major Kartvelian language is Georgian, with four million speakers. Ethnologue combines Nakho-Dagestanian and Abkhaz-Adyghean into a single family, North Caucasian. Nakho-Dagestanian and Abkhaz-Adyghean are also known by the respective names Northeast (or simply East) Caucasian and Northwest (or West) Caucasian.

1.4 Basque

Basque is an isolate spoken in the Western Pyrenees by about half a million people, some in France but most in Spain. Its history in this location is widely thought to go back several millennia, antedating the more recent Indo-European migrations to the region. There have been attempts to identify Basque with a wide variety of groups, including Kartvelian, Afro-Asiatic, and Iberian, but without attracting much support. Recent DNA evidence reinforces the notion of Basque descent from an ancient population of farmers and hunters (Günther et al., 2015).

1.5 Turkish

Turkish, a language of Europe and Asia, belongs to the Turkic group, described in the section on Asia.

Africa’s extraordinary linguistic diversity is threatened: some predict that half or more of its languages will become extinct by the end of the century due to competition from other languages. The current count exceeds 2,000 languages, grouped into just a few families.

The most revolutionary aspects of Greenberg’s (1955, 1963) classification of African language families largely stand, though with many adjustments by later experts in the different languages. Many other questions still remain open. For example, Greenberg recognized Khoisan as a family, but later scholars have tended to set a higher bar for establishing genetic relationships, leading most to reject it as a family and to defer judgment on particular groupings into branches. The unity of Nilo-Saharan is also called into question, and despite detailed comparative work by Bender (1996–1997) and Ehret (2001), some reject Nilo-Saharan as a valid genetic unit. For Niger-Congo, the status of some member branches (Kordofanian, Mande, Dogon, and Ijoid) has been challenged, though Niger-Congo itself is widely recognized as a valid family.

The Afro-Asiatic family is well established, though there are debates about subgrouping. For example, do Semitic, Berber, and Cushitic together form a separate branch, as Bender (1996–1997) contends? Within Cushitic, Greenberg’s classification included Omotic, which many now regard as a distinct branch, while Glottolog fails to recognize Omotic as an established group at all. Within Niger-Congo, there are a number of unanswered questions, many revolving around the constituency of its most complex branch, Benue-Congo, which uncontroversially includes all the Bantu languages and many more. Among the changes, the Kwa languages are now reduced to what Greenberg called Western Kwa, and the remaining languages have been moved from Greenberg’s Kwa into distinct branches, though experts still differ on their precise classification. For details and references, see Bendor-Samuel and Hartell (1989) and the references in Nordhoff et al. (2013).

2.1 Afro-Asiatic

This is the northernmost family, with a few hundred languages spanning all of North Africa and the Middle East, as well as two smaller areas of sub-Saharan Africa. The six branches of Afro-Asiatic are Semitic, Berber, Chadic, Cushitic, Omotic, and Egyptian. The Semitic branch has 78 languages, including Arabic, the first language of up to 300 million throughout North Africa and widely spoken in the Middle East. Among the world’s languages, Arabic ranks fourth in the number of speakers. Other important Semitic languages are Hebrew, which shares official status in Israel with Arabic, and several Ethiopic languages. Amharic, the official language of Ethiopia and the first language of 21 million, is a South Ethiopic language. In the North Ethiopic branch is Tigrigna, an official language of Eritrea spoken by 7 million.

The term Afro-Asiatic was used by Joseph Greenberg to replace the designation Hamito-Semitic, which posited a division between the Semitic branch (named for Biblical figure Shem) and a putative branch named for Biblical figure Ham. The notion that Hamitic languages formed a unified branch seemingly reflected factors like speakers’ typical occupations and a lighter skin color than black Africans to the south. Greenberg argued that extraneous factors like these had no place in language classification, which should be based solely on linguistic data. Comparing languages from the different groups classed as Hamitic, Greenberg concluded that the evidence did not support their grouping into a single branch.

The Berber branch of Afro-Asiatic is spoken in the foothills of the Atlas Mountains in Morocco and Algeria and, spottily, in neighboring countries. Cushitic gets its name from Cush, the son of Ham. The several dozen languages of this group are spoken mainly in Ethiopia and Somalia, with a few in Kenya and Tanzania. Chadic languages, numbering close to 200 in all, are mainly spoken in the countries surrounding Lake Chad and are dominant in northern Nigeria. By far the most widely spoken is Hausa, with 25 million native speakers. The languages of the Omotic branch, numbering over two dozen, are all spoken in southwestern Ethiopia. The Egyptian branch, thanks to hieroglyphs, can be traced back before 3,000 bce. Ancient Egyptian was the ancestor of Coptic, which was spoken in Egypt until it was gradually replaced by Arabic and died out roughly 400 years ago. Since then Coptic has survived as a liturgical language.

2.2 Nilo-Saharan

The approximately 200 languages of this family occupy a band extending from the Nile region to the Sahara desert. For a relatively small family, they are quite diverse typologically, leaving some doubt as to whether the Nilotic and Saharan branches really deserve to be grouped into a family. Reflecting this, Glottolog divides them into two separate families, Nilotic and Saharan.

2.3 Niger-Congo

The great majority of languages in sub-Saharan Africa are members of the Niger-Congo family. Its 1,538 languages make it the world’s largest language family, and only the Indo-European and Sino-Tibetan language families have more speakers than Niger-Congo. Ideas about the respective genetic affiliations of well-known groups within Niger-Congo have changed substantially over the last half-century. This has been the case with Kwa, Mande, Gur, Atlantic, and Benue-Congo, among others. To date, the truly remarkable event in the classification of this family remains Greenberg’s (1955, 1963) demonstration that Bantu (a group of 538 languages covering most of Central and Southern Africa) was, along with other languages called Bantoid, a subgroup within a group now called East Benue-Congo, most of whose other languages are spoken in Nigeria and Cameroon. This discovery, which took ten years to gain the wide acceptance it has today, not only challenged earlier assumptions about linguistic classification but also opened the door to hypotheses about Bantu origins. The currently accepted view is that Bantu originated in southeastern Nigeria and expanded east and south from there.

2.4 Khoisan

Among the languages of the world, some are poorly studied and go back so far in time that it is hard to trace their genetic origins. This is the case with Khoisan, which is generally not recognized as an established family but as a set of 27 languages (some with just a handful of speakers) that are likely not to belong to the other three established families of African languages. Ermisch (2008) presents what is known, along with the residual problems.

2.5 Austronesian

Off the southeastern coast of Africa is the island of Madagascar, home to Malagasy, a Malayo-Polynesian language brought over by the island’s earliest settlers maybe 1,500 years ago. For more on Malayo-Polynesian, see the subsection on Austronesian in the section on Oceania.

Asia is home to 60% of the world’s population and nearly 30% of the world’s languages. These are grouped into just a handful of major families, not counting several important isolates, and because of long periods of contact, there is less diversity than one might expect. The downside is that the contact situation has made it difficult to classify genetic relationships with certainty in some important cases. Some areal features of various subregions are also worth mentioning.

3.1 Indo-European

The Indo-European languages of Europe were discussed in section 1. This section describes the Indo-European languages of Asia.

3.1.1 Anatolian and Tocharian

Both of these branches are long extinct. Anatolian’s replacement by Greek is linked to the conquests of Alexander the Great. The Tocharian branch became extinct with the expansion of Turkic Uyghur tribes in the 9th century ce. Tocharian manuscripts from a few centuries prior to extinction, uncovered in the early 20th century, provided information that led scholars to reassess key assumptions about Proto-Indo-European and its descendants. Anatolian inscriptions from a much earlier era, about two millennia prior, similarly reshaped what had been known. Gamkrelidze and Ivanov (1990) offer a highly readable synthesis and summary of research presented in Gamkrelidze and Ivanov (1990).

3.1.2 Indo-Iranian

Indo-Iranian has two large branches, Indo-Aryan and Iranian. Among the over two hundred Indo-Aryan languages, Hindi and Urdu are official languages of India and Pakistan, respectively, and many consider them dialects of a single language. Kachru’s (2008) linguistic sketch describes Hindi and Urdu as closely related, mentioning the special case of Hindustani, an essentially colloquial language that has been called a co-dialect of Hindi and Urdu. Hindustani is the language once promoted by Gandhi and the Indian National Congress as a tool of national unity. For the Hindustani controversy, see Kachru (2008).

The largest language of the Iranian component of Indo-Iranian is Persian, with estimates exceeding 50 million native speakers in Iran. Written records of Old Persian go back to the 6th century bce. Other important languages in the Iranian branch are Pashto, mainly spoken in Afghanistan and Pakistan, and Kurdish, mainly spoken in Turkey, Iraq, and Iran.

3.2 Turkic

The approximately 40 languages of this family extend from Macedonia to Siberia, Central Asia, and western China. Despite the vastness of this area, the languages themselves are typologically quite similar: agglutinative, with vowel harmony involving both backness and rounding.

3.3 Mongolic

The Mongolic languages are a group of about a dozen spoken in Mongolia and in adjacent areas of the Russian Federation and China. Mongolian, with over six million speakers, is by far the largest language in the family and the official language both of Mongolia and of the Inner Mongolian Autonomous Region of China.

3.4 Tungusic

The 11 languages of this family are scattered through Siberia, the Far East of Russia, and northwestern China, but most are endangered and some are nearly extinct. That includes Manchu, the language of the founders of the Qing Dynasty, which ruled China for nearly three centuries up to 1912. The 2016 edition of Ethnologue lists only 20 speakers for Manchu, though over ten million are ethnically Manchu.

3.5 Altaic Area

The Altaic area extends from Turkey across the Altai Mountain area of Central and East Asia to Siberia. Altaic has been regarded by some as a family comprising Turkic, Mongolic, and Tungusic, and by a few as also including Japonic and Korean as distant members. Versions of the Altaic hypothesis still have adherents, even though this notion has been cast into doubt as criteria have been challenged and evidence has been rejected as based largely on shared typological similarities, a position summarized in Unger (1990). Despite this, adherents continue to make a case, among them Miller (1991), Georg et al. (1999), and Robbeets (2005). The more conservative consensus is that many resemblances among languages in this linguistic area could have come from language contact rather than a shared ancestor. This view is reflected in Ethnologue and Glottolog, among others.

3.6 Dravidian

Dravidian languages are spoken primarily in southern India, though some are also found further north in the Indian subcontinent. The major literary languages are Tamil, Malayalam, Kannada, and Telugu, each one the first language of tens of millions. More is known about the history of Dravidian than about many other language families, thanks to the long literary periods of the four major languages.

Questions have been raised about Dravidian similarities to Uralic and Altaic, among several others. Austerlitz (1971) dismissed these, and Krishnamurti (2003), briefly surveying archeological and DNA literature along with linguistic evidence in his foundational work on Dravidian, seconds the conclusion that the linguistic arguments behind the proposed genetic relationships are tenuous and speculative.

Dravidian morphology is mainly agglutinative but lacks the long strings of affixes found in other agglutinative languages. The typical word order is SOV. Dravidian’s three-way contrast in coronal stops (dental, alveolar, and retroflex) can be traced back to Proto-Dravidian. Sanskrit, an Indo-Aryan language, owes its retroflex consonants to Dravidian, from which they are thought to have spread by diffusion.

3.7 Sino-Tibetan

The languages of this family are spoken in China, the Himalayas, and Burma. The division into Chinese and Tibeto-Burman branches is customary, as espoused by Matisoff (2003), though a few experts, including van Driem (2007), still question the grouping of Sinitic as a separate sister branch to Tibeto-Burman, along with many particulars. Tibeto-Burman, with well over 400 languages, is especially problematic because of the inaccessibility of many languages in the Himalayas, not to mention that van Driem (2015, p. 141) finds them “endangered with imminent extinction.” Overall, the lower-level groupings within Tibeto-Burman are more certain than the higher-level ones, leading van Driem (2001) to posit a “Fallen Leaves” model that recognizes clumps of closely related languages without identifying where on the family tree they fell from. Still, Ethnologue offers a full family tree. Sino-Tibetan was at one time thought to include languages farther south, such as the Tai-Kadai languages and the Hmong-Mien (Miao-Yao) languages, but the similarities among these languages are probably better attributed to areal diffusion, including massive lexical borrowing from Chinese.

3.7.1 Chinese

Member languages of the Chinese (or Sinitic) branch are sometimes called dialects, especially in China, but this stretches the normal meaning of the term “dialect” too far, since the 14 languages that make up Chinese are far from mutually intelligible, even though they share the same writing system and many grammatical properties. Each of the Chinese languages of course has dialects. Ethnologue lists five major dialects for Mandarin (which also goes by the name Guanhua): Huabei Guanhua (Northern Mandarin), Xibei Guanhua (Northwestern Mandarin), Xinan Guanhua (Southwestern Mandarin), Jinghuai Guanhua (Eastern Mandarin), and Jiangxia Guanhua (Lower Yangtze Mandarin). Other sources divide the dialects differently, due not only to differences of linguistic and geographical criteria but also to centuries of diffusion of linguistic features. For discussion, see Kurpaska (2010) and Yan (2006). Mandarin, with over a billion speakers in total, has many subdialects as well.

Linguistic diffusion is the general pattern in the historical development of Chinese, due to over a dozen massive population movements going back to the 7th century BCE and continuing to the present, each migration involving hundreds of thousands and often millions of people. Complicating these scenarios is the fact that in most cases, the migrations were to areas already settled by speakers of Chinese or other languages, often resulting in language mixture. The history of these migrations and their linguistic effects is traced by LaPolla (2001).

3.7.2 Tibeto-Burman

As already noted, most of the languages of this branch are endangered. As a group, they have many linguistic traits in common, including SOV order and agglutinative verb structure. Two word-order exceptions are the Karenic languages (Myanmar) and Bai (China), which have the SVO order characteristic of Sinitic, though unlike Sinitic, Karen and Bai are also relatively agglutinative. Karen and Bai both stand out enough from the rest of Tibeto-Burman to inspire attempts to classify them outside of Tibeto-Burman proper. Benedict’s (1976) proposed sister to Sinitic, labeled Tibeto-Karenic, with Tibeto-Burman as a daughter, has been ruled out, while more recently several scholars have taken up the case for linking Bai with Sinitic. See Wang (2005) for a brief survey with references.

3.8 Austro-Asiatic

The Austro-Asiatic family extends across southern Asia from India to Vietnam. The Munda branch is found in northeastern India, surrounded by Indo-European and Dravidian languages that have influenced its languages greatly over the ages. The Munda languages are agglutinative, with SOV word order, which makes them typologically very different from the rest of the family. Austro-Asiatic includes two important national languages, Vietnamese and Khmer (Cambodian). These two languages were grouped, along with many others, into a branch called Mon-Khmer, a grouping still accepted by Ethnologue but challenged by Sidwell (2009).

Vietnamese has borrowed massively from Chinese and was originally written with Chinese characters. Vietnamese and a few others in this family have developed phonological tones, and still others are thought to be in the process of developing them.

3.9 Hmong-Mien (Miao-Yao) and Tai-Kadai

These two families were once regarded as branches of Sino-Tibetan, and the languages of both families show many influences from Chinese. The Hmong-Mien (Miao-Yao) languages are spoken in scattered areas across southern China and nearby countries of Southeast Asia. The Tai-Kadai languages extend from China south to Thailand, Laos, Myanmar, and Vietnam and include the national languages Thai and Lao. Both families share a number of typological traits: most of their languages are SVO with isolating morphology and contrastive tone that is associated with creaky or breathy voice quality.

3.10 Paleosiberian Area

The name Paleosiberian applies to a set of four languages or language groups of Siberia with no established genetic relationship but sharing some typological features—agglutinative word structure and, with exceptions, SOV word order.

One of these is Ket, the last surviving member of the Yeniseian family, with no established relative among other extant languages. Now reduced to about 200 speakers, it is unlike the rest of Paleosiberian in several respects. It is tonal and has a highly agglutinative verbal system with complex agreement patterns, features that resemble those of Na-Dene in North America. The case for a genetic relationship between the two has been made by Vajda (2010, 2011). For arguments pro and con, see Kari and Potter (2010), Campbell (2011), and Kiparsky (2014, pp. 65–67). Implications of this proposal for Beringian migrations are pursued by Sicoli and Holton (2014).

Also in the Paleosiberian area are the Chukotko-Kamchatkan and Yukaghir families and Nivkh, a language with perhaps 200 speakers.

3.11 Korean and Japanese

Two of the major languages of East Asia, Korean and Japanese, are widely considered isolates, or nearly so in the case of Japanese, by far the dominant language in Japonic, a family of 12 languages. The remaining 11 languages of Japonic make up the Ryukyuan group of the Ryukyu Islands. Some versions of the Altaic hypothesis include Korean and Japanese in a family with Turkic, Mongolic, and Tungusic. Another isolate of Asia is Burushaski (northeastern Pakistan).

4. Oceania

Oceania, which includes Australia and most of the island territories of the central and southern Pacific and Indian oceans, is home to the Austronesian family and to two very large language groups, the Australian and the Papuan groups.

4.1 Austronesian

The 1,250+ languages of this family extend from Madagascar across Oceania to Easter Island and total well over 350 million speakers. All but 25 of these languages are Malayo-Polynesian; the rest are aboriginal languages of Taiwan.

The largest proposed subgroup, Central-Eastern Malayo-Polynesian, contains well over half of the languages classified as Malayo-Polynesian but only a few million speakers in total, and it is not generally accepted as a valid linguistic grouping. The remaining Malayo-Polynesian languages are found in 17 smaller groups, some of whose languages are widely spoken and highly important politically. Among these are:

- Javanese, the language of nearly 90 million, centered in Java, Indonesia.
- Filipino, the national language and an official language of the Philippines, used by close to 50 million, including L2 speakers. The variety associated with native speakers, who number over 20 million, is called Tagalog.
- Sundanese, the language of about 34 million in Java.
- Malay, the national language of Malaysia, spoken by more than 50 million.
- Malagasy, spoken by 18 million.

Blust (2013) offers a recent and comprehensive account of the linguistic and anthropological aspects of this family, including internal linguistic groupings, the linguistic structure of its languages, sociolinguistic considerations, and archeological evidence backing up the linguistic groupings. Adelaar and Himmelmann (2005) cover a similar range of topics.

4.2 Papuan Languages

Estimates for New Guinea run to as many as a thousand languages in an area about a quarter of the size of India, making it the most linguistically diverse region in the world (Foley, 2000, p. 357). Major groupings have been proposed by Greenberg (1971), Wurm (1982), and Ross (2005). Greenberg put all the languages into a single family and included some others from outside New Guinea, but the evidence for this has not generally been deemed credible. Wurm (1982) posited 10 Papuan phyla plus isolates, based entirely on lexicostatistic and typological evidence that others found unconvincing (Foley, 1986). A more recent grouping by Ross (2005), based essentially on evidence from pronouns, has also failed to find wide acceptance. For now, one is left with Foley’s (1986) classification, with several dozen families and a similar number of isolates. Correlated with this is extreme typological variation across the families, with morphological types ranging from isolating to polysynthetic. Foley’s Papuan families average about 25 members in size, with the exception of Trans-New Guinea, which has 482 member languages in Ethnologue, a figure that experts agree is subject to much revision because the family’s boundaries with others remain unclear. The uncertainty is reflected in Glottolog, which lists only Nuclear Trans-New Guinea, with 315 languages.

4.3 Australia

This continent has been inhabited for 50,000 years, but the time depth accessible to language classification is limited to roughly the last 5,000 years. As a result, we know very little about the historical connections among Australia’s languages. Worse, the number of vigorous Aboriginal languages today is a fraction of what it was before Europeans settled there in the 18th century. Of the 250-odd languages of Australia in 1788, more than half are extinct, and of the remainder, fewer than two dozen are used and learned by the youngest generation.

Beginning with Hale (1966), many sources divide the continent’s original languages into two groups, Pama-Nyungan and Non-Pama-Nyungan, but even this rudimentary grouping is complicated by large-scale phonological and grammatical diffusion. Dixon, author of many standard reference works on Australian languages, among them Dixon (2002), diverges markedly from the others by simply dividing the languages into 50 groups representing different areas, though among them some genetic clusters may be found. For Dixon, Pama-Nyungan “cannot be supported as a genetic group. Nor is it a useful typological grouping” (Dixon, 2002, p. 53). As Dixon sees it, the problem with applying standard methods to reconstructing a language tree for Australia is that Australia is unique, in part due to widespread diffusion, whereby a language “will tend to become more like its neighbors” (Dixon, 2002, p. 448). For studies from a markedly different vantage point, see Bowern and Koch (2004).

Phonologically, Australian languages tend to be simple in some ways (usually with three-vowel systems) and complex in others (with as many as four contrasting articulations among the coronal consonants). Morphologically, non-Pama-Nyungan languages have noun class systems and verbal concord prefixes, and some have extensive noun incorporation constructions. Within Pama-Nyungan, morphology, especially in nouns, is of a simpler agglutinative type, with suffixes but no prefixes. Most Australian languages have split ergativity, a common pattern being ergative-absolutive alignment for nouns but nominative-accusative alignment for pronouns. Word order tends to be very free, but there is evidence that clauses are best analyzed as verb-final; see Mushin and Baker (2008).

5. The Americas

As a result of colonization by Europeans, the present state of the indigenous languages of the Americas is entirely different from their past state. North America is estimated to have been home at one time to nearly 300 distinct languages (Mithun, 1999, p. 1). Since then, over a hundred have gone extinct, and practically all of the rest are endangered. The 2010 U.S. Census Bureau report found 169 Native North American languages spoken in the home, with fewer than half a million speakers in total. By far the largest is Navajo, with nearly 170,000 speakers.

Central and South America are home to a few much larger languages, spoken by several million. Still, language endangerment is the rule there as well. Of perhaps 1,700 pre-Columbian languages, fewer than 700 remain (Campbell, 1997), and of these, most are spoken by populations of several thousand or fewer.

The languages of the Americas are often divided into three geographical areas: North America, Mesoamerica, and South America. Greenberg’s (1987) classification grouped the languages into three “super-families” that he called Eskimo-Aleut, Na-Dene, and Amerind. Of these, the most controversial is Amerind, a grouping widely contested for reasons summarized by Campbell (2012, p. 19), drawing on Paul Rivet’s classification of South American languages in the first half of the 20th century: “Greenberg’s subgroups have been met with skepticism for a number of reasons, including the underanalyzed nature of the presented data, the perpetuation of old misunderstandings [. . .], and the fact that recent findings may suggest entirely different groupings.”

5.1 North America

The approximately 300 languages of native North America are grouped by Golla et al. (2007) into 14 major families and 19 minor families, with an additional 25 isolates. The major families are Eskimo-Aleut, Na-Dene, Algic, Wakashan, Salishan, Utian, Plateau, Cochimi-Yuman, Uto-Aztecan, Kiowa-Tanoan, Siouan-Catawba, Caddoan, Muskogean, and Iroquoian. These and the remaining groupings in Golla et al. (2007) represent a compromise rather than a consensus, and it is unclear whether any individual, including the authors themselves, accepts them in toto.

5.1.1 Eskimo-Aleut

The Aleut branch has just one language, variously called Aleut or Unangax̂ and spoken by 155 people in the Aleutian and Pribilof islands (Alaska) and the Commander Islands (Siberia). Eskimo has two branches, Inuit and Yupik. Because the term Eskimo is deemed offensive by many, especially in Canada and Greenland, Yupik-Inuit is sometimes used instead.

5.1.2 Na-Dene

The name Na-Dene is perhaps on its way to being phased out, having been replaced in Ethnologue by Eyak-Athabaskan and in Glottolog by Athabaskan-Eyak-Tlingit. Along with two small languages of Alaska, the family includes Athabaskan, a group of 42 languages widely distributed across the western United States and western Canada. At one time Na-Dene was thought to include Haida (Sapir, 1915), but this view has been abandoned by most (Schoonmaker et al., 1997).

The largest Athabaskan language is Navajo, a member of the Apachean group. Its morphology is widely studied for its complex prefix system, which might lead it to be classified as agglutinative, were it not for complex, overlapping dependencies that are more characteristic of fusional languages. Like many Athabaskan languages, Navajo is tonal, yet proto-Athabaskan lacked tone, and tone seems to have developed independently in many Athabaskan languages from constricted vowels (Campbell, 1997, p. 113).

5.1.3 Algic

This family has some three dozen to forty languages, all but two in the Algonquian branch. The two outliers are in California: Yurok and the now-extinct Wiyot. Algonquian languages extend from eastern Canada and the eastern United States to the Rocky Mountains. The largest languages of this group are Cree, spoken by well over 100,000 people across a vast area of Canada from Labrador to Alberta and the Northwest Territories, and Ojibwa, with more than 50,000 speakers, extending across southern Canada from Ontario to the Rocky Mountains and south into the United States, especially Minnesota.

5.1.4 Wakashan

Wakashan, a family of seven languages in British Columbia, was assigned by Edward Sapir (in a 1929 Encyclopedia Britannica entry) to a putative stock called Mosan that also included the Salishan family (section 5.1.5). Sapir’s conjecture was based on a long list of shared grammatical similarities. But Beck (2000), echoing Campbell (1997), finds little lexical similarity and concludes that one is dealing with a Sprachbund (Thomason & Kaufman, 1992), a set of languages whose common features have arisen from contact rather than from shared genetic origins.

5.1.5 Salishan

The 26 languages of this family are spoken along the coast of British Columbia and in the region immediately to its east, as well as in nearby areas of the United States. One of the typological distinctions of Salishan languages is an extremely rich set of consonant contrasts: up to six pharyngeal consonants, contrasting velars and uvulars, and a full set of ejectives.

5.1.6 Utian

Approximately a dozen languages in the Utian family of central and northern California are divided into two branches, Miwok and Costanoan.

5.1.7 Plateau

Also known as Plateau Penutian, this group of four languages in the Pacific Northwest includes Klamath and Nez Percé.

5.1.8 Cochimi-Yuman

Also called Yuman, this group of eight small languages, which also includes the extinct Cochimi, is spoken in Arizona and neighboring parts of California and Mexico.

5.1.9 Uto-Aztecan

About 60 languages make up this family. The 13 languages of the Northern branch are spoken in the western United States. Among them is Hopi, spoken by 6,700 in and around northeastern Arizona. The Southern branch has 48 languages, almost all of them in Mexico.

5.1.10 Kiowa-Tanoan

Speakers of the five languages making up this family live in the southwestern United States.

5.1.11 Siouan-Catawba

This family, also called Siouan, includes Catawba, a language of South Carolina, which lost its last native speaker in the 20th century but is being revived as a second language by ethnic Catawbas. Total speakers for the Siouan family number under 35,000, but among its 14 languages is Dakota, the third largest indigenous language of North America and nearly tied for second place with Yupik, with close to 19,000 speakers. Dakota is spoken in North and South Dakota and neighboring areas.

5.1.12 Caddoan

This group of five languages, each with just a handful of speakers, may form a super-family with Iroquoian and Siouan, based on comparative work (Chafe, 1976), but the relationship is not considered established (Mithun, 1999, p. 305).

5.1.13 Muskogean

Traces of this family of six languages, with a rough estimate of 150,000 speakers, are still found in the southeastern United States, but forced relocations by the U.S. government in the 1830s drove many Muskogean tribes from their homeland, among them the Choctaw and Chickasaw Nations, now situated in Oklahoma.

5.1.14 Iroquoian

Seven members of this family are severely endangered. Of the remaining two, Mohawk is estimated to have 540 speakers in the Canadian provinces of Ontario and Quebec, and Cherokee has over 11,500 speakers according to the 2010 U.S. Census Bureau report, mainly in Oklahoma but also near the Cherokees’ pre-relocation lands in North Carolina.

5.2 Mexico and Central America

5.2.1 Uto-Aztecan

The Southern branch of this family includes 28 varieties of Nahuatl in Mexico and one in El Salvador, whose speakers altogether number 1.5 million, according to the 2010 census. Nahuatl was the language of the Aztecs, who dominated the area before the Spanish conquest.

5.2.2 Mayan

The approximately 30 languages comprising Mayan are spoken mainly in Guatemala and Mexico, as well as in Belize and Honduras. Estimates of the number of speakers of Mayan languages run to six million, with well over half that number in Guatemala. The most important Mayan languages of Guatemala are K’iche’, with 2,330,000 speakers; Q’eqchi’ with 800,000; Mam with 530,000; and Kaqchikel with 450,000. In Mexico, Yucatec Maya is spoken by more than 700,000, and a few others are spoken by well over a hundred thousand. The languages are still centered around the original Maya homeland in Guatemala and on the Yucatan Peninsula.

Among the noteworthy achievements of early Maya civilization were temples, pyramids, and the only writing system developed in the Americas before the coming of the European explorers. Decipherment of the writing system has offered a direct glimpse into the Mayan protolanguage and makes a fascinating story, recounted by Coe (1999).

5.2.3 Otomanguean

This is a large family of 177 languages spoken in central and southern Mexico. In the Eastern Otomanguean branch are the Mixtecan languages, including Trique and 52 varieties of Mixtec listed in Ethnologue, and 63 Zapotecan languages, including Chatino and 57 varieties of Zapotec listed in Ethnologue. Recent census estimates for both Mixtec and Zapotec are in the area of 500,000 speakers. The Western Otomanguean branch numbers 37 languages, among them 14 distinct varieties of Chinantec and nine varieties of Otomi. The 2010 census gives 130,000 native speakers for Chinantec and 290,000 for Otomi.

5.2.4 Totonacan

This is a family of 12 small languages spoken in and around Puebla State in Mexico. The largest is Sierra Totonacan.

5.2.5 Mixe-Zoquean

This family groups the ten Mixean languages with the seven Zoquean languages. All are spoken on the narrow strip of southern Mexico between the Gulf of Mexico and the Pacific Ocean.

5.3 South America

With 55 language isolates and 53 families of two or more languages, South America has about a quarter of the language families of the world (Campbell, 2012, p. 59). While most of its languages are endangered and a large number nearly extinct, there are some very healthy exceptions, including Quechua, Tupi-Guarani, and Aymara, all discussed in this section. Especially since 1960, efforts have been under way to reverse some of the declines in language populations of earlier eras. The Andean region has been particularly active in this area, with several bilingual school programs incorporating Quechua and Aymara into the curriculum. The past 25 years have also seen a surge of interest among linguists in documenting and analyzing the tremendously diverse languages of this continent.

Among the 108 language families, isolates included, that Campbell (2012) finds in South America, larger groupings have yet to be firmly established. Of the hypotheses advanced to date, including Greenberg’s (1987) classification that puts them all in Amerind, none has been demonstrated to general satisfaction.

5.3.1 Intermediate Area: Between Central America and South America

The area between the site of the Mayan civilization to the north and the Inca civilization to the south covers the northwestern part of South America and extends into Central America. Among the language families here are Chocoan, spoken in Colombia and Panama; Barbacoan, spoken in Colombia and Ecuador; and Chibchan, spoken from Honduras to Venezuela. Chibchan may be related to Misumalpan, spoken in Honduras and Nicaragua.

5.3.2 Arawakan

The family with the greatest geographical reach, spreading from Honduras down to Bolivia and as far east as Suriname, is Arawakan, with 40 languages, not counting about two dozen extinct ones. Some apply the name Arawakan to a slightly larger group with 11 additional languages, but the genetic connection of these to the core family is unproven (Campbell, 2012, p. 71). For this reason, Campbell uses Arawakan (which includes the language Arawak) for the core group, which also goes by the names Maipurean and Maipuran, as listed in Ethnologue.

Three Arawakan languages—Wayuu (Colombia), Garifuna (Honduras), and Asháninka (Peru)—account for more than 85% of the 645,000-odd speakers of languages in the family.

5.3.3 Arawan

The Arawan family of western Brazil, with six languages, and Guajiboan, with five languages in eastern Colombia and southwestern Venezuela, make up the group of 11 sometimes classed with Arawakan.

5.3.4 Cariban

Cariban is a family of 31 languages (as well as around two dozen extinct ones) in Brazil and Venezuela as well as in Guyana, Suriname, and Colombia. Most have just a few hundred speakers; some have a few thousand. The largest is Macushi, with 18,000 speakers in Brazil.

5.3.5 Tucanoan

Tucanoan includes 25 languages in Colombia, Ecuador, Peru, and Brazil. A few are extinct or very severely endangered. The two largest, with just over 6,000 speakers each, are Cubeo (Colombia) and Tucano (Brazil).

5.3.6 Aymaran

Aymaran has just two languages. One of them is Aymara, spoken by a million in Bolivia and several hundred thousand in Peru.

5.3.7 Quechuan

Quechuan languages are spoken natively by more people than any other language family indigenous to the Americas, a result of the spread of the Inca Empire in pre-Columbian times. The total speaking population is 8.5 million, mainly in Peru, Ecuador, and Bolivia. The designations of all but two of the 44 Quechuan languages include the name Quechua along with a geographical identifier, reflecting a close relationship, though in most cases not mutual intelligibility. Most are small, with a few thousand speakers. About a dozen others range from the tens of thousands to around 100,000, and a few more are spoken by several hundred thousand. Larger than these are South Bolivian Quechua (1,600,000 speakers in Bolivia), Ayacucho Quechua (900,000 speakers in Peru, including Lima), and Chimborazo Highland Quichua (800,000 in Ecuador). All three belong to what is known as Peripheral Quechua, a sister branch to Central Quechua. These two branches constitute the major break in the Quechuan family. Quechua is, along with Spanish, an official language of Peru.

Phonological, structural, and lexical similarities between Quechua and Aymara have raised the possibility that the two are related, as discussed by Orr and Longacre (1968) and Kaufman and Berlin (2007), but Adelaar (1992, 2012) argues instead that the many similarities must have resulted from intense contact predating the protolanguages, along with subsequent diffusion. Part of the reasoning is that the shared words are in fact too similar where they occur (inherited vocabulary would be expected to show more divergence), and they extend to only about a quarter of the vocabulary, while the rest is highly different.

5.3.8 Tupian

Jensen and Grimes (2003), Kaufman and Berlin (2007), and Rodrigues and Cabral (2014) regard the Tupian languages of Central Amazonia as a language stock, that is, a grouping of language families not fully established but thought to be distantly related. Here it is listed as an established family, following Kaufman (1990), Campbell (2012), and Ethnologue.

This set of 76 languages is grouped into 11 small branches and isolates and one major branch, Tupi-Guarani, which some recognize as a family in and of itself (Michael et al., 2015). Its 51 languages are found in parts of Paraguay, Brazil, and Bolivia but once covered a much larger expanse of South America, from the eastern coast to the west and from northern Argentina up to French Guiana. Ten languages of this group are varieties of Guaraní that together are spoken by five million, principally in Paraguay, where Guaraní is an official language (along with Spanish) and is widely used as a second language as well.

5.3.9 Northern Foothills

In this region of the Andean foothills, we find Jivaroan, Cahuapanan, Zaparoan, and Witotoan, among a few others. Yagua belongs to the Peba-Yaguan family, whose other two members are extinct.

Beyond what is presented here, Campbell (2012) discusses many plausible and possible genetic relationships within South America. Campbell and Grondona (2012, p. 29) cite a dozen other works on this topic.

6. Sign Languages

As with spoken languages, it is impossible to trace back to the time when the first sign languages were used. Still, McBurney (2012) documents early reports on signing by the deaf, including an Ancient Egyptian text from around 1200 BCE: “Thou art one who is deaf and does not hear, to whom men make (signs) with the hand.” From Plato’s Cratylus she quotes: “should we not, like the deaf and dumb, make signs with the hands and head and the rest of the body?” And from a collection on Jewish oral law from the late 2nd century CE: “A deaf-mute may communicate by signs and be communicated with by signs.”

Signing systems developed into languages as communities of users grew and the communicative needs of the deaf were recognized by governments, educators, and the general public. In parts of Europe, emerging deaf communities were developing sign languages well before the 18th century, and in 1817 Thomas Gallaudet established the first permanent deaf school in the United States, basing his methods on practices already in place in France and Britain.

Ethnologue lists 138 sign languages for the deaf, each one named for the location where it is used. Many are adaptations of signing systems already used in other regions, as illustrated by American Sign Language (ASL), which Thomas Gallaudet directly based on French Sign Language. ASL has become the most widely used sign language of the deaf, with 250,000 users in North America, the Caribbean, the Philippines, and Africa. ASL and other sign languages are not closely connected to the spoken languages of the regions where they are used. For example, British Sign Language and American Sign Language are not mutually intelligible.

Sign languages also develop in response to other needs. A famous case is Plains Indian Sign Language, once used as a lingua franca by Native Americans over a vast expanse of North America and still in use in some regions (Davis, 2010). Sign languages that have arisen in Aboriginal Australia in response to speech taboos and ritual observance have been described by Kendon (1988).

7. Pidgins and Creoles

7.1 Pidgins

Pidgins are simplified languages that arise out of a need to communicate among speakers lacking a common language, typically in colonial situations where one group is dominant. Members of the dominated group fuse grammatical features, often simplified, of their native language (called the substrate) with vocabulary from the dominant, or superstrate, language. The resulting language serves restricted purposes, such as trade.

There are not many pidgins. Ethnologue lists only 16, six of them in Africa and five in Oceania, if Indonesia is included. Hiri Motu, an official language of Papua New Guinea, is noteworthy because it goes against some typical views of pidgins. This language developed between the Motu and their trading partners nearby before any European contact. After colonization, its use spread, though the colonizers themselves had little if any knowledge of it. More usual are the cases of the original Chinese Pidgin English, once known as Pigeon English, which arose in 17th-century China for trade with the British, and Nigerian Pidgin, which developed in the same era, again due to trade contact with the British, notably the slave trade.

Hiri Motu, Chinese Pidgin English, and Nigerian Pidgin illustrate three different types of situation. Hiri Motu and Chinese Pidgin English exemplify pidgins that originate when trade partners are equal (Hiri Motu) or unequal (Chinese Pidgin English). The two had similar outcomes, eventually fading away—Hiri Motu in favor of Tok Pisin, a widely spoken creole of New Guinea, and Chinese Pidgin English in favor of Standard English, which came to be commonly taught in schools. (Since then, a different language called Chinese Pidgin English has arisen on the Pacific island of Nauru, for communicating with Chinese-speaking merchants and traders.) By contrast, Chinese Pidgin English and Nigerian Pidgin had analogous origins (for communicating with traders in a dominant position), yet different outcomes, since the first has died out, while the second has vastly expanded its uses and its speaking population. Currently Nigerian Pidgin is learned by many children at an early age for communication with peers in virtually any informal situation.

7.2 Creoles

Creoles are first languages of members of speech communities but originate from types of language contact resembling, if not always identical to, situations that give rise to pidgins. Being acquired as a first language gives creoles a stability that pidgins lack, and so it is not surprising that many more creoles are in current use—93 listed in Ethnologue—than pidgins. Thirty-two creoles are spoken around Latin America and the Caribbean, 26 in Oceania, and 22 in Africa. Like pidgins, creoles have a substrate and a superstrate. English is the superstrate for 33 creoles, Malay for 14, Portuguese for 13, and French for 11.

Probably the most vigorously debated topic in current pidgin and creole studies is how creoles form and evolve. Bickerton (1981, 1988) interpreted creolization in terms of what is known as the bioprogram hypothesis. This would see creoles as developing from a pidgin that learners were exposed to at an early age. The hypothesis was that acquisition is guided by an innate bioprogram that supplies structure to complement and modify the pidgin’s substrate and superstrate. This idea excited those who saw its potential to shed light on the human language faculty in general. At the same time, among creolists, the bioprogram hypothesis gave rise to a literature that almost universally sought to disprove it. Viewed more positively, it engendered lots of new thinking on how creoles come about.

Veenstra (2008) surveys some of the progress made during this period. Early commentators found reason to assign a greater role to the superstrate language than Bickerton’s hypothesis, which leaned heavily on universal grammar, allowed. Another criticism cited the fact that some creoles develop without a pidgin as their source. Bickerton’s explanation, which relied on acquisition by a generation of speakers with no other first language, implied that a creole would always develop in a single generation. This prediction has been falsified by Nicaraguan Sign Language, which took two generations to develop (Kegl, Senghas, & Coppola, 1999). For many more counterproposals and refinements, see DeGraff (1999), Mufwene (1996), Singler (1996), and McWhorter (2005). One area of agreement is that neither pidgins nor creoles are homogeneous types, as earlier work seemed to assume. There are many varieties, just as there are among the other languages covered in this essay.

Further Reading

Online Resources

Ethnologue and Glottolog are comprehensive, frequently updated databases on languages and language families. Both sites list all known languages and language families, with extensive bibliographies. Included on the Ethnologue website are nearly 200 language maps and several tables of statistics on the largest languages. Ethnologue also exists in print form, as three volumes listed under Simons and Fennig in section 8.2.

The World Atlas of Language Structures Online is a database of typological information on languages of the world. The data are collected by a team of 55 from grammars and other descriptive materials and organized into 99 chapters on areas of phonology, morphology, and syntax. The site is frequently updated with comments and corrections.

An online database of scholarly hypotheses about possible language families and their membership is Multitree. A pronouncing dictionary of selected words from nearly 350 world languages is at Forvo. Audio pronunciations for over 100,000 words are available for some languages, down to a few hundred for others. The pronunciations are collected from users of the site.

UNESCO maintains an online Atlas of the World’s Languages in Danger with 2,465 entries. Its search functions let users identify languages by country and by level of endangerment. The entry for each language includes its number of speakers, alternate names, and geographical coordinates. A complementary print atlas, with 13 chapters by experts on the languages of different world regions, is published by UNESCO in five languages. The next section includes a reference to the English-language version.

Books and Articles

Asher, R. E., & Moseley, C. (Eds.). (2007). Atlas of the world’s languages (2d ed.). London: Routledge.

Austin, P. K. (Ed.). (2008). One thousand languages: Living, endangered, and lost. Berkeley: University of California Press.

Campbell, G. L., & King, G. (2011). The Routledge concise compendium of the world’s languages (2d ed.). New York: Routledge.

Comrie, B. (2001). Languages of the world. In M. Aronoff & J. Rees-Miller (Eds.), The handbook of linguistics (pp. 19–42). Malden: Blackwell.

Comrie, B. (Ed.). (2009). The world’s major languages (2d ed.). New York: Routledge.

Lyovin, A., Kessler, B., & Leben, W. R. (2016). Introduction to the languages of the world. New York: Oxford University Press.

Moseley, C. (Ed.). (2010). Atlas of the world’s languages in danger. Paris: UNESCO.

Pereltsvaig, A. (2012). Languages of the world: An introduction. Cambridge, UK: Cambridge University Press.

Simons, G. F., & Fennig, C. D. (Eds.). (2017a). Ethnologue: Languages of Asia (20th ed.). Dallas: SIL International.

Simons, G. F., & Fennig, C. D. (Eds.). (2017b). Ethnologue: Languages of Africa and Europe (20th ed.). Dallas: SIL International.

Simons, G. F., & Fennig, C. D. (Eds.). (2017c). Ethnologue: Languages of the Americas and the Pacific (20th ed.). Dallas: SIL International.

  • Adkins, M. (2013). Will the real Breton please stand up? Language revitalization and the problem of authentic language. International Journal of the Sociology of Language, 2013(223), 55–70.
  • Gamkrelidze, T. V., & Ivanov, V. V. (1990). The early history of Indo-European languages. Scientific American, 262(3), 110–116.
  • Gamkrelidze, T. V., & Ivanov, V. V. (1995). Indo-European and Indo-Europeans: A reconstruction and historical analysis of a proto-language and proto-culture. Part 1: Text (J. Nichols, Trans.). Berlin: Mouton de Gruyter.
  • Günther, T., Valdiosera, C., Malmström, H., Ureña, I., Rodriguez-Varela, R., Sverrisdóttir, Ó. O., & de Castro, J. M. B. (2015). Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proceedings of the National Academy of Sciences, 112(38), 11917–11922.
  • Salminen, T. (2002). Problems in the taxonomy of the Uralic languages in the light of modern comparative studies. In A. I. Kuznetsova, A. E. Kibrik, T. B. Agranat, & O. A. Kazakevich (Eds.), Lingvisticheskii bespredel: Sbornik statei k 70-letiiu A. I. Kuznetsovoi (pp. 44–55). Moscow: Izdatel’stvo Moskovskogo universiteta.
  • Bender, M. L. (1996–1997). Nilo-Saharan languages: An essay in classification. Munich: Lincom Europa.
  • Bendor-Samuel, J. T., & Hartell, R. L. (1989). The Niger-Congo languages: A classification and description of Africa’s largest language family. Lanham, MD: University Press of America.
  • Ehret, C. (2001). A historical-comparative reconstruction of Nilo-Saharan. Cologne: Rüdiger Köppe.
  • Ermisch, S. (Ed.). (2008). Khoisan languages and linguistics: Proceedings of the 2nd International Symposium, January 8–12, 2006. Cologne: Rüdiger Köppe.
  • Greenberg, J. H. (1955). Studies in African language classification. New Haven, CT: Compass.
  • Greenberg, J. H. (1963). The languages of Africa. The Hague: Mouton.
  • Nordhoff, S., Hammarström, H., Forkel, R., & Haspelmath, M. (Eds.). (2013). Benue–Congo. Glottolog. Jena: Max Planck Institute for the Science of Human History.
  • Austerlitz, R. (1971). Long-range comparisons of Tamil and Dravidian with other language families in Eurasia. In R. E. Asher (Ed.), Proceedings of the Second International Conference-Seminar of Tamil Studies (Vol. 2, pp. 254–261). Madras: Association of Tamil Research.
  • Benedict, P. K. (1976). Sino-Tibetan: Another look. Journal of the American Oriental Society, 96(2), 167–197.
  • Bhatia, T. K. (1987). A history of the Hindi grammatical tradition: Hindi-Hindustani grammar, grammarians, history, and problems (Vol. 4). Leiden, The Netherlands: Brill.
  • Campbell, L. (2011). Review of the book The Dene-Yeniseian connection, by J. M. Kari and B. A. Potter. International Journal of American Linguistics, 77, 445–451.
  • Georg, S., Michalove, P. A., Manaster Ramer, A., & Sidwell, J. (1999). Telling general linguists about Altaic. Journal of Linguistics, 35, 65–98.
  • Gamkrelidze, T. V., & Ivanov, V. V. (1990). The early history of Indo-European languages. Scientific American, 262(3), 110–117.
  • Greenberg, J. H. (2000). Indo-European and its closest relatives: The Eurasiatic language family. Vol. 1, Grammar. Stanford, CA: Stanford University Press.
  • Greenberg, J. H. (2002). Indo-European and its closest relatives: The Eurasiatic language family. Vol. 2, Lexicon. Stanford, CA: Stanford University Press.
  • Greenberg, J. H. (1997). Does Altaic exist? In I. Hegedus, P. A. Michalove, & A. Manaster Ramer (Eds.), Indo-European, Nostratic and beyond: A festschrift for Vitaly V. Shevoroshkin (pp. 88–93). Washington, DC: Institute for the Study of Man.
  • Johanson, L., & Robbeets, M. (Eds.). (2010). Transeurasian verbal morphology in a comparative perspective: Genealogy, contact, chance. Wiesbaden: Harrassowitz.
  • Kachru, Y. (2008). Hindi-Urdu-Hindustani. In B. B. Kachru, Y. Kachru, & S. N. Sridhar (Eds.), Language in South Asia (pp. 81–102). Cambridge, UK: Cambridge University Press.
  • Kari, J. M., & Potter, B. A. (Eds.). (2010). The Dene-Yeniseian connection. Department of Anthropology, University of Alaska Fairbanks.
  • Kiparsky, P. (2014). New perspectives in historical linguistics. In C. Bowern & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 64–102). New York: Routledge.
  • Krishnamurti, B. (2003). The Dravidian languages. New York: Cambridge University Press.
  • Kurpaska, M. (2010). Chinese language(s): A look through the prism of the great dictionary of modern Chinese dialects. Berlin: Walter de Gruyter.
  • LaPolla, R. J. (2001). The role of migration and language contact in the development of the Sino-Tibetan language family. In A. Y. Aikhenvald & R. M. W. Dixon (Eds.), Areal diffusion and genetic inheritance: Case studies in language change (pp. 225–254). Oxford: Oxford University Press.
  • Matisoff, J. A. (1991). Sino-Tibetan linguistics: Present state and future prospects. Annual Review of Anthropology, 20, 469–504.
  • Matisoff, J. A. (2003). Handbook of Proto-Tibeto-Burman: System and philosophy of Sino-Tibetan reconstruction. Berkeley: University of California Press.
  • Miller, R. A. (1991). Genetic connections among the Altaic languages. In S. M. Lamb & E. D. Mitchell (Eds.), Sprung from some common source: Investigations into the prehistory of languages (pp. 293–327). Stanford, CA: Stanford University Press.
  • Pawley, A. (2009). Greenberg’s Indo-Pacific hypothesis: An assessment. In B. Evans (Ed.), Discovering history through language: Papers in honour of Malcolm Ross (pp. 153–180). Canberra: Pacific Linguistics.
  • Rai, A. (1984). A house divided: The origin and development of Hindi/Hindavi. New Delhi: Oxford University Press.
  • Ratliff, M. S. (2010). Hmong–Mien language history. Canberra: Pacific Linguistics.
  • Robbeets, M. I. (2005). Is Japanese related to Korean, Tungusic, Mongolic and Turkic? Wiesbaden: Harrassowitz.
  • Sicoli, M. A., & Holton, G. (2014). Linguistic phylogenies support back-migration from Beringia to Asia. PLoS ONE, 9(3), e91722.
  • Sidwell, P. (2009). Classifying the Austroasiatic languages: History and state of the art. LINCOM Studies in Asian Linguistics 76. Munich: Lincom Europa.
  • Starostin, S. A. (1991). Altajskaja problema i proisxoždenie japonskogo jazyka [The Altaic problem and the origin of the Japanese language]. Moscow: Nauka.
  • Starostin, S. A. (2005). Response to Stefan Georg’s review of the Etymological dictionary of the Altaic languages. Diachronica, 22(2), 451–454.
  • Starostin, S. A., Dybo, A. V., & Mudrak, O. A. (2003). Etymological dictionary of the Altaic languages (3 vols.). Leiden, The Netherlands: Brill.
  • Thurgood, G., & LaPolla, R. J. (Eds.). (2003). The Sino-Tibetan languages. London: Routledge.
  • Ting Pang-Hsin [Ding Bangxin] & Hongkai Sun. (2000). Hàn-Zàngyu yánjiu de lìshi huígù [Retrospective history of Sino-Tibetan studies]. In Ding Bangxin (Ed.), Hàn-Zàngyu tóngyuáncí yánjiu, 1 [Cognate words in Sino-Tibetan languages, 1]. Nanning: Guangxi Mínzú Chubanshè [Guangxi Nationalities Press].
  • Unger, J. M. (1990). Summary report of the Altaic panel. In P. Baldi (Ed.), Linguistic change and reconstruction methodology. Berlin: Mouton de Gruyter.
  • Vajda, E. J. (2010). A Siberian link with Na-Dene languages. Anthropological Papers of the University of Alaska, new ser., 5(1), 33–99.
  • Vajda, E. J. (2011). A response to Campbell. International Journal of American Linguistics, 77, 451–452.
  • van Driem, G. (2001). Languages of the Himalayas: An ethnolinguistic handbook of the Greater Himalayan region, containing an introduction to the Symbiotic Theory of Language (2 vols.). Leiden, The Netherlands: Brill.
  • van Driem, G. (2007). The diversity of the Tibeto-Burman language family and the linguistic ancestry of Chinese. Bulletin of Chinese Linguistics, 1(2), 211–270.
  • van Driem, G. (2015). Tibeto-Burman. In W. S.-Y. Wang & C. Sun (Eds.), The Oxford handbook of Chinese linguistics (pp. 135–148). New York: Oxford University Press.
  • Wang, F. (2005). On the genetic position of the Bai language. Cahiers de Linguistique Asie Orientale, 34(1), 101–127.
  • Wurm, S. A. (1982). The Papuan languages of Oceania. Tübingen, Germany: Gunter Narr.
  • Yan, M. M. (2006). Introduction to Chinese dialectology. Cologne: Lincom Europa.
  • Adelaar, K. A., & Himmelmann, N. (2005). The Austronesian languages of Asia and Madagascar. New York: Routledge.
  • Blust, R. (2013). The Austronesian languages. Canberra: Asia-Pacific Linguistics.
  • Bowern, C., & Koch, H. (Eds.). (2004). Australian languages: Classification and the comparative method. Amsterdam: John Benjamins.
  • Dixon, R. M. W. (2002). Australian languages: Their nature and development. Cambridge, UK: Cambridge University Press.
  • Foley, W. A. (1986). The Papuan languages of New Guinea. Cambridge, UK: Cambridge University Press.
  • Foley, W. A. (2000). The languages of New Guinea. Annual Review of Anthropology, 29, 357–404.
  • Greenberg, J. H. (1971). The Indo-Pacific hypothesis. In T. A. Sebeok (Ed.), Current trends in linguistics (Vol. 8, pp. 808–871). The Hague: Mouton.
  • Hale, K. L. (1966). The Paman group of the Pama-Nyungan phylic family. In G. N. O’Grady, C. F. Voegelin, & F. M. Voegelin (Eds.), Languages of the world: Indo-Pacific (pp. 162–197). Fascicle 6. Bloomington: Anthropology Dept., Indiana University.
  • Mushin, I., & Baker, B. (Eds.). (2008). Discourse and grammar in Australian languages. Amsterdam: Benjamins.
  • Ross, M. (2005). Pronouns as a preliminary diagnostic for grouping Papuan languages. In A. Pawley, R. Attenborough, R. Hide, & J. Golson (Eds.), Papuan pasts: Cultural, linguistic and biological histories of Papuan-speaking peoples (pp. 15–65). Canberra: Pacific Linguistics.
  • Wurm, S. A. (1982). Papuan languages of Oceania. Tübingen, Germany: Narr.

The Americas

  • Adelaar, W. (1992). Quechuan languages. In W. Bright (Ed.), Oxford international encyclopedia of linguistics 3 (pp. 303–310). Oxford: Oxford University Press.
  • Adelaar, W. (2012). Languages of the Middle Andes in areal-typological perspective: Emphasis on Quechuan and Aymaran. In L. Campbell & V. Grondona (Eds.), The indigenous languages of South America: A comprehensive guide (pp. 575–624). Berlin: Mouton de Gruyter.
  • Asher, R. E., & Moseley, C. (Eds.). (2007). Atlas of the world’s languages (2d ed.). London: Routledge.
  • Beck, D. (2000). Grammatical convergence and the genesis of diversity in the Northwest Coast Sprachbund. Anthropological Linguistics, 42(2), 147–213.
  • Campbell, L. (1988). Review of the book Language in the Americas, by J. H. Greenberg. Language, 64, 591–615.
  • Campbell, L. (1997). American Indian languages: The historical linguistics of native America. New York: Oxford University Press.
  • Campbell, L. (2012). Classification of the indigenous languages of South America. In L. Campbell & V. Grondona (Eds.), The indigenous languages of South America: A comprehensive guide (pp. 59–166). Berlin: Mouton de Gruyter.
  • Campbell, L., & Grondona, V. (Eds.). (2012). The indigenous languages of South America: A comprehensive guide. Berlin: Mouton de Gruyter.
  • Coe, M. D. (1999). Breaking the Maya code (rev. ed.). London: Thames and Hudson.
  • Chafe, W. L. (1976). Siouan, Iroquoian, and Caddoan. In T. A. Sebeok (Ed.), Native languages of the Americas (pp. 527–572). New York: Springer US.
  • Fabre, A. (1998). Manual de las lenguas indígenas sudamericanas (Vol. 1). Munich: Lincom Europa.
  • Goddard, I. (1996). The classification of native languages of North America. In I. Goddard & W. C. Sturtevant (Eds.), Handbook of North American Indians (Vol. 17, pp. 290–323). Washington, DC: Smithsonian Institution.
  • Golla, V., Goddard, I., Campbell, L., Mithun, M., & Mixco, M. (2007). North America. In R. E. Asher & C. Moseley (Eds.), Atlas of the world’s languages (2d ed., pp. 5–44). London: Routledge.
  • Greenberg, J. H. (1987). Language in the Americas. Stanford, CA: Stanford University Press.
  • Jensen, C. J., & Grimes, B. (2003). International encyclopedia of linguistics (2d ed.). Oxford: Oxford University Press.
  • Kaufman, T. (1990). Language history in South America: What we know and how to know more. In D. L. Payne (Ed.), Amazonian linguistics: Studies in lowland South American languages (pp. 13–67). Austin: University of Texas Press.
  • Kaufman, T., & Berlin, B. (2007). South America. In R. E. Asher & C. Moseley (Eds.), Atlas of the world’s languages (2d ed., pp. 59–93). London: Routledge.
  • Michael, L., Chousou-Polydouri, N., Bartolomei, K., Donnelly, E., Wauters, V., Meira, S., & O’Hagan, Z. (2015). A Bayesian phylogenetic classification of Tupí-Guaraní. LIAMES-Línguas Indígenas Americanas, 15(2), 193–221.
  • Mithun, M. (1999). The languages of native North America. Cambridge, UK: Cambridge University Press.
  • Orr, C., & Longacre, W. A. (1968). Proto-Quechumaran. Language, 44, 528–555.
  • Rodrigues, A. D., & Cabral, A. S. (2014). Tupian languages. Oxford Bibliographies Online.
  • Schoonmaker, P. K., Von Hagen, B., & Wolf, E. C. (1997). The rain forests of home: Profile of a North American bioregion. Washington, DC: Island Press.
  • Sapir, E. (1915). The Na-Dene languages: A preliminary report. American Anthropologist, 17(3), 534–558.
  • Thomason, S. G., & Kaufman, T. (1992). Language contact, creolization, and genetic linguistics. Berkeley: University of California Press.
  • U.S. Census Bureau. (2011). Native North American languages spoken at home in the United States and Puerto Rico: 2006–2010. American Community Survey Briefs.
  • Davis, J. E. (2010). Hand talk: Sign language among American Indian nations. Cambridge, UK: Cambridge University Press.
  • Kendon, A. (1988). Sign languages of Aboriginal Australia: Cultural, semiotic and communicative perspectives. Cambridge, UK: Cambridge University Press.
  • McBurney, S. (2012). History of sign languages and sign language linguistics. In R. Pfau, M. Steinbach, & B. Woll (Eds.), Sign language: An international handbook (pp. 909–948). Berlin: Walter de Gruyter.

Pidgins and Creoles

  • Bickerton, D. (1981). Roots of language. New York: Karoma.
  • Bickerton, D. (1988). Creole languages and the bioprogram. In F. J. Newmeyer (Ed.), Linguistics: The Cambridge survey 2. Cambridge, UK: Cambridge University Press.
  • DeGraff, M. (Ed.). (1999). Language change and creation: Creolization, diachrony, and development. Cambridge, MA: MIT Press.
  • Kegl, J., Senghas, A., & Coppola, M. (1999). Creation through contact: Sign language emergence and sign language change in Nicaragua. In M. DeGraff (Ed.), Language change and creation: Creolization, diachrony, and development (pp. 179–238). Cambridge, MA: MIT Press.
  • Mufwene, S. (1996). The founder principle in creole genesis. Diachronica, 13, 83–134.
  • McWhorter, J. H. (2005). Defining creole. Oxford: Oxford University Press.
  • Singler, J. V. (1996). Theories of creole genesis, sociohistorical considerations, and the evaluation of evidence: The case of Haitian Creole and the Relexification Hypothesis. Journal of Pidgin and Creole Languages, 11, 185–230.
  • Veenstra, T. (2008). Creole genesis: The impact of the language bioprogram hypothesis. In S. Kouwenberg & J. V. Singler (Eds.), The handbook of pidgin and creole studies (pp. 219–241). Malden: Wiley-Blackwell.


Printed from Oxford Research Encyclopedias, Linguistics.

date: 14 May 2024



Research Paper – Structure, Examples and Writing Guide

Research Paper

Definition:

A research paper is a written document that presents the author’s original research, analysis, and interpretation of a specific topic or issue.

It is typically based on empirical evidence and may use qualitative or quantitative research methods, or a combination of both. The purpose of a research paper is to contribute new knowledge or insights to a particular field of study and to demonstrate the author’s understanding of the existing literature and theories related to the topic.

Structure of a Research Paper

The structure of a research paper typically follows a standard format, consisting of several sections that convey specific information about the research study. The following is a detailed explanation of the structure of a research paper:

Title Page

The title page contains the title of the paper, the name(s) of the author(s), and the affiliation(s) of the author(s). It also includes the date of submission and possibly the name of the journal or conference where the paper is to be published.

Abstract

The abstract is a brief summary of the research paper, typically ranging from 100 to 250 words. It should include the research question, the methods used, the key findings, and the implications of the results. The abstract should be written in a concise and clear manner to allow readers to quickly grasp the essence of the research.

Introduction

The introduction section of a research paper provides background information about the research problem, the research question, and the research objectives. It also outlines the significance of the research, the research gap that it aims to fill, and the approach taken to address the research question. Finally, the introduction section ends with a clear statement of the research hypothesis or research question.

Literature Review

The literature review section of a research paper provides an overview of the existing literature on the topic of study. It includes a critical analysis and synthesis of the literature, highlighting the key concepts, themes, and debates. The literature review should also demonstrate the research gap and how the current study seeks to address it.

Methods

The methods section of a research paper describes the research design, the sample selection, the data collection and analysis procedures, and the statistical methods used to analyze the data. This section should provide sufficient detail for other researchers to replicate the study.

Results

The results section presents the findings of the research, using tables, graphs, and figures to illustrate the data. The findings should be presented in a clear and concise manner, with reference to the research question and hypothesis.

Discussion

The discussion section of a research paper interprets the findings and discusses their implications for the research question, the literature review, and the field of study. It should also address the limitations of the study and suggest future research directions.

Conclusion

The conclusion section summarizes the main findings of the study, restates the research question and hypothesis, and provides a final reflection on the significance of the research.

References

The references section provides a list of all the sources cited in the paper, following a specific citation style such as APA, MLA, or Chicago.

How to Write a Research Paper

To write a research paper, follow these steps:

  • Choose a Topic: The first step is to select a topic that interests you and is relevant to your field of study. Brainstorm ideas, then narrow them down to a research question that is specific and researchable.
  • Conduct a Literature Review: The literature review helps you identify gaps in the existing research and provides a basis for your research question. It also helps you develop a theoretical framework and research hypothesis.
  • Develop a Thesis Statement: The thesis statement is the main argument of your research paper. It should be clear, concise, and specific to your research question.
  • Plan your Research: Develop a research plan that outlines the methods, data sources, and data analysis procedures. This will help you collect and analyze data effectively.
  • Collect and Analyze Data: Collect data using methods such as surveys, interviews, observations, or experiments. Analyze quantitative data with statistical tools and qualitative data with appropriate qualitative methods.
  • Organize your Paper: Organize your paper into sections such as Introduction, Literature Review, Methods, Results, Discussion, and Conclusion. Make sure that each section is coherent and follows a logical flow.
  • Write your Paper: Start by writing the introduction, followed by the literature review, methods, results, discussion, and conclusion. Make sure that your writing is clear and concise and follows the required formatting and citation styles.
  • Edit and Proofread your Paper: Review your paper for grammar and spelling errors, and make sure it is well structured and easy to read. Ask someone else to review it and give you feedback and suggestions for improvement.
  • Cite your Sources: Make sure you properly cite every source used in your research paper. This gives credit to the original authors and helps you avoid plagiarism.

Research Paper Example

Note: The example research paper below is for illustrative purposes only and is not an actual research paper. Actual research papers may differ in structure, content, and format depending on the field of study, the research question, the data collection and analysis methods, and other factors. Students should always consult their professors or supervisors for specific guidelines and expectations for their research papers.

Sample Research Paper for Students:

Title: The Impact of Social Media on Mental Health among Young Adults

Abstract: This study aims to investigate the impact of social media use on the mental health of young adults. A literature review was conducted to examine the existing research on the topic. A survey was then administered to 200 university students to collect data on their social media use, mental health status, and perceived impact of social media on their mental health. The results showed that social media use is positively associated with depression, anxiety, and stress. The study also found that social comparison, cyberbullying, and FOMO (Fear of Missing Out) are significant predictors of mental health problems among young adults.

Introduction: Social media has become an integral part of modern life, particularly among young adults. While social media has many benefits, including increased communication and social connectivity, it has also been associated with negative outcomes, such as addiction, cyberbullying, and mental health problems. This study aims to investigate the impact of social media use on the mental health of young adults.

Literature Review: The literature review highlights the existing research on the impact of social media use on mental health. The review shows that social media use is associated with depression, anxiety, stress, and other mental health problems. The review also identifies the factors that contribute to the negative impact of social media, including social comparison, cyberbullying, and FOMO.

Methods: A survey was administered to 200 university students to collect data on their social media use, their mental health status, and the perceived impact of social media on their mental health. Mental health status was measured using the DASS-21 (the 21-item Depression Anxiety Stress Scales). Data were analyzed using descriptive statistics and regression analysis.

Results: The results showed that social media use is positively associated with depression, anxiety, and stress. The study also found that social comparison, cyberbullying, and FOMO are significant predictors of mental health problems among young adults.

Discussion: The study’s findings suggest that social media use may have a negative impact on the mental health of young adults. The study highlights the need for interventions that address the factors contributing to this negative impact, such as social comparison, cyberbullying, and FOMO.

Conclusion: Social media use is significantly associated with poorer mental health among young adults. The study’s findings underscore the need for interventions that promote healthy social media use and address the negative outcomes associated with it. Future research could explore the effectiveness of interventions aimed at reducing the negative impact of social media on mental health, and longitudinal studies could investigate the long-term effects of social media use on mental health.

Limitations: The study has some limitations, including the use of self-report measures and a cross-sectional design. Self-report measures may produce biased responses, and a cross-sectional design limits the ability to establish causality.

Implications: The study’s findings have implications for mental health professionals, educators, and policymakers. Mental health professionals can use the findings to develop interventions that address the negative impact of social media use on mental health. Educators can incorporate social media literacy into their curriculum to promote healthy social media use among young adults. Policymakers can use the findings to develop policies that protect young adults from the negative outcomes associated with social media use.

References:

  • Twenge, J. M., & Campbell, W. K. (2018). Associations between screen time and lower psychological well-being among children and adolescents: Evidence from a population-based study. Preventive Medicine Reports, 12, 271–283.
  • Primack, B. A., Shensa, A., Escobar-Viera, C. G., Barrett, E. L., Sidani, J. E., Colditz, J. B., … & James, A. E. (2017). Use of multiple social media platforms and symptoms of depression and anxiety: A nationally-representative study among US young adults. Computers in Human Behavior, 69, 1-9.
  • Van der Meer, T. G., & Verhoeven, J. W. (2017). Social media and its impact on academic performance of students. Journal of Information Technology Education: Research, 16, 383-398.

Appendix: The survey used in this study is provided below.

Social Media and Mental Health Survey

  • How often do you use social media per day?
      • Less than 30 minutes
      • 30 minutes to 1 hour
      • 1 to 2 hours
      • 2 to 4 hours
      • More than 4 hours
  • Which social media platforms do you use?
      • Other (please specify)
  • How often do you experience the following on social media?
      • Social comparison (comparing yourself to others)
      • Cyberbullying
      • Fear of missing out (FOMO)
  • Have you experienced any of the following mental health problems in the past month?
  • Do you think social media use has a positive or negative impact on your mental health?
      • Very positive
      • Somewhat positive
      • Somewhat negative
      • Very negative
  • In your opinion, which factors contribute to the negative impact of social media on mental health?
      • Social comparison
  • In your opinion, what interventions could be effective in reducing the negative impact of social media on mental health?
      • Education on healthy social media use
      • Counseling for mental health problems caused by social media
      • Social media detox programs
      • Regulation of social media use

Thank you for your participation!

Applications of Research Papers

Research papers have several applications in various fields, including:

  • Advancing knowledge: Research papers contribute to the advancement of knowledge by generating new insights, theories, and findings that can inform future research and practice. They help to answer important questions, clarify existing knowledge, and identify areas that require further investigation.
  • Informing policy: Research papers can inform policy decisions by providing evidence-based recommendations for policymakers. They can help to identify gaps in current policies, evaluate the effectiveness of interventions, and inform the development of new policies and regulations.
  • Improving practice: Research papers can improve practice by providing evidence-based guidance for professionals in various fields, including medicine, education, business, and psychology. They can inform the development of best practices, guidelines, and standards of care that can improve outcomes for individuals and organizations.
  • Educating students: Research papers are often used as teaching tools in universities and colleges to educate students about research methods, data analysis, and academic writing. They help students to develop critical thinking skills, research skills, and communication skills that are essential for success in many careers.
  • Fostering collaboration: Research papers can foster collaboration among researchers, practitioners, and policymakers by providing a platform for sharing knowledge and ideas. They can facilitate interdisciplinary collaborations and partnerships that can lead to innovative solutions to complex problems.

When to Write a Research Paper

Research papers are typically written when a person has completed a research project or when they have conducted a study and have obtained data or findings that they want to share with the academic or professional community. Research papers are usually written in academic settings, such as universities, but they can also be written in professional settings, such as research organizations, government agencies, or private companies.

Here are some common situations where a person might need to write a research paper:

  • For academic purposes: Students in universities and colleges are often required to write research papers as part of their coursework, particularly in the social sciences, natural sciences, and humanities. Writing research papers helps students to develop research skills, critical thinking skills, and academic writing skills.
  • For publication: Researchers often write research papers to publish their findings in academic journals or to present their work at academic conferences. Publishing research papers is an important way to disseminate research findings to the academic community and to establish oneself as an expert in a particular field.
  • To inform policy or practice: Researchers may write research papers to inform policy decisions or to improve practice in various fields. Research findings can be used to inform the development of policies, guidelines, and best practices that can improve outcomes for individuals and organizations.
  • To share new insights or ideas: Researchers may write research papers to share new insights or ideas with the academic or professional community. They may present new theories, propose new research methods, or challenge existing paradigms in their field.

Purpose of a Research Paper

The purpose of a research paper is to present the results of a study or investigation in a clear, concise, and structured manner. Research papers are written to communicate new knowledge, ideas, or findings to a specific audience, such as researchers, scholars, practitioners, or policymakers. The primary purposes of a research paper are:

  • To contribute to the body of knowledge: Research papers aim to add new knowledge or insights to a particular field or discipline. They do this by reporting the results of empirical studies, reviewing and synthesizing existing literature, proposing new theories, or providing new perspectives on a topic.
  • To inform or persuade: Research papers are written to inform or persuade the reader about a particular issue, topic, or phenomenon. They present evidence and arguments to support their claims and seek to persuade the reader of the validity of their findings or recommendations.
  • To advance the field: Research papers seek to advance the field or discipline by identifying gaps in knowledge, proposing new research questions or approaches, or challenging existing assumptions or paradigms. They aim to contribute to ongoing debates and discussions within a field and to stimulate further research and inquiry.
  • To demonstrate research skills: Research papers demonstrate the author’s research skills, including their ability to design and conduct a study, collect and analyze data, and interpret and communicate findings. They also demonstrate the author’s ability to critically evaluate existing literature, synthesize information from multiple sources, and write in a clear and structured manner.

Characteristics of Research Papers

Research papers have several characteristics that distinguish them from other forms of academic or professional writing. Here are some common characteristics of research papers:

  • Evidence-based: Research papers are based on evidence, usually empirical data collected through rigorous research methods such as experiments, surveys, observations, or interviews. They rely on objective data and facts to support their claims and conclusions.
  • Structured and organized: Research papers have a clear and logical structure, with sections such as introduction, literature review, methods, results, discussion, and conclusion. They are organized in a way that helps the reader to follow the argument and understand the findings.
  • Formal and objective: Research papers are written in a formal and objective tone, with an emphasis on clarity, precision, and accuracy. They avoid subjective language or personal opinions and instead rely on objective data and analysis to support their arguments.
  • Citations and references: Research papers include citations and references to acknowledge the sources of information and ideas used in the paper. They use a specific citation style, such as APA, MLA, or Chicago, to ensure consistency and accuracy.
  • Peer-reviewed: Research papers are often peer-reviewed, which means they are evaluated by other experts in the field before they are published. Peer review is intended to check that the research is of high quality, meets ethical standards, and contributes to the advancement of knowledge in the field.
  • Objective and unbiased: Research papers strive to be objective and unbiased in their presentation of the findings. They avoid personal biases or preconceptions and instead rely on the data and analysis to draw conclusions.

Advantages of Research Papers

Research papers have many advantages, both for the individual researcher and for the broader academic and professional community. Here are some advantages of research papers:

  • Contribution to knowledge: Research papers contribute to the body of knowledge in a particular field or discipline. They add new information, insights, and perspectives to existing literature and help advance the understanding of a particular phenomenon or issue.
  • Opportunity for intellectual growth: Research papers provide an opportunity for intellectual growth for the researcher. They require critical thinking, problem-solving, and creativity, which can help develop the researcher’s skills and knowledge.
  • Career advancement: Research papers can help advance the researcher’s career by demonstrating their expertise and contributions to the field. They can also lead to new research opportunities, collaborations, and funding.
  • Academic recognition: Research papers can lead to academic recognition in the form of awards, grants, or invitations to speak at conferences or events. They can also contribute to the researcher’s reputation and standing in the field.
  • Impact on policy and practice: Research papers can have a significant impact on policy and practice. They can inform policy decisions, guide practice, and lead to changes in laws, regulations, or procedures.
  • Advancement of society: Research papers can contribute to the advancement of society by addressing important issues, identifying solutions to problems, and promoting social justice and equality.

Limitations of Research Papers

Research papers also have some limitations that should be considered when interpreting their findings or implications. Here are some common limitations of research papers:

  • Limited generalizability: Research findings may not be generalizable to other populations, settings, or contexts. Studies often use specific samples or conditions that may not reflect the broader population or real-world situations.
  • Potential for bias: Research papers may be biased due to factors such as sample selection, measurement errors, or researcher biases. It is important to evaluate the quality of the research design and methods used to judge whether the findings are valid and reliable.
  • Ethical concerns: Research papers may raise ethical concerns, such as the use of vulnerable populations or invasive procedures. Researchers must adhere to ethical guidelines and obtain informed consent from participants to ensure that the research is conducted in a responsible and respectful manner.
  • Limitations of methodology: Research papers may be limited by the methodology used to collect and analyze data. For example, certain research methods may not capture the complexity or nuance of a particular phenomenon, or may not be appropriate for certain research questions.
  • Publication bias: Research papers may be subject to publication bias, where positive or significant findings are more likely to be published than negative or non-significant findings. This can skew the overall body of published evidence in a particular area of research.
  • Time and resource constraints: Research papers may be limited by time and resource constraints, which can affect the quality and scope of the research. Researchers may not have access to certain data or resources, or may be unable to conduct long-term studies due to practical limitations.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer


Editorial: Capturing talk: the institutional practices surrounding the transcription of spoken language

  • 1 Research Hub for Language in Forensic Evidence, The University of Melbourne, Parkville, VIC, Australia
  • 2 Aston Institute for Forensic Linguistics, Aston University, Birmingham, United Kingdom
  • 3 Department of Communication and Media, School of Social Science and Humanities, Loughborough University, Loughborough, United Kingdom
  • 4 Netherlands Institute for the Study of Crime and Law Enforcement (NSCR), Amsterdam, Netherlands

Editorial on the Research Topic Capturing talk: the institutional practices surrounding the transcription of spoken language

Transcripts are a ubiquitous feature of virtually all modern institutions, many of which would be unable to function without them. Nevertheless, transcription remains an under-researched subject—a situation that Capturing talk: the institutional practices surrounding the transcription of spoken language seeks to remedy.

The initial aim of this Research Topic was to expose and examine under-appreciated features of “entextualization” (the process of representing spoken language as written text). One of these features is the fact that a transcript can only ever be a representation of speech, not a copy, and thus can never represent speech exactly. Another feature, well-articulated by Sarangi (1998), is the unequal power over the process of transcription exercised by, on the one hand, the speakers whose voices are represented, and, on the other, by those controlling the transcription process.

Where Sarangi's interest was mainly in health and social services institutions, the present Research Topic has a leaning toward legal institutions, where, arguably, these power inequalities are even more starkly contrasted, as demonstrated by the territory-defining volume (Heffer et al., 2013).

Four of the papers in this Research Topic deal with police interviews, providing insight into differing practices across jurisdictions and type of interview (e.g., whether with witnesses or suspects). Several papers examine the practice of converting an interview into a “statement,” written up by the officers who conduct the interviews. Beginning with interviews with witnesses in England and Wales (E&W), Milne et al. analyze a sample of such statements against transcripts produced by the researchers from an audio recording. The omissions, additions, distortions, and other errors in the police versions give cause for deep concern.

Komter recounts an extended study analyzing the creation of records of interviews with suspects in the Netherlands, again contrasting transcripts prepared by police interviewers with the author's own transcripts prepared from audio recordings. Here too, many concerning limitations of the police transcripts are observed and analyzed. However, while her own transcripts are far more detailed, Komter acknowledges that she too is necessarily selective in what she chooses to represent, guided by the evolving research questions she seeks to investigate.

One practice Komter discusses is that of police records presenting an interview as a monolog, in the voice of the interviewee, rather than as the question-and-answer dialogue it actually was. This practice is also investigated by Eerland and van Charldorp, again focusing on the Dutch context. These authors study how readers of the statements were influenced by three different styles of reporting (monolog, dialogue and narrative), with the troubling finding that the style of reporting affected perceptions of the statements' accuracy and comprehensibility.

In many jurisdictions, police interviews with suspects are routinely audio- or video-recorded. However, this does not signal the end of problems with the representation of these high-stakes interactions. The last of our interview papers is Haworth et al., which summarizes the key findings to date of an ongoing study of the transcription of electronic records of interviews with suspects in E&W. It demonstrates a range of problems with official police transcripts even when these ostensibly capture the dialogue “verbatim,” and proposes that consistency, accuracy, and neutrality are the foundational features that should underpin any police interview transcript.

A second group of papers studies transcription in non-legal institutional settings. Holder et al. delve into two very large and highly structured organizations with serious security needs: NASA and the US Military. Both make extensive use of audio and video recordings capturing employees as they work, with transcripts produced either routinely or on demand. The authors look into the two organizations' use of these transcripts, again comparing the official transcripts with their own transcripts of selected sections, using conversation analysis (CA) conventions.

Park and Hepburn also examine CA-style transcripts. Taking as an example Rachel Mitchell's interview of US Supreme Court nominee Brett Kavanaugh about his alleged historical sexual misconduct, these authors compare the information retrievable from a richly detailed Jeffersonian transcript with an orthographic transcript that “wipes out” or “skates over” crucial aspects of speech used by speakers and listeners in constructing the message expressed by the speech.

Another institutional use of transcripts covered in Capturing Talk concerns workers on the assembly line of a small factory in Sweden. Carlsson and Harari report an observation-and-interview study of the instruction manuals created by the workers. While they find much to commend in the retention of power by the creators and users of the manuals, the authors observe room for improvement in the “information design” of the texts, recommending that consultation of linguistics experts could offer benefits.

Voutilainen showcases the high quality of transcripts produced as an official record of the complex and challenging multicultural discussions of wide-ranging topics in the Finnish parliament. His account demonstrates how much thought, research and work goes into managing all the factors that need to be considered to create transcripts of this standard.

In a return to the legal setting, a further group of papers examines transcripts of forensic audio, i.e., recordings of speech used as evidence in criminal trials. These are often of very poor quality, meaning that the transcript is intended not as a record of what was said, but as assistance to the court in determining what was said. Internationally, it is common for such transcripts to be provided by police investigating the case. While the courts recognize that police transcripts might contain errors, they rely on judges and/or juries being able to check the transcript against the audio. This ignores well-established research findings that the very act of checking a transcript can cause the listener to hear in line with the transcript, even if it is demonstrably false. For this reason, linguists sometimes recommend that, to ensure accuracy, transcripts should be produced by independent experts in transcription.

However, mere independence may not be enough, and Love and Wright point out some important caveats around this recommendation. They had eight trained transcribers produce transcripts of poor-quality forensic-like audio—finding huge divergences in the content of the transcripts (< 3% of conversational turns were transcribed consistently by all eight participants). This demonstrates that transcribing poor-quality forensic audio needs not just expertise in linguistics, but a managed, evidence-based method.
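The editorial does not spell out how Love and Wright decided that a turn was transcribed "consistently". The sketch below assumes the strictest reading: a turn counts only if every transcriber's version is identical after lower-casing and removing punctuation. The helper names and the three-transcriber data are hypothetical.

```python
import string


def normalize(text: str) -> str:
    """Lower-case, drop ASCII punctuation, and collapse whitespace (an assumed normalization)."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


def unanimous_share(turns: list[list[str]]) -> float:
    """Fraction of turns whose versions are identical across all transcribers after normalization."""
    if not turns:
        return 0.0
    agreed = sum(1 for versions in turns if len({normalize(v) for v in versions}) == 1)
    return agreed / len(turns)


# Hypothetical data: each inner list holds three transcribers' versions of one turn.
example_turns = [
    ["I went home.", "i went home", "I went home"],
    ["I went home.", "I want a phone", "I went home"],
    ["Get the bag!", "get the bag", "Get the bag"],
]
share = unanimous_share(example_turns)
print(f"{share:.0%} of turns transcribed consistently")
```

Because every additional transcriber is one more chance to disagree, adding transcribers to the same set of turns can only keep this share the same or lower it.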

Recently, a common response to any discussion of the difficulty of transcribing poor-quality audio has been: “Why not let AI do it?” Loakes investigates this suggestion, finding that, while modern automatic speech recognition (ASR) systems are extremely efficient at transcribing good-quality audio, their performance on poor-quality forensic-like audio is low. Even the best-performing system, Whisper, scored only around 50% accuracy, with others far lower.
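The editorial does not say how Loakes measured accuracy. A common metric for ASR output is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. The sketch below computes WER with a standard edit-distance table and assumes accuracy = 1 − WER; Loakes may have used a different measure, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is row i-1 of the edit-distance table: prev[j] = distance(ref[:i-1], hyp[:j]).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


wer = word_error_rate("put the bag in the car", "put a bag in car")
print(f"WER = {wer:.2f}, accuracy = {1 - wer:.2f}")
```

Because inserted words count as errors, WER can exceed 1, so 1 − WER can be negative for very poor output.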

Harrington also observed low scores for ASR transcripts of poor-quality forensic-like audio. Bridging two of the main areas considered in this Research Topic, she also trialed ASR on recordings of police interviews. The resulting transcripts, though not problem-free, score far higher than those of covert recordings, with errors easier to identify. Harrington makes innovative recommendations for how ASR could be used as a “first draft” interview transcript, to be refined via human transcribers.

Two papers consider the transcription and translation of forensic audio featuring languages other than English. Gilbert and Heydon look at translated transcripts of Vietnamese recordings used as evidence in a drug-related trial. They point out significant errors in the translations, but note that, unless the defense goes to the expense of hiring their own translator/interpreter, such errors are unlikely to be detected—and suggest that audio in languages other than English is often admitted with inadequately tested translations.

Lai presents results of a large national survey of the practices and concerns of translators and interpreters who undertake forensic casework across a wide range of languages. Here, too, results indicate a number of important deficiencies in current practice for translating forensic audio featuring languages other than English—and Lai makes valuable recommendations for improvement.

Finally, taking an authoritative overview of the key issues relevant to this Research Topic, Fraser provides a systematic review of interdisciplinary research on transcripts and transcription, and sets out a series of interacting factors that are known to affect a transcript's reliability. Using examples from a range of legal and academic situations, Fraser argues that, to ensure a transcript is suitable for its intended purpose, it is essential that all the factors be appropriately managed.

Taken as a whole, Capturing Talk amplifies two observations made in both Sarangi (1998) and Heffer et al. (2013), which, though not the exclusive focus of any individual paper, are highlighted throughout the Research Topic. First, the strong role that context inevitably plays in the interpretation of a transcript implies that “recontextualization” (using a transcript in a context other than the one it was created in) is likely to change its interpretation. Second, even the most expert linguistic analysis of transcripts produced by others is not itself a neutral or “objective” activity. However, this does not mean that such analysis must be “subjective” in any limiting sense. Rather, it indicates a need for transcripts to be produced and analyzed by independent, context-aware experts able to devote appropriate attention to all relevant factors.

Most importantly, all contributions to Capturing Talk emphasize that transcription is far from the simple transduction of “sounds” into letters that it is often assumed to be by those who have not studied its intricacies. It is a highly complex and fascinating topic, worthy of taking its place as a dedicated field of research in its own right, particularly in view of the widespread misconceptions and unhelpful language ideologies that still beset the institutional practices surrounding the transcription of spoken language.

Author contributions

HF: Writing – original draft, Writing – review & editing. KH: Writing – review & editing. FD: Writing – review & editing. DL: Writing – review & editing. ER: Writing – review & editing. MK: Writing – review & editing.

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Heffer, C., Rock, F., and Conley, J. (2013). Legal-Lay Communication: Textual Travels in the Law. Oxford: Oxford University Press.

Sarangi, S. (1998). Rethinking recontextualization in professional discourse studies: an epilogue. Text Talk 18, 301–318. doi: 10.1515/text.1.1998.18.2.301

Keywords: transcription, misconceptions about language and linguistics, language ideologies, forensic linguistics, forensic transcription, police interviews and interrogations, entextualization

Citation: Fraser H, Haworth K, Deamer F, Loakes D, Richardson E and Komter M (2024) Editorial: Capturing talk: the institutional practices surrounding the transcription of spoken language. Front. Commun. 9:1417465. doi: 10.3389/fcomm.2024.1417465

Received: 15 April 2024; Accepted: 22 April 2024; Published: 08 May 2024.

Edited and reviewed by: Mila Vulchanova, NTNU, Norway

Copyright © 2024 Fraser, Haworth, Deamer, Loakes, Richardson and Komter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Helen Fraser, helen.fraser@unimelb.edu.au

This article is part of the Research Topic

Capturing Talk: The Institutional Practices Surrounding the Transcription of Spoken Language

Title: How do languages influence each other? Studying cross-lingual data sharing during LLM fine-tuning

Abstract: Multilingual large language models (MLLMs) are jointly trained on data from many different languages such that representation of individual languages can benefit from other languages' data. Impressive performance on zero-shot cross-lingual transfer shows that these models are capable of exploiting data from other languages. Yet, it remains unclear to what extent, and under which conditions, languages rely on each other's data. In this study, we use TracIn (Pruthi et al., 2020), a training data attribution (TDA) method, to retrieve the most influential training samples seen during multilingual fine-tuning for a particular test language. This allows us to analyse cross-lingual sharing mechanisms of MLLMs from a new perspective. While previous work studied cross-lingual sharing at the level of model parameters, we present the first approach to study cross-lingual sharing at the data level. We find that MLLMs rely on data from multiple languages from the early stages of fine-tuning and that this reliance gradually increases as fine-tuning progresses. We further study how different fine-tuning languages influence model performance on a given test language and find that they can both reinforce and complement the knowledge acquired from data of the test language itself.
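TracIn (Pruthi et al., 2020) scores how much a training example influenced a test prediction by summing, over saved training checkpoints, the learning rate times the dot product of the loss gradients for the two examples. The sketch below applies that formula to a hypothetical toy linear model with squared loss and hand-picked checkpoints, so the gradients can be written analytically. The study itself applies TracIn to fine-tuned multilingual LLMs, where gradients come from backpropagation, and its exact configuration is not described here; the language tags and data are made up.

```python
from typing import Sequence

Vector = list[float]
Example = tuple[Vector, float]  # (features, target)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_loss_grad(w: Vector, example: Example) -> Vector:
    """Gradient of 0.5 * (w·x - y)^2 with respect to w, which is (w·x - y) * x."""
    x, y = example
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def tracin_score(checkpoints: list[tuple[Vector, float]],
                 train_example: Example, test_example: Example) -> float:
    """TracIn-CP: sum over checkpoints of lr_i * (grad on train example) · (grad on test example)."""
    total = 0.0
    for weights, lr in checkpoints:
        total += lr * dot(squared_loss_grad(weights, train_example),
                          squared_loss_grad(weights, test_example))
    return total


# Hypothetical fine-tuning run: (weights, learning rate) saved at two checkpoints.
checkpoints = [([0.0, 0.0], 0.1), ([0.5, -0.5], 0.1)]
# Hypothetical training samples, each tagged with the language it comes from.
train_set = [
    ("de", ([0.0, 1.0], -1.0)),
    ("es", ([0.9, 0.1], 1.0)),
    ("en", ([1.0, 0.0], 1.0)),
]
test_example = ([1.0, 0.0], 1.0)

scores = {lang: tracin_score(checkpoints, ex, test_example) for lang, ex in train_set}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # most to least influential training sample, by language tag
```

A positive score marks a training sample whose gradient steps also reduced the test loss. Ranking training samples this way and looking at which languages they come from mirrors, at toy scale, the kind of data-level analysis the abstract describes.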


Study of automatic text summarization approaches in different languages

  • Published: 12 February 2021
  • Volume 54, pages 5897–5929 (2021)

  • Yogesh Kumar (ORCID: orcid.org/0000-0002-2879-0441)
  • Komalpreet Kaur
  • Sukhpreet Kaur

Nowadays, a huge amount of information is available from both online and offline sources. For a single topic, hundreds of articles may be available, containing a vast amount of information. Manually extracting the useful information from them is a difficult task. To solve this problem, automatic text summarization systems have been developed. Text summarization is the process of extracting useful information from large documents and compressing it into a short summary that preserves all the important content. This survey paper gives a broad overview of work in the field of automatic text summarization in different languages using various text summarization approaches. Its main focus is the research done on text summarization for Indian languages such as Hindi, Punjabi, Bengali, Malayalam, Kannada, Tamil, Marathi, Assamese, Konkani, Nepali, Odia, Sanskrit, Sindhi, Telugu and Gujarati, and for foreign languages such as Arabic, Chinese, Greek, Persian, Turkish, Spanish, Czech, Rome, Urdu, Bahasa Indonesia and many more. The paper aims to provide knowledge and useful support to scientists new to this research area by giving a concise view of the various feature extraction methods and classification techniques required for different types of text summarization approaches applied to both Indian and non-Indian languages.
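The survey reviews many approaches. As one minimal illustration of extractive summarization (selecting existing sentences rather than generating new text), the sketch below scores each sentence by the average document frequency of its non-stopword words and keeps the top-scoring sentences in their original order. This simple word-frequency heuristic is an assumption chosen for brevity, not a method taken from the survey; the `summarize` function, its parameters, and the example text are hypothetical.

```python
import re
from collections import Counter


def summarize(text: str, num_sentences: int = 2,
              stopwords: frozenset[str] = frozenset()) -> str:
    """Frequency-based extractive summary: keep the highest-scoring sentences in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide frequency of each non-stopword word.
    freq = Counter(w for w in re.findall(r"\w+", text.lower()) if w not in stopwords)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"\w+", sentence.lower()) if w not in stopwords]
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the top-scoring sentence indices, then restore document order for readability.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                 reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


text = ("Summarization shortens long documents. "
        "Extractive summarization selects sentences from documents. "
        "The weather was pleasant.")
summary = summarize(text, num_sentences=2, stopwords=frozenset({"the", "was", "from"}))
print(summary)
```

Splitting sentences on `.`, `!` and `?` and tokenizing with `\w+` are simplifications: Hindi, Bengali and several other languages in the survey end sentences with the danda (।), and Python's `\w` may split Indic words at combining vowel signs, so a real system for those languages would need language-specific sentence splitters and tokenizers.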


Author information

Authors and affiliations

Department of Computer Science & Engineering, Chandigarh Group of Colleges, Landran, Mohali, India

Yogesh Kumar

Department of Electronic & Communications, Punjabi University, Patiala, India

Komalpreet Kaur

Department of Computer Science & Engineering, Chandigarh Engineering College, Landran, Mohali, India

Sukhpreet Kaur


Corresponding author

Correspondence to Yogesh Kumar.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Kumar, Y., Kaur, K. & Kaur, S. Study of automatic text summarization approaches in different languages. Artif Intell Rev 54, 5897–5929 (2021). https://doi.org/10.1007/s10462-021-09964-4


Accepted: 21 January 2021

Published: 12 February 2021

Issue Date: December 2021

DOI: https://doi.org/10.1007/s10462-021-09964-4


  • Text summarization
  • Machine learning
  • Feature extraction
  • Classifications
  • Natural language processing

How Much Research Is Being Written by Large Language Models?

New studies show a marked spike in LLM usage in academia, especially in computer science. What does this mean for researchers and reviewers?


In March of this year, a tweet about an academic paper went viral for all the wrong reasons. The introduction section of the paper, published in Elsevier’s Surfaces and Interfaces, began with this line: “Certainly, here is a possible introduction for your topic.”

Look familiar? 

It should, if you are a user of ChatGPT and have applied its talents for the purpose of content generation. LLMs are being increasingly used to assist with writing tasks, but examples like this in academia are largely anecdotal and had not been quantified before now. 

“While this is an egregious example,” says James Zou, associate professor of biomedical data science and, by courtesy, of computer science and of electrical engineering at Stanford, “in many cases, it’s less obvious, and that’s why we need to develop more granular and robust statistical methods to estimate the frequency and magnitude of LLM usage. At this particular moment, people want to know what content around us is written by AI. This is especially important in the context of research, for the papers we author and read and the reviews we get on our papers. That’s why we wanted to study how much of those have been written with the help of AI.”

In two papers looking at LLM use in scientific publishing, Zou and his team* found that 17.5% of computer science papers and 16.9% of peer review text had at least some content drafted by AI. The paper on LLM usage in peer reviews will be presented at the International Conference on Machine Learning.
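Turning word statistics into a corpus-level figure such as 17.5% requires an estimator. The sketch below shows one standard way to frame it, offered as an illustration rather than as the team's method: treat the corpus as a mixture in which a fraction α of documents follow an LLM word-usage distribution and the rest follow a human one, then choose the α that maximizes the likelihood. Assumptions: each tracked word occurs independently with a known per-document rate under each distribution, both rate tables cover the same words with rates strictly between 0 and 1, and α is found by a simple grid search. The function names, rates, and documents are hypothetical toy values.

```python
import math


def doc_log_prob(words, rates):
    """Log-probability of which tracked words occur in one document,
    treating each tracked word as an independent Bernoulli(rate) event."""
    present = set(words)
    return sum(math.log(p) if w in present else math.log1p(-p)
               for w, p in rates.items())


def estimate_ai_fraction(docs, human_rates, ai_rates, steps=1000):
    """Grid-search maximum-likelihood estimate of alpha in the mixture
    (1 - alpha) * P_human + alpha * P_ai over a list of tokenized documents."""
    scaled = []
    for words in docs:
        a = doc_log_prob(words, human_rates)
        b = doc_log_prob(words, ai_rates)
        # Dividing both likelihoods by the same per-document constant avoids
        # underflow and does not change which alpha maximizes the total.
        m = max(a, b)
        scaled.append((math.exp(a - m), math.exp(b - m)))
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        alpha = i / steps
        ll = sum(math.log(max((1 - alpha) * p + alpha * q, 1e-300))
                 for p, q in scaled)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


# Hypothetical rates: each marker word appears in 5% of human reviews
# and 60% of LLM-written ones.
human = {"commendable": 0.05, "meticulous": 0.05}
ai = {"commendable": 0.6, "meticulous": 0.6}
plain = [["the", "paper", "is", "clear"]] * 90
marked = [["a", "commendable", "and", "meticulous", "study"]] * 10
print(estimate_ai_fraction(plain + marked, human, ai))
# → 0.115
```

A realistic estimator would track a much larger vocabulary, fit the two rate tables on text known to be human-written and LLM-written, and would likely replace the grid search with a numerical optimizer.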

Read “Mapping the Increasing Use of LLMs in Scientific Papers” and “Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews.”

Here Zou discusses the findings and implications of this work, which was supported through a Stanford HAI Hoffman Yee Research Grant.

How did you determine whether AI wrote sections of a paper or a review?

We first saw that there are these specific words – like commendable, innovative, meticulous, pivotal, intricate, realm, and showcasing – whose frequency in reviews sharply spiked, coinciding with the release of ChatGPT. Additionally, we know that these words are much more likely to be used by LLMs than by humans. The reason we know this is that we actually did an experiment where we took many papers, used LLMs to write reviews of them, and compared those reviews to reviews written by human reviewers on the same papers. Then we quantified which words are more likely to be used by LLMs vs. humans, and those are exactly the words listed. The fact that they are more likely to be used by an LLM and that they have also seen a sharp spike coinciding with the release of LLMs is strong evidence.
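The comparison Zou describes, generating reviews with an LLM for the same papers and checking which words it uses far more often than human reviewers do, can be sketched as a document-frequency ratio. This is a simplified illustration, not the team's pipeline: the tokenizer, the smoothing constant, the 2x threshold, and the function names are assumptions, and the example reviews are invented toy strings.

```python
import re
from collections import Counter


def doc_rates(docs: list[str]) -> dict[str, float]:
    """Fraction of documents in which each lowercase word appears at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(re.findall(r"[a-z]+", doc.lower())))
    return {w: c / len(docs) for w, c in counts.items()}


def llm_skewed_words(llm_docs, human_docs, min_ratio=2.0, smoothing=1e-3):
    """Words whose smoothed document rate is at least min_ratio times higher
    in LLM-written text than in human-written text, strongest first."""
    llm, human = doc_rates(llm_docs), doc_rates(human_docs)
    ratios = {w: (p + smoothing) / (human.get(w, 0.0) + smoothing)
              for w, p in llm.items()}
    flagged = [w for w, r in ratios.items() if r >= min_ratio]
    # Sort by descending ratio, then alphabetically so ties are deterministic.
    return sorted(flagged, key=lambda w: (-ratios[w], w))


llm_reviews = ["a commendable and meticulous study",
               "a commendable and pivotal analysis"]
human_reviews = ["a good and careful study",
                 "a solid and clear analysis"]
print(llm_skewed_words(llm_reviews, human_reviews))
# → ['commendable', 'meticulous', 'pivotal']
```

The smoothing term keeps words that never appear in the human set from causing a division by zero; on real data one would also require a minimum count so that rare words do not dominate the list.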

Charts showing significant shift in the frequency of certain adjectives in research journals.

Some journals permit the use of LLMs in academic writing, as long as it’s noted, while others, including Science and the ICML conference, prohibit it. How are the ethics perceived in academia?

This is an important and timely topic because the policies of various journals are changing very quickly. For example,  Science said in the beginning that they would not allow authors to use language models in their submissions, but they later changed their policy and said that people could use language models, but authors have to explicitly note where the language model is being used. All the journals are struggling with how to define this and what’s the right way going forward.

You observed an increase in usage of LLMs in academic writing, particularly in computer science papers (up to 17.5%). Math and Nature family papers, meanwhile, used AI text about 6.3% of the time. What do you think accounts for the discrepancy between these disciplines?

Artificial intelligence and computer science disciplines have seen an explosion in the number of papers submitted to conferences like ICLR and NeurIPS. And I think that’s really caused a strong burden, in many ways, to reviewers and to authors. So now it’s increasingly difficult to find qualified reviewers who have time to review all these papers. And some authors may feel more competition that they need to keep up and keep writing more and faster. 

You analyzed close to a million papers on arXiv, bioRxiv, and Nature from January 2020 to February 2024. Do any of these journals include humanities papers or anything in the social sciences?

We mostly wanted to focus more on CS and engineering and biomedical areas and interdisciplinary areas, like Nature family journals, which also publish some social science papers. Availability mattered in this case. So, it’s relatively easy for us to get data from arXiv, bioRxiv, and Nature. A lot of AI conferences also make reviews publicly available. That’s not the case for humanities journals.

Did any results surprise you?

A few months after ChatGPT’s launch, we started to see a rapid, linear increase in the usage pattern in academic writing. This tells us how quickly these LLM technologies diffuse into the community and become adopted by researchers. The most surprising finding is the magnitude and speed of the increase in language model usage. Nearly a fifth of papers and peer review text use LLM modification. We also found that peer reviews submitted closer to the deadline and those less likely to engage with author rebuttal were more likely to use LLMs. 

This suggests a couple of things. Perhaps some of these reviewers are not as engaged with reviewing these papers, and that’s why they are offloading some of the work to AI to help. This could be problematic if reviewers are not fully involved. As one of the pillars of the scientific process, it is still necessary to have human experts providing objective and rigorous evaluations. If this is being diluted, that’s not great for the scientific community.

What do your findings mean for the broader research community?

LLMs are transforming how we do research. It’s clear from our work that many papers we read are written with the help of LLMs. There needs to be more transparency, and people should state explicitly how LLMs are used and if they are used substantially. I don’t think it’s always a bad thing for people to use LLMs. In many areas, this can be very useful. For someone who is not a native English speaker, having the model polish their writing can be helpful. There are constructive ways for people to use LLMs in the research process; for example, in earlier stages of their draft. You could get useful feedback from an LLM in real time instead of waiting weeks or months to get external feedback.

But I think it’s still very important for the human researchers to be accountable for everything that is submitted and presented. They should be able to say, “Yes, I will stand behind the statements that are written in this paper.”

*Collaborators include: Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D. Manning, Zachary Izzo, Lingjiao Chen, Haotian Ye, and Daniel A. McFarland.

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.

Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

Welcome to the Purdue Online Writing Lab


Welcome to the Purdue OWL

This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

The Online Writing Lab at Purdue University houses writing resources and instructional material, and we provide these as a free service of the Writing Lab at Purdue. Students, members of the community, and users worldwide will find information to assist with many writing projects. Teachers and trainers may use this material for in-class and out-of-class instruction.

The Purdue On-Campus Writing Lab and Purdue Online Writing Lab assist clients in their development as writers—no matter what their skill level—with on-campus consultations, online participation, and community engagement. The Purdue Writing Lab serves the Purdue, West Lafayette, campus and coordinates with local literacy initiatives. The Purdue OWL offers global support through online reference materials and services.

A Message From the Assistant Director of Content Development 

The Purdue OWL® is committed to supporting students, instructors, and writers by offering a wide range of resources that are developed and revised with them in mind. To do this, the OWL team is always exploring possibilities for a better design, allowing accessibility and user experience to guide our process. As the OWL undergoes some changes, we welcome your feedback and suggestions by email at any time.

Please don't hesitate to contact us via our contact page  if you have any questions or comments.

All the best,


Research in Different Languages. Learn How to Say and Translate

Research in Different Languages: Below are many ways to say "research" in different languages. This page features translations of the word "research" into over 100 other languages. We also invite you to listen to audio pronunciations in more than 40 languages, so you can learn how to pronounce and read "research".

Saying Research in European Languages

Saying Research in Asian Languages

Saying Research in Middle-Eastern Languages

Saying Research in African Languages

Saying Research in Austronesian Languages

Saying Research in Other Foreign Languages


  • research and development
  • research institute
  • research paper

Graduate College

Science of language.

Ashby Martin, a third-year doctoral student in the neuroscience program, didn’t always want to study the brain. Initially, he wanted to be a librarian. At a young age, he memorized his library card number and looked up to the librarians. “My favorite place to go was the library,” he said. “Librarians get to give people knowledge and resources all for free. They give everyone the same access.”  


Buried in books, Martin found himself wanting to investigate things that were unknown; things that hadn’t been written down yet or even discovered, especially about the brain.   

This inclination led Martin to pursue neuroscience. Martin received his bachelor’s in neuroscience and behavior at the University of Notre Dame before joining Iowa’s doctoral program. The faculty, the connection to the hospital, and the foundational research that originated on campus were driving factors in his decision to come here. Martin specifically recalled Iowa’s psychological studies on patient S.M. (“The Woman with No Fear”) and meeting program director Dr. Dan Tranel.  

“That’s the past. [Tranel] is part of the present, and I could be part of the future. My research could be part of that future,” he says.  

Third-year doctoral student Ashby Martin. Photo provided by Ashby Martin.

Connecting through language  

At Iowa, Martin studies developmental neurolinguistics, particularly in young children who are bilingual in Spanish and English. His focus is on “numbers as language,” and he uses neuroimaging to examine the neurological impact and visual representation of shifting between an individual’s multiple linguistic repertoires.

One in three children under the age of eight speaks two languages. In Iowa, there are robust multilingual communities including West Liberty, Columbus Junction, and Amish populations who speak Pennsylvania Dutch. As part of his studies, Martin has been able to connect with some of these multilingual communities in addition to participants in the Iowa City area.

“Visually you can see the learning happening,” he says. “You can connect with people and share something local Iowans actually have. You can share effects that you see in their brains.”

As someone who grew up speaking two languages, Martin has been able to use his Spanish to further connect with the children involved in his studies and with their parents. He noted that language barriers can limit a parent's involvement in their children’s activities, but being able to listen and respond to their questions in Spanish has bridged that gap.

Martin notes that connecting with the parents has been a positive byproduct of this research. “Now there’s a parent who is not only engaged in research but is engaged with their kid in a new way that they maybe didn’t have access to before,” he describes. 

Martin leads a psychology and neuroscience station at a STEM event with students from West Liberty High School. Photo provided by Ashby Martin.

Expanding access to science  

One of the largest components of Martin’s research is the community impact. He recalls a story from Tranel, who also graduated from Notre Dame, about the implications of a university-required swim test. Despite its positive intentions, the test drew lines between students who had financial access to a pool and those who didn’t, emphasizing several considerations for research.   

“What is a good purpose? What is good execution? What is the back end of something that you are doing now, and how does it affect the local community?” Martin asks.


Martin hopes that his work will shift people’s perspectives on language learning, especially the belief that one must start at an early age to become proficient in a new language. Instead, Martin’s research emphasizes that developing a dual representation in the brain requires practice.

Although he is only in the third year of his PhD, Martin hopes eventually to publish in Spanish as well. One of his favorite parts of his work is addressing the lack of Spanish-language representation in science by making material that is normally available only in English available in Spanish. For Martin, this allows more people to be involved.

With such a large emphasis on community engagement in his work, it’s no surprise that Martin can strike up a conversation with anyone over something as simple as the colors on a booth. For him, language is a common ground for developing connections with complete strangers.   

In the long term, Martin hopes to bring his work to countries outside the United States to see whether this dual representation presents in the same way across international multilingual populations. He describes this as seeing if it’s not just an “Iowa effect, but a human effect.”

 For now, his team is focusing on bringing their technology out of the lab and into homes where language flows freely.  



COMMENTS

  1. How do I read papers relevant to my research that are written in a

    Obviously, the best options are to get help from a friend who knows the language or to learn the language yourself. In the meantime, try this: download the paper as a PDF file. If the paper is older or consists of scanned images, run it through good Optical Character Recognition (OCR) software (Adobe Acrobat should work).

  2. Researching language and cognition in bilinguals

    Research article, first published online June 11, 2021. Bene Bassetti (https://orcid.org/0000-0003-4547-4085) and Luna Filipović (https://orcid.org/0000-0001-9460-9797). Volume 26, Issue 1. https://doi.org/10.1177/13670069211022860

  3. Research Guides: Languages: Find Articles in Other Languages

    Finding Articles in Other Languages. Finding articles in specific languages is very similar to finding articles in English, but with some key changes: Use search terms in your target language. For example, to find articles about climate change in Spanish, try calentamiento global, ensuciamiento de aire, or gases de invernadero.

  4. The changing role of non‐English papers in scholarly communication

    Non-English languages are widely used, but their roles in scholarly communication are relatively under-explored. By using Web of Science's Science Citation Index Expanded (SCIE, 1900-2015), Social Sciences Citation Index (SSCI, 1900-2015), and Arts and Humanities Citation Index (A&HCI, 1975-2015), this study probes the patterns and dynamics of non-English papers by year, citation index ...

  5. The ethics of publishing in two languages

    The value of the broader English-based scientific, technical and medical publishing market in 2017 was over US$25 billion while the number of active English peer-reviewed journals amounted to 33,100 in mid-2018 (vs. 9400 non-English journals) (Johnson et al. 2018 ).

  6. Publishing in English or another language: An inclusive study of

    A third group of studies has a slightly different focus and investigates the motivations of scholars to write in English rather than in their native language; in this type of research, there seems to be consensus that the prestige and international recognition of English-language journals is the primary motivation for non-native speakers to ...

  7. The Psychology of Communication: The Interplay Between Language and

    From this research, controversy emerged in terms of whether color naming was more universal, with reported differences being due to different color terminologies (e.g., how many color terms were available in a language), rather than to language terminology that altered actual color perception (Berlin & Kay, 1969). Historically, aspects of this ...

  8. Research on learning and teaching of languages other than English in

    Articles on language pedagogy, which are closely associated with studies on language learners and their language development, have addressed a variety of topics, such as reading, pronunciation, task motivation and engagement, story retelling, group cohesion, and classroom management.

  9. Research Across Languages: Translating Data

    Research Design, Sep 14, 2022. The September 2022 Methodspace focus is on culture and research. Sometimes crossing cultural boundaries means collecting data from people who speak a language different from your own. This excerpt from the SAGE Research Methods Foundations entry by Gawlewicz (2019 ...

  10. Cross-lingual citations in English papers: a large-scale ...

    In this paper, we present an analysis of cross-lingual citations based on over one million English papers, spanning three scientific disciplines and a time span of three decades. Our investigation covers differences between cited languages and disciplines, trends over time, and the usage characteristics as well as impact of cross-lingual citations.

  11. Changing perceptions of language in sociolinguistics

    Research regarding language in multimodal constructs sees language and non-language resources as different modes, or types of resources. These different modes have boundaries, and efforts are made ...

  12. Language differences in qualitative research: is meaning lost in

    Qualitative research seeks to study meanings in subjective experiences. The relation between subjective experience and language is a two-way process; language is used to express meaning, but the other way round, language influences how meaning is constructed. Giving words to experiences is a complicated process as the meaning of experiences is ...

  13. Why Science's Universal Language Is a Problem for Research

    In a 2005 case study of Korean scientists living in the U.K., the researcher Kumju Hwang, then at the University of Leeds, wrote: "The reason that [non-native English-speaking scientists] have ...

  14. Different languages, similar encoding efficiency: Comparable

    We show here, using quantitative methods on a large cross-linguistic corpus of 17 languages, that the coupling between language-level (information per syllable) and speaker-level (speech rate) properties results in languages encoding similar information rates (~39 bits/s) despite wide differences in each property individually: Languages are more...

  15. Languages of the World

    The 2010 U.S. Census Bureau report found 169 Native North American languages to be spoken in the home, with a total speaking population of less than half a million. By far the largest is Navajo, with nearly 170,000. Central and South America are home to a few much larger languages, spoken by several million.

  16. Languages

    Languages, Volume 8, Issue 1 (March 2023), 90 articles. Cover Story: Speech pauses between conversational turns are crucial for assessing conversation partners' cognitive states; for example, speakers making longer pauses are regarded as less willing to grant requests.

  17. How to cite sources written in a different language than English?

    Assume one wants to cite a paper written in a language different from that of the citing article (e.g., one's own article is written in English, but the source is in French). How should the citation be handled in such a case? Should one cite the original (French) title or a translated title?

  18. Research Paper

    Actual research papers may have different structures, contents, and formats depending on the field of study, research question, data collection and analysis methods, and other factors. ... and accuracy. They avoid subjective language or personal opinions and instead rely on objective data and analysis to support their arguments. Citations and ...

  19. (PDF) EXPLORING THE IMPACT OF CULTURE ON LANGUAGE ...

    The research aimed at Developing the Language Repertoire of Non-Native Arabic Novice Learners by Using Web Based Semantic Fields in Light of the European Framework of Reference for Language ...

  20. Research paper in Different Languages. Translate, Listen, and Learn

    Research paper in Different Languages: Please find below many ways to say research paper in different languages. This page features translation of the word "research paper" to over 100 other languages. We also invite you to listen to audio pronunciation in more than 40 languages, so you could learn how to pronounce research paper and how to read it.

  21. Can I add research works in another languages (not English)?

    Ali Alhayany, University of Diyala: Yes, you can. There are dozens of papers published in different languages such as Spanish, Portuguese, Russian, Romanian and Persian, but the ...

  22. Frontiers

    The initial aim of this Research Topic was to expose and examine under-appreciated features of 'entextualisation' (the process of representing spoken language as written text). One of these features is the fact that a transcript can only ever be a representation of speech, not a copy, and thus can never represent speech exactly. Another feature, well articulated by Sarangi (1998), is the ...

  23. [2305.13286] How do languages influence each other? Studying cross

    Multilingual large language models (MLLMs) are jointly trained on data from many different languages such that representation of individual languages can benefit from other languages' data. Impressive performance on zero-shot cross-lingual transfer shows that these models are capable of exploiting data from other languages. Yet, it remains unclear to what extent, and under which conditions ...

  24. Study of automatic text summarization approaches in different languages

    The Chinese language is an interesting area of research. Its writing system contains more than 50,000 characters, and each character is symbolic in nature (Movshovitz-Attias and Cohen 2013). In their research paper, Yu et al. described an approach that gathers original news from an online source and generates summary sentences from it ...

  25. How Much Research Is Being Written by Large Language Models?

    In two papers examining LLM use in scientific publishing, Zou and his team found that 17.5% of computer science papers and 16.9% of peer review text had at least some content drafted by AI. The paper on LLM usage in peer reviews will be presented at the International Conference on Machine Learning.

  26. Welcome to the Purdue Online Writing Lab

    The Online Writing Lab at Purdue University houses writing resources and instructional material, and we provide these as a free service of the Writing Lab at Purdue.

  27. Research in Different Languages. Translate, Listen, and Learn

    Research in Different Languages: Please find below many ways to say research in different languages. This page features translation of the word "research" to over 100 other languages. We also invite you to listen to audio pronunciation in more than 40 languages, so you could learn how to pronounce research and how to read it.

  28. The Science of Language: Neurolinguistic Research for Betterment of

    Martin hopes that his work will shift people's perspectives on language learning, especially modifying the mindset that one needs to achieve proficiency at an early age to learn a new language. Instead, Martin's research emphasizes that developing a dual representation in the brain requires practice.

  29. Language Evolution for Evading Social Media Regulation via LLM-based

    Studying the evolution of language in regulated social media contexts is of significant importance for ensuring freedom of speech, optimizing content moderation, and advancing linguistic research. This paper proposes a multi-agent simulation framework using Large Language Models (LLMs) to explore the evolution of user language in regulated ...

  30. Gender gaps in mathematics and language: The bias of competitive

    This research paper examines the extent to which high-stakes competitive tests affect gender gaps in standardized tests of Mathematics and Language. To this end, we estimate models that predict students' results in two national standardized tests: a test that does not affect students' educational trajectory, and a second test that determines access to the most selective universities in Chile.