Emerging Drivers
Big data
The speed and versatility of electronic communication, together with the growth and accessibility of digital media and tools, underlie the success of the five ESFRI Landmarks in the SCI domain. From large-scale databases to virtual museums, the tools that these developments have fostered are changing the way in which research is carried out. Scientific enquiry has moved away from the old model of theorise/hypothesise/collect data/test/refine/conclude and has become much more data-driven and, at the same time, more theory- and method-dependent. The ability to rapidly access large bodies of texts in different languages, to examine music archives, to compare three-dimensional images, and to analyse census and survey data from around the world provides research possibilities that were inconceivable 20 years ago and redefines Social Sciences and Humanities research.
The term Big Data came into use to describe assemblages of data – data files, datasets, databases or data streams – that, in terms of their volume, their variety, and the velocity of their creation, pose severe challenges for many conventional analytical and computational methods in SSH. Such data may be generated by machines through the operation of sensing and imaging devices – e.g. Radio-Frequency Identifiers, imaging equipment; by robotic analysis – e.g. genome-wide scans; by social media interactions – e.g. Twitter feeds; by mass recordings on magnetic video tape – e.g. video cassettes from the last century's art projects; or by the recording of administrative processes – e.g. hospital records, tax and benefit claims. Digital content grew to over 2.8 Zettabytes (ZB, 10^21 bytes) in 2012 and to 8.5 ZB by 2015 (The Forum for Europe's Language Technology Industry, https://www.lt-innovate.org/sites/default/files/LTIpresentation_European_Data_Forum100413_0.pdf). Big Data technologies, tools, and services that turn this information overload into information gains are the next opportunity for competitive advantage, and Language Technology (LT) is a core Big Data technology. Growth in the volume and variety of data is mostly due to the accumulation of unstructured text data; in fact, up to 80% of all data is unstructured text data (http://breakthroughanalysis.com/2008/08/01/unstructured-data-and-the-80-percentrule/). Moreover, the translation technology segment will continue to dominate the European LT market. RIs in LT are indispensable in breaking new ground. A common characteristic of Big Data in SSH is that they have significant research value in terms of the information they contain, either in their own right or when linked to other data sources. They can, for example, be used to extract information about preferences or undiscovered relations between people, and therefore provide important snapshots of human activities and orientations. When data are collected over time, such collections will also contain information about how culture and society develop.
While data types are many and varied, their value for research relates to the depth of their content, the extent of their coverage and the possibility of linking data from different sources, which in turn is a function of the processes by which such data are generated. For example, supermarket store card data are derived from specific and self-selecting customers who shop at particular stores and feel motivated to use a store card to gain a loyalty bonus. With data from millions of shoppers, in particular when linked to social surveys or administrative sources, the information generated can be used to explore dietary patterns and to relate this information to geographical indicators of social deprivation. Data generated by social media interactions can be used to gauge the mood of users and their political affiliations, or to document popular interpretations of significant events – e.g. migration, riots, and virus outbreaks. Biosocial data, such as a genome-wide scan linked to longitudinal life course survey data, represent a special form of Big Data, with the potential to demonstrate the links between our health, well-being and lifestyles. These data are evidence bases as well as indicators of the effects of public policies.
Before the research value of Big Data for SSH can be realised, three important conditions must be met. First, the data must be accessible for research purposes, often through their availability, digitalisation and normalisation; second, the best possible metadata and methods need to be used to extract and interpret the information; and third, there should be clarity about how the data have been generated. While the first condition seems obvious to researchers, data holders may place restrictive conditions on research when individual-related data or commercial interests are involved. In addition, linking data from different sources substantially increases their scientific potential but raises many practical issues: technical issues connected to data integration, and legal issues related to data protection.
New means for communicating and disseminating research
The mechanism for dissemination of research results emerges as one of the most important predictors of extra-academic impact. Open Access has gained momentum with the involvement of governments from different countries and the support of research funding agencies in creating a strong Open Access landscape in Europe. Open Access is just one component of Open Science – the movement to give access to data, research and publications and to open up the whole research cycle for participation and collaboration. Initiatives such as the European Open Science Cloud (EOSC) (European Open Science Cloud, https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud), or the Open Access mandate that goes along with the Horizon 2020 Framework (H2020 Programme Guidelines to the Rules on Open Access to Scientific Publications and Open Access to Research Data in Horizon 2020, http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf), are strong messages to the whole scientific community. Yet the development of Open Access for publications in SSH seems to lag behind other scientific disciplines. For example, the Directory of Open Access Journals (https://doaj.org) indicates that only 41% of the journals – representing 23% of the articles – belong to the SSH area. One reason for this divergence could be that a large part of the scientific production in SSH disciplines is published in books rather than journals. Even if quantitative social research is often coming closer to other sciences in terms of methodological approaches, books still play an important role in the dissemination of knowledge for those disciplines. Open Access for academic books in SSH develops under different conditions from those we know for articles in the natural or hard sciences. The challenges are different in technical and economic terms as well as in usage, and there are many initiatives in different European countries, by publishers, libraries or the scientific communities themselves, which need to be better coordinated in the future.
Dissemination increasingly covers not only publications but also the data underlying them. Requests to make primary sources of SSH data available are increasing, both from funding agencies and from publishers. This trend towards open science, for data as well as for publications, requires new and enhanced capabilities to store and access different types of data in the future.
The EOSC aspires to become the European stakeholder-driven infrastructure for science and innovation. It will not only be a data repository, but will also comprise technical elements of connectivity, hardware, repositories, data formats and Application Programming Interfaces (APIs) and it will offer access to a wide range of user-oriented services, data-management, associated HPC analytics environments, stewardship services and, notably, expertise.
New forms of interdisciplinarity
Recently there has been an increase in the value and practice of interdisciplinary SSH research.
INTERNAL INTERDISCIPLINARITY. The traditional fragmentation of the area is being overcome: the divide between the Social Sciences and the Humanities is making way for promising interactions. Disciplinary boundaries are gradually fading to make room for integrative and transversal research methods concerning the entire field of Social Sciences and Humanities. On the one hand, a large body of digitised texts allows the Humanities to use quantitative methods that were previously confined to the Social Sciences. On the other hand, a linguistic turn within the Social Sciences makes room for new types of discourse and conversation analysis. Media Studies, which connect the Social Sciences and Humanities, are an eloquent example of that evolution. In particular, the scientific study of the web, which has become an integrated part of society, culture, business, and politics, is a burgeoning field of research activity, with enormous potential for contributing to societal challenges related to the evolution of communication, solidarity or security issues.
EXTERNAL INTERDISCIPLINARITY. The increasing interaction between SSH and other sciences is one of the most salient features of the recent period. There is now a more acute perception that many causal chains studied by the natural sciences have their determinants in human action and behaviour. To cite just one example, the extraction of oil from bituminous sands and shales in Canada is expected to move, every year, more than twice the total mass of annual river sediments in the whole world. While the environmental impact of such extraction can be estimated by the natural sciences, the social sciences are needed to analyse and understand the decision-making processes that lead to, or can avoid, such massive changes in the environment. This change has been accentuated by recent developments in the way science is managed. Horizon 2020, which is structured not by disciplinary fields but by societal challenges – e.g. health and well-being, climate change – is the paradigmatic example of this transformation of the science system in Europe. This new approach raises the question of hybrid infrastructures that aggregate data arising from different domains or, alternatively, of new forms of collaboration and interchange between existing infrastructures. A good example of this hybridisation is provided by the ESFRI Project E-RIHS (European Research Infrastructure for Heritage Science), which combines material science methods with interpretative schemes from the history of art to rejuvenate the field of heritage studies.