Availability of public sector information
To leverage existing datasets paid for by public funds, a number of jurisdictions have sought to make public sector information (PSI) available to industry. Examples include geographical information, statistics and weather data.
This has been particularly important for the AI industry, which depends on access to sufficiently large datasets.
The EU Commission takes the view that, where PSI is made available, it should be shared for re-use by all in the EU, in a borderless manner, in order to promote the digital economy and innovation. To that end, it implemented the PSI Directive in 2003 (it was revised in 2013).
Significant aspects of the amended PSI Directive are as follows:
- Information held by public sector bodies (but not educational or research establishments) must be made available for re-use, commercially or non-commercially. The information provided must be, where possible, in open and machine-readable format together with metadata.
- Availability of information is, however, subject to exceptions, such as third party intellectual property rights (as noted elsewhere, not all information / data is protectable by intellectual property rights), confidentiality, exclusive arrangements and protection of personal data.
- Information may be made available under a standard license, but charges must be limited to marginal costs, unless it is justifiable to charge a higher sum, and conditions must not be restrictive.
- Member states must facilitate the search for documents available for re-use, such as providing asset lists of main documents with relevant metadata (for example, through portal sites).
The European Open Data Portal contains examples of businesses that have used European PSI. Despite such examples of use, the PSI Directive has been subject to criticisms:
- Requests for information have been turned down too readily on the basis of exceptions.
- It has not given access to the data most wanted, because of a lack of real-time data, restrictive licensing approaches and a lack of information belonging to publicly funded research organizations.
In light of such problems, the EU has since legislated for an Open Data Directive. It will be implemented by July 2021, and is intended to bring about better availability of publicly-held information as part of a package of measures aiming to facilitate the creation of a “Common Data Space.” The EU Common Data Space is intended to facilitate the exchange of data between data providers and users.
EU Open Data Directive
The Open Data Directive has similar provisions to the amended PSI Directive (e.g. the same exceptions apply), but it extends the scope of availability for re-use of data in the following ways:
- Information available is to be extended to that held by certain public undertakings (where the re-use has been allowed) and publicly-funded research.
- The re-use of documents shall be generally free of charge (although exceptions apply).
- Where possible, dynamic data should be made available for re-use immediately after collection, via suitable APIs and, where relevant, as a bulk download.
- Publicly-funded research data is to be made available in line with the principle “as open as possible, as closed as necessary.” The Directive encourages sharing of such data, which can be made available through institutional or subject-based repositories and eventually via the European Open Science Cloud.
- Certain high-value public sector datasets will be made available for re-use in the future, primarily for free in accordance with open standards licenses (although it is unclear which open standards licenses are endorsed). The thematic scope of high-value datasets is broad and includes:
- Earth observation and environment.
- Companies and company ownership.
These datasets are defined as documents whose re-use is associated with important benefits for society and the economy. Public bodies are encouraged to make all publicly-held data available in line with the FAIR principles (that is, findable, accessible, interoperable and re-usable) as far as possible.
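As a rough illustration of what checking a dataset against the FAIR headings might look like in practice, the Python sketch below flags gaps in a catalogue entry. The `DatasetRecord` fields, the list of machine-readable formats and the example values are all assumptions made for illustration; none of them is prescribed by the Directive or taken from any real portal.

```python
from dataclasses import dataclass, field

# Formats treated as open and machine-readable for this sketch.
# This set is an assumption, not a list taken from the Directive.
MACHINE_READABLE = {"csv", "json", "xml", "geojson", "rdf"}


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry for a public sector dataset."""
    title: str
    identifier: str = ""    # persistent ID, supports "findable"
    access_url: str = ""    # download or API endpoint, supports "accessible"
    file_format: str = ""   # supports "interoperable"
    licence: str = ""       # supports "re-usable"
    keywords: list = field(default_factory=list)


def fair_gaps(record):
    """Return the FAIR headings under which the record looks incomplete."""
    gaps = []
    if not record.identifier or not record.keywords:
        gaps.append("findable")
    if not record.access_url:
        gaps.append("accessible")
    if record.file_format.lower() not in MACHINE_READABLE:
        gaps.append("interoperable")
    if not record.licence:
        gaps.append("re-usable")
    return gaps


# A well-described dataset (all values are made up).
complete = DatasetRecord(
    title="Daily rainfall",
    identifier="example-id-0001",
    access_url="https://example.org/data/rainfall.csv",
    file_format="CSV",
    licence="example-open-licence",
    keywords=["weather", "rainfall"],
)

# A scanned report with no metadata beyond its title.
scanned = DatasetRecord(title="Annual report", file_format="PDF")

print(fair_gaps(complete))
print(fair_gaps(scanned))
```

A check of this kind only tests whether metadata is present; it says nothing about whether a licence is suitable or whether an exception to re-use applies.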
Private company-led industrial data spaces already exist that facilitate secure data exchange among different organizations based on common standards, enabling easy integration of data from different sources within an agreed data governance framework. The EU Common Data Space is likely to be similar to these systems.
There are also EU initiatives to stimulate business-to-government (B2G) sharing of data to boost improvement to policy making and public services, which may increase the available range of information for re-use.
The UK has implemented the PSI Directive (as amended) but does not plan to implement the EU’s Open Data Directive. The Open Government Licence Framework is commonly used to facilitate the re-use of PSI. The requestor can procure the information by making a request under the Freedom of Information Act 2000.
In 2019 the United States enacted the Open, Public, Electronic, and Necessary Government Data Act, which specifically makes most government “mass data” open to the public. As of the date of publication, the US government has made more than 200,000 datasets publicly available at www.data.gov.
Canada has adopted the international Open Data Charter and has created an Open Data Portal for disseminating PSI.
Regionally, open data directives and portals also exist in eight provinces and one territory. For example, the Province of Ontario’s Open Data Directive requires all government data to be made public and sets out key principles and requirements for publishing open data.
China does not have PSI laws for the time being.
Singapore does not have any legislation which provides for the sharing of PSI with the public. However, the Singaporean government has introduced a number of initiatives to share PSI with the public. These include:
- Singstat, an online repository of statistical data concerning the Singaporean economy and population, collected by public sector agencies, and which contains over 27,000 data series.
- Data.gov.sg, an online portal for datasets from public agencies containing more than 1,600 high quality datasets and 14 real-time APIs.
The Public Sector (Governance) Act 2018 sets out a framework for the sharing of PSI between various Singaporean public sector agencies. There are 61 public sector agencies in total, which are categorised into three main groups: (1) agencies that fulfil public functions; (2) professional boards established to self-regulate members of their respective professions; and (3) bodies whose main function is to represent particular community interests or the volunteer movement.
In Australia the Commonwealth government has committed to optimising the use and re-use of PSI, as noted in the Digital Transformation Agency Open Data web page.
The Australian government had proposed introducing a Data Availability and Transparency Act as part of a general push to greater sharing and use of government datasets, and to establish safeguards in respect of that sharing and use. However, as of the date of publication this has been delayed.
Availability of sector-specific data as a result of market failure
Although antitrust / competition law data issues are dealt with more generally elsewhere (see Antitrust / Competition Law Data Issues), this part deals with the specific question of whether there are circumstances in which antitrust / competition law, or sector-specific rules, will apply to provide access to a business’s data in the event of a market failure.
Although in principle data access should not be made compulsory (having regard to the legitimate interests of data holders), where there are market failures which antitrust / competition law cannot solve, the EU Commission has legislated for a sector-specific data access right under fair, transparent, reasonable, proportionate and/or non-discriminatory conditions.
The sectors where this applies include automotive, payment services, smart metering information, electricity network data and intelligent transport systems.
Automotives case study
The Directive for Automotives stipulates that original equipment manufacturers (OEMs) must make available necessary information to enable independent firms to provide aftermarket services.
The information may be provided on a fair, reasonable, and non-discriminatory (FRAND) basis.
The Directive has been amended recently to account for the fact that modern cars rely on increasingly critical software and telematics, and that new methods or techniques for vehicle diagnostics and repair (such as remote access to vehicle information) and software have become available.
In future, the European Commission may also be able to use a proposed new ex ante competition “tool” in circumstances like these. The proposed tool is based on the United Kingdom’s market investigation regime, which currently allows the United Kingdom’s Competition and Markets Authority to investigate markets that are not working well and to impose remedies, including access to information and data. For more information, see EU Commission Launches Consultations on Ex Ante Antitrust Tool and Platform Regulation.
Australia’s competition regulator, the Australian Competition and Consumer Commission (ACCC), has long taken the view that data must not be used in an anti-competitive manner, such as through collusive arrangements or through the abuse of a dominant market position. That is, Australia’s competition laws apply to conduct in respect of data in the same way as other conduct.
The ACCC has established a Data Analytics Unit which has been deployed in a number of market studies undertaken by the ACCC, as well as to support the work of ACCC investigations teams and economists.
There is no point in being able to transfer data if it cannot be re-used easily. Data can come in all types of formats (for example, Word, PDF or CSV). Datasets may not be in a format readily readable by machines, and some may lack metadata, reducing the usability of the data file.
Such problems in interoperability (that is, the ability to transfer data meaningfully between different IT platforms) make it difficult to share, aggregate or combine data and impede re-use.
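The point about format and metadata can be made concrete with a minimal Python sketch that takes a small CSV extract and repackages it as JSON alongside descriptive metadata. The column names, metadata fields and sample values are invented for illustration, and the layout of the output is an assumption rather than any recognised standard.

```python
import csv
import io
import json

# Made-up sample data standing in for a published CSV file.
RAW_CSV = (
    "station,date,rainfall_mm\n"
    "A1,2020-01-01,4.2\n"
    "A1,2020-01-02,0.0\n"
)


def csv_to_json_package(text, metadata):
    """Parse CSV text and bundle the rows with descriptive metadata."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps({"metadata": metadata, "records": rows}, indent=2)


# Hypothetical metadata describing the extract.
metadata = {
    "title": "Daily rainfall (sample)",
    "units": {"rainfall_mm": "millimetres"},
    "source": "example public body",
}

package = csv_to_json_package(RAW_CSV, metadata)
print(package)
```

Note that `csv.DictReader` returns every value as a string, so a recipient still needs the metadata (here, the `units` entry) to interpret the figures correctly.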
Complex supply chains, service delivery models, and the possibility of providing complementary products and services may give rise to the need for businesses to share data with a variety of stakeholders. Specific interoperability requirements should be discussed with them so that data can be easily shared, combined and analyzed.
The EU has issued a Rolling Plan on ICT Standardisation to support the development of standardized and compatible formats and protocols for gathering and processing data from different sectors and vertical markets.
Text and data mining
Uptake of Big Data analytics is supported by the emergence of exceptional technologies, such as data mining, which enable the gathering of large quantities of data. Accordingly, the technique is often used for collecting data for training and developing AI. Text and data mining, sometimes referred to as “data scraping,” has been the source of a number of intellectual property disputes in recent years in several jurisdictions (particularly in relation to data scraped from websites, such as flight schedules), reflecting the value of the data at issue.
In the EU, text and data mining is defined as “any automated analytical technique aimed at analyzing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations” (Article 2, Directive on Copyright in the Digital Single Market).
In certain instances, text and data mining may involve acts protected by copyright and / or by the sui generis database right (notably the reproduction of works or other subject-matter and / or the extraction of contents from a database).
However, the Directive on Copyright in the Digital Single Market includes provisions that enable research organizations to undertake such text and data mining of works protected by copyright or a database right without there being an infringement, provided: (1) they are carrying out scientific research (public-private partnerships can also benefit if the research organization is not under the control of a commercial undertaking); and (2) they have lawful access to the material.
The exception does not apply to works protected under the law of confidence. The Directive is to be implemented by member states by June 2021.
Data rights-holders retain some level of control, as they can still control access to their data. However, once they have given research organizations access to the data, they cannot disapply the exception by contract, nor obstruct text and data mining by implementing technical measures for that purpose (although they may apply such measures to ensure the security and integrity of the networks and databases).
Under the Directive, commercial entities can text and data mine content they have lawful access to, unless rights-holders explicitly reserve their rights in an appropriate, machine-readable manner.
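The sketch below shows one way a mining tool might check for a machine-readable reservation before collecting content. It assumes, purely for illustration, that the rights-holder expresses the reservation through a robots.txt file; that is one common web convention, and nothing here suggests it is the mechanism the Directive requires or that it is legally sufficient. The bot names and URL are hypothetical. Only Python's standard `urllib.robotparser` module is used.

```python
from urllib import robotparser

# Hypothetical robots.txt content: one named mining bot is excluded,
# all other user agents are allowed.
ROBOTS_TXT = """\
User-agent: example-tdm-bot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.org/articles/1"
print(may_mine(ROBOTS_TXT, "example-tdm-bot", page))  # False
print(may_mine(ROBOTS_TXT, "other-bot", page))        # True
```

A permissive result from a check like this does not establish lawful access, which remains a separate condition.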
Uses for the purpose of scientific research, other than text and data mining, should remain covered by any exceptions provided for in the InfoSoc Directive.
As at the date of publication, the UK government has confirmed that the UK has no plans to implement the Directive on Copyright in the Digital Single Market.
As at the date of publication, the US Supreme Court has been asked to rule on the scope of the federal law commonly used in data-scraping cases, the Computer Fraud and Abuse Act (CFAA). The case at issue is a federal appeals court decision (hiQ Labs, Inc. v. LinkedIn Corp., 938 F.3d 985 (9th Cir. 2019)) that permitted hiQ to scrape publicly available social media profile data from LinkedIn.
The federal appeals court held that hiQ had shown a likelihood of success on the merits in its claim that, when a computer network generally permits public access to its data, a user’s accessing of that publicly available data will not constitute access “without authorization” under the CFAA.
US law has no specific research-purpose exception governing the mining of accessible data. Instead, data mining is not prohibited insofar as it falls within the fair use provision.
Similar to the US, when copyright is recognized as subsisting in data, data mining is not prohibited by the Canadian Copyright Act insofar as it falls within a fair dealing exception, such as being for the purpose of research, private study, or education and being otherwise fair under the circumstances.
PRC laws have no specific rules relating to text and data mining.
Singapore's Copyright Act is, at the date of publication, being reviewed with a view to implementing several significant proposed amendments, including the creation of a new exception in the Copyright Act to allow the copying of copyright works for the purposes of data analysis.
The exception is expressly targeted at text and data mining, which is broadly understood to mean the use of automated techniques to analyze text, data and other content to generate insights and information that may not have been possible to obtain through manual effort.