The Oscar Database - Accurate Government and Public Sector Data

Overview, Coverage and Background

Key facts:

1 Database
423 Tables
51 Million Rows
360 Million Data Points
Over 35 years of history

Started in 1984 by two Local Government Officers, our database remains the cornerstone of our business.

The brief is for the Oscar database to record all “Public Service Organisations” - which for us means all Public and Statutory bodies as well as those providing services to the Public that are partly or fully funded directly or indirectly by Government. You can see a complete breakdown of the types of organisations covered by us here.

This ‘backbone’ of organisational coverage then gives us the platform to link and record a host of proprietary and open contact, structural, post holder, demographic and metric data points. These feed into a range of data driven products and services that ultimately achieve our mission statement: “To Deliver Public Sector Intelligence and Engagement”.

The Director started work as a researcher for the original owners of the business in the mid-Nineties before taking the business on and renaming it as Oscar in 1999. The ethos of and focus on data quality has been at the centre of all we do since then.

Curating our database represents a significant annual investment across staff and systems. In fact, we have 4 times more research staff than sales staff, reflecting our belief that nothing we do is more important than ensuring our data is complete and accurate.

We have been developing our proprietary live database management software for over 20 years and have invested heavily in its creation and evolution. Based on our own UK servers, the system has been built from the ground up to work across our unique taxonomies and to facilitate the best possible management and delivery of data around Public Service organisations, structures, metrics and post-holders.

You can find out a bit more about Oscar here.

Research Team

Key facts:

UK Based
300 Years of collective experience
4 Specialist Teams

Our internal research team comprises 15 UK based researchers. We have 4 Department heads managing teams split across different organisation types with each member being a specialist in their particular area of the database.

We employ a full-time Head of Data Development and Bespoke Research who works across all datasets, monitors industry changes, works closely with the Director to establish new taxonomies, sources new data points and generally monitors the health and integrity of the database.

Many members of the research team have Public Sector backgrounds and represent a collective experience of over 300 years. The average time with Oscar across the research team is over 10 years and we are proud of an almost uniquely low level of staff change. We have had one person move on in the last decade.

It takes a long time - typically 3 years of training and experience - for a researcher to move up to what we call ‘admin’ level and make live updates to the database without the need for moderation - a feat now achieved by all but two recent appointments to the team.

You can see more about the team here.

Tech and Software

Key facts:

20 Years of Development
UK Based
Bespoke Proprietary Systems

We have been big fans of cloud based systems since the nineties. Starting with offline customised Lotus/IBM software for the research team and locally hosted servers using our own (award winning) store and forward tech for data delivery, we made the jump to integrated and fully online data management, storage and delivery in 2006.

The brief was simple: create a single live nexus for all data. We have achieved this through hosting a single database on our own web servers and building live cloud based custom software and systems around this to manage all incoming and outgoing dataflows.

Incoming:

Research Admin Interface - Management, scheduling, reporting and updating environment.

Customer Admin Interface - Subscription and product access management and CRM.

Websites - Data subjects and customer updates.

Data feeds - Scheduled incoming open, regulatory and other datasets.

Outgoing:

Websites - Display and Interrogation

Direct Delivery - Downloads and Live Feeds

Our Tech Stack

OS, Database and Languages:

- Linux

- Apache

- MySQL

- PHP

- SQL

- Python

Frontend:

- Javascript

- AJAX

- HTML

- CSS

- Bootstrap

- jQuery

Data Delivery:

- REST API

Other Software:

- Google Analytics

- Google Data Studio

- Google Drive

- Paypal

- Xero

- Insightly

Maintenance and Updating - Methodology and Integrity

Key facts:

30,000 hours of manual research time per year
140,000 monthly emails
82 Automated Daily checks and processes

There are three key elements to maintaining our database:

1 - Direct Research via our Internal Team

2 - Monthly Newsletter to, liaison with and updates from data subjects

3 - Using Published and Open datasets for benchmarking and additional intelligence

General Daily Checks and Balances

We have built our own database health check and scheduling software to monitor record age, updates per team member, the flagging of future events, erroneous data points and other higher level checks which our research heads use to trigger updates and changes on a daily basis.
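As an illustration only, a record-age check of the kind described above might look like the following sketch. The `Record` structure, the field names and the 183 day threshold are assumptions made for this example, not details of our internal software.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    record_id: int        # hypothetical identifier
    last_confirmed: date  # when a researcher last confirmed the record


def stale_records(records, today, max_age_days=183):
    """Return records not confirmed within max_age_days of today, oldest first."""
    overdue = [r for r in records if (today - r.last_confirmed).days > max_age_days]
    return sorted(overdue, key=lambda r: r.last_confirmed)


# Made-up example data.
records = [
    Record(1, date(2023, 12, 20)),
    Record(2, date(2024, 6, 1)),
    Record(3, date(2023, 11, 5)),
]
flagged = stale_records(records, today=date(2024, 7, 1))
print([r.record_id for r in flagged])
```

Sorting oldest first means the records most in need of attention reach the top of a research head's list.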

A monthly newsletter is sent to all postholders/data subjects. Any undeliverable emails or responses that mention changes are passed to the research team for revalidation and update.

Outside of these general checks our processes and methodologies for maintaining and updating the database can be broken down into the following three parts:

1 - Organisational Checking and Updates:

Primary Scope:

90,000 Public Service Organisations

120,000 Organisational Sites including HQ sites

300 Organisation Types

Organisational Hierarchy - Parent, Child and Sibling Entity relationships

Method:

The starting point for each organisation type is ensuring as close to total coverage as possible. We achieve this through checks made during our ongoing research cycle (more details on this under the post holder section): postal, telephone and email requests go directly to the subject organisations, asking them to confirm primary entity data (name, main address, phone number and email) and, in larger organisations, our primary organisational metrics (annual spend and number of employees).

We record a wide range of official ID numbers linked to our own unique Oscar number and then use automated processes to compare our entity data with published Government open and regulatory datasets. This benchmarking process helps to ensure coverage and to identify and record organisational relationships and key metrics.

Where differences in coverage and primary contact data are detected these records are passed to the relevant research team to be checked directly to ensure the best possible core entity data. More information on these processes for specific/notable types of organisation can be seen below.
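The benchmarking step can be pictured with a minimal sketch. It assumes both datasets are keyed by the same official ID and that only a couple of fields are compared; the function name, field names and sample values are hypothetical.

```python
def benchmark(oscar, official, fields=("name", "postcode")):
    """Compare two {official_id: record} mappings.

    Returns IDs missing from our data, IDs missing from the official file,
    and the fields that differ for IDs present in both.
    """
    missing_from_oscar = sorted(official.keys() - oscar.keys())
    missing_from_official = sorted(oscar.keys() - official.keys())
    differences = {}
    for org_id in sorted(oscar.keys() & official.keys()):
        changed = [f for f in fields if oscar[org_id].get(f) != official[org_id].get(f)]
        if changed:
            differences[org_id] = changed
    return missing_from_oscar, missing_from_official, differences


# Made-up example data.
oscar = {
    "A1": {"name": "Anytown Council", "postcode": "AB1 2CD"},
    "B2": {"name": "Borough Trust", "postcode": "EF3 4GH"},
}
official = {
    "A1": {"name": "Anytown Council", "postcode": "AB1 9ZZ"},
    "C3": {"name": "Newtown Academy", "postcode": "IJ5 6KL"},
}
to_add, to_check, to_review = benchmark(oscar, official)
```

Each of the three outputs would go to a researcher for checking rather than being applied automatically, in line with the process described above.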

Where we discover ‘interesting’ things about organisations during the direct research process, we record this in the organisational notes field.

Frequency:

Ongoing 6 monthly research-cycle.

Live, daily, weekly and monthly scheduled processes around Open datasets.

Ongoing updates across known areas of change.

Additional Triggers for review and update:

Monitoring the press and published content for upcoming changes

Reports of changes from customers, data subjects and users

We provide services to a number of Local Authorities, Government Departments and Universities who also kindly provide us with the details of upcoming organisational changes and mergers they are privy to.

2 - Posts and Post Holder Checking and Updating:

Primary Scope:

200,000 Posts

700 Functional Categories

9 Management Levels

Method:

Updating post holder data is always done through either direct contact with or information published by the employing organisation. No third party sources are used for sourcing, validating or updating our post holder data.

We have two primary methods depending on the size of the organisation:

Larger Organisations (>100 staff) - Over the past thirty years we have developed relationships with an extensive network of people -‘nominated research contacts’- who are either departmental or sit centrally within their organisations. We liaise with these contacts via their preferred method of email, post, telephone or reviewing published material to update the post holder data we hold.

Smaller Organisations (<100 staff) - Primarily telephone research is used to validate or update the data held.

As well as validating the contact and post information, we update and review the functional responsibilities, levels of management and reporting structure of each post at the same time.

One of the most time consuming elements of updating posts is researching and categorising posts specifically by job function and responsibility. In each case we analyse the structure of the organisation, refer to published organisation charts and often ask the organisation directly which posts fulfil a particular function. We then link one or more of our proprietary standardised functional categories to that post (it’s very common for posts to carry more than one responsibility). More information about the use and benefits of our standardised functional categories can be seen in the 'Unique Standardisation and Categorisation' section below.

In each case, a post is also given a level of management (based on the specific reporting structure within the organisation), and in larger organisations we link posts to their immediate superior. These relationships between posts are used to drive our dynamic organograms, which can be seen on our SaaS platform, and the team also uses them to sense-check and visualise our research.
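To show how links between posts and their immediate superiors can drive an organogram, here is a small sketch. The tuple layout, function names and sample posts are assumptions for illustration and do not describe our platform's actual implementation.

```python
from collections import defaultdict


def build_organogram(posts):
    """Group posts under their immediate superior.

    posts: iterable of (post_id, title, reports_to) tuples, where reports_to
    is None for a post with no recorded superior.
    Returns (roots, children, titles).
    """
    children = defaultdict(list)
    titles = {}
    roots = []
    for post_id, title, reports_to in posts:
        titles[post_id] = title
        if reports_to is None:
            roots.append(post_id)
        else:
            children[reports_to].append(post_id)
    return roots, children, titles


def render(post_id, children, titles, depth=0):
    """Return indented lines for a post and everything beneath it."""
    lines = ["  " * depth + titles[post_id]]
    for child in children.get(post_id, []):
        lines.extend(render(child, children, titles, depth + 1))
    return lines


# Made-up example data.
posts = [
    (1, "Chief Executive", None),
    (2, "Director of Finance", 1),
    (3, "Director of Housing", 1),
    (4, "Head of Audit", 2),
]
roots, children, titles = build_organogram(posts)
chart = render(roots[0], children, titles)
print("\n".join(chart))
```

Printing the tree in this way is also a quick sense-check: a post that appears at an unexpected depth usually points to a reporting link that needs review.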

Frequency:

Apart from special cases (more on those in part 3 below) we maintain a 6 month rolling cycle through our post holder data, yielding an average record age of 3 months.
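The 3 month average follows from the 6 month cycle if confirmations are spread evenly across it. As a sketch under that assumption (the symbols A for record age and T for cycle length are introduced here purely for illustration), record age is uniformly distributed between 0 and T, so:

```latex
\[
  \mathbb{E}[A] \;=\; \int_{0}^{T} a \cdot \frac{1}{T}\, da \;=\; \frac{T}{2},
  \qquad T = 6 \text{ months} \;\Rightarrow\; \mathbb{E}[A] = 3 \text{ months}.
\]
```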

Additional Triggers for review and update:

Monitoring the press and published content for upcoming changes

Reports of changes from customers, data subjects and users

We provide services to a number of Local Authorities, Government Departments and Universities who also kindly provide us with the details of current or future structural or postholder changes they are privy to.

3 - Additional Sector Specific Checks and Processes:

Our approach to research and database maintenance is slightly different for each part of the database to reflect the nuances and specific considerations of each part of Public Services or indeed of that particular dataset. With that in mind, the following areas of the database are notable for their additional specific methods and frequencies of update.

These additional processes can be in addition to or in place of our general update procedures mentioned above:

Schools

The majority of changes to Head Teachers occur at term start dates (Easter, Post Summer or Christmas). We monitor press, school websites and other sources. Where a change is published or known to be taking place, we flag those Schools and work through, validate and apply those changes in batches at the appropriate time.

We consume a feed published by the Government of all Educational Establishments on a monthly basis. We maintain a linkage between our own School IDs and the official Government IDs. Using custom software we have built internally, this dataset is used to monitor a number of changes, which are then flagged for the research team to investigate and make updates where needed. These changes include:

- Organisational Coverage and Closures

- New School Openings

- Academy Conversions and MAT Membership

- Pupil Numbers and Service Types

Elected Representatives

As might be expected the majority of changes occur post election. In the case of National elections (Westminster and Devolved Governments) these typically take place every 5 years and we focus our updating work on the evening of and day following these elections to ensure the necessary changes have been made. We also monitor press sources for changes in role (particularly cabinet reshuffles) as well as resignations and those moving Party and make the changes to the live file as soon as possible.

Local (Councillor) elections are annual (usually in early May). As part of our contractual obligations to Government, our Local Government team ensures that all changes of elected representative and re-election are recorded by the end of May. This includes the name and party of those representatives. We also include the date of next election and, for those newly elected, we add the current year as the ‘year of election’. The team then spends approximately the next 2 months building out the leadership, cabinet and committee responsibilities and structures and linking all the relevant categories to the new and re-elected post-holders.

Once the ‘election councils’ are completed, the rest of the year is spent focusing on the ‘non-election’ councils and re-confirming the elected post holders and responsibilities we hold across these.

Outside of this structure there are 2 further primary sources of change:

- Boundary Changes - these happen when Constituencies (rarely), Wards or Electoral divisions change - they may be made bigger, merged with others or new ones may be created (usually reflecting population changes in areas). We monitor for changes (more on this in the ‘Geo Data Points’ section below) and, where changes occur, we ensure post-holders are connected to the right areas and that all wards or divisions have the correct representatives recorded.

- By-Elections - as with MPs and devolved members, we monitor press sources for changes in role, resignations and those moving Political Party, and are often alerted to forthcoming changes through our work in Government. We make the changes to the live file as soon as possible.

Care Homes

We pull in a copy of the regulatory CQC location and provider file every quarter, from which we run a series of 6 daily automated checks around coverage, location, providers and bed numbers. Where differences exist between the regulatory and Oscar data, these are flagged for review and update by the research team.

We also look at the published provider (group) data and compare and benchmark it against our own data, looking to create the right hybridisation of actual groups vs regional and other CQC registered providers.

Housing Associations

We collect annually published data from the Housing Regulator around Local Authority stock volumes, PRP Group structures, housing types and stock numbers, both in total and by Local Authority area. This data is loaded into a series of processes that compare it to our existing organisational coverage and structures and use it as the basis for benchmarking our own stock data (total and by area). A series of managed updates is then applied to the underlying data.

Tenders and Contract Awards

Each day we collect the Published Tender requests and Contract Awards from across Government. This data is then passed through a number of automated data matching processes, central to which is ensuring the buyer name is matched to the Oscar universe and, in the case of Contract awards, the Supplier is linked to our standardised Supplier database. Where no previously known matches exist and a direct match isn’t possible, the buyer and supplier names are then passed to the research team for investigation and manual linkage.
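The matching flow described above (known matches first, then a direct match, then manual investigation) can be sketched as follows. The normalisation rules, function names and sample data are assumptions for illustration; real matching would need more careful handling of name variants.

```python
import re


def normalise(name):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def match_buyers(buyer_names, known_orgs, known_aliases):
    """Link buyer names to organisation IDs; unmatched names go to a manual queue.

    known_orgs:    {organisation name: organisation ID}
    known_aliases: {previously matched buyer name: organisation ID}
    """
    by_name = {normalise(n): org_id for n, org_id in known_orgs.items()}
    by_alias = {normalise(n): org_id for n, org_id in known_aliases.items()}
    matched, manual_queue = {}, []
    for raw in buyer_names:
        key = normalise(raw)
        # Previously known matches take priority over a direct name match.
        org_id = by_alias.get(key, by_name.get(key))
        if org_id is None:
            manual_queue.append(raw)
        else:
            matched[raw] = org_id
    return matched, manual_queue


# Made-up example data.
known_orgs = {"Anytown Borough Council": "ORG-1"}
known_aliases = {"Anytown Council (Procurement)": "ORG-1"}
buyers = [
    "ANYTOWN  BOROUGH COUNCIL.",
    "Anytown Council - Procurement",
    "Newtown NHS Trust",
]
matched, manual_queue = match_buyers(buyers, known_orgs, known_aliases)
```

Once a researcher links a queued name by hand, adding it to the alias table means the same name matches automatically the next time it appears.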

Geo Data Points

We pay for and consume a number of open postcode and geo point datasets including the ONS file, the NHS postcode dataset and a specialist file containing county electoral divisions and devolved government constituencies. We have 17 different geo data checks we run weekly across the database to support the political aspect of the database and in particular the e-lobbying and online SaaS services.

To ensure the correct linkage between elected representative, geo code and area name is in place we check and monitor the following area and representative types:

- Constituency - MPs

- Devolved Constituency - Devolved Members

- Devolved Regions - Devolved Members

- Wards - Councillors

- Electoral Divisions - Councillors

- District/County Councils Leaders - Councillors

Statutory Post Checking

We run a series of weekly checks across the database to ensure coverage of key and statutory posts. These include:

- Chief Executives

- Section 151 Finance Posts

- Monitoring Officers

- Public Health Directors

General Anomaly Checking

We have developed a series of live reports across the database that researchers check weekly, looking at specific fields and data points that may need correcting. These include:

- Email address formatting anomalies

- Post Holder Name formatting anomalies

- Organisations with a deleted linked parent

- Posts with missing functions

- Posts linked to deleted parents

- Organisations with no Main Address/HQ listed or linked

- Key fields with unwanted spaces at start or end of string
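A few of the checks listed above can be expressed as simple rules. The sketch below is illustrative only: the field names, the anomaly labels and the deliberately loose email pattern are assumptions, not our production rules.

```python
import re

# Deliberately loose pattern: it only catches obvious formatting problems.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_anomalies(post):
    """Return a list of anomaly labels for one post record (a dict)."""
    issues = []
    email = post.get("email") or ""
    if email and not EMAIL_PATTERN.match(email):
        issues.append("email_format")
    for field in ("name", "email", "job_title"):
        value = post.get(field) or ""
        if value != value.strip():
            issues.append(f"whitespace:{field}")
    if not post.get("functions"):
        issues.append("missing_functions")
    return issues


# Made-up example records.
problem_post = {
    "name": " Jane Smith",
    "email": "jane.smith@example",
    "job_title": "Director",
    "functions": [],
}
clean_post = {
    "name": "Jane Smith",
    "email": "jane.smith@example.gov.uk",
    "job_title": "Director",
    "functions": ["Finance"],
}
print(find_anomalies(problem_post))
print(find_anomalies(clean_post))
```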

Unique Standardisation and Categorisation

Key facts:

Unique OSCAR IDs
800 Org Type Combinations
700 Functional categories
8 Management Levels

One of the most valuable aspects of what we do is around standardisation. To make the data usable when combining all organisation and post types from across Public Services it’s necessary to develop and use consistent categories and values.

To that end, we have developed and built our own IDs and categories to ensure integrity and selectability:

OSCAR ORGURNs

Every organisation listed by us has a unique Oscar number. This is stored for the lifetime of that entity and is then archived and not reused.

OSCAR CONTACTURNs

Every post holder listed by us has a unique Oscar number. This is stored for as long as they are listed at that organisation. When they leave, the contact record is archived and the contacturn is not reused.

Oscar Org Types

We use 3 levels of organisational type categorisation:

Level 1 - Broad Area of Public Services (eg Local Government)

Level 2 - More specific Organisation Type (eg Local Authority)

Level 3 - Further level of organisation type where needed (eg English County Council).

Oscar Functional Categories

We have 700 standardised functional categories which we link with posts. It is common for a post to carry multiple functions. The categories are grouped into 36 functional areas to help with navigation and selectability.

You can see our full list of categories with descriptions here.

Benefits of our standardised categories can be seen here.

Oscar Seniority Categories

For Chairs and senior executive/management posts we use a set of categories to link comparable levels of management within an organisation. These management level categories are critical for ‘horizontal’ selection of posts, aid the identification of departments and are an important data point in the generation of our organograms.

The Seniority Categories we use are as follows:

- Chairman - Most senior Non Exec Board Member

- Chief Officer - The most senior Executive/Employed Post

- Deputy/Assistant Chief Officer - Some organisations have specific Deputy Chief Officer posts

- Clerk/Secretary - Most senior Administrative contact. Specific and in some cases statutory posts

- Top Level Director - Senior Post reporting directly to the Chief Officer

- Second Level / Head of Service - Assistant Director or Head of Service that reports to a Director or equivalent

- Third Tier/Manager - Heads of Service or Managers that report directly to 'Second Tier' employees

In all other cases, posts are categorised as ‘unspecified’ - this is particularly the case with elected representatives, those outside of a conventional management structure and also lower management posts which are included specifically for their functions rather than management level.

Accuracy and Analysis

Key facts:

97% Accuracy
8 Date Fields
Internal Reporting and Tracking
Live Update Counter
500,000 Annual Post Holder and Entity Updates

The accuracy of our database is fundamental to the business and an inherent part of our commitment to users. Users should expect an accuracy rate of > 97% across core post and entity data points. This is tested monthly through our own interactions with the database and our monthly e-newsletters.

Following the processes detailed in the various methodologies above we maintain the live database with a continuous 6 monthly cycle. This creates a current average post holder record age of just 3 months.

Record age is one of our key indicators. To help us (and users) monitor the accuracy of the database, we record a number of time/date fields across it.

The following fields are maintained separately for both Organisations and Posts:

- Date Added

- Date Last Changed

- Date Last Confirmed

- Date Deleted

Using these fields in conjunction with our proprietary organisation types, which are linked to the relevant team members, enables us to efficiently and effectively monitor each area of the database. We have produced an internal live report, shared across the Research department heads, analysing the age profile of each dataset. This is reviewed weekly and any outliers or issues are passed to the relevant team members for investigation and updating.

Changes are another helpful measure of database health. We group these for users into three categories:

- New - Where a post or organisational record has been added.

- Changed - Where a post or organisational record has been changed in some way.

- Deleted - Where a post or organisational record has been removed from circulation and archived.
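One way to derive these three categories from the date fields described earlier is sketched below. The dictionary keys, the 90 day window parameter and the order of precedence (deleted, then new, then changed) are assumptions made for this example.

```python
from collections import Counter
from datetime import date, timedelta


def classify_change(record, today, window_days=90):
    """Return 'Deleted', 'New', 'Changed' or None for one record.

    record holds optional date_added, date_last_changed and date_deleted values.
    The precedence used here is an assumption for illustration.
    """
    cutoff = today - timedelta(days=window_days)

    def recent(d):
        return d is not None and cutoff <= d <= today

    if recent(record.get("date_deleted")):
        return "Deleted"
    if recent(record.get("date_added")):
        return "New"
    if recent(record.get("date_last_changed")):
        return "Changed"
    return None


# Made-up example data.
today = date(2024, 7, 1)
records = [
    {"date_added": date(2024, 6, 1)},
    {"date_added": date(2020, 3, 1), "date_last_changed": date(2024, 5, 15)},
    {"date_added": date(2019, 1, 1), "date_deleted": date(2024, 6, 20)},
    {"date_added": date(2018, 1, 1), "date_last_changed": date(2023, 1, 1)},
]
labels = [classify_change(r, today) for r in records]
summary = Counter(label for label in labels if label is not None)
```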

A complete analysis of changes made to the postholder records over the last 90 days from today is shown below:

This table refreshes daily at midnight.

Live, Compliant and Transparent

Key facts:

EU/UK GDPR Compliant
PECR Compliant
2 billion downloads and updates
Email Opt-Out Management
Daily TPS Screening

One of the fundamentals here is ‘live data’. Whether browsing online or consuming data through our feeds or downloads, whatever is viewed or accessed is the latest version of our data, so all users have real time access to, and the full benefit of, the work done by the team on the Oscar database.

For those consuming via downloads or feeds this means taking updates at least every 30 days and ideally more frequently. We have a number of data access methods to make this as easy as possible. You can find out more about how to consume the latest data from us here.

This is important for two primary reasons:

1 - Accuracy - always viewing and using the most accurate version of our data.

2 - Compliance - ensuring the latest permissions are reflected.

For more on both, see our general terms of usage here.

Coverage of Personal Data

The extent of personal data we cover is:

- The name of the holder of an official post or position

- Any Elements of Corporate Subscriber Email addresses that identify an individual.

We can only list and/or provide access to these personal data points where it has been given to us and we have disclosed our usage and purpose, or where, subject to additional checks, it is published by the employing entity.

All individuals for whom we hold personal data (‘data subjects’) have real time access to their own control panel, which includes data updating and an individual preference centre. The preference centre gives data subjects full control over their personal data points (names and corporate emails). As well as giving us invaluable notifications of updates and changes, the preference centre includes:

- Which personal data points are recorded and where they are available

- A list of all users and subscribers that have access to the data subject's data with the option to switch off access to all or specific users.

Additional Compliance

We provide our privacy policy and data usage statement in the form of a Data Transparency document to data subjects and employing entities. This is shared either at the point of research and update or within 30 days thereafter. A copy can be seen here.

The legal status of all organisations is recorded to distinguish between:

- Corporate - Public and Incorporated Entities

- Non Corporate - Private Partnerships and Sole Traders

In the case of Non Corporate organisations, all postholder, direct and named emails are withheld from all users to ensure compliance with PECR/GDPR rules.

Each data subscriber has access to our customer platform that includes email opt-out management to help with data management and compliance. Subscribers can upload their own suppression lists and all matching records will be - as chosen - flagged or removed in subsequent downloads and feeds.
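The flag-or-remove behaviour can be sketched as below. This is a minimal illustration that assumes matching is done on a case-insensitive email address; the function name, the `suppressed` key and the sample addresses are hypothetical.

```python
def apply_suppression(records, suppressed_emails, mode="flag"):
    """Flag or remove records whose email is on a subscriber's suppression list.

    mode="flag" adds a 'suppressed' boolean to every record;
    mode="remove" drops matching records from the output.
    """
    if mode not in ("flag", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    blocked = {e.strip().lower() for e in suppressed_emails if e.strip()}
    output = []
    for record in records:
        is_blocked = (record.get("email") or "").strip().lower() in blocked
        if mode == "flag":
            output.append({**record, "suppressed": is_blocked})
        elif not is_blocked:
            output.append(dict(record))
    return output


# Made-up example data.
records = [
    {"email": "A.Person@example.gov.uk"},
    {"email": "b.person@example.gov.uk"},
]
suppression_list = ["a.person@example.gov.uk "]
flagged = apply_suppression(records, suppression_list, mode="flag")
kept = apply_suppression(records, suppression_list, mode="remove")
```

Returning new dictionaries rather than editing the inputs keeps the underlying records unchanged, so the same data can be delivered differently to subscribers with different suppression lists.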

We host a full Telephone Preference Service (TPS) file and run a daily screen across the entire Oscar database for matches with the TPS or Corporate TPS schemes. For those using the telephone numbers, we can include flags to identify any phone numbers that are listed. You can find out more about screening here.

You Matter

We like to see all the users of our data and systems as a community. As such - and as referenced in various sections above - although the vast majority of our database maintenance is proactive, there is always a valuable reactive element. We value and encourage any feedback, notifications of inaccuracies or details of forthcoming changes and will always pass these to the appropriate research team member for review, validation and update. The advantage of our live data and delivery platforms is that these changes are often seen and consumed within hours. The net result of having feedback from countless data subjects, public bodies and commercial users is a better dataset for all.
