- Title
Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents.
- Authors
Senger, Stefan; Bartek, Luca; Papadatos, George; Gaulton, Anna
- Abstract
Background: First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever-increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited to this large corpus of documents is the automated extraction of chemical structures. A number of patent chemistry databases generated using the latter approach are now available, but little is known that can help to manage expectations when using them. This study aims to address this by comparing two such freely available sources, SureChEMBL and IBM SIIP (IBM Strategic Intellectual Property Insight Platform), with manually curated commercial databases. Results: When looking at the percentage of chemical structures successfully extracted from a set of patents, using SciFinder as our reference, 59 and 51 % were also found in our comparison in SureChEMBL and IBM SIIP, respectively. When performing this comparison with compounds as the starting point, i.e. establishing whether, for a list of compounds, the databases provide the links between chemical structures and the patents they appear in, we obtained similar results. SureChEMBL and IBM SIIP found 62 and 59 %, respectively, of the compound-patent pairs obtained from Reaxys.
- Subjects
PATENTS; EXTRACTION (Chemistry); CHEMICAL structure; CHEMISTRY databases; PATENT applications
- Publication
Journal of Cheminformatics, 2015, Vol 7, Issue 1, p1
- ISSN
1758-2946
- Publication type
Article
- DOI
10.1186/s13321-015-0097-z