Semantic Patent Analysis

Principal Investigator: 
Dr Peter Murray-Rust

Daniel Lowe is a PhD student, funded by Boehringer Ingelheim, who is developing new tools to analyse patents; we are particularly interested in pharmaceutical patents. The initial work has focussed on developing a name-to-structure program, OPSIN, a Java package that converts (English) IUPAC chemical names into chemical structures. It is open source and freely available from SourceForge for use either as a standalone application or as a library.

• OPSIN combines good recall with exceptional precision and speed of execution compared with the commercial offerings tested.
• Because OPSIN is open source, interested members of the community can extend it; its fragment dictionaries are stored as XML and can be edited easily.
• OPSIN is currently used as the IUPAC name resolution software in OSCAR3 [3], a tool for recognising chemical names in text. We hope that other text-mining tools will also adopt OPSIN for their chemical name-to-structure needs. OPSIN can be tried out online at http://opsin.ch.cam.ac.uk/.
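To illustrate why keeping the vocabulary in editable XML makes a name-to-structure tool easy to extend, here is a minimal toy sketch in Java (the language OPSIN is written in). It is not OPSIN's code: the class `ToyNameToStructure`, the `<token>` XML layout and the `toSmiles` method are all hypothetical, not OPSIN's API or dictionary schema. It assumes Java 11+ (for `String.repeat`) and understands only unbranched alkanes from methane to hexane, returning SMILES strings, whereas OPSIN handles far richer nomenclature.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical toy, NOT OPSIN's code or XML schema: it only shows the idea of a
// name-to-structure converter whose vocabulary lives in an editable XML dictionary.
public class ToyNameToStructure {

    // Hypothetical dictionary: each <token> maps an alkane stem to a carbon-chain length.
    static final String DICTIONARY_XML =
            "<tokens>"
            + "<token value='meth' length='1'/>"
            + "<token value='eth' length='2'/>"
            + "<token value='prop' length='3'/>"
            + "<token value='but' length='4'/>"
            + "<token value='pent' length='5'/>"
            + "<token value='hex' length='6'/>"
            + "</tokens>";

    private final Map<String, Integer> stemToLength = new HashMap<>();

    // Loads every <token> element from the XML text into the stem lookup table.
    ToyNameToStructure(String dictionaryXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(dictionaryXml.getBytes(StandardCharsets.UTF_8)));
            NodeList tokens = doc.getElementsByTagName("token");
            for (int i = 0; i < tokens.getLength(); i++) {
                Element token = (Element) tokens.item(i);
                stemToLength.put(token.getAttribute("value"),
                        Integer.parseInt(token.getAttribute("length")));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read the fragment dictionary", e);
        }
    }

    // Converts an unbranched alkane name ("<stem>ane") to SMILES; returns null otherwise.
    String toSmiles(String name) {
        String lower = name.trim().toLowerCase(Locale.ROOT);
        if (!lower.endsWith("ane")) {
            return null;
        }
        Integer length = stemToLength.get(lower.substring(0, lower.length() - 3));
        return length == null ? null : "C".repeat(length);
    }

    public static void main(String[] args) {
        ToyNameToStructure converter = new ToyNameToStructure(DICTIONARY_XML);
        for (String name : new String[] {"methane", "propane", "Hexane", "benzene"}) {
            System.out.println(name + " -> " + converter.toSmiles(name));
        }
    }
}
```

Running `main` prints `methane -> C`, `propane -> CCC`, `Hexane -> CCCCCC` and `benzene -> null`. Adding a line such as `<token value='hept' length='7'/>` to the XML is enough for "heptane" to be converted, with no change to the Java code; that separation of data from logic is the property the toy is meant to show.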
 
Future work will concentrate on entity recognition and knowledge abstraction from patents.

 

[1] OSCAR3 project page, SourceForge: http://sourceforge.net/projects/oscar3-chem/

[2] PubChem: http://pubchem.ncbi.nlm.nih.gov/

[3] Corbett, P.; Murray-Rust, P. High-Throughput Identification of Chemistry in Life Science Texts. Lecture Notes in Computer Science 2006, 4216, 107-118.

Summary
Date: 
Jan 2008 - Jan 2011
Research group: 
Murray-Rust group
Members: 
David Jessop
Dr Lezan Hawizy
Mr Daniel Lowe