A novel method for mining highly imbalanced high-throughput screening data in PubChem.

Authors:
Li Q, Wang Y, Bryant SH
Journal:
Bioinformatics (Oxford, England)

Abstract

MOTIVATION: The comprehensive information on small molecules and their biological activities in PubChem brings great opportunities for academic researchers. However, mining high-throughput screening (HTS) assay data remains a great challenge given the very large data volume and the highly imbalanced nature of the data, with only a small number of active compounds compared to inactive compounds. Therefore, there is currently a need for better strategies to work with HTS assay data. Moreover, as luciferase-based HTS technology is frequently exploited in the assays deposited in PubChem, constructing a computational model to distinguish and filter out potential interference compounds for these assays is another motivation.

RESULTS: We used the granular support vector machines (SVMs) repetitive under-sampling method (GSVM-RU) to construct an SVM from luciferase inhibition bioassay data in which the active/inactive imbalance ratio is high (1/377). The best model recognized the active and inactive compounds with accuracies of 86.60% and 88.89%, respectively, and a total accuracy of 87.74%, by cross-validation test and blind test. These results demonstrate the robustness of the model in handling the intrinsic imbalance problem in HTS data, and the model can be used as a virtual screening tool to identify potential interference compounds in luciferase-based HTS experiments. Additionally, this method proved computationally efficient, greatly reducing the computational cost, and can be easily adopted in the analysis of HTS data for other biological systems.

AVAILABILITY: Data are publicly available in PubChem with AIDs of 773, 1006 and 1379.

CONTACT: ywang@ncbi.nlm.nih.gov; bryant@ncbi.nlm.nih.gov

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
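The 1/377 active/inactive ratio explains why the abstract reports accuracy per class and not only in total. The short Python sketch below is an illustration and is not taken from the paper. It shows that a trivial model labeling every compound inactive would already reach about 99.7% overall accuracy at that ratio while recognizing no actives. The function names and the confusion-matrix counts are hypothetical and chosen only to match the 1/377 ratio; they are not the paper's data.

```python
def all_inactive_accuracy(actives, inactives):
    """Overall accuracy of a trivial model that labels every compound inactive."""
    return inactives / (actives + inactives)


def class_accuracies(tp, fn, tn, fp):
    """Return (active accuracy, inactive accuracy, overall accuracy)."""
    active_acc = tp / (tp + fn)        # fraction of actives recognized
    inactive_acc = tn / (tn + fp)      # fraction of inactives recognized
    overall = (tp + tn) / (tp + fn + tn + fp)
    return active_acc, inactive_acc, overall


# Imbalance ratio quoted in the abstract: 1 active per 377 inactives.
baseline = all_inactive_accuracy(1, 377)
print(f"all-inactive baseline accuracy: {baseline:.4f}")

# Hypothetical counts (NOT from the paper): 100 actives, 37,700 inactives.
active_acc, inactive_acc, overall = class_accuracies(tp=87, fn=13, tn=33553, fp=4147)
print(f"active: {active_acc:.4f}  inactive: {inactive_acc:.4f}  overall: {overall:.4f}")
```

With these hypothetical counts the useful model scores lower on overall accuracy than the trivial baseline, even though only the useful model finds any actives. This is why per-class accuracies are the more informative figures for data this imbalanced.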
