Project Start
Project Introduction

Work Content: Led the construction of the company's big data platform products, covering data collection, master data management (MDM), the data warehouse, data services, and tags.

Data Collection:

  • Spider: Collected and cleaned external data on medical POIs, hospitals, doctors, medical devices, medical literature, medical conferences, clinical research, National Natural Science Foundation projects, online consultations, patient reviews, etc. Built a system for intelligent extraction of unstructured data from conference posters.
  • Track: Used an open-source SDK for full event tracking, and defined the tracking events and attributes for conferences and activities together with the business side.

Master Data Management (MDM): Established the 100Doc master data management system, mainly supporting manual editing and review of hospitals, doctors, drugs, conference posters, and data dictionaries. Innovatively integrated MDM's manual review and editing mechanism with the DW's DIM layer.


Data Warehouse (DW):

  • Adopted the dimensional modeling paradigm; completed data warehouse domain planning and jointly established the data warehouse layering standards with the technical team.
  • Designed the OneID product logic for "individuals" and "institutions": used graph computation for person ID-Mapping, and business rules plus NLP models to uniquely identify and associate medical institutions.
  • DIM and DMD layers: HCP (doctors, nurses, pharmacists, technicians, sales representatives, patients, social relations), HCO (organizational institutions, medical institutions, pharmaceutical companies, associations), medical knowledge (drugs, diseases, adverse reactions, pharmacological classification, dosage forms), academia (literature, conferences, clinical research, cases), traffic (online consultation, patient evaluation, questionnaire), location (administrative regions, POIs), miscellaneous (miscellaneous dimensions, date).
  • DWS layer: Corporate summaries, scholar summaries, KOL summaries, audience behavior summaries, literature summaries, conference summaries, activity summaries, online consultation summaries.
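
The person ID-Mapping by graph computation mentioned above can be illustrated with a minimal sketch. It assumes that identifiers observed together (for example a phone number and an email on the same record) form edges of a graph, and that each connected component is one person. The identifier formats and the function name build_one_ids are hypothetical, and union-find is used here as one common way to compute connected components; the project's actual algorithm is not specified in this document.

```python
from collections import defaultdict


def build_one_ids(links):
    """Group identifiers that are linked, directly or transitively, into one entity.

    `links` is an iterable of (id_a, id_b) pairs, each meaning "these two
    identifiers were observed on the same person". Returns a dict mapping a
    representative identifier to the set of all identifiers in its group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root so results are deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in links:
        union(a, b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return dict(groups)


# Hypothetical identifier pairs, for illustration only.
links = [
    ("phone:138", "wechat:abc"),
    ("wechat:abc", "email:a@x.org"),
    ("phone:139", "email:b@x.org"),
]
one_ids = build_one_ids(links)
```

In this example the first two pairs share "wechat:abc", so three identifiers collapse into one person, while the third pair forms a second person.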


One Index: Adopted the OSM-UJM methodology combined with the AARRR model; worked with the business side to map out the goals, pain points, measurement systems, and analysis logic of each business process; established an indicator system (58 atomic indicators and 70+ dimensions); and built several indicator summary tables in the DWS layer of the data warehouse.
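
As a rough illustration of how atomic indicators and dimensions combine into summary tables, the sketch below aggregates one atomic indicator over an arbitrary list of dimensions. The indicator name attendee_count, the field names, and the sample rows are all hypothetical; they are not taken from the project's actual indicator system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AtomicIndicator:
    """An atomic indicator: a name plus an aggregation over detail rows."""
    name: str
    aggregate: Callable[[list], float]


def summarize(rows, indicator, dimensions):
    """Group detail rows by the given dimensions and apply the indicator to each group."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        buckets[key].append(row)
    return {key: indicator.aggregate(group) for key, group in buckets.items()}


# Hypothetical atomic indicator: number of distinct doctors attending.
attendee_count = AtomicIndicator(
    "attendee_count",
    lambda rows: len({r["doctor_id"] for r in rows}),
)

# Hypothetical detail rows, for illustration only.
rows = [
    {"doctor_id": "d1", "conference": "c1", "province": "Beijing"},
    {"doctor_id": "d1", "conference": "c1", "province": "Beijing"},
    {"doctor_id": "d2", "conference": "c1", "province": "Shanghai"},
    {"doctor_id": "d3", "conference": "c2", "province": "Beijing"},
]

by_conference = summarize(rows, attendee_count, ["conference"])
by_conference_and_province = summarize(rows, attendee_count, ["conference", "province"])
```

The same atomic indicator is reused across different dimension combinations, which is the property that lets a fixed set of indicators and dimensions cover many summary tables.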


One Service: Provided an ID-Mapping query service for "individuals" and "institutions"; supported combined queries of data warehouse models by theme, avoiding the trap of building a customized interface for each business request.
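
One way to picture theme-based combined queries is a single generic query builder driven by a theme registry, instead of one hand-written interface per request. The sketch below is only an assumption about the shape of such a service: the theme names, the table names dim_hcp and dim_hco, the column names, and the function build_query are all hypothetical.

```python
# Hypothetical registry of queryable themes; table and column names are illustrative.
THEMES = {
    "hcp": {"table": "dim_hcp", "columns": {"hcp_id", "name", "department", "hco_id"}},
    "hco": {"table": "dim_hco", "columns": {"hco_id", "name", "province"}},
}


def build_query(theme, select, filters):
    """Build a parameterized SQL string for one theme.

    Column names are checked against the theme's whitelist; filter values are
    returned separately as parameters rather than interpolated into the SQL.
    """
    spec = THEMES[theme]
    unknown = (set(select) | set(filters)) - spec["columns"]
    if unknown:
        raise ValueError(f"unknown columns for theme {theme!r}: {sorted(unknown)}")

    sql = f"SELECT {', '.join(select)} FROM {spec['table']}"
    params = []
    if filters:
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{column} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query("hcp", ["hcp_id", "name"], {"department": "Cardiology"})
```

Adding a new theme in this sketch means adding a registry entry, not writing a new endpoint, which is the design goal the paragraph above describes.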


Profile & Tag: Built more than 60 HCP-type tags; completed the planning and launch of the tag system, which supports derived tags, combined tags, and population operations. Evaluated doctors' influence on health institutions, academia, patients, and pharmaceutical companies separately. Based on the medical knowledge graph, predicted each doctor's department and mined tags for the diseases, symptoms, drugs, targets, treatment plans, operations, etc. that the doctor focuses on.
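
Combined tags and population operations can be thought of as set algebra over the doctors carrying each tag. The sketch below assumes that reading; the tag names, the doctor IDs, and the small expression format are hypothetical and only illustrate the idea.

```python
# Hypothetical tag -> member mapping, for illustration only.
tag_members = {
    "department:cardiology": {"d1", "d2", "d3"},
    "focus:hypertension": {"d2", "d3", "d4"},
    "influence:high": {"d3", "d5"},
}


def population(expr):
    """Evaluate a nested tag expression into a set of doctor IDs.

    Supported forms:
      ("tag", name)            members of a single tag
      ("and", e1, e2, ...)     intersection
      ("or", e1, e2, ...)      union
      ("minus", e1, e2, ...)   e1 with the members of e2, ... removed
    """
    op, *args = expr
    if op == "tag":
        return set(tag_members.get(args[0], set()))
    sets = [population(a) for a in args]
    if op == "and":
        return set.intersection(*sets)
    if op == "or":
        return set.union(*sets)
    if op == "minus":
        return sets[0].difference(*sets[1:])
    raise ValueError(f"unsupported operator: {op!r}")


# Cardiologists who focus on hypertension, excluding those already tagged high influence.
segment = population(
    ("minus",
     ("and", ("tag", "department:cardiology"), ("tag", "focus:hypertension")),
     ("tag", "influence:high"))
)
```

Here the intersection yields d2 and d3, and removing the high-influence group leaves only d2.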

Overall plan
MDM hospital
MDM doctor
MDM drugs
MDM - Review and extract meeting information from posters
DW subjects plan
OneID
HCO DIM
HCP DIM
Academic DWS
Miscellaneous Data
One Index