Email: kevinopix@gmail.com
Phone: (507) 405-8760
Location: Rochester, MN, USA
Master of Science (MSc) in Data Science
Bachelor of Science (BSc) in Data Science
Conducted a comprehensive clinical study on elderly outpatient polypharmacy trends, using Python (PySpark) to analyze prevalence and adverse outcomes, perform survival analysis, and investigate correlations with patient demographics.
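A minimal sketch of the kind of PySpark-plus-survival-analysis workflow this study describes. The table schema, the >= 5-medication polypharmacy threshold, and the use of lifelines for the Kaplan-Meier fit are illustrative assumptions, not the study's exact methodology.

```python
from pyspark.sql import SparkSession, functions as F
from lifelines import KaplanMeierFitter

spark = SparkSession.builder.appName("polypharmacy").getOrCreate()

# Hypothetical per-patient summary: medication count, follow-up days,
# and whether an adverse event was observed during follow-up.
rx = spark.createDataFrame(
    [(1, 7, 320, 1), (2, 3, 365, 0), (3, 9, 120, 1), (4, 5, 365, 0)],
    ["patient_id", "drug_count", "follow_up_days", "adverse_event"],
)

# Flag polypharmacy (commonly defined as >= 5 concurrent medications).
rx = rx.withColumn("polypharmacy", (F.col("drug_count") >= 5).cast("int"))
prevalence = rx.agg(F.avg("polypharmacy").alias("prevalence")).first()
print(f"Polypharmacy prevalence: {prevalence['prevalence']:.1%}")

# Kaplan-Meier curve for time to first adverse event.
pdf = rx.toPandas()
km = KaplanMeierFitter()
km.fit(durations=pdf["follow_up_days"], event_observed=pdf["adverse_event"])
print(km.survival_function_)
```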
Languages: Python, R, SQL, HTML, CSS, JavaScript
Frameworks/Libraries: Django, PySpark, TensorFlow, Keras, pandas, NumPy, scikit-learn, Matplotlib
Medical Data: IQVIA PharMetrics, MarketScan, MIMIC-IV, EHR Systems, Claims Data, Cohort Creation, Survival Analysis, Medication Adherence
Tools: Power BI, Excel, Tableau, Spotfire, cron, Docker, CMD, GitHub
Databases: MS SQL Server, Snowflake, PostgreSQL, SQLite, MongoDB, Amazon Redshift
Platforms: JupyterHub, PyCharm, Visual Studio Code, Google Cloud Platform (GCP), AWS, JIRA, Confluence, Databricks
Soft Skills: Data-Driven Decision-Making, Stakeholder Engagement, Cross-Functional Collaboration, Communication, Process Documentation
Validated Clinical Study Report (CSR) numbers and plots derived from IQVIA PharMetrics and MarketScan datasets, ensuring accuracy and compliance with pharmaceutical standards. Developed SQL and Python scripts for data extraction, cohort creation, and advanced analytics to support pharmaceutical research. Leveraged PySpark, scikit-learn, and JupyterHub to build machine learning models predicting medication adherence and persistence. Used Databricks and Spotfire for scheduled job creation, monitoring, and troubleshooting, ensuring seamless platform transitions. Employed SAS programming to streamline workflows, maintain data integrity, and analyze critical clinical metrics. Key projects included validating de-identified machine learning models against adverse event and laboratory data provided in SAS format, aligned with the CDISC SDTM framework, and studying geographic atrophy (GA) and falls among the elderly to uncover patterns that support improved healthcare outcomes and preventative measures.
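A hedged sketch of an adherence-modeling step like the one described: compute proportion of days covered (PDC) per patient from claims rows, then fit a scikit-learn classifier. The column names, the 180-day window, and the 0.8 PDC cutoff are illustrative assumptions, not the production logic.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical claims extract: one row per pharmacy fill.
claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3],
    "days_supply": [30, 30, 90, 90, 15],
    "age": [72, 72, 68, 68, 81],
})

# PDC over a 180-day window: covered days / window length, capped at 1.0.
pdc = (
    claims.groupby(["patient_id", "age"])["days_supply"].sum()
    .div(180).clip(upper=1.0).rename("pdc").reset_index()
)
pdc["adherent"] = (pdc["pdc"] >= 0.8).astype(int)  # common adherence cutoff

# Toy model predicting adherence from demographics.
X, y = pdc[["age"]], pdc["adherent"]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.predict(X))
```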
Created Master Data Models (MDMs) in Snowflake by integrating customer data from varied sources and formats with Azure Data Factory, enabling centralized, reliable data access; this involved aggregating and standardizing datasets to ensure consistency and usability. Developed PySpark and Snowflake scripts to create and manage scheduled jobs, streamlining data processing and integration; the jobs were monitored through Django applications backed by PostgreSQL databases, allowing seamless tracking and troubleshooting. Validated migrated data by designing Power BI visuals that compared datasets between the original SQL Server database and Snowflake, ensuring data accuracy and integrity; these visuals provided actionable insights that informed key business decisions and enhanced reporting capabilities. Also consolidated JIRA worklog data for improved tracking and reporting, and documented processes for the Product Operations Team in Confluence, fostering collaboration and operational effectiveness.
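A minimal sketch of the migration check behind those Power BI comparison visuals: reconcile row counts and numeric checksums between a SQL Server extract and its Snowflake counterpart. The two DataFrames stand in for query results; connection details and table names are omitted as assumptions.

```python
import pandas as pd

# Stand-ins for the same table queried from each database.
sqlserver = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 20.0, 5.0]})
snowflake = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 20.0, 5.0]})

def reconcile(src: pd.DataFrame, dst: pd.DataFrame) -> dict:
    """Compare row counts and per-column sums of numeric fields."""
    return {
        "row_count_match": len(src) == len(dst),
        "checksum_match": src.select_dtypes("number").sum().equals(
            dst.select_dtypes("number").sum()
        ),
    }

print(reconcile(sqlserver, snowflake))
# {'row_count_match': True, 'checksum_match': True}
```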
Developed dynamic, interactive Power BI dashboards to deliver actionable insights, empowering stakeholders to make data-informed decisions. Proactively monitored and analyzed key performance indicators (KPIs) in Power BI, ensuring alignment with organizational objectives and enabling timely interventions. Spearheaded the implementation of robust data governance processes and strategies, strengthening data integrity and fostering team collaboration. Documented and streamlined data management workflows in Confluence, ensuring process transparency, scalability, and accessibility while driving performance improvements through data-driven initiatives.
Designed and executed data modeling processes that improved data compartmentalization and efficiency. Developed a Django application, Switchboard, that leveraged PySpark to aggregate data from multiple sources, with cron jobs automating ingestion of new data at the end of each day. The aggregated data fed Power BI, enabling comprehensive reporting and dynamic visualizations. Migrated data aggregates from SQL Server to PostgreSQL, significantly improving database performance. Also created interactive visualizations with Django and JupyterHub, simplifying data interpretation for business teams, and documented all data activities to ensure transparency and maintain robust process records.
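A hedged sketch of the daily-ingestion pattern described for Switchboard: a PySpark job that appends the previous day's records into an aggregate store, scheduled via cron. The paths, file format, and crontab timing are illustrative assumptions.

```python
# Example crontab entry (run at 23:55 every day):
#   55 23 * * * /usr/bin/python3 /opt/switchboard/daily_ingest.py
from datetime import date, timedelta
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("switchboard-ingest").getOrCreate()

yesterday = (date.today() - timedelta(days=1)).isoformat()

# Read the day's raw drop and stamp it with its ingestion date.
daily = (
    spark.read.option("header", True)
    .csv(f"/data/raw/events_{yesterday}.csv")  # hypothetical landing path
    .withColumn("ingest_date", F.lit(yesterday))
)

# Append into the aggregate store that downstream reporting reads from.
daily.write.mode("append").parquet("/data/aggregates/events")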
Feel free to reach out anytime via email.