docs.dongwkim 2026

# 0\. Inbox ### 2026-02-09 Sent payment for medical polish exam. Sent application for medical polish exam in March or April. Completed search strategy, protocol update and performed search for six graft types on PubMed. ### 2026-02-10 # Capture # Daily Notes # 1\. Projects # Start Business # Business Plan # Business Funding 1. Biz2Credit (pre-approved for $5,000) 1. Send last 4 months of balance statement Send official PDF of PLN Send unofficial CSV with converted PLN to USD Send official QuickBooks with converted PLN to USD 2. Business Tax Return (non-filing since educational income is non-taxable) 2. Finance Logic (pre-approved for $15,000) 1. Send last 4 months of balance statement Send official PDF of PLN Send unofficial CSV with converted PLN to USD Send official QuickBooks with converted PLN to USD 3. Credit One (pre-approved for $3,000) 1. Reapply to send mail again to iPostal1 2. Social Security Card to verify identity for Credit One \+1 800-772-1213 call between 14:00 and 1:00 to change phone number, address, and ask about SSN card to complete Credit One pre-approval for $3,000. ### Start-Up Costs Strategy Use business credit to purchase office supplies (e.g., Macbook Pro) that have profit margins between B2B (tax-free) U.S. purchases and Poland sales using a sole proprietorship (+23% VAT). Then calculate the number of items you need to sell in order to have one for free. **Competitive advantage** is AppleCare+ included with purchase that is not available for non-US sellers in Poland. **Example 1** Macbook Pro in U.S. 12,000zł and in Poland 18,800zł (6,800zł profit margin means buy 3 in U.S., export to Poland using Polonez and sell 2 using a sole proprietorship for Poland or Estonia. Use starting tax deduction program to take advantage of VAT and duty-free import for laptops. 
36,000zł purchase 58,400zł sales 15,000 **2024** 33490 purchase 3x inventory 37998 sales 2x inventory in Poland 4508 zł profit margin \+ free Macbook Pro **2026** (sell multiples to maximize profit margin) 6x \= 4508 x 3 \+ free macbook, etc. # Addresses **Primary credit bureau** (source of address verification) Chase Bank \- Experian | | ChexSystem | Experian | TransUnion | Equifax | | :----------- | :--------- | :----------------------------------- | :--------- | :------ | | Name | | Dong Woon Kim | | | | Address | | 25 CRESTWOOD LN Ronkonkoma, NY 11779 | | | | Phone number | | (631) 316-8517 | | | | Email | | [email protected] | | | # Letter of Recommendation Konrad Malinowski, MD PhD Hey Konrad, here is a bullet point list of things I’d like for you to touch on for the letter of recommendation that is required for some job applications, etc. I also added screenshots to remind and help you with the letter. * Finishing projects in a very fast time period * Meniscus project (Feb 23 → Mar 3\) * # Bayesian Network MA | | Systematic Literature Review | Date | | :--- | :----------------------------------------------------------------------------- | :--- | | DONE | Protocol (CRD42024592549) | | | DONE | Search strategy | | | | Literature search | | | DONE | PubMed | | | | Embase | | | | Web of Science Core Collection | | | | Scopus | | | | Deduplication | | | | | | | | | | | | Screening | | | | | | | | | | | | Data collection | | | | | | | | Data entry form | | | | Outcomes: | | | | Patient-reported outcome measures: IKDC-SKF, Lysholm, Tegner. | | | | Objective knee laxity: KT-1000/2000 instrumental laxity, pivot shift, Lachman. 
| | | | | | | | | | | | Database creation | | | | Data extraction | | | | Data transformation | | | | Risk of bias assessment | | | | | | | | | | | | Analyses | | | | | | | | Meta-analysis | | | | Continuous data | | | | Dichotomous data | | | | Meta-regression | | | | Plots | | | | Forest plots | | | | Lattice plots | | | | Regression plots | | | | Funnel plot | | | | | | | | | | | | Manuscript | | | | | | | | Tables and figures | | | | Methods | | | | Results | | | | Discussion | | | | Introduction | | | | Abstract | | ## ## Search strategy The search strategy was developed for all graft types separately, as well as combined (too many results). The systematic reviews and meta-analysis papers were excluded using exclusion Boolean operators. Databases searched include: * PubMed * Embase * Web of Science Core Collection * Scopus And as supplementary searches Google Scholar was used, and references. ("anterior cruciate ligament" OR "anterior cruciate ligament reconstruction" OR "ACL" OR "ACLR") AND ("peroneus longus" OR "PLT") AND ("Systematic OR "Meta-Analysis"\[pt\] OR “Review”\[pt\]) | | Bone-patellar tendon-bone | Semitendinosus-Gracilis | Peroneus Longus | Quadriceps | | :---- | :---- | :---- | :---- | :---- | | Randomized controlled trials | | | | | | Observational studies | | | | | | Systematic Reviews | (“anterior cruciate ligament”\[MeSH\] OR ACL\[tiab\] OR ACLR\[tiab\]) AND (“bone-patellar tendon-bone”\[tiab\] OR “BPTB”\[tiab\]) AND ((“systematic review”\[pt\] OR “Review”\[pt\]) OR “meta-analysis”\[pt\]) | (“anterior cruciate ligament”\[MeSH\] OR ACL\[tiab\] OR ACLR\[tiab\]) AND (“hamstring\*”\[tiab\] OR “semitendinosus"\[tiab\] OR "gracilis”\[tiab\]) AND ((“systematic review”\[pt\] OR “Review”\[pt\]) OR “meta-analysis”\[pt\]) | ("anterior cruciate ligament" OR "anterior cruciate ligament reconstruction" OR "ACL" OR "ACLR") AND ("peroneus longus" OR "PLT") AND ("Systematic OR "Meta-Analysis"\[pt\] OR “Review”\[pt\]) | | ### ### PubMed (“anterior cruciate 
ligament”\[MeSH\] OR ACL\[tiab\] OR ACLR\[tiab\]) AND ((“bone-patellar tendon-bone”\[tiab\] OR BPTB\[tiab\]) OR (“hamstring\*”\[tiab\] OR “semitendinosus"\[tiab\] OR "gracilis”\[tiab\]) OR (“quadriceps”\[tiab\] OR “S-QT”\[tiab\] OR “B-QT”\[tiab\])) NOT ((“systematic review”\[pt\] OR “Review”\[pt\]) OR “meta-analysis”\[pt\]) 1. ‘anterior cruciate ligament\*’\[MeSH\] 2. ACL\*\[tiab\] 3. 1 or 2 4. ‘bone-patellar tendon-bone’\[tiab\] 5. BPTB\[tiab\] 6. 4 or 5 7. ‘hamstring\*’\[tiab\] 8. ‘semitendinosus-gracilis’\[tiab\] 9. 7 or 8 10. ‘quadriceps’\[tiab\] 11. ‘S-QT’\[tiab\] 12. ‘B-QT’\[tiab\] 13. 10 or 11 or 12 14. 6 or 9 or 13 15. ‘systematic review’\[pt\] 16. ‘meta-analysis’\[pt\] 17. 15 or 16 18. 3 and 14 and 17 ### Web of Science Core Collection ("anterior cruciate ligament" OR ACL\*) AND (("bone-patellar tendon-bone" OR BPTB) OR (hamstring\* OR semitendinosus OR gracilis) OR (quadriceps OR S-QT OR B-QT)) NOT (("systematic review" OR Review) OR meta-analysis) ### Scopus ("anterior cruciate ligament" OR ACL\*) AND (("bone-patellar tendon-bone" OR BPTB) OR (hamstring\* OR semitendinosus OR gracilis) OR (quadriceps OR S-QT OR B-QT)) NOT (("systematic review" OR Review) OR meta-analysis) ### Embase ('anterior cruciate ligament'/exp OR ACL\*:ti,ab) AND (('bone-patellar tendon-bone':ti,ab OR BPTB:ti,ab) OR (hamstring\*:ti,ab OR semitendinosus:ti,ab OR gracilis:ti,ab) OR (quadriceps:ti,ab OR S-QT:ti,ab OR B-QT:ti,ab)) NOT ((term:it OR term:it) OR term:it) ## Literature search ### PubMed ### Embase ## Deduplication Deduplication was performed using a python script ‘deduplication.py’[^1] # Documentation Template # Bayesian Network Meta-Analysis # Documentation # Literature search ## Protocol ## Search strategy ## Search ## Deduplication ## Screening ## # Data collection ## Data entry form ## Database creation ## Data extraction ## Data transformation ## Risk of bias assessment ## # Analyses ## Meta-analysis ## Continuous data ## Dichotomous data ## Meta-regression ## Plots 
## Forest plots ## Lattice plots ## Regression plots ## Funnel plot ## # Manuscript ## Tables and figures ## Methods ## Results ## Discussion ## Introduction ## Abstract # 2\. Areas # Masters in Computer Science 1. # Masters in Data Science | DSC 520 Statistics for Data Science | | | Due Date | DSC 530 Data Exploration and Analysis | | | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Week 1 | Formal Discussion Post (1) | ☒ | 2025-12-08 06:59 CET | ☒☒ | Initial Post (2) | Weeks 1 & 2 | | | Informal Discussion Posts (2) | ☒☒ | | | | | | | Replies, comments, other posts (7) | ☒☒☒☒☒☒☒ | | ☒☒☒☒ | Discussion Responses (4) | | | | R Exercise | ☒ | | | | | | Week 2 | Formal Discussion Post (1) | ☒ | 2025-12-15 06:59 CET | ☒☒☒☒☒☒☒☒☒☒☒☒☒☒ | Informal Posts (14) | | | | Informal Discussion Posts (2) | ☒☒ | | | | | | | Replies, comments, other posts (7) | ☒☒☒☒☒☒☒ | | ☒ | Coding Assignment | | | | R Exercise | ☒ | | | | | | Week 3 | Formal Discussion Post (1) | ☒ | 2026-01-12 06:59 CET | ☒☒ | Initial Post (2) | Weeks 3 & 4 | | | Informal Discussion Posts (2) | ☒☒ | | | | | | | Replies, comments, other posts (7) | ☒☒☒☒☒☒☒ | | ☒☒☒☒ | Discussion Responses (4) | | | | R Exercise | ☒ | | | | | | Week 4 | Formal Discussion Post (1) | ☒ | 2026-01-19 06:59 CET | ☒☒☒☒☒☒☒☒☒☒☒☒☒☒ | Informal Posts (14) | | | | Informal Discussion Posts (2) | ☒☒ | | | | | | | Replies, comments, other posts (7) | ☒☒☒☒☒☒☒ | | ☒ | Coding Assignment | | | | R Exercise | ☒ | | | | | | Week 5 | Formal Discussion Post (1) | ☒ | 2026-02-02 06:59 CET | ☒☒ | Formal Discussion Post (2) | Weeks 5 & 6 (Due February 2\) | | | Informal Discussion Posts (2) | ☒☒ | | | | | | | Replies, comments, other posts (7) | ☒☒☒☒☒☒☒ | | ☒☒☒☒ | Informal Discussion Responses (4) | | | | R Exercise | ☒ | | | | | | Week 6 | Formal Discussion Post (1) | ☒ | | ☒☒☒☒☒☒☒☒☒☒☒☒☒☒ | Replies, comments, other posts. 
(14) | | | | Informal Discussion Posts (2) | ☒☒ | | | | | | | Replies, comments, other posts (7) | ☒☒☒☒☒☒☒ | | ☒ | Coding Assignment | | | | R Exercise | ☒ | | | | | | Week 7 | Formal Discussion Post (1) | ☒ | 2026-02-09 06:59 CET | ☐☐ | Initial Post (2) | Weeks 7 & 8 (Due February 16\) | | | Informal Discussion Posts (2) | ☒☒ | | | | | | | Replies, comments, other posts (7) | ☒☒☒☒☒☒☒ | | ☐☐☐☐ | Discussion Responses (4) | | | | R Exercise | ☒ | | | | | | Week 8 | Formal Discussion Post (1) | ☒ | 2026-02-16 06:59 CET | ☐☐☐☐☐☐☐☐☐☐☐☐☐☐ | Informal Posts (14) | | | | Informal Discussion Posts (2) | ☐☐ | | | | | | | Replies, comments, other posts (7) | ☐☐☐☐☐☐☐ | | ☒ | Coding Assignment | | | | R Exercise | ☒ | | | | | | Week 9 | Formal Discussion Post (1) | ☒ | 2026-02-23 06:59 CET | ☐☐ | Initial Post (2) | Weeks 9 & 10 | | | Informal Discussion Posts (2) | ☐☐ | | | | | | | Replies, comments, other posts (7) | ☐☐☐☐☐☐☐ | | ☐☐☐☐ | Discussion Responses (4) | | | | R Exercise | ☒ | | | | | | Week 10 | Formal Discussion Post (1) | ☒ | 2026-03-02 06:59 CET | ☐☐☐☐☐☐☐☐☐☐☐☐☐☐ | Informal Posts (14) | | | | Informal Discussion Posts (2) | ☐☐ | | | | | | | Replies, comments, other posts (7) | ☐☐☐☐☐☐☐ | | ☒ | Coding Assignment | | | | R Exercise | ☒ | | | | | | Week 11 | Formal Discussion Post (1) | ☒ | 2026-03-09 06:59 CET | ☐ | Initial Post (1) | Week 11 | | | Informal Discussion Posts (2) | ☐☐ | | ☐☐☐ | Discussion Responses (2) | | | | Replies, comments, other posts (7) | ☐☐☐☐☐☐☐ | | ☐☐☐☐☐☐☐ | Informal Posts (7) | | | | Written Assignment | ☐ | | ☐ | Final Project | | | **Current Grade** | | B+ | | | | B | **DONE** Assignments except for week 11 coding assignment for DSC 520 and final project for DSC 530\. **TODO** Discussion posts for DSC 520 and DSC 530\. Finish all formal posts (essays) ahead of time. 
**Template headers** Dong Woon Kim DSC 520 Statistics for Data Science Bellevue University Masters in Data Science Dong Woon Kim DSC 530 Data Exploration and Analysis Bellevue University Masters in Data Science # Formal Discussion Post Topics 250-500 words 2 APA-style references # DSC 520 Statistics for Data Science ## Week 4 (Due 2026-01-19 DONE) ## Week 5 (Due 2026-01-26 DONE) Compare and contrast probability and proportion. Do we really need probability? Defend your answer. What are the requirements for computing probabilities from empirical data? Describe the cumulative distribution function in your own words. What is expected value? # Week 6 (Due 2026-02-02 DONE) In your own words, what is sampling variability? Is sampling reliable? Why or why not? What is the standard error of the mean? Is random sampling a good way to ensure representative sampling? Why or why not? What is the Law of Large Numbers and why does it matter? What is the Central Limit Theorem and what are the implications of it? What is the Law of Large Numbers? What is the Central Limit Theorem? How do p-values help in hypothesis testing? How do you spot a strong versus weak hypothesis? Provide one or more examples. Compare dependent versus independent variables. How do these terms relate to supervised learning ML models? Define and compare Type-I and Type-II errors. ## ## Week 7 In your own words, describe what t-testing is. What is effect size and the measures of it? Compare the Wilcoxon signed-rank test, Mann-Whitney U test, and t-testing. What is permutation testing? ## Week 8 Describe correlation and explain why it matters. What is the correlation coefficient? What’s the relationship between covariance and correlation? What’s a correlation matrix? Provide at least one example as part of your explanation. What are the assumptions of correlation? What is statistical significance? Why would you think something like Cosine similarity is important in testing generative AI systems? 
Contrast confidence intervals and standard deviation. What are the assumptions of analytical confidence intervals? What is bootstrapping? Provide more than one example. What’s the relationship between confidence intervals and hypothesis testing? ## Week 9 What is ANOVA, and when should it be used? What are the assumptions of ANOVA? Describe the sum of squares. How are mean square and the f-stat used with ANOVA? In your own words, describe post-hoc comparisons. How is ANOVA evaluated? Compare ANOVA to rmANOVA. ## Week 10 What is the intuition behind linear regression? Compare regression and GLM. Compare regression and ANOVA. In your own words, describe regression terminology and notation as described in the text. How are linear regression models evaluated? ## Week 11 Summarize the “Science vs. the real world” section of the text in your own words. Distinguish between unintentional and intentional bias using examples. Describe the sources of bias. Describe researcher overfitting. What are some of the culture and tradition biases? Use examples for each. # DSC 530 Data Exploration and Analysis ## Weeks 3 & 4 (Due 2026-01-19 DONE) How do you decide whether to use the mean, trimmed mean, or weighted mean to measure the central tendency of a continuous variable? What indicators help distinguish whether a numeric value represents a continuous scale or a nominal scale? What criteria should you use to select the appropriate distribution for modeling variables in a dataset? What distinguishes a bar chart and a histogram, and how do you choose the appropriate one for visualizing your data? What are the most frequently used test statistics and how do you determine which one is suitable for your analysis? What is the significance of outliers in descriptive statistics, and how should they be handled in your analysis? How can measures of variability, such as range, interquartile range, and standard deviation complement the interpretation of central tendency? 
In what scenarios would one use a box plot to visualize a variable? \[x\] How can you assess whether a variable's distribution is symmetric or skewed, and why does this matter? What role does sample size play in the reliability of descriptive statistics, and how can it influence the interpretation of results?

## Weeks 5 & 6

## Weeks 7 & 8

## Weeks 9 & 10

# DSC 520

# Week 5

[What is expected value?](#what-is-expected-value?)

[Compare and contrast probability and proportion.](#heading=h.waqr9kxy09x9)

Dong Woon Kim DSC 520 Statistics for Data Science Bellevue University Masters in Data Science Week 5 Discussion/Participation Formal Post

# What is expected value? {#what-is-expected-value?}

The expected value, E(X), is the probability-weighted mean of all possible outcomes of a variable: the sum of every possible outcome multiplied by the probability of that outcome (1). In real life, the expected value of multiple events can be used to determine the investment value of taking a risk [(“Expected Value in Statistics,” n.d.)](https://www.zotero.org/google-docs/?FPdzMQ).

$$E(X) = \sum_{i=1}^{n} x_i P(x_i) = x_1 P(x_1) + \dots + x_n P(x_n) \tag{1}$$

For example, to decide whether to buy a lottery ticket, the difference between the expected values of profit and loss is used: the risk is worth taking only if that difference is above $0 (2).

$$E(X) = E(a) - E(b) \tag{2}$$

where a and b represent profit and loss, respectively.

Given a $2 lottery ticket that offers a grand prize of $200,000; 5 second-place prizes that each pay $1,000; 15 third-place prizes that each pay $300; and 25 fourth-place prizes that each pay $10, find the expected value of entering this contest if 5 million tickets are sold:

$$E(X) = \sum_{i=1}^{n} a_i P(x_i) - b\,\bigl(1 - P(x)\bigr) = \left(\$200{,}000 \cdot \frac{1}{5{,}000{,}000} + \$1{,}000 \cdot \frac{5}{5{,}000{,}000} + \$300 \cdot \frac{15}{5{,}000{,}000} + \$10 \cdot \frac{25}{5{,}000{,}000}\right) - \$2 \cdot \frac{4{,}999{,}954}{5{,}000{,}000}$$

Here the a_i are the prize amounts, b is the ticket price, and P(x) is the probability of winning any prize (46 winning tickets out of 5,000,000). This gives an expected value of about −$1.96. Since it is below $0, buying a ticket would be an unwise investment, as it results in a mean loss, not a profit.
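The lottery calculation above can be checked with a short script. This is only a sketch: the function name `expected_value` and the constants are illustrative, the prize counts are taken from the example as stated, and the number of losing tickets is derived from those counts. Exact fractions are used to avoid rounding until the final step.

```python
from fractions import Fraction

TICKETS_SOLD = 5_000_000
TICKET_PRICE = 2  # dollars
# (prize in dollars, number of such prizes), as stated in the example
PRIZES = [(200_000, 1), (1_000, 5), (300, 15), (10, 25)]


def expected_value(prizes, tickets_sold, ticket_price):
    """Expected winnings minus expected loss, following the formula in the post."""
    winners = sum(count for _, count in prizes)
    # Sum of a_i * P(x_i): each prize amount weighted by its probability
    expected_winnings = sum(
        Fraction(amount * count, tickets_sold) for amount, count in prizes
    )
    # b * (1 - P(x)): ticket price weighted by the probability of winning nothing
    expected_loss = ticket_price * Fraction(tickets_sold - winners, tickets_sold)
    return expected_winnings - expected_loss


ev = expected_value(PRIZES, TICKETS_SOLD, TICKET_PRICE)
print(f"Expected value per ticket: {float(ev):.2f} dollars")
```

Changing the entries in `PRIZES` or the ticket price shows how quickly the expected value moves toward or away from $0.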
## References

[Expected Value in Statistics: Definition and Calculations. (n.d.). *Statistics How To*. Retrieved January 26, 2026, from https://www.statisticshowto.com/probability-and-statistics/expected-value/](https://www.zotero.org/google-docs/?cJOWSj)

## Compare the Wilcoxon signed-rank test, Mann-Whitney U test, and t-testing.

Dong Woon Kim DSC 520 Statistics for Data Science Bellevue University Masters in Data Science Week 7 Informal Discussion Post \#2

**Compare the Wilcoxon signed-rank test, Mann-Whitney U test, and t-testing.**

The Wilcoxon signed-rank and Mann-Whitney U tests are both non-parametric statistical tests for data that do not follow a normal distribution [(*Nonparametric Test \- an Overview | ScienceDirect Topics*, n.d.)](https://www.zotero.org/google-docs/?MPTbYE). Both tests work on ranks rather than raw values: the Mann-Whitney U test ranks the pooled observations, and the Wilcoxon signed-rank test ranks the absolute paired differences, starting from 1 for the smallest value. Tied values share the average of the ranks they occupy. For example, a dataset with the values 10, 20, 20, 30, 50 is given the ranks 1, 2.5, 2.5, 4, and 5. The Wilcoxon signed-rank test is used for paired samples and the Mann-Whitney U test (also called the Wilcoxon rank-sum test) for unpaired samples; in both cases the data are typically summarized with the median and interquartile range. T-testing is a parametric statistical test for data that *do* follow a normal distribution. Typically, such a dataset is described by its mean as the measure of central tendency and by its standard deviation or standard error as the measure of dispersion.

**References**

[*Nonparametric Test \- An overview | ScienceDirect Topics*. (n.d.). 
Retrieved February 8, 2026, from https://www.sciencedirect.com/topics/medicine-and-dentistry/nonparametric-test](https://www.zotero.org/google-docs/?cJOWSj) # DSC 530 # **Weeks 5 & 6 Data Aggregation, Exploration, and Test Statistics** ## Key Topics This is a list of statistics concepts covered in this week’s reading: * Mean * Median * Z-scores * Min, Max, Range * Percentage Change * Rank * Resampling * p-value * Student’s t test/distribution * F-test/distribution * Chi-square test/distribution * ANOVA ## Required Readings The readings for Weeks 5 and 6 focus on applying data aggregation techniques while setting up the variables to conduct calculations or apply existing descriptive statistics functions. You will utilize the two textbooks and see examples of data aggregation and further your knowledge of statistical experiments and significance testing. The documentation of the Python packages are listed under supplemental reading for your reference. Be sure to activate the book\_env virtual environment in your computer and then call up either JupyterLab or Jupyter Notebook to review the chapter Python codes during your reading. 
* *Practical Statistics for Data Scientists* (Bruce, Bruce, & Gedeck, 2020\) Chapter 3: Statistical Experiments and Significance Testing (All sections) * *Hands-On Data Analysis with Pandas* (Molin, 2021\) Chapter 4: Aggregating Pandas DataFrames (all sections except Time Series) * [Weeks 5 and 6 Supplemental Notes from Instructor](https://cyberactive.bellevue.edu/bbcswebdav/pid-17477932-dt-content-rid-108406390_4/xid-108406390_4) ## Supplemental Readings * [Statsmodels User Guide](https://www.statsmodels.org/stable/user-guide.html) (statsmodels, 2024\) * [Python \- Statistics — Mathematical statistics functions](https://docs.python.org/3/library/statistics.html) (Python, 2024\) * [SciPy Statistics](https://docs.scipy.org/doc/scipy/tutorial/stats.html) (SciPy, 2024\) ## Discussion/Participation Topics DONE \- 3 formal posts DONE \- 3 formal posts (2026-02-01) TODO \- 14 replies, comments, etc. by 2026-02-02 ### Data Aggregation 1. How do different levels of data aggregation (e.g., daily vs. monthly averages) impact the accuracy and interpretability of results? In what situations might aggregation obscure important patterns in the data? 2. When should you use measures like sum, mean, median, or maximum for aggregating data? 3. How do the choice of metric and the nature of the dataset influence the insights generated? 4. What are the best practices for dealing with missing data during aggregation? 5. What are the challenges and considerations of aggregating data over time (e.g., time-series data) and across spatial dimensions (e.g., geographic regions)? ### Test statistics 6. Why is it essential to choose the appropriate test statistic for a given hypothesis test, and what are the potential consequences of using the wrong one? 7. How does the concept of p-value influence decision-making in hypothesis testing, and what are its limitations? 8. 
How do sample size and variability affect the power of a t-test, and what steps can you take to address these challenges in an experiment? 9. What does the F-statistic tell us about the variances between groups, and how is it used in ANOVA to determine statistical significance? 10. How would you interpret the results of a chi-square test if the expected frequencies in some cells are very low, and what alternative methods could be used in such cases? ## When should you use measures like sum, mean, median, or maximum for aggregating data? Dong Woon Kim DSC 530 Data Exploration and Analysis Bellevue University Masters in Data Science Weeks 5 & 6 Formal Discussion Post \#2 You should use specific data aggregation measures to summarize datasets depending on the nature of the dataset itself. For example, most datasets can be classified as one of two data types: continuous or dichotomous data. Others include ordinal data, time-series, and others, but for the most part, data either exist on a continuous numerical scale (e.g., visual analogue scale (VAS) for pain scaled from 0 to 10\) or as discrete or categorical data that can be identified as either identical or different from one another by placing groups of data into 'boxes' (e.g., grade I, II, or III sprains, or mild, moderate, severe for disease severity based on characteristics of disease on patients). For continuous data, the mean or median is used to aggregate data depending on the normality and homogeneity of variances. The most important elements to consider are whether the data within a dataset resemble a normal distribution or not, and whether the dispersion of data variability is similar between the data included. In frequentist statistics, parametric datasets are summarized using means and standard deviation, error, and variances to derive confidence intervals while non-parametric datasets are summarized using median and interquartile ranges to derive confidence intervals (Molin, 2019). 
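As a small illustration of the mean-versus-median choice for continuous data described above, the sketch below summarizes two made-up sets of VAS pain scores with Python's standard `statistics` module. The scores and the helper name `summarize` are hypothetical and serve only to show how a single extreme value pulls the mean away from the median.

```python
import statistics

# Hypothetical VAS pain scores (0-10 scale); values are illustrative only
symmetric_scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]
skewed_scores = [1, 1, 1, 2, 2, 2, 3, 3, 10]  # one extreme value


def summarize(values):
    """Return parametric (mean, SD) and non-parametric (median, IQR) summaries."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


for label, scores in [("symmetric", symmetric_scores), ("skewed", skewed_scores)]:
    print(label, summarize(scores))
```

For the symmetric scores the mean and median coincide, so either summary works; for the skewed scores the mean is pulled upward by the single value of 10 while the median is not.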
Bayesian statistics combine old (prior) and new data to summarize results as probabilities with credible intervals.

For dichotomous data, sum and maximum (or minimum) are used to aggregate the datasets, because what matters is the number of events, occurrences, etc. that fit the 'bin' of a given category. The goal is to create division, not coalescence. Means and medians are therefore not appropriate summary measures here, as they combine the data into one number, whereas sum, minimum, and maximum divide the data into 'digestible' boxes.

Another, less common use of the sum or other measures in data aggregation arises when there are duplications of data (Chapter 5: Collecting data | Cochrane, 2025). In these situations, the most recent, maximum, or minimum value is used so that there is only one data value per subject or patient prior to analyses. Because outliers exist in all datasets, the goal of data aggregation is to find apt summary measures that are not unduly affected by values at either end of the distribution curve.

**References**

[Chapter 5: Collecting data | Cochrane. (n.d.). Retrieved February 1, 2026, from https://www.cochrane.org/authors/handbooks-and-manuals/handbook/current/chapter-05\#section-5-3-5](https://www.zotero.org/google-docs/?h6yW8G)

[Molin, S. (2019). Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python (1st ed.). Packt Publishing Limited.](https://www.zotero.org/google-docs/?h6yW8G)

## How do the choice of metric and the nature of the dataset influence the insights generated?

Dong Woon Kim DSC 530 Data Exploration and Analysis Bellevue University Masters in Data Science Weeks 5 & 6 Informal Discussion Post \#3

The choice of metric and the nature of the dataset not only influence the insights generated, but the latter influences the former first. In this post, "choice of metric" refers to a reporting choice, made arbitrarily or at the discretion of the dataset's authors, that yields values which cannot be converted exactly into the same metric or unit as the other data used in the analyses. A choice of metric that *can* be converted exactly falls outside this description and does not affect the analyses. The problem in current research methodologies therefore concerns metrics that cannot be converted and must instead be transformed into a standardized metric.

## What are the best practices for dealing with missing data during aggregation?

Dong Woon Kim DSC 530 Data Exploration and Analysis Bellevue University Masters in Data Science Weeks 5 & 6 Informal Discussion Post \#4

The best practice for dealing with missing data is to first contact the authors of the study whose reported or published data are insufficient for inclusion in the analysis. The problem can be that the reported data use a measure of central tendency that is incongruent with those of the other studies included in the analysis. This is called a unit-of-analysis problem and is best addressed by asking for the data that the authors collected, in order to fill in the missing data instead of falling back on less optimal options, such as data transformation or leaving out the study and its data [(Rastkar, 2025\)](https://www.zotero.org/google-docs/?TYEb08). The problem with missing data is not that the data should be excluded; it is that they *should* be included but cannot be, for a number of reasons that are prevalent in many research domains, whether the data come from primary or secondary sources of data collection. With primary data collection, it is uncommon to face missing data problems, as one can collect the data again, if possible. 
Many authors of secondary data research studies, such as systematic reviews and meta-analyses, do not realize that the authors of the original papers often respond readily and promptly to requests for unpublished data that may facilitate the new analysis. For example, in a previous research project, around 10 studies reported insufficient data and their authors had to be contacted; 8 responded, which allowed their studies to be included in the analysis instead of being left out, and left uncited, because of 'missing data'. Transformation is an estimation, and leaving out studies entirely limits the analysis of all available literature and its results, so both diminish the validity of the answer to the research question. It is therefore best to keep in mind that there are human beings behind the data, and they are worth contacting before exploring the other options readily available in Data Science.

**References**

[Rastkar, M. (2025). Missing data in systematic reviews. In Systematic Review and Meta-Analysis (pp. 181–182). Elsevier. https://doi.org/10.1016/B978-0-443-13428-9.00014-8](https://www.zotero.org/google-docs/?h6yW8G)

Dong Woon Kim DSC 530 Data Exploration and Analysis Bellevue University Masters in Data Science Weeks 5 & 6 Formal Discussion Post \#1

## Why is it essential to choose the appropriate test statistic for a given hypothesis test, and what are the potential consequences of using the wrong one?

It is essential to choose the appropriate test statistic in order to derive a valid *p-value*, whether statistically significant or not, that tests the null hypothesis against the alternative hypothesis for the research question at hand. A commonly mishandled case is when data collection yields a set of data that is not normally distributed. 
If the data collected as part of the study design and experiment are not normally distributed around the mean, or if other measures of central tendency and dispersion, such as the median and interquartile range, are more apt for describing the data (specifically in the presence of outliers), then test statistics other than the conventional two-sample t-test must be used to analyze the data appropriately.

A list of statistical tests [(“List of Statistical Tests,” 2025\)](https://www.zotero.org/google-docs/?AgWShy):

1. One sample t-test
2. Paired difference test
3. Unpaired t-test
4. Welch's t-test
5. Paired t-test
6. F-test
7. Z-test (one mean)
8. Z-test (two means)
9. Permutation test
10. Kruskal-Wallis H-test
11. Mann-Whitney U-test
12. Wilcoxon signed-rank test
13. Sign test
14. Friedman test
15. χ² test
16. Pearson's χ² test
17. Median test
18. Multinomial test
19. McNemar's test
20. Cochran's Q test
21. Binomial test
22. Siegel-Tukey test
23. Chow test
24. Fisher's exact test
25. Barnard's exact test
26. Boschloo's test
27. Shapiro-Wilk test
28. Kolmogorov-Smirnov test
29. Shapiro-Francia test
30. Lilliefors test

There are many other statistical tests for each given scenario, depending on what the study design and subsequent data collection show in terms of normality of distribution, homogeneity of variances, number of samples, unpaired vs. paired samples, and even sample sizes.

**References**

[List of statistical tests. (2025). In *Wikipedia*. https://en.wikipedia.org/w/index.php?title=List\_of\_statistical\_tests\&oldid=1317511441\#List\_of\_statistical\_tests](https://www.zotero.org/google-docs/?h6yW8G)

# 3\. Resources

# 4\. Archives

# Links

* Link to [2025 docs](https://docs.google.com/document/u/0/d/1JA0oqgdDQIXB0SsznDsHghH9-6Lfg9Aru5lA_Yu0HJk/edit)

# Read every day

**Reading List**

| Title | Author | PDF | Link to Notes |
| :---- | :---- | :---- | :---- |
| Men Are from Mars, Women Are from Venus | John Gray | | |
| The Let Them Theory | Mel Robbins | | |
| The Lean Startup | Eric Ries | | |
| The E-Myth Revisited | | | |
| | | | |

# Job Search

# Roles/Positions

# Evidence Synthesis Specialist (full-time, remote)

Denied after interview with HR

* Needed more SR/MA publications

MTRC is seeking an experienced Evidence Synthesis Specialist to contribute to our evidence synthesis projects, including systematic reviews and value dossiers, in the evolving field of medical devices. As a pan-European market access consultancy, we provide expert support to 19 of the top 30 global Med Tech companies, focusing exclusively on medical technologies, including medical devices, in-vitro diagnostic tests, and digital health solutions. Since our establishment in March 2017 in Stockholm, our team of over 20 senior professionals has successfully delivered over 1000 projects, showcasing our expertise in data collection, policy anticipation and high-impact analyses. We are committed to maintaining the highest standards of quality, as evidenced by our ISO 9001:2015 certification from the United Kingdom Accreditation Service (UKAS). At MTRC, we pride ourselves on our strong internal culture, reflected in our employee satisfaction rate of 83%. With ambitious plans for over 50% growth this year, we offer new joiners a unique opportunity to gain deep insights into the EU reimbursement system. Our environment fosters professional autonomy and provides direct contact with clients at the top management level in leading medical device and IVD corporations. If you are looking to advance your career in a collaborative and intellectually stimulating setting, we invite you to consider the opportunities at MTRC. 
Join us in shaping the future of medical technology consulting.

Requirements:

* Medical Doctor’s degree or a master’s degree in any relevant science, such as medical science, pharmaceuticals, epidemiology, biology, statistics, or health economics
* Minimum 5 years’ work experience in research, medical device industry and/or medical device consultancy
* Minimum 5 years’ experience with systematic literature reviews, value dossiers or health technology assessments (HTAs)
* Minimum 3 publications in peer-reviewed journals

The position is remote.

# Director of Evidence Synthesis (full-time, remote)

**\- reference for inspiration (find differences between the senior role and the director role)**

Vacancy closed (found in archives)

MTRC, a leading boutique Med Tech market access consultancy in Europe, is looking for a Director of Evidence Synthesis to lead our growing evidence synthesis function.

MTRC has 25 full-time employees. Our offices are in the UK (HEOR), Spain (Market Access) and Bulgaria (Billing and Contracting). Most of the employees are remote.

MTRC provides support to large Med Tech corporations (80% of the business) and SMEs (20% of the business) in the fields of market access analysis and strategy, Evidence Synthesis, and Health Economics. MTRC’s clients include 17 out of the top 30 global medical device companies.

We are seeking an established leader in the field of evidence synthesis: someone who combines deep expertise in systematic literature reviews or value dossiers with a strong track record of project and team leadership. This is a senior strategic role with responsibility for leading international projects, supervising a team of reviewers, and engaging directly with senior stakeholders at top-tier Med Tech companies. The ideal candidate combines scientific credibility with excellent people and process management skills.

At MTRC, you will join a focused, pan-European consultancy operating at the highest level of the Med Tech industry.
Primary types of projects include systematic literature reviews, value dossiers, reimbursement and HTA submissions in different European countries and evidence gap analysis.

This is a remote position (time-zone close to Europe preferred), offering autonomy, visibility, and a central role in shaping the company’s evidence strategy and direction.

Compensation includes a package of fixed salary and quarterly and annual bonuses.

Responsibilities include:

* Conduct and lead international evidence synthesis projects for medical technologies
* Provide leadership to a team of specialists and reviewers
* Lead engagement activities with prospective clients at top corporate level
* Oversee commercial proposals for evidence synthesis projects
* Oversee and develop content marketing materials (white papers, educational videos)

Requirements:

* Academic or consulting experience in systematic literature reviews/evidence synthesis in healthcare, more than 5 years
* Peer-reviewed publications
* Relevant MSc / PhD degree
* Excellent written and verbal English
* Knowledge of European languages (other than English) would be a benefit

# Comparison

**Table 1** Comparisons between job descriptions, requirements and responsibilities in industry/consultancy research

| | Systematic Literature Reviewer | Evidence Synthesis Specialist | Evidence Synthesis Specialist (Director)(Not vacancy) |
| :---- | :---- | :---- | :---- |
| | | Systematic reviews and value dossiers in medical devices; Med Tech companies; medical devices; in-vitro diagnostic (IVD) tests; digital health solutions; Data collection and analyses; Policy anticipation; European Union reimbursement and HTA submissions; Research; Medical device industry/consultancy; Systematic literature reviews; Value dossiers; Health technology assessments (HTAs) | **Key differences between levels:** strong track record of project and team leadership; lead international projects; supervise team of reviewers; engage directly with senior stakeholders of Med Tech companies; people skills; management skills |
| **Requirements** | | | |
| **Responsibilities** | | | |
| | | | |

# Draw and Paint more as hobby

**Fig 1** Decision Modelling using Markov Model

# Letter to Myself

# Career

Get job → save up for 6-month staż \+ study Polish → decide in September, after saving up and passing the medical Polish exam, whether to do the 6-month or 13-month staż.

If you get a job, then the 33,500zł you receive in March, June, and September (too late probably) can be used to pay for the 6-month staż.

Options:

1. Get job for 6-month staż **and pass medical Polish exam before Sept. anyway**
2. Do Polish staż
3. Die
4. Ask dad for help paying for staż (if attempt to die fails)

The goal is to get back to normal mentally, physically, and in everything else required for a long career in medicine.

# Table of Contents

1. **\!\!** Apply to 300 jobs
   1. Job posting sites:
      1. Handshake
      2. UNICEF (look for paid opportunities)
      3. LinkedIn
   2. Roles/positions (what doctors on reddit did during residency \- “moonlighting” to supplement low pay during training years in U.S.)
      1. Research (lowest pay but high relevance to career)
         1. Evidence synthesis/systematic review specialist
         2. Clinical research associate
      2. Medical Writer (good pay but hard to find employment in company?)
      3. Tutoring (highest pay)
   3. Applied
2. **\!** Keep up with MSDS at BU
3. **\!** Start MSCS at JHU
4. **\!\!\!** Start staż (6-month, preferred for an earlier start to specialization training, or 13-month)
   1. Application opening dates: Also check with UJ since the law apparently says 6-month staż is supposed to be offered by every university in Poland.

      | University | Application Opens | Application Deadline | Price | Contact |
      | :---- | :---- | :---- | :---- | :---- |
      | [Warsaw Medical University](https://www.wum.edu.pl/en) | June 16 (actual date for 2026 TBA) | July 23 | 35,000 zł | [email protected] |
      | [Medical University in Łódź](https://studymed.umed.pl/graduating-students-alumni/post-graduate-internship/?_gl=1*182mplm*_up*MQ..*_ga*NzA4MDUxNjU0LjE3Njg3ODE1OTY.*_ga_FCB9M8BMFB*czE3Njg3ODE1OTUkbzEkZzAkdDE3Njg3ODE1OTUkajYwJGwwJGgw) | | | 42,000 zł | [email protected] |
      | [Katowice](https://smk.sum.edu.pl/practical_training/) | June 23 | | | |
      | Poznań | | | | |

      1. Warsaw June 16, 2026 (double-check)
      2. Łódź
5. Pass Polish language exam

# Apply to 300 jobs

The goal is to fill up your time off until September 2026 or 2027 (latest possible start date) with paid research in industry or consulting (challenging area) that allows flexibility for residency training in the future. This lets you recover from burnout while **getting paid** to pursue research.

* Look for jobs that
  * require or prefer candidates with a medical degree
  * are remote
  * are U.S.-based
  * pay enough to cover the 6-month staż

# Keep up with MSDS at BU

# Start MSCS at JHU

# Start staż

# Pass Polish language exam

# Personal

# Journal

# 2026

## Resolutions

Find a job that makes ***you*** happy

Read a chapter of a book every day

Learn to be happy living alone

## 01.01 Thursday

- [x] ~~8:00 activate eSIM on new iPhone~~
- [x] ~~8:00 transfer rent and utilities for January~~
- [x] ~~9:00 order food for Jan~~
- [x] ~~9:00 order a mop~~
- [x] ~~9:30 walk Jan~~
- [x] ~~9:45 clean flat~~
- [ ] **\!\!\!** 10:00 finish 2nd course on HTA

## 02.01 Friday

- [x] ~~**\!\!** 15:00 fill out questionnaire for Marcin~~
- [x] ~~**\!\!\!** 13:00-14:45 finish 2nd course on HTA~~

## 03.01 Saturday

- [x] ~~check for career resources in JHU portal~~
- [x] ~~activate eSIM~~
- [ ] Value Dossiers
- [x] ~~Evidence Gap Analysis~~
- [x] ~~HTA~~

## 04.01 Sunday

- [x] ~~get a US personal checking account (HSBC)~~
- [ ] get a US business checking account (Nav \- identity verification fix it)
- [ ] get a residential address
- [ ] call DMV in NY on Monday
- [x] ~~order a non-Drivers ID for back up ID for identity verification~~

## 05.01 Monday

- [x] ~~Finish reading NICE Technology Appraisal guidelines (chapter 3 on Evidence).~~
- [x] ~~Buy a white shirt~~
- [ ] Buy a tie
- [x] ~~Camera for meetings online~~
- [ ] Frames for medical diplomas

## 06.01 Tuesday

- [x] ~~Revise the github repository of your notes from readings for the last 2 weeks~~
- [x] ~~Formulate possible questions that will be asked for industry research/consulting~~
- [ ] Add your srma repository to the evidence synthesis repository on GitHub
- [ ] Keep in mind that this will be the backbone of the structure of your srma application/software (finish prototype by end of 2026).
## 07.01 Wednesday

- [ ] Buy a tie
- [ ] Frames for medical diplomas
- [ ] Interview
- [ ] Dispute the fraudulent account on all three bureaus
- [ ] Freeze SSN on all three bureaus
- [ ] Report identity theft on CFTB
- [ ] Get an EIN as sole proprietorship and, when you go back, get a new SSN from the Social Security Administration

Think carefully, and set up the three addresses according to secondary identification (proof of address).

1\. Needs to be non-CMRA to prevent identity verification failure and match voter registration card

2\.

- [ ] Secure your laptop and export the logs
- [ ] Put a monitor on the EIN

# Living Abroad

# Guide to Living Abroad as a U.S. Citizen

## In order to maintain ties with the U.S. while living abroad (e.g., Europe), you must have:

## Identity Verification

Primary forms of identification (photo verification)

* Driver’s license
* Non-driver’s license
* U.S. passport

Secondary forms of identification (address verification \- *must match **permanent residential address***)

* Utility bill, lease agreement
* Bank statement, debit/credit card bill
* Voter registration card
* School ID

## U.S. Address

Pick a mailbox forwarding address from Anytime Mailbox (instead of iPostal1) that is residential (non-CMRA) in smarty.app.

| Address type | Address | Service |
| :---- | :---- | :---- |
| Permanent residential address (must match secondary identity verification document) | 221 Rosemary St, Apt 3 Needham Heights, MA 02464 United States | Home |
| **Mailing address** (documents) (U.S. mailbox forwarding address) | 1178 Broadway, 3rd Floor \#4144 New York, NY 10001 United States | iPostal1 |
| **Mailing address** (packages) | 600 Markley St Port Reading, NJ 07064 United States | Polonez |
| **Current residential address** (address abroad) | ul. Mariana Domagały 27C/18 Kraków 30-741 Polska | |

## U.S. Phone number

| Number | Type | Provider |
| :---- | :---- | :---- |
| \+1 (332) 265 3112 | Landline (VoIP) | NumberBarn |
| \+1 (212) 300 6082 | **Mobile** (eSIM) | Tello |

# Tab 25

[^1]: