
  • Well, I did it!

    Tonight (4/14/25) I completed my last course as part of DataCamp’s “Associate Data Scientist in R” career track.

    The course was “Unsupervised Learning in R.” Given that it was a refresher of some of the previous material, and was presented in a much clearer manner, I would posit that it should have come earlier in the program track.

    All told, I’ve compiled quite a bit of code from the courses and projects that I’ve taken as part of this track. My target now will be to review that material, and to enhance my notes and the inline documentation I added to the code so that I can feel confident that I understand and can apply the material in real-world situations.

    Completing the career track isn’t a “certification”, per se. That can only be earned after completing DataCamp’s two timed exams and practical exam.

    But, once I register, I will have 30 days to complete all three exams.

    Since many of the courses I have just completed overlap with the requirements for the full “Data Scientist in R” career track, I am mulling over pursuing the remaining courses for that track first so that I will hopefully be better prepared.

    For the record, here’s the list of components for the “Associate Data Scientist in R” career track:

    Course / Project / Skill Assessment
    #1 Introduction to R
    #2 Intermediate R
    #3 Introduction to the Tidyverse
    #4 Data Manipulation with dplyr
    #5 Analyze the Popularity of Programming Languages
    #6 Joining Data with dplyr
    #7 Introduction to Statistics with R
    #8 Introduction to Data Visualization with ggplot2
    #9 Intermediate Data Visualization with ggplot2
    #10 Data Manipulation with R
    #11 Data Communication Concepts
    #12 Introduction to Importing Data in R
    #13 Cleaning Data in R
    #14 Exploring Airbnb Market Trends
    #15 Working with Dates & Times in R
    #16 Importing & Cleaning Data with R
    #17 Introduction to Writing Functions in R
    #18 R Programming
    #19 Exploratory Data Analysis in R
    #20 Introduction to Regression in R
    #21 Modeling Car Insurance Claim Outcomes
    #22 Intermediate Regression in R
    #23 Sampling in R
    #24 Hypothesis Testing in R
    #25 Hypothesis Testing with Men’s & Women’s Soccer Matches
    #26 Experimental Design in R
    #27 Statistics Fundamentals with R
    #28 Supervised Learning in R: Classification
    #29 Supervised Learning in R: Regression
    #30 Unsupervised Learning in R

    And away we go!

  • I overlooked one project that I actually completed …

    After completing the courses on “Hypothesis Testing in R” and “Experimental Design in R”, I completed the project, “Hypothesis Testing with Men’s and Women’s Soccer Matches”, and forgot to record it here in the blog.

    Course / Project / Skill Assessment
    #25 Hypothesis Testing with Men’s and Women’s Soccer Matches

    It was an error of omission, clearly. I suppose I thought that I had recorded it.

    But, no, it seems.

    Oh, well … no blood, no foul! No Red Card!

    (Sorry! I couldn’t help myself!)

  • Supervised Learning is a complex concept

    Over the past two days (4/12/25-4/13/25), I completed two additional courses in the DataCamp “Associate Data Scientist in R” career track: “Supervised Learning in R: Classification” and “Supervised Learning in R: Regression.”

    There was a lot to unpack, and quite frankly, in a few days I am going to review the video presentations and the code that I wrote to get a better grasp on the material.

    Course / Project / Skill Assessment
    #28 Supervised Learning in R: Classification
    #29 Supervised Learning in R: Regression

    I have one course remaining to complete the career track.

    THEN the REAL work will begin!

  • It’s coming back to me, now…

    The latest course I have completed as part of DataCamp’s “Associate Data Scientist in R” career track, “Experimental Design in R”, has been a very visceral experience.

    Completing the exercises had me reminiscing about undergraduate and graduate courses that I took at CSU East Bay in Statistical Inference and Mathematical Statistics.

    It got me thinking of the skills I acquired then, but haven’t used to any great depth since, and imagining them as purpose-built knives I need to sharpen and hone for them to be useful, again.

    I also took the skill assessment for “Statistics Fundamentals with R” and passed. So, I may not need to grind away too much material to make those tools sharp, again!

    In any case, I’m closer to completing the track, and driving forward!

    Course / Project / Skill Assessment
    #26 Experimental Design in R
    #27 Statistics Fundamentals with R

  • I’m getting my groove back…

    Being hospitalized, even for a short time, saps your strength and your resolve (if you let it).

    Learning that I have Hepatocellular Carcinoma as a result of the gallstones, infection, and nonalcoholic fatty liver disease (NAFLD) that sidelined me for a couple of months last year has been a shock.

    But, learning that HCC is a “better” diagnosis than Biliary Carcinoma because it responds well to immunotherapy has given me hope that I may yet stick around for a few more years.

    In any case, life goes on. And I have been taking my time trying to get back into the groove of tackling the remaining courses in my Associate Data Scientist in R career track from DataCamp.

    I had started the “Hypothesis Testing in R” course a couple of days before I was hospitalized, and just finished it this evening.

    Course / Project / Assessment
    #24 Hypothesis Testing in R

    Most of the hypothesis testing techniques I learned in my undergraduate and graduate courses at CSU Hayward/East-Bay in the early 2000s leveraged early or rudimentary packages for inferential statistics.

    I’ve learned a lot about the “infer” package and simulation-based hypothesis tests.

    I’ll definitely need to practice with the latter more so that using them becomes second nature.
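    For my own practice notes, here is a minimal sketch of what a simulation-based test does under the hood. It uses base R only and made-up numbers (the two groups, the helper name `diff_in_means`, and the repetition count are all my own assumptions, not course material). The infer package expresses the same idea through its `specify()`, `hypothesize()`, `generate()`, and `calculate()` verbs.

```r
# Permutation test for a difference in means, base R only.
# The data below are invented purely for illustration.
set.seed(42)

group_a <- c(12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 16.0, 14.1)
group_b <- c(11.2, 12.8, 12.5, 13.1, 11.9, 12.2, 13.4, 12.0)

values <- c(group_a, group_b)
labels <- rep(c("a", "b"), times = c(length(group_a), length(group_b)))

# Test statistic: mean of group "a" minus mean of group "b"
diff_in_means <- function(x, g) {
  mean(x[g == "a"]) - mean(x[g == "b"])
}

observed <- diff_in_means(values, labels)

# Build the null distribution by shuffling the group labels,
# which breaks any real association between group and value.
n_reps <- 5000
null_diffs <- replicate(n_reps, diff_in_means(values, sample(labels)))

# Two-sided p-value: share of shuffled statistics at least as
# extreme as the observed one.
p_value <- mean(abs(null_diffs) >= abs(observed))

print(observed)
print(p_value)
```

    The point of the sketch is the shuffle step: if the group labels really don’t matter, reshuffling them should produce differences as large as the observed one fairly often.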

  • Health Update:

    The evening of March 20, 2025, I drove myself to the Emergency Room at Sutter Tracy Community Hospital (STCH). I was experiencing extreme pressure and point tenderness in my abdomen near my liver. The pain had started as a 2/10 at 6:30 and progressed to 4/10 by 8:30, reaching 6/10 by the time I was seen by an ER physician around midnight.

    The physician at STCH arranged to have me transferred to California Pacific Medical Center (CPMC) where last year I had been hospitalized and treated multiple times for blockages to my liver and pancreas due to gallstones, and had my gall bladder removed.

    Unfortunately, blood tests, a CT scan, MRI, and biopsy revealed that I have Stage 3 Hepatocellular carcinoma (HCC). It’s a type of liver cancer that originates in the liver’s main cells, and is the most common type of primary liver cancer.

    How did I get it? Apparently, it is highly probable that it started with my non-alcoholic fatty liver disease (NAFLD) and the associated cirrhosis, and possibly the Metformin and other medications that I have been taking over the past few years.

    The Hepatologists and Oncologists at CPMC have told me that this is a better diagnosis than bile duct carcinoma, which is more aggressive. So, I am grateful to GOD that this is the case.

    On Monday, March 31, I have my first consultation with an oncologist in Tracy, where I hope to discuss how we can start immunotherapy to shrink the malignancy.

    So, in addition to actively pursuing a new career, if possible, as a Data Scientist — or as a Lead/Sr. Business Intelligence Analyst, I’ll be looking to do what I have to do to tackle this medical condition.

    Wish me luck — or include me in your prayers, if you’re so inclined. I’ll be grateful for either one!

  • Brute force is simple, but elegance can be taught.

    Completing the “Modeling Car Insurance Claim Outcomes” project as part of DataCamp’s “Associate Data Scientist in R” career track has been the most humbling element of the program for me.

    The project was intended to be completed between the two courses on Regression, but I felt it would be better for me to wait until I had completed both to work on the project.

    Course / Project / Assessment
    #21 Modeling Car Insurance Claim Outcomes

    Why? Because it had been my experience that there are nuances to be learned from the repetitive nature of the coding exercises, and those nuances had come into play in the previous “projects”.

    I’ve been taking pretty extensive notes (and screen captures) from the course materials and the coding exercises.

    For now, I simply use them as reference materials while completing the exercises and the assessments. But, once I have completed the track, I intend to compile and coalesce them as “crib notes” for any future R coding that I might do.

    Regardless, I’m caught up, now. So, it’s off to “Hypothesis Testing in R”!

  • One more down!

    Well, I completed DataCamp’s “Sampling in R” course this evening. That’s one more down!

    Course / Project / Assessment
    #23 Sampling in R

    As I build momentum in completing these courses, details that I had either forgotten or wasn’t originally exposed to in my work with R during my Graduate program are starting to take hold.

    Because R is a constantly-evolving programming language — there are 22,240 contributed packages on the Comprehensive R Archive Network (CRAN), as of March 19, 2025 — it’s impossible to be familiar with “everything.”

    It is feasible, however, to be a “journeyman” of sorts: to know enough about most aspects of the language to follow a corollary of the Pareto Principle and complete 80% of your tasks using 20% of the available features. Or, to put it another way, to derive 80% of your results from 20% of your effort.

    This puts me at the 75% benchmark for completing the “Associate Data Scientist in R” track.

    To be honest, had I not been laid off, I might not have been motivated to plow through this program. So, it’s been a mixed blessing.

  • There’s more to linear/logistic regression than meets the eye!

    Over the past two days, I finished what were fundamentally “refresher” courses for me (DataCamp’s “Introduction to Regression in R” and “Intermediate Regression in R”), and doing so brought back so many memories.

    Course / Project / Assessment
    #20 Introduction to Regression in R
    #22 Intermediate Regression in R

    There was a lot to “unpack,” and I learned about a few new R packages that weren’t available when I took STAT 4601 and STAT 4950 in my Graduate program in Statistics at CSU East Bay.

    To be honest, those packages make the practice of performing regression analysis much easier, although having to perform many of the core calculations manually did have its advantages.

    But, such is the tradeoff of progress — tasks become easier, but some core knowledge may be lost in the process.

    This realization brought to mind a quote I think applies:

    “It is the old experience that a rude instrument in the hand of a master craftsman will achieve more than the finest tool wielded by the uninspired journeyman.”
    – Karl Pearson

    I decided to hold off on completing the “Modeling Car Insurance Claim Outcomes” project (#21) until I had completed these two courses. I wanted to make sure that I had worked through the applicable material before tackling it.

    So, that’s next on the agenda!

  • Blog Update – March 12, 2025

    I’ve completed two new elements of the DataCamp Associate Data Scientist in R career track!

    Course / Project / Assessment
    #17 Introduction to Writing Functions in R
    #18 R Programming
    #19 Exploratory Data Analysis in R

    I have to admit that writing functions was completely new territory for me.

    Even in my Graduate studies in Statistics at CSU East Bay, we didn’t delve very deep into writing functions in R.

    And we certainly didn’t leverage inline CASE statements when writing code.

    I’m really glad that I’ve gotten this level of exposure to writing R functions. Elements of my exposure to FORTRAN (90) from when I first enrolled at Chabot College and my recent years of SQL programming are coalescing with this course material.
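    To make the connection to SQL concrete, here is a small sketch of a user-defined R function that branches the way a SQL `CASE` expression does. I’m assuming base R’s `switch()` is a fair stand-in for the “inline CASE” idea (dplyr’s `case_when()` is the vectorized cousin); the function name `summarize_by` and the sample scores are hypothetical, made up for this example.

```r
# Hypothetical helper: summarize a numeric vector using a named method.
# switch() picks one branch by name, much like a SQL CASE expression.
summarize_by <- function(x, method = c("mean", "median", "range")) {
  # match.arg() validates the choice and defaults to the first option
  method <- match.arg(method)
  switch(method,
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    range  = diff(range(x, na.rm = TRUE))
  )
}

# Made-up data with one missing value
scores <- c(70, 85, NA, 90, 65)

print(summarize_by(scores))            # default method, "mean"
print(summarize_by(scores, "median"))
print(summarize_by(scores, "range"))
```

    Passing a method that isn’t in the list raises an error from `match.arg()`, which is a cheap way to get input validation for free.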

    The next sections of this career track are going to dive into Regression, Hypothesis Testing and Supervised/Unsupervised Learning.