Joshua Norfolk
Professional Summary
Before spending the last 1.5 years developing my data science skills, first independently and then through TripleTen’s program, I worked on several fascinating projects during my physics B.S.: planning scientific data collection for a NASA sounding rocket as part of a PSU engineering group, evaluating the comparative effectiveness of X-ray detectors with Python at Lawrence Livermore National Lab, and documenting the assembly of a key experiment part for the dark matter project LUX-LZ.
Now, I am a Data Scientist/Analyst with a strong foundation in Python, machine learning, and statistical analysis. I draw on 1+ year of experience in data-driven projects and a B.S. in Physics, and I excel at transforming complex ideas into actionable strategies. I am known for quick learning, effective mentorship, and fostering collaborative team environments.
Alongside my physics experience, I have thoroughly enjoyed teaching children, working first as a camp counselor and then as a science teacher for Nature’s Classroom. I developed physics-based curriculums and improved my ability to communicate physics and experimental principles to a non-specialist audience with patience and empathy. I currently work a few hours per week as a climbing instructor.
As of late spring 2024, I have accumulated several data experiences that have further bolstered my confidence. I completed an extremely challenging externship with DataSpeak (a small tech consulting company), where I built a chatbot (like ChatGPT) to answer user questions based specifically on proprietary data and presented my solution to the CEO. I joined a Data Analysis Hackathon, where I created and delivered the final presentation, received high marks, and contributed to a team win. I then completed an externship with Besample (a small company that conducts surveys worldwide), where I used the outputs of reverse-engineered machine learning models to find the features most important to classifying a user as a bot, and showed the Besample team how to use the technique themselves. Currently, I work with Data Annotation as a freelancer, sharpening my coding skills by reviewing AI (like ChatGPT) outputs, creating science/math-based prompt sets for AI to learn from, and performing a wide variety of other tasks that contribute to AI development.
Personally speaking: for years I have loved rock climbing and hiking. Between my college graduation and the start of my data science program, I devoted a year to traveling the United States alone in my car and climbing in the desert and mountains as much as possible. Beyond finding that the world is a beautiful place, I honed my technical and emotional problem-solving through rope systems, trip logistics, and teamwork with strangers. If asked when I’ve had to “think on my feet,” I can certainly point to professional settings where this was required of me - but nothing comes to mind more sharply than the numerous times I took the lead to get myself and my partners out of unexpected and pressing trouble.
Skills
- Python (Pandas, NumPy, SciPy)
- Machine Learning (Scikit-Learn, NLTK, TensorFlow)
- Data Visualization (Matplotlib, Seaborn, Tableau)
- SQL, Excel, Statistics, Streamlit
Tech Projects
Beach Bandits Hackathon Route Optimization (06/24)
- Developed algorithms to find the optimal route between a starting location and nearby locations - in our case, beaches in Florida.
- Worked as the data scientist on a cross-functional team with software engineers, learning how to coordinate with professionals whose skillsets were unfamiliar to me.
- Collected data via APIs, manually preprocessed a realistic, imperfect dataset, and presented results to a panel of technical and non-technical reviewers.
Zyfra Gold Recovery Prediction (06/23)
- Created a predictive model for gold concentrate yield, optimizing gold production, with real-world company data and unintuitive features.
- Implemented linear regression and random forest models, achieving 7.53% tested sMAPE through extensive validation.
- Link to project
- Developed a predictive model for Interconnect with a 0.90 AUC-ROC score, identifying key churn factors to drive targeted customer retention strategies.
- Pinpointed tenure, contract type, and internet service as major churn indicators, supporting client service improvements and churn rate reduction.
- Joined, analyzed, and interpreted raw data from various sources.
- Link to project
Ice Video Game Sales (04/23)
- Analyzed video game sales data using Python and visualization tools, predicting 2017 market trends and platform performance.
- Employed t-tests and ANOVA on user ratings to derive actionable insights for game profitability and regional preferences.
- Link to project
OilyGiant Region Selection (06/23)
- Analyzed parameters across three regions using linear regression models, culminating in the selection of a region based on profit and risk.
- Applied bootstrapping for risk assessment, maximizing profitability with effective risk management in the oil extraction sector.
- Link to project
Experience
Data Annotator/Code Reviewer | Data Annotation (04/2024 - Present)
- Facilitate the training of AI chatbots, contributing to the advancement of sophisticated AI programs through iterative learning processes and algorithm refinement.
- Engage in fact-based research and craft inventive prompts aimed at challenging the LLM (Large Language Model) to produce responses that prioritize safety and ethical considerations.
- Strengthen the understanding of LLMs via review, editing, and evaluation of Python code output, and by creating sample prompt/answer sets.
- Address instances of hallucinations, unsafe responses, verbosity, and failure to follow instructions through rigorous monitoring of LLM outputs.
Data Scientist/Analyst | Besample (04/2024-06/2024)
- Uncovered important aspects of bot behavior for Besample, a very small company that collects worldwide data from paid surveys (bots scam Besample out of money and introduce unreliable data).
- Reverse-engineered trained machine learning models to find the most significant features of bot behavior, using an established list of bots.
- Educated Besample staff on how to use this technique to supplement their future analyses during the final presentation.
- Link to project
Data Scientist/AI Engineer | DataSpeak (09/2023-11/2023)
- Developed a question-answering customer service system using Retrieval-Augmented Generation, integrating technologies like Pinecone, Hugging Face, LangChain, and Streamlit.
- Achieved coherent response generation in ~30 seconds and presented the solution to C-level executives at DataSpeak.
- Link to project
Inventory Control Specialist | ADUSA Distribution (01/2023-11/2023)
- Streamlined inventory management processes for the team, resulting in a 10% improvement in workflow efficiency.
- Employed Excel functions to develop and maintain accurate inventory tracking systems, reducing data discrepancies.
- Analyzed and resolved complex inventory issues, demonstrating strong problem-solving skills and attention to detail.
- Collaborated with other departments to optimize inventory levels and educated the team on new software.
Scientific Data Analyst Intern | Lawrence Livermore National Lab (06/2018-08/2018)
- Sponsored by the U.S. Department of Homeland Security (DHS) Domestic Nuclear Detection Office (DNDO).
- Engaged in a rigorous internship focusing on evaluating a novel X-ray detection system.
- Utilized Python for data extraction and analysis, contributing to pivotal project insights.
- Presented findings to senior scientists, influencing project trajectory and gaining valuable experience in data-driven decision-making in a research environment. Also presented to peers and wrote a 13-page report for DHS.
Scientific Researcher | Penn State University (09/2017-04/2019)
- Worked collaboratively and iteratively as part of the PSU flight program group throughout the project’s life cycle to engineer a payload for a NASA sounding rocket, launched from Norway to study polar mesospheric winter echoes (PMWEs). Link to a paper
- Planned data collection for the Langmuir probes (scientific instruments), enabling the science team to better understand PMWEs after data collection.
- Attended the launch to provide oversight on ground station setup and to monitor atmospheric conditions in the control room if the science lead was unavailable.
Education
- Pennsylvania State University | B.S. in Physics with Math Minor | GPA: 3.5 (08/2016-12/2019)
- TripleTen | Data Science Program (01/2023-12/2023): Completed 16 reviewed projects in data science/analysis focusing on EDA, data preprocessing, statistics, and machine learning.