Professional Profile
Seasoned data scientist with an extensive track record of building and leading teams that deliver high-impact AI solutions. Expertise in MLOps, speech and language processing, data-centric AI, data management, observational and experimental causal inference, and public finance. Passionate about building products with state-of-the-art AI techniques to improve decision-making and create value for end users.
Work Experience
Steply AI, Washington DC, Apr 2024 - Present
Chief Executive Officer
- Building a contact center analytics platform that uses AI to improve operational efficiency by identifying action sequences in customer service call audio and chat transcripts.
Minerva CQ, Sunnyvale CA, Dec 2022 - Mar 2024
Vice President, Data Science
- Led a team of speech and language processing experts to build real-time conversational AI solutions that enabled contact center agents to help customers more efficiently.
- Fine-tuned state-of-the-art pre-trained automatic speech recognition (ASR) models and large language models (LLMs) for domain-specific tasks: intent and named entity recognition, sentiment and customer satisfaction classification, summarization, topic modeling, dialog structure induction, and retrieval-augmented generation.
EY-Parthenon, McLean VA, Mar 2022 - Dec 2022
Director, Data Science
- Developed real-time deep learning forecasting models for a NASA SBIR Phase II contract to improve energy system resiliency with timely warnings of hazardous geomagnetically induced currents (GIC) caused by geomagnetic storms.
- Led MLOps automation projects to improve the operational efficiency of EY’s federal audit teams.
Quantitative Scientific Solutions, Arlington VA, Jan 2022 - Mar 2022
Lead Data Scientist
- The firm was acquired by EY-Parthenon in Mar 2022.
Voter Participation Center, Washington DC, Jan 2020 - Dec 2021
Senior Director of Data Science & Analytics, Jan 2021 - Dec 2021
Director of Data Science & Analytics, Jan 2020 - Dec 2020
- Built and led a team of data scientists and engineers to deliver end-to-end AI/ML solutions that enabled data-driven targeting and improved the efficiency of $160MM in national voter mobilization programs during the 2020 election.
- Deployed ML models that doubled program success rates and generated ~1.6MM voter registrations and ~5MM vote-by-mail applications in key states.
- Designed randomized controlled experiments to improve the reliability of product insights.
- Modernized the organization’s data infrastructure by building new data science pipelines on AWS.
- Communicated model results and analyses to major donors and partners.
Equal Citizens, Washington DC, Apr 2017 - Dec 2019
Finance Director, Apr 2019 - Dec 2019
Executive Director, Apr 2017 - Mar 2019
- Founded the nonprofit organization with Harvard Law Professor Lawrence Lessig to advance U.S. democracy reform through high-profile litigation, advocacy, and education projects; its “faithless electors” case received a ruling from the U.S. Supreme Court in 2020.
- Built partnership networks with legal experts, social media consultants, and volunteers to deliver projects under tight budgets and timelines.
- Developed fundraising and public relations strategies, and maintained IRS/legal compliance.
Joint Economic Committee of the United States Congress, Washington DC, May 2015 - Apr 2017
Senior Economist, Democratic Staff, Jun 2016 - Apr 2017
Economist, Democratic Staff, May 2015 - Jun 2016
- Advised Members of Congress and congressional staff on fiscal and monetary policy issues for congressional hearings and delegations, in consultation with top subject matter experts at the Federal Reserve, federal agencies, international organizations, academia, and think tanks.
- Produced analytical reports on federal policies and legislative proposals.
- Prepared briefing materials and coordinated with expert witnesses for congressional hearings; wrote and edited speeches, hearing statements, and press releases for the Ranking Member.
The World Bank, Washington DC, Nov 2011 - May 2015
Consultant, Europe and Central Asia, Poverty Reduction & Economic Management Unit
- Designed and built a fiscal database for the Europe and Central Asia region in collaboration with World Bank country offices. Analyzed macroeconomic and financial indicators in the region to monitor the eurozone crisis.
Consultant, Development Economics & Chief Economist, Macroeconomics & Growth Unit
- Analyzed Peru census data using various microeconometric models with Stata for a paper on the socioeconomic impacts of mining activity in Peru, published in a peer-reviewed journal.
Consultant, Latin America & the Caribbean, Financial Management Unit
- Conducted a regional study on public financial management reforms and the quality of public service provision in the Latin America & Caribbean region.
American University, Washington DC, Aug 2009 - May 2012
- Instructor: Applied Macroeconometrics II: STATA Lab (Graduate); Introduction to Econometrics: STATA Lab (Undergraduate)
- Teaching Assistant: Applied Macroeconometrics II (Graduate Time Series & Panel Data Econometrics); Senior Thesis Seminar (Undergraduate); Introduction to Econometrics (Undergraduate); Microeconomics (Undergraduate); Macroeconomics (Undergraduate)
ABC Global Systems Inc, New York NY, Apr 2006 - Mar 2009
Budget Analyst
- Monitored cash flow data from payment processing accounts handling ~$200 million per year.
- Developed management tools to measure and analyze financial and operational information: financial ratios, key performance indicators, sales performance monitoring, and product pricing.
- Generated pro forma financial statements as well as detailed budget, profitability, and growth projections to facilitate executive-level decision-making and enforce budget compliance.
United Nations Children’s Fund (UNICEF), New York NY, Jun 2005 - Dec 2005
Research Assistant Intern, Global Policy Section, Economics and Social Policy Unit
- Supported Senior Programme Officers in policy analyses for the State of the World’s Children and other UNICEF companion publications using child poverty data.
- Assisted in building a SQL database of data from the Poverty Reduction Strategy Papers, covering a broad range of demographic and social indicators for 58 developing countries.
Education & Certification
AWS Certified Solutions Architect — Associate, Jan 2024
Professional Certificate in Artificial Intelligence, Oct 2021
Stanford University, Stanford, CA
Doctor of Philosophy in Economics (ABD)
American University, Washington DC
Fields: Public Finance; Monetary & Financial Economics; Applied Econometrics
Data Science Fellowship, Sep 2019
The Flatiron School, Washington DC
Post Baccalaureate Business Certificate in Finance and Accounting, Dec 2007
Baruch College, City University of New York, New York, NY
Master of Arts in Politics, May 2006
The New School for Social Research, New York, NY
Thesis Title: Essays on Democracy and Income Inequality in Developing Countries
Bachelor of Arts in Economics and Political Science, May 2003
San Francisco State University, San Francisco, CA
Awards & Honors
- The Idiap & IdeArk Prize, Idiap Create Challenge, 13th edition, 2024
- Most Influential Article of the Year, Towards Data Science, 2019
- Most Viewed & Shared Blog, KDnuggets, 2019
- Data Science Fellowship, The Flatiron School, Washington DC, 2019
- Graduate Fellowship, American University, Washington DC, 2009 – 2012
- Graduate Scholarship, New School for Social Research, New York NY, 2003 – 2006
Publications
- Phoebe Wong and Robert Bennett (2019), “Everything a Data Scientist Should Know About Data Management,” Towards Data Science.
- Phoebe Wong (2019), “Predicting vs. Explaining,” Towards Data Science.
- Phoebe Wong and Adam Eichen (2018), “Russian Indictments Show that the U.S. Needs Federal Oversight of Election Security,” TechCrunch.
- Joint Economic Committee (2016), “Federal Investment in U.S. Legacy Transit Systems.”
- Joint Economic Committee (2016), “The 2016 Joint Economic Report, Minority View,” Chapter 3, The Effect of the Global Economy.
- Alberto Minujin, Enrique Delamonica, and Phoebe Wong (2006), “Exploring the Properties of Child Poverty Indicators in Various Socioeconomic Contexts,” UNICEF–DPP Working Paper Series, New York.
Invited Talks & Workshops
- “Automatic design of conversational models from observation of human-to-human conversation,” JSALT Summer Workshop on Speech and Language Technologies, Johns Hopkins University, Le Mans, France, Jun 26 - Aug 4, 2023.
- “Developing Real-Time Forecasting Capabilities for GIC Hazard Mitigation: A Data-Centric AI Approach,” EPRI AI and Electric Power Summit, Rome, Italy, Oct 4, 2022.
- “The Case for Debt-Financing Surface Transportation Infrastructure Investments at the Federal Level in the 115th U.S. Congress,” The Association of University Business and Economic Researchers Fall Conference, Fayetteville, Arkansas, Oct 25, 2016.
Technical Skills
- Proficient in Python (spaCy, NLTK, PyTorch, PySpark, Keras, TensorFlow, Pandas, NumPy, SciPy, Scikit-learn, Statsmodels, XGBoost, Plotly, Seaborn, Matplotlib), AWS (SageMaker, S3, EC2, Redshift, Athena), SQL, JSON, Bash, Hugging Face, GitHub/GitLab, MongoDB, VS Code, MLflow, Stata, MATLAB, LaTeX, and Microsoft Office
- Proficient in building domain-specific applications with ASR models and LLMs: speech recognition, entity and intent recognition, sentiment analysis, extractive and abstractive summarization, topic modeling, retrieval- and knowledge-based question answering, and task-oriented dialogue systems
- Proficient in MLOps and data-centric AI implementations
- Working knowledge of SAS, R, EViews, SPSS, LexisNexis, Bloomberg Terminal, WordPress, Mathematica, AutoCAD, and Pro/ENGINEER
- Completed Greenberg Seminars on Effective University Teaching at American University, a 3-year certificate program for university teaching
Applied AI & Data Science Projects
- Fine-tuned NVIDIA Riva streaming ASR for Spanish (es-ES) with domain-specific audio and text corpora, reducing the word error rate of the existing production model by 10 percentage points.
- Developed a first-of-its-kind time series forecasting solution that predicts hazardous GIC events using deep learning and a data-centric AI approach, with data quality control throughout the MLOps lifecycle to maintain model accuracy after deployment.
- Developed and deployed an application that speeds up random assignment for large-sample randomized controlled trials (RCTs) with multiple treatments over time by leveraging parallel computing and massively parallel processing (MPP) databases. Reduced processing time for random assignment of 40MM samples from 48+ hours to ~15 minutes.
- Developed and deployed an XGBoost voter registration likelihood model trained on over 5MM records with 300 features. Successful registration rates doubled after deployment.
- Developed an uplift model to predict targets’ net responsiveness to voter turnout outreach, using 10MM records from past RCTs to estimate individual treatment effects from predicted counterfactual behavior.
Languages
- Native English and Cantonese speaker; advanced Mandarin.