Hi! I'm Beiming

I'm currently finishing my Master of Science in Data Science (MSDS) at USF, where I have learned advanced machine learning techniques, developed strong programming skills, and refined my problem-solving skills. I also work as a data science intern at Tally, a Series B, VC-funded fintech startup, where I collaborate with the credit, product, and customer teams to relieve users' financial stress. I am looking for full-time opportunities and would love to solve interesting problems with friendly, smart people.

On this page, you will find some of the projects I have worked on recently, including Machine Learning, Predictive Analysis, Time-Series Analysis, and NLP projects. Thank you for taking a look, and please do not hesitate to contact me with questions.

Skills

  • Programming: Python (scikit-learn, pandas, XGBoost, LightGBM, OpenCV), R
  • Database: SQL (PostgreSQL, Amazon Redshift), NoSQL (MongoDB)
  • Deep Learning: TensorFlow, Keras, PyTorch
  • Distributed Computing: Spark (spark.ml), AWS (EC2, S3)
My resume

Featured Projects

A collection of recent projects:
  •   Click an image for the website/project page
  •   Click a project title for the GitHub page/code

What's the story behind 17 million Citi Bike Trips in 2017?

There were nearly 17 million Citi Bike trips in New York City in 2017. Where were bikes picked up and dropped off? What factors influence commuters' decisions to ride a bike or not? This project aims to answer these questions through visualization and analysis, and to predict future ridership using machine learning techniques.
Machine Learning, Predictive Analysis, Python, GIS data, Target Encoding, Visualization

What does Trump say? (and how the stock market reacts)

President Trump is probably the most active politician on Twitter. What does he say on Twitter? Do his tweets have an impact on the stock market? In this project, I scraped all his tweets since 2017, performed Natural Language Processing and visualized them with the S&P 500 index movement to identify the impacts of his tweets.
Time Series, Sentiment Analysis, NLP, Stock Market, Scraping

LEGOIT

Many people face the challenge of finding a unique birthday or anniversary gift for a loved one or family member. LEGOIT.us is a web application that transforms your photos into a Lego set and helps you turn memorable moments into priceless gifts. The algorithm applies Instagram-like filters and matches the colors in the photo to the colors of Lego bricks.
Web Application, Optimization, Image Processing, LEGO, Filters, Scraping
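The color-matching step can be illustrated with a minimal sketch. This is not the actual LEGOIT implementation: it assumes each pixel is simply assigned the nearest brick color by squared Euclidean distance in RGB space, and the palette below (names and RGB values) is hypothetical, chosen only for illustration.

```python
# Hypothetical palette: approximate RGB values for a few brick colors.
# These are illustrative assumptions, not official LEGO color data.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (200, 30, 30),
    "blue": (30, 80, 180),
    "yellow": (245, 205, 45),
    "green": (40, 130, 60),
}


def nearest_brick(pixel, palette=PALETTE):
    """Return the name of the palette color closest to `pixel` in RGB space."""
    def sq_dist(a, b):
        # Squared Euclidean distance; the square root is not needed for ranking.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(palette, key=lambda name: sq_dist(pixel, palette[name]))


def to_brick_grid(pixels, palette=PALETTE):
    """Map a 2D list of RGB pixels to a 2D list of brick color names."""
    return [[nearest_brick(p, palette) for p in row] for row in pixels]


if __name__ == "__main__":
    image = [[(250, 250, 250), (10, 5, 0)], [(210, 40, 20), (0, 0, 255)]]
    for row in to_brick_grid(image):
        print(row)
```

A perceptual color space such as CIELAB would likely give better matches than raw RGB; RGB is used here only to keep the sketch dependency-free.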

Global Trade Visualization

Understanding global trade is hard, but it doesn't have to be! This visualization project by @TimLee and me combines many data sources to visualize the interwoven nature of international industries and to show the interdependence of nations.
Visualization, Plotly, AWS

Machine Learning Projects

Using regression, random forests, gradient boosting, and neural networks, one can find a complex yet traceable relationship between the data and the target (e.g. house prices, CTR, or whatever else we want to predict).

Stop Spamming my Email!

Predict whether an email is spam with 97% accuracy using AdaBoost and XGBoost.
AdaBoost, Gradient Boosting Tree, XGBoost, Parameter Tuning

Teach Computers to recognize Digits

Implement a simple 3-layer neural network on the MNIST dataset to predict handwritten digits with 98.29% accuracy.
Neural Network, Image Recognition

Your own recommendation engine!

Build a movie rating recommendation system from scratch using collaborative filtering with matrix factorization.
Recommender, Collaborative Filtering, Matrix Factorization
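As a rough illustration of collaborative filtering with matrix factorization, the sketch below learns user and item factor vectors with stochastic gradient descent so that their dot product approximates the observed ratings. It is a toy version under stated assumptions, not the project's code: the ratings, the function names, and the hyperparameters (`k`, `lr`, `reg`, `epochs`) are all hypothetical.

```python
import math
import random

# Hypothetical toy data: (user_id, item_id, rating) triples.
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5),
]


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
             epochs=500, seed=0):
    """Fit factor matrices P (users) and Q (items) by SGD on squared error."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, rating in ratings:
            err = rating - predict(P, Q, user, item)
            for f in range(k):
                pu, qi = P[user][f], Q[item][f]
                # Gradient step with L2 regularization on both factors.
                P[user][f] += lr * (err * qi - reg * pu)
                Q[item][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


if __name__ == "__main__":
    P, Q = train_mf(RATINGS, n_users=4, n_items=4)
    print("training RMSE:", rmse(RATINGS, P, Q))
    print("predicted rating for user 0, item 3:", predict(P, Q, 0, 3))
```

Unobserved (user, item) pairs are then scored with `predict`, and the highest-scoring items become the recommendations. A real system would also hold out ratings to measure generalization rather than training error.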

How Much is your House Worth?

Use linear regression to predict house prices in Ames, Iowa. Compare OLS, Ridge, Lasso, and Elastic Net regression models and generate a business report.
Regression, Ridge, Lasso

Natural Language Processing (NLP)

Use various techniques to enable programs to understand and interpret human language.

So you watched a movie: Yay or Nay?

Use GloVe embeddings on a dataset of 50,000 movie reviews from IMDB. Predict whether a review is positive or negative given its content, using XGBoost to achieve 86.7% accuracy.
Sentiment Analysis, Word Embeddings, IMDB

BBC Article Recommendations

Replicate the kind of recommendation system found on blog-based websites: suggest 5 similar articles based on what the user is currently reading, using word2vec on BBC news data.
Word Embeddings, Similarity Score, word2vec
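The similarity-score idea can be sketched as follows. The sketch assumes each article has already been reduced to a single vector (for example, by averaging the word2vec vectors of its words, which is an assumption and not necessarily what the project does) and ranks the other articles by cosine similarity. The article titles and 3-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math

# Hypothetical article vectors; real word2vec embeddings would typically
# have 100 or more dimensions.
ARTICLES = {
    "election results": [0.9, 0.1, 0.0],
    "parliament vote": [0.8, 0.2, 0.1],
    "football final": [0.1, 0.9, 0.1],
    "tennis open": [0.0, 0.8, 0.3],
    "new smartphone": [0.1, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def most_similar(current, vectors, k=5):
    """Return the titles of the k articles most similar to `current`."""
    scores = [
        (cosine(vectors[current], vec), title)
        for title, vec in vectors.items()
        if title != current  # Never recommend the article being read.
    ]
    scores.sort(reverse=True)
    return [title for _, title in scores[:k]]


if __name__ == "__main__":
    print(most_similar("election results", ARTICLES, k=2))
```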

Twitter Sentiment Analysis

Retrieve tweets with the Tweepy library and visualize their sentiment with a color gradient using Python.
Twitter Data, Scraping, VADER Sentiment
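One simple way to turn a sentiment score into a color gradient is shown below. The sketch assumes a compound score in the range [-1, 1], as VADER produces, and interpolates linearly from red (most negative) to green (most positive). The function name and the choice of red and green endpoints are illustrative assumptions, not the project's actual color scheme.

```python
def sentiment_to_hex(score):
    """Map a sentiment score in [-1, 1] to a hex color from red to green."""
    # Clamp out-of-range scores so the color channels stay within 0..255.
    score = max(-1.0, min(1.0, float(score)))
    t = (score + 1.0) / 2.0          # Rescale to [0, 1].
    red = int(255 * (1.0 - t))       # Strongest for negative sentiment.
    green = int(255 * t)             # Strongest for positive sentiment.
    return "#{:02x}{:02x}00".format(red, green)


if __name__ == "__main__":
    # Hypothetical scores for three tweets.
    for s in (-0.8, 0.0, 0.6):
        print(s, sentiment_to_hex(s))
```

The returned hex strings can be passed as colors to most plotting libraries.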

More About My Journey

I was born and raised in Beijing, China. I received a BSc in Mathematics and an MSc in Financial Engineering from Imperial College London in the United Kingdom, where I built a solid theoretical foundation in mathematics.

I started my first job at a venture capital firm in the Bay Area that invested in artificial intelligence (AI) and autonomous vehicle startups. There, I researched AI and how it has revolutionized business operations in many industries. I also received my first exposure to data science at the company. Wanting to learn more, I enrolled in the University of San Francisco data science master's program last year.

At Tally, a credit card management company, my primary goal is to apply both supervised and unsupervised machine learning techniques to improve credit risk assessment. Two projects that I have worked on:
  • Implemented Random Forest and Gradient Boosting models to classify both short-term and long-term delinquency risk of Tally's applicants (outperforming the current risk model).
  • Developed a model to better categorize bank credit transactions and accurately predict users' income (which will help prevent users from misreporting their income to receive credit).

Contact Me

Feel free to contact me via email or LinkedIn for data science opportunities.