Title: How Statistics Help You Get Prepared for Better-paid Job?
Speaker: Runze Li (李润泽)
Time: Friday, December 9, 2022, 10:30-11:30
Venue: Conference Room 644, Comprehensive Building (综合楼)
Online platform: Tencent Meeting 879-255-966
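About the speaker:
Professor Runze Li is a chair professor in the Department of Statistics at The Pennsylvania State University, USA. He received his PhD in statistics from the University of North Carolina at Chapel Hill in 2000 and has taught at Penn State ever since. His research interests are broad and include variable selection for high-dimensional data, feature screening for ultrahigh-dimensional data, semiparametric and nonparametric regression analysis, and applications of statistics in social science and neuroscience. He has published more than 100 high-quality papers in leading international statistics venues and has led several national-level research projects. He is a Fellow of the IMS, ASA, and AAAS, and has served as editor or associate editor of leading international journals such as AOS and JASA, including as editor of the Annals of Statistics from 2013 to 2015.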
Abstract:
Quantifying differences in returns to skills from online job advertisement data is an important problem that has attracted great interest in both labor economics and statistics. In this paper, we study the relationship between the posted salary and the job requirements in online labor markets. There are two challenges. First, the posted salary is always presented in interval-valued form, for example, 5k-10k yuan per month. Simply substituting the midpoint or the lower bound for the salary may result in biased estimators. Second, the number of potential skill words generated as predictors from the job advertisements by word segmentation is very large, and many of them may not contribute to the salary. To address both challenges, we propose a new feature screening method, Absolute Distribution Difference Sure Independence Screening (ADD-SIS), to select important skill words for the interval-valued response. The marginal utility for feature screening is based on the difference of estimated distribution functions obtained via nonparametric maximum likelihood estimation, which makes full use of the interval information. The method is model-free and robust to outliers. Numerical simulations show that the new method, by using the interval information, selects important predictors more efficiently than methods based only on single points of the intervals. In the real data application, we study the text of job advertisements for data scientists and data analysts on a major Chinese online job posting website and explore which skill words matter for the salary. We find that skill words such as optimization, long short-term memory (LSTM), convolutional neural networks (CNN), and collaborative filtering are positively correlated with the salary, while words such as Excel, Office, and data collection may contribute negatively to the salary.
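To make the screening idea in the abstract concrete, the following is a minimal Python sketch and not the authors' ADD-SIS implementation. It assumes each skill word is a binary indicator, estimates the salary distribution function within the "word present" and "word absent" groups by a self-consistency (EM) iteration for interval-censored data, and scores each word by the average absolute difference between the two estimated distribution functions. The function names `npmle_cdf` and `screening_utility`, the choice of support points, the averaging scheme, and the simulated data are all assumptions made for illustration; the exact marginal utility used in the talk may differ.

```python
import numpy as np


def npmle_cdf(lower, upper, support, n_iter=500, tol=1e-8):
    """Estimate a CDF on `support` from closed intervals [lower_i, upper_i].

    Uses a self-consistency (EM) iteration with all probability mass
    restricted to the given support points. Every interval must contain
    at least one support point.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    support = np.asarray(support, dtype=float)
    # inside[i, k] = 1 if support point k lies in interval i
    inside = ((support[None, :] >= lower[:, None])
              & (support[None, :] <= upper[:, None])).astype(float)
    p = np.full(support.size, 1.0 / support.size)
    for _ in range(n_iter):
        lik = inside @ p  # probability currently assigned to each interval
        p_new = p * (inside.T @ (1.0 / lik)) / lower.size
        done = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if done:
            break
    return np.cumsum(p)


def screening_utility(lower, upper, x):
    """Hypothetical marginal utility for one binary predictor x:
    mean absolute difference between the two group-wise estimated CDFs."""
    lower, upper, x = (np.asarray(a) for a in (lower, upper, x))
    has_word = x == 1
    if has_word.all() or not has_word.any():
        return 0.0  # one group is empty, so there is nothing to compare
    # Pooled interval endpoints serve as the common evaluation grid
    support = np.unique(np.concatenate([lower, upper]))
    f1 = npmle_cdf(lower[has_word], upper[has_word], support)
    f0 = npmle_cdf(lower[~has_word], upper[~has_word], support)
    return float(np.mean(np.abs(f1 - f0)))


# Simulated example (not real job-posting data): only word 0 shifts the salary.
rng = np.random.default_rng(0)
n, p = 600, 10
X = rng.integers(0, 2, size=(n, p))
latent = 10.0 + 6.0 * X[:, 0] + rng.normal(0.0, 2.0, size=n)
lower = np.floor(latent - rng.uniform(0.0, 3.0, size=n))  # posted lower bound
upper = np.ceil(latent + rng.uniform(0.0, 3.0, size=n))   # posted upper bound

utilities = [screening_utility(lower, upper, X[:, j]) for j in range(p)]
ranking = np.argsort(utilities)[::-1]  # words sorted by decreasing utility
print(ranking[:3])
```

In a screening step one would keep only the top-ranked words for further modeling. The sketch works with the whole interval rather than a midpoint or lower bound, which is the point the abstract emphasizes.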