ELEC Oral Comprehensive Exam for Doctoral Candidacy by Yingjie Wang (ECE)
When: Friday, January 26, 2024, 11:00 AM - 1:00 PM
Where: See description for location
Cost: Free
Description: Topic: Resource Efficient Distributed Computing
Location: Lester W. Cory Conference Room, Science & Engineering Building (SENG), Room 213A
Zoom Conference Link: https://umassd.zoom.us/j/92590350858 Meeting ID: 925 9035 0858
Passcode: 201805
Abstract:
There is a surge of interest in distributed computing, thanks to advances in clustered computing and big data technology. My research explores topics in machine learning and big data related to learning under decentralized resources. Two topics have been studied. The first focuses on distributing large-scale centralized computation to clustered or multi-core computers. Inspired by the success of tree-based methods and ensemble methods over the last decades, a method termed random projection forests (rpForests) is proposed for fast kNN search. Experiments demonstrate that rpForests achieves remarkable accuracy: both the miss rate of kNNs and the discrepancy in the k-th nearest neighbor distances decay rapidly. The ensemble nature of rpForests makes it easy to parallelize on clustered or multi-core computers; the running time is shown to be nearly inversely proportional to the number of cores or machines.
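For illustration only, the following Python sketch shows the general flavor of an rpForest-style ensemble kNN search: several random-projection trees are built independently (and could therefore be distributed across cores or machines), each tree routes a query to a small leaf of candidate points, and the union of candidates across trees is ranked by true distance. The function names and parameters here (build_rp_tree, leaf_size, n_trees, etc.) are hypothetical and are not taken from the candidate's implementation.

import numpy as np

def build_rp_tree(X, idx, leaf_size=20, rng=None):
    # Recursively split points by a random projection until leaves are small.
    rng = rng or np.random.default_rng()
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    direction = rng.normal(size=X.shape[1])
    direction /= np.linalg.norm(direction)
    proj = X[idx] @ direction
    threshold = np.median(proj)
    left, right = idx[proj <= threshold], idx[proj > threshold]
    if len(left) == 0 or len(right) == 0:   # degenerate split; stop here
        return ("leaf", idx)
    return ("node", direction, threshold,
            build_rp_tree(X, left, leaf_size, rng),
            build_rp_tree(X, right, leaf_size, rng))

def query_tree(tree, x):
    # Follow the random projections down to a leaf and return its indices.
    while tree[0] == "node":
        _, direction, threshold, left, right = tree
        tree = left if x @ direction <= threshold else right
    return tree[1]

def rp_forest_knn(X, x, k=5, n_trees=10, leaf_size=20, seed=0):
    # Trees are independent, so this loop is embarrassingly parallel.
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    candidates = set()
    for _ in range(n_trees):
        tree = build_rp_tree(X, idx, leaf_size, rng)
        candidates.update(query_tree(tree, x).tolist())
    cand = np.array(sorted(candidates))
    dists = np.linalg.norm(X[cand] - x, axis=1)
    return cand[np.argsort(dists)[:k]]

X = np.random.default_rng(1).normal(size=(2000, 16))
print(rp_forest_knn(X, X[0], k=5))

Because each tree is built and queried independently, running more trees on more cores or machines reduces wall-clock time roughly in proportion to the number of workers, which is the parallelism property the abstract highlights.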
The second topic treats the data in machine learning as a computing resource. Increasingly often, data are located at distributed sites, and we wish to learn over data from all the sites with low communication overhead. For spectral clustering, a novel framework is proposed that enables computation over data from all the physical nodes with minimal communication overhead while achieving a major speedup in computation. Experiments on synthetic and large UC Irvine datasets show almost no loss in accuracy with the proposed approach, along with a 2x speedup under various settings with two distributed sites. Because the transmitted data need not be in their original form, the framework also addresses privacy concerns about data sharing in distributed computing.
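As a rough illustration of the communication-efficient idea (not the proposed framework itself), the sketch below has each site transmit only a small set of local k-means representatives instead of raw records; a coordinator runs spectral clustering on the pooled representatives, and each site then labels its own points locally. Function names and parameters (summarize_site, n_reps, etc.) are hypothetical, and scikit-learn's KMeans and SpectralClustering are used only for brevity.

import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def summarize_site(X_site, n_reps=50, seed=0):
    # Compress a site's data into a few representatives; only these are transmitted.
    km = KMeans(n_clusters=n_reps, n_init=5, random_state=seed).fit(X_site)
    return km.cluster_centers_

def distributed_spectral_clustering(sites, n_clusters=3, n_reps=50):
    reps = [summarize_site(X, n_reps=min(n_reps, len(X))) for X in sites]
    all_reps = np.vstack(reps)   # only representatives cross the network
    rep_labels = SpectralClustering(n_clusters=n_clusters, affinity="rbf",
                                    assign_labels="kmeans",
                                    random_state=0).fit_predict(all_reps)
    # Each site assigns its own points to the nearest representative locally.
    site_labels = []
    for X in sites:
        d = np.linalg.norm(X[:, None, :] - all_reps[None, :, :], axis=2)
        site_labels.append(rep_labels[d.argmin(axis=1)])
    return site_labels

rng = np.random.default_rng(0)
site_a = rng.normal(loc=0.0, size=(500, 2))
site_b = rng.normal(loc=5.0, size=(500, 2))
labels = distributed_spectral_clustering([site_a, site_b], n_clusters=2)
print([np.bincount(l) for l in labels])

Because only compact summaries (rather than raw records) leave each site, this style of scheme keeps communication low and offers some privacy protection, which mirrors the two benefits the abstract attributes to the proposed framework.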
My future research will explore distributed computing with deep neural networks (DNNs). One direction is to improve the communication efficiency of moving data between a centralized server and the physical nodes during distributed DNN training. Another is to speed up training on distributed nodes by designing better data parallelism approaches.
Advisor(s): Dr. Honggang Wang, Professor, Department of Electrical & Computer Engineering; Dr. Donghui Yan, Associate Professor, Department of Mathematics, UMASS Dartmouth
Committee Members: Dr. Liudong Xing, Professor, Department of Electrical & Computer Engineering, UMASS Dartmouth;
Dr. Ping Chen, Professor, Department of Engineering, UMASS Boston
NOTE: All ECE Graduate Students are ENCOURAGED to attend.
All interested parties are invited to attend. Open to the public.
*For further information, please contact Dr. Honggang Wang via email at hwang1@umassd.edu.
Contact: See description for contact information
Topical Areas: General Public, University Community, College of Engineering, Electrical and Computer Engineering