Special seminar series in Applied Statistics and Data Science

When: Tuesday, April 26, 2022
4:00 PM - 5:00 PM
Where: See description for location
Description: Featuring 2 talks by our own undergraduate researchers in Applied Statistics and Data Science.


DATE: Apr 26, 2022 (Tuesday)

TIME: 4:00-5:00 PM (ET)

LOCATION: Join Zoom Meeting
https://umassd.zoom.us/j/93625685340?pwd=cjdUZlFjQTZJYlczbXZaMnFQQWM2dz09

Meeting ID: 936 2568 5340
Passcode: 714640



TALK 1: Improving Natural Language Classification With Augmented Data From GPT-3

SPEAKER: Sal Balkus

Abstract:
GPT-3 is a large-scale natural language model developed by OpenAI that can perform many different tasks, including topic classification. Although it is billed as a "few-shot learning" model that needs only a small number of in-context examples to learn a task, in practice the examples must be either of exceptional quality or more numerous than can easily be created by hand. To address this issue, we teach GPT-3 to classify whether a question is related to data science using a set of in-context examples augmented by its own generative capabilities; that is, we generate additional examples using GPT-3 itself. This study compares two classifiers: the GPT-3 Classification endpoint, and the GPT-3 Completion endpoint with optimal in-context examples chosen via a genetic algorithm. We find that, while the Completion endpoint achieves upwards of 80 percent testing accuracy, using the Classification endpoint with an augmented example set yields far improved accuracy during validation.
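For readers unfamiliar with the idea, choosing in-context examples via a genetic algorithm could be sketched roughly as follows. This is an illustrative sketch only, not the speaker's implementation: the function name `select_examples`, its parameters, and the selection, crossover, and mutation rules are all assumptions, and `toy_fitness` is a hypothetical stand-in for the accuracy a model would achieve when prompted with the chosen examples.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-example "quality" scores; a real fitness function would
# instead measure classification accuracy obtained with the chosen examples.
QUALITY = [i % 7 for i in range(20)]


def toy_fitness(subset: Sequence[int]) -> float:
    """Stand-in fitness: sum of the hypothetical quality scores."""
    return float(sum(QUALITY[i] for i in subset))


def select_examples(
    pool_size: int,
    k: int,
    fitness: Callable[[Sequence[int]], float],
    generations: int = 30,
    population_size: int = 20,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Evolve a subset of k example indices drawn from range(pool_size).

    Assumes pool_size > k and population_size >= 4.
    Returns the best subset found and the best fitness seen per generation.
    """
    rng = random.Random(seed)
    population = [rng.sample(range(pool_size), k) for _ in range(population_size)]
    history: List[float] = []
    for _ in range(generations):
        # Rank candidates; the top half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]
        children: List[List[int]] = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: draw k distinct indices from the parents' union.
            child = rng.sample(sorted(set(a) | set(b)), k)
            # Mutation: occasionally swap one index for an unused one.
            if rng.random() < mutation_rate:
                unused = [i for i in range(pool_size) if i not in child]
                child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_subset, best_per_generation = select_examples(len(QUALITY), 4, toy_fitness)
```

Because the top half of each generation is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. In a real setting each fitness evaluation would require querying the model, so the population size and number of generations would likely be limited by cost.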



----------------------------------------------------------

TALK 2: Generating Cancer Images to Improve Cancer Diagnosis Accuracy

SPEAKER: Ben Pfeffer

Abstract:
The lack of publicly available medical data has limited the use of Artificial Intelligence in the medical field. Generative Adversarial Networks, or GANs, have been used to create similar but novel images from real images. Hence, GANs may be used to produce images that augment a small amount of publicly available medical data in a way that improves the accuracy of Artificial Intelligence. Here, breast cancer tissue images and their corresponding h-scores were scraped from Stanford's TMA database and converted into gray-level co-occurrence matrices (GLCMs), which were then read by a GAN for each h-score to produce novel images for that score. Then, the base and generated data were used to train an Ordinal-Convolutional Neural Network (O-CNN), whose results were compared to those of a network trained without the generated data. The O-CNN whose training data consisted of the generated images as well as the base images had a higher accuracy and a lower mean squared error (MSE) than the O-CNN whose training data consisted only of the base images. These results provide a method for improving the performance of Artificial Intelligence on smaller datasets and may offer a slight improvement when dealing with the lack of publicly available medical data.
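As background on the GLCM step mentioned in the abstract, a co-occurrence matrix counts how often a pixel with gray level i has a neighbor with gray level j at a fixed offset. The sketch below is a minimal NumPy illustration under assumed settings; the abstract does not state the offset, the number of gray levels, or the software used, and the function name `glcm` and its parameters are hypothetical.

```python
import numpy as np


def glcm(image, levels, dy=0, dx=1, normalize=True):
    """Co-occurrence matrix for one pixel offset (dy, dx).

    `image` must hold integer gray levels in the range [0, levels).
    Entry [i, j] counts pixels of level i whose neighbor at the given
    offset has level j; with normalize=True the counts sum to 1.
    """
    image = np.asarray(image)
    h, w = image.shape
    # Reference pixels and their neighbors at (y + dy, x + dx),
    # cropped so that both views stay inside the image.
    src = image[max(0, -dy): h - max(0, dy), max(0, -dx): w - max(0, dx)]
    dst = image[max(0, dy): h - max(0, -dy), max(0, dx): w - max(0, -dx)]
    counts = np.zeros((levels, levels), dtype=np.float64)
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(counts, (src.ravel(), dst.ravel()), 1)
    if normalize:
        counts /= counts.sum()
    return counts


# Tiny made-up 3x3 "image" with three gray levels (not real tissue data).
example = [[0, 0, 1],
           [1, 2, 2],
           [0, 1, 2]]
raw_counts = glcm(example, levels=3, normalize=False)
probabilities = glcm(example, levels=3)
```

For the made-up example, the horizontal pairs are (0,0), (0,1), (1,2), (2,2), (0,1), and (1,2), so `raw_counts` has six entries in total. Real tissue images would first need to be converted to grayscale and quantized to a fixed number of levels.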

All are welcome!
For more information, please contact Donghui Yan (dyan@umassd.edu).
Contact: See description for contact information
Topical Areas: University Community, Mathematics, Computer and Information Science, Research, Undergraduate Research, Lectures and Seminars