Soumya Ray
(Ph. D., University of Wisconsin-Madison, 2005)
Assistant Professor
Department of Electrical Engineering and Computer Science
Case Western Reserve University
Office: Olin 516
Email: sray AT case_edu
Mailing Address: Department of EECS, Glennan 320, 10900 Euclid Ave, Cleveland OH 44106-7071
Research Areas: Artificial Intelligence, Machine Learning, Reinforcement Learning and Planning
Teaching
I teach undergraduate AI (EECS 391) in the spring and graduate machine learning (EECS 440) in the fall. I have also co-taught graduate AI (EECS 491) in the fall; however, this course is being split into two courses starting Fall 2013. I also coordinate the EECS 500 colloquium series each semester.
I am the current AI minor advisor.
Current teaching: Spring 2013: EECS 391 (Introduction to Artificial Intelligence)
If you are interested in machine learning research, consider joining my Machine Learning Reading Group. Email me if you want to know what we are currently reading. Please note that you will need to have taken EECS 440, EECS 491, or an equivalent course to contribute effectively.
Students
- Gary Doran (Ph.D.)
- Kai Liang (MS)
- Nathan McKinley (MS)
- Scott Sosnowski (MS)
- Tyler Goeringer (MS)
- Feng Cao (MS, graduated Spring 2012, at Amazon Inc.)
Thesis: Classification, Detection and Prediction of Adverse and Anomalous Events in Medical Robots
- Tim Ernsberger (MS, graduated Fall 2012, at Amazon Inc.)
Thesis: A Hierarchical Integration of Planning and Reinforcement Learning
- Howie Richmond (MS, graduated Fall 2011, at MIM Software, co-supervised with Andy Podgurski)
Thesis: Knowledge Transfer for Incremental Fault Localization
Current Research
- Bayesian Hierarchical Reinforcement Learning
Humans manage complex tasks by decomposing them and by drawing on prior knowledge gained from solving other tasks. How can we integrate these ideas into autonomous agents? This research seeks to answer this question. (Students: Kai Liang, Feng Cao)
- Kernel Methods for Multiple-Instance Learning
Kernels are powerful and flexible representation transformations of feature vectors that enable nonlinear classifiers to be learned efficiently in a supervised setting. How do kernels behave when applied to multiple-instance (MI) data, where each example is a set of feature vectors rather than a single vector? A variety of surprising behaviors emerge in this case. This research seeks to characterize these behaviors and to understand the strengths and weaknesses of kernel methods for MI data. (Student: Gary Doran)
- Machine Learning for Software Engineering
The scale and complexity of modern software make it prone to bugs. Automated techniques using machine learning can help developers quickly locate and remove faults during testing and debugging. This research seeks to develop automated and collaborative methods to detect software defects and improve software reliability. With Andy Podgurski. (Students: Boya Sun, Gang Shu, Zhuofu Bai, Howie Richmond)
- Machine Learning for Automated Stent Detection in OCT images
Millions of people receive stent implants as treatment for coronary artery disease. Afterwards, the stents are imaged to determine whether complications have arisen that require further intervention. Currently, radiologists process these intravascular Optical Coherence Tomography (iOCT) images manually, which takes 6 to 16 hours per stent. We are developing automated machine learning techniques that can help radiologists process these images in an hour or less, reducing both cost and errors. With David Wilson. (Student: Hong Lu, CWRU Biomedical Engineering)
- Preventing Adverse and Anomalous Events in Cyber-Physical Systems
Many modern robotic systems are used in situations where reliability is critical, but how to estimate the reliability of these complex systems under various circumstances is not well understood. We are designing processes to estimate the reliability of these systems and, in particular, to detect and prevent adverse and anomalous (A&A) events in medical robots online. With Andy Podgurski and Cenk Cavusoglu. Supported by NSF. (Students: Kai Liang, Feng Cao, Zhuofu Bai, Mark Renfrew)
Previous Research
- Machine Learning for Spam Filtering
We worked on adaptive methods for spam detection. Our goal was to develop methods that detect spam messages as early as possible in network traffic, thereby saving bandwidth and decreasing congestion. With Michael Rabinovich and Mark Allman. (Student: Tu Ouyang)
- Knowledge Transfer in Reinforcement Learning
"Transfer learning" focuses on methods that effectively transfer knowledge acquired on one task to help solve a different task. We developed techniques that transfer knowledge between different sequential decision processes, using real-time strategy games as our testbeds. With Alan Fern, Prasad Tadepalli and Tom Dietterich. (Students: Neville Mehta and Aaron Wilson)
- Efficient Learning for Hard Boolean Functions
Certain Boolean functions, such as parity, are known to be hard to learn efficiently. In this work, we demonstrated that the hardness of learning these functions is linked to the input distribution of the data: if the input distribution is "significantly different" from the uniform distribution, these functions may be efficiently learnable. Based on this observation, we developed a method called Skewing that is often able to learn such functions efficiently, given enough observations. With David Page, Lisa Hellerstein, Bard Rosell, Eric Lantz and Eric Bach.
- Learning from Multiple-Instance Data
In standard supervised learning, examples are described by a tuple of attribute-value pairs. In some problems, such as predicting the binding affinities of small molecules to a target protein, examples are described by sets of such tuples. We have developed new algorithms for classification and regression from such "multiple-instance" data, and shown that the straightforward extension of linear regression to this setting is NP-complete. With Mark Craven, David Page and Burr Settles.
- Information Extraction from Free Text
Information extraction is the task of creating structured relations out of free text. In our work, we have developed statistical methods for doing this that also incorporate grammatical information about sentences obtained using an automated parser, Sundance. With Mark Craven and Marios Skounakis.
- Machine Learning for Question Answering
Question answering systems are designed to provide accurate responses to short factual questions asked in natural language. In our work, we have developed a method that the system can use to learn from past questions to improve accuracy on future questions. With Eric Brill. (This work is not publicly available.)
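The observation behind Skewing, that parity-like functions look uninformative one variable at a time under the uniform distribution but not under a skewed one, can be checked exactly on a tiny case. The sketch below is only an illustration of that effect, not the Skewing algorithm itself: it computes Pr[f=1 | x_i=1] - Pr[f=1 | x_i=0] for f = x0 XOR x1 over three Boolean variables (x2 is irrelevant), under a product distribution in which every variable is independently 1 with probability p. The function names, the target function, and the values of p are hypothetical choices for the example.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def parity(x: Sequence[int], relevant: Sequence[int]) -> int:
    """Parity (XOR) of the variables whose indices are listed in `relevant`."""
    return sum(x[i] for i in relevant) % 2


def conditional_gap(f: BoolFn, n: int, i: int, p: float) -> float:
    """Exact Pr[f=1 | x_i=1] - Pr[f=1 | x_i=0] under the product distribution
    where each of the n bits is independently 1 with probability p (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    mass = {0: 0.0, 1: 0.0}      # Pr[x_i = v]
    positive = {0: 0.0, 1: 0.0}  # Pr[x_i = v and f(x) = 1]
    for x in product((0, 1), repeat=n):  # enumerate all 2^n assignments
        ones = sum(x)
        weight = p ** ones * (1 - p) ** (n - ones)
        mass[x[i]] += weight
        if f(x) == 1:
            positive[x[i]] += weight
    return positive[1] / mass[1] - positive[0] / mass[0]


if __name__ == "__main__":
    f = lambda x: parity(x, (0, 1))  # x2 is irrelevant to f
    print([conditional_gap(f, 3, i, 0.5) for i in range(3)])   # [0.0, 0.0, 0.0]
    print([conditional_gap(f, 3, i, 0.75) for i in range(3)])  # [-0.5, -0.5, 0.0]
```

Under the uniform distribution (p = 0.5) every variable, relevant or not, has a gap of zero, so a greedy learner that scores variables one at a time has no signal. With p = 0.75 the two relevant variables show a gap of 1 - 2p = -0.5 while the irrelevant one stays at zero, which is the kind of distributional difference the text above refers to.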
Publications
- G. Doran and S. Ray (2013).
SMILe: Shuffled Multiple-Instance Learning.
To appear in the Proceedings of the 27th AAAI Conference on Artificial Intelligence, Bellevue, Washington, USA.
- K. Liang, F. Cao, Z. Bai, M. Renfrew, M. Cenk Cavusoglu, A. Podgurski and S. Ray (2013).
Detection and Prediction of Adverse and Anomalous Events in Medical Robots.
To appear in the Proceedings of the 25th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), Bellevue, Washington, USA.
- S. Sosnowski, T. Ernsberger, F. Cao and S. Ray (2013).
SEPIA: A Scalable Game Environment for Artificial Intelligence Teaching and Research.
To appear in the Proceedings of the Fourth Symposium on Educational Advances in Artificial Intelligence (EAAI), Bellevue, Washington, USA.
- H. Lu, M. Gargesha, Z. Wang, D. Chamie, G. F. Attizzani, T. Kanaya, S. Ray, M. A. Costa, A. M. Rollins, H. G. Bezerra and D. L. Wilson (2013).
Automatic stent strut detection in intravascular OCT images using image processing and classification technique.
Appears in the Proceedings of SPIE Medical Imaging: Computer-Aided Diagnosis, vol. 8670, eds. Carol Novak and Stephen Aylward.
- F. Cao and S. Ray (2012).
Bayesian Hierarchical Reinforcement Learning.
Appears in the Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, Nevada, USA.
- H. Lu, M. Gargesha, Z. Wang, D. Chamie, G. F. Attizzani, T. Kanaya, S. Ray, M. A. Costa, A. M. Rollins, H. G. Bezerra and D. L. Wilson (2012).
Automatic stent detection in intravascular OCT images using bagged decision trees.
Biomedical Optics Express, vol. 3, no. 11 (Nov): 2809-2824.
- M. Ruffalo, M. Koyuturk, S. Ray and T. LaFramboise (2012).
Accurate Estimation of Short Read Mapping Quality for Next Generation Genome Sequencing.
Appears in the Proceedings of the 11th European Conference on Computational Biology, Basel, Switzerland. (Also published in a special issue of Bioinformatics.)
- B. Sun, G. Shu, A. Podgurski and S. Ray (2012).
CARIAL: Cost-Aware Software Reliability Improvement with Active Learning.
In Proceedings of the Fifth IEEE International Conference on Software Testing, Verification and Validation (ICST), Montreal, Canada.
- N. Mehta, S. Ray, P. Tadepalli and T. Dietterich (2011).
Automatic Discovery and Transfer of Task Hierarchies in Reinforcement Learning.
Appears in AI Magazine, special issue on Transfer of Structured Knowledge. (reviewed article)
- T. Ouyang, S. Ray, M. Rabinovich and M. Allman (2011).
Can Network Characteristics Detect Spam Effectively in a Stand-Alone Enterprise?
In Proceedings of the 12th Passive and Active Measurement Conference, Atlanta, GA, USA.
- B. Sun, A. Podgurski and S. Ray (2010).
Improving the Precision of Dependence-Based Defect Mining by Supervised Learning of Rule and Violation Graphs.
In Proceedings of the 21st IEEE International Symposium on Software Reliability Engineering, San Jose, CA, USA.
- L. Hellerstein, B. Rosell, E. Bach, S. Ray and D. Page (2009).
Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions.
Journal of Machine Learning Research, vol. 10 (Oct): 2374-2411.
Eric Bach's paper on improved bounds for the number of correlation-immune functions.
- N. Mehta, S. Ray, P. Tadepalli and T. Dietterich (2008).
Automatic Discovery and Transfer of MAXQ Hierarchies.
Appears in the Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.
- B. Settles, M. Craven and S. Ray (2007).
Multiple-Instance Active Learning.
Appears in the Proceedings of the 21st Conference on Neural Information Processing Systems, Vancouver, BC, Canada.
- H. Chan, A. Fern, S. Ray, N. Wilson and C. Ventura (2007).
Online Planning for Resource Production in Real-Time Strategy Games.
Appears in the Proceedings of the 17th International Conference on Automated Planning & Scheduling, Providence, RI, USA.
- E. Lantz, S. Ray and D. Page (2007).
Learning Bayesian Network Structure from Correlation-Immune Data.
Appears in the Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, Vancouver, BC, Canada.
- A. Wilson, A. Fern, S. Ray and P. Tadepalli (2007).
Multi-task Reinforcement Learning: A Hierarchical Bayesian Approach.
Appears in the Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA.
- J. Davis, V. S. Costa, S. Ray and D. Page (2007).
An Integrated Approach to Feature Invention and Model Construction for Drug Activity Prediction.
Appears in the Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA.
- S. Ray (2005).
Learning from Data with Complex Interactions and Ambiguous Labels.
PhD thesis, Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA.
- S. Ray & M. Craven (2005).
Supervised versus Multiple-Instance Learning: An Empirical Comparison.
Appears in the Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany.
- B. Rosell, L. Hellerstein, S. Ray & D. Page (2005).
Why Skewing works: Learning Difficult Boolean Functions with Greedy Tree Learners.
Appears in the Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany.
- S. Ray & D. Page (2005).
Generalized Skewing for Functions with Continuous and Nominal Attributes.
Appears in the Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany.
- S. Ray & M. Craven (2005).
Learning Statistical Models for Annotating Proteins with Function Information using Biomedical Text.
Appears in BMC Bioinformatics, vol. 6 (Suppl 1).
- S. Ray & D. Page (2004).
Sequential Skewing: An Improved Skewing Algorithm.
Appears in the Proceedings of the 21st International Conference on Machine Learning, Banff, Canada.
- D. Page & S. Ray (2003).
Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction.
Appears in the Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico.
- M. Skounakis, M. Craven & S. Ray (2003).
Hierarchical Hidden Markov Models for Information Extraction.
Appears in the Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico.
- S. Ray & M. Craven (2001).
Representing Sentence Structure in Hidden Markov Models for Information Extraction.
Appears in the Proceedings of the 17th International Joint Conference on Artificial Intelligence, Seattle, WA, USA.
- S. Ray & D. Page (2001).
Multiple Instance Regression.
Appears in the Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, USA.
Workshop Publications
- A. Wilson, A. Fern, S. Ray and P. Tadepalli (2008).
Learning and Transferring Roles in Multi-Agent MDPs.
Transfer Learning for Complex Tasks Workshop, 23rd AAAI Conference on Artificial Intelligence, Chicago, USA.
- N. Mehta, M. Wynkoop, S. Ray, P. Tadepalli and T. Dietterich (2007).
Automatic Induction of MAXQ Hierarchies.
Hierarchical Organization of Behavior Workshop, 21st Conference on Neural Information Processing Systems, Vancouver, BC, Canada.
- H. Chan, A. Fern, S. Ray, N. Wilson and C. Ventura (2007).
Extending Online Planning for Resource Production in Real-Time Strategy Games with Search.
Workshop on Planning in Games, ICAPS 2007, Providence, RI, USA.
Contributed Chapters
- S. Ray, S. Scott and H. Blockeel (2009). Multiple Instance Learning.
Encyclopedia of Machine Learning, eds. C. Sammut and G. Webb. Springer. ISBN: 978-0-387-30768-8.
- S. Ray and P. Tadepalli (2009). Model-based Reinforcement Learning.
Encyclopedia of Machine Learning, eds. C. Sammut and G. Webb. Springer. ISBN: 978-0-387-30768-8.
Posters and Technical Reports
- G. Doran and S. Ray (2012). Kernel methods for Multiple-Instance Learning. Poster at the 29th International Conference on Machine Learning, Edinburgh, Scotland.
- C-Y. Wu, S. Ray, K. P. Barry and A. I. Jack (2010). On Functional Annotation of the Human Brain by Combining Resting State Connectivity and Activation Foci. In the 40th Annual Meeting of the Society for Neuroscience, San Diego, CA, USA (poster).
- T. Ouyang, S. Ray, M. Allman and M. Rabinovich (2009). A Large Scale Empirical Analysis of Email Spam Detection Through Transport Level Characteristics. Technical Report TR 10-001, International Computer Science Institute, CA, USA.
Last update 5/16/2013 by Soumya Ray