dc.contributor.advisor | Alex P. Pentland. | en_US
dc.contributor.author | Basu, Sumit | en_US
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2005-10-14T19:33:45Z
dc.date.available | 2005-10-14T19:33:45Z
dc.date.copyright | 2002 | en_US
dc.date.issued | 2002 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/29270
dc.description | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. | en_US
dc.description | Includes bibliographical references (p. 106-109). | en_US
dc.description.abstract | In this thesis, we develop computational tools for analyzing conversations based on nonverbal auditory cues. We develop a notion of conversations as being made up of a variety of scenes: in each scene, either one speaker is holding the floor or both are speaking at equal levels. Our goal is to find conversations, find the scenes within them, determine what is happening inside the scenes, and then use the scene structure to characterize entire conversations. We begin by developing a series of mid-level feature detectors, including a joint voicing and speech detection method that is extremely robust to noise and microphone distance. Leveraging the results of this powerful mechanism, we develop a probabilistic pitch tracking mechanism, methods for estimating speaking rate and energy, and means to segment the stream into multiple speakers, all in significant noise conditions. These features give us the ability to sense the interactions and characterize the style of each speaker's behavior. We then turn to the domain of conversations. We first show how we can very accurately detect conversations from independent or dependent auditory streams with measures derived from our mid-level features. We then move to developing methods to accurately classify and segment a conversation into scenes. We also show preliminary results on characterizing the varying nature of the speakers' behavior during these regions. Finally, we design features to describe entire conversations from the scene structure, and show how we can describe and browse through conversation types in this way. | en_US
dc.description.statementofresponsibility | by Sumit Basu. | en_US
dc.format.extent | 109 leaves | en_US
dc.format.extent | 5478355 bytes
dc.format.extent | 5478164 bytes
dc.format.mimetype | application/pdf
dc.format.mimetype | application/pdf
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Conversational scene analysis | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph.D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc | 52052659 | en_US

