Some projects find it useful to transcribe beyond the spoken word. This can take a number of different forms.
- Transana supports Jeffersonian Notation, a formal system of codes that provide information in the transcript about a variety of speech characteristics, such as emphasis, intonation, pace, and interactional components such as interruptions and overlapping speech. (A number of other notation systems are also possible using standard keyboard characters.)

- Transana can be used with a variety of behavioral or gestural coding schemes implemented via transcripts. Researchers who study animal behavior may have elaborate transcripts which help them locate instances of particular behaviors and postures they wish to explore further.
- Some researchers create multiple transcripts to capture different analytic layers. One group studying mathematics instruction creates independent verbal and gestural transcripts, then displays both simultaneously to look at both layers of instruction and their interaction. Another research team analyzes multiple overlapping layers of student-produced media products, looking at such issues as sound, editing, and cinematography as separate analytic layers operationalized through multiple simultaneous transcripts.
Transana allows you to enter "time codes" as part of the process of creating your transcript(s). Time codes link particular positions in the transcript with the corresponding positions in the media file, effectively linking the video and the transcript. This has several important implications.
First, as the video plays, Transana highlights the transcript text corresponding to the part of the video being displayed. If the audio is unclear or includes overlapping talk by several people, these spots in the video can be teased out during transcription and the different parts are easily accessible thereafter.
Second, if you find an interesting passage in the transcript, you can instantly call up the video associated with that passage. Many researchers find it easier and faster to identify interesting passages in the transcript than in the video, and the time codes keep the associated video available for review at all times. This lets you remain extremely close to your source data. (This concept of remaining close to your source data is central to Transana's design and is reflected in a number of ways we will discuss later in the guided tour.)
Finally, the analytic act of placing time codes helps to delineate boundaries in the video, to signal when a particular segment of video you want to study further begins or ends.
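For readers who think in terms of data structures, the two-way link described above can be pictured as a sorted list of (media position, transcript position) pairs. The following Python sketch is illustrative only and is not Transana's actual implementation; the names TIME_CODES, segment_for_time, and time_for_offset, the millisecond and character-offset units, and the sample values are all assumptions made for this example.

```python
from bisect import bisect_right

# Hypothetical time codes: (media position in milliseconds,
# character offset in the transcript), sorted by media position.
TIME_CODES = [(0, 0), (4200, 58), (9750, 131), (15300, 207)]


def segment_for_time(time_codes, position_ms):
    """Return (start_offset, end_offset) of the transcript text that
    corresponds to a media position. end_offset is None for the last
    segment, which runs to the end of the transcript."""
    times = [t for t, _ in time_codes]
    i = bisect_right(times, position_ms) - 1  # last time code at or before position
    if i < 0:
        return None  # position precedes the first time code
    start = time_codes[i][1]
    end = time_codes[i + 1][1] if i + 1 < len(time_codes) else None
    return start, end


def time_for_offset(time_codes, char_offset):
    """Return the media position (ms) of the time code that opens the
    segment containing a given transcript character offset."""
    offsets = [o for _, o in time_codes]
    i = bisect_right(offsets, char_offset) - 1
    if i < 0:
        return None
    return time_codes[i][0]
```

In this sketch, segment_for_time models highlighting the transcript text as the video plays, and time_for_offset models jumping from a transcript passage to the associated video. More frequent time codes yield shorter segments and therefore a finer-grained link between the two.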
Transcription is an analytic act which requires many analytic decisions. How much transcription is necessary, at what level of detail, for each research project? What part of transcription can be hired out to transcriptionists, and how much needs to be done by trained researchers? Where do particular interesting acts begin and end? How frequently should one insert time codes for a particular piece of video?
It should be noted that Transana does not attempt to use voice recognition technology to create transcripts. First, as the above discussion indicates, there are many types of transcription, most of which would not benefit from voice recognition. Second, voice recognition technology does not seem to have reached the level where it is capable of handling the kinds of data most people analyze in Transana. Voice recognition typically requires a single speaker who has spent time training the software, good audio quality, and no overlapping speech or background noise. Very little research video or audio meets any of these criteria, let alone all of them. As a result, Transana has tools that facilitate manual transcription but does not attempt automated transcription of any type.
As Harrie Mazeland notes, transcription can be the primary, and indeed the only, analytic act for some researchers using Transana. It is primarily through careful transcription (taking about 60 hours to transcribe each hour of video) and conversation analysis of those transcripts that Dr. Mazeland arrives at a theoretical understanding of his video data.
