List of Accepted Papers

The following is the list of accepted ASRU 2017 papers, sorted by paper title. You can use your web browser's search feature to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at papers@asru2017.org.

1212 A CONTEXT-AWARE SPEECH RECOGNITION AND UNDERSTANDING SYSTEM FOR AIR TRAFFIC CONTROL DOMAIN
1205 A HIERARCHICAL ATTENTION BASED MODEL FOR OFF-TOPIC SPONTANEOUS SPOKEN RESPONSE DETECTION
1078 AALTO SYSTEM FOR THE 2017 ARABIC MULTI-GENRE BROADCAST CHALLENGE
1058 ACOUSTIC-TO-WORD MODEL WITHOUT OOV
1116 ADVERSARIAL MANIFOLD LEARNING FOR SPEAKER RECOGNITION
1191 ADVERSARIAL TRAINING FOR DATA-DRIVEN SPEECH ENHANCEMENT WITHOUT PARALLEL CORPUS
1079 AN EMBEDDED SEGMENTAL K-MEANS MODEL FOR UNSUPERVISED SEGMENTATION AND CLUSTERING OF SPEECH
1301 AN INVESTIGATION OF MULTI-SPEAKER TRAINING FOR WAVENET VOCODER
1290 ATTENTION-BASED WAV2TEXT WITH FEATURE TRANSFER LEARNING
1143 AUTOMATIC SPEECH RECOGNITION OF ARABIC MULTI-GENRE BROADCAST MEDIA
1147 BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH
1131 CHARACTER-BASED UNITS FOR UNLIMITED VOCABULARY CONTINUOUS SPEECH RECOGNITION
1223 COMPARISON OF MULTIPLE FEATURES AND MODELING METHODS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
1242 COMPOSITE EMBEDDING SYSTEMS FOR ZEROSPEECH2017 TRACK1
1118 COMPUTATIONAL COST REDUCTION OF LONG SHORT-TERM MEMORY BASED ON SIMULTANEOUS COMPRESSION OF INPUT AND HIDDEN STATE
1065 CONSISTENT DNN UNCERTAINTY TRAINING AND DECODING FOR ROBUST ASR
1282 CRACKING THE COCKTAIL PARTY PROBLEM BY MULTI-BEAM DEEP ATTRACTOR NETWORK
1125 CROSS-DOMAIN SPEECH RECOGNITION USING NONPARALLEL CORPORA WITH CYCLE-CONSISTENT ADVERSARIAL NETWORKS
1244 DBLSTM BASED MULTILINGUAL ARTICULATORY FEATURE EXTRACTION FOR LANGUAGE DOCUMENTATION
1285 DEEP LEARNING METHODS FOR UNSUPERVISED ACOUSTIC MODELING - LEAP SUBMISSION TO ZEROSPEECH CHALLENGE 2017
1019 DEEP QUATERNION NEURAL NETWORKS FOR SPOKEN LANGUAGE UNDERSTANDING
1096 DENOTATION EXTRACTION FOR INTERACTIVE LEARNING IN DIALOGUE SYSTEMS
1228 DIRECT MODELING OF RAW AUDIO WITH DNNS FOR WAKE WORD DETECTION
1215 DYNAMIC TIME-AWARE ATTENTION TO SPEAKER ROLES AND CONTEXTS FOR SPOKEN LANGUAGE UNDERSTANDING
1271 EARLY AND LATE INTEGRATION OF AUDIO FEATURES FOR AUTOMATIC VIDEO DESCRIPTION
1080 END-TO-END TEXT-INDEPENDENT SPEAKER VERIFICATION WITH FLEXIBILITY IN UTTERANCE DURATION
1121 ERROR DETECTION OF GRAPHEME-TO-PHONEME CONVERSION IN TEXT-TO-SPEECH SYNTHESIS USING SPEECH SIGNAL AND LEXICAL CONTEXT
1154 EXPLORING ARCHITECTURES, DATA AND UNITS FOR STREAMING END-TO-END SPEECH RECOGNITION WITH RNN-TRANSDUCER
1312 EXPLORING ASR-FREE END-TO-END MODELING TO IMPROVE SPOKEN LANGUAGE UNDERSTANDING IN A CLOUD-BASED DIALOG SYSTEM
1075 EXPLORING THE USE OF ACOUSTIC EMBEDDINGS IN NEURAL MACHINE TRANSLATION
1134 EXTRACTING BOTTLENECK FEATURES AND WORD-LIKE PAIRS FROM UNTRANSCRIBED SPEECH FOR FEATURE REPRESENTATION
1213 FEATURE OPTIMIZED DPGMM CLUSTERING FOR UNSUPERVISED SUBWORD MODELING: A CONTRIBUTION TO ZEROSPEECH 2017
1055 FUTURE VECTOR ENHANCED LSTM LANGUAGE MODEL FOR LVCSR
1051 FUTURE WORD CONTEXTS IN NEURAL NETWORK LANGUAGE MODELS
1132 GATED CONVOLUTIONAL NETWORKS BASED HYBRID ACOUSTIC MODELS FOR LOW RESOURCE SPEECH RECOGNITION
1033 GROUND TRUTH ESTIMATION OF SPOKEN ENGLISH FLUENCY SCORE USING DECORRELATION PENALIZED LOW-RANK MATRIX FACTORIZATION
1072 GROUNDED LANGUAGE UNDERSTANDING FOR MANIPULATION INSTRUCTIONS USING GAN-BASED CLASSIFICATION
1142 HIERARCHICAL RECURRENT NEURAL NETWORK FOR STORY SEGMENTATION USING FUSION OF LEXICAL AND ACOUSTIC FEATURES
1169 IMPROVING NATIVE LANGUAGE (L1) IDENTIFICATION WITH BETTER VAD AND TDNN TRAINED SEPARATELY ON NATIVE AND NON-NATIVE ENGLISH CORPORA
1295 IMPROVING SEPARATION OF OVERLAPPED SPEECH FOR MEETING CONVERSATIONS USING UNCALIBRATED MICROPHONE ARRAY
1245 IMPROVING THE EFFICIENCY OF FORWARD-BACKWARD ALGORITHM USING BATCHED COMPUTATION IN TENSORFLOW
1184 INCREMENTAL TRAINING AND CONSTRUCTING THE VERY DEEP CONVOLUTIONAL RESIDUAL NETWORK ACOUSTIC MODELS
1229 INTEGRATED SPEAKER-ADAPTIVE SPEECH SYNTHESIS
1241 INVESTIGATING NATIVE AND NON-NATIVE ENGLISH CLASSIFICATION AND TRANSFER EFFECTS USING LEGENDRE POLYNOMIAL COEFFICIENT CLUSTERING
1025 INVESTIGATION OF LATTICE-FREE MAXIMUM MUTUAL INFORMATION-BASED ACOUSTIC MODELS WITH SEQUENCE-LEVEL KULLBACK-LEIBLER DIVERGENCE
1263 INVESTIGATION OF TRANSFER LEARNING FOR ASR USING LF-MMI TRAINED NEURAL NETWORKS
1092 ITERATIVE POLICY LEARNING IN END-TO-END TRAINABLE TASK-ORIENTED NEURAL DIALOG MODELS
1097 JHU KALDI SYSTEM FOR ARABIC MGB-3 ASR CHALLENGE USING DIARIZATION, AUDIO-TRANSCRIPT ALIGNMENT AND TRANSFER LEARNING
1260 KEYWORD SPOTTING FOR GOOGLE ASSISTANT USING CONTEXTUAL SPEECH RECOGNITION
1045 LANGUAGE DIARIZATION FOR SEMI-SUPERVISED BILINGUAL ACOUSTIC MODEL TRAINING
1257 LANGUAGE INDEPENDENT END-TO-END ARCHITECTURE FOR JOINT LANGUAGE IDENTIFICATION AND SPEECH RECOGNITION
1217 LANGUAGE MODELING WITH HIGHWAY LSTM
1308 LANGUAGE MODELING WITH NEURAL TRANS-DIMENSIONAL RANDOM FIELDS
1148 LATTICE RESCORING STRATEGIES FOR LONG SHORT TERM MEMORY LANGUAGE MODELS IN SPEECH RECOGNITION
1259 LEARNING MODALITY-INVARIANT REPRESENTATIONS FOR SPEECH AND IMAGES
1120 LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION
1199 LEVERAGING NATIVE LANGUAGE SPEECH FOR ACCENT IDENTIFICATION USING DEEP SIAMESE NETWORKS
1020 LEVERAGING SIDE INFORMATION FOR SPEAKER IDENTIFICATION WITH THE ENRON CONVERSATIONAL TELEPHONE SPEECH COLLECTION
1293 LISTENING WHILE SPEAKING: SPEECH CHAIN BY DEEP LEARNING
1189 MEETING RECOGNITION WITH ASYNCHRONOUS DISTRIBUTED MICROPHONE ARRAY
1183 MGB-3 BUT SYSTEM: LOW-RESOURCE ASR ON EGYPTIAN YOUTUBE DATA
1010 MINIMALLY SUPERVISED WRITTEN-TO-SPOKEN TEXT NORMALIZATION
1304 MITIGATING THE IMPACT OF SPEECH RECOGNITION ERRORS ON CHATBOT USING SEQUENCE-TO-SEQUENCE MODEL
1266 MIT-QCRI ARABIC DIALECT IDENTIFICATION SYSTEM FOR THE 2017 MULTI-GENRE BROADCAST CHALLENGE
1178 MODELING CHOICES IN END-TO-END SPEECH RECOGNITION
1270 MULTI-LEVEL LANGUAGE MODELING AND DECODING FOR OPEN VOCABULARY END-TO-END SPEECH RECOGNITION
1133 MULTILINGUAL BOTTLE-NECK FEATURE LEARNING FROM UNTRANSCRIBED SPEECH
1042 MULTI-TASK ENSEMBLES WITH STUDENT-TEACHER TRAINING
1108 MULTITASK TRAINING WITH UNLABELED DATA FOR END-TO-END SIGN LANGUAGE FINGERSPELLING RECOGNITION
1182 MULTI-VIEW (JOINT) PROBABILITY LINEAR DISCRIMINATION ANALYSIS FOR J-VECTOR BASED TEXT DEPENDENT SPEAKER VERIFICATION
1117 NEURAL RELEVANCE-AWARE QUERY MODELING FOR SPOKEN DOCUMENT RETRIEVAL
1044 NOISE-ROBUST EXEMPLAR MATCHING FOR RESCORING QUERY-BY-EXAMPLE SEARCH
1192 ON LATTICE GENERATION FOR LARGE VOCABULARY SPEECH RECOGNITION
1175 ONENET: JOINT DOMAIN, INTENT, SLOT PREDICTION FOR SPOKEN LANGUAGE UNDERSTANDING
1023 PERCEPTUAL QUALITY AND MODELING ACCURACY OF EXCITATION PARAMETERS IN DLSTM-BASED SPEECH SYNTHESIS SYSTEMS
1152 PERSONALIZED WORD REPRESENTATIONS CARRYING PERSONALIZED SEMANTICS LEARNED FROM SOCIAL NETWORK POSTS
1015 REDUCING THE COMPUTATIONAL COMPLEXITY FOR WHOLE WORD MODELS
1240 SCALABLE MULTI-DOMAIN DIALOGUE STATE TRACKING
1047 SEEING AND HEARING TOO: AUDIO REPRESENTATION FOR VIDEO CAPTIONING
1027 SEMI-SUPERVISED TRAINING STRATEGIES FOR DEEP NEURAL NETWORKS
1151 SEQUENCE TRAINING OF DNN ACOUSTIC MODELS WITH NATURAL GRADIENT
1203 SIMPLIFYING VERY DEEP CONVOLUTIONAL NEURAL NETWORK ARCHITECTURES FOR ROBUST SPEECH RECOGNITION
1048 SPARSE REPRESENTATION OF PHONETIC FEATURES FOR VOICE CONVERSION WITH AND WITHOUT PARALLEL DATA
1174 SPEAKER-SENSITIVE DUAL MEMORY NETWORKS FOR MULTI-TURN SLOT TAGGING
1317 SPEECH RECOGNITION CHALLENGE IN THE WILD: ARABIC MGB-3
1238 SPOKEN LANGUAGE BIOMARKERS FOR DETECTING COGNITIVE IMPAIRMENT
1095 SPOOFING DETECTION VIA SIMULTANEOUS VERIFICATION OF AUDIO-VISUAL SYNCHRONICITY AND TRANSCRIPTION
1110 STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING GENERATIVE ADVERSARIAL NETWORKS UNDER A MULTI-TASK LEARNING FRAMEWORK
1251 STREAMING SMALL-FOOTPRINT KEYWORD SPOTTING USING SEQUENCE-TO-SEQUENCE MODELS
1196 SUBBAND WAVENET WITH OVERLAPPED SINGLE-SIDEBAND FILTERBANKS
1150 SYLLABLE-BASED ACOUSTIC MODELING WITH CTC-SMBR-LSTM
1193 TACKLING UNSEEN ACOUSTIC CONDITIONS IN QUERY-BY-EXAMPLE SEARCH USING TIME AND FREQUENCY CONVOLUTION FOR MULTILINGUAL DEEP BOTTLENECK FEATURES
1319 THE BLIZZARD MACHINE LEARNING CHALLENGE 2017
1088 THE CMU ENTRY TO BLIZZARD MACHINE LEARNING CHALLENGE
1220 THE IFLYTEK SYSTEM FOR BLIZZARD MACHINE LEARNING CHALLENGE 2017-ES1
1216 THE USTC SYSTEM FOR BLIZZARD MACHINE LEARNING CHALLENGE 2017-ES2
1318 THE ZERO RESOURCE SPEECH CHALLENGE 2017
1066 TOPIC SEGMENTATION IN ASR TRANSCRIPTS USING BIDIRECTIONAL RNNS FOR CHANGE DETECTION
1105 TURBO FUSION OF MAGNITUDE AND PHASE INFORMATION FOR DNN-BASED PHONEME RECOGNITION
1160 UNSUPERVISED ADAPTATION OF STUDENT DNNS LEARNED FROM TEACHER RNNS FOR IMPROVED ASR PERFORMANCE
1181 UNSUPERVISED ADAPTATION WITH DOMAIN SEPARATION NETWORKS FOR ROBUST SPEECH RECOGNITION
1138 UNSUPERVISED DOMAIN ADAPTATION FOR ROBUST SPEECH RECOGNITION VIA VARIATIONAL AUTOENCODER-BASED DATA AUGMENTATION
1284 UNSUPERVISED HMM POSTERIOGRAMS FOR LANGUAGE INDEPENDENT ACOUSTIC MODELING IN ZERO RESOURCE CONDITIONS
1250 UNWRITTEN LANGUAGES DEMAND ATTENTION TOO! WORD DISCOVERY WITH ENCODER-DECODER MODELS
1164 UTD-CRSS SUBMISSION FOR MGB-3 ARABIC DIALECT IDENTIFICATION: FRONT-END AND BACK-END ADVANCEMENTS ON BROADCAST SPEECH
1128 WERD: USING SOCIAL TEXT SPELLING VARIANTS FOR EVALUATING DIALECTAL SPEECH RECOGNITION
