The availability of new and improved display, tracking, and input devices for virtual reality experiences has facilitated the use of partial and full-body self-avatars in interaction with virtual objects in the environment. However, scaling the avatar to match the user's body dimensions remains a cumbersome process. Moreover, the effect of body-scaled self-avatars on the size perception of virtual handheld objects and the related action capabilities has been relatively unexplored. To this end, we present an empirical evaluation investigating the effect of the presence or absence of body-scaled self-avatars and of visuo-motor calibration on frontal passability affordance judgments when interacting with virtual handheld objects. The self-avatar's dimensions were scaled to match the participant's eye height, arm length, shoulder width, and body depth along the midsection. The results indicate that the presence of body-scaled self-avatars produces more realistic judgments of passability and aids the calibration process when interacting with virtual objects. Participants also rely on the visual size of virtual objects to make judgments even though kinesthetic and proprioceptive feedback for the object is missing or mismatched.

Using optical sensors to track hand gestures in virtual reality (VR) simulations requires issues such as occlusion, field of view, accuracy, and sensor stability to be addressed or mitigated. We introduce an optical hand-based interaction system that comprises two Leap Motion sensors mounted onto a VR headset at different orientations. Our system collects sensor data from the two Leap Motion sensors and combines and processes it to produce optimal hand-tracking data that minimizes the effect of sensor occlusion and noise. This contrasts with previous systems, which do not use multiple head-mounted sensors or incorporate hand-data aggregation. We also present a study that compares the proposed system with glove-based and traditional motion-controller-based interaction. We investigate the effect of hand interaction on the feeling of naturalness and immersion. The results show that the use of two head-mounted sensors and the data aggregation system increased the number of valid hands presented to the user and can be successfully applied to VR. The user study shows a strong preference for the proposed system in terms of natural feeling and freedom of interaction. The absence of an indirect interface such as gloves or controllers was found to aid in creating a more natural and immersive experience.
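As a minimal sketch of the kind of per-frame aggregation of two head-mounted sensors described above, the following assumes each sensor reports joint positions already transformed into a common headset frame together with a tracking confidence; the HandFrame layout, the confidence threshold, and the confidence-weighted blend are illustrative assumptions, not the system's actual implementation.

    # Illustrative sketch only: combine per-frame hand reports from two
    # head-mounted sensors into a single estimate. Data layout and the
    # confidence-weighted blending rule are assumptions, not the paper's code.
    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class HandFrame:
        joints: np.ndarray      # (N, 3) joint positions in a common headset frame
        confidence: float       # sensor-reported tracking confidence in [0, 1]

    def aggregate_hands(a: Optional[HandFrame], b: Optional[HandFrame],
                        min_conf: float = 0.2) -> Optional[HandFrame]:
        """Prefer whichever sensor sees the hand; when both do, blend joints
        by confidence to damp per-sensor noise and occlusion dropouts."""
        a = a if (a is not None and a.confidence >= min_conf) else None
        b = b if (b is not None and b.confidence >= min_conf) else None
        if a is None:
            return b
        if b is None:
            return a
        w = a.confidence / (a.confidence + b.confidence)
        joints = w * a.joints + (1.0 - w) * b.joints
        return HandFrame(joints=joints, confidence=max(a.confidence, b.confidence))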
Depth completion aims to recover a dense depth map from sparse depth data and the corresponding single RGB image. The observed pixels provide significant guidance for recovering the depth of the unobserved pixels. However, due to the sparsity of the depth data, the standard convolution operation exploited by most existing methods is not effective at modeling the observed contexts with depth values. To address this issue, we propose to adopt graph propagation to capture the observed spatial contexts. Specifically, we first construct multiple graphs at different scales from the observed pixels. Since the graph structure varies from sample to sample, we then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively. Furthermore, considering the multi-modality of the input data, we apply graph propagation to the two modalities separately to extract multi-modal representations. Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively. The proposed strategy preserves the original information of one modality and also absorbs complementary information from the other by learning adaptive gating weights. Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art performance on two benchmarks, KITTI and NYU-v2, while having fewer parameters than the latest models. Our code is available at https://github.com/sshan-zhao/ACMNet.

Auto-encoder (AE)-based deep subspace clustering (DSC) methods have achieved impressive performance due to the powerful representations extracted by deep neural networks while prioritizing categorical separability. However, the self-reconstruction loss of an AE ignores rich, useful relational information and might lead to indiscriminative representations, which inevitably degrades the clustering performance. It is also challenging to learn high-level similarity without feeding semantic labels. Another unsolved problem facing DSC is the huge memory cost of the n×n similarity matrix incurred by the self-expression layer between the encoder and decoder. To tackle these problems, we use pairwise similarity to weight the reconstruction loss and capture local structure information, while the similarity itself is learned by the self-expression layer. Pseudo-graphs and pseudo-labels, which allow benefiting from uncertain knowledge acquired during network training, are further employed to supervise the similarity learning. Joint learning and iterative training facilitate obtaining an overall optimal solution. Extensive experiments on benchmark datasets demonstrate the superiority of our approach. By combining our method with the k-nearest-neighbors algorithm, we further show that it can address the large-scale and out-of-sample problems. The source code of our method is available at https://github.com/sckangz/SelfsupervisedSC.

Automatic classification and segmentation of wireless capsule endoscope (WCE) images are two clinically significant and relevant tasks in a computer-aided diagnosis system for gastrointestinal diseases. Most existing approaches, however, consider these two tasks individually and ignore their complementary information, leading to limited performance. To overcome this bottleneck, we propose a deep synergistic interaction network (DSI-Net) for joint classification and segmentation of WCE images, which mainly consists of a classification branch (C-Branch), a coarse segmentation branch (CS-Branch), and a fine segmentation branch (FS-Branch). To facilitate the classification task with segmentation knowledge, a lesion location mining (LLM) module is devised in C-Branch to accurately highlight lesion regions by mining neglected lesion areas and erasing misclassified background areas. To assist the segmentation task with the classification prior, we propose a category-guided feature generation (CFG) module in FS-Branch that improves the pixel representation by leveraging the category prototypes of C-Branch to obtain category-aware features.
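To make the symmetric gated fusion described for ACMNet above more concrete, the sketch below keeps each modality's own features and adds the other modality's features through a learned sigmoid gate; the concatenation-based gate, the 1x1 convolutions, and the residual form are assumptions for illustration, not the published architecture.

    # Sketch of a symmetric gated fusion block (illustrative, not ACMNet's code).
    import torch
    import torch.nn as nn

    class SymmetricGatedFusion(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            # One gate per direction, predicted from both modalities' features.
            self.gate_d = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
            self.gate_c = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

        def forward(self, feat_depth: torch.Tensor, feat_color: torch.Tensor):
            both = torch.cat([feat_depth, feat_color], dim=1)
            g_d = self.gate_d(both)   # how much color information the depth branch absorbs
            g_c = self.gate_c(both)   # how much depth information the color branch absorbs
            fused_depth = feat_depth + g_d * feat_color   # preserve own, add complementary
            fused_color = feat_color + g_c * feat_depth
            return fused_depth, fused_color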
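For the self-supervised subspace clustering method above, one plausible form of a pairwise-similarity-weighted reconstruction loss is sketched here, under the assumption that points deemed similar by the self-expression coefficients should reconstruct each other well; the symmetrization, row normalization, and distance form are illustrative choices, not necessarily the paper's exact formulation.

    # Illustrative sketch: a pairwise-similarity-weighted reconstruction loss.
    # The exact weighting used by the paper may differ.
    import torch

    def weighted_reconstruction_loss(x: torch.Tensor,       # (n, d) inputs
                                     x_hat: torch.Tensor,   # (n, d) decoder outputs
                                     c: torch.Tensor        # (n, n) self-expression coefficients
                                     ) -> torch.Tensor:
        s = 0.5 * (c.abs() + c.abs().t())                # symmetric pairwise similarity
        s = s / (s.sum(dim=1, keepdim=True) + 1e-8)      # row-normalize
        # Squared distances between every reconstruction x_hat[i] and every input x[j];
        # similar pairs contribute more to the loss, encoding local structure.
        d2 = torch.cdist(x_hat, x, p=2) ** 2             # (n, n)
        return (s * d2).sum() / x.shape[0]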
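Finally, in the spirit of the category-guided feature generation (CFG) module in DSI-Net, the sketch below shows one way category prototypes driven by the classification branch could modulate pixel features so they become category-aware; the learnable prototype table, the softmax-weighted prototype lookup, and the concatenation-plus-projection step are hypothetical choices, not the published design.

    # Illustrative sketch of category-guided feature generation (not DSI-Net's code):
    # a class prototype informed by the classification branch modulates the
    # segmentation branch's pixel features.
    import torch
    import torch.nn as nn

    class CategoryGuidedFeatures(nn.Module):
        def __init__(self, num_classes: int, channels: int):
            super().__init__()
            self.prototypes = nn.Embedding(num_classes, channels)  # one prototype per category
            self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

        def forward(self, pixel_feats: torch.Tensor, class_logits: torch.Tensor) -> torch.Tensor:
            # pixel_feats: (B, C, H, W); class_logits: (B, num_classes) from the classification branch
            weights = class_logits.softmax(dim=1)                       # soft category prior
            proto = weights @ self.prototypes.weight                    # (B, C) expected prototype
            proto_map = proto[:, :, None, None].expand_as(pixel_feats)  # broadcast over space
            return self.proj(torch.cat([pixel_feats, proto_map], dim=1))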