Automatic inference of missing road attributes (e.g., road type and speed limit) for enriching digital maps has attracted significant research attention in recent years. A number of machine learning based approaches have been proposed to detect road attributes from GPS traces, dash-cam videos, or satellite images. However, existing solutions mostly focus on a single modality without modeling the correlations among multiple data sources. To bridge the gap, we present a multimodal road attribute detection method, which improves the robustness by performing pixel-level fusion of crowdsourced GPS traces and satellite images. A GPS trace is usually given by a sequence of location, bearing, and speed. To align it with satellite imagery in the spatial domain, we render GPS traces into a sequence of multi-channel images that simultaneously capture the global distribution of the GPS points, the local distribution of vehicles’ moving directions and speeds, and their temporal changes over time, at each pixel. Unlike previous GPS based road feature extraction methods, our proposed GPS rendering does not require map matching in the data preprocessing step. Moreover, our multimodal solution addresses single-modal challenges such as occlusions in satellite images and data sparsity in GPS traces by learning the pixel-wise correspondences among different data sources. Extensive experiments have been conducted on two real-world datasets in Singapore and Jakarta. Compared with previous work, our method is able to improve the detection accuracy on road attributes by a large margin.