Data science from the command line: a look back at 2 years of using xan

AW1.120 | Day 2 | 13:15 - 13:45 | Speakers: Béatrice Mazoyer, Benjamin Ooghe-Tabanou, Guillaume Plique

Abstract

Xan is a command-line tool designed to manipulate CSV files directly from the comfort of the terminal.

Originally developed within a sociology research lab to perform common operations on very large datasets collected from the web (exploration, sorting, computing frequency tables, joins, aggregations, etc.), it has become its users' go-to solution for many more use cases, including lexicometric analysis, plotting histograms, time series or heatmaps, and even generating network graphs. Although the tool was initially created to handle very large CSV files, people now also use it to process small files and other file formats. It has thus become part of its users' daily data manipulation practices, letting them stay in the shell rather than rely on GUIs or notebooks.

This presentation, given by a research engineer after two years of regular use, examines the reasons for this appropriation, which relate both to the constraints of research in the Humanities and Social Sciences and to the interface design choices that make xan effective.

