Latest news will appear here soon.

Tag: processing

Looking for better ways to convert between QGIS VectorLayer and (Geo)DataFrame

Plugin developers who want to use (Geo)Pandas-based functionality in their plugins regularly face the challenge of converting QGIS vector layers to (Geo)DataFrames. There is currently no built-in convenience function.

In Trajectools, so far, I have been performing the conversion manually, looping through all features and taking care of tricky column types, such as datetimes and geometries:

def df_from_layer_trajectools(layer,time_field_name="t"):
    # Original Trajectools 2.7 version
    names = [field.name() for field in layer.fields()]
    data = []
    for feature in layer.getFeatures():
        my_dict = {}
        for i, a in enumerate(feature.attributes()):
            if names[i] == time_field_name and isinstance(a, QDateTime):
                a = a.toPyDateTime()
            my_dict[names[i]] = a
        pt = feature.geometry().asPoint()
        my_dict["geom_x"] = pt.x()
        my_dict["geom_y"] = pt.y()
        data.append(my_dict)
    df = pd.DataFrame(data)
    return df

It works (mostly), but it’s far from fast. For the 25 million Geolife points, it takes 4 minutes:

In an attempt to speed-up (and make the conversion more robust, e.g. regarding datetime/timezone conversion and null values), I’ve spent some time at SDSL2025 with Joris Van den Bossche trying a workaround that writes the QGIS layer to an Arrow file and then reads that file with pyogrio:

def gdf_from_layer_arrow(layer):
    # SDSL2025 version
    with tempfile.TemporaryDirectory() as tmpdirname:
        path = os.path.join(tmpdirname, "data.arrow")

        options = QgsVectorFileWriter.SaveVectorOptions()
        options.actionOnExistingFile = QgsVectorFileWriter.CreateOrOverwriteFile 
        options.layerName = 'data'
        options.driverName = "arrow"
        
        QgsVectorFileWriter.writeAsVectorFormatV3(
            layer, path, QgsProject.instance().transformContext(), options
        )
       
        meta, table = pyogrio.read_arrow(path)
        gdf = gpd.GeoDataFrame.from_arrow(table)

    return gdf

Not only do we get a GeoDataFrame in return, this also runs in half the time, i.e. in 2 minutes instead of 4:

Switching to this approach will require adding pyogrio to the plugin dependencies. Looks like it could be worth it.

We also discussed another alternative: It would be faster to read the vector layer data source directly, in case it is a supported file format. However, this means we’d need separate handling for other input layers.

There’s also the issue of supporting the Processing feature that allows users to run the algorithm only on the selected features because selected features are only exposed through QgsProcessingParameterFeatureSource (and not through QgsProcessingParameterVectorLayer). Maybe the Export Selected Features algorithm can cover this case but it will export an empty layer if there is no selection.

Are you aware of any other / better ways to approach this issue? Any pointers are appreciated.

Learn More

New map coloring algorithms in QGIS 3.0

It’s been a long time since I last blogged here. Let’s just blame that on the amount of changes going into QGIS 3.0 and move on…

One new feature which landed in QGIS 3.0 today is a processing algorithm for automatic coloring of a map in such a way that adjoining polygons are all assigned different color indexes. Astute readers may be aware that this was possible in earlier versions of QGIS through the use of either the (QGIS 1.x only!) Topocolor plugin, or the Coloring a map plugin (2.x).

What’s interesting about this new processing algorithm is that it introduces several refinements for cartographically optimising the coloring. The earlier plugins both operated by pure “graph” coloring techniques. What this means is that first a graph consisting of each set of adjoining features is generated. Then, based purely on this abstract graph, the coloring algorithms are applied to optimise the solution so that connected graph nodes are assigned different colors, whilst keeping the total number of colors required minimised.

The new QGIS algorithm works in a different way. Whilst the first step is still calculating the graph of adjoining features (now super-fast due to use of spatial indexes and prepared geometry intersection tests!), the colors for the graph are assigned while considering the spatial arrangement of all features. It’s gone from a purely abstract mathematical solution to a context-sensitive cartographic solution.

The “Topological coloring” processing algorithm

Let’s explore the differences. First up, the algorithm has an option for the “minimum distance between features”. It’s often the case that features aren’t really touching, but are instead just very close to each other. Even though they aren’t touching, we still don’t want these features to be assigned the same color. This option allows you to control the minimum distance which two features can be to each other before they can be assigned the same color.

The biggest change comes in the “balancing” techniques available in the new algorithm. By default, the algorithm now tries to assign colors in such a way that the total number of features assigned each color is equalised. This avoids having a color which is only assigned to a couple of features in a large dataset, resulting in an odd looking map coloration.

Balancing color assignment by count – notice how each class has a (almost!) equal count

Another available balancing technique is to balance the color assignment by total area. This technique assigns colors so that the total area of the features assigned to each color is balanced. This mode can be useful to help avoid large features resulting in one of the colors appearing more dominant on a colored map.

Balancing assignment by area – note how only one large feature is assigned the red color

The final technique, and my personal preference, is to balance colors by distance between colors. This mode will assign colors in order to maximize the distance between features of the same color. Maximising the distance helps to create a more uniform distribution of colors across a map, and avoids certain colors clustering in a particular area of the map. It’s my preference as it creates a really nice balanced map – at a glance the colors look “randomly” assigned with no discernible pattern to the arrangement.

Balancing colors by distance

As these examples show, considering the geographic arrangement of features while coloring allows us to optimise the assigned colors for cartographic output.

The other nice thing about having this feature implemented as a processing algorithm is that unlike standalone plugins, processing algorithms can be incorporated as just one step of a larger model (and also reused by other plugins!).

QGIS 3.0 has tons of great new features, speed boosts and stability bumps. This is just a tiny taste of the handy new features which will be available when 3.0 is released!

Learn More