Tag: python

Configure editing form widgets using PyQGIS

PT | EN

As I was preparing a QGIS Project to read a database structured according to the new rules and technical specifications for the Portuguese Cartography, I started to configure the editing forms for several layers, so that:

  1. Make some fields read-only, like for example an identifier field.
  2. Configure widgets better suited for each field, to help the user and avoid errors. For example, date-time files with a pop-up calendar, and value lists with dropdown selectors.

Basically, I wanted something like this:

Peek 2019-09-30 15-04_2

Let me say that, in PostGIS layers, QGIS does a great job in figuring out the best widget to use for each field, as well as the constraints to apply. Which is a great help. Nevertheless, some need some extra configuration.

If I had only a few layers and fields, I would have done them all by hand, but after the 5th layer my personal mantra started to chime in:

“If you are using a computer to perform a repetitive manual task, you are doing it wrong!”

So, I began to think how could I configure the layers and fields more systematically. After some research and trial and error, I came up with the following PyQGIS functions.

Make a field Read-only

The identifier field (“identificador”) is automatically generated by the database. Therefore, the user shouldn’t edit it. So I had better make it read only

Layer Properties - cabo_electrico | Attributes Form_103

To make all the identifier fields read-only, I used the following code.

def field_readonly(layer, fieldname, option = True):
    fields = layer.fields()
    field_idx = fields.indexOf(fieldname)
    if field_idx >= 0:
        form_config = layer.editFormConfig()
        form_config.setReadOnly(field_idx, option)
        layer.setEditFormConfig(form_config)

# Example for the field "identificador"

project = QgsProject.instance()
layers = project.mapLayers() 

for layer in layers.values():
    field_readonly(layer,'identificador')

Set fields with DateTime widget

The date fields are configured automatically, but the default widget setting only outputs the date, and not date-time, as the rules required.

I started by setting a field in a layer exactly how I wanted, then I tried to figure out how those setting were saved in PyQGIS using the Python console:

>>>layer = iface.mapCanvas().currentLayer()
>>>layer.fields().indexOf('inicio_objeto')
1
>>>field = layer.fields()[1]
>>>field.editorWidgetSetup().type()
'DateTime'
>>>field.editorWidgetSetup().config()
{'allow_null': True, 'calendar_popup': True, 'display_format': 'yyyy-MM-dd HH:mm:ss', 'field_format': 'yyyy-MM-dd HH:mm:ss', 'field_iso_format': False}

Knowing this, I was able to create a function that allows configuring a field in a layer using the exact same settings, and apply it to all layers.

def field_to_datetime(layer, fieldname):
    config = {'allow_null': True,
              'calendar_popup': True,
              'display_format': 'yyyy-MM-dd HH:mm:ss',
              'field_format': 'yyyy-MM-dd HH:mm:ss',
              'field_iso_format': False}
    type = 'Datetime'
    fields = layer.fields()
    field_idx = fields.indexOf(fieldname)
    if field_idx >= 0:
        widget_setup = QgsEditorWidgetSetup(type,config)
        layer.setEditorWidgetSetup(field_idx, widget_setup)

# Example applied to "inicio_objeto" e "fim_objeto"

for layer in layers.values():
    field_to_datetime(layer,'inicio_objeto')
    field_to_datetime(layer,'fim_objeto')

Setting a field with the Value Relation widget

In the data model, many tables have fields that only allow a limited number of values. Those values are referenced to other tables, the Foreign keys.

In these cases, it’s quite helpful to use a Value Relation widget. To configure fields with it in a programmatic way, it’s quite similar to the earlier example, where we first neet to set an example and see how it’s stored, but in this case, each field has a slightly different settings

Luckily, whoever designed the data model, did a favor to us all by giving the same name to the fields and the related tables, making it possible to automatically adapt the settings for each case.

The function stars by gathering all fields in which the name starts with ‘valor_’ (value). Then, iterating over those fields, adapts the configuration to use the reference layer that as the same name as the field.

def field_to_value_relation(layer):
    fields = layer.fields()
    pattern = re.compile(r'^valor_')
    fields_valor = [field for field in fields if pattern.match(field.name())]
    if len(fields_valor) > 0:
        config = {'AllowMulti': False,
                  'AllowNull': True,
                  'FilterExpression': '',
                  'Key': 'identificador',
                  'Layer': '',
                  'NofColumns': 1,
                  'OrderByValue': False,
                  'UseCompleter': False,
                   'Value': 'descricao'}
        for field in fields_valor:
            field_idx = fields.indexOf(field.name())
            if field_idx >= 0:
                print(field)
                try:
                    target_layer = QgsProject.instance().mapLayersByName(field.name())[0]
                    config['Layer'] = target_layer.id()
                    widget_setup = QgsEditorWidgetSetup('ValueRelation',config)
                    layer.setEditorWidgetSetup(field_idx, widget_setup)
                except:
                    pass
            else:
                return False
    else:
        return False
    return True
    
# Correr função em todas as camadas
for layer in layers.values():
    field_to_value_relation(layer)

Conclusion

In a relatively quick way, I was able to set all the project’s layers with the widgets I needed.Peek 2019-09-30 16-06

This seems to me like the tip of the iceberg. If one has the need, with some search and patience, other configurations can be changed using PyQGIS. Therefore, think twice before embarking in configuring a big project, layer by layer, field by fields.

Learn More

QGIS Versioning now supports foreign keys!

QGIS-versioning is a QGIS and PostGIS plugin dedicated to data versioning and history management. It supports :

  • Keeping full table history with all modifications
  • Transparent access to current data
  • Versioning tables with branches
  • Work offline
  • Work on a data subset
  • Conflict management with a GUI

QGIS versioning conflict management

In a previous blog article we detailed how QGIS versioning can manage data history, branches, and work offline with PostGIS-stored data and QGIS. We recently added foreign key support to QGIS versioning so you can now historize any complex database schema.

This QGIS plugin is available in the official QGIS plugin repository, and you can fork it on GitHub too !

Foreign key support

TL;DR

When a user decides to historize its PostgreSQL database with QGIS-versioning, the plugin alters the existing database schema and adds new fields in order to track down the different versions of a single table row. Every access to these versioned tables are subsequently made through updatable views in order to automatically fill in the new versioning fields.

Up to now, it was not possible to deal with primary keys and foreign keys : the original tables had to be constraints-free.  This limitation has been lifted thanks to this contribution.

To make it simple, the solution is to remove all constraints from the original database and transform them into a set of SQL check triggers installed on the working copy databases (SQLite or PostgreSQL). As verifications are made on the client side, it’s impossible to propagate invalid modifications on your base server when you “commit” updates.

Behind the curtains

When you choose to historize an existing database, a few fields are added to the existing table. Among these fields, versioning_ididentifies  one specific version of a row. For one existing row, there are several versions of this row, each with a different versioning_id but with the same original primary key field. As a consequence, that field cannot satisfy the unique constraint, so it cannot be a key, therefore no foreign key neither.

We therefore have to drop the primary key and foreign key constraints when historizing the table. Before removing them, constraints definitions are stored in a dedicated table so that these constraints can be checked later.

When the user checks out a specific table on a specific branch, QGIS-versioning uses that constraint table to build constraint checking triggers in the working copy. The way constraints are built depends on the checkout type (you can checkout in a SQLite file, in the master PostgreSQL database or in another PostgreSQL database).

What do we check ?

That’s where the fun begins ! The first thing we have to check is key uniqueness or foreign key referencing an existing key on insert or update. Remember that there are no primary key and foreign key anymore, we dropped them when activating historization. We keep the term for better understanding.

You also have to deal with deleting or updating a referenced row and the different ways of propagating the modification : cascade, set default, set null, or simply failure, as explained in PostgreSQL Foreign keys documentation .

Nevermind all that, this problem has been solved for you and everything is done automatically in QGIS-versioning. Before you ask, yes foreign keys spanning on multiple fields are also supported.

What’s new in QGIS ?

You will get a new message you probably already know about, when you try to make an invalid modification committing your changes to the master database

Error when foreign key constraint is violated

Partial checkout

One existing Qgis-versioning feature is partial checkout. It allows a user to select a subset of data to checkout in its working copy. It avoids downloading gigabytes of data you do not care about. You can, for instance, checkout features within a given spatial extent.

So far, so good. But if you have only a part of your data, you cannot ensure that modifying a data field as primary key will keep uniqueness. In this particular case, QGIS-versioning will trigger errors on commit, pointing out the invalid rows you have to modify so the unique constraint remains valid.

Error when committing non unique key after a partial checkout

Tests

There is a lot to check when you intend to replace the existing constraint system with your own constraint system based on triggers. In order to ensure QGIS-Versioning stability and reliability, we put some special effort on building a test set that cover all use cases and possible exceptions.

What’s next

There is now no known limitations on using QGIS-versioning on any of your database. If you think about a missing feature or just want to know more about QGIS and QGIS-versioning, feel free to contact us at infos+data@oslandia.com. And please have a look at our support offering for QGIS.

Many thanks to eHealth Africa who helped us develop these new features. eHealth Africa is a non-governmental organization based in Nigeria. Their mission is to build stronger health systems through the design and implementation of data-driven solutions.

Learn More

(Fr) Oslandia recrute : développeur(se) C++ et Python

Sorry, this entry is only available in French.
Learn More

Thoughts on “FOSS4G/SOTM Oceania 2018”, and the PyQGIS API improvements which it caused

Last week the first official “FOSS4G/SOTM Oceania” conference was held at Melbourne University. This was a fantastic event, and there’s simply no way I can extend sufficient thanks to all the organisers and volunteers who put this event together. They did a brilliant job, and their efforts are even more impressive considering it was the inaugural event!

Upfront — this is not a recap of the conference (I’m sure someone else is working on a much more detailed write up of the event!), just some musings I’ve had following my experiences assisting Nathan Woodrow deliver an introductory Python for QGIS workshop he put together for the conference. In short, we both found that delivering this workshop to a group of PyQGIS newcomers was a great way for us to identify “pain points” in the PyQGIS API and areas where we need to improve. The good news is that as a direct result of the experiences during this workshop the API has been improved and streamlined! Let’s explore how:

Part of Nathan’s workshop (notes are available here) focused on a hands-on example of creating a custom QGIS “Processing” script. I’ve found that preparing workshops is guaranteed to expose a bunch of rare and tricky software bugs, and this was no exception! Unfortunately the workshop was scheduled just before the QGIS 3.4.2 patch release which fixed these bugs, but at least they’re fixed now and we can move on…

The bulk of Nathan’s example algorithm is contained within the following block (where “distance” is the length of line segments we want to chop our features up into):

for input_feature in enumerate(features):
    geom = feature.geometry().constGet()
    if isinstance(geom, QgsLineString):
        continue
    first_part = geom.geometryN(0)
    start = 0
    end = distance
    length = first_part.length()

    while start < length:
        new_geom = first_part.curveSubstring(start,end)

        output_feature = input_feature
        output_feature.setGeometry(QgsGeometry(new_geom))
        sink.addFeature(output_feature)

        start += distance
        end += distance

There’s a lot here, but really the guts of this algorithm breaks down to one line:

new_geom = first_part.curveSubstring(start,end)

Basically, a new geometry is created for each trimmed section in the output layer by calling the “curveSubstring” method on the input geometry and passing it a start and end distance along the input line. This returns the portion of that input LineString (or CircularString, or CompoundCurve) between those distances. The PyQGIS API nicely hides the details here – you can safely call this one method and be confident that regardless of the input geometry type the result will be correct.

Unfortunately, while calling the “curveSubstring” method is elegant, all the code surrounding this call is not so elegant. As a (mostly) full-time QGIS developer myself, I tend to look over oddities in the API. It’s easy to justify ugly API as just “how it’s always been”, and over time it’s natural to develop a type of blind spot to these issues.

Let’s start with the first ugly part of this code:

geom = input_feature.geometry().constGet()
if isinstance(geom, QgsLineString):
    continue
first_part = geom.geometryN(0)
# chop first_part into sections of desired length
...

This is rather… confusing… logic to follow. Here the script is fetching the geometry of the input feature, checking if it’s a LineString, and if it IS, then it skips that feature and continues to the next. Wait… what? It’s skipping features with LineString geometries?

Well, yes. The algorithm was written specifically for one workshop, which was using a MultiLineString layer as the demo layer. The script takes a huge shortcut here and says “if the input feature isn’t a MultiLineString, ignore it — we only know how to deal with multi-part geometries”. Immediately following this logic there’s a call to geometryN( 0 ), which returns just the first part of the MultiLineString geometry.

There’s two issues here — one is that the script just plain won’t work for LineString inputs, and the second is that it ignores everything BUT the first part in the geometry. While it would be possible to fix the script and add a check for the input geometry type, put in logic to loop over all the parts of a multi-part input, etc, that’s instantly going to add a LOT of complexity or duplicate code here.

Fortunately, this was the perfect excuse to improve the PyQGIS API itself so that this kind of operation is simpler in future! Nathan and I had a debrief/brainstorm after the workshop, and as a result a new “parts iterator” has been implemented and merged to QGIS master. It’ll be available from version 3.6 on. Using the new iterator, we can simplify the script:

geom = input_feature.geometry()
for part in geom.parts():
    # chop part into sections of desired length
    ...

Win! This is simultaneously more readable, more Pythonic, and automatically works for both LineString and MultiLineString inputs (and in the case of MultiLineStrings, we now correctly handle all parts).

Here’s another pain-point. Looking at the block:

new_geom = part.curveSubstring(start,end)
output_feature = input_feature
output_feature.setGeometry(QgsGeometry(new_geom))

At first glance this looks reasonable – we use curveSubstring to get the portion of the curve, then make a copy of the input_feature as output_feature (this ensures that the features output by the algorithm maintain all the attributes from the input features), and finally set the geometry of the output_feature to be the newly calculated curve portion. The ugliness here comes in this line:

output_feature.setGeometry(QgsGeometry(new_geom))

What’s that extra QgsGeometry(…) call doing here? Without getting too sidetracked into the QGIS geometry API internals, QgsFeature.setGeometry requires a QgsGeometry argument, not the QgsAbstractGeometry subclass which is returned by curveSubstring.

This is a prime example of a “paper-cut” style issue in the PyQGIS API. Experienced developers know and understand the reasons behind this, but for newcomers to PyQGIS, it’s an obscure complexity. Fortunately the solution here was simple — and after the workshop Nathan and I added a new overload to QgsFeature.setGeometry which accepts a QgsAbstractGeometry argument. So in QGIS 3.6 this line can be simplified to:

output_feature.setGeometry(new_geom)

Or, if you wanted to make things more concise, you could put the curveSubstring call directly in here:

output_feature = input_feature
output_feature.setGeometry(part.curveSubstring(start,end))

Let’s have a look at the simplified script for QGIS 3.6:

for input_feature in enumerate(features):
    geom = feature.geometry()
    for part in geom.parts():
        start = 0
        end = distance
        length = part.length()

        while start < length:
            output_feature = input_feature
            output_feature.setGeometry(part.curveSubstring(start,end))
            sink.addFeature(output_feature)

            start += distance
            end += distance

This is MUCH nicer, and will be much easier to explain in the next workshop! The good news is that Nathan has more niceness on the way which will further improve the process of writing QGIS Processing script algorithms. You can see some early prototypes of this work here:

So there we go. The process of writing and delivering a workshop helps to look past “API blind spots” and identify the ugly points and traps for those new to the API. As a direct result of this FOSS4G/SOTM Oceania 2018 Workshop, the QGIS 3.6 PyQGIS API will be easier to use, more readable, and less buggy! That’s a win all round!

Learn More

Speeding up your PyQGIS scripts

I’ve recently spent some time optimising the performance of various QGIS plugins and algorithms, and I’ve noticed that there’s a few common performance traps which developers fall into when fetching features from a vector layer. In this post I’m going to explore these traps, what makes them slow, and how to avoid them.

As a bit of background, features are fetched from a vector layer in QGIS using a QgsFeatureRequest object. Common use is something like this:

request = QgsFeatureRequest()
for feature in vector_layer.getFeatures(request):
    # do something

This code would iterate over all the features in layer. Filtering the features is done by tweaking the QgsFeatureRequest, such as:

request = QgsFeatureRequest().setFilterFid(1001)
feature_1001 = next(vector_layer.getFeatures(request))

In this case calling getFeatures(request) just returns the single feature with an ID of 1001 (which is why we shortcut and use next(…) here instead of iterating over the results).

Now, here’s the trap: calling getFeatures is expensive. If you call it on a vector layer, QGIS will be required to setup an new connection to the data store (the layer provider), create some query to return data, and parse each result as it is returned from the provider. This can be slow, especially if you’re working with some type of remote layer, such as a PostGIS table over a VPN connection. This brings us to our first trap:

Trap #1: Minimise the calls to getFeatures()

A common task in PyQGIS code is to take a list of feature IDs and then request those features from the layer. A see a lot of older code which does this using something like:

for id in some_list_of_feature_ids:
    request = QgsFeatureRequest().setFilterFid(id)
    feature = next(vector_layer.getFeatures(request))
    # do something with the feature

Why is this a bad idea? Well, remember that every time you call getFeatures() QGIS needs to do a whole bunch of things before it can start giving you the matching features. In this case, the code is calling getFeatures() once for every feature ID in the list. So if the list had 100 features, that means QGIS is having to create a connection to the data source, set up and prepare a query to match a single feature, wait for the provider to process that, and then finally parse the single feature result. That’s a lot of wasted processing!

If the code is rewritten to take the call to getFeatures() outside of the loop, then the result is:

request = QgsFeatureRequest().setFilterFids(some_list_of_feature_ids)
for feature in vector_layer.getFeatures(request):
    # do something with the feature

Now there’s just a single call to getFeatures() here. QGIS optimises this request by using a single connection to the data source, preparing the query just once, and fetching the results in appropriately sized batches. The difference is huge, especially if you’re dealing with a large number of features.

Trap #2: Use QgsFeatureRequest filters appropriately

Here’s another common mistake I see in PyQGIS code. I often see this one when an author is trying to do something with all the selected features in a layer:

for feature in vector_layer.getFeatures():
    if not feature.id() in vector_layer.selectedFeaturesIds():
        continue

    # do something with the feature

What’s happening here is that the code is iterating over all the features in the layer, and then skipping over any which aren’t in the list of selected features. See the problem here? This code iterates over EVERY feature in the layer. If you’re layer has 10 million features, we are fetching every one of these from the data source, going through all the work of parsing it into a QGIS feature, and then promptly discarding it if it’s not in our list of selected features. It’s very inefficient, especially if fetching features is slow (such as when connecting to a remote database source).

Instead, this code should use the setFilterFids() method for QgsFeatureRequest:

request = QgsFeatureRequest().setFilterFids(vector_layer.selectedFeaturesIds())
for feature in vector_layer.getFeatures(request):
    # do something with the feature

Now, QGIS will only fetch features from the provider with matching feature IDs from the list. Instead of fetching and processing every feature in the layer, only the actual selected features will be fetched. It’s not uncommon to see operations which previously took many minutes (or hours!) drop down to a few seconds after applying this fix.

Another variant of this trap uses expressions to test the returned features:

filter_expression = QgsExpression('my_field &gt; 20')
for feature in vector_layer.getFeatures():
    if not filter_expression.evaluate(feature):
        continue

    # do something with the feature

Again, this code is fetching every single feature from the layer and then discarding it if it doesn’t match the “my_field > 20” filter expression. By rewriting this to:

request = QgsFeatureRequest().setFilterExpression('my_field &gt; 20')
for feature in vector_layer.getFeatures(request):
    # do something with the feature

we hand over the bulk of the filtering to the data source itself. Recent QGIS versions intelligently translate the filter into a format which can be applied directly at the provider, meaning that any relevant indexes and other optimisations can be applied by the provider itself. In this case the rewritten code means that ONLY the features matching the ‘my_field > 20’ criteria are fetched from the provider – there’s no time wasted messing around with features we don’t need.

 

Trap #3: Only request values you need

The last trap I often see is that more values are requested from the layer then are actually required. Let’s take the code:

my_sum = 0
for feature in vector_layer.getFeatures(request):
    my_sum += feature['value']

In this case there’s no way we can optimise the filters applied, since we need to process every feature in the layer. But – this code is still inefficient. By default QGIS will fetch all the details for a feature from the provider. This includes all attribute values and the feature’s geometry. That’s a lot of processing – QGIS needs to transform the values from their original format into a format usable by QGIS, and the feature’s geometry needs to be parsed from it’s original type and rebuilt as a QgsGeometry object. In our sample code above we aren’t doing anything with the geometry, and we are only using a single attribute from the layer. By calling setFlags( QgsFeatureRequest.NoGeometry ) and setSubsetOfAttributes() we can tell QGIS that we don’t need the geometry, and we only require a single attribute’s value:

my_sum = 0
request = QgsFeatureRequest().setFlags(QgsFeatureRequest.NoGeometry).setSubsetOfAttributes(['value'], vector_layer.fields() )
for feature in vector_layer.getFeatures(request):
    my_sum += feature['value']

None of the unnecessary geometry parsing will occur, and only the ‘value’ attribute will be fetched and populated in the features. This cuts down both on the processing required AND the amount of data transfer between the layer’s provider and QGIS. It’s a significant improvement if you’re dealing with larger layers.

Conclusion

Optimising your feature requests is one of the easiest ways to speed up your PyQGIS script! It’s worth spending some time looking over all your uses of getFeatures() to see whether you can cut down on what you’re requesting – the results can often be mind blowing!

Learn More